Overview
Document release date | 15.05.2018. |
Created by | |
Reviewed by | |
Purpose | Present document describes
|
Service version | Descriptions are based on STRUVAL version 7.5.2 |
Structural Validation Service
The Structural Validation Service (STRUVAL) performs the structural validation of statistical data files based on a set of pre-defined validation rules, contained in a Data Structure Definition (DSD).
Structural validation performed by STRUVAL is the first step within a sequence of automated data validation activities conducted by Eurostat before statistical processing and dissemination of the collected data. The STRUVAL service returns a validation report to the data provider listing failures detected in the dataset for correction before resubmission.
The service verifies
- that the transmitted file is in an accepted and processable format (SDMX-ML, SDMX-CSV);
- that the dataset contains the structures as defined in the DSD, including dataflow definition, code lists, concepts, key families and constraints;
- that the values contained within the dataset follow basic requirements defined in terms of completeness, data format, data consistency and constraints applied.
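As an illustration of the first check, a data provider could run a rough syntax pre-check locally before transmission. The sketch below is not part of STRUVAL. The function name `precheck` and the two-column minimum for CSV are assumptions made for this example, and nothing here tests SDMX-specific requirements or validates against a DSD.

```python
import csv
import io
import xml.etree.ElementTree as ET


def precheck(data: bytes) -> str:
    """Classify a file as 'xml', 'csv' or 'unreadable' by outer syntax only."""
    if data.lstrip().startswith(b"<"):
        try:
            ET.fromstring(data)  # raises ParseError if not well formed
        except ET.ParseError:
            return "unreadable"
        return "xml"
    try:
        text = data.decode("utf-8")
        rows = list(csv.reader(io.StringIO(text, newline="")))
    except (UnicodeDecodeError, csv.Error):
        return "unreadable"
    # Assumption for this sketch: a usable CSV has a header with at least
    # two columns and every row has the same number of fields as the header.
    if not rows or len(rows[0]) < 2:
        return "unreadable"
    width = len(rows[0])
    return "csv" if all(len(row) == width for row in rows) else "unreadable"
```

A file that passes this pre-check can still fail structural validation; the sketch only catches files that no parser could read at all.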
Messaging
The data provider may receive the following messages throughout the operation of the validation process via the Edamis service.
- Message in Edamis confirming that the data transmission is successful. The message is sent in all cases of data transmission.
- Message in Edamis informing the data provider of a service outage. If experienced, please contact your counterpart at Eurostat.
- Message in Edamis informing the data provider that the service will deliver results outside of the routinely expected timeframe. This may occur in case of service overload due to high traffic. If repeatedly experienced, please contact your counterpart at Eurostat.
- Email message informing the data provider that the validation process concluded, and the validation report is now available via the Edamis user interface.
To add users to, or remove them from, the list of recipients of any of these messages, please contact your counterpart at Eurostat.
Retrieval of the validation report
The data provider may access the validation report through the Edamis feedback channel. The validation report is never sent directly to data providers due to possible confidentiality constraints. The Edamis service may, however, be configured to send an email to inform users of the availability of the report (and any other message received).
1. For organisations where the EDAMIS Web Application (EWA) has been installed, the feedback will by default be sent to the EWA, regardless of whether the EWA or the EWP was used for the specific transmission. The validation report is available in EWA under the Send Datafile > Received Feedbacks menu.
Users may request that the feedback be sent to the EWP instead if the EWP was used to transmit the data. This change must be requested by contacting the EDAMIS support team at ESTAT-SUPPORT_EDAMIS@ec.europa.eu
2. For organisations that do not have an EWA installation, the feedback will be sent to the EDAMIS Web Portal (EWP). The validation report is available in the EWP under the Transmission > Received Feedback menu in Edamis.
Structure of the validation report
The validation report consists of a Header section and an Error Listing section. The Header contains validation process metadata and a general overview of the results of the validation. The Error Listing section contains the details of all unique errors detected.
Header
The Header contains:
Component | Description |
---|---|
Total number of errors found | Number of all error occurrences encountered during the validation process. Note: The validation service applies a cap of 2000 individual occurrences: it stops and produces a report once it reaches 2000+1 errors. As a result, further errors may be identified after the dataset is corrected and resubmitted. If the error cap is reached, this is indicated by a message in the Header. |
Errors per type | Total of all error occurrences broken down by error type. The breakdown is based on Error Code and Message ID. Error Codes refer to a wider classification of possible errors (e.g. technical issues, validation related issues), while Message IDs refer to a specific type of error (e.g. unexpected code, incorrect data format). The aggregation in this section groups errors with identical Message IDs together, meaning the subtotal represented by a line item may stem from a number of different root causes. The purpose of the grouping here is to provide general information on the nature of errors identified, not a detailed breakdown. Note: A single root cause may trigger multiple error types, and these will be listed separately (e.g. a code is unexpected and also violates a length constraint). In case the dataset does not contain errors, the Errors Per Type section has no entries. |
Dataset validated | Name of the input dataset |
SDMX Converted version | Version of the validation service engine used. |
Generated on | Date stamp. |
Structure information | DSD name and version. |
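The two counting components of the Header (total and per-type subtotals) and the error cap can be sketched as follows. This is a minimal illustration and not STRUVAL code: the function name `summarise` and the input shape are assumptions, and whether the occurrence that trips the cap is itself included in the reported total is not stated in this document (here it is counted).

```python
from collections import Counter

ERROR_CAP = 2000  # cap on individual occurrences, as stated in the Header table


def summarise(errors):
    """Summarise (error_code, message_id) pairs given in detection order.

    Returns (total, per_type, cap_reached).
    """
    per_type = Counter()
    total = 0
    cap_reached = False
    for error_code, message_id in errors:
        total += 1
        per_type[(error_code, message_id)] += 1
        if total > ERROR_CAP:  # processing stops at 2000 + 1 occurrences
            cap_reached = True
            break
    return total, dict(per_type), cap_reached
```

Because processing stops at the cap, the per-type subtotals of a capped report describe only the part of the dataset that was examined before the cap was reached.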
Example error report header, with errors detected:
Error Listing
The Error Listing contains the list of specific errors detected by the validation service, in the order of detection. In case of no errors detected, this section of the validation report remains empty.
The presentation of errors and error metadata follows the business rules below.
Grouping of errors
Errors are grouped under separate sub-headings per the unique root cause they are generated by. The errors are identified as unique according to the following logic:
If a difference on any level is detected between two errors, the root cause of the errors is considered separate and therefore they are listed separately.
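The grouping rule can be sketched as a comparison of a composite key. The levels used below (Error Code, Message ID, Concept Name, Concept Type, Concept Value) are an assumption taken from the fields of the error header; the exact levels STRUVAL compares are not reproduced in this text. The names `ErrorRecord`, `root_cause` and `group_errors` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorRecord:
    error_code: str
    message_id: str
    concept_name: str
    concept_type: str
    concept_value: str
    series_key: tuple  # location of the occurrence, not part of the root cause


def root_cause(error: ErrorRecord) -> tuple:
    """Composite key: a difference on any level yields a separate group."""
    return (
        error.error_code,
        error.message_id,
        error.concept_name,
        error.concept_type,
        error.concept_value,
    )


def group_errors(errors):
    """Group occurrences by root cause, keeping the order of detection."""
    groups = {}
    for error in errors:
        groups.setdefault(root_cause(error), []).append(error)
    return groups
```

With this key, two occurrences that differ only in their series key fall into the same group, while a different Concept Value alone is enough to open a new group.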
Display of errors
The validation report contains all detected errors (unless the 2000 cap is reached, see above).
Error messages and the dimension list are only displayed for the first error occurrence. For any further occurrences only the series key is listed, with a corresponding header with the names of dimensions available.
Where there is a business need, the number of error occurrences displayed in the report may be limited (e.g. to 3 per group).
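The display rules above can be sketched for a single error group as follows. The text layout produced here is purely illustrative and does not reproduce the actual report format; the function name `render_group` and its parameters are assumptions.

```python
def render_group(message, dim_names, series_keys, limit=None):
    """Render one error group as a list of text lines.

    The first occurrence gets the error message and the dimension list;
    further occurrences get only their series key, under a header row
    with the dimension names. `limit` optionally caps the occurrences shown.
    """
    shown = series_keys if limit is None else series_keys[:limit]
    if not shown:
        return []
    lines = [message]
    lines += [f"{name}: {value}" for name, value in zip(dim_names, shown[0])]
    if len(shown) > 1:
        lines.append(" | ".join(dim_names))  # header for remaining keys
        lines += [" | ".join(key) for key in shown[1:]]
    return lines
```

For example, calling `render_group` with three series keys and `limit=2` yields the full first occurrence followed by a header row and a single further series key.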
Filtering reports due to confidentiality constraints
Eurostat policy prohibits the inclusion of confidential data in validation reports distributed to external parties, including national statistical institutes. Statistical domains that collect and process confidential data will receive a report from which elements that may contain confidential data are filtered out.
The filtering extends to the following:
- Concept Types measure and attribute, as these may contain confidential information. Dimensions are never filtered out.
- Concept Value in the error header.
- Error Description. The description may contain dynamic elements where confidential data (e.g. an OBS_VALUE) is inserted. The description is completely removed from the report.
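One reading of these rules is that, for errors on a Measure or Attribute, the Concept Value and the Error Description are dropped, while errors on a Dimension are left untouched. The sketch below follows that reading; it is an assumption, as are the dictionary representation of an error and the function name `filter_error`.

```python
FILTERED_CONCEPT_TYPES = {"Measure", "Attribute"}  # Dimensions are never filtered
REMOVED_FIELDS = ("Concept Value", "Error Description")


def filter_error(error: dict) -> dict:
    """Return a copy of the error with potentially confidential fields removed."""
    if error.get("Concept Type") not in FILTERED_CONCEPT_TYPES:
        return dict(error)
    return {key: value for key, value in error.items() if key not in REMOVED_FIELDS}
```

The function returns a new dictionary in both cases, so the unfiltered record remains available to internal users.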
Component | Description |
---|---|
Error Code | High level error classification, first level of error typology. Serves diagnostic purposes. |
Message ID | Error type, second level of error typology. Serves diagnostic purposes. |
Concept Name | Error header. Name of concept affected by error. |
Concept Type | Error header. Type of concept affected by error. Value may be: Dimension/Attribute/Measure. |
Concept Value | Error header. Value of concept affected by error. Possibly confidential. |
Error in Concept | Name of concept affected by error. |
Number of Occurrences | Value refers to this unique error (total). |
Error Description | Error message, describing the error. Possibly confidential. |
Detail | Additional information on the error and its resolution. |
Series Key (column view) | Location of the error in the dataset, per dimension name and values of the series key. |
Series Key (horizontal view) | Location of the error in the dataset, per dimension values of the series key. The values and sequence of dimensions presented are identical to those of the column view. |
Example error, unfiltered:
Example error, Measure filtered for confidentiality. Concept Value and Error Description fields removed:
Complete sample reports
Sample Validation Report_No Errors.txt
Sample Validation Report_Filtered.txt
Sample Validation Report_Not Filtered.txt
Error types
The following list contains all possible error types detected and reported by the STRUVAL service.
Error Code | Message ID | Description of Error | Details of Error |
---|---|---|---|
500 | | Internal server error. Validation service not available. | The STRUVAL service is not able to process the inputs due to an internal server error. |
140 | | <Message from XML Parser> | The SDMX-ML file is not a well-formed XML file. It may contain invalid characters, or tags that are not closed or are closed out of order. The well-formedness of an XML file can be checked with various tools, such as advanced text editors or online validators. |
150 | 003 | The dataset contains a series with a missing concept {0} | The data file contains series with dimensions or attributes which are not defined in DSD. |
150 | 004 | The DSD {0} used does not define a time dimension, required for the time series data. | When building a time-series dataset, one must use a DSD that includes a time dimension. |
150 | 005 | The dataset includes primary measure {0}, not expected by the DSD. | When building a time-series dataset, one must use a DSD that has a primary measure. |
150 | 904-1 | Series key {0} is not defined in DSD (unexpected size). | Dataset contains series keys with unexpected size. |
150 | 904-2 | Series key {0} is not defined in DSD (incorrect codes). | Dataset contains series keys with incorrect codes. |
150 | 007 | The dataset contains a concept {0} that is not defined in DSD. | All concepts used in a dataset must be defined in a DSD. |
150 | 008 | Attribute {0} defined as mandatory in DSD is missing from the dataset. | The dataset contains a mandatory series level attribute which is not present in the data file. |
150 | 009 | Series attribute {0} is not defined in DSD. | The encountered attribute at the series level in data file does not exist in the DSD. |
150 | 010 | Attribute {0} defined as mandatory in DSD is missing from the group. | The dataset contains a mandatory group level attribute which is not present in the data file. |
150 | 011 | Attribute {0} is assigned to the incorrect group. | The encountered attribute at the dataset level in data file does not exist in DSD. |
150 | 012 | Attribute {0} defined as mandatory in DSD is missing from the observation. | The dataset contains a mandatory observation level attribute which is not present in the data file. |
150 | 013 | Attribute {0} is not defined in DSD for observation. | The encountered observation attribute is not defined in the DSD. |
150 | 014 | Dataset group {0} is not defined in the DSD. | Dataset contains group keys with unexpected size. |
150 | 015 | Dataset group {0} is not defined in the DSD. | Data Structure Definition does not define a Group. |
150 | 016 | The mandatory concept {0} in DSD is currently missing from the group. | The dataset contains a group missing mandatory concept(s) as defined in the DSD. |
150 | 017 | Concept {0} is assigned to the incorrect group. | The encountered group in the dataset contains a concept which is not defined in the DSD. |
150 | 018 | XML error - The dataset contains an invalid node. | Appears when an unexpected node exists in the dataset file. |
150 | 021 | XML error - Unexpected text ''{0}'' found at node ''{1}'' | Unexpected text is found as a child of an SDMX node that must not contain text. SDMX node names are kept in an internal structure and have names such as Header, Series, OBS or Group. This error message appears when the dataset contains text children of these elements. |
150 | 022 | XML error - Dataset header fails to reference a provision agreement, dataflow, or DSD. | Dataset header fails to reference a provision agreement, a dataflow, or a DSD. |
150 | 023 | XML error - Dataset does not contain a header. | Dataset does not contain a header. |
150 | 024 | XML error - Dataset structure reference incomplete. | The message appears if the referenced structure is incomplete, i.e. the agencyId, ID or maintainable ParentId are missing or empty. |
150 | 025 | XML error - Invalid DSD reference. | Dataset structure reference invalid, could not process reference, no RefNode or URN node found. |
150 | 026 | Attribute {0} is not defined in DSD. | An attribute at dataset level is present in data file but it is not defined in the DSD. |
150 | 027 | Expected component {0} must be an attribute but is {1}. | Another component appears as a dataset attribute in data file. |
150 | 028 | Attribute {0} incorrectly attached to {2} instead of to {1}. | The dataset has an attribute with different attachment level. |
150 | 029 | {0} ''{1}'' is reporting value ''{2}'' which is not a valid representation in referenced code list ''{3}''. | An attribute at dataset, series or observation level has a value which is not valid in the referenced code list. |
150 | 030 | {0} {1} is reporting invalid value {3} which is not of expected type {2}. | Appears when reported value of a concept is unexpected. |
150 | 031 | Component {0} in group {1} not defined in DSD {2}. | The dataset contains groups that contain components which are not defined as group components in the DSD. |
150 | 032 | Observation missing time dimension for time series data. | Observation missing the time dimension for time series data. |
150 | 033 | Observations not allowed for this dataset. | Appears if there is a constraint on the dataset which does not allow observations. |
150 | 034 | Observation time ''{0}'' is before the expected reporting period start date "{1}". | Appears if there is a constraint which specifies a report start date and the observation time is before this date. |
150 | 035 | Observation Time ''{0}'' is after the expected reporting period end date "{1}". | Appears if there is a constraint which specifies a report end date and the observation time is after this date. |
150 | 036 | Series not allowed in this dataset. | Appears if there is a constraint on the dataset which does not allow series. |
150 | 037 | Series key {0} not allowed. | Appears if the dimension is not allowed in the key due to an existing constraint. |
150 | 038 | Illegal Series key {0} contains invalid value "{1}" not defined in DSD for {2} {3}. | Appears when the series key contains some value which is disallowed by constraints in DSD. |
150 | 039 | Duplicate observation found: {0} | Appears when more than one observation is found in one series. |
150 | 040 | Data validation failed: {0} | Appears when a custom validation rule does not pass. |
150 | 041 | Cross-sectional component {0} is incorrectly attached to {2} instead of to {1}. | The cross-sectional component is attached to a wrong level. |
150 | 042 | Invalid date format "{0}". | Appears if the TIME_PERIOD attribute does not match the TIME_FORMAT. |
150 | 043 | Structure type wrongly references {1} instead of {0}. | Appears if the dataset header contains a URN reference to an artefact other than the one expected. |
100 | 044 | The dataset references dataflow “{0}” which could not be resolved. | The structure file supplied to the STRUVAL service call does not contain a dataflow (identified by agency, name, and version) that is referenced from the dataset. |
100 | 045 | The dataset references DSD “{0}” which could not be resolved. | The structure file supplied to the STRUVAL service call does not contain a DSD (identified by agency, name, and version) that is referenced from the dataset. |
501 | 046 | Component attribute {0} with parent {1} not supported. | The XML attribute is in the wrong element. |
501 | 047 | Cannot read dataset for structure of type: '{0}' | Appears if the dataset has a structure reference which is neither a DSD nor a Dataflow. |
501 | 048 | The DSD {0} is missing a time dimension. | DSDs that STRUVAL can process must contain a time dimension. |
501 | 049 | Cannot validate the header of format {0}. | Appears when STRUVAL tries to validate a header but the given dataset file is not detected as one of the following formats: COMPACT_2_0, GENERIC_2_0, CROSS_SECTIONAL_2_0, UTILITY_2_0, GENERIC_2_1, GENERIC_2_1_XS, COMPACT_2_1 or COMPACT_2_1_XS. |
150 | 050 | Property not found {0} | Appears when validation fails because the input file or the structure file is missing. |
140 | 051 | Configuration Error {0} | Appears when Excel Data Reader was not configured correctly. |
140 | 052 | Excel data reader error {0} | Appears when the Excel file could not be read. |
140 | 053 | Invalid Parameters detected {0} | Appears when misconfiguration exists inside Parameter Sheet or Mapping Sheet. |
150 | 054 | Error While Processing XML: {0} | Appears when XML structure validation fails. |
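The descriptions in the table are message templates: numbered placeholders such as {0} and {1} are replaced with values from the dataset when the report is produced. The sketch below shows how such a template could be filled in. The doubled single quotes in some templates resemble the escaping convention of Java's `MessageFormat`; that STRUVAL uses this convention is an assumption, as are the function name `render_message` and the sample values used afterwards.

```python
import re


def render_message(template: str, *args: str) -> str:
    """Fill numbered placeholders ({0}, {1}, ...) in an error message template."""
    # Assumption: a doubled single quote stands for one literal quote.
    text = template.replace("''", "'")

    def substitute(match):
        index = int(match.group(1))
        # Leave the placeholder untouched if no value was supplied for it.
        return str(args[index]) if index < len(args) else match.group(0)

    return re.sub(r"\{(\d+)\}", substitute, text)
```

For instance, filling the template of Message ID 029 with the illustrative values `Attribute`, `OBS_STATUS`, `X` and `CL_OBS_STATUS` produces a sentence naming the attribute, the offending value and the code list. Such substituted values are the reason the Error Description may be confidential.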