


Overview


Document release date

15.05.2018.

Created by

Andras NIKL

Reviewed by

Purpose

This document describes

  1. the messaging defined for the ESS Structural Validation Services,
  2. the structure of the validation report produced by the ESS Structural Validation Services, and
  3. the error types reported by the ESS Structural Validation Services.

Service version

The descriptions are based on STRUVAL version 7.5.2.


Structural Validation Service

The Structural Validation Service (STRUVAL) performs the structural validation of statistical data files based on a set of pre-defined validation rules, contained in a Data Structure Definition (DSD).

Structural validation performed by STRUVAL is the first step within a sequence of automated data validation activities conducted by Eurostat before statistical processing and dissemination of the collected data. The STRUVAL service returns a validation report to the data provider listing failures detected in the dataset for correction before resubmission.

The service verifies

  • that the transmitted file is in an accepted and processable format (SDMX-ML, SDMX-CSV);
  • that the dataset contains the structures as defined in the DSD, including dataflow definition, code lists, concepts, key families and constraints;
  • that the values contained within the dataset follow basic requirements defined in terms of completeness, data format, data consistency and constraints applied.

Messaging

The data provider may receive the following messages throughout the operation of the validation process via the Edamis service.

  1. Message in Edamis confirming that the data transmission is successful. The message is sent in all cases of data transmission.
  2. Message in Edamis informing the data provider of a service outage. If experienced, please contact your counterpart at Eurostat.
  3. Message in Edamis informing the data provider that the service will deliver results outside of the routinely expected timeframe. This may occur in case of service overload due to high traffic. If experienced repeatedly, please contact your counterpart at Eurostat.
  4. Email message informing the data provider that the validation process concluded, and the validation report is now available via the Edamis user interface.

To add users to, or remove them from, the list of recipients of any of these messages, please contact your counterpart at Eurostat.


Retrieval of the validation report

The data provider may access the validation report through the Edamis feedback channel. The validation report is never sent directly to data providers due to possible confidentiality constraints. The Edamis service may, however, be configured to send an email informing users that the report (or any other message received) is available.

  1. For organisations where the EDAMIS Web Application (EWA) has been installed, the feedback is by default sent to the EWA, regardless of whether the EWA or the EDAMIS Web Portal (EWP) was used for the specific transmission. The validation report is available in the EWA under the Send Datafile > Received Feedbacks menu.

     Users may request that the feedback be sent to the EWP instead if the EWP was used to transmit the data. This change must be requested by contacting the EDAMIS support team at ESTAT-SUPPORT_EDAMIS@ec.europa.eu.

  2. For organisations that do not have an EWA installation, the feedback is sent to the EWP. The validation report is available in the EWP under the Transmission > Received Feedback menu.

Structure of the validation report

The validation report consists of a Header section and an Error Listing section. The Header contains validation process metadata and a general overview of the results of the validation. The Error Listing section contains the details of all unique errors detected.


Header

The Header contains:


Component

Description

Total number of errors found

Number of all error occurrences encountered during the validation process.

Note: The validation service applies a cap of 2000 individual error occurrences; it stops and produces a report once it reaches error 2000+1. As a result, further errors may be identified after the dataset has been corrected and resubmitted. If the error cap is reached, this is indicated by a message in the Header.

Errors per type

Total of all error occurrences broken down by error type. The breakdown is based on Error Code and Message ID.

Error Codes refer to a wider classification of possible errors (e.g. technical issues, validation related issues), while Message IDs refer to a specific type of error (e.g. unexpected code, incorrect data format).

The aggregation in this section groups errors with identical Message IDs together, meaning the subtotal represented by a line item may stem from a number of different root causes. The purpose of the grouping here is to provide general information on the nature of errors identified, not a detailed breakdown.

Note: A single root cause may trigger multiple error types, and these will be listed separately (e.g. a code is unexpected and also violates a length constraint).

In case the dataset does not contain errors, the Errors Per Type section has no entries.

Dataset validated

Name of the input dataset.

SDMX Converted version

Version of the validation service engine used.

Generated on

Date stamp.

Structure information

DSD name and version.
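
The error cap and the per-type breakdown described above can be sketched as follows. This is a minimal illustration in Python, not the STRUVAL implementation: the names `ErrorOccurrence` and `summarise` are hypothetical, and the sketch assumes that the occurrence which exceeds the cap stops processing without itself being tallied.

```python
from collections import Counter
from typing import Iterable, NamedTuple

ERROR_CAP = 2000  # cap on individual error occurrences, as stated in the Header description


class ErrorOccurrence(NamedTuple):
    """Hypothetical record for one detected error occurrence."""
    error_code: str   # first-level classification, e.g. "150"
    message_id: str   # second-level error type, e.g. "029"


def summarise(errors: Iterable[ErrorOccurrence], cap: int = ERROR_CAP):
    """Tally occurrences per (Error Code, Message ID); stop at occurrence cap + 1."""
    per_type = Counter()
    total = 0
    cap_reached = False
    for err in errors:
        if total == cap:
            # Assumption: the (cap + 1)-th occurrence ends processing and is not tallied.
            cap_reached = True
            break
        total += 1
        per_type[(err.error_code, err.message_id)] += 1
    return total, per_type, cap_reached
```

Because processing stops at the cap, a `cap_reached` value of `True` means the totals are a lower bound, which is why further errors may surface after resubmission.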


Example error report header, with errors detected:




Example error report header, with no errors detected:


Error Listing

The Error Listing contains the list of specific errors detected by the validation service, in the order of detection. In case of no errors detected, this section of the validation report remains empty.

The presentation of errors and error metadata follows the business rules below.


Grouping of errors

Errors are grouped under separate sub-headings, one per unique root cause. The errors are identified as unique according to the following logic:




If a difference on any level is detected between two errors, the root cause of the errors is considered separate and therefore they are listed separately.


Display of errors

The validation report contains all detected errors (unless the 2000 cap is reached, see above).

Error messages and the dimension list are only displayed for the first error occurrence. For any further occurrences only the series key is listed, with a corresponding header with the names of dimensions available.

Where there is a business need, the number of error occurrences displayed in the report may be limited (e.g. to 3 per group).
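
The display rules above can be illustrated with a short Python sketch. It assumes the errors have already been grouped; the function name `render_group`, its parameters and the text layout it produces are hypothetical and do not reproduce the actual report layout.

```python
from typing import List, Optional, Sequence


def render_group(description: str,
                 dimensions: Sequence[str],
                 series_keys: Sequence[Sequence[str]],
                 max_occurrences: Optional[int] = None) -> List[str]:
    """Render one error group: full detail for the first occurrence, series keys only afterwards."""
    # Optional per-group limit on displayed occurrences (e.g. 3 per group).
    shown = series_keys if max_occurrences is None else series_keys[:max_occurrences]
    if not shown:
        return []
    lines = [f"Error Description: {description}"]
    # First occurrence: error message plus the dimension list with values.
    lines += [f"  {name}: {value}" for name, value in zip(dimensions, shown[0])]
    if len(shown) > 1:
        # Further occurrences: a header with the dimension names, then one series key per line.
        lines.append("  " + ":".join(dimensions))
        lines += ["  " + ":".join(key) for key in shown[1:]]
    return lines
```

For example, calling `render_group` with three series keys and `max_occurrences=2` yields the full detail for the first key and a single series-key line for the second; the third occurrence is not displayed.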


Filtering reports due to confidentiality constraints

Eurostat policy prohibits the inclusion of confidential data in validation reports distributed to external parties, including national statistical institutes. Statistical domains collecting and processing confidential data will receive a report from which elements that may contain confidential data are filtered out.

The filtering extends to the following:

  1. Concept Types measure and attribute, as these may contain confidential information. Dimensions are never filtered out.
  2. Concept Value in the error header.
  3. Error Description. The description may contain dynamic elements where confidential data (e.g. an OBS_VALUE) is inserted. The description is completely removed from the report.
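
The filtering rules listed above can be sketched in Python as follows. The sketch reflects one reading of the rules, namely that Concept Value and Error Description are removed for errors whose Concept Type is Measure or Attribute; the class `ReportedError` and the function `filter_confidential` are hypothetical names, not part of the STRUVAL service.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Concept types that may carry confidential data; dimensions are never filtered.
FILTERED_CONCEPT_TYPES = {"Measure", "Attribute"}


@dataclass(frozen=True)
class ReportedError:
    """Hypothetical subset of the fields of one entry in the Error Listing."""
    concept_name: str
    concept_type: str                  # "Dimension", "Attribute" or "Measure"
    concept_value: Optional[str]       # possibly confidential
    error_description: Optional[str]   # possibly confidential


def filter_confidential(error: ReportedError) -> ReportedError:
    """Return a copy with the possibly confidential fields removed where applicable."""
    if error.concept_type in FILTERED_CONCEPT_TYPES:
        # The description is removed completely rather than masked,
        # since it may embed a confidential value such as an OBS_VALUE.
        return replace(error, concept_value=None, error_description=None)
    return error
```

Errors on dimensions pass through unchanged, so the location of every error remains visible in the filtered report.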

Component

Description

Error Code

High level error classification, first level of error typology. Serves diagnostic purposes.

Message ID

Error type, second level of error typology. Serves diagnostic purposes.

Concept Name

Error header. Name of concept affected by error.

Concept Type

Error header. Type of concept affected by error. Value may be: Dimension/Attribute/Measure.

Concept Value

Error header. Value of concept affected by error. Possibly confidential.

Error in Concept

Name of concept affected by error.

Number of Occurrences

Value refers to this unique error (total).

Error Description

Error message, describing the error. Possibly confidential.

Detail

Additional information on the error and its resolution.

Series Key (column view)

Location of the error in the dataset, per dimension name and values of the series key.

Series Key (horizontal view)

Location of the error in the dataset, per dimension values of the series key. The values and sequence of dimensions presented are identical to those of the column view.


Example error, unfiltered:



Example error, Measure filtered for confidentiality. Concept Value and Error Description fields removed:


Complete sample reports


Sample Validation Report_No Errors.txt

Sample Validation Report_Filtered.txt

Sample Validation Report_Not Filtered.txt

Error types

The following list contains all possible error types detected and reported by the STRUVAL service.

Error Code

Message ID

Description of Error

Details of Error

500


Internal server error. Validation service not available.

The STRUVAL service is not able to process the inputs due to an internal server error.

140


<Message from XML Parser>

The SDMX-ML file is not a well-formed XML file. It may contain invalid characters, or tags that are not closed or are closed out of order. The well-formedness of an XML file can be checked with various tools, such as advanced text editors or online checkers.

150

003

The dataset contains a series with a missing concept {0}

The data file contains series with dimensions or attributes which are not defined in DSD.

150

004

The DSD {0} used does not define a time dimension, required for the time series data.

When building a time-series dataset, one must use a DSD that includes a time dimension.

150

005

The dataset includes primary measure {0}, not expected by the DSD.

When building a time-series dataset, one must use a DSD that has a primary measure.

150

904-1

Series key {0} is not defined in DSD (unexpected size).

Dataset contains series keys with unexpected size. 

150

904-2

Series key {0} is not defined in DSD (incorrect codes).

Dataset contains series keys with incorrect codes.

150

007

The dataset contains a concept {0} that is not defined in DSD.

All concepts used in a dataset must be defined in a DSD.

150

008

Attribute {0} defined as mandatory in DSD is missing from the dataset.

The dataset contains a mandatory series level attribute which is not present in the data file.

150

009

Series attribute {0} is not defined in DSD.

The encountered attribute at the series level in data file does not exist in the DSD.

150

010

Attribute {0} defined as mandatory in DSD is missing from the group.

The dataset contains a mandatory group level attribute which is not present in the data file.

150

011

Attribute {0} is assigned to the incorrect group.

The encountered attribute at the dataset level in data file does not exist in DSD.

150

012

Attribute {0} defined as mandatory in DSD is missing from the observation.

The dataset contains a mandatory observation level attribute which is not present in the data file.

150

013

Attribute {0} is not defined in DSD for observation.

The encountered observation attribute is not defined in the DSD.

150

014

Dataset group {0} is not defined in the DSD.

Dataset contains group keys with unexpected size. 

150

015

Dataset group {0} is not defined in the DSD.

Data Structure Definition does not define a Group.

150

016

The mandatory concept {0} in DSD is currently missing from the group.

The dataset contains a group missing mandatory concept(s) as defined in the DSD.

150

017

Concept {0} is assigned to the incorrect group.

The encountered group in the dataset contains a concept which is not defined in the DSD.

150

018

XML error - The dataset contains an invalid node.

Appears when an unexpected node exists in the dataset file.

150

021

XML error - Unexpected text ''{0}'' found at node ''{1}''

Unexpected text is found as a child of an SDMX node that must not contain text. SDMX node names are kept in an internal structure and have names such as Header, Series, OBS or Group. This error message appears when the dataset contains text children of these elements.

150

022

XML error - Dataset header fails to reference a provision agreement, dataflow, or DSD.

Dataset header fails to reference a provision agreement, a dataflow, or a DSD.

150

023

XML error - Dataset does not contain a header.

Dataset does not contain a header.

150

024

XML error - Dataset structure reference incomplete.

The message appears if the referenced structure is incomplete, i.e. the agencyId, ID or maintainable ParentId are missing or empty.

150

025

XML error - Invalid DSD reference.

Dataset structure reference invalid, could not process reference, no RefNode or URN node found.

150

026

Attribute {0} is not defined in DSD.

An attribute at dataset level is present in data file but it is not defined in the DSD.

150

027

Expected component {0} must be an attribute but is {1}.

Another component appears as a dataset attribute in data file.

150

028

Attribute {0} incorrectly attached to {2} instead of to {1}.

The dataset has an attribute with different attachment level.

150

029

{0} ''{1}'' is reporting value ''{2}'' which is not a valid representation in referenced code list ''{3}''.

An attribute at dataset, series or observation level has a value which is not valid in the referenced code list.

150

030

{0} {1} is reporting invalid value {3} which is not of expected type {2}.

Appears when reported value of a concept is unexpected.

150

031

Component {0} in group {1} not defined in DSD {2}.

The dataset contains groups that contain components which are not defined as group components in the DSD.

150

032

Observation missing time dimension for time series data.

Observation missing the time dimension for time series data.

150

033

Observations not allowed for this dataset.

Appears if there is a constraint on the dataset which does not allow observations.

150

034

Observation time ''{0}'' is before the expected reporting period start date "{1}".

Appears if there is a constraint which specifies a reporting period start date and the observation time is before this date.

150

035

Observation Time ''{0}'' is after the expected reporting period end date "{1}".

Appears if there is a constraint which specifies a reporting period end date and the observation time is after this date.

150

036

Series not allowed in this dataset.

Appears if there is a constraint on the dataset which does not allow series.

150

037

Series key {0} not allowed.

Appears if the dimension is not allowed in the key due to an existing constraint.

150

038

Illegal Series key {0} contains invalid value "{1}" not defined in DSD for {2} {3}.

Appears when the series key contains some value which is disallowed by constraints in DSD.

150

039

Duplicate observation found: {0}

Appears when more than one observation is found in one series.

150

040

Data validation failed: {0}

Appears when a custom validation rule does not pass.

150

041

Cross-sectional component {0} is incorrectly attached to {2} instead of to {1}.

The cross-sectional component is attached to a wrong level.

150

042

Invalid date format "{0}".

Appears if the TIME_PERIOD attribute does not match the TIME_FORMAT.

150

043

Structure type wrongly references {1} instead of {0}.

Appears if the dataset header contains a URN reference to an artefact other than the one expected.

100

044

The dataset references dataflow “{0}” which could not be resolved.

The structure file supplied to the STRUVAL service call does not contain a dataflow (identified by agency, name, and version) that is referenced from the dataset.

100

045

The dataset references DSD “{0}” which could not be resolved.

The structure file supplied to the STRUVAL service call does not contain a DSD (identified by agency, name, and version) that is referenced from the dataset.

501

046

Component attribute {0} with parent {1} not supported.

The XML attribute is in the wrong element.

501

047

Cannot read dataset for structure of type: '{0}'

Appears if the dataset has a structure reference which is neither a DSD nor a Dataflow.

501

048

The DSD {0} is missing a time dimension.

DSDs that STRUVAL can process must contain a time dimension.

501

049

Cannot validate the header of format {0}.

Appears when STRUVAL tries to validate a header but the given dataset file is not detected as one of the following formats: COMPACT_2_0, GENERIC_2_0, CROSS_SECTIONAL_2_0, UTILITY_2_0, GENERIC_2_1, GENERIC_2_1_XS, COMPACT_2_1 or COMPACT_2_1_XS.

150

050

Property not found {0}

Appears when the validation fails because of a missing input or structure file.

140

051

Configuration Error {0}

Appears when Excel Data Reader was not configured correctly.

140

052

Excel data reader error {0}

Appears when reading the Excel file was not possible.

140

053

Invalid Parameters detected {0}

Appears when a misconfiguration exists in the Parameter Sheet or Mapping Sheet.

150

054

Error While Processing XML: {0}

Appears when XML structure validation fails.
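
Since a file that is not well-formed XML is reported under Error Code 140, data providers may find it useful to check well-formedness locally before transmission. The sketch below uses the `xml.etree.ElementTree` parser from the Python standard library; the function name `well_formedness_error` is hypothetical, and the message it returns will generally differ from the one produced by the parser used in STRUVAL. It checks well-formedness only, not conformance to the DSD.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return the parser's message if the text is not well-formed XML, otherwise None."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Typical causes: unclosed tags, tags closed out of order, invalid characters.
        return str(exc)
    return None
```

A return value of `None` only means that the file can be parsed as XML; structural errors against the DSD are still detected by the validation service itself.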