From SDMX and Metadata Standards
SDMX is a standard designed to describe statistical data and to standardise their exchange. Its use is a business choice (as opposed to a technical or IT one) aimed at reducing an organisation's development, maintenance and operation costs through:
- Logical unification of data stored inside and across national and international organisations, by defining a common logical data model, harmonising statistical metadata (e.g. code lists) and using prescribed objects (e.g. schemes, data structure definitions).
- Reduced diversity among statistical data production processes and related business processes, as a result of applying a common logical model and related standards.
- Sharing of standard, generic software and IT infrastructures allowing automatic production, processing and exchange of data and metadata files among statistical organisations.
- Machine-to-machine communication, made possible by standard software and a standard data model, which in turn minimises manual interventions and human errors.
- Discovery and unification of distributed data shaped according to a standard logical model.
Structure of data and SDMX objects
Data represent concrete observations of a particular statistical phenomenon at a certain point in time. A data set is a collection of related observations, organised according to a predefined structure, which is described by the SDMX objects presented in this section.
The structure of a data set is determined by statistical concepts describing the meaning of the observations contained in the data set and their roles in the structure.
|A Data Structure Definition (DSD) is metadata describing the structure and organisation of a data set, the statistical concepts used within the data set and the code lists attached to them.|
The concepts called “dimensions” determine the data set's “physical” structure. Code lists are linked to the dimensions and list the possible values these concepts can take. Other concepts, called “attributes”, do not affect the structure of the data set itself but give additional information about the concepts used. Attributes can be coded or not coded. The actual reported value (the “measure” in SDMX terminology) is also considered a concept.
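To make these roles concrete, the following minimal Python sketch models a DSD as plain data classes. It has dimensions with code lists, attributes that may or may not be coded, and a measure. The class names, field names and concept IDs (`FREQ`, `REF_AREA`, `UNIT_MEASURE`, `OBS_STATUS`, `OBS_VALUE`) are hypothetical simplifications. They are not the classes of the SDMX Information Model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class CodeList:
    id: str
    codes: frozenset[str]


@dataclass(frozen=True)
class Component:
    concept: str                       # the statistical concept this component refers to
    codelist: CodeList | None = None   # None: the component is not coded


@dataclass
class DataStructureDefinition:
    id: str
    dimensions: list[Component]                                 # together identify an observation
    attributes: list[Component] = field(default_factory=list)  # extra information, coded or not
    measures: list[Component] = field(default_factory=list)    # the reported value(s)

    def key_length(self) -> int:
        # One code per dimension identifies an observation.
        return len(self.dimensions)

    def concepts(self) -> set[str]:
        # Every component refers to a concept, whatever its role.
        return {c.concept for c in self.dimensions + self.attributes + self.measures}


# Usage with hypothetical identifiers
freq = CodeList("CL_FREQ", frozenset({"M", "A"}))
area = CodeList("CL_AREA", frozenset({"BE", "FR"}))
dsd = DataStructureDefinition(
    id="DSD_EXAMPLE",
    dimensions=[Component("FREQ", freq), Component("REF_AREA", area)],
    attributes=[Component("UNIT_MEASURE"), Component("OBS_STATUS")],
    measures=[Component("OBS_VALUE")],
)
```

Keeping code lists as separate objects lets several dimensions reuse the same list. The reporting and partner countries in the example below do exactly this.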
|A Concept Scheme is a list of the statistical concepts used in a Data Structure Definition or a Metadata Structure Definition.|
When data are actually transmitted between organisational entities, they are retrieved from their storage environment at the Data Provider, written to data files in the format predefined by the DSD and sent to the receiver. At both ends of the transmission channel, the transmission can be accompanied by operations such as validation and encryption/decryption.
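The validation step can be sketched in Python as follows. The sketch makes several assumptions for illustration. It uses plain CSV as a stand-in transport format; SDMX's own file formats are described later in this document. It uses two hypothetical dimensions, `FREQ` and `REF_AREA`, and leaves encryption out. The provider refuses to write an invalid file, and the receiver validates again on arrival.

```python
import csv
import io

# Hypothetical code lists for a structure with two dimensions
CODELISTS = {"FREQ": {"M", "A"}, "REF_AREA": {"BE", "DE", "FR"}}
DIMENSIONS = ["FREQ", "REF_AREA"]  # order of the dimensions in the key


def validate_record(record: dict) -> list[str]:
    """Return the problems found in one record; an empty list means valid."""
    errors = []
    for dim in DIMENSIONS:
        value = record.get(dim)
        if value is None:
            errors.append(f"missing dimension {dim}")
        elif value not in CODELISTS[dim]:
            errors.append(f"{dim}={value!r} not in code list")
    try:
        float(record.get("OBS_VALUE", ""))
    except (TypeError, ValueError):
        errors.append("OBS_VALUE is not numeric")
    return errors


def series_key(record: dict) -> str:
    # Dot-separated key, one code per dimension, in DSD order
    return ".".join(record[d] for d in DIMENSIONS)


def _check_all(records: list[dict]) -> None:
    for i, rec in enumerate(records):
        problems = validate_record(rec)
        if problems:
            raise ValueError(f"record {i}: " + "; ".join(problems))


def write_file(records: list[dict]) -> str:
    """Provider side: validate, then serialise (CSV as a stand-in format)."""
    _check_all(records)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=DIMENSIONS + ["OBS_VALUE"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def read_file(text: str) -> list[dict]:
    """Receiver side: parse, then validate again before loading."""
    records = list(csv.DictReader(io.StringIO(text)))
    _check_all(records)
    return records


sent = write_file([{"FREQ": "M", "REF_AREA": "BE", "OBS_VALUE": 12.5}])
received = read_file(sent)
```

The dot-separated `series_key` mirrors the way keys are written in SDMX RESTful data queries. Validating on both sides means a file corrupted in transit is also caught.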
To parameterise data exchange, the SDMX Information Model defines several objects (including the DSD and the Concept Scheme), together with features that allow them to be identified, maintained and versioned:
|The Provision Agreement describes the way in which data are provided by a Data Provider.|
|A (hierarchical) code list is a (hierarchical) inventory of codes used in a DSD, listing the values used to represent dimensions or attributes.|
|Other objects define the organisations involved in the data exchange, including the maintainers of the metadata defining the objects (e.g. the Organisation Scheme).|
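Identification and versioning can be illustrated with URN-style references. The sketch assumes the `agency:id(version)` pattern and the `org.sdmx.infomodel...` class prefixes that, to our knowledge, SDMX 2.1 URNs use. The SDMX Technical Specifications remain the authoritative source. `CL_GEO` is a hypothetical identifier. The two-part `major.minor` version numbering is also an assumption, since other SDMX versions may use a different scheme.

```python
from dataclasses import dataclass

# Class prefixes used in SDMX 2.1 URNs (to our knowledge; verify against the
# SDMX Technical Specifications before relying on them)
URN_CLASSES = {
    "Codelist": "org.sdmx.infomodel.codelist.Codelist",
    "ConceptScheme": "org.sdmx.infomodel.conceptscheme.ConceptScheme",
    "DataStructure": "org.sdmx.infomodel.datastructure.DataStructure",
    "ProvisionAgreement": "org.sdmx.infomodel.registry.ProvisionAgreement",
}


@dataclass(frozen=True)
class MaintainableRef:
    agency: str   # maintenance agency, e.g. "ESTAT"
    id: str       # identifier of the artefact (hypothetical example: "CL_GEO")
    version: str  # assumed "major.minor", e.g. "1.0"

    def urn(self, kind: str) -> str:
        # Pattern: urn:sdmx:<package.Class>=<agency>:<id>(<version>)
        return f"urn:sdmx:{URN_CLASSES[kind]}={self.agency}:{self.id}({self.version})"

    def next_version(self, major: bool = False) -> "MaintainableRef":
        # In this sketch a changed artefact gets a new version; the old reference is unchanged.
        major_n, minor_n = (int(p) for p in self.version.split("."))
        new = f"{major_n + 1}.0" if major else f"{major_n}.{minor_n + 1}"
        return MaintainableRef(self.agency, self.id, new)


ref = MaintainableRef("ESTAT", "CL_GEO", "1.0")
print(ref.urn("Codelist"))  # urn:sdmx:org.sdmx.infomodel.codelist.Codelist=ESTAT:CL_GEO(1.0)
```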
To allow general access and ease of use, the objects are stored in a central online repository, the SDMX Registry.
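The registry's role can be illustrated with a toy, in-memory stand-in. Artefacts are submitted under their type, maintenance agency, identifier and version. They can then be retrieved either by an explicit version or as the latest one. The class and its methods are hypothetical. A real SDMX Registry is accessed through web services, and the `"latest"` keyword here only mimics the convention of the SDMX RESTful API.

```python
from __future__ import annotations


def _version_key(version: str) -> tuple[int, ...]:
    # Compare versions numerically, so that "1.10" is newer than "1.2".
    return tuple(int(part) for part in version.split("."))


class InMemoryRegistry:
    """Toy registry keyed by (artefact type, agency, id); each entry holds several versions."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str, str], dict[str, object]] = {}

    def submit(self, kind: str, agency: str, artefact_id: str, version: str, content: object) -> None:
        versions = self._store.setdefault((kind, agency, artefact_id), {})
        if version in versions:
            # This toy never overwrites a stored version; a change needs a new version.
            raise ValueError(f"{kind} {agency}:{artefact_id}({version}) already exists")
        versions[version] = content

    def get(self, kind: str, agency: str, artefact_id: str, version: str = "latest") -> object:
        versions = self._store.get((kind, agency, artefact_id))
        if not versions:
            raise KeyError(f"no {kind} {agency}:{artefact_id} in registry")
        if version == "latest":
            version = max(versions, key=_version_key)
        return versions[version]


reg = InMemoryRegistry()
reg.submit("Codelist", "ESTAT", "CL_FLOW", "1.2", ["ARR", "DISP"])
reg.submit("Codelist", "ESTAT", "CL_FLOW", "1.10", ["ARR", "DISP", "TOTAL"])
```

Comparing versions as integer tuples rather than strings avoids treating "1.2" as newer than "1.10".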
All the SDMX objects are defined in the SDMX Technical Specifications (www.sdmx.org).
Simplified example of a data table showing relation with SDMX objects
This rudimentary example illustrates the first two steps of the compliance procedure described in the section “Compliance & implementation” below.
Let’s assume a statistical data table entitled “Monthly amounts (tons) and value (M€) of goods traded among EU Member States (MSs), arrivals and dispatches, 2008”.
The Data Structure Definition (DSD) then contains a detailed description of the concepts and their roles.
The concepts that are dimensions are:
- Reporting MS (27 countries in its code list – EU MSs)
- Partner MS (27 countries in its code list – EU MSs)
- Flow (with 2 elements in its code list – arrivals & dispatches)
- Reference Month (12 months in its code list – January to December 2008)
The concepts that are measures are:
- Amount (tons)
- Value (M€)
These two measures could also be grouped into one dimension containing the two concepts (measures) as elements of its code list.
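The two layouts can be compared with a short Python sketch. It converts observations carrying two measures into observations with an extra measure dimension, and back again. The column names (`AMOUNT_T`, `VALUE_MEUR`, `MEASURE`, `OBS_VALUE`) are hypothetical. The numbers are invented placeholders, not real trade figures.

```python
# Placeholder observations (invented numbers), one row per key with two measures
wide = [
    {"REPORTER": "BE", "PARTNER": "FR", "FLOW": "ARR", "MONTH": "2008-01",
     "AMOUNT_T": 100.0, "VALUE_MEUR": 1.0},
    {"REPORTER": "BE", "PARTNER": "FR", "FLOW": "DISP", "MONTH": "2008-01",
     "AMOUNT_T": 200.0, "VALUE_MEUR": 2.0},
]
MEASURES = ("AMOUNT_T", "VALUE_MEUR")


def to_measure_dimension(rows):
    """Split each row into one observation per measure; the measure name
    becomes the code of an extra MEASURE dimension."""
    out = []
    for row in rows:
        key = {k: v for k, v in row.items() if k not in MEASURES}
        for m in MEASURES:
            out.append({**key, "MEASURE": m, "OBS_VALUE": row[m]})
    return out


def to_multiple_measures(rows):
    """Inverse transformation: regroup observations that share the other dimensions."""
    grouped = {}
    for r in rows:
        key = tuple((k, v) for k, v in r.items() if k not in ("MEASURE", "OBS_VALUE"))
        grouped.setdefault(key, {})[r["MEASURE"]] = r["OBS_VALUE"]
    return [{**dict(key), **values} for key, values in grouped.items()]


long_rows = to_measure_dimension(wide)
print(len(long_rows))  # 4
```

The round trip loses nothing. In that sense the two layouts carry the same information and differ only in how the DSD is organised.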
The concepts that are attributes are:
- Unit of Measure, attached to amounts (tons)
- Currency, attached to value (M€)
The DSD also contains:
- the code lists used in the table
- the Concept Scheme, which is the list of all concepts used in the DSD
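The example DSD described above can be written down as a plain Python dictionary to check its size. The country codes are the Eurostat-style two-letter codes of the 27 EU Member States in 2008. The flow codes (`ARR`, `DISP`), attribute values (`T`, `MIO_EUR`) and other IDs are hypothetical. The full cross product of the dimension code lists gives the number of possible keys. It also includes keys where the reporting and partner MS are the same, which presumably stay empty in practice.

```python
from itertools import product

# Eurostat-style codes of the 27 EU Member States in 2008
EU27_2008 = ["BE", "BG", "CZ", "DK", "DE", "EE", "IE", "EL", "ES", "FR", "IT", "CY", "LV", "LT",
             "LU", "HU", "MT", "NL", "AT", "PL", "PT", "RO", "SI", "SK", "FI", "SE", "UK"]

dsd = {
    # Dimensions and their code lists
    "dimensions": {
        "REPORTER": EU27_2008,
        "PARTNER": EU27_2008,          # the same code list reused
        "FLOW": ["ARR", "DISP"],       # hypothetical codes: arrivals, dispatches
        "MONTH": [f"2008-{m:02d}" for m in range(1, 13)],
    },
    # Measures, each mapped to the attribute attached to it
    "measures": {"AMOUNT": "UNIT_MEASURE", "VALUE": "CURRENCY"},
    # Attributes with their (hypothetical) values
    "attributes": {"UNIT_MEASURE": "T", "CURRENCY": "MIO_EUR"},
}


def keys(dimensions):
    """Yield every combination of dimension codes as a dict."""
    names = list(dimensions)
    for combo in product(*(dimensions[n] for n in names)):
        yield dict(zip(names, combo))


n_keys = sum(1 for _ in keys(dsd["dimensions"]))
n_cross_border = sum(1 for k in keys(dsd["dimensions"]) if k["REPORTER"] != k["PARTNER"])

# The Concept Scheme: every concept used in the DSD, whatever its role
concept_scheme = sorted({*dsd["dimensions"], *dsd["measures"], *dsd["attributes"]})

print(n_keys, n_cross_border, len(concept_scheme))  # 17496 16848 8
```

With two measures per key, a fully populated table would hold 2 × 17,496 = 34,992 values.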
There could also be other SDMX objects, e.g.:
- If Eurostat provides this table to the OECD, then Eurostat is the Data Provider. The Provision Agreement might state that the table should be provided before the end of January 2009.
- An Organisation Scheme making Eurostat the maintainer of the objects related to the data table: concepts, code lists, Provision Agreements, Concept Schemes, etc.
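The delivery condition and the maintainer relationship in these examples can be expressed as simple checks. The sketch is a hypothetical simplification. An SDMX Provision Agreement links a Data Provider to a dataflow. Here, the `deadline` field stands in for the delivery terms mentioned above, and `INTRA_EU_TRADE_2008` is an invented identifier. Reading "before the end of January 2009" as "on or before 31 January 2009" is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProvisionAgreement:
    data_provider: str  # who provides the data, e.g. "ESTAT"
    dataflow: str       # what is provided (hypothetical identifier)
    deadline: date      # assumed: last day on which delivery counts as on time

    def is_on_time(self, delivered: date) -> bool:
        return delivered <= self.deadline


# Hypothetical Organisation Scheme: agency -> types of artefact it maintains
MAINTAINERS = {"ESTAT": {"Concept", "Codelist", "ProvisionAgreement", "ConceptScheme"}}


def maintainer_of(artefact_type: str) -> str | None:
    """Return the agency maintaining this type of artefact, or None if unknown."""
    return next((agency for agency, kinds in MAINTAINERS.items() if artefact_type in kinds), None)


pa = ProvisionAgreement("ESTAT", "INTRA_EU_TRADE_2008", date(2009, 1, 31))
```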
Compliance & implementation
To achieve SDMX compliance, the responsible authority, having weighed the expected benefits of using SDMX, decides to implement it. The SDMX Technical Specifications define how to prepare and use the SDMX objects related to particular data exchange processes in practice. In general, the following four steps are needed:
1. Analyse the data and their exchange processes to acquire the knowledge needed for the next step.
2. Develop all the necessary objects according to the rules in the SDMX Technical Specifications.
3. Test the data and their exchange processes using the developed objects.
4. Prepare methodological and implementation documentation (accompanying the SDMX objects) for stakeholders, e.g. data providers and those who maintain the objects.
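Parts of step 3 can be automated. The sketch below checks a sample of records against the code lists of the dimensions. It reports duplicate keys, keys containing invalid codes, and keys that are missing. The missing-key check assumes the data set should cover the full cross product of the code lists, which is not always the case. The function name and the sample data are hypothetical.

```python
from collections import Counter
from itertools import product


def check_dataset(records, dimensions):
    """Report duplicate keys, keys with codes outside the code lists and
    expected keys that are absent (assuming the full cross product is expected)."""
    names = list(dimensions)
    counts = Counter(tuple(r[n] for n in names) for r in records)
    duplicates = sorted(k for k, c in counts.items() if c > 1)
    invalid = sorted(k for k in counts
                     if any(code not in dimensions[n] for n, code in zip(names, k)))
    expected = set(product(*(dimensions[n] for n in names)))
    missing = sorted(expected - set(counts))
    return {"duplicates": duplicates, "invalid": invalid, "missing": missing}


dims = {"FLOW": ["ARR", "DISP"], "MONTH": ["2008-01", "2008-02"]}
sample = [
    {"FLOW": "ARR", "MONTH": "2008-01", "OBS_VALUE": 1.0},
    {"FLOW": "ARR", "MONTH": "2008-01", "OBS_VALUE": 1.5},   # duplicate key
    {"FLOW": "DISP", "MONTH": "2008-13", "OBS_VALUE": 2.0},  # invalid month code
]
report = check_dataset(sample, dims)
```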
These steps produce specifications that enable and ensure compliance. When the results are applied in practice, so that data and metadata exchanges within the domain are carried out according to SDMX-compliant specifications, this constitutes an SDMX implementation.
Note that SDMX and its implementation also apply to reference metadata.
For the exchange of the data sets (often referred to as tables, files or cubes), two file formats are recognized by SDMX:
- SDMX-ML (preferred), which uses XML to describe and structure the data inside a file
- SDMX-EDI (also called GESMES/TS). Due to its inherent constraints, this format cannot always meet all requirements
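The following Python sketch illustrates how XML can carry an SDMX data set. It writes one series as a series key made of concept/value pairs, followed by time-stamped observations. The layout is only loosely modelled on SDMX-ML generic data. Element names are simplified, there are no namespaces and no message header, so the output is not a schema-valid SDMX-ML message. The concept IDs and values are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_generic_xml(series_key: dict[str, str], observations: dict[str, float]) -> str:
    """Serialise one series: key values first, then one Obs element per period."""
    dataset = ET.Element("DataSet")
    series = ET.SubElement(dataset, "Series")
    key = ET.SubElement(series, "SeriesKey")
    for concept, value in series_key.items():
        ET.SubElement(key, "Value", id=concept, value=value)
    for period, obs_value in observations.items():
        obs = ET.SubElement(series, "Obs")
        ET.SubElement(obs, "ObsDimension", value=period)
        ET.SubElement(obs, "ObsValue", value=str(obs_value))
    return ET.tostring(dataset, encoding="unicode")


xml_text = to_generic_xml({"REPORTER": "BE", "PARTNER": "FR", "FLOW": "ARR"},
                          {"2008-01": 100.0, "2008-02": 120.0})
```

Because each value is labelled with its concept, a receiver can interpret the file using the DSD alone.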
SDMX does not provide specific formats for the exchange of statistical microdata, geographical data or updates to registers, or for data exchange via web services. Work to set up such standards is ongoing.