NEW! Census Hub newsletter, March 2012
The Census European Hub is a conceptually new system for disseminating the data of the 2011 population and housing censuses in the European Union Member States and EFTA countries.
The hub environment has been designed to offer an efficient solution for disseminating census data and metadata that are methodologically comparable between the Member States and structured in the same way. It is a pull-mode architecture for common data sharing, in which a group of partners agree to provide access to their data according to standard processes, formats and technologies. SDMX standards are used for this purpose. Census data are not collected in advance and stored in a central repository; instead, they are accessed directly from the Member States’ databases through a central census hub at the request of a data collector.
The workflow is presented in a few steps below:
- A “data user” browses the Hub to define a dataset of interest via structural metadata. The user browses the dimensions and selects a dataset, then chooses the output layout by specifying which dimension will match the X-axis, which will match the Y-axis, and which will vary item after item to generate new tables
- The Hub converts the user request into an SDMX Query and sends the SDMX Query to an interested NSI’s (National Statistical Institute) Web Service
- The NSI Web Service converts the SDMX Query into a set of SQL queries and sends them to the NSI data warehouse
- The NSI data warehouse sends the result to the NSI web service
- The NSI Web Service converts the result into an SDMX-ML Data message and sends it to the Hub
- The same steps are repeated if the user has requested data from different member states
- The Hub puts together all the SDMX-ML data messages received from the interested NSIs and presents the result to the “data user” in the web browser in a readable format
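The steps above can be illustrated with a minimal Python sketch. It is not the Census Hub implementation: the `Query` class, the in-memory `WAREHOUSES`, the dataset identifier and the country codes `AA` and `BB` are all hypothetical stand-ins, and plain dictionaries take the place of SDMX-ML messages and SQL queries. It only shows the shape of the pull-mode flow, in which the Hub forwards one query to each NSI and merges the answers.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical stand-in for an SDMX query: a dataset id plus selected codes."""
    dataset: str
    filters: dict = field(default_factory=dict)  # dimension -> list of selected codes


# Hypothetical NSI data warehouses, one per country; rows are (SEX, AGE, value).
WAREHOUSES = {
    "AA": [("F", "Y0-14", 10), ("M", "Y0-14", 12), ("F", "Y15-64", 40)],
    "BB": [("F", "Y0-14", 7), ("M", "Y0-14", 8)],
}


def nsi_web_service(country, query):
    """Filter the NSI's own data (standing in for the SQL step) and
    wrap the result in a data message (standing in for SDMX-ML)."""
    observations = [
        {"GEO": country, "SEX": sex, "AGE": age, "OBS_VALUE": value}
        for sex, age, value in WAREHOUSES[country]
        if sex in query.filters.get("SEX", [sex])
        and age in query.filters.get("AGE", [age])
    ]
    return {"sender": country, "dataset": query.dataset, "observations": observations}


def hub_request(query, countries):
    """Send the same query to every requested NSI and merge the data messages."""
    messages = [nsi_web_service(country, query) for country in countries]
    return [obs for message in messages for obs in message["observations"]]
```

Calling `hub_request(Query("DEMO", {"SEX": ["F"]}), ["AA", "BB"])` returns the female observations from both hypothetical countries in a single list, without any data having been stored centrally beforehand.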
The modules and functionalities implemented in the Census European Hub are presented in Figure 2. All of the modules are kept in the target environment implementation, with new tools and applications added to make the census hub easier to use and to allow the different components to be reused.
- Query Management module intercepts the incoming requests from the different users and dispatches them to the different interfaces, offered by the Web Services. Thus, the generic architectural goal of loose coupling is respected.
- The Result Aggregation module: once the data is returned from the different Web Services, the result needs to be aggregated in order to be rendered to the user. The Aggregator architectural pattern was used in order to respect the architectural goal of reusing existing and proven assets.
- Multi Threading module: an architectural mechanism that fetches the data from the NSI Web Services concurrently rather than sequentially. This mechanism was put in place to optimize performance.
- Cache module: an architectural mechanism that improves performance by looking for the data locally first. If the result data are present in the cache, they are retrieved from local storage, so no actual WS call is made to the NSIs concerned and a needless trip over the network is avoided.
- Web / Application Server: WebLogic 10.3 or Tomcat 5.5
- Offline download module: a user can run a query asynchronously. This architectural pattern is known as asynchronous request and aims at reusing existing assets defined in the generic architectural goals. This module uses the Census DB Architectural mechanism in order to store the data.
- Administration module: an Admin user (located in the Eurostat tier) can manage specific configuration of the Census System. These configurations can act on SMTP, Web Services’ Endpoints, and Proxy configuration. This module uses the Census DB Architectural mechanism in order to store the data.
- Multilanguage support allows a user to select a different language that will translate the labels displayed by the user interface. By default, the operating system language will be selected.
- Local NSI Web Service aims at fetching the statistical data stored in the Local NSI Database.
- Local NSI Database contains the statistical data for testing purposes.
- DLBB (Data Loader Building Block) is a middleware mechanism that was developed in order to import datasets into the local NSI Database.
- LAU Management module should allow the countries to manage and update the Local Administrative Unit (LAU) codes they will use for the Census 2011 regulation. An ESTAT user (the LAU Validator) must be able to validate the changes made by the NSIs (LAU Managers) before they are published and used in the Census Hub. This module is independent of the Census Hub application.
- SMD application manages codelists, Data Structure Definitions (DSD) and concepts in the SMD database. The application also manages the principal marginals and hypercube categorization. Only the SMD Manager has access to this application.
- SMD WebService pulls the codelist, DSD and concept data from the SMD database and returns them as SDMX artefacts.
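The Multi Threading and Cache modules described in the list above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not the Hub's actual code: `call_nsi` is a hypothetical stand-in for a remote NSI Web Service call, the country codes and query key are placeholders, and a plain dictionary plays the role of the cache.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CALLS = []  # records every simulated web service call, for illustration only


def call_nsi(country, query_key):
    """Hypothetical stand-in for a remote NSI Web Service call."""
    CALLS.append(country)
    time.sleep(0.05)  # simulate network latency
    return {"country": country, "query": query_key}


class Hub:
    def __init__(self):
        self._cache = {}  # (country, query_key) -> result

    def fetch(self, countries, query_key):
        # Cache step: only countries without a cached result need a call.
        missing = [c for c in countries if (c, query_key) not in self._cache]
        # Multi-threading step: call the remaining NSIs concurrently.
        with ThreadPoolExecutor(max_workers=max(1, len(missing))) as pool:
            results = pool.map(lambda c: call_nsi(c, query_key), missing)
            for country, result in zip(missing, results):
                self._cache[(country, query_key)] = result
        # Return results in the order the countries were requested.
        return [self._cache[(c, query_key)] for c in countries]
```

With this sketch, a first `Hub().fetch(["AA", "BB", "CC"], "q1")` triggers three simulated calls that run in parallel, while repeating the same request on the same `Hub` instance is answered entirely from the cache. A real deployment would also need cache expiry and error handling, which are left out here.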
Figure 3 shows what the Census Hub environment looks like for the target goal now established. Once the outlined architecture is in place and development is finalised, the Census Hub environment will be published as open source software. Anyone who wants to set up a hub environment or reuse some of the components can use this diagram as an overview of what is needed.
Currently, all the major functionalities of the Census Hub are implemented: user registration, data querying, data storing and data retrieving via a multilingual graphical user interface. The established architecture can handle the hypercube definitions set out in the Census 2011 Regulation and implemented in the SDMX 2.0 cross-sectional format.
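To make the hypercube idea more concrete, each observation can be thought of as carrying one code per dimension, and a user-chosen layout maps one dimension to the X-axis, one to the Y-axis, and lets a third vary from table to table. The Python sketch below illustrates this under assumed names: the dimensions `GEO`, `SEX` and `AGE`, their codes and the values are hypothetical and are not taken from the Census 2011 hypercube definitions.

```python
from collections import defaultdict

# Hypothetical observations: one code per dimension plus a value.
OBSERVATIONS = [
    {"GEO": "AA", "SEX": "F", "AGE": "Y0-14", "value": 10},
    {"GEO": "AA", "SEX": "M", "AGE": "Y0-14", "value": 12},
    {"GEO": "BB", "SEX": "F", "AGE": "Y0-14", "value": 7},
    {"GEO": "BB", "SEX": "M", "AGE": "Y0-14", "value": 8},
]


def layout(observations, x_dim, y_dim, table_dim):
    """Arrange observations as {table_code: {y_code: {x_code: value}}}."""
    tables = defaultdict(lambda: defaultdict(dict))
    for obs in observations:
        tables[obs[table_dim]][obs[y_dim]][obs[x_dim]] = obs["value"]
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {t: {y: dict(row) for y, row in rows.items()} for t, rows in tables.items()}
```

For example, `layout(OBSERVATIONS, "SEX", "GEO", "AGE")` yields one table per age group, with countries as rows and sex as columns.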
Supporting applications have been developed, such as tools to generate sample dummy data packages and to extract the DSDs and XSDs. Currently, the work is concentrated on improving the GUI, on performance testing and on the Web Service module. Guidelines are provided to Member States that have expressed interest in participating. 19 countries were on board in 2011. Future developments focus on performance improvements with the Member States, corrective maintenance of the tool managing Local Administrative Units and the integration of metadata on quality.
- The data from the population and housing censuses conducted in the EU Member States in 2011 are structured upon agreed hypercubes (tables): http://circa.europa.eu/Public/irc/dsis/x-dis-xensus-hub/library?l=/census_documents_1/census_working_group/census_2010_june/censushub_20100601_3/_EN_1.0_&a=i
- The Census Hub application is based on SDMX standards and uses cross-sectional SDMX-ML messages for data exchange (DSD, queries, responses). Web Service technology is used to reach and retrieve the census data hosted in the Member States. Currently the Census Hub supports SOAP WS calls. You can find more information on the SOAP implementation under http://circa.europa.eu/Public/irc/dsis/x-dis-xensus-hub/library?l=/census_documents_1/technical_guidelines/implementation_guideline/_EN_1.0_&a=i.
- Security guidelines are available at http://circa.europa.eu/Public/irc/dsis/x-dis-xensus-hub/library?l=/census_documents_1/technical_guidelines/censushub0187-wsg-001/_EN_1.0_&a=i. They are under review, and the security methods described apply to the pilot implementation of the tool.
- The Census Hub project documentation:
- Census Hub IT WG 1 (2-3 June 2010) http://circa.europa.eu/Public/irc/dsis/x-dis-xensus-hub/library?l=/census_documents_1/census_working_group/census_2010_june&vm=detailed&sb=Title
- Census Hub IT WG 2 (14-15 February 2011) http://circa.europa.eu/Public/irc/dsis/x-dis-xensus-hub/library?l=/census_documents_1/census_working_group/census_february_2011&vm=detailed&sb=Title
- Starting package for each country http://circa.europa.eu/Public/irc/dsis/x-dis-xensus-hub/library?l=/census_documents_1/starting_toolkit/hypercube_6/
- For web services, you can find more information under:
- Web application for exposing browsing dissemination environment of an NSI http://circa.europa.eu/Public/irc/dsis/stne/library?l=/x-dis/tools/reference_architecture/application_disseminatio&vm=detailed&sb=Title
- .NET http://circa.europa.eu/Public/irc/dsis/stne/library?l=/x-dis/tools/reference_architecture/application_disseminatio/net&vm=detailed&sb=Title
- JAVA http://circa.europa.eu/Public/irc/dsis/stne/library?l=/x-dis/tools/reference_architecture/application_disseminatio/java&vm=detailed&sb=Title
- It would be really useful for us if you could fill in our Census IT questionnaires below and send the forms back, so that we can obtain more information about your IT environment.