WPH Overview

From ESSnet Big Data


Earth observation (EO) provides an enormous volume of data and creates an unprecedented opportunity, in Europe and worldwide, to develop operational applications of remote sensing. EO has recently become increasingly technologically sophisticated. EO data are now widely available at resolutions from high to low, gathered from unmanned aerial vehicles, aircraft and satellites. In particular, the launch of the Sentinels under the Copernicus Programme opened a new chapter in the applicability of remote sensing data by ensuring free and open access to continuously and systematically acquired satellite images. Important economic and commercial applications of EO data include official statistical production and landscape mapping for various thematic purposes. There is now an evident need to facilitate and improve the mandatory statistical registers. In the era of geospatialisation of information, the use of EO data is reasonable and promising, particularly in view of the upcoming Census 2021 and Agricultural Census as well as other commitments of the European Commission or the United Nations. The crucial goal of the project is to use EO data from different sources to help build the geospatial framework that supports these registers. Within this project, practical uses of EO data will be proposed to fill the gap between statistical and geographical information, referred to as the “geospatial breakdown”. The objective of the project will be achieved through work on several thematic fields: agriculture, built-up area, land cover, and settlements, enumeration areas and forestry.

The main objectives of WPH will be implemented through a set of case studies that rely on the partners' own experience from the 2016-2018 ESSnet as well as on external projects. The key expected outcomes are the identification and analysis of EO data sources for products across multiple statistical themes and the development of an adequate reference methodological framework for processing the data. As physical deliverables, developer's and user's guidelines at European and national level will be prepared. Additionally, collaboration with the scientific community will be established in order to strengthen the substantive side of the project.

Description of work

The WPH is divided into thematic tasks:

  1. Agriculture;
  2. Built-up area (SDG/Sustainable Cities and Communities);
  3. Land cover;
  4. Settlements, Enumeration areas and Forestry.

The thematic tasks will be executed through nine case studies, as described below.

Task 1 – Agriculture

Case study 1 - Crop recognition, mapping and monitoring

Performed by GUS (Statistics Poland)

The main purpose of the “Agriculture - Crop recognition, mapping and monitoring” case study is to use the Sentinel-1 and Sentinel-2 satellites for agricultural crop mapping and area estimation under Northern European conditions. An additional goal is the recognition of specific crops using long time series of Sentinel-1 images. Besides crop classification and the mapping of crop acreage, the state of winter crops (overwintering) will also be investigated using Sentinel-2 images. Previous experience from the 2016-2018 ESSnet shows that the main crops can be identified, but the accuracy is sometimes questionable and should be improved. The aim of this case study is therefore to verify and improve the plausibility of the crop recognition methodology. Different classifiers will be tested, and their accuracy will be analysed and discussed in detail.

For the pilot study, the Warmia and Mazury voivodship (Poland) will be selected as the region of interest.

The following remote sensing data and geodata will be used:

  • collection of a time series (X 2017 – IX 2018) of Sentinel-1A/B SAR data in Interferometric Wide swath mode;
  • collection of Sentinel-2A/B VNIR/SWIR images for the same period;
  • acquisition of cadastral parcel boundaries from the ARMA agency (PL);
  • acquisition of agricultural plot boundaries from the General Geographic Geodatabase (GUGIK);
  • acquisition of information on crops declared by farmers (ARMA);
  • collection of information on crop types and/or species from field visits (Central Statistical Office).

The gathered data will be converted from the source format to the ingestion format. The remote sensing dataset will be pre-processed for further analyses and classifications. For Sentinel-1A/B images, pre-processing consists of radiometric and geometric transformations leading to orthorectified sigma nought maps. For Sentinel-2 images, the key pre-processing steps are image mosaicking and cloud masking. The pre-processing will be done using the open source SNAP software (ESA). The geodata will be reformatted, the database will be reshaped and the necessary information will be extracted. The ingested geodata provide the parcel boundaries, which facilitate image segmentation, object aggregation and validation.

A time series analysis will be carried out to select the subsets of images most suitable for object-based classification. Merging temporal and polarimetric features of the crops should make it possible to extract subsets that give maximum separability of crops with a highly reduced data volume. This task is critical for the final results: it should show which crops can be effectively recognised and mapped using very long Sentinel-1A/B time series.
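The idea of picking the acquisition dates that best separate two crops can be sketched as follows. This is a minimal illustration only: the separability score (distance between class means divided by the summed standard deviations), the function names and the sample backscatter values are all assumptions, not the method used in the case study.

```python
from statistics import mean, stdev

def separability(values_a, values_b):
    """Simple two-class score: distance between means over summed spreads."""
    spread = stdev(values_a) + stdev(values_b)
    return abs(mean(values_a) - mean(values_b)) / spread if spread > 0 else float("inf")

def rank_dates(series_by_crop, crop_a, crop_b):
    """Rank acquisition dates from most to least separable for two crops.

    series_by_crop maps crop name -> {date: [one value per parcel]}.
    Each list needs at least two values so that stdev() is defined.
    """
    dates = sorted(series_by_crop[crop_a])
    scores = {
        date: separability(series_by_crop[crop_a][date], series_by_crop[crop_b][date])
        for date in dates
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-parcel mean sigma nought values (dB), for illustration only.
samples = {
    "wheat": {
        "2018-04-10": [-12.1, -11.8, -12.4],
        "2018-06-15": [-14.0, -13.6, -14.3],
    },
    "rapeseed": {
        "2018-04-10": [-11.9, -12.2, -12.0],
        "2018-06-15": [-9.8, -10.1, -9.5],
    },
}

best_dates = rank_dates(samples, "wheat", "rapeseed")
print(best_dates)
```

In this made-up example the June acquisition separates the two crops far better than the April one, so a reduced subset would keep June first.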

To achieve these goals, object-based classification including machine learning algorithms will be performed. Key elements of this task are:

  • testing the parameters of the mean shift segmentation algorithm for delineating homogeneous areas (segments) on pre-processed Sentinel-1 and Sentinel-2 data with the open source CNES Orfeo ToolBox software;
  • testing the parameters of machine learning algorithms (support vector machine classifier, decision tree classifier, artificial neural network classifier, random forest classifier, KNN classifier) to obtain the best accuracy for crop recognition with the open source CNES Orfeo ToolBox software.

Finally, the classification results will be validated. The validation includes extracting validation samples from ARMA and in-situ data, computing a confusion matrix for each classifier, and comparing the results with official statistics.
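The confusion-matrix step of the validation can be sketched as below. This is a minimal, library-free illustration: the crop labels and sample values are hypothetical, and the function names are assumptions rather than part of any toolbox named above.

```python
from collections import Counter

def confusion_matrix(reference, predicted):
    """Return (classes, matrix); rows are reference classes, columns are predictions."""
    classes = sorted(set(reference) | set(predicted))
    counts = Counter(zip(reference, predicted))
    matrix = [[counts[(ref, pred)] for pred in classes] for ref in classes]
    return classes, matrix

def overall_accuracy(matrix):
    """Share of validation samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

def producers_accuracy(matrix):
    """Per reference class: correctly classified samples / all samples of that class."""
    return [row[i] / sum(row) if sum(row) else None for i, row in enumerate(matrix)]

# Hypothetical validation samples (declared crop vs classified crop).
reference = ["wheat", "wheat", "wheat", "maize", "maize", "rape"]
predicted = ["wheat", "wheat", "maize", "maize", "maize", "rape"]

classes, matrix = confusion_matrix(reference, predicted)
overall = overall_accuracy(matrix)
per_class = producers_accuracy(matrix)
print(classes, matrix, overall, per_class)
```

Computing one such matrix per classifier makes the classifiers directly comparable on the same validation samples.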

Case study 2 – Monitoring of the off-season vegetation cover

Performed by LUKE (Natural Resources Institute Finland)

The general objective of this case study is to monitor the off-season vegetation cover of agricultural soils in high-latitude agricultural systems. This provides important information on nutrient losses from fields to water bodies. The statistical product would be related to the UN's Sustainable Development Goals (SDGs), Indicator 2.4.1: Proportion of agricultural area under productive and sustainable agriculture. The method would provide grounds for establishing an indicator on sustainable agriculture, as land management practices relate closely to sustainability.

Specifically, the aim is to establish a pilot machine-learning-based classifier by merging Earth observation (EO) data, administrative data and agro-meteorological data. For predictive modelling, C-band SAR data from the European Space Agency's (ESA) Sentinel-1 over an area of interest in Southern Finland will be fused with publicly available ancillary data on precipitation, temperature and soil properties. The existing administrative data from the national Integrated Administration and Control System (IACS), operated by the Finnish Agency for Rural Affairs, will serve as reference data. IACS provides open data on annual agricultural land use (Land Parcel Identification System, LPIS) and agricultural payment entitlements. From these data, it is possible to infer classes of wintertime soil cover quality on fields. The classifying predictive model can thus provide timely information on, for example, the amount of bare soil coverage.

The essence of this case study is to evaluate how well IACS data can serve as reference data for EO data, i.e. how good a model can be produced from the data available. Furthermore, it will be examined what kind of information can be derived from the predictive model for statistical production.

Case study 3 (BE) – Crop recognition with very high resolution aerial data

Performed by Statbel (Statistics Belgium)

The use of high-resolution satellite data (10 m for Sentinel-1/2) for agriculture, and more specifically for crop recognition, might not offer the best possible result, especially where parcels are small. This case study will test the use of higher-resolution aerial photography acquired in winter every three years (25 cm resolution), in summer every year (40 cm resolution) and every ten years (10 cm for RGB and 25 cm for LiDAR).
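The effect of resolution on small parcels can be made concrete with a quick calculation of how many pixels cover one parcel. The 0.5 ha parcel size is a hypothetical value chosen for illustration; the pixel sizes are those quoted above.

```python
def pixels_per_parcel(parcel_area_m2, pixel_size_m):
    """Approximate number of pixels covering a parcel of the given area."""
    return parcel_area_m2 / (pixel_size_m ** 2)

PARCEL_AREA_M2 = 5_000  # hypothetical 0.5 ha parcel, for illustration only

pixel_sizes_m = {
    "Sentinel-1/2 (10 m)": 10.0,
    "summer aerial (40 cm)": 0.40,
    "winter aerial (25 cm)": 0.25,
}

pixel_counts = {
    name: round(pixels_per_parcel(PARCEL_AREA_M2, size))
    for name, size in pixel_sizes_m.items()
}

for name, count in pixel_counts.items():
    print(f"{name}: about {count} pixels")
```

A parcel of this size is covered by about 50 Sentinel pixels, many of them mixed with neighbouring land at the edges, against tens of thousands of aerial pixels.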

Task 2 – Built-up area

Case study 4 - Implementing SDG indicator 11.7.1

Performed by INSEE (Statistics France)

Within the French statistical institute, the Spatial Method Unit (DMRG in French) is in charge both of providing INSEE with the geospatial infrastructure needed for the statistical process and of assessing opportunities to use new geospatial data or statistical methods that may improve this process. The SSPlab is in charge of fostering, promoting and supporting innovation throughout INSEE.

In line with various initiatives undertaken at global, European and national levels, both units aim to assess the pros and cons of using satellite imagery in the official statistical process while acquiring the knowledge and skills needed to use these data efficiently.

SDG indicator 11.7.1 is a very suitable case study for achieving these goals (Indicator 11.7.1: Average share of the built-up area of cities that is open space for public use for all, by sex, age and persons with disabilities).

Indicator 11.7.1 involves several concepts that required global consultation and consensus, including built-up area, cities and open spaces for public use. As custodian agency, UN-Habitat has worked on these concepts along with several other partners. In July 2018, UN-Habitat released a methodology that might be helpful for the implementation of this indicator. This methodology relies mostly on satellite imagery.

Many European countries, including France, have already reported that this indicator is very difficult to implement. For example, the French national council for statistical information (CNIS in French) has requested a deeper investigation of the methodology of this indicator in France. The project conducted in this case study consists of three tasks related to these methodological investigations:

  • Implementing the UN-Habitat methodology for the whole of France;
  • Benchmarking this methodology against specific data or concepts that are available in France or in Europe (French or European definition of cities, Sentinel-2, French road map layer);
  • Promoting the results at the French level (CNIS), European level (Eurostat, UN-GGIM Europe) and global level (IAEG-SDG).

Apart from DMRG and SSPlab, the project will be joint work with the SDES (statistical office of the French Ministry for an Ecological and Solidary Transition), which is the French custodian agency for Indicator 11.7.1, and with experts in satellite imagery from the French National Mapping Agency (IGN) or from the Center for the Study of the Biosphere from Space (CESBIO).

The project will be co-chaired by the head of DMRG and a representative of the SDES. Statisticians from both DMRG and SSPlab will be involved, with the support of IGN and CESBIO. Presentations at international conferences or meetings are planned as well. UN-GGIM working group B and UN-Habitat have already expressed their interest in the project.

Case study 5 – Urban sprawl across urban areas in Europe

Performed by CBS (Statistics Netherlands)

In 2015, the UN adopted a set of goals to end poverty, protect the planet and ensure prosperity for all as part of a new sustainable development agenda. Each of these Sustainable Development Goals (SDGs) has specific targets to be achieved over the next 15 years and requires commitment from all parties: governments, the private sector, civil society and citizens. Earth observation datasets offer many opportunities to improve the monitoring of the SDGs, both for reaching the SDG targets and for reporting on progress. Earth observation will enable global change to be tracked at high resolution and in real time. The geospatial information provided by EO data will allow for implementation at local to national levels while still allowing for monitoring and reporting based on the global indicator framework.

The aim of this case study is to evaluate to what extent national statistical offices can benefit from Earth observation to monitor and report on the SDGs at local to national level.

This case study will focus on the characterisation of urban sprawl (SDG 11.7.1) across urban areas in Europe. Urban sprawl was only recently officially acknowledged as an issue in Europe (EEA, 2016). Numerous attempts at characterising it have been made in recent years, but a consensus remains to be reached. This study will attempt to characterise urban sprawl across urban areas at a pan-European scale by means of data-driven machine learning methods. Further, the study will investigate the possibility of providing temporal continuity on the basis of multiple datasets from satellite data sources such as MODIS and Sentinel-2.

Case study 6 - Combination of administrative and Earth Observation data to determine the quality of housing

Performed by Destatis (Statistics Germany)

In this case study the added value of combining administrative data with Earth Observation (EO) data will be demonstrated. The goal is to show that a more comprehensive understanding can be developed by combining them than by analysing a single data source. For exploring the quality of urban living, the following combination of data sources could be useful. Extensive administrative data exist for urban areas in Germany, such as census data, which contain a lot of information about urban living. By complementing them with EO data, even more information about a neighbourhood can be deduced, such as green areas nearby, the existence of yards and their characteristics, the amount of traffic and therefore noise, and the availability and occupancy of parking spaces. In this way, differences in the quality of living can be determined between and within urban areas. The data holder, the Federal Agency for Cartography and Geodesy (BKG), will support the WPH by providing geodata for further analysis and calculations. To assure the reliability of the land cover classification, verification of the results is necessary and will be conducted by the data provider. Further information regarding housing quality can be provided by the provider (BKG) as permitted by time and budget constraints. Currently, DE participates in the Makswell project (Making Sustainable development and wellbeing frameworks work for policy analysis), which investigates how sustainable development indicators could be measured and improved through satellite data. Some of the sustainable development indicators that might be explored in the project are closely related to the quality of living, especially in urban areas, such as access to public transport, open spaces for public use and air pollution. The findings and methodology of the Makswell project could be a useful starting point for this case study.

Task 3 – Land Cover

Case study 7 - Comparing «in-situ» and «remote-sensing» collection mode for land cover data

Performed by SSP (Ministry of Agriculture, Agrifood Industry and Forestry, FR)

TERUTI is the French statistical area-frame survey on land cover and land use (LC/LU). Conducted since the end of the 1960s, it has covered all of the French metropolitan territory since 1982 and has included 3 French overseas departments since 2005. The sampling unit is a portion of territory (a point), generally a circular area 3 metres in diameter. The master sample base is a 250-metre square grid of almost 8 million points. The sample of points is drawn using a stratified sampling scheme.

Since 2017, administrative data (from the Land Parcel Identification System for CAP payments) and geographical databases (e.g. BD FORET from the National Geographic Institute, IGN) have been used to impute LC/LU information for most (70%) of the French territory. For the remaining 30%, which covers natural or urban unoccupied areas, a sample of 70 000 points is drawn annually, and LC/LU information is collected in situ by a surveyor every year.

The SSP aims to use more remotely sensed data from satellite images in order to limit and concentrate the field observations and the travel of the surveyors. This could be achieved by improving the stratified sampling scheme of the TERUTI survey and by grouping geo-located points where LC/LU has a high probability of changing. Remote sensing could also be used for the imputation of more points of the sample base.
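One way to picture a stratified draw that concentrates field visits where change is likely is sketched below. Everything here is an assumption made for illustration: the strata, their sizes, the change probabilities and the allocation rule (sample size proportional to number of points times change probability) are hypothetical and are not the TERUTI design.

```python
import random

def allocate(strata, total_sample):
    """Allocate the sample proportionally to points x change probability.

    Rounding means the allocations may not sum exactly to total_sample.
    """
    weights = {name: s["points"] * s["change_prob"] for name, s in strata.items()}
    total_weight = sum(weights.values())
    return {name: round(total_sample * w / total_weight) for name, w in weights.items()}

def draw(strata, allocation, seed=42):
    """Draw point indices without replacement inside each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {
        name: rng.sample(range(strata[name]["points"]), allocation[name])
        for name in strata
    }

# Hypothetical strata of the sample base, for illustration only.
strata = {
    "stable forest": {"points": 1000, "change_prob": 0.01},
    "urban fringe": {"points": 500, "change_prob": 0.10},
    "natural open land": {"points": 800, "change_prob": 0.05},
}

allocation = allocate(strata, total_sample=100)
drawn = draw(strata, allocation)
print(allocation)
```

With these made-up figures, the small but fast-changing "urban fringe" stratum receives half of the visits, while the large stable stratum receives few.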

The Center for the Study of the Biosphere from Space (CESBIO) is a joint research unit of the University of Toulouse III (Paul Sabatier), the National Center of Scientific Research (CNRS), the National Center of Spatial Studies (CNES) and the Research Institute for Development (IRD). Located in Toulouse, France, it aims to develop knowledge on continental biosphere dynamics and functioning at various temporal and spatial scales.

Since 2015, CESBIO has developed a methodology for the fully automatic production of land cover maps at country scale using high-resolution optical image time series. The methodology is based on supervised classification and uses existing databases as reference data for training and validation. The produced maps (called OSO) cover the whole of the French metropolitan territory with a kappa coefficient of 0.86 and 17 land cover classes. The maps available for 2016 and 2017 are produced from Sentinel-2 images and are provided with a confidence map, which gives information at the pixel level about the expected quality of the result.
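For readers unfamiliar with the kappa coefficient quoted above, the sketch below computes Cohen's kappa from a confusion matrix: the observed agreement corrected for the agreement expected by chance. The two-class matrix is a made-up example and does not reproduce the OSO figure.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: reference, columns: map)."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected if reference and map labels were independent.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical two-class confusion matrix, for illustration only.
example_matrix = [
    [45, 5],
    [10, 40],
]

kappa = cohen_kappa(example_matrix)
print(round(kappa, 2))
```

Here the observed agreement is 0.85 and the chance agreement is 0.5, so kappa is about 0.7.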

For the WPH, the following activities should be conducted:

  • analysing the differences (frequency, reasons) between the land cover information collected at a geo-located point in the Teruti survey and the LC information provided by the automatic classification of the same pixel in the OSO map;
  • identifying land cover types which could be automatically classified by remote sensing with an adjustment of the classification used in the OSO process;
  • identifying, with high confidence, pixels which have a high probability of land cover change, in order to send a Teruti surveyor to collect and verify the LC change on the ground.
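The first and third activities above can be pictured with the following sketch, which counts disagreements between survey points and map pixels and flags candidate points for a field visit. The records, class names and the 0.8 confidence threshold are hypothetical, and treating a confident disagreement as a sign of possible change is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical records: (point id, Teruti in-situ class, OSO class, OSO confidence in [0, 1]).
points = [
    ("p1", "forest", "forest", 0.95),
    ("p2", "grassland", "cropland", 0.90),
    ("p3", "cropland", "cropland", 0.60),
    ("p4", "urban", "bare soil", 0.40),
    ("p5", "grassland", "cropland", 0.85),
]

def disagreement_summary(records):
    """Return the disagreement rate and a count of each (survey, map) class pair."""
    pairs = Counter((survey, mapped) for _, survey, mapped, _ in records if survey != mapped)
    rate = sum(pairs.values()) / len(records)
    return rate, pairs

def flag_for_field_visit(records, min_confidence=0.8):
    """Points where the map disagrees with the survey and the map is confident."""
    return [
        point_id
        for point_id, survey, mapped, confidence in records
        if survey != mapped and confidence >= min_confidence
    ]

rate, pairs = disagreement_summary(points)
to_visit = flag_for_field_visit(points)
print(rate, pairs.most_common(1), to_visit)
```

The most frequent confusion pairs point to classes whose definitions may need adjusting, while the flagged points form a shortlist for surveyors.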

Case study 8 – Land cover maps at very detailed scale

Performed by ISTAT (Statistics Italy)

The aim of this case study is to produce land cover maps at various scales from four-band aerial and satellite (Sentinel and Landsat) images, based on the first-level LUCAS (Land Use/Cover Area frame Survey) legend. Machine learning algorithms will be used (e.g. a segmentation algorithm based on CNNs and U-Net to recognise built-up artificial areas).

Machine learning algorithms will also be used to detect green areas and vegetation indices.
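As an illustration of a vegetation index, the sketch below computes the widely used NDVI, (NIR - Red) / (NIR + Red), on a tiny grid and thresholds it into a green mask. The source does not say which indices are meant; the choice of NDVI, the reflectance values and the 0.3 threshold are assumptions.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0

# Hypothetical surface reflectances on a 2 x 2 grid, for illustration only.
nir_band = [[0.45, 0.40], [0.20, 0.10]]
red_band = [[0.05, 0.08], [0.18, 0.10]]

ndvi_map = [
    [ndvi(nir, red) for nir, red in zip(nir_row, red_row)]
    for nir_row, red_row in zip(nir_band, red_band)
]

GREEN_THRESHOLD = 0.3  # assumed cut-off; real thresholds depend on sensor and season
green_mask = [[value > GREEN_THRESHOLD for value in row] for row in ndvi_map]
print(green_mask)
```

An index map of this kind can serve as an input feature to a classifier rather than as a final green-area product.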

As far as woodland and cropland are concerned, Sentinel-2 images will be processed at low resolution by means of simple standard classification algorithms. Some of our algorithms and services (e.g. WMTS Sentinel image services) can be shared within WPH. How to use aerial images, which are extremely important for producing statistical land cover data, will also be discussed.

Task 4 - Settlements, Enumeration Areas and Forestry

Case study 9 - Updating the INSPIRE Theme Statistical Units dataset and preventing forest fires

Performed by INE (Statistics Portugal)

The main goal within this case study is to explore the Copernicus data in order to:

  • update the INSPIRE Theme Statistical Units dataset, namely the Settlements and Enumeration Areas; the process will contribute to building the geospatial framework to support the 2021 Census;
  • explore the possibility of studying forests and eucalyptus plantations and their impact on the prevention of forest fires.

An additional goal is to evaluate how Copernicus data, namely Sentinel images, can be used in the Statistical Geospatial Framework and how they can be integrated into the statistical production process, bearing in mind the sustainability of, and free access to, Copernicus data services. Thorough knowledge of the data sources is what will enable their future use and underpin the accuracy and correctness of all their possible applications. A comprehensive study of the Sentinel images, describing their limitations for official statistical production, will be proposed. The use of these data together with current processes also needs to be studied, evaluated and documented in a clear and transparent manner.

For each of the case studies the following activities will be undertaken:

  • Statistical products description based on data sources and needs of statistical data users;
  • Preparation of proofs of concept taking into account the European level. They will demonstrate the feasibility and conditions under which results can be obtained;
  • Data access (ensuring continuity of data sources and statistical information for longer time period);
  • Definition of business processes and derived metadata (auditable steps including assurance of data security and confidentiality; ensuring data quality and its documentation);
  • Quality assessment of the data;
  • Development of methodology for production of statistics;
  • IT infrastructures definition for data processing;
  • Treatment of legal issues (related to data access, processing and output);
  • Pilot production of statistical data and assessment of quality (including multi-purpose and multi-source aspects); these experimental statistics will be published on the Eurostat website and on the national websites.

All results of WPH will be presented in the final technical report, including the evaluation of big data sources, the definition of possible statistical products from the examined sources, and the associated issues of privacy protection, confidentiality and other legal matters. Relations between case studies covering similar topics and indicators will be identified. The methodological framework for EO data processing, together with the quality analysis, will be investigated in order to facilitate and improve the mandatory statistical registers. WPH also aims to analyse the inventory, requirements, definition and specification of future big data integration.

Milestones and deliverables

See here for an overview of available milestones and deliverables.

WPH milestones

  HM1   Report on the WP meeting mid-2019   Month 9
  HM2   Report on the WP meeting mid-2020   Month 20

WPH deliverables

  H1   Interim technical report concerning the Reference Methodological Framework and the conditions for using the data, the methodology and the procedures to be used for producing statistics   Month 12
  H2   Description of the results of each of the activities listed in detail in description of the work for each case study   Month 21
  H3   Final technical report on the outcomes: documented description of statistical products, data sources & access rules, relevant software, description of methods applied, indications of key points in data processing chain and methodology. The results will include an assessment of sustainability over time.   Month 24