Earth observation - Quality Guidelines for Big Data

From ESSnet Big Data
Revision as of 11:09, 6 May 2019 by Nkowaral


Data class Earth observation

  1. Short description of this class of big data, the source(s) and the structure of the raw data

The class “Earth observation” covers the processing of satellite images. Data sources include both radar (Sentinel-1) and optical (Sentinel-2) data. The data can be acquired from the European Copernicus programme (https://sentinel.esa.int). Figures 1 and 2 show two examples, radar and optical data respectively.

 

Figure 1. Sentinel-1 radar data after segmentation


Source: https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/images/0/04/WP7_Deliverable_7_7_2018_05_31.pdf, as of 14th April 2019

 

Figure 2. Sentinel-2 optical data


Source: https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata/images/0/04/WP7_Deliverable_7_7_2018_05_31.pdf, as of 14th April 2019

 

The Earth observation class is an example of using machine learning algorithms to produce reliable data on the objects identified in the images.

  2. Short description of the role of the big data class in the ESSnet(s), including links to deliverables (if already existing)

The role of the Big Data class “Earth observation” in the ESSnet relates to various use cases conducted by NSIs to obtain data directly from the Internet. In particular, it includes crop type identification, which supplements the current agricultural surveys[1]. Therefore, according to the classification, the statistical domain is Agriculture and Fisheries and the survey supplemented is Annual Crop Statistics.

  3. Basic description of the processes necessary to transform the raw data into statistical data

Crop identification relies on several processes, starting with the integration of administrative data, field surveys and satellite images. It includes the following steps[2]:

  • in situ sample collection: ground truth used as training input for machine learning and for accuracy assessment,
  • administrative data collection: support data used for selecting in situ plots and for raster data segmentation,
  • remote sensing data processing: processing Sentinel-1 and Sentinel-2 data, creating time series of SAR and optical raster orthomosaics,
  • data fusion: fusion of raster and vector datasets,
  • image segmentation: extracting segments that share similar spectral characteristics, as input for object-based image classification,
  • object-based image classification: training a classifier on the in situ training sample, then classifying SAR and optical images with machine learning algorithms,
  • accuracy assessment: computing a confusion matrix from the in situ control sample.
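
The object-based classification step can be illustrated with a minimal k-nearest-neighbours sketch in Python. It is not the implementation used in the deliverable: the feature choice (mean SAR backscatter and mean NDVI per segment), the sample values, the segment identifiers and the function name knn_classify are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def knn_classify(train, features, k=3):
    """Return the majority crop label among the k nearest training segments."""
    # train: list of (feature_vector, crop_label) pairs from the in situ training sample
    nearest = sorted(train, key=lambda item: dist(item[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical per-segment features: (mean SAR backscatter in dB, mean NDVI)
training_sample = [
    ((-12.0, 0.80), "wheat"),
    ((-11.5, 0.75), "wheat"),
    ((-12.4, 0.78), "wheat"),
    ((-8.0, 0.40), "maize"),
    ((-7.5, 0.45), "maize"),
    ((-8.3, 0.35), "maize"),
]

# Segments without ground truth, to be labelled by the classifier
unlabelled_segments = {"seg_1": (-11.8, 0.77), "seg_2": (-7.9, 0.42)}

predicted = {
    seg_id: knn_classify(training_sample, feats)
    for seg_id, feats in unlabelled_segments.items()
}
print(predicted)
```

In this toy sample the backscatter values dominate the Euclidean distance because they span a wider range than NDVI; in practice the features would normally be scaled before distances are computed.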


In terms of the GSBPM framework, the Collect, Process and Analyse phases are used to deliver the output data from the satellite images.

  4. Quality guidelines relevant for this big data class

4.1. Accuracy

Accuracy is the main quality measure because the processing relies on machine learning algorithms. For example, accuracy was measured as the share of fields whose crop type was identified correctly; it varies from 75% to 85% depending on the crop type and the machine learning algorithm used (KNN and SVM are the most accurate).
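
As a rough illustration of this measure, the sketch below builds a confusion matrix from a control sample and derives the share of correctly identified fields per crop type and overall. The field labels are invented for the example and the function names are hypothetical; the sketch does not reproduce the figures reported above.

```python
from collections import defaultdict


def confusion_matrix(pairs):
    """Count fields by (actual crop, predicted crop)."""
    matrix = defaultdict(lambda: defaultdict(int))
    for actual, predicted in pairs:
        matrix[actual][predicted] += 1
    return matrix


def accuracy_by_crop(matrix):
    """Share of fields of each actual crop type that were classified correctly."""
    return {
        actual: row.get(actual, 0) / sum(row.values())
        for actual, row in matrix.items()
    }


def overall_accuracy(matrix):
    """Share of all control fields that were classified correctly."""
    correct = sum(row.get(actual, 0) for actual, row in matrix.items())
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total


# Hypothetical control sample: (crop observed in the field, crop predicted)
control_sample = (
    [("wheat", "wheat")] * 3 + [("wheat", "maize")]
    + [("maize", "maize")] * 4 + [("maize", "wheat")]
)

matrix = confusion_matrix(control_sample)
per_crop = accuracy_by_crop(matrix)
overall = overall_accuracy(matrix)
print(per_crop, round(overall, 3))
```

Reporting accuracy per crop type alongside the overall figure shows which crops the classifier confuses with each other, which a single overall percentage hides.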

4.2. Coverage

The coverage is the territory of all European countries. However, because of cloudy weather, some data may be missing in the cloudiest months (e.g., February). Land can also be covered by snow, which makes analysis impossible on snowy days in winter.
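
One way to make such gaps visible is to count, per month, the optical scenes whose cloud cover stays below a chosen threshold. The sketch below assumes that a cloud cover percentage is available for each scene; the 30% threshold, the scene list and the function name are hypothetical.

```python
from collections import Counter
from datetime import date

MAX_CLOUD_COVER = 30.0  # hypothetical threshold, in percent

# Hypothetical scenes over one area: (acquisition date, cloud cover in percent)
scenes = [
    (date(2018, 2, 3), 95.0),
    (date(2018, 2, 13), 88.0),
    (date(2018, 2, 23), 72.0),
    (date(2018, 5, 4), 12.0),
    (date(2018, 5, 14), 41.0),
    (date(2018, 5, 24), 5.0),
]


def usable_scenes_per_month(scenes, max_cloud_cover):
    """Count scenes at or below the cloud threshold for every month with acquisitions."""
    months = sorted({(d.year, d.month) for d, _ in scenes})
    usable = Counter(
        (d.year, d.month) for d, cover in scenes if cover <= max_cloud_cover
    )
    # Counter returns 0 for months that have no usable scene
    return {month: usable[month] for month in months}


counts = usable_scenes_per_month(scenes, MAX_CLOUD_COVER)
gaps = [month for month, n in counts.items() if n == 0]
print(counts, gaps)
```

Months listed in gaps have no usable optical scene; for those periods, radar data, which is far less affected by cloud, may be the only input available.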

4.3. Comparability over time

For Earth observation, comparability over time is not an issue, as the data source is stable and will be available in the long term.

4.4. Process errors / data source specific errors

Because the results depend on machine learning algorithms, it is very important to have stable and reliable reference sources for comparison. Four different indicators have been developed to evaluate process errors:

  • the training fields classification error matrix,
  • the accuracy of the calculations,
  • comparisons with administrative data,
  • comparisons with statistical data.
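
A comparison with administrative or statistical data could, for instance, take the form of a relative difference in crop area per crop type. The following sketch assumes that both sources report area in hectares for the same territory; the figures and the function name are invented for illustration and are not taken from the project.

```python
def relative_differences(classified_ha, reference_ha):
    """Relative difference (classified - reference) / reference for each shared crop."""
    return {
        crop: (classified_ha[crop] - ref) / ref
        for crop, ref in reference_ha.items()
        if crop in classified_ha and ref > 0
    }


# Hypothetical crop areas in hectares
classified = {"wheat": 1050.0, "maize": 760.0}      # from image classification
administrative = {"wheat": 1000.0, "maize": 800.0}  # from administrative records

diffs = relative_differences(classified, administrative)
for crop, diff in diffs.items():
    print(f"{crop}: {diff:+.1%}")
```

A positive value means that the classification reports more area than the reference source, and a negative value means less.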