Ideas for developing a quality framework for job vacancy statistics

From ESSnet Big Data
Revision as of 13:38, 25 January 2017 by Nswierni (Talk | contribs)

Error Type Relevance to using on-line job advertisements for statistical purposes

Phase 1 errors

Phase 1 applies to a single dataset in isolation. For a complex statistical output from many different datasets, carry out the phase 1 evaluation separately for each source dataset. The framework can be used for both administrative and survey data.

Measurement (variables) terms

The measurement side describes the path from an abstract target concept to a final edited value for a concretely defined variable.

Validity error

Measurement begins with the target concept, or ‘the ideal information that is sought about an object’. To obtain this information, we must define a variable or measure that can be observed in practice. Validity error indicates misalignment between the ideal target information and the operational target measure used to collect it.

Measurement error

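Once the target measure is defined, we collect actual data values. The values recorded for specific units are the obtained measures. Measurement error is the difference between an obtained measure and the true value of the target measure for that unit.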

Processing error

The edited measure is the final value recorded in the administrative or survey dataset, after any processing, validation, or other checks. These checks might correct errors in the values originally obtained, but they can also introduce new ones; errors introduced at this stage are processing errors.
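The difference between errors in the obtained values and errors introduced by editing can be illustrated with a small sketch. Everything in it is assumed for illustration: the values are made up (they could stand for the number of posts offered in each advertisement), the true values would not be known in practice, and `edit_rule` is a hypothetical check, not one prescribed by the framework.

```python
def edit_rule(value, limit=10):
    """Hypothetical edit: treat any value above `limit` as a slipped decimal."""
    return value // 10 if value > limit else value


# Made-up data; true values are unknown in practice and assumed here.
true_values = [1, 3, 25]
obtained = [1, 30, 25]  # the second value was recorded wrongly at source

# Apply the edit to every obtained value to get the edited measures.
edited = [edit_rule(v) for v in obtained]

# Error present in the obtained measures (before editing).
measurement_error = [o - t for o, t in zip(obtained, true_values)]

# Error remaining in the edited measures (after editing).
error_after_editing = [e - t for e, t in zip(edited, true_values)]
```

In this sketch the edit repairs the second value but damages the third, which was correct as obtained: the kind of error that processing can add.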

Representation (objects)

This side deals with defining and creating ‘objects’ – the basic elements of the population being measured.

Frame error

The target set is the counterpart of the target concept: it is the set of all objects the data producer would ideally have data on. ‘Objects’ in this context differ from the usual statistical concept of ‘units’ because in some administrative datasets the base units could be records of individual events (eg transactions with customers). The accessible set is the set of objects about which data could in principle be obtained; frame error is the mismatch between the target set and the accessible set.

Selection error

Many collections have objects in the accessible set that don’t end up in the data. For instance, our accessible set could be all people eligible to vote, but the accessed set, the set we actually obtain information about, includes only people who actually registered on the electoral roll. The missing, unregistered people are a source of selection error.

Missing/redundancy error

The observed set comprises the objects in the final, verified dataset. Most checks on administrative data are likely to remove objects that shouldn’t have been in the selected set to begin with (eg someone under 18 trying to enrol to vote); errors of this type are selection errors. Cases where the agency mistakenly rejects or duplicates objects through its own processing are fairly rare, but this category exists to keep such errors distinct from reporting-type errors.
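The representation-side sets can be compared with ordinary set operations. The sketch below is illustrative only: the identifiers are made up, the function name `representation_errors` is hypothetical, and reading each error type as a simple difference between successive sets (with frame error taken as the gap between the target set and the accessible set) is a simplifying assumption.

```python
from collections import Counter


def representation_errors(target, accessible, accessed, observed_records):
    """Compare successive sets of object identifiers (hypothetical data).

    `observed_records` is a list, so duplicated objects can be detected.
    """
    counts = Counter(observed_records)
    observed = set(counts)
    return {
        # Wanted objects that cannot be reached even in principle.
        "frame": target - accessible,
        # Reachable objects that never made it into the data.
        "selection": accessible - accessed,
        # Accessed objects lost in the agency's own processing.
        "missing": accessed - observed,
        # Objects that appear more than once in the final dataset.
        "redundant": {obj for obj, n in counts.items() if n > 1},
    }


# Electoral-roll style example with made-up identifiers.
target = {"ann", "bob", "cat", "dan", "eve"}  # everyone eligible to vote
accessible = {"ann", "bob", "cat", "dan"}     # eve cannot be reached at all
accessed = {"ann", "bob", "cat"}              # dan never registered
observed_records = ["ann", "bob", "bob"]      # cat rejected, bob duplicated

errors = representation_errors(target, accessible, accessed, observed_records)
```

Here `errors["selection"]` holds the unregistered person, while `errors["missing"]` and `errors["redundant"]` hold the objects affected by the agency's own processing.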