D4.3 Report on the criteria for data quality and species characteristics for estimating species status and trends
Key findings from these case studies demonstrate that extensive quality control, data filtering, and validation are essential to producing robust results from unstructured data. Specific technical barriers identified include coordinate uncertainty, where large uncertainty radii can distort spatial signals; taxonomic inconsistencies, where unlinked accepted names artificially inflate species richness; and publication delays, which can create misleading temporal trends. Furthermore, the analysis reveals that data cubes are often dominated by a small number of highly influential component datasets, making indicators sensitive to the presence or absence of specific sources.
To address these biases, the report implements diagnostic frameworks to quantify survey effort and completeness. It introduces a survey-effort score, capturing record volume, temporal replication, and taxonomic coverage, and utilizes probabilistic estimators to assess survey completeness. Additionally, the report examines species detectability, applying survey-based detection probability metrics to distinguish between genuine ecological signals and reporting biases driven by technology or observer behaviour.
Finally, the report operationalizes these assessments through specialized software tools developed within the B3 project, specifically the gcube R package for simulating occurrence cubes and the dubicube R package for quality checks and quantifying indicator uncertainty. These insights are synthesized into a set of operational guidelines for reliable indicator and trend calculations, ensuring that biodiversity reporting based on aggregated occurrence data is transparent, reproducible, and robust.
Details
| Number of pages | 86 |
|---|---|
| Type | Report not published by INBO |
| Category | Research |
| Language | English |
Bibtex
@misc{0f4a6bc4-d373-4858-a497-27a03da11170,
title = "D4.3 Report on the criteria for data quality and species characteristics for estimating species status and trends",
abstract = "This deliverable report identifies and substantiates criteria for determining the reliability of species status and trend estimates derived from aggregated data cubes from the Global Biodiversity Information Facility (GBIF). To achieve this, the report adopts a comparative approach, contrasting unstructured GBIF cube data with structured monitoring data from bird surveys in Flanders (Belgium) and the Western Cape (South Africa).
Key findings from these case studies demonstrate that extensive quality control, data filtering, and validation are essential to producing robust results from unstructured data. Specific technical barriers identified include coordinate uncertainty, where large uncertainty radii can distort spatial signals; taxonomic inconsistencies, where unlinked accepted names artificially inflate species richness; and publication delays, which can create misleading temporal trends. Furthermore, the analysis reveals that data cubes are often dominated by a small number of highly influential component datasets, making indicators sensitive to the presence or absence of specific sources.
To address these biases, the report implements diagnostic frameworks to quantify survey effort and completeness. It introduces a survey-effort score, capturing record volume, temporal replication, and taxonomic coverage, and utilizes probabilistic estimators to assess survey completeness. Additionally, the report examines species detectability, applying survey-based detection probability metrics to distinguish between genuine ecological signals and reporting biases driven by technology or observer behaviour.
Finally, the report operationalizes these assessments through specialized software tools developed within the B3 project, specifically the gcube R package for simulating occurrence cubes and the dubicube R package for quality checks and quantifying indicator uncertainty. These insights are synthesized into a set of operational guidelines for reliable indicator and trend calculations, ensuring that biodiversity reporting based on aggregated occurrence data is transparent, reproducible, and robust.",
author = "Ward Langeraert and Katelyn Faulkner and Emma Cartuyvels and Quentin Groom and Toon Van Daele",
year = "2026",
month = feb,
day = "27",
doi = "",
language = "English",
publisher = "Instituut voor Natuur- en Bosonderzoek",
address = "Belgium",
type = "Other"
}