Durham e-Theses
The Verification of Ecological Citizen Science Data

BAKER, EMILY (2024) The Verification of Ecological Citizen Science Data. Doctoral thesis, Durham University.

PDF - Accepted Version
4Mb

Abstract

In the current climate and nature crisis, biodiversity and ecosystems are experiencing irreversible losses. In response to these threats, increasingly ambitious targets are being set to conserve and protect nature by 2030. Widespread and up-to-date data are required to understand the extent of losses and to ensure we are on track to meet these targets for nature. Citizen science datasets, based on records of species observations made by volunteers, are the primary sources of data at the geographical scale required to analyse large-scale trends in species abundances and distributions. Due to the unstructured nature of data collection by many individuals, there are concerns around data quality, inaccuracy, and bias in these datasets. Verification is an essential process for ensuring data quality but, as the volume of data being collected by citizen scientists grows, bottlenecks in data processing can arise. This thesis details my research into the verification of ecological citizen science data.

I start by reviewing current approaches to verification within ecological citizen science schemes whose data feature in the scientific literature. The results from this review identify three distinct approaches to verification: expert, community consensus and automation. This research highlights that expert verification has been the default approach for many schemes and proposes that alternative approaches should be considered more widely to deal with growing data volumes. Alongside identifying verification approaches, this review identifies the information that is used to inform the verification of citizen science data. This information typically comprises one or more of three types of data: attributes of the species, the environmental context, and attributes of the observer. I then outline an idealised system for verification, recommending that all three types of information should be considered when verifying citizen science observations and identifying the metadata that can be used in the verification process.

Informed by the results from the review of citizen science approaches, Chapters 3 and 4 outline my research into alternative frameworks for verification, based on Bayesian classification models that account for contextual information. In the first instance, I include the attributes of the species and the environmental context to verify citizen science records by using past data to quantify identification mistakes made by citizen scientists and considering when and where a species is more likely to be observed. I apply this approach to two contrasting citizen science schemes: MammalWeb, a scheme that uses community consensus verification to classify camera trap images; and iRecord, a scheme that uses expert verification for ad hoc, opportunistic species observations collected by field-based citizen scientists. The results show that for MammalWeb, including contextual information improved the accuracy of verification; for iRecord, including attributes of the species improved verification, but including contextual information provided little advantage. The framework outlined in Chapter 3 assumes all observers have the same expertise; therefore, in Chapter 4 I expand on this framework by exploring how observer variability can be integrated into approaches to verification. The results show that including observer traits makes minimal difference to the accuracy of verification: most individual observers contribute few records, which makes it difficult to quantify observer variability. The results from Chapters 3 and 4 also highlight that citizen science identifications pre-verification are generally very accurate (90% or higher), bringing into question the need for developing highly accurate and intensive verification processes.
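
As an illustration only, the general idea of combining past identification mistakes with contextual information can be sketched as a simple Bayesian update. This is a minimal sketch and not the models developed in the thesis: the function names, the two-species example, the record counts and the context prior are all hypothetical, and the smoothing constant is an assumption.

```python
from collections import Counter, defaultdict


def fit_confusion(verified_records, species, alpha=1.0):
    """Estimate P(reported | true) from past verified records.

    verified_records: iterable of (true_species, reported_species) pairs.
    alpha: additive (Laplace) smoothing constant, an assumption of this sketch.
    """
    counts = defaultdict(Counter)
    for true_sp, reported_sp in verified_records:
        counts[true_sp][reported_sp] += 1
    likelihood = {}
    for t in species:
        total = sum(counts[t].values()) + alpha * len(species)
        likelihood[t] = {r: (counts[t][r] + alpha) / total for r in species}
    return likelihood


def posterior(reported, likelihood, prior):
    """P(true | reported, context) is proportional to
    P(reported | true) * P(true | context)."""
    unnorm = {t: likelihood[t][reported] * prior[t] for t in prior}
    z = sum(unnorm.values())
    return {t: p / z for t, p in unnorm.items()}


# Hypothetical past data: (true species, species reported by the volunteer).
species = ["red_squirrel", "grey_squirrel"]
past = (
    [("grey_squirrel", "grey_squirrel")] * 18
    + [("grey_squirrel", "red_squirrel")] * 2
    + [("red_squirrel", "red_squirrel")] * 9
    + [("red_squirrel", "grey_squirrel")] * 1
)
likelihood = fit_confusion(past, species)

# Without context: both species equally likely a priori.
flat = posterior("red_squirrel", likelihood,
                 {"red_squirrel": 0.5, "grey_squirrel": 0.5})

# With context: a hypothetical site where grey squirrels dominate.
in_context = posterior("red_squirrel", likelihood,
                       {"red_squirrel": 0.05, "grey_squirrel": 0.95})
```

In this toy example the same reported identification is supported under the flat prior but becomes doubtful once the context prior is applied, which is the kind of effect that including when and where a species is likely to be observed is intended to capture.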

Given the human and technical effort that is channelled into verification, Chapter 5 of this thesis presents my research into the extent to which accurate verification matters in a conservation policy and management context. I simulate inaccuracies in a citizen science dataset of UK butterflies, to explore how data accuracy might impact estimates of the coverage provided by protected areas, and the consequences of these estimates for decisions that could be made using this analysis. The results show that, for more ubiquitous species, errors can be tolerated; however, for species with restricted ranges, inaccurate datasets tended to over-estimate the area of occupancy and therefore over- or under-estimate protected area coverage, depending on whether coverage was actually low or high, respectively. The results presented here indicate that, for some species, highly accurate verification may not be necessary and, moving forward, citizen science schemes should consider whether there is really a need to verify every record.
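
The effect described for restricted-range species can be illustrated with a toy simulation. This is a hedged sketch under strong assumptions, not the analysis carried out in the thesis: the grid of 100 cells, the two made-up species, the symmetric label-swapping error model and the protected cells are all hypothetical.

```python
import random


def simulate_errors(records, error_rate, swap, seed=0):
    """Return a copy of the records in which each species label is swapped
    with probability error_rate (a deliberately simple error model)."""
    rng = random.Random(seed)
    noisy = []
    for species, cell in records:
        if rng.random() < error_rate:
            species = swap[species]
        noisy.append((species, cell))
    return noisy


def occupied_cells(records, species):
    """Grid cells with at least one record of the species (area of occupancy)."""
    return {cell for sp, cell in records if sp == species}


def protected_coverage(cells, protected):
    """Fraction of occupied cells that fall inside protected areas."""
    return len(cells & protected) / len(cells) if cells else 0.0


# Hypothetical data: a restricted-range species confined to 5 protected cells
# and a widespread species recorded in all 100 cells.
records = (
    [("rare", c) for c in range(5) for _ in range(10)]
    + [("common", c) for c in range(100) for _ in range(5)]
)
swap = {"rare": "common", "common": "rare"}
protected = set(range(5))

for rate in (0.0, 0.05, 0.2):
    noisy = simulate_errors(records, rate, swap)
    cells = occupied_cells(noisy, "rare")
    print(rate, len(cells), round(protected_coverage(cells, protected), 3))
```

Because the restricted-range species here is fully protected in the error-free data, any misidentified record of the widespread species outside those cells inflates its apparent area of occupancy and lowers its apparent coverage, while the widespread species is barely affected.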

As data volumes grow, addressing bottlenecks to ensure that data are up-to-date and available for analysis increasingly requires more efficient approaches to verification. This thesis explores how verification can evolve to meet the current needs of those who run and manage citizen science schemes, as well as end users of the data, without compromising the decisions that are made using citizen science data. By addressing issues within this foundational process on which citizen science data are reliant, this thesis aims to emphasise the valuable role that citizen science plays in addressing the biodiversity crisis and further strengthen its place within ecological research.

Item Type: Thesis (Doctoral)
Award: Doctor of Philosophy
Faculty and Department: Faculty of Science > Biological and Biomedical Sciences, School of
Thesis Date: 2024
Copyright: Copyright of this thesis is held by the author
Deposited On: 04 Jun 2024 12:36
