Modelling the Heterogeneity within Citizen Science Data for Biodiversity Research

Author(s):  
Diana Bowler ◽  
Nick Isaac ◽  
Aletta Bonn

Large amounts of species occurrence data are compiled by platforms such as the Global Biodiversity Information Facility (GBIF) but these data are collected by a diversity of methods and people. Statistical tools, such as occupancy-detection models, have been developed and tested as a way to analyze these heterogeneous data and extract information on species’ population trends. However, these models make many assumptions that might not always be met. More detailed metadata associated with occurrence records would help better describe the observation/detection submodel within occupancy models and improve the accuracy/precision of species’ trend estimates. Here, we present examples of occupancy-detection models applied to citizen science datasets, including dragonfly data in Germany, and typical approaches to account for variation in sampling effort and species detectability, including visit covariates, such as list length. Using results from a recent questionnaire in Germany asking citizen scientists about why and how they collect species occurrence data, we also characterize the different approaches that citizen scientists take to sample and report species observations. We use our findings to highlight examples of key metadata that are often missing (e.g., length of time spent searching, complete checklist or not) in data sharing platforms but would greatly aid modelling attempts of heterogeneous species occurrence data.
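As a minimal sketch of how list length can enter the detection submodel, the following Python computes the likelihood of one site's detection history in one year. It assumes a single occupancy probability `psi` and a logistic detection probability that depends on the log of the list length; the function names, the logistic link, and the parameters `intercept` and `slope` are illustrative assumptions, not the specification used by the authors.

```python
import math


def detection_prob(list_length, intercept, slope):
    """Detection probability on one visit, logistic in log(list length)."""
    eta = intercept + slope * math.log(list_length)
    return 1.0 / (1.0 + math.exp(-eta))


def site_likelihood(detections, list_lengths, psi, intercept, slope):
    """Likelihood of one site's detection history (1 = detected, 0 = not).

    detections and list_lengths hold one entry per visit to the site.
    """
    # Probability of the observed history given that the site is occupied.
    p_history = 1.0
    for y, n in zip(detections, list_lengths):
        p = detection_prob(n, intercept, slope)
        p_history *= p if y == 1 else (1.0 - p)

    if any(detections):
        # At least one detection: the site must be occupied.
        return psi * p_history
    # No detections: occupied but missed on every visit, or unoccupied.
    return psi * p_history + (1.0 - psi)
```

The second branch is where detection and occupancy are separated: a site with no records may be unoccupied, or occupied but missed on every visit. Metadata such as search duration or checklist completeness would enter as further terms in `eta`.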

Author(s):  
Scott A Chamberlain ◽  
Carl Boettiger

Background. The number of individuals of each species in a given location forms the basis for many sub-fields of ecology and evolution. Data on individuals, including which species, and where they're found can be used for a large number of research questions. Global Biodiversity Information Facility (hereafter, GBIF) is the largest of these. Programmatic clients for GBIF would make research dealing with GBIF data much easier and more reproducible. Methods. We have developed clients to access GBIF data for each of the R, Python, and Ruby programming languages: rgbif, pygbif, gbifrb. Results. For all clients we describe their design and utility, and demonstrate some use cases. Discussion. Programmatic access to GBIF will facilitate more open and reproducible science - the three GBIF clients described herein are a significant contribution towards this goal.
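To illustrate the kind of request such clients wrap, here is a small sketch that only builds a query URL for the public GBIF occurrence search endpoint, using the Python standard library. It does not use rgbif, pygbif, or gbifrb, and the helper name `build_occurrence_query` is hypothetical; the clients described in the abstract expose their own, richer interfaces.

```python
from urllib.parse import urlencode

# Public GBIF REST endpoint for occurrence search (API v1).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_occurrence_query(scientific_name, country=None, limit=20):
    """Return a GBIF occurrence-search URL for a species name.

    country is an optional two-letter ISO country code, e.g. "DE".
    """
    params = {"scientificName": scientific_name, "limit": limit}
    if country is not None:
        params["country"] = country
    return GBIF_OCCURRENCE_SEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    # Print the URL; fetching it would return JSON occurrence records.
    print(build_occurrence_query("Anax imperator", country="DE", limit=5))
```

Keeping the query in a script, rather than clicking through a web portal, is what makes the data retrieval step repeatable.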


2019 ◽  
Author(s):  
Michael J.O. Pocock ◽  
Mark W. Logie ◽  
Nick J.B. Isaac ◽  
Charlotte L. Outhwaite ◽  
Tom August

Abstract. Species records from volunteers are a vast and valuable source of information on biodiversity for a wide range of taxonomic groups. Although these citizen science data are opportunistic and unstructured, occupancy analysis can be used to quantify trends in distribution. However, occupancy analysis of unstructured data can be resource-intensive and requires substantial expertise. It is valuable to have simple ‘rules of thumb’ to efficiently assess the suitability of a dataset for occupancy analysis prior to analysis. Our analysis was possible due to the production of trends, from our Bayesian occupancy analysis, for 10 967 species from 34 multi-species recording schemes in Great Britain. These schemes had an average of 500 visits to sites per year, and an average of 20% of visited sites received a revisit in a year. Occupancy trend outputs varied in their precision and we used expert elicitation on a subset of outputs to determine a precision threshold above which trends were suitable for further consideration. We then used classification trees with seven metrics to define simple rules explaining when the data would result in outputs that met the precision threshold. We found that the suitability of a species’ data was best described by (i) the number of records of the focal species in the 10% best-recorded years, and (ii) the proportion of recording visits for that taxonomic group with non-detections of the focal species. Surprisingly few data were required to be predicted to meet the precision threshold. Specifically, for 98% confidence that our Bayesian occupancy models would produce outputs meeting the precision threshold, there needed to be ≥29 records of the focal species in the 10% best-recorded years (equivalent to an average of 12.5 records per year in our dataset), although only ≥10 records (equivalent to 4.5 records per year) were required for species recorded in less than 1 in 25 visits. We applied these rules to regional species data for Great Britain. Data from 32% of the species:region combinations met the precision threshold with 80% confidence, and 14% with 98% confidence. There was great variation between taxonomic groups (e.g. butterflies, moths and dragonflies were well recorded) and region (e.g. south-east England was best recorded). These simple criteria provide no indication of the accuracy or representativeness of the trend outputs: this is vital, but needs to be assessed individually. However, our criteria do provide a rapid, quantitative assessment of the predicted suitability of existing data for occupancy analysis and could be used to inform the design and implementation of multi-species citizen science recording projects elsewhere in the world.
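The 98%-confidence rule of thumb quoted above can be sketched as a small check. This is an illustrative reading of the two thresholds stated in the abstract, not the authors' classification tree: the function name is hypothetical, both metrics are assumed to be computed beforehand, and `detection_rate` is taken to be the proportion of visits on which the focal species was recorded (the complement of the non-detection proportion named in the abstract).

```python
def meets_98_percent_rule(records_in_best_years, detection_rate):
    """Apply the stated 98%-confidence rule of thumb.

    records_in_best_years: records of the focal species in the 10%
        best-recorded years (assumed to be precomputed).
    detection_rate: proportion of recording visits on which the focal
        species was recorded.
    """
    # Rarely recorded species (fewer than 1 in 25 visits) need only 10 records.
    threshold = 10 if detection_rate < 1 / 25 else 29
    return records_in_best_years >= threshold
```

For example, a species with 12 records in its best-recorded years passes if it turns up on 3% of visits but fails if it turns up on 30% of visits. As the abstract stresses, passing says nothing about the accuracy or representativeness of the resulting trend.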


2020 ◽  
Author(s):  
D.E. Bowler ◽  
D. Eichenberg ◽  
K.J. Conze ◽  
F. Suhling ◽  
K. Baumann ◽  
...  

Abstract. Recent studies suggest insect declines in parts of Europe; however, the generality of these trends across different taxa and regions remains unclear. Standardized data are not available to assess large-scale, long-term changes for most insect groups but opportunistic citizen science data are widespread for some taxa. We compiled over 1 million occurrence records of Odonata (dragonflies and damselflies) from different regional databases across Germany. We used occupancy-detection models to estimate annual distributional changes between 1980 and 2016 for each species. We related species attributes to changes in the species’ distributions and inferred possible drivers of change. Species showing increases were generally warm-adapted species and/or running water species while species showing decreases were cold-adapted species using standing water habitats such as bogs. We developed a novel approach using time-series clustering to identify groups of species with similar patterns of temporal change. Using this method, we defined five typical patterns of change for Odonata – each associated with a specific combination of species attributes. Overall, trends in Odonata provide mixed news – improved water quality, coupled with positive impacts of climate change, could explain the positive trend status of many species. At the same time, declining species point to conservation challenges associated with habitat loss and degradation. Our study demonstrates the great value of citizen science data for assessing large-scale distributional change and conservation decision-making.


Author(s):  
Gerald Guala

Biodiversity Information Serving Our Nation (BISON - bison.usgs.gov) is the US Node application for the Global Biodiversity Information Facility (GBIF) and the most comprehensive source of species occurrence data for the United States of America. It currently contains more than 460 million records and provides significant augmentation and integration of US occurrence data in terrestrial, marine and freshwater systems. Publicly released in 2013, BISON has generated a large community of stakeholders and they have passed on a lot of questions over the years through email ([email protected]), presentations and other means. In this presentation, some of the most common questions will be addressed in detail. For example: why all BISON data isn't in GBIF; how is BISON different from GBIF; what is the relationship between BISON and other US providers to GBIF; and what is the exact role of the Integrated Taxonomic Information System (ITIS - www.itis.gov) in BISON.


Author(s):  
Jacob Heilmann-Clausen ◽  
Tobias Frøslev ◽  
Jens Petersen ◽  
Thomas Læssøe ◽  
Thomas Jeppesen

The Danish Fungal Atlas is a citizen science project launched in 2009 in collaboration among the University of Copenhagen, Mycokey and the Danish Mycological Society. The associated database now holds almost 1 million fungal records, contributed by more than 3000 recorders. The records represent more than 8000 fungal species, of which several hundred have been recorded as new to Denmark during the project. In addition, several species have been described as new to science. Data are synchronized with the Global Biodiversity Information Facility (GBIF) on a weekly basis, and are hence freely available for research and nature conservation. Data have been used for systematic conservation planning in Denmark, and several research papers have used data to explore subjects such as host selection in wood-inhabiting fungi (Heilmann‐Clausen et al. 2016), recording bias in citizen science (Geldmann et al. 2016), fungal traits (Krah et al. 2019), biodiversity patterns (e.g. Andrew et al. 2018), and species discovery (Heilmann-Clausen et al. 2019). The project database is designed to facilitate direct interactions and communication among volunteers. The validation of submitted records is interactive and combines species-specific smart filters, user credibility, and expert tools to secure the highest possible data credibility. In 2019, an AI (artificial intelligence) trained species identification tool was launched along with a new mobile app, enabling users to identify and record species directly in the field (Sulc et al. 2020). At the same time, DNA sequencing was tested as an option to test difficult identifications, and in 2021 a high-throughput sequencing facility was developed to allow DNA sequencing of hundreds of fungal collections at a low cost. The presentation will give details on data validation, data use and how we have worked with cultivation of volunteers to provide a truly coherent model for collaboration on mushroom citizen science.


Author(s):  
Jerry Lanfear

ELIXIR unites Europe’s leading life science organisations in managing and safeguarding the increasing volume of data being generated by publicly funded research. It coordinates, integrates and sustains bioinformatics resources across its 22 member states, plus EMBL-EBI (European Molecular Biology Laboratory - European Bioinformatics Institute), and enables end users to access services and data that are vital for their research. ELIXIR's remit spans the full breadth of life science data, including data related to human health, food production (agriculture, farming, aquaculture) and the environment (e.g. pollution remediation, ecology), all of clear socio-economic benefit. As a result, ELIXIR contributes to the delivery of several sustainable development goals. This poster will introduce ELIXIR and describe the contribution it can make to coordinating data and services relevant to biodiversity. The poster will set the context for how molecularly-derived biodiversity occurrence data can significantly enhance resources such as the Global Biodiversity Information Facility (GBIF) and the Ocean Biogeographic Information System (OBIS), e.g. by filling in acute gaps in our knowledge of species across realms.


2019 ◽  
Vol 10 (1) ◽  
pp. 8-21 ◽  
Author(s):  
Res Altwegg ◽  
James D. Nichols

2021 ◽  
Author(s):  
Viviane Zulian ◽  
David A. W. Miller ◽  
Goncalo Ferraz

Mapping species distributions is a crucial but challenging requirement of wildlife management. The frequent need to sample vast expanses of potential habitat increases the cost of planned surveys and rewards accumulation of opportunistic observations. In this paper, we integrate planned survey data from roost counts with opportunistic samples from eBird, WikiAves and Xeno-canto citizen-science platforms to map the geographic range of the endangered Vinaceous-breasted Parrot. We demonstrate the estimation and mapping of species occurrence based on data integration while accounting for specifics of each data set, including observation technique and uncertainty about the observations. Our analysis illustrates 1) the incorporation of sampling effort, spatial autocorrelation, and site covariates in a joint-likelihood, hierarchical, data-integration model; 2) the evaluation of the contribution of each data set, as well as the contribution of effort covariates, spatial autocorrelation, and site covariates to the predictive ability of fitted models using a cross-validation approach; and 3) how spatial representation of the latent occupancy state (i.e. realized occupancy) helps identify areas with high uncertainty that should be prioritized in future field work. Our results reveal a Vinaceous-breasted Parrot geographic range of 434,670 square kilometers, which is three times larger than the Extant area previously reported in the IUCN Red List. The exclusion of one data set at a time from the analyses always resulted in worse predictions by the models of truncated data than by the full model, which included all data sets. Likewise, exclusion of spatial autocorrelation, site covariates, or sampling effort resulted in worse predictions. 
The integration of different data sets into one joint-likelihood model produced a more reliable representation of the species range than any individual data set taken on its own, improving the use of citizen science data in combination with planned survey results.

