Research applications of primary biodiversity databases in the digital age

2019 ◽  
Author(s):  
Joan E. Ball-Damerow ◽  
Laura Brenskelle ◽  
Narayani Barve ◽  
Pamela S. Soltis ◽  
Petra Sierwald ◽  
...  

ABSTRACT We are in the midst of unprecedented change: climate shifts and sustained, widespread habitat degradation have led to dramatic declines in biodiversity rivaling historical extinction events. At the same time, new approaches to publishing and integrating previously disconnected data resources promise to help provide the evidence needed for more efficient and effective conservation and management. Stakeholders have invested considerable resources to contribute to online databases of species occurrences and genetic barcodes. However, estimates suggest that only 10% of biocollections are available in digital form. The biocollections community must therefore continue to promote digitization efforts, which in part requires demonstrating compelling applications of the data. Our overarching goal is therefore to determine trends in use of mobilized species occurrence data since 2010, as online systems have grown and now provide over one billion records. To do this, we characterized 501 papers that use openly accessible biodiversity databases. Our standardized tagging protocol was based on key topics of interest, including: database(s) used, taxa addressed, general uses of data, other data types linked to species occurrence data, and data quality issues addressed. We found that the most common uses of online biodiversity databases have been to estimate species distribution and richness, to outline data compilation and publication, and to assist in developing species checklists or describing new species. Only 69% of papers in our dataset addressed one or more aspects of data quality, which is low considering common errors and biases known to exist in opportunistic datasets. Globally, we find that biodiversity databases are still in the initial stages of data compilation. Novel and integrative applications are restricted to certain taxonomic groups and regions with higher numbers of quality records.
Continued data digitization, publication, enhancement, and quality control efforts are necessary to make biodiversity science more efficient and relevant in our fast-changing world.

2015 ◽  
Author(s):  
Alexander Zizka ◽  
Alexandre Antonelli

1. Large-scale species occurrence data from geo-referenced observations and collected specimens are crucial for analyses in ecology, evolution and biogeography. Despite the rapidly growing availability of such data, their use in evolutionary analyses is often hampered by tedious manual classification of point occurrences into operational areas, leading to a lack of reproducibility and concerns regarding data quality. 2. Here we present speciesgeocodeR, a user-friendly R package for data cleaning, data exploration and data visualization of species point occurrences using discrete operational areas, and for linking them to analyses invoking phylogenetic trees. 3. The three core functions of the package are 1) automated and reproducible data cleaning, 2) rapid and reproducible classification of point occurrences into discrete operational areas in an adequate format for subsequent biogeographic analyses, and 3) a comprehensive summary and visualization of species distributions to explore large datasets and ensure data quality. In addition, speciesgeocodeR facilitates access to and analysis of publicly available species occurrence data, widely used operational areas and elevation ranges. Other functionalities include the implementation of minimum occurrence thresholds and the visualization of coexistence patterns and range sizes. SpeciesgeocodeR is accompanied by a richly illustrated and easy-to-follow tutorial and help functions.
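The classification step described above (assigning point occurrences to discrete operational areas and applying a minimum occurrence threshold) can be illustrated with a minimal Python sketch. This is not speciesgeocodeR's API: the function names `point_in_polygon`, `classify` and `presence`, the input layout, and the use of a plain ray-casting test on planar coordinates are all assumptions made for illustration.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as a list of (x, y) vertices?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify(occurrences, areas):
    """Count occurrences per species and operational area.

    occurrences: iterable of (species, x, y) tuples (hypothetical layout).
    areas: dict mapping area name -> polygon vertex list (hypothetical layout).
    Points falling in no area are skipped.
    """
    table = {}
    for species, x, y in occurrences:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                table.setdefault(species, Counter())[name] += 1
                break
    return table


def presence(table, min_occurrences=1):
    """Areas in which each species reaches the minimum occurrence threshold."""
    return {
        species: sorted(a for a, n in counts.items() if n >= min_occurrences)
        for species, counts in table.items()
    }


areas = {
    "A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
occurrences = [("sp1", 5, 5), ("sp1", 15, 5), ("sp2", 5, 5), ("sp2", 30, 5)]
table = classify(occurrences, areas)
```

Treating longitude and latitude as planar coordinates is a simplification that is reasonable only for small areas away from the poles and the antimeridian; a real workflow would rely on a geospatial library.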


Author(s):  
Alexander Zizka ◽  
Alexandre Antonelli ◽  
Daniele Silvestro

Abstract. Geo-referenced species occurrences from public databases have become essential to biodiversity research and conservation. However, geographical biases are widely recognized as a factor limiting the usefulness of such data for understanding species diversity and distribution. In particular, differences in sampling intensity across a landscape due to differences in human accessibility are ubiquitous but may differ in strength among taxonomic groups and datasets. Although several factors have been described to influence human access (such as presence of roads, rivers, airports and cities), quantifying their specific and combined effects on recorded occurrence data remains challenging. Here we present sampbias, an algorithm and software for quantifying the effect of accessibility biases in species occurrence datasets. Sampbias uses a Bayesian approach to estimate how sampling rates vary as a function of proximity to one or multiple bias factors. The results are comparable among bias factors and datasets. We demonstrate the use of sampbias on a dataset of mammal occurrences from the island of Borneo, showing a high biasing effect of cities and a moderate effect of roads and airports. Sampbias is implemented as a well-documented, open-access and user-friendly R package that we hope will become a standard tool for anyone working with species occurrences in ecology, evolution, conservation and related fields.


Author(s):  
Michael K. Young ◽  
Daniel J. Isaak ◽  
Kevin S. McKelvey ◽  
Michael K. Schwartz ◽  
Kellie J. Carim ◽  
...  

Paleobiology ◽  
2001 ◽  
Vol 27 (4) ◽  
pp. 602-630 ◽  
Author(s):  
Michael Foote

Apparent variation in rates of origination and extinction reflects the true temporal pattern of taxonomic rates as well as the distorting effects of incomplete and variable preservation, effects that are themselves exacerbated by true variation in taxonomic rates. Here I present an approach that can undo these distortions and thus permit estimates of true taxonomic rates, while providing estimates of preservation in the process. Standard survivorship probabilities are modified to incorporate variable taxonomic rates and rates of fossil recovery. Time series of these rates are explored by numerical optimization until the set of rates that best explains the observed data is found. If internal occurrences within stratigraphic ranges are available, or if temporal patterns of fossil recovery can otherwise be assumed, these constraints can be exploited, but they are by no means necessary. In its most general form, the approach requires no data other than first and last appearances. When tested against simulated data, the method is able to recover temporal patterns in rates of origination, extinction, and preservation. With empirical data, it yields estimates of preservation rate that agree with those obtained independently by tabulating internal occurrences within stratigraphic ranges. Moreover, when empirical occurrence data are artificially degraded, the method detects the resulting gaps in sampling and corrects taxonomic rates. Preliminary application to data on Paleozoic marine animals suggests that some features of the apparent record, such as the forward smearing of true origination events and the backward smearing of true extinction events, can be detected and corrected. Other features, such as the end-Ordovician extinction, may be fairly accurate at face value.


2018 ◽  
Vol 93 ◽  
pp. 333-343 ◽  
Author(s):  
Charlotte L. Outhwaite ◽  
Richard E. Chandler ◽  
Gary D. Powney ◽  
Ben Collen ◽  
Richard D. Gregory ◽  
...  

Ecology ◽  
2003 ◽  
Vol 84 (1) ◽  
pp. 242-251 ◽  
Author(s):  
Raphaël Pélissier ◽  
Pierre Couteron ◽  
Stéphane Dray ◽  
Daniel Sabatier

Author(s):  
Damiano Oldoni ◽  
Quentin Groom ◽  
Peter Desmet

The digital era has brought about an impressive increase in the volume of published species occurrence data. Research infrastructures such as the Global Biodiversity Information Facility (GBIF), the digitization of legacy data, and the use of mobile applications have all played a role in this transition. More data implies, unavoidably, more heterogeneity at multiple levels as a result of the different methods and standards used to collect data. Data standardization and aggregation help to reduce this heterogeneity. Furthermore, intermediate data products that can be used for activities such as mapping, modeling and monitoring improve the repeatability and reproducibility of biodiversity research (Kissling et al. 2017). Occurrences can be defined as events in a three-dimensional space where the dimensions are taxonomic (what), temporal (when) and spatial (where). They are then aggregated into what we call an occurrence cube (Fig. 1). The taxonomic dimension is categorical. Research infrastructures like GBIF use a taxonomic backbone, thus making data aggregation at species level or higher rank relatively easy. The temporal dimension is a continuum, and the temporal uncertainty is usually lower than the aggregation span, which is typically a year. Regarding the spatial dimension, occurrences are typically filtered to remove those with too large an uncertainty to fit the grid scheme being used, which means that the spatial uncertainty is largely unused. We developed a method to take this spatial uncertainty into account while aggregating data. In particular, we state that an occurrence is spatially representable as a closed plane figure such as a circle, hexagon or square, never as its geometric centre (centroid). As for GBIF occurrence data, the coordinateUncertaintyInMeters is defined as the radius describing the smallest circle containing the whole of the location (see Darwin Core standard).
So, spatially speaking, we refer to occurrences as circles, even if the method described below is general. After harvesting the occurrence data and providing a data quality assessment (e.g. removing occurrences without coordinates or with suspicious coordinates) we can assign occurrences to a reference grid such as the European reference grid of the European Environment Agency (EEA) at 1 km scale. In this spatial aggregation we randomly choose a point within the occurrence circle and assign it to the grid cell in which it is contained. We can aggregate further by time (e.g. by year) and taxonomy (e.g. by species), where aggregating means counting how many occurrences are in each specific taxonomic-spatial-temporal unit. The analogy with geometry goes further: the occurrence cube can, as any cube, be projected on an orthogonal plane by aggregating along one of the three dimensions. In particular, projecting the cube on the taxonomic and temporal dimensions can be done by adding up the number of occurrences, or counting the number of occupied cells, thus estimating the area of occupancy. The occurrence cube paradigm has been developed within the Tracking Invasive Alien Species (TrIAS) project (Vanderhoeven et al. 2017) following Open Science and FAIR principles. We created and published occurrence cubes at the species level for Belgium and Italy (Oldoni et al. 2020b) and the occurrence cubes for non-native taxa in Belgium and Europe (Oldoni et al. 2020a).
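The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not the TrIAS implementation: the record layout (`x`, `y`, `uncertainty_m`, `species`, `year`), the function names, and the assumption that coordinates are already in a projected, metre-based reference system matching the grid are all hypothetical.

```python
import math
import random
from collections import Counter


def random_point_in_circle(x, y, radius, rng):
    """Draw a point uniformly from the disc of the given radius centred on (x, y)."""
    # The square root makes the draw uniform over area rather than over radius.
    r = radius * math.sqrt(rng.random())
    theta = 2 * math.pi * rng.random()
    return x + r * math.cos(theta), y + r * math.sin(theta)


def build_cube(occurrences, cell_size=1000.0, seed=0):
    """Count occurrences per (species, year, grid cell).

    Each occurrence is a dict with hypothetical keys 'species', 'year',
    'x', 'y' (metres) and 'uncertainty_m' (radius of the occurrence circle).
    """
    rng = random.Random(seed)  # fixed seed so the random assignment is repeatable
    cube = Counter()
    for occ in occurrences:
        px, py = random_point_in_circle(occ["x"], occ["y"], occ["uncertainty_m"], rng)
        cell = (math.floor(px / cell_size), math.floor(py / cell_size))
        cube[(occ["species"], occ["year"], cell)] += 1
    return cube


def occupied_cells(cube):
    """Project the cube onto taxon x year by counting occupied grid cells."""
    cells = {}
    for species, year, cell in cube:
        cells.setdefault((species, year), set()).add(cell)
    return {key: len(value) for key, value in cells.items()}


occurrences = [
    {"species": "a", "year": 2019, "x": 1500.0, "y": 2500.0, "uncertainty_m": 0.0},
    {"species": "a", "year": 2019, "x": 500.0, "y": 500.0, "uncertainty_m": 100.0},
    {"species": "b", "year": 2020, "x": 500.0, "y": 500.0, "uncertainty_m": 100.0},
]
cube = build_cube(occurrences)
```

With a 1 km cell size, multiplying the number of occupied cells by the cell area gives a rough area-of-occupancy estimate in square kilometres. Because the cell assignment is random for occurrences whose circle spans several cells, results depend on the seed.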


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0234587
Author(s):  
Mariano J. Feldman ◽  
Louis Imbeau ◽  
Philippe Marchand ◽  
Marc J. Mazerolle ◽  
Marcel Darveau ◽  
...  

Citizen science (CS) currently refers to the participation of non-scientist volunteers in any discipline of conventional scientific research. Over the last two decades, nature-based CS has flourished due to innovative technology, novel devices, and widespread digital platforms used to collect and classify species occurrence data. For scientists, CS offers a low-cost approach to collecting species occurrence information at large spatial scales that otherwise would be prohibitively expensive. We examined the trends and gaps linked to the use of CS as a source of data for species distribution models (SDMs), in order to propose guidelines and highlight solutions. We conducted a quantitative literature review of 207 peer-reviewed articles to measure how the representation of different taxa, regions, and data types has changed in SDM publications since the 2010s. Our review shows that the number of papers using CS for SDMs has increased at approximately double the rate of the overall number of SDM papers. However, disparities in taxonomic and geographic coverage remain in studies using CS. Western Europe and North America were the regions with the most coverage (73%). Papers on birds (49%) and mammals (19.3%) outnumbered other taxa. Among invertebrates, flying insects including Lepidoptera, Odonata and Hymenoptera received the most attention. Discrepancies between research interest and availability of data were especially important for amphibians, reptiles and fishes. Compared to studies on animal taxa, papers on plants using CS data remain rare. Although the aims and scope of papers are diverse, species conservation remained the central theme of SDM using CS data. We present examples of the use of CS and highlight recommendations to motivate further research, such as combining multiple data sources and promoting local and traditional knowledge.
We hope our findings will strengthen citizen-researcher partnerships to better inform SDMs, especially for less-studied taxa and regions. Researchers stand to benefit from the large quantity of data available from CS sources to improve global predictions of species distributions.


Author(s):  
Scott A Chamberlain ◽  
Carl Boettiger

Background. The number of individuals of each species in a given location forms the basis for many sub-fields of ecology and evolution. Data on individuals, including which species they belong to and where they are found, can be used for a large number of research questions. The Global Biodiversity Information Facility (hereafter, GBIF) is the largest source of such data. Programmatic clients for GBIF would make research dealing with GBIF data much easier and more reproducible. Methods. We have developed clients to access GBIF data for each of the R, Python, and Ruby programming languages: rgbif, pygbif, gbifrb. Results. For all clients we describe their design and utility, and demonstrate some use cases. Discussion. Programmatic access to GBIF will facilitate more open and reproducible science; the three GBIF clients described herein are a significant contribution towards this goal.
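To give a sense of what programmatic access involves, the sketch below pages through the public GBIF occurrence search endpoint using only the Python standard library. It does not use rgbif, pygbif or gbifrb, and it is not how those clients are implemented. The helper names (`build_search_url`, `fetch_json`, `iter_occurrences`) are hypothetical, and the endpoint, query parameters and response fields (`results`, `endOfRecords`) are assumed to behave as in the GBIF API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the GBIF occurrence search endpoint.
GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_search_url(scientific_name, limit=100, offset=0):
    """Build a search URL for georeferenced occurrences of one taxon."""
    query = urlencode({
        "scientificName": scientific_name,
        "hasCoordinate": "true",
        "limit": limit,
        "offset": offset,
    })
    return f"{GBIF_SEARCH}?{query}"


def fetch_json(url):
    """Download one page of results (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_occurrences(scientific_name, fetch=fetch_json, page_size=100, max_records=300):
    """Yield occurrence records page by page until the server reports the end.

    `fetch` is injectable so the paging logic can be exercised offline.
    """
    offset = 0
    yielded = 0
    while yielded < max_records:
        page = fetch(build_search_url(scientific_name, page_size, offset))
        for record in page.get("results", []):
            yield record
            yielded += 1
            if yielded >= max_records:
                return
        if page.get("endOfRecords", True):
            return
        offset += page_size
```

A call such as `list(iter_occurrences("Puma concolor", max_records=50))` would issue real requests. Keeping the query parameters in a script alongside the analysis is what makes the download step repeatable.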

