R, Python, and Ruby clients for GBIF species occurrence data

Author(s):  
Scott A Chamberlain ◽  
Carl Boettiger

Background. The number of individuals of each species in a given location forms the basis for many sub-fields of ecology and evolution. Data on individuals, including which species they belong to and where they are found, can be used to address a large number of research questions. The Global Biodiversity Information Facility (hereafter, GBIF) is the largest source of such data. Programmatic clients for GBIF would make research dealing with GBIF data much easier and more reproducible. Methods. We have developed clients to access GBIF data for each of the R, Python, and Ruby programming languages: rgbif, pygbif, gbifrb. Results. For all clients we describe their design and utility, and demonstrate some use cases. Discussion. Programmatic access to GBIF will facilitate more open and reproducible science; the three GBIF clients described herein are a significant contribution towards this goal.
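
As a rough illustration of the kind of request such clients issue, the sketch below builds a query URL for GBIF's public occurrence search endpoint using only the Python standard library. The helper names `build_occurrence_url` and `fetch_occurrences` are hypothetical and are not part of rgbif, pygbif, or gbifrb; the endpoint URL and the `scientificName`, `country`, `limit`, and `offset` parameters are assumptions based on the public GBIF API, not details given in the abstract.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the public GBIF occurrence search API (assumed, not taken from the abstract).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_occurrence_url(scientific_name, country=None, limit=20, offset=0):
    """Return a search URL for occurrence records of one taxon (hypothetical helper)."""
    params = {"scientificName": scientific_name, "limit": limit, "offset": offset}
    if country is not None:
        params["country"] = country  # ISO 3166-1 alpha-2 code, e.g. "US"
    return GBIF_OCCURRENCE_SEARCH + "?" + urlencode(params)


def fetch_occurrences(scientific_name, **kwargs):
    """Request one page of records and return the decoded JSON (needs network access)."""
    with urlopen(build_occurrence_url(scientific_name, **kwargs), timeout=30) as response:
        return json.load(response)
```

Calling `fetch_occurrences("Puma concolor", country="US", limit=5)` would request a single page of results; a dedicated client would typically hide this URL handling behind language-native functions.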


Author(s):  
Gerald Guala

Biodiversity Information Serving Our Nation (BISON - bison.usgs.gov) is the US Node application for the Global Biodiversity Information Facility (GBIF) and the most comprehensive source of species occurrence data for the United States of America. It currently contains more than 460 million records and provides significant augmentation and integration of US occurrence data in terrestrial, marine and freshwater systems. Publicly released in 2013, BISON has generated a large community of stakeholders and they have passed on a lot of questions over the years through email ([email protected]), presentations and other means. In this presentation, some of the most common questions will be addressed in detail. For example: why all BISON data isn't in GBIF; how is BISON different from GBIF; what is the relationship between BISON and other US providers to GBIF; and what is the exact role of the Integrated Taxonomic Information System (ITIS - www.itis.gov) in BISON.


Author(s):  
Michael Trizna ◽  
Torsten Dikow

Taxonomic revisions contain crucial biodiversity data in the material examined sections for each species. In entomology, material examined lists minimally include the collecting locality, date of collection, and the number of specimens of each collection event. Insect species might be represented in taxonomic revisions by only a single specimen or hundreds to thousands of specimens. Furthermore, revisions of insect genera might treat small genera with few species or include tens to hundreds of species. Summarizing data from such large and complex material examined lists and revisions is cumbersome, time-consuming, and prone to errors. However, providing data on the seasonal incidence, abundance, and collecting period of species is an important way to mobilize primary biodiversity data to understand a species’s occurrence or rarity. Here, we present SpOccSum (Species Occurrence Summary), a tool to easily obtain metrics of seasonal incidence from specimen occurrence data in taxonomic revisions. SpOccSum is written in Python (Python Software Foundation 2019) and accessible through the Anaconda Python/R Data Science Platform as a Jupyter Notebook (Kluyver et al. 2016). The tool takes a simple list of specimen data in CSV format containing species name, locality, date of collection (preferably separated by day, month, and year), and number of specimens, and generates a series of tables and graphs summarizing: number of specimens per species, number of specimens collected per month, number of unique collection events, and the earliest and most recent collecting year of each species. The results can be exported as graphics or as CSV-formatted tables and can easily be included in manuscripts for publication.
An example of an early version of the summary produced by SpOccSum can be viewed in Tables 1, 2 from Markee and Dikow (2018). To accommodate seasonality in the Northern and Southern Hemispheres, users can choose to start the data display with either January or July. When geographic coordinates are available and species have widespread distributions spanning, for example, the equator, the user can itemize particular regions such as North of Tropic of Cancer (23.5˚N), Tropic of Cancer to the Equator, Equator to Tropic of Capricorn, and South of Tropic of Capricorn (23.5˚S). Other features currently in development include the ability to produce distribution maps from the provided data (when geographic coordinates are included) and the option to export specimen occurrence data as a Darwin-Core Archive ready for upload to the Global Biodiversity Information Facility (GBIF).
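
The summary metrics listed above can be sketched in a few lines of standard-library Python. This is a minimal illustration and not SpOccSum's own code: the function name `summarize`, the column names (`species`, `locality`, `day`, `month`, `year`, `count`), and the definition of a collection event as a unique locality and date are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def summarize(csv_text, start_month=1):
    """Summarize specimen records per species (illustrative, not SpOccSum's code).

    Expects the hypothetical columns: species, locality, day, month, year, count.
    start_month=1 lists months from January; start_month=7 lists them from July.
    """
    order = [(start_month - 1 + i) % 12 + 1 for i in range(12)]
    per_species = defaultdict(lambda: {
        "specimens": 0,
        "by_month": dict.fromkeys(order, 0),
        "events": set(),
        "years": set(),
    })
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = per_species[row["species"]]
        n, month, year = int(row["count"]), int(row["month"]), int(row["year"])
        entry["specimens"] += n
        entry["by_month"][month] += n
        # A collection event is taken here to be a unique locality and date.
        entry["events"].add((row["locality"], row["day"], month, year))
        entry["years"].add(year)

    summary = {}
    for species, entry in per_species.items():
        summary[species] = {
            "specimens": entry["specimens"],
            "by_month": [(MONTH_NAMES[m - 1], entry["by_month"][m]) for m in order],
            "collection_events": len(entry["events"]),
            "earliest_year": min(entry["years"]),
            "latest_year": max(entry["years"]),
        }
    return summary
```

Passing `start_month=7` mirrors the option of starting the display in July for Southern Hemisphere taxa; the monthly counts are simply rotated, not recomputed.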


Author(s):  
Tim Robertson ◽  
David Martin ◽  
Nick dos Remedios

The Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA) are two well-connected leading infrastructures serving the biodiversity community. As the national node for GBIF, the ALA serves to provide rich, localized services for the community of users in Australia and also acts as the gateway for datasets being shared internationally on GBIF.org. Recognising that significant overlap exists in the function of the systems, and that advancement in technology allows GBIF.org to offer more functionality, we have initiated a process of exploring better alignment of these infrastructures. Such a move is expected to bring the benefits of consistent data handling, improved citation tracking, coordinated deployment of new features across the entire data publishing community, better reuse of modules and an overall reduction in cost of development, deployment and operation. Our initial areas of exploration focus on two specific components which are common to most biodiversity portals: a registry of datasets and the indexing of occurrence data. Use of a common registry for organisations, collections, datasets and associated metadata will reduce the effort spent in curating content, while also improving consistency by removing the need for synchronisation. In addition, a revised data pipeline for the indexing of occurrence records that powers both GBIF.org and ALA is anticipated to accommodate features such as consistent flagging of data quality issues and standardised practice for citation and tracking citations. While these explorations initially target collaboration with Australia, we anticipate this may be of interest to other adopters of the Living Atlas platform in the future. We will give an update on progress to date, along with lessons learnt, and summarise a roadmap for the future.


2020 ◽  
Vol 8 ◽  
Author(s):  
Sonia Ferreira ◽  
Rui Andrade ◽  
Ana Gonçalves ◽  
Pedro Sousa ◽  
Joana Paupério ◽  
...  

The InBIO Barcoding Initiative (IBI) Diptera 01 dataset contains records of 203 specimens of Diptera. All specimens have been morphologically identified to species level, and belong to 154 species in total. The species represented in this dataset correspond to about 10% of continental Portugal dipteran species diversity. All specimens were collected north of the Tagus river in Portugal. Sampling took place from 2014 to 2018, and specimens are deposited in the IBI collection at CIBIO, Research Center in Biodiversity and Genetic Resources. This dataset contributes to the knowledge on the DNA barcodes and distribution of 154 species of Diptera from Portugal and is the first of the planned IBI database public releases, which will make available genetic and distribution data for a series of taxa. All specimens have their DNA barcodes made publicly available in the Barcode of Life Data System (BOLD) online database and the distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).


Author(s):  
Raul Sierra-Alcocer ◽  
Christopher Stephens ◽  
Juan Barrios ◽  
Constantino González‐Salazar ◽  
Juan Carlos Salazar Carrillo ◽  
...  

SPECIES (Stephens et al. 2019) is a tool to explore spatial correlations in biodiversity occurrence databases. The main idea behind the SPECIES project is that the geographical correlations between the distributions of taxa records carry useful information. The problem, however, is that if we have thousands of species (Mexico's National System of Biodiversity Information has records of around 70,000 species) then we have millions of potential associations, and exploring them is far from easy. Our goal with SPECIES is to facilitate the discovery and application of meaningful relations hiding in our data. The main variables in SPECIES are the geographical distributions of species occurrence records. Other types of variables, like the climatic variables from WorldClim (Hijmans et al. 2005), are explanatory data that serve for modeling. The system offers two modes of analysis. In the first, the user defines a target species and a selection of species and abiotic variables; the system then computes the spatial correlations between the target species and each of the other species and abiotic variables. The request from the user can be as small as comparing one species to another, or as large as comparing one species to all the species in the database. A user may wonder, for example, which species are usual neighbors of the jaguar; this mode could help answer that question. The second mode of analysis gives a network perspective. In it, the user defines two groups of taxa (and/or environmental variables), and the output is a correlation network where the weight of a link between two nodes represents the spatial correlation between the variables that the nodes represent. For example, one group of taxa could be hummingbirds (Trochilidae family) and the second flowers of the Lamiaceae family. This output would help the user analyze which pairs of hummingbird and flower are highly correlated in the database.
The SPECIES data architecture is optimized to support fast hypothesis prototyping and testing with the analysis of thousands of biotic and abiotic variables. It has a visualization web interface that presents descriptive results to the user at different levels of detail. The methodology in SPECIES is relatively simple: it partitions the geographical space with a regular grid and treats a species occurrence distribution as a present/not present boolean variable over the cells. Given two species (or one species and one abiotic variable), it measures whether the number of co-occurrences between the two is more (or less) than expected. If it is more than expected, this signals a positive relation, whereas if it is less, it is evidence of disjoint distributions. SPECIES provides an open web application programming interface (API) to request the computation of correlations and statistical dependencies between variables in the database. Users can create applications that consume this 'statistical web service' or use it directly to further analyze the results in frameworks like R or Python. The project includes an interactive web application that does exactly that: it requests analysis from the web service and lets the user experiment and visually explore the results. We believe this approach can be used, on one side, to augment the services provided from data repositories and, on the other side, to facilitate the creation of specialized applications that are clients of these services. This scheme supports big-data-driven research for a wide range of backgrounds because end users need neither the technical know-how nor the infrastructure to handle large databases. Currently, SPECIES hosts all records from Mexico's National Biodiversity Information System (CONABIO 2018) and a subset of Global Biodiversity Information Facility data that covers the contiguous USA (GBIF.org 2018b) and Colombia (GBIF.org 2018a).
It also includes discretizations of environmental variables from WorldClim, from the Environmental Rasters for Ecological Modeling project (Title and Bemmels 2018), from CliMond (Kriticos et al. 2012), and topographic variables (USGS EROS Center 1997b, USGS EROS Center 1997a). The long term plan, however, is to incrementally include more data, especially all data from the Global Biodiversity Information Facility. The code of the project is open source, and the repositories are available online (Front-end, Web Services Application Programming Interface, Database Building scripts). This presentation is a demonstration of SPECIES' functionality and its overall design.
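
The grid-based comparison described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the statistic SPECIES computes: the function names are hypothetical, the grid is a plain longitude/latitude lattice, and "expected" is taken to be the co-occurrence count under independence of the two presence variables.

```python
import math


def grid_cell(lon, lat, cell_size):
    """Map a coordinate to the index of its cell in a regular lon/lat grid."""
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def occupied_cells(records, cell_size):
    """Return the set of grid cells in which a species has at least one record."""
    return {grid_cell(lon, lat, cell_size) for lon, lat in records}


def cooccurrence(cells_a, cells_b, n_cells):
    """Compare observed co-occurrences with the count expected under independence.

    n_cells is the total number of cells in the study grid. The independence
    null model is an assumption of this sketch, not the SPECIES statistic.
    """
    observed = len(cells_a & cells_b)
    expected = len(cells_a) * len(cells_b) / n_cells
    return observed, expected
```

An observed count well above the expected one would point towards a positive spatial association, and a count well below it towards disjoint distributions; judging whether the difference is significant would need a proper statistical test on top of this comparison.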

