Integrating ABCD and DarwinCore: Toward a better foundation for biodiversity information standards

For the last 15 years, Biodiversity Information Standards (TDWG) has recognized two competing standards for organism occurrence data, ABCD (Access to Biological Collections Data; Holetschek et al. 2012) and DarwinCore (Wieczorek et al. 2012). These two representations emerged from contrasting strategies for mobilizing information about organism occurrences (also commonly called species occurrence data). ABCD was capable of representing details of more kinds of information, but was necessarily more complicated. DarwinCore, on the other hand, was simpler but more limited in its ability to represent data of different kinds and formats. TDWG endorsed both standards because the different projects and communities that generated them remained dedicated to their different strategies and tool sets, and the Global Biodiversity Information Facility (GBIF) developed the ability to integrate data published in either standard. Since their inceptions, DarwinCore and ABCD have become more similar. DarwinCore has grown more complicated through the addition of terms and has begun to assign terms to classes. ABCD is now expressed in RDF (Resource Description Framework), potentially enabling re-use of terms with alternative structures among classes. At the same time, methodologies for conceptual modeling and representing complex scientific data have continued to evolve. In particular, a suite of modeling and data representation methods related to linked data and the semantic web, i.e., RDF, SKOS (Simple Knowledge Organization System), and OWL (Web Ontology Language), promise to make it easier for us to reconcile shared concepts among different representations or schemas. A mapping between ABCD 2.1 and DarwinCore has existed since before 2005. ABCD 3.0 and DarwinCore are both now represented in RDF. In addition, the BioCollections Ontology (BCO) covers many of the shared concepts and is derived from the Basic Formal Ontology (BFO), an upper-level ontology that has oriented many other biomedical ontologies. Reconciling ABCD and DarwinCore through alignment with BCO (in the OBO Foundry; Smith et al. 2007) would better connect TDWG standards to other domains in biology. We appreciate that many working scientists and data managers perceive ontologies as overly complicated. To mitigate the steep learning curve associated with ontologies, we expect to create simpler application profiles or schemas to guide and serve narrower communities of practice within the wider biodiversity domain. We also plan to integrate the current work of the Taxonomic Names and Concepts Interest Group and thereby eliminate the redundancy between DarwinCore and Taxonomic Concepts Transfer Schema (TCS; Kennedy et al. 2006). At the time of this writing, we have only agreements from the authors (i.e., conveners of relevant TDWG Interest Groups and other key stakeholders) to collaborate in pursuit of these common goals. In this presentation we will give a more detailed description of our objectives and products, the methods we are using to achieve them, and our progress to date.

R Python, and Ruby clients for GBIF species occurrence data

10.7287/peerj.preprints.3304v1 ◽

2017 ◽

Cited By ~ 5

Author(s):

Scott A Chamberlain ◽

Carl Boettiger

Keyword(s):

Programming Languages ◽

Significant Contribution ◽

Species Occurrence ◽

Global Biodiversity Information Facility ◽

Occurrence Data ◽

Research Questions ◽

Global Biodiversity ◽

Number Of Individuals ◽

Biodiversity Information ◽

Programmatic Access

Background. The number of individuals of each species in a given location forms the basis for many sub-fields of ecology and evolution. Data on individuals, including which species they belong to and where they are found, can be used for a large number of research questions. The Global Biodiversity Information Facility (hereafter, GBIF) is the largest source of such data. Programmatic clients for GBIF would make research dealing with GBIF data much easier and more reproducible. Methods. We have developed clients to access GBIF data for each of the R, Python, and Ruby programming languages: rgbif, pygbif, gbifrb. Results. For all clients we describe their design and utility, and demonstrate some use cases. Discussion. Programmatic access to GBIF will facilitate more open and reproducible science - the three GBIF clients described herein are a significant contribution towards this goal.
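
As a rough illustration of what programmatic access can look like, the sketch below builds a query URL for the public GBIF occurrence search endpoint using only the Python standard library. It does not use or reproduce the interface of rgbif, pygbif, or gbifrb; the helper name occurrence_search_url is hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Base URL of the public GBIF REST API.
GBIF_API = "https://api.gbif.org/v1"


def occurrence_search_url(scientific_name, country=None, limit=20):
    """Build a GBIF occurrence-search URL (hypothetical helper; sends no request)."""
    params = {"scientificName": scientific_name, "limit": limit}
    if country is not None:
        # GBIF expects an ISO 3166-1 alpha-2 country code here.
        params["country"] = country
    return f"{GBIF_API}/occurrence/search?{urlencode(params)}"


url = occurrence_search_url("Puma concolor", country="US", limit=5)
print(url)
```

Keeping the query in code rather than in a manual web search is what makes such an analysis repeatable: the same script can be rerun later or shared alongside a manuscript.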

What You Probably Didn't Know about Biodiversity Information Serving Our Nation (BISON)

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37476 ◽

2019 ◽

Vol 3 ◽

Author(s):

Gerald Guala

Keyword(s):

The United States ◽

Species Occurrence ◽

Global Biodiversity Information Facility ◽

Exact Role ◽

Occurrence Data ◽

The Us ◽

Significant Augmentation ◽

The Relationship ◽

Biodiversity Information

Biodiversity Information Serving Our Nation (BISON - bison.usgs.gov) is the US Node application for the Global Biodiversity Information Facility (GBIF) and the most comprehensive source of species occurrence data for the United States of America. It currently contains more than 460 million records and provides significant augmentation and integration of US occurrence data in terrestrial, marine and freshwater systems. Publicly released in 2013, BISON has generated a large community of stakeholders, who have passed on many questions over the years through email, presentations and other means. In this presentation, some of the most common questions will be addressed in detail. For example: why not all BISON data are in GBIF; how BISON differs from GBIF; what the relationship is between BISON and other US providers to GBIF; and what the exact role of the Integrated Taxonomic Information System (ITIS - www.itis.gov) is in BISON.

ITIS and the Global Taxonomic Backbone

Biodiversity Information Science and Standards ◽

10.3897/biss.5.75471 ◽

2021 ◽

Vol 5 ◽

Author(s):

David Mitchell ◽

Thomas Orrell

Keyword(s):

Scientific Data ◽

Biological Data ◽

Global Biodiversity Information Facility ◽

Global Database ◽

The Public ◽

Taxonomic Information ◽

The World ◽

Biological Data Management ◽

Biodiversity Information ◽

Scientific Name

The Integrated Taxonomic Information System (ITIS) provides a regularly updated, global database that currently contains over 868,000 scientific names and their hierarchy. The program exists to communicate a comprehensive taxonomy of global species across 7 kingdoms that enables biodiversity information to be discovered, indexed, and connected across all human endeavors. ITIS partners with taxonomists and experts across the world to assemble scientific names and their taxonomic relationships, and then distributes that data through publicly available software. A single taxon may be represented by multiple scientific names, so ITIS makes it a priority to provide synonymy. Linking valid or accepted names with their subjective and objective synonyms is a key component of name translation and increases the precision of searches and organization of information. ITIS and its partner Species2000 create the Catalogue of Life (CoL) checklist that provides quality scientific name data for over 2.2M species. The CoL is the Global Biodiversity Information Facility (GBIF) taxonomic backbone. Providing automated open access to complete, current, literature-referenced, and expert-validated taxonomic information enables biological data management systems, and is elemental to enhancing the utility of the amassed scientific data across the world. Fully leveraging this information for the public good is crucial for empowering the global digital society to confront the most pressing social and environmental challenges.

The Antarctic Biodiversity Portal, an Online Ecosystem for Linking, Integrating and Disseminating Antarctic Biodiversity Information

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37501 ◽

2019 ◽

Vol 3 ◽

Author(s):

Yi-Ming Gan ◽

Maxime Sweetlove ◽

Anton Van de Putte

Keyword(s):

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Antarctic Species ◽

Reference Guide ◽

Occurrence Data ◽

Event Based ◽

The Antarctic ◽

Global Initiatives ◽

Sampling Event ◽

Biodiversity Information

The Antarctic Biodiversity portal (biodiversity.aq) is a gateway to a wide variety of Antarctic biodiversity information and tools. Launched in 2005 as the Scientific Committee on Antarctic Research (SCAR) - Marine Biodiversity Information Network (SCAR-MarBIN, scarmarbin.be) and the Register of Antarctic Marine Species (RAMS, marinespecies.org/rams/), the system has grown in scope from purely marine to include terrestrial information. Biodiversity.aq is a SCAR product, currently supported by Belspo (Belgian Science Policy) as one of the Belgian contributions to the European Lifewatch-European Research Infrastructure Consortium (Lifewatch-ERIC). The goal of Lifewatch is to provide access to: distributed observatories/sensor networks; interoperable databases, existing (data-)networks, using accepted standards; high performance computing (HPC) and grid power, including the use of the state-of-the-art of cloud and big data paradigm technologies; software and tools for visualization, analysis and modeling. Here we provide an overview of the most recent advances in the biodiversity.aq online ecosystem, a number of use cases as well as an overview of future directions. Some of the most notable components are: The Register of Antarctic Species (RAS, ras.biodiversity.aq) is a component of the Lifewatch Taxonomic Backbone and provides an authoritative and comprehensive list of names of marine and terrestrial species in Antarctica and the Southern Ocean. It serves as a reference guide for users to interpret taxonomic literature, as valid names and other names in use are both provided. Integrated Publishing Toolkit (IPT, ipt.biodiversity.aq) allows disseminating Antarctic biodiversity data into global initiatives such as the Ocean Biogeographic Information System (OBIS, obis.org) as Antarctic node of OBIS (Ant-OBIS, also formerly known as SCAR-MarBIN) and the Global Biodiversity Information Facility (GBIF, gbif.org) as Antarctic Biodiversity Information Facility (AntaBIF). Data that can be made available include metadata, species checklists, species occurrence data and more recently, sampling event-based data. Data from these international portals can be accessed through data.biodiversity.aq. Through SCAR, Biodiversity.aq builds on an international network of experts who provide expert knowledge on taxonomy, species distribution, and ecology. It provides a strong and tested platform for sharing, integrating, discovering and analysing Antarctic biodiversity information originating from a variety of sources into a distributed system.

ELIXIR: Data for Life – Coordinating life science data and services across Europe

Biodiversity Information Science and Standards ◽

10.3897/biss.3.38430 ◽

2019 ◽

Vol 3 ◽

Author(s):

Jerry Lanfear

Keyword(s):

Life Science ◽

Publicly Funded ◽

Health Food ◽

Science Data ◽

Global Biodiversity Information Facility ◽

Occurrence Data ◽

Development Goals ◽

Biology Laboratory ◽

Access Services ◽

Biodiversity Information

ELIXIR unites Europe’s leading life science organisations in managing and safeguarding the increasing volume of data being generated by publicly funded research. It coordinates, integrates and sustains bioinformatics resources across its 22 member states, plus EMBL-EBI (European Molecular Biology Laboratory - European Bioinformatics Institute), and enables end users to access services and data that are vital for their research. ELIXIR's remit spans the full breadth of life science data, including data related to human health, food production (agriculture, farming, aquaculture) and the environment (e.g. pollution remediation, ecology), all of clear socio-economic benefit. As a result, ELIXIR contributes to the delivery of several sustainable development goals. This poster will introduce ELIXIR and describe the contribution it can make to coordinating data and services relevant to biodiversity. The poster will set the context for how molecularly-derived biodiversity occurrence data can significantly enhance resources such as the Global Biodiversity Information Facility (GBIF) and the Ocean Biogeographic Information System (OBIS), e.g. by filling in acute gaps in our knowledge of species across realms.

R Python, and Ruby clients for GBIF species occurrence data

10.7287/peerj.preprints.3304 ◽

2017 ◽

Cited By ~ 4

Author(s):

Scott A Chamberlain ◽

Carl Boettiger

Keyword(s):

Programming Languages ◽

Significant Contribution ◽

Species Occurrence ◽

Global Biodiversity Information Facility ◽

Occurrence Data ◽

Research Questions ◽

Global Biodiversity ◽

Number Of Individuals ◽

Biodiversity Information ◽

Programmatic Access

Introduction to the Symposium: Improving access to hidden scientific data in the Biodiversity Heritage Library

Biodiversity Information Science and Standards ◽

10.3897/biss.3.35620 ◽

2019 ◽

Vol 3 ◽

Author(s):

Constance Rinaldo

Keyword(s):

Character Recognition ◽

Optical Character Recognition ◽

Scientific Data ◽

Text Search ◽

Global Biodiversity Information Facility ◽

The World ◽

Species Specific ◽

Program Interface ◽

Biodiversity Information ◽

Scientific Name

This will be a short introduction to the symposium: Improving access to hidden scientific data in the Biodiversity Heritage Library. The symposium will present examples of how the Biodiversity Heritage Library (BHL) collaborates across the international consortium and with community partners around the world to help enhance access to the biodiversity literature. Literature repositories, particularly the BHL collections, have been recognized as critical to the global scientific community. A diverse global user community propels BHL and BHL users to develop access tools beyond the standard “title, author, subject” search. BHL utilizes the Global Names Recognition and Discovery (GNRD) service to identify taxonomic names within text rendered by Optical Character Recognition (OCR) software, enabling scientific name searches and creation of species-specific bibliographies, critical to systematics research. In this symposium, we will hear from international partners and creative users making data from the BHL globally accessible for the kinds of larger-scale analysis enabled by BHL’s full-text search capabilities and Application Program Interface (API) protocols. In addition to taxonomic name services already incorporated in BHL, the consortium has also begun exploring georeferencing strategies for better searching and potential connections with key biodiversity resources such as the Global Biodiversity Information Facility (GBIF). With many different institutions around the world participating, the ability to work together virtually is critical for a seamless end product that meets the demands of the international community as well as the needs of local institutions.

The Antarctic Biodiversity Portal, an Online Ecosystem for Linking, Integrating and Disseminating Antarctic Biodiversity Information

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37182 ◽

2019 ◽

Vol 3 ◽

Author(s):

Yi-Ming Gan ◽

Maxime Sweetlove ◽

Anton Van de Putte

Keyword(s):

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Antarctic Species ◽

Reference Guide ◽

Occurrence Data ◽

Event Based ◽

The Antarctic ◽

Global Initiatives ◽

Sampling Event ◽

Biodiversity Information

The Antarctic Biodiversity portal (biodiversity.aq) is a gateway to a wide variety of Antarctic biodiversity information and tools. Launched in 2005 as the Scientific Committee on Antarctic Research (SCAR) - Marine Biodiversity Information Network (SCAR-MarBIN, scarmarbin.be) and the Register of Antarctic Marine Species (RAMS, marinespecies.org/rams/), the system has grown in scope from purely marine to include terrestrial information. Biodiversity.aq is a SCAR product, currently supported by Belspo (Belgian Science Policy) as one of the Belgian contributions to the European Lifewatch-European Research Infrastructure Consortium (Lifewatch-ERIC). The goal of Lifewatch is to provide access to: distributed observatories/sensor networks; interoperable databases, existing (data-)networks, using accepted standards; high performance computing (HPC) and grid power, including the use of the state-of-the-art of cloud and big data paradigm technologies; software and tools for visualization, analysis and modeling. Here we provide an overview of the most recent advances in the biodiversity.aq online ecosystem, a number of use cases as well as an overview of future directions. Some of the most notable components are: The Register of Antarctic Species (RAS, ras.biodiversity.aq) is a component of the Lifewatch Taxonomic Backbone and provides an authoritative and comprehensive list of names of marine and terrestrial species in Antarctica and the Southern Ocean. It serves as a reference guide for users to interpret taxonomic literature, as valid names and other names in use are both provided. Integrated Publishing Toolkit (IPT, ipt.biodiversity.aq) allows disseminating Antarctic biodiversity data into global initiatives such as the Ocean Biogeographic Information System (OBIS, obis.org) as Antarctic node of OBIS (Ant-OBIS, also formerly known as SCAR-MarBIN) and the Global Biodiversity Information Facility (GBIF, gbif.org) as Antarctic Biodiversity Information Facility (AntaBIF). Data that can be made available include metadata, species checklists, species occurrence data and more recently, sampling event-based data. Data from these international portals can be accessed through data.biodiversity.aq. Through SCAR, Biodiversity.aq builds on an international network of experts who provide expert knowledge on taxonomy, species distribution, and ecology. It provides a strong and tested platform for sharing, integrating, discovering and analysing Antarctic biodiversity information originating from a variety of sources into a distributed system.

SpOccSum: An easy-to-use Python tool to summarize species occurrence data from material examined lists in taxonomic revisions

Biodiversity Information Science and Standards ◽

10.3897/biss.3.36513 ◽

2019 ◽

Vol 3 ◽

Author(s):

Michael Trizna ◽

Torsten Dikow

Keyword(s):

Data Science ◽

Biodiversity Data ◽

Species Occurrence ◽

Global Biodiversity Information Facility ◽

Seasonal Incidence ◽

Distribution Maps ◽

Occurrence Data ◽

Darwin Core ◽

Biodiversity Information ◽

Northern And Southern Hemispheres

Taxonomic revisions contain crucial biodiversity data in the material examined sections for each species. In entomology, material examined lists minimally include the collecting locality, date of collection, and the number of specimens of each collection event. Insect species might be represented in taxonomic revisions by only a single specimen or hundreds to thousands of specimens. Furthermore, revisions of insect genera might treat small genera with few species or include tens to hundreds of species. Summarizing data from such large and complex material examined lists and revisions is cumbersome, time-consuming, and prone to errors. However, providing data on the seasonal incidence, abundance, and collecting period of species is an important way to mobilize primary biodiversity data to understand a species’s occurrence or rarity. Here, we present SpOccSum (Species Occurrence Summary), a tool to easily obtain metrics of seasonal incidence from specimen occurrence data in taxonomic revisions. SpOccSum is written in Python (Python Software Foundation 2019) and accessible through the Anaconda Python/R Data Science Platform as a Jupyter Notebook (Kluyver et al. 2016). The tool takes a simple list of specimen data containing species name, locality, date of collection (preferably separated by day, month, and year), and number of specimens in CSV format and generates a series of tables and graphs summarizing: number of specimens per species, number of specimens collected per month, number of unique collection events, as well as the earliest and most recent collecting year of each species. The results can be exported as graphics or as CSV-formatted tables and can easily be included in manuscripts for publication. An example of an early version of the summary produced by SpOccSum can be viewed in Tables 1 and 2 of Markee and Dikow (2018). To accommodate seasonality in the Northern and Southern Hemispheres, users can choose to start the data display with either January or July. When geographic coordinates are available and species have widespread distributions spanning, for example, the equator, the user can itemize particular regions such as North of Tropic of Cancer (23.5˚N), Tropic of Cancer to the Equator, Equator to Tropic of Capricorn, and South of Tropic of Capricorn (23.5˚S). Other features currently in development include the ability to produce distribution maps from the provided data (when geographic coordinates are included) and the option to export specimen occurrence data as a Darwin Core Archive ready for upload to the Global Biodiversity Information Facility (GBIF).
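
The kind of summary described above can be sketched in a few lines of standard-library Python. This is not SpOccSum's code: the column names (species, locality, day, month, year, count), the function name summarize, and the sample rows are all made-up placeholders chosen for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical input; column names and values are illustrative placeholders.
SAMPLE_CSV = """species,locality,day,month,year,count
Species A,Site 1,12,1,1998,3
Species A,Site 1,12,1,1998,2
Species A,Site 2,5,7,2004,1
Species B,Site 3,20,7,2010,4
"""


def summarize(csv_text):
    """Return per-species totals, monthly counts, event counts and year range."""
    per_species = Counter()
    per_month = defaultdict(Counter)
    events = defaultdict(set)
    years = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["species"]
        n = int(row["count"])
        per_species[name] += n
        per_month[name][int(row["month"])] += n
        # A collection event is taken here to be a unique locality + date.
        events[name].add((row["locality"], row["day"], row["month"], row["year"]))
        years[name].append(int(row["year"]))
    return {
        name: {
            "specimens": per_species[name],
            "by_month": dict(per_month[name]),
            "events": len(events[name]),
            "first_year": min(years[name]),
            "last_year": max(years[name]),
        }
        for name in per_species
    }


summary = summarize(SAMPLE_CSV)
for name, stats in summary.items():
    print(name, stats)
```

Treating a unique combination of locality and date as one collection event is an assumption of this sketch; a real material examined list may need a stricter or looser definition.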

Current GBIF occurrence data demonstrates both promise and limitations for potential red listing of spiders

Biodiversity Data Journal ◽

10.3897/bdj.7.e47369 ◽

2019 ◽

Vol 7 ◽

Cited By ~ 3

Author(s):

Vaughn Shirey ◽

Sini Seppälä ◽

Vasco Branco ◽

Pedro Cardoso

Keyword(s):

Iucn Red List ◽

Red List ◽

Global Biodiversity Information Facility ◽

Occurrence Data ◽

Primary Literature ◽

Assessment Metrics ◽

Conservation Assessments ◽

Source Of Information ◽

Biodiversity Information ◽

Combined Data

Conservation assessments of hyperdiverse groups of organisms are often challenging and limited by the availability of occurrence data needed to calculate assessment metrics such as extent of occurrence (EOO). Spiders represent one such diverse group and have historically been assessed using primary literature with retrospective georeferencing. Here we demonstrate the differences in estimations of EOO and hypothetical IUCN Red List classifications for two extensive spider datasets comprising 479 species in total. EOO values were estimated and compared using literature-based assessments, Global Biodiversity Information Facility (GBIF)-based assessments and combined data assessments. We found that although few changes to hypothetical IUCN Red List classifications occurred with the addition of GBIF data, some species (3.3%) which could previously not be classified could now be assessed with the addition of GBIF data. In addition, the hypothetical classification changed for others (1.5%). On the other hand, GBIF data alone did not provide enough data for 88.7% of species. These results demonstrate the potential of GBIF data to serve as an additional source of information for conservation assessments, complementing literature data, although for spiders these data are not particularly useful on their own as they currently stand.
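
For readers unfamiliar with the metric, EOO is commonly measured as the area of the minimum convex polygon enclosing all known occurrences. The sketch below shows that calculation under a strong simplifying assumption: the coordinates are already projected onto a planar grid in kilometres. The abstract does not say how the authors computed EOO, and the function names and sample points are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain repeats the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def eoo_km2(points_km):
    """EOO as minimum convex polygon area; assumes planar coordinates in km."""
    return polygon_area(convex_hull(points_km))


# Hypothetical occurrence records on a projected grid (km).
records = [(0, 0), (10, 0), (10, 10), (0, 10), (5, 5)]
print(eoo_km2(records))
```

The sketch also shows why sparse records limit assessments: with fewer than three distinct points the polygon has no area, so no EOO can be estimated.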
