ELIXIR: Data for Life – Coordinating life science data and services across Europe

Author(s):  
Jerry Lanfear

ELIXIR unites Europe’s leading life science organisations in managing and safeguarding the increasing volume of data being generated by publicly funded research. It coordinates, integrates and sustains bioinformatics resources across its 22 member states, plus EMBL-EBI (European Molecular Biology Laboratory - European Bioinformatics Institute), and enables end users to access services and data that are vital for their research. ELIXIR's remit spans the full breadth of life science data, including data related to human health, food production (agriculture, farming, aquaculture) and the environment (e.g. pollution remediation, ecology), all of clear socio-economic benefit. As a result, ELIXIR contributes to the delivery of several sustainable development goals. This poster will introduce ELIXIR and describe the contribution it can make to coordinating data and services relevant to biodiversity. The poster will set the context for how molecularly-derived biodiversity occurrence data can significantly enhance resources such as the Global Biodiversity Information Facility (GBIF) and the Ocean Biogeographic Information System (OBIS), e.g. by filling in acute gaps in our knowledge of species across realms.

Author(s):  
Scott A Chamberlain ◽  
Carl Boettiger

Background. The number of individuals of each species in a given location forms the basis for many sub-fields of ecology and evolution. Data on individuals, including which species they belong to and where they are found, can be used to address a large number of research questions. The Global Biodiversity Information Facility (hereafter, GBIF) is the largest source of such data. Programmatic clients for GBIF would make research dealing with GBIF data much easier and more reproducible. Methods. We have developed clients to access GBIF data for each of the R, Python, and Ruby programming languages: rgbif, pygbif, gbifrb. Results. For all clients we describe their design and utility, and demonstrate some use cases. Discussion. Programmatic access to GBIF will facilitate more open and reproducible science; the three GBIF clients described herein are a significant contribution towards this goal.
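As a rough illustration of what such a client wraps, the sketch below builds a query against the public GBIF occurrence search endpoint using only the Python standard library. It is not the code of rgbif, pygbif, or gbifrb: the helper names `build_occurrence_url` and `fetch_occurrences` are hypothetical, and the choice of query parameters (`scientificName`, `country`, `hasCoordinate`, `limit`, `offset`) is an assumption about what a typical analysis needs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_API = "https://api.gbif.org/v1"


def build_occurrence_url(scientific_name, country=None, has_coordinate=True,
                         limit=20, offset=0):
    """Build a GBIF occurrence-search URL (hypothetical helper, not a library call)."""
    params = {"scientificName": scientific_name, "limit": limit, "offset": offset}
    if country:
        # Two-letter ISO country code, e.g. "US".
        params["country"] = country
    if has_coordinate:
        # Restrict the search to georeferenced records.
        params["hasCoordinate"] = "true"
    return f"{GBIF_API}/occurrence/search?{urlencode(params)}"


def fetch_occurrences(scientific_name, **kwargs):
    """Fetch one page of occurrence records as a list of dicts (needs network access)."""
    url = build_occurrence_url(scientific_name, **kwargs)
    with urlopen(url, timeout=30) as response:
        return json.load(response)["results"]


# Example call (performs a network request, so it is left commented out):
# records = fetch_occurrences("Puma concolor", country="US", limit=5)
```

Keeping URL construction separate from the network call means the query itself can be logged or stored alongside an analysis, which is one way a scripted workflow supports reproducibility.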


Author(s):  
Gerald Guala

Biodiversity Information Serving Our Nation (BISON - bison.usgs.gov) is the US Node application for the Global Biodiversity Information Facility (GBIF) and the most comprehensive source of species occurrence data for the United States of America. It currently contains more than 460 million records and provides significant augmentation and integration of US occurrence data in terrestrial, marine and freshwater systems. Publicly released in 2013, BISON has generated a large community of stakeholders, who have passed on many questions over the years through email ([email protected]), presentations and other means. In this presentation, some of the most common questions will be addressed in detail. For example: why not all BISON data are in GBIF; how BISON differs from GBIF; what the relationship is between BISON and other US providers to GBIF; and what the exact role of the Integrated Taxonomic Information System (ITIS - www.itis.gov) is in BISON.


Author(s):  
Stanley Blum ◽  
Katharine Barker ◽  
Steven J Baskauf ◽  
Walter G. Berendsohn ◽  
Pier Luigi Buttigieg ◽  
...  

For the last 15 years, Biodiversity Information Standards (TDWG) has recognized two competing standards for organism occurrence data, ABCD (Access to Biological Collections Data; Holetschek et al. 2012) and DarwinCore (Wieczorek et al. 2012). These two representations emerged from contrasting strategies for mobilizing information about organism occurrences (also commonly called species occurrence data). ABCD was capable of representing details of more kinds of information, but was necessarily more complicated. DarwinCore, on the other hand, was simpler but more limited in its ability to represent data of different kinds and formats. TDWG endorsed both standards because the different projects and communities that generated them remained dedicated to their different strategies and tool sets, and the Global Biodiversity Information Facility (GBIF) developed the ability to integrate data published in either standard. Since their inceptions, DarwinCore and ABCD have become more similar. DarwinCore has gotten more complicated through the addition of terms and has begun to assign terms to classes. ABCD is now expressed in RDF (Resource Description Framework), potentially enabling re-use of terms with alternative structures among classes. At the same time, methodologies for conceptual modeling and representing complex scientific data have continued to evolve. In particular, a suite of modeling and data representation methods related to linked data and the semantic web, i.e., RDF, SKOS (Simple Knowledge Organization System), and OWL (Web Ontology Language), promise to make it easier for us to reconcile shared concepts among different representations or schemas. A mapping between ABCD 2.1 and DarwinCore has existed since before 2005. ABCD 3.0 and DarwinCore are both now represented in RDF.
In addition, the BioCollections Ontology (BCO) covers many of the shared concepts and is derived from the Basic Formal Ontology (BFO), an upper level ontology that has oriented many other biomedical ontologies. Reconciling ABCD and DarwinCore through alignment with BCO (in the OBO Foundry; Smith et al. 2007) would better connect TDWG standards to other domains in biology. We appreciate that many working scientists and data managers perceive ontologies as overly complicated. To mitigate the steep learning curve associated with ontologies, we expect to create simpler application profiles or schemas to guide and serve narrower communities of practice within the wider biodiversity domain. We also plan to integrate the current work of the Taxonomic Names and Concepts Interest Group and thereby eliminate the redundancy between DarwinCore and Taxonomic Concepts Transfer Schema (TCS; Kennedy et al. 2006). At the time of this writing, we have only agreements from the authors (i.e., conveners of relevant TDWG Interest Groups and other key stakeholders) to collaborate in pursuit of these common goals. In this presentation we will give a more detailed description of our objectives and products, the methods we are using to achieve them, and our progress to date.


Author(s):  
Yi-Ming Gan ◽  
Maxime Sweetlove ◽  
Anton Van de Putte

The Antarctic Biodiversity portal (biodiversity.aq) is a gateway to a wide variety of Antarctic biodiversity information and tools. Launched in 2005 as the Scientific Committee on Antarctic Research (SCAR) - Marine Biodiversity Information Network (SCAR-MarBIN, scarmarbin.be) and the Register of Antarctic Marine Species (RAMS, marinespecies.org/rams/), the system has grown in scope from purely marine to include terrestrial information. Biodiversity.aq is a SCAR product, currently supported by Belspo (Belgian Science Policy) as one of the Belgian contributions to the European Lifewatch-European Research Infrastructure Consortium (Lifewatch-ERIC). The goal of Lifewatch is to provide access to: distributed observatories/sensor networks; interoperable databases, existing (data-)networks, using accepted standards; high performance computing (HPC) and grid power, including the use of the state-of-the-art of cloud and big data paradigm technologies; software and tools for visualization, analysis and modeling. Here we provide an overview of the most recent advances in the biodiversity.aq online ecosystem, a number of use cases as well as an overview of future directions. Some of the most notable components are: The Register of Antarctic Species (RAS, ras.biodiversity.aq) is a component of the Lifewatch Taxonomic Backbone and provides an authoritative and comprehensive list of names of marine and terrestrial species in Antarctica and the Southern Ocean. It serves as a reference guide for users to interpret taxonomic literature, as valid names and other names in use are both provided. Integrated Publishing Toolkit (IPT, ipt.biodiversity.aq) allows disseminating Antarctic biodiversity data into global initiatives such as the Ocean Biogeographic Information System (OBIS, obis.org) as Antarctic node of OBIS (Ant-OBIS, also formerly known as SCAR-MarBIN) and the Global Biodiversity Information Facility (GBIF, gbif.org) as Antarctic Biodiversity Information Facility (AntaBIF). 
Data that can be made available include metadata, species checklists, species occurrence data and, more recently, sampling event-based data. Data from these international portals can be accessed through data.biodiversity.aq. Through SCAR, Biodiversity.aq builds on an international network of experts who provide expert knowledge on taxonomy, species distribution, and ecology. It provides a strong and tested platform for sharing, integrating, discovering and analysing Antarctic biodiversity information originating from a variety of sources into a distributed system.


Author(s):  
Takeru Nakazato

Museomics regards museum-preserved specimens as rich resources for DNA studies, extracting and analyzing DNA from these specimens in conjunction with their biodiversity information. In the biodiversity field as well, DNA sequence data such as DNA barcodes have become essential evidence for species identification and phylogenetic analysis, alongside occurrence and morphological information. To accelerate biodiversity informatics, it is important to use both biodiversity data (occurrence and morphology) and bioinformatics sequence data. There are many databases in the biodiversity domain, such as GBIF (the Global Biodiversity Information Facility) for species occurrence records, EoL (the Encyclopedia of Life) as a knowledge base of all species, and BOLD (the Barcode of Life Data) for DNA barcoding data. In genomics, molecular data involving DNA and protein sequences have been captured by the DNA Data Bank of Japan (DDBJ), the European Bioinformatics Institute (EBI, UK), and the National Center for Biotechnology Information (NCBI, US) under the International Nucleotide Sequence Database Collaboration (INSDC) for more than 30 years. Recently, NCBI launched a new database called BioCollections, including 7,930 culture collections, museums, herbaria, and other natural history collections. In addition, biodiversity information such as specimen voucher IDs, BOLD IDs, and latitude/longitude can be submitted together with DNA sequences. To assess the current situation, I downloaded GenBank (Nucleotide) files (updated on 22 Feb 2019) from the NCBI FTP (file transfer protocol) site and extracted biodiversity features including specimen voucher IDs and BOLD IDs. For Insecta, of 3,389,495 total entries, 2,427,343 sequence entries have a specimen voucher ID and 1,766,142 have a BOLD ID. The most abundant species with voucher IDs is “Cecidomyiidae sp. BOLD-2016” (Diptera) (35,861 sequence entries). The most frequently referenced voucher ID is “USNMENT00921257” (1,510 sequence entries), which denotes Stenamma megamanni (Hymenoptera, Formicidae, Myrmicinae). For flowering plants (Magnoliophyta), of 3,094,140 total entries, 1,109,420 sequence entries are assigned voucher IDs and 73,409 are assigned BOLD IDs. Additionally, 79,891 matK entries and 63,821 rbcL entries were submitted with voucher IDs but without BOLD IDs. I also retrieved BOLD data for Insecta and flowering plants. For Insecta, 2,368,801 GenBank entries are referenced from 4,176,481 total BOLD entries, and for flowering plants, 259,245 GenBank entries are referenced from 345,706 BOLD entries. Some DNA barcoding data exist redundantly in the BOLD database because BOLD imports sequences from NCBI that were submitted as DNA barcoding data in BOLD. These entries have different BOLD IDs but are assigned the same BIN_URL. Recently, high-throughput sequencing technology, also called next-generation sequencing (NGS), has made a great impact on genomic science. Biodiversity researchers have begun to perform not only DNA barcoding but also RNA-Seq with NGS. NGS also accelerates museomics. NGS data are archived in the Sequence Read Archive (SRA) database, and sample information is described in the BioSample database in INSDC. To use NGS data in the biodiversity field, we will need to integrate these databases with other biodiversity databases. At the Database Center for Life Science, we work on integrating life science data with Semantic Web technology. We have held annual meetings, called BioHackathons, to integrate life science data, in which researchers from all over the world have participated. We have begun to RDFize BioSample data, but we should also import existing schemas used in the biodiversity field, including Darwin Core.
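The extraction step described above can be sketched roughly as follows. This is not the author's script: it assumes GenBank flat-file text in which records are terminated by a `//` line and the relevant qualifiers appear as `/specimen_voucher="..."` and `/db_xref="BOLD:..."`, and it assumes each qualifier value sits on a single line. The function names `iter_records` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Qualifier patterns; both are assumed to fit on one line of the flat file.
VOUCHER_RE = re.compile(r'/specimen_voucher="([^"]+)"')
BOLD_RE = re.compile(r'/db_xref="BOLD:([^"]+)"')


def iter_records(lines):
    """Yield each GenBank record as one string; a '//' line ends a record."""
    buffer = []
    for line in lines:
        if line.strip() == "//":
            if buffer:
                yield "".join(buffer)
            buffer = []
        else:
            buffer.append(line)
    # Tolerate a final record that lacks the '//' terminator.
    if any(part.strip() for part in buffer):
        yield "".join(buffer)


def summarize(lines):
    """Count entries with voucher IDs and BOLD IDs, and tally each voucher ID."""
    totals = Counter()
    vouchers = Counter()
    for record in iter_records(lines):
        totals["entries"] += 1
        found = VOUCHER_RE.findall(record)
        if found:
            totals["with_voucher"] += 1
            # Count a voucher ID once per sequence entry.
            vouchers.update(set(found))
        if BOLD_RE.search(record):
            totals["with_bold"] += 1
    return totals, vouchers


# Usage with a downloaded flat file (the path is a placeholder):
# with open("gbinv1.seq", encoding="ascii", errors="replace") as handle:
#     totals, vouchers = summarize(handle)
#     print(totals, vouchers.most_common(1))
```

Streaming the file line by line keeps memory use flat, which matters for GenBank division files that run to gigabytes; a dedicated parser would be the safer choice if qualifier values wrap across lines.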


Author(s):  
Diana Bowler ◽  
Nick Isaac ◽  
Aletta Bonn

Large amounts of species occurrence data are compiled by platforms such as the Global Biodiversity Information Facility (GBIF) but these data are collected by a diversity of methods and people. Statistical tools, such as occupancy-detection models, have been developed and tested as a way to analyze these heterogeneous data and extract information on species’ population trends. However, these models make many assumptions that might not always be met. More detailed metadata associated with occurrence records would help better describe the observation/detection submodel within occupancy models and improve the accuracy/precision of species’ trend estimates. Here, we present examples of occupancy-detection models applied to citizen science datasets, including dragonfly data in Germany, and typical approaches to account for variation in sampling effort and species detectability, including visit covariates, such as list length. Using results from a recent questionnaire in Germany asking citizen scientists about why and how they collect species occurrence data, we also characterize the different approaches that citizen scientists take to sample and report species observations. We use our findings to highlight examples of key metadata that are often missing (e.g., length of time spent searching, complete checklist or not) in data sharing platforms but would greatly aid modelling attempts of heterogeneous species occurrence data.
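To make the role of a visit covariate concrete, the sketch below writes out the standard single-season occupancy-detection likelihood for one site, with per-visit detection probability modelled as a logistic function of list length. It is not the authors' model: the log transform of list length and the parameter names (`psi`, `intercept`, `slope`) are assumptions made for illustration.

```python
import math


def logistic(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def detection_prob(list_length, intercept, slope):
    """Per-visit detection probability as a function of log list length.

    list_length is the number of species recorded on the visit (>= 1) and
    stands in for sampling effort.
    """
    return logistic(intercept + slope * math.log(list_length))


def site_log_likelihood(detections, list_lengths, psi, intercept, slope):
    """Log-likelihood of one site's detection history for the focal species.

    detections:   0/1 per visit (was the focal species reported?)
    list_lengths: list length of each visit
    psi:          probability that the site is occupied
    """
    # Probability of the observed history given that the site is occupied.
    p_history = 1.0
    for y, n in zip(detections, list_lengths):
        p = detection_prob(n, intercept, slope)
        p_history *= p if y else (1.0 - p)
    likelihood = psi * p_history
    # A history with no detections can also arise from an unoccupied site.
    if not any(detections):
        likelihood += 1.0 - psi
    return math.log(likelihood)
```

Summing this quantity over sites gives the function that is maximised, or sampled in a Bayesian fit. Metadata such as search duration or a complete-checklist flag would enter the same way as list length does here, as additional terms inside `detection_prob`.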


Author(s):  
Michael Trizna ◽  
Torsten Dikow

Taxonomic revisions contain crucial biodiversity data in the material examined sections for each species. In entomology, material examined lists minimally include the collecting locality, date of collection, and the number of specimens of each collection event. Insect species might be represented in taxonomic revisions by only a single specimen or by hundreds to thousands of specimens. Furthermore, revisions of insect genera might treat small genera with few species or include tens to hundreds of species. Summarizing data from such large and complex material examined lists and revisions is cumbersome, time-consuming, and prone to errors. However, providing data on the seasonal incidence, abundance, and collecting period of species is an important way to mobilize primary biodiversity data to understand a species' occurrence or rarity. Here, we present SpOccSum (Species Occurrence Summary), a tool to easily obtain metrics of seasonal incidence from specimen occurrence data in taxonomic revisions. SpOccSum is written in Python (Python Software Foundation 2019) and accessible through the Anaconda Python/R Data Science Platform as a Jupyter Notebook (Kluyver et al. 2016). The tool takes a simple list of specimen data containing species name, locality, date of collection (preferably separated by day, month, and year), and number of specimens in CSV format and generates a series of tables and graphs summarizing: number of specimens per species, number of specimens collected per month, number of unique collection events, as well as the earliest and most recent collecting year of each species. The results can be exported as graphics or as CSV-formatted tables and can easily be included in manuscripts for publication.
An example of an early version of the summary produced by SpOccSum can be viewed in Tables 1 and 2 of Markee and Dikow (2018). To accommodate seasonality in the Northern and Southern Hemispheres, users can choose to start the data display with either January or July. When geographic coordinates are available and species have widespread distributions spanning, for example, the equator, the user can itemize particular regions such as North of Tropic of Cancer (23.5˚N), Tropic of Cancer to the Equator, Equator to Tropic of Capricorn, and South of Tropic of Capricorn (23.5˚S). Other features currently in development include the ability to produce distribution maps from the provided data (when geographic coordinates are included) and the option to export specimen occurrence data as a Darwin Core Archive ready for upload to the Global Biodiversity Information Facility (GBIF).
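The kind of summary described above can be sketched in a few lines of standard-library Python. This is not SpOccSum's code: the CSV column names (`species`, `locality`, `day`, `month`, `year`, `count`) and the function names are assumptions for illustration, and a collection event is taken here to be a unique combination of locality and date.

```python
import csv
from collections import defaultdict


def month_order(start_month=1):
    """Months 1..12 rotated to begin at start_month (1 = January, 7 = July)."""
    return [(start_month - 1 + i) % 12 + 1 for i in range(12)]


def summarize_occurrences(csv_file, start_month=1):
    """Summarize specimen records per species from an open CSV file."""
    raw = defaultdict(lambda: {"specimens": 0, "per_month": defaultdict(int),
                               "events": set(), "years": set()})
    for row in csv.DictReader(csv_file):
        entry = raw[row["species"]]
        count = int(row["count"])
        month, year = int(row["month"]), int(row["year"])
        entry["specimens"] += count
        entry["per_month"][month] += count
        # One collection event = one locality on one date (an assumption).
        entry["events"].add((row["locality"], row["day"], month, year))
        entry["years"].add(year)

    months = month_order(start_month)
    return {
        species: {
            "specimens": entry["specimens"],
            "per_month": [(m, entry["per_month"].get(m, 0)) for m in months],
            "collection_events": len(entry["events"]),
            "earliest_year": min(entry["years"]),
            "latest_year": max(entry["years"]),
        }
        for species, entry in raw.items()
    }


# Usage (the file name is a placeholder):
# with open("material_examined.csv", newline="", encoding="utf-8") as handle:
#     summary = summarize_occurrences(handle, start_month=7)
```

Passing `start_month=7` rotates the monthly table so that a Southern Hemisphere summer is not split across the two ends of the display.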


2019 ◽  
Vol 7 ◽  
Author(s):  
Vaughn Shirey ◽  
Sini Seppälä ◽  
Vasco Branco ◽  
Pedro Cardoso

Conservation assessments of hyperdiverse groups of organisms are often challenging and limited by the availability of occurrence data needed to calculate assessment metrics such as extent of occurrence (EOO). Spiders represent one such diverse group and have historically been assessed using primary literature with retrospective georeferencing. Here we demonstrate the differences in estimations of EOO and hypothetical IUCN Red List classifications for two extensive spider datasets comprising 479 species in total. EOO was estimated and compared using literature-based assessments, Global Biodiversity Information Facility (GBIF)-based assessments and combined data assessments. We found that although the addition of GBIF data led to few changes in hypothetical IUCN Red List classifications, some species (3.3%) that previously could not be classified could now be assessed. In addition, the hypothetical classification changed for others (1.5%). On the other hand, GBIF data alone did not provide enough data for 88.7% of species. These results demonstrate the potential of GBIF data to serve as an additional source of information for conservation assessments, complementing literature data, although for spiders they are not particularly useful on their own as things currently stand.
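EOO is conventionally measured as the area of the minimum convex polygon enclosing all known occurrences, so the comparison above amounts to recomputing a convex hull as points are added. The sketch below is not the authors' workflow: it assumes coordinates already projected to an equal-area grid in kilometres, treats fewer than three non-collinear points as not assessable, and applies only the EOO thresholds of IUCN criterion B1 (100, 5,000 and 20,000 km²), ignoring the additional subcriteria a real assessment requires. All function names are hypothetical.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def eoo_km2(points_km):
    """Area of the minimum convex polygon, or None if it cannot be formed."""
    hull = convex_hull(points_km)
    if len(hull) < 3:
        return None
    # Shoelace formula over consecutive hull vertices.
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))
    return abs(twice_area) / 2.0


def b1_category(eoo):
    """Hypothetical category from the criterion B1 EOO thresholds alone."""
    if eoo is None:
        return "not assessable"
    if eoo < 100:
        return "CR"
    if eoo < 5000:
        return "EN"
    if eoo < 20000:
        return "VU"
    return "above B1 thresholds"


# Usage: compare literature-only records with literature plus GBIF records.
# literature = [(0, 0), (10, 0), (0, 10)]
# combined = literature + [(10, 10)]
# print(b1_category(eoo_km2(literature)), b1_category(eoo_km2(combined)))
```

Because the hull can only grow as points are added, combining sources never lowers the EOO estimate, which is why additional records can move a species into a less threatened hypothetical category or make a previously unassessable species assessable.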

