An audit of some processing effects in aggregated occurrence records

ZooKeys ◽

10.3897/zookeys.751.24791 ◽

2018 ◽

Vol 751 ◽

pp. 129-146 ◽

Cited By ~ 7

Author(s):

Robert Mesibov

Keyword(s):

Data Loss ◽

Global Biodiversity Information Facility ◽

Australian Museum ◽

Darwin Core ◽

Species Groups ◽

Processing Effects ◽

Global Biodiversity ◽

Name Changes ◽

Biodiversity Information ◽

Occurrence Records

A total of ca 800,000 occurrence records from the Australian Museum (AM), Museums Victoria (MV) and the New Zealand Arthropod Collection (NZAC) were audited for changes in selected Darwin Core fields after processing by the Atlas of Living Australia (ALA; for AM and MV records) and the Global Biodiversity Information Facility (GBIF; for AM, MV and NZAC records). Formal taxon names in the genus- and species-groups were changed in 13–21% of AM and MV records, depending on dataset and aggregator. There was little agreement between the two aggregators on processed names, with names changed in two to three times as many records by one aggregator alone compared to records with names changed by both aggregators. The type status of specimen records did not change with name changes, resulting in confusion as to the name with which a type was associated. Data losses of up to 100% were found after processing in some fields, apparently due to programming errors. The taxonomic usefulness of occurrence records could be improved if aggregators included both original and the processed taxonomic data items for each record. It is recommended that end-users check original and processed records for data loss and name replacements after processing by aggregators.
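The check recommended at the end of this abstract can be sketched in a few lines of Python. This is only an illustration, not the method used in the audit: the field names are Darwin Core terms, but the choice of `catalogNumber` as the join key, the `audit` helper and the sample records are all assumptions made for the example.

```python
import csv
import io

# Darwin Core fields to compare; this selection is an assumption for the sketch.
FIELDS = ["scientificName", "typeStatus", "recordedBy"]


def audit(original_rows, processed_rows, key="catalogNumber", fields=FIELDS):
    """Count per-field changes and losses between original and processed records.

    A value is 'lost' when it was non-empty before processing and empty after,
    and 'changed' when it differs in any other way. Records absent from the
    processed set are counted separately.
    """
    processed_by_key = {row[key]: row for row in processed_rows}
    report = {f: {"changed": 0, "lost": 0} for f in fields}
    missing = 0
    for orig in original_rows:
        proc = processed_by_key.get(orig[key])
        if proc is None:
            missing += 1
            continue
        for f in fields:
            before = (orig.get(f) or "").strip()
            after = (proc.get(f) or "").strip()
            if before and not after:
                report[f]["lost"] += 1
            elif before != after:
                report[f]["changed"] += 1
    return report, missing


# Invented sample data standing in for a provider file and an aggregator download.
ORIGINAL_CSV = """catalogNumber,scientificName,typeStatus,recordedBy
K1,Aus bus,holotype,A. Collector
K2,Cus dus,,B. Collector
K3,Eus fus,,C. Collector
"""
PROCESSED_CSV = """catalogNumber,scientificName,typeStatus,recordedBy
K1,Aus xus,holotype,A. Collector
K2,Cus dus,,
"""

original = list(csv.DictReader(io.StringIO(ORIGINAL_CSV)))
processed = list(csv.DictReader(io.StringIO(PROCESSED_CSV)))
report, missing = audit(original, processed)
print(report, missing)
```

In this invented sample, record K1 keeps its type status while its name changes, which is the situation the abstract describes as confusing for type specimens.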


The Living Atlases community in action: the GBIF Benin data portal

Biodiversity Information Science and Standards ◽

10.3897/biss.2.25488 ◽

2018 ◽

Vol 2 ◽

pp. e25488

Author(s):

Anne-Sophie Archambeau ◽

Fabien Cavière ◽

Kourouma Koura ◽

Marie-Elise Lecoq ◽

Sophie Pamerlon ◽

...

Keyword(s):

African Country ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Capacity Enhancement ◽

Support Programme ◽

Data Portal ◽

Global Biodiversity ◽

The University ◽

Biodiversity Information ◽

Occurrence Records

The Atlas of Living Australia (ALA) (https://www.ala.org.au/) is the Global Biodiversity Information Facility (GBIF) node of Australia. It has developed an open and free platform for sharing and exploring biodiversity data. All the modules are publicly available for reuse and customization on its GitHub account (https://github.com/AtlasOfLivingAustralia). GBIF Benin, hosted at the University of Abomey-Calavi, has published more than 338 000 occurrence records from 87 datasets and 2 checklists. Through the GBIF Capacity Enhancement Support Programme (https://www.gbif.org/programme/82219/capacity-enhancement-support-programme), GBIF Benin, with the help of GBIF France, is in the process of deploying the Beninese data portal using the GBIF France back-end architecture. Benin is the first African country to implement this module of the ALA infrastructure. In this presentation, we will give an overview of the registry and the occurrence search engine using the Beninese data portal. We will begin with the administration interface and how to manage metadata, then continue with the user interface of the registry and how to find Beninese occurrences through the hub.


BiGe-Onto: An ontology-based system for managing biodiversity and biogeography data

Applied Ontology ◽

10.3233/ao-200228 ◽

2020 ◽

Vol 15 (4) ◽

pp. 411-437 ◽

Cited By ~ 3

Author(s):

Marcos Zárate ◽

Germán Braun ◽

Pablo Fillottrani ◽

Claudio Delrieux ◽

Mirtha Lewis

Keyword(s):

Data Sources ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Sparql Endpoint ◽

Darwin Core ◽

Metadata Standards ◽

Great Progress ◽

Global Biodiversity ◽

Research Domains ◽

Biodiversity Information

Great progress has recently been made in digitizing the world’s available Biodiversity and Biogeography data, but managing data from many different providers and research domains remains a challenge. A review of the current landscape of metadata standards and ontologies in Biodiversity sciences suggests that existing standards, such as the Darwin Core terminology, are inadequate for describing Biodiversity data in a semantically meaningful and computationally useful way. As a contribution to filling this gap, we present an ontology-based system, called BiGe-Onto, designed to manage Biodiversity and Biogeography data together. As data sources, we use two internationally recognized repositories: the Global Biodiversity Information Facility (GBIF) and the Ocean Biogeographic Information System (OBIS). The BiGe-Onto system is composed of (i) the BiGe-Onto Architecture, (ii) a conceptual model called BiGe-Onto specified in OntoUML, (iii) an operational version of BiGe-Onto encoded in OWL 2, and (iv) an integrated dataset for its exploitation through a SPARQL endpoint. We will show use cases that allow researchers to answer questions that draw on information from both domains.


"Look what they've done to our data!" — How Aggregators Change Data Items in Collection Records

Biodiversity Information Science and Standards ◽

10.3897/biss.2.25906 ◽

2018 ◽

Vol 2 ◽

pp. e25906

Author(s):

Robert Mesibov

Keyword(s):

Data Loss ◽

Data Standards ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Specific Data ◽

Global Biodiversity ◽

Biodiversity Information ◽

Counter Data

Aggregators such as the Atlas of Living Australia (ALA) and the Global Biodiversity Information Facility (GBIF) have recently been criticised for imposing "backbone taxonomies" on records provided by museums, herbaria and other sources. Taxon names may be changed to suit the backbone, with the result that the taxon rank of the record may change and the originally provided name may no longer be searchable online through the aggregator. Aggregators may also delete data items, either by omitting entire fields or rejecting data items not conforming to aggregator-specific data standards. Modifications are more common than deletions and are particularly worrying in geospatial, date and recorder data fields. It can be difficult to locate originally provided data on aggregator websites, even for individual records, and bulk downloads from aggregators typically mask the changes made. In this presentation I document the loss and modification of biodiversity data items by aggregators and suggest strategies for museums and herbaria to counter data loss and modification.


Aligning GBIF.org and the Atlas of Living Australia

Biodiversity Information Science and Standards ◽

10.3897/biss.3.35867 ◽

2019 ◽

Vol 3 ◽

Author(s):

Tim Robertson ◽

David Martin ◽

Nick dos Remedios

Keyword(s):

Data Publishing ◽

Data Handling ◽

Global Biodiversity Information Facility ◽

Occurrence Data ◽

Data Pipeline ◽

The Future ◽

Global Biodiversity ◽

Publishing Community ◽

Biodiversity Information ◽

Occurrence Records

The Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA) are two well-connected leading infrastructures serving the biodiversity community. As the national node for GBIF, the ALA serves to provide rich, localized services for the community of users in Australia and also acts as the gateway for datasets being shared internationally on GBIF.org. While these explorations target collaboration initially with Australia, we anticipate this may be of interest to other adopters of the Living Atlas platform in the future. We will give an update on progress to date, along with lessons learnt, and summarise a roadmap for the future. Recognising that significant overlap exists in the function of the systems, and that advancement in technology allows GBIF.org to offer more functionality, we have initiated a process of exploring better alignment of these infrastructures. Such a move is expected to bring the benefits of consistent data handling, improved citation tracking, coordinated deployment of new features across the entire data publishing community, better reuse of modules and an overall reduction in the cost of development, deployment and operation. Our initial areas of exploration focus on two specific components which are common to most biodiversity portals: a registry of datasets and the indexing of occurrence data. Use of a common registry for organisations, collections, datasets and associated metadata will reduce the effort spent in curating content, while also improving consistency by removing the need for synchronisation. In addition, a revised data pipeline for the indexing of occurrence records that powers both GBIF.org and ALA is anticipated to accommodate features such as consistent flagging of data quality issues and standardised practice for citation and tracking citations.


SEINet: A Centralized Specimen Resource Managed by a Distributed Network of Researchers

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37424 ◽

2019 ◽

Vol 3 ◽

Author(s):

Edward Gilbert ◽

Corinna Gries ◽

Nico Franz ◽

Leslie R. Landrum ◽

Thomas H. Nash III

Keyword(s):

Data Distribution ◽

Data Publishing ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Diverse Range ◽

Darwin Core ◽

Global Biodiversity ◽

Specimen Management ◽

Biodiversity Information

The SEINet Portal Network has a complex social and development history spanning nearly two decades. Initially established as a basic online search engine for a select handful of biological collections curated within the southwestern United States, SEINet has since matured into a biodiversity data network incorporating more than 330 institutions and 1,900 individual data contributors. Participating institutions manage and publish over 14 million specimen records, 215,000 observations, and 8 million images. Approximately 70% of the collections make use of the data portal as their primary "live" specimen management platform. The SEINet interface now supports 13 regional data portals distributed across the United States and northern Mexico (http://symbiota.org/docs/seinet/). Through many collaborative efforts, it has matured into a tool for biodiversity data exploration, which includes species inventories, interactive identification keys, specimen and field images, taxonomic information, species distribution maps, and taxonomic descriptions. SEINet’s initial developmental goals were to construct a read-only interface that integrated specimen records harvested from a handful of distributed natural history databases. Intermittent network connectivity and inconsistent data exchange protocols frequently restricted data persistence. National funding opportunities supported a complete redesign towards the development of a centralized data cache model with periodic "snapshot" updates from original data sources. A service-based management infrastructure was integrated into the interface to mobilize small- to medium-sized collections (<1 million specimen records) that commonly lack consistent infrastructure and technical expertise to maintain a standard-compliant specimen database. These developments were the precursors to the Symbiota software project (Gries et al. 2014).
Through further development of Symbiota, SEINet transformed into a robust specimen management system specifically geared toward specimen digitization with features including data entry from label images, harvesting data from specimen duplicates, batch georeferencing, data validation and cleaning, generating progress reports, and additional tools to improve the efficiency of the digitization process. The central developmental paradigm focused on data mobilization through the production of: a versatile import module capable of ingesting a diverse range of data structures, a robust toolkit to assist in digitizing and managing specimen data and images, and a Darwin Core Archive (DwC-A) compliant data publishing and export toolkit to facilitate data distribution to global aggregators such as the Global Biodiversity Information Facility (GBIF) and iDigBio. User interfaces consist of a decentralized network of regional data portals, all connecting to a centralized shared data source. Each of the 13 data portals is configured to present a regional perspective specifically tailored to represent the needs of the local research community. This infrastructure has supported the formation of regional consortia, which provide network support to aid local institutions in digitizing and publishing their collections within the network. The community-based infrastructure creates a sense of ownership – perhaps even good-natured competition – among the data providers and provides extra incentive to improve data quality and expand the network.
Certain areas of development remain challenging in spite of the project's overall success. For instance, data managers continuously struggle to maintain a current local taxonomic thesaurus used for name validation, data cleaning, and resolving taxonomic discrepancies commonly encountered when integrating collection datasets. We will discuss the successes and challenges associated with the long-term sustainability model and explore potential future paths for SEINet that support the long-term goal of maintaining a data provider that is in full compliance with the FAIR use principles of making the datasets findable, accessible, interoperable, and reusable (Wilkinson et al. 2016).


The Global Biodiversity Information Facility (GBIF)

Systematics Association Special Volumes - Biodiversity Databases ◽

10.1201/9781439832547.ch1 ◽

2007 ◽

pp. 1-4 ◽

Cited By ~ 5

Author(s):

Meredith Lane ◽

James Edwards

Keyword(s):

Global Biodiversity Information Facility ◽

Global Biodiversity ◽

Biodiversity Information


International Infrastructure for Enabling the New Taxonomy: The Role of the Global Biodiversity Information Facility (GBIF)

The New Taxonomy - Systematics Association Special Volumes ◽

10.1201/9781420008562.ch6 ◽

2008 ◽

pp. 87-94

Author(s):

James Edwards ◽

Larry Speers

Keyword(s):

Global Biodiversity Information Facility ◽

Global Biodiversity ◽

Biodiversity Information


International Infrastructure for Enabling the New Taxonomy: The Role of the Global Biodiversity Information Facility (GBIF)

The New Taxonomy ◽

10.1201/9781420008562-10 ◽

2008 ◽

pp. 99-106

Keyword(s):

Global Biodiversity Information Facility ◽

Global Biodiversity ◽

Biodiversity Information


The Global Biodiversity Information Facility (GBIF) and the Japan’s activities

Journal of Information Processing and Management ◽

10.1241/johokanri.46.389 ◽

2003 ◽

Vol 46 (6) ◽

pp. 389-393

Author(s):

Shun’ichi KIKUCHI

Keyword(s):

Global Biodiversity Information Facility ◽

Global Biodiversity ◽

Biodiversity Information


The InBIO Barcoding Initiative Database: DNA barcodes of Portuguese Diptera 01

Biodiversity Data Journal ◽

10.3897/bdj.8.e49985 ◽

2020 ◽

Vol 8 ◽

Cited By ~ 2

Author(s):

Sonia Ferreira ◽

Rui Andrade ◽

Ana Gonçalves ◽

Pedro Sousa ◽

Joana Paupério ◽

...

Keyword(s):

Species Level ◽

Distribution Data ◽

Dna Barcodes ◽

Online Database ◽

Global Biodiversity Information Facility ◽

Life Data ◽

Tagus River ◽

Global Biodiversity ◽

Biodiversity Information ◽

Dipteran Species

The InBIO Barcoding Initiative (IBI) Diptera 01 dataset contains records of 203 specimens of Diptera. All specimens have been morphologically identified to species level, and belong to 154 species in total. The species represented in this dataset correspond to about 10% of continental Portugal dipteran species diversity. All specimens were collected north of the Tagus river in Portugal. Sampling took place from 2014 to 2018, and specimens are deposited in the IBI collection at CIBIO, Research Center in Biodiversity and Genetic Resources. This dataset contributes to the knowledge on the DNA barcodes and distribution of 154 species of Diptera from Portugal and is the first of the planned IBI database public releases, which will make available genetic and distribution data for a series of taxa. All specimens have their DNA barcodes made publicly available in the Barcode of Life Data System (BOLD) online database and the distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).
