No one-size-fits-all solution to clean GBIF

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e9916
Author(s):  
Alexander Zizka ◽  
Fernanda Antunes Carvalho ◽  
Alice Calvente ◽  
Mabel Rocio Baez-Lizarazo ◽  
Andressa Cabral ◽  
...  

Species occurrence records provide the basis for many biodiversity studies. They derive from georeferenced specimens deposited in natural history collections and visual observations, such as those obtained through various mobile applications. Given the rapid increase in availability of such data, the control of quality and accuracy constitutes a particular concern. Automatic filtering is a scalable and reproducible means to identify potentially problematic records and tailor datasets from public databases such as the Global Biodiversity Information Facility (GBIF; http://www.gbif.org), for biodiversity analyses. However, it is unclear how much data may be lost by filtering, whether the same filters should be applied across all taxonomic groups, and what the effect of filtering is on common downstream analyses. Here, we evaluate the effect of 13 recently proposed filters on the inference of species richness patterns and automated conservation assessments for 18 Neotropical taxa, including terrestrial and marine animals, fungi, and plants downloaded from GBIF. We find that a total of 44.3% of the records are potentially problematic, with large variation across taxonomic groups (25–90%). A small fraction of records was identified as erroneous in the strict sense (4.2%), and a much larger proportion as unfit for most downstream analyses (41.7%). Filters of duplicated information, collection year, and basis of record, as well as coordinates in urban areas, or for terrestrial taxa in the sea or marine taxa on land, have the greatest effect. Automated filtering can help in identifying problematic records, but requires customization of which tests and thresholds should be applied to the taxonomic group and geographic area under focus. Our results stress the importance of thorough recording and exploration of the meta-data associated with species records for biodiversity research.
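The kind of automated flagging described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Occurrence` class, the `flag_records` and `fraction_flagged` helpers, the 1945 year cut-off, and the list of accepted basis-of-record values are all assumptions chosen for the example, and only four simple tests (coordinates, collection year, basis of record, duplicates) are covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    """A simplified occurrence record (hypothetical structure)."""
    species: str
    lat: Optional[float]
    lon: Optional[float]
    year: Optional[int]
    basis_of_record: str


def flag_records(records, min_year=1945,
                 allowed_basis=("PRESERVED_SPECIMEN", "HUMAN_OBSERVATION")):
    """Return a list of (record, set_of_flag_names) pairs.

    min_year and allowed_basis are illustrative defaults; as the abstract
    stresses, thresholds need tailoring to the taxon and region studied.
    """
    seen = set()
    flagged = []
    for rec in records:
        flags = set()
        # Coordinate tests: missing, out of range, or exactly (0, 0).
        if rec.lat is None or rec.lon is None:
            flags.add("missing_coordinates")
        elif not (-90 <= rec.lat <= 90 and -180 <= rec.lon <= 180):
            flags.add("invalid_coordinates")
        elif rec.lat == 0 and rec.lon == 0:
            flags.add("zero_coordinates")
        # Collection year test.
        if rec.year is None or rec.year < min_year:
            flags.add("collection_year")
        # Basis-of-record test.
        if rec.basis_of_record not in allowed_basis:
            flags.add("basis_of_record")
        # Duplicate test: same species, coordinates, and year seen before.
        key = (rec.species, rec.lat, rec.lon, rec.year)
        if key in seen:
            flags.add("duplicate")
        seen.add(key)
        flagged.append((rec, flags))
    return flagged


def fraction_flagged(flagged):
    """Share of records that carry at least one flag."""
    if not flagged:
        return 0.0
    return sum(1 for _, flags in flagged if flags) / len(flagged)
```

Keeping the flags per record, rather than dropping records immediately, makes it possible to report how much data each test removes before deciding which tests to apply.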



2021 ◽  
Author(s):  
Manuela Mejía Estrada ◽  
Luz Fernanda Jiménez-Segura ◽  
Iván Soto Calderón

DNA barcoding was proposed in response to the mismatch between the small number of taxonomists and the large number of species. The method requires the construction of reference collections of DNA sequences that represent existing biodiversity. Freshwater fishes are key indicators for understanding biogeography around the world. Colombia, with 1610 species of freshwater fishes, is the second richest country in the world in this group. However, genetic information on these species remains limited. This contribution to a reference library of DNA barcodes for Colombian freshwater fishes highlights the importance of biological collections and seeks to strengthen the inventories and taxonomy of such collections in future studies. The dataset contributes to the knowledge of the DNA barcodes and occurrence records of 96 species of freshwater fishes from Colombia, and adds 39 species to the BOLD public databases. Forty-nine specimens were collected in the Atrato basin and 708 in the Magdalena-Cauca basin between 2010 and 2020. Two species (Loricariichthys brunneus and Poecilia sphenops) are considered exotic to the Atrato, Cauca and Magdalena basins, and four species (Oncorhynchus mykiss, Oreochromis niloticus, Parachromis friedrichsthalii and Xiphophorus helleri) are exotic to Colombian hydrogeographic regions. All specimens are deposited in the CIUA collection at the University of Antioquia. Their DNA barcodes are publicly available in the Barcode of Life Data System (BOLD) online database, and the distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).


Author(s):  
Jeremy Miller ◽  
Yanell Braumuller ◽  
Puneet Kishor ◽  
David Shorthouse ◽  
Mariya Dimitrova ◽  
...  

A vast amount of biodiversity data is reported in the primary taxonomic literature. In the past, we have demonstrated the use of semantic enhancement to extract data from taxonomic literature and make it available to a network of databases (Miller et al. 2015). For technical reasons, semantic enhancement of taxonomic literature is most efficient when customized according to the format of a particular journal. This journal-based approach captures and disseminates data on whatever taxa happen to be published therein. But if we want to extract all treatments on a particular taxon of interest, these are likely to be spread across multiple journals. Fortunately, the GoldenGATE Imagine document editor (Sautter 2019) is flexible enough to parse most taxonomic literature. Tyrannosaurus rex is an iconic dinosaur with broad public appeal, as well as the subject of more than a century of scholarship. The Naturalis Biodiversity Center recently acquired a specimen that has become a major attraction in the public exhibit space. For most species on earth, the primary taxonomic literature contains nearly everything that is known about them. Every described species on earth is the subject of one or more taxonomic treatments. A taxon-based approach to semantic enhancement can mobilize all this knowledge using the network of databases and resources that comprise the modern biodiversity informatics infrastructure. When a particular species is of special interest, a taxon-based approach to semantic enhancement can be a powerful tool for scholarship and communication. In light of this, we resolved to semantically enhance all taxonomic treatments on T. rex. Our objective was to make these treatments and associated data available for the broad range of stakeholders who might have an interest in this animal, including professional paleontologists, the curious public, and museum exhibits and public communications personnel.
Among the routine parsing and data sharing activities in the Plazi workflow (Agosti and Egloff 2009), taxonomic treatments, as well as cited figures, are deposited in the Biodiversity Literature Repository (BLR), and occurrence records are shared with the Global Biodiversity Information Facility (GBIF). Treatment citations were enhanced with hyperlinks to the cited treatment on TreatmentBank, and specimen citations were linked to their entries on public-facing collections databases. We used the OpenBiodiv biodiversity knowledge graph (Senderov et al. 2017) to discover other taxa mentioned together with T. rex, and to create a timeline of T. rex research to evaluate the impact of individual researchers and specimen repositories on T. rex research. We contributed treatment links to WikiData, and queried WikiData to discover identifiers to different platforms holding data about T. rex. We used bloodhound-tracker.net to disambiguate human agents, like collectors, identifiers, and authors. We evaluate the adequacy of the fields currently available to extract data from taxonomic treatments, and make recommendations for future standards.


ZooKeys ◽  
2018 ◽  
Vol 751 ◽  
pp. 129-146 ◽  
Author(s):  
Robert Mesibov

A total of ca 800,000 occurrence records from the Australian Museum (AM), Museums Victoria (MV) and the New Zealand Arthropod Collection (NZAC) were audited for changes in selected Darwin Core fields after processing by the Atlas of Living Australia (ALA; for AM and MV records) and the Global Biodiversity Information Facility (GBIF; for AM, MV and NZAC records). Formal taxon names in the genus- and species-groups were changed in 13–21% of AM and MV records, depending on dataset and aggregator. There was little agreement between the two aggregators on processed names, with names changed in two to three times as many records by one aggregator alone compared to records with names changed by both aggregators. The type status of specimen records did not change with name changes, resulting in confusion as to the name with which a type was associated. Data losses of up to 100% were found after processing in some fields, apparently due to programming errors. The taxonomic usefulness of occurrence records could be improved if aggregators included both original and the processed taxonomic data items for each record. It is recommended that end-users check original and processed records for data loss and name replacements after processing by aggregators.
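The recommended check of original against processed records can be sketched as a field-by-field comparison. The sketch below is an illustration under stated assumptions, not the audit procedure used in the paper: the `audit_fields` helper and the dictionary layout (record ID mapped to a dict of field values) are hypothetical, while `scientificName` and `typeStatus` in the usage note are Darwin Core terms.

```python
from collections import Counter


def audit_fields(original, processed, fields):
    """Count, per field, the records whose value changed or was lost.

    original and processed map a record ID to a dict of Darwin Core
    field values (a hypothetical in-memory layout for this example).
    """
    changed = Counter()
    lost = Counter()
    for record_id, orig in original.items():
        proc = processed.get(record_id, {})
        for field in fields:
            before = (orig.get(field) or "").strip()
            after = (proc.get(field) or "").strip()
            if before and not after:
                # The provider supplied a value, but it is gone after processing.
                lost[field] += 1
            elif before != after:
                # The value was replaced or newly filled in by the aggregator.
                changed[field] += 1
    return changed, lost
```

Calling `audit_fields(original, processed, ["scientificName", "typeStatus"])` returns two counters, so name replacements and outright data loss can be reported separately for each field.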


Author(s):  
Javier Molina ◽  
Peggy Newman ◽  
David Martin ◽  
Vicente Ruiz Jurado

The Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA) are two leading infrastructures serving the biodiversity community. In 2020, the ALA’s occurrence records management systems reached end of life after more than 10 years of operation, and the ALA embarked on a project to replace them. Significant overlap exists in the function of the ALA and GBIF data ingestion pipeline systems. Instead of the ALA developing new systems from scratch, we initiated a project to better align the two infrastructures. The collaboration brings benefits such as the improved reuse of modules and an overall reduction in development and operation costs. The ALA recently replaced its occurrence ingestion system with GBIF pipelines infrastructure and shared code. This is the first milestone of the broader ALA’s Core Infrastructure Project and some of the benefits from it are a more reliable, performant and scalable system, proven by the ability to ingest more and larger datasets while at the same time reducing infrastructure operational costs by more than 40% compared to the previous system. The new system is a key building block for an improved ingestion framework that is being developed within the ALA. The collaboration between the ALA and GBIF development teams will result in more consistent outputs from their respective processing pipelines. It will also allow the broader collective expertise of both infrastructure communities to inform future development and direction. The ALA’s adoption of GBIF pipelines will pave the way for the Living Atlases community to adopt GBIF systems and also contribute to them. In this talk we will introduce the project, share insights on how both the teams from the GBIF and the ALA worked together and finally we will delve into details about the technical implementation and benefits.


2018 ◽  
Vol 2 ◽  
pp. e25738 ◽  
Author(s):  
Arturo Ariño ◽  
Daniel Noesgaard ◽  
Angel Hjarding ◽  
Dmitry Schigel

Standards set up by Biodiversity Information Standards-Taxonomic Databases Working Group (TDWG), initially developed as a way to share taxonomical data, greatly facilitated the establishment of the Global Biodiversity Information Facility (GBIF) as the largest index to digitally-accessible primary biodiversity information records (PBR) held by many institutions around the world. The level of detail and coverage of the body of standards that later became the Darwin Core terms enabled increasingly precise retrieval of relevant records useful for increased digitally-accessible knowledge (DAK) which, in turn, may have helped to solve ecologically-relevant questions. After more than a decade of data accrual and release, an increasing number of papers and reports are citing GBIF either as a source of data or as a pointer to the original datasets. GBIF has curated a list of over 5,000 citations that were examined for contents, and to which tags were applied describing such contents as additional keywords. The list now provides a window on what users want to accomplish using such DAK. We performed a preliminary word frequency analysis of this literature, starting at titles, which refers to GBIF as a resource. Through a standardization and mapping of terms, we examined how the facility-enabled data seem to have been used by scientists and other practitioners through time: what concepts/issues are pervasive, which taxon groups are mostly addressed, and whether data concentrate around specific geographical or biogeographical regions. We hoped to cast light on which types of ecological problems the community believes are amenable to study through the judicious use of this data commons and found that, indeed, a few themes were distinctly more frequently mentioned than others. Among those, generally-perceived issues such as climate change and its effect on biodiversity at global and regional scales seemed prevalent. 
The taxonomic groups were also unevenly mentioned, with birds and plants being the most frequently named. However, the entire list of potential subjects that might have used GBIF-enabled data is now quite wide, showing that the availability of well-structured data has spawned a widening spectrum of possible use cases. Among them, some enjoy early and continuous presence (e.g. species, biodiversity, climate) while others have started to show up only later, once a critical mass of data seemed to have been attained (e.g. ecosystems, suitability, endemism). Biodiversity information in the form of standards-compliant DAK may thus already have become a commodity enabling insight into an increasingly more complex and diverse body of science. Paraphrasing Tennyson, more things were wrought by data than TDWG dreamt of.
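A word frequency analysis of titles with term standardization, as outlined above, might look like the following sketch. It is a simplified illustration rather than the authors' procedure: the `title_term_counts` function, the stopword list, and the synonym mapping are assumptions made up for this example.

```python
import re
from collections import Counter

# Illustrative lists only; a real analysis would need far larger ones.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "from", "with"}
SYNONYMS = {"climatic": "climate", "birds": "bird", "plants": "plant"}


def title_term_counts(titles):
    """Count how many titles mention each standardized term."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())
        # Map variants onto one canonical term and drop stopwords.
        terms = {SYNONYMS.get(word, word) for word in words if word not in STOPWORDS}
        # A set is used so that each term counts once per title.
        counts.update(terms)
    return counts
```

Counting titles rather than raw word occurrences keeps a single title that repeats a term from inflating that term's frequency.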


Author(s):  
Azra Velagić-Hajrudinović

Featuring a large variety of ecosystems, abundant freshwater and forest resources, unique extensive karstic systems, and a high level of biodiversity and endemism, Southeast Europe (SEE) plays a crucial role in the conservation of biodiversity in Europe and beyond. In order to conserve and sustainably use these biodiversity assets and valuable natural resources, a regional concerted approach in the field of biodiversity information management and reporting (BIMR) has been strengthened. This has enabled improvement in access, transparency and exchange of biodiversity data and reporting processes among the participating economies. Certain significant and visible progress among SEE economies and stakeholders is due to the knowledge gained about regional and national BIMR baselines, agreed and elaborated minimum Convention on Biological Diversity (CBD) and European Union (EU) requirements on BIMR among stakeholders and implemented BIMR tools (e.g., a regionally unified fundamental database for the Information System for Nature Conservation (ISNC), for instance in Montenegro (http://zasticenapodrucja-cg.tk//en), Bosnia and Herzegovina/entity of Republika Srpska (http://e-priroda.rs.ba/en/) and entity of Federation of Bosnia and Herzegovina and North Macedonia (Standard Data Form - SDF application for NATURA 2000) and compiled dataset on five taxonomic groups of endemic taxa using the Darwin Core standard). Therefore, BIMR activities/priorities from the region have become more evident and supported along with ownership of BIMR tools acquired by the partner institutions and recognized at the global level through the Global Biodiversity Information Facility (GBIF).


Author(s):  
Fabien Cavière ◽  
Anne-Sophie Archambeau ◽  
Raoufou Radji ◽  
Christian Ahadji ◽  
Sophie Pamerlon

GBIF Togo, hosted at the University of Lomé, has published more than 62,200 occurrence records from 37 datasets and checklists. As a node participant of the Global Biodiversity Information Facility (GBIF) since 2011, it has participated actively in several projects including the Biodiversity Information for Development (BID) programme. GBIF facilitates collaboration between nodes at different levels through its Capacity Enhancement Support Programme (CESP). One of the actions included in the CESP guidelines is called ‘Mentoring activities’. Its main goal is the transfer of knowledge between partners, such as information, technologies, experience, and best practices. Sharing architecture and development is the key to solving some of the technical challenges and impediments (e.g. hosting, staff turnover, etc.) that GBIF nodes occasionally face. The Atlas of Living Australia (ALA) team has developed a feature called ‘data hub’, which allows the creation of a standalone website with a dedicated occurrence search engine that supports discovery of data (e.g. specific genus, geographic area) published by particular GBIF nodes. In 2017, a CESP project between GBIF Benin and GBIF France led to the creation of a new portal: Atlas of Living Beninises. This portal shared the same back-end database as the Atlas of Living France portal, while at the same time, each portal displayed and managed information relevant only to its region. In 2018, another CESP project between GBIF France and GBIF Togo shared the same goal as the previous one: implement a new Atlas of Living Australia portal for Togo. This goal will be fulfilled using a similar implementation as the previous project: a shared back-end and different front-ends. Togo will be the second African GBIF node to implement this kind of infrastructure.
This poster will highlight the architecture specific to the Atlas of Living Togo, and present the management procedure that distinguishes data coming from the three different countries.


2018 ◽  
Vol 2 ◽  
pp. e25488
Author(s):  
Anne-Sophie Archambeau ◽  
Fabien Cavière ◽  
Kourouma Koura ◽  
Marie-Elise Lecoq ◽  
Sophie Pamerlon ◽  
...  

Atlas of Living Australia (ALA) (https://www.ala.org.au/) is the Global Biodiversity Information Facility (GBIF) node of Australia. They developed an open and free platform for sharing and exploring biodiversity data. All the modules are publicly available for reuse and customization on their GitHub account (https://github.com/AtlasOfLivingAustralia). GBIF Benin, hosted at the University of Abomey-Calavi, has published more than 338 000 occurrence records from 87 datasets and 2 checklists. Through the GBIF Capacity Enhancement Support Programme (https://www.gbif.org/programme/82219/capacity-enhancement-support-programme), GBIF Benin, with the help of GBIF France, is in the process of deploying the Beninese data portal using the GBIF France back-end architecture. GBIF Benin is the first African country to implement this module of the ALA infrastructure. In this presentation, we will show you an overview of the registry and the occurrence search engine using the Beninese data portal. We will begin with the administration interface and how to manage metadata, then we will continue with the user interface of the registry and how you can find Beninese occurrences through the hub.


2019 ◽  
Author(s):  
Eduardo E. Zattara ◽  
Marcelo A. Aizen

Wild and managed bees are key pollinators, providing ecosystem services to a large fraction of the world’s flowering plants, including ∼85% of all cultivated crops. Recent reports of wild bee decline and its potential consequences are thus worrisome. However, evidence is mostly based on local or regional studies; the global status of bee decline has not yet been assessed. To fill this gap, we analyzed publicly available worldwide occurrence records from the Global Biodiversity Information Facility spanning more than a century of specimen collection. We found that after the 1980s the number of collected bee species declined steeply, and approximately 25% fewer species were reported between 2006 and 2015 relative to the number of species counted before the 1990s. These trends are alarming and encourage swift action to avoid further decline of these key pollinators.
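The basic quantity behind such a trend, the number of distinct species recorded per period, can be computed as in the sketch below. It is a naive illustration, not the authors' analysis: the `species_per_decade` function and the (species, year) input layout are assumptions, and the count is not corrected for changes in sampling effort over time.

```python
from collections import defaultdict


def species_per_decade(records):
    """Count distinct species per decade.

    records is an iterable of (species_name, year) pairs, a hypothetical
    simplification of occurrence records.
    """
    by_decade = defaultdict(set)
    for species, year in records:
        # Integer division groups years into decades, e.g. 1985 -> 1980.
        by_decade[(year // 10) * 10].add(species)
    return {decade: len(names) for decade, names in sorted(by_decade.items())}
```

Using a set per decade means repeated records of the same species in one decade are counted once, so the result reflects species richness in the records rather than record volume.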

