Aligning GBIF and the Atlas of Living Australia

Author(s):  
David Martin ◽  
Javier Molina ◽  
Nick dos Remedios ◽  
Marie-Elise Lecoq ◽  
Tim Robertson ◽  
...  

The Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA) are two interconnected leading infrastructures serving the biodiversity community. Recognising that significant overlap exists in the function of the systems run by both organisations, and that advancement in technology allows GBIF to offer more functionality, we have initiated a process to align these infrastructures. Such a move is expected to bring the benefits of consistent data handling, improved bibliographic citation tracking, coordinated deployment of new features across the entire data publishing community, better reuse of modules and an overall reduction in cost of development, deployment and operation. This year, work has commenced to align these two infrastructures, focussing initially on data ingestion pipelines. The GBIF and ALA teams are collaborating closely, working on the same codebase, developing common working practices and agreeing on tools and coding standards. This focus on collaboration will lead to a defined model for the Living Atlas community to provide contributions. This work will also further the efforts to hand ownership of core ALA systems to the Living Atlas community and pave the way for the Living Atlas community to transition to the adoption of GBIF systems. Later this year, efforts will move towards use of a common registry for organisations, collections, datasets and associated metadata, which will reduce the effort spent in curating content, while also improving consistency by removing the need for synchronisation.

Author(s):  
Tim Robertson ◽  
David Martin ◽  
Nick dos Remedios

The Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA) are two well-connected leading infrastructures serving the biodiversity community. As the national node for GBIF, the ALA provides rich, localized services for the community of users in Australia and also acts as the gateway for datasets being shared internationally on GBIF.org. While these explorations target collaboration initially with Australia, we anticipate this may be of interest to other adopters of the Living Atlas platform in the future. We will give an update on progress to date, along with lessons learnt, and summarise a roadmap for the future. Recognising that significant overlap exists in the function of the systems, and that advancement in technology allows GBIF.org to offer more functionality, we have initiated a process of exploring better alignment of these infrastructures. Such a move is expected to bring the benefits of consistent data handling, improved citation tracking, coordinated deployment of new features across the entire data publishing community, better reuse of modules and an overall reduction in cost of development, deployment and operation. Our initial areas of exploration focus on two specific components which are common to most biodiversity portals: a registry of datasets and the indexing of occurrence data. Use of a common registry for organisations, collections, datasets and associated metadata will reduce the effort spent in curating content, while also improving consistency by removing the need for synchronisation. In addition, a revised data pipeline for the indexing of occurrence records that powers both GBIF.org and ALA is anticipated to accommodate features such as consistent flagging of data quality issues and standardised practice for citing data and tracking citations.
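To make "consistent flagging of data quality issues" concrete, the following is a minimal Python sketch of a single shared flagging function that two portals could both call, so that the same record always receives the same flags. It is an illustration only, not code from the GBIF or ALA pipelines: the function name `flag_issues` and the flag labels are assumptions chosen for the example, and only the field names `decimalLatitude` and `decimalLongitude` are taken from Darwin Core.

```python
def flag_issues(record):
    """Return a list of data quality flags for one occurrence record.

    `record` is a dict keyed by Darwin Core term names. The flag labels
    below are illustrative assumptions, not a confirmed vocabulary.
    """
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")

    # Without both coordinates no further spatial checks are possible.
    if lat is None or lon is None:
        return ["COORDINATE_MISSING"]

    issues = []
    # (0, 0) is a common placeholder for "unknown location".
    if lat == 0 and lon == 0:
        issues.append("ZERO_COORDINATE")
    # Latitude must lie in [-90, 90] and longitude in [-180, 180].
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        issues.append("COORDINATE_OUT_OF_RANGE")
    return issues


if __name__ == "__main__":
    sydney = {"decimalLatitude": -33.87, "decimalLongitude": 151.21}
    print(flag_issues(sydney))
```

Keeping the rules in one function, rather than reimplementing them per portal, is the point of the sketch: consistency follows from sharing the code path.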


Author(s):  
Edward Gilbert ◽  
Corinna Gries ◽  
Nico Franz ◽  
Leslie R. Landrum ◽  
Thomas H. Nash III

The SEINet Portal Network has a complex social and development history spanning nearly two decades. Initially established as a basic online search engine for a select handful of biological collections curated within the southwestern United States, SEINet has since matured into a biodiversity data network incorporating more than 330 institutions and 1,900 individual data contributors. Participating institutions manage and publish over 14 million specimen records, 215,000 observations, and 8 million images. Approximately 70% of the collections make use of the data portal as their primary "live" specimen management platform. The SEINet interface now supports 13 regional data portals distributed across the United States and northern Mexico (http://symbiota.org/docs/seinet/). Through many collaborative efforts, it has matured into a tool for biodiversity data exploration, which includes species inventories, interactive identification keys, specimen and field images, taxonomic information, species distribution maps, and taxonomic descriptions. SEINet’s initial developmental goals were to construct a read-only interface that integrated specimen records harvested from a handful of distributed natural history databases. Intermittent network connectivity and inconsistent data exchange protocols frequently restricted data persistence. National funding opportunities supported a complete redesign towards the development of a centralized data cache model with periodic "snapshot" updates from original data sources. A service-based management infrastructure was integrated into the interface to mobilize small- to medium-sized collections (<1 million specimen records) that commonly lack consistent infrastructure and technical expertise to maintain a standards-compliant specimen database. These developments were the precursors to the Symbiota software project (Gries et al. 2014).
Through further development of Symbiota, SEINet transformed into a robust specimen management system specifically geared toward specimen digitization with features including data entry from label images, harvesting data from specimen duplicates, batch georeferencing, data validation and cleaning, generating progress reports, and additional tools to improve the efficiency of the digitization process. The central developmental paradigm focused on data mobilization through the production of: a versatile import module capable of ingesting a diverse range of data structures, a robust toolkit to assist in digitizing and managing specimen data and images, and a Darwin Core Archive (DwC-A) compliant data publishing and export toolkit to facilitate data distribution to global aggregators such as the Global Biodiversity Information Facility (GBIF) and iDigBio. User interfaces consist of a decentralized network of regional data portals, all connecting to a centralized shared data source. Each of the 13 data portals is configured to present a regional perspective specifically tailored to represent the needs of the local research community. This infrastructure has supported the formation of regional consortia, which provide network support to aid local institutions in digitizing and publishing their collections within the network. The community-based infrastructure creates a sense of ownership (perhaps even good-natured competition) among the data providers and provides extra incentive to improve data quality and expand the network. Certain areas of development remain challenging in spite of the project's overall success.
For instance, data managers continuously struggle to maintain a current local taxonomic thesaurus used for validating names, cleaning data, and resolving taxonomic discrepancies commonly encountered when integrating collection datasets. We will discuss the successes and challenges associated with the long-term sustainability model and explore potential future paths for SEINet that support the long-term goal of maintaining a data provider that is in full compliance with the FAIR principles of making datasets findable, accessible, interoperable, and reusable (Wilkinson et al. 2016).


2019 ◽  
Vol 5 ◽  
Author(s):  
Oleh Prylutskyi ◽  
Armine Abrahamyan ◽  
Nina Voronova ◽  
Tatevik Aloyan ◽  
Oleg Borodin ◽  
...  

BioDATA is an international project on developing skills in biodiversity data management and data publishing. Between 2018 and 2021, undergraduate and postgraduate students from Armenia, Belarus, Tajikistan, and Ukraine have an opportunity to take part in intensive courses to become certified professionals in biodiversity data management. They will gain practical skills and knowledge of international data standards (Darwin Core), data cleaning software, data publishing software such as the Integrated Publishing Toolkit (IPT), and the preparation of data papers. Working with databases, creating datasets, managing data for statistical analyses and publishing research papers are essential everyday tasks for a modern biologist. At the same time, these skills are rarely taught in higher education. Most contemporary biodiversity professionals have to gain these skills independently, through colleagues, or through supervision. In addition, all the participants familiarize themselves with one of the important international research data infrastructures, the Global Biodiversity Information Facility (GBIF). The project is coordinated by the University of Oslo (Norway) and supported by GBIF. The project is funded by the Norwegian Agency for International Cooperation and Quality Enhancement in Higher Education (DIKU).


2020 ◽  
Vol 8 ◽  
Author(s):  
Sonia Ferreira ◽  
Rui Andrade ◽  
Ana Gonçalves ◽  
Pedro Sousa ◽  
Joana Paupério ◽  
...  

The InBIO Barcoding Initiative (IBI) Diptera 01 dataset contains records of 203 specimens of Diptera. All specimens have been morphologically identified to species level, and belong to 154 species in total. The species represented in this dataset correspond to about 10% of continental Portugal dipteran species diversity. All specimens were collected north of the Tagus river in Portugal. Sampling took place from 2014 to 2018, and specimens are deposited in the IBI collection at CIBIO, Research Center in Biodiversity and Genetic Resources. This dataset contributes to the knowledge on the DNA barcodes and distribution of 154 species of Diptera from Portugal and is the first of the planned IBI database public releases, which will make available genetic and distribution data for a series of taxa. All specimens have their DNA barcodes made publicly available in the Barcode of Life Data System (BOLD) online database and the distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).


Author(s):  
Amy Davis ◽  
Tim Adriaens ◽  
Rozemien De Troch ◽  
Peter Desmet ◽  
Quentin Groom ◽  
...  

To support invasive alien species risk assessments, the Tracking Invasive Alien Species (TrIAS) project has developed an automated, open workflow incorporating state-of-the-art species distribution modelling practices to create risk maps using the open source language R. It is based on Global Biodiversity Information Facility (GBIF) data and openly published environmental data layers characterizing climate and land cover. Our workflow requires only a species name and generates an ensemble of machine-learning algorithms (Random Forest, Boosted Regression Trees, K-Nearest Neighbors and AdaBoost) stacked together as a meta-model to produce the final risk map at 1 km² resolution (Fig. 1). Risk maps are generated automatically for standard Intergovernmental Panel on Climate Change (IPCC) greenhouse gas emission scenarios and are accompanied by maps illustrating the confidence of each individual prediction across space, thus enabling the intuitive visualization and understanding of how the confidence of the model varies across space and scenario (Fig. 2). The effects of sampling bias are accounted for by providing options to use the sampling effort of the higher taxon the modelled species belongs to (e.g., vascular plants), and to thin species occurrences. The risk maps generated by our workflow are defensible and repeatable and provide forecasts of alien species distributions under further climate change scenarios. They can be used to support risk assessments and guide surveillance efforts on alien species in Europe. The detailed modeling framework and code are available on GitHub: https://github.com/trias-project.
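The option to "thin species occurrences" can be illustrated with a minimal sketch of grid-based spatial thinning, in which at most one record is kept per grid cell so that heavily sampled locations do not dominate the model. This is not the TrIAS code (that workflow is written in R and its thinning method is not described here); the function name `thin_occurrences`, the default cell size of 0.01 degrees, and the rule of keeping the first record seen in each cell are all assumptions made for the example.

```python
import math


def thin_occurrences(points, cell_deg=0.01):
    """Keep at most one (lat, lon) point per grid cell.

    `points` is an iterable of (latitude, longitude) tuples in decimal
    degrees. `cell_deg` is the cell size in degrees (an assumed default;
    0.01 degrees of latitude is roughly 1.1 km). The first point seen in
    each cell is retained, and input order is preserved.
    """
    kept = {}
    for lat, lon in points:
        # Integer row/column index of the grid cell containing the point.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        # setdefault stores the point only if the cell is still empty.
        kept.setdefault(cell, (lat, lon))
    return list(kept.values())


if __name__ == "__main__":
    records = [
        (50.8503, 4.3517),  # two nearby records that share a cell
        (50.8504, 4.3518),
        (51.2194, 4.4025),  # a distant record in its own cell
    ]
    print(thin_occurrences(records))
```

A grid in degrees is the simplest choice but yields cells that narrow towards the poles; a projected, equal-area grid would be the more careful option for a real analysis.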


2013 ◽  
Vol 64 (2) ◽  
Author(s):  
Shakina Mohd Talkah ◽  
Iylia Zulkiflee ◽  
Mohd Shahir Shamsir

Currently, ethnobotanical, phytochemical and pharmaceutical information on South East Asia is scattered across many different publications, depositories and databases in various digital and analogue formats. Although there are taxonomic databases of medicinal plants, they are not linked to phytochemical and pharmaceutical information, which often resides in the scientific literature. We present Phyknome, an ethnobotanical and phytochemical database with more than 22,000 species of ethnoflora of Asia. The creation of this database will enable a biotechnology researcher to seek and identify ethnobotanical information based on a species’ scientific name, description and phytochemical information. It is constructed using a digitization pipeline that allows high-throughput digitization of archival data, an automated dataminer to mine for information on pharmaceutical compounds, and an online database to integrate this information. The main functions include automated taxonomy, a bibliography, and an API interface with primary databases such as the Global Biodiversity Information Facility (GBIF). We believe that Phyknome will contribute to the digital knowledge ecosystem by improving access and providing tools for ethnobotanical research, and will contribute to the management, assessment and stewardship of biodiversity. The database is available at http://mapping.fbb.utm.my/phyknome/.

