Data Leakage and Loss in Biodiversity Informatics

The field of biodiversity informatics is in a massive, “grow-out” phase of creating and enabling large-scale biodiversity data resources. Because perhaps 90% of existing biodiversity data nonetheless remains unavailable for science and policy applications, the question arises as to how these existing and available data records can be mobilized most efficiently and effectively. This situation led to our analysis of several large-scale biodiversity datasets regarding birds and plants, detecting information gaps and documenting data “leakage” or attrition, in terms of data on taxon, time, and place, in each data record. We documented significant data leakage in each data dimension in each dataset. That is, significant numbers of data records are lacking crucial information in terms of taxon, time, and/or place; information on place was consistently the least complete, such that geographic referencing presently represents the most significant factor in degradation of usability of information from biodiversity information resources. Although the full process of digital capture, quality control, and enrichment is important to developing a complete digital record of existing biodiversity information, payoffs in terms of immediate data usability will be greatest with attention paid to the georeferencing challenge.
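As an illustration of the kind of tally described above, the following is a minimal sketch of measuring "leakage" per data dimension. The field names are assumptions borrowed loosely from Darwin Core, the sample records are invented for the example, and nothing here reproduces the authors' actual workflow.

```python
# Hypothetical mapping from each data dimension to the fields that must
# all be filled in for that dimension to count as complete.
DIMENSIONS = {
    "taxon": ("scientificName",),
    "time": ("eventDate",),
    "place": ("decimalLatitude", "decimalLongitude"),
}


def is_present(value):
    """Treat None and blank strings as missing; 0 and 0.0 count as present."""
    return value is not None and str(value).strip() != ""


def leakage_by_dimension(records):
    """Return the fraction of records lacking information in each dimension."""
    records = list(records)
    total = len(records)
    result = {}
    for dimension, fields in DIMENSIONS.items():
        missing = sum(
            1 for r in records
            if not all(is_present(r.get(f)) for f in fields)
        )
        result[dimension] = missing / total if total else 0.0
    return result


# Invented sample records, for illustration only.
sample = [
    {"scientificName": "Turdus migratorius", "eventDate": "1998-05-02",
     "decimalLatitude": 38.9, "decimalLongitude": -95.2},
    {"scientificName": "Turdus migratorius", "eventDate": "",
     "decimalLatitude": None, "decimalLongitude": None},
    {"scientificName": "", "eventDate": "2001-07-14",
     "decimalLatitude": 5.6, "decimalLongitude": None},
    {"scientificName": "Quercus alba", "eventDate": "1975",
     "decimalLatitude": None, "decimalLongitude": None},
]

leakage = leakage_by_dimension(sample)
print(leakage)
```

In this toy sample, place is the least complete dimension, mirroring the pattern the abstract reports, although the numbers themselves carry no empirical meaning.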

African Biodiversity Challenge: Integrating Freshwater Biodiversity Information to Guide Informed Decision-Making in Rwanda

Biodiversity Information Science and Standards ◽

10.3897/biss.2.26367 ◽

2018 ◽

Vol 2 ◽

pp. e26367

Author(s):

Yvette Umurungi ◽

Samuel Kanyamibwa ◽

Faustin Gashakamba ◽

Beth Kaplin

Keyword(s):

Decision Making ◽

Natural Resources ◽

Economic Transformation ◽

Freshwater Ecosystems ◽

Data Availability ◽

Biodiversity Informatics ◽

Freshwater Biodiversity ◽

Biodiversity Data ◽

Albertine Rift ◽

Biodiversity Information

Freshwater biodiversity is critically understudied in Rwanda, and to date there has not been an efficient mechanism to integrate freshwater biodiversity information or make it accessible to decision-makers, researchers, the private sector or communities, where it is needed for planning, management and the implementation of the National Biodiversity Strategy and Action Plan (NBSAP). A framework to capture and distribute freshwater biodiversity data is crucial to understanding how economic transformation and environmental change are affecting freshwater biodiversity and the resulting ecosystem services. To optimize conservation efforts for freshwater ecosystems, detailed information is needed regarding current and historical species distributions and abundances across the landscape. From these data, specific conservation concerns can be identified, analyzed and prioritized. The purpose of this project is to establish and implement a long-term strategy for freshwater biodiversity data mobilization, sharing, processing and reporting in Rwanda. The expected outcome of the project is to support the mandates of the Rwanda Environment Management Authority (REMA), the national agency in charge of environmental monitoring and the implementation of Rwanda’s NBSAP, and the Center of Excellence in Biodiversity and Natural Resources Management (CoEB). The project also aligns with the mission of the Albertine Rift Conservation Society (ARCOS) to enhance sustainable management of natural resources in the Albertine Rift region. Specifically, organizational structure, technology platforms, and workflows for biodiversity data capture and mobilization are enhanced to promote data availability and accessibility to improve Rwanda’s NBSAP and support other decision-making processes.
The project is enhancing the capacity of technical staff from relevant government and non-government institutions in biodiversity informatics, strengthening the capacity of CoEB to achieve its mission as the Rwandan national biodiversity knowledge management center. Twelve institutions have been identified as data holders and the digitization of these data using Darwin Core standards is in progress, as well as data cleaning for the data publication through the ARCOS Biodiversity Information System (http://arbmis.arcosnetwork.org/). The release of the first national State of Freshwater Biodiversity Report is the next step. CoEB is a registered publisher to the Global Biodiversity Information Facility (GBIF) and holds an Integrated Publishing Toolkit (IPT) account on the ARCOS portal. This project was developed for the African Biodiversity Challenge, a competition coordinated by the South African National Biodiversity Institute (SANBI) and funded by the JRS Biodiversity Foundation which supports on-going efforts to enhance the biodiversity information management activities of the GBIF Africa network. This project also aligns with SANBI’s Regional Engagement Strategy, and endeavors to strengthen both emerging biodiversity informatics networks and data management capacity on the continent in support of sustainable development.

Making Biodiversity Data Social, Shareable, and Scalable: Reflections on iNaturalist & citizen science

Biodiversity Information Science and Standards ◽

10.3897/biss.3.46670 ◽

2019 ◽

Vol 3 ◽

Author(s):

Carrie Seltzer

Keyword(s):

Social Interaction ◽

Citizen Science ◽

Biodiversity Informatics ◽

Strategic Decisions ◽

Biodiversity Data ◽

Helping Others ◽

Advance Research ◽

Biodiversity Information

Since 2008, iNaturalist has been crowdsourcing identifications for biodiversity observations collected by citizen scientists. Today iNaturalist has over 25 million records of wild biodiversity with photo or audio evidence, from every country, representing more than 230,000 species, collected by over 700,000 people, and with 90,000 people helping others with identifications. Hundreds of publications have used iNaturalist data to advance research, conservation, and policy. There are three key themes that iNaturalist has embraced: social interaction; shareability of data, tools, and code; and scalability of the platform and community. The keynote will share reflections on what has (and has not) worked for iNaturalist while drawing on other examples from biodiversity informatics and citizen science. Insights about user motivations, synergistic collaborations, and strategic decisions about scaling offer some transferable approaches to address the broadly applicable questions: Which species is represented? How do we make the best use of the available biodiversity information? And how do we build something viable and enduring in the process?

Completeness of Digital Accessible Knowledge of the Plants of Ghana

Biodiversity Informatics ◽

10.17161/bi.v11i0.5860 ◽

2016 ◽

Vol 11 ◽

Cited By ~ 7

Author(s):

Alex Asase ◽

A. Townsend Peterson

Keyword(s):

Geographic Distance ◽

Northern Ghana ◽

Biodiversity Informatics ◽

Primary Research ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Herbarium Data ◽

Research Grade ◽

Biodiversity Information

Providing comprehensive, informative, primary, research-grade biodiversity information represents an important focus of biodiversity informatics initiatives. Recent efforts within Ghana have digitized >90% of primary biodiversity data records associated with specimen sheets in Ghanaian herbaria; additional herbarium data are available from other institutions via biodiversity informatics initiatives such as the Global Biodiversity Information Facility. However, data on the plants of Ghana have not as yet been integrated and assessed to establish how complete site inventories are, so that appropriate levels of confidence can be applied. In this study, we assessed inventory completeness and identified gaps in current Digital Accessible Knowledge (DAK) of the plants of Ghana, to prioritize areas for future surveys and inventories. We evaluated the completeness of inventories at ½° spatial resolution using statistics that summarize inventory completeness, and characterized gaps in coverage in terms of geographic distance and climatic difference from well-documented sites across the country. The southwestern and southeastern parts of the country held many well-known grid cells; the largest spatial gaps were found in central and northern parts of the country. Climatic difference showed contrasting patterns, with a dramatic gap in coverage in central-northern Ghana. This study provides a detailed case study of how to prioritize for new botanical surveys and inventories based on existing DAK.
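The grid-based assessment described above can be sketched as follows. The abstract does not name the completeness statistic used, so this example assumes the ratio of observed species richness to the Chao1 richness estimate, one common choice; the function names and the sample records are hypothetical.

```python
import math
from collections import Counter, defaultdict

CELL_SIZE = 0.5  # grid resolution in degrees, matching the 1/2 degree cells in the text


def cell_index(lat, lon, size=CELL_SIZE):
    """Assign a coordinate pair to a grid cell identified by integer indices."""
    return (math.floor(lat / size), math.floor(lon / size))


def chao1_completeness(species_counts):
    """Observed richness divided by the Chao1 estimate (an assumed statistic)."""
    s_obs = len(species_counts)
    if s_obs == 0:
        return 0.0
    f1 = sum(1 for c in species_counts.values() if c == 1)  # singletons
    f2 = sum(1 for c in species_counts.values() if c == 2)  # doubletons
    if f2 > 0:
        s_est = s_obs + (f1 * f1) / (2 * f2)
    else:
        # Bias-corrected form used when no doubletons are present.
        s_est = s_obs + f1 * (f1 - 1) / 2
    return s_obs / s_est


def completeness_by_cell(records):
    """records: iterable of (species, latitude, longitude) tuples."""
    per_cell = defaultdict(Counter)
    for species, lat, lon in records:
        per_cell[cell_index(lat, lon)][species] += 1
    return {cell: chao1_completeness(c) for cell, c in per_cell.items()}


# Invented records with placeholder species names, for illustration only.
records = (
    [("sp_A", 5.6, -0.2)] * 3
    + [("sp_B", 5.6, -0.2)] * 2
    + [("sp_C", 5.6, -0.2)]
    + [("sp_A", 9.4, -1.0), ("sp_D", 9.4, -1.0)]
)

completeness = completeness_by_cell(records)
for cell, value in sorted(completeness.items()):
    print(cell, round(value, 3))
```

Cells with low completeness, or cells absent from the result altogether, would be candidates for new surveys; the geographic and climatic distance analyses mentioned in the abstract are not covered by this sketch.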

A Google Sheet Add-on for Biodiversity Data Standardization and Sharing

Biodiversity Information Science and Standards ◽

10.3897/biss.4.59228 ◽

2020 ◽

Vol 4 ◽

Author(s):

José Augusto Salim ◽

Antonio Saraiva

Keyword(s):

Information Retrieval ◽

Data Sharing ◽

Information Science ◽

Data Sets ◽

Biodiversity Informatics ◽

Biodiversity Data ◽

Data Standardization ◽

Darwin Core ◽

REST API ◽

Biodiversity Information

For those biologists and biodiversity data managers who are unfamiliar with information science data practices of data standardization, the use of complex software to assist in the creation of standardized datasets can be a barrier to sharing data. Since the ratification of the Darwin Core Standard (DwC) (Darwin Core Task Group 2009) by the Biodiversity Information Standards (TDWG) in 2009, many datasets have been published and shared through a variety of data portals. In the early stages of biodiversity data sharing, the protocol Distributed Generic Information Retrieval (DiGIR), progenitor of DwC, and later the protocols BioCASe and TDWG Access Protocol for Information Retrieval (TAPIR) (De Giovanni et al. 2010) were introduced for discovery, search and retrieval of distributed data, simplifying data exchange between information systems. Although these protocols are still in use, they are known to be inefficient for transferring large amounts of data (GBIF 2017). Because of that, in 2011 the Global Biodiversity Information Facility (GBIF) introduced the Darwin Core Archive (DwC-A), which allows more efficient data transfer, and has become the preferred format for publishing data in the GBIF network. DwC-A is a structured collection of text files, which makes use of the DwC terms to produce a single, self-contained dataset. Many tools for assisting data sharing using DwC-A have been introduced, such as the Integrated Publishing Toolkit (IPT) (Robertson et al. 2014), the Darwin Core Archive Assistant (GBIF 2010) and the Darwin Core Archive Validator. Despite promoting and facilitating data sharing, many users have difficulties using such tools, mainly because of the lack of training in information science in the biodiversity curriculum (Convention on Biological Diversity 2012, Enke et al. 2012).
However, most users are very familiar with spreadsheets to store and organize their data, but the adoption of the available solutions requires data transformation and training in information science and, more specifically, biodiversity informatics. For an example of how spreadsheets can simplify data sharing see Stoev et al. (2016). In order to provide a more "familiar" approach to data sharing using DwC-A, we introduce a new tool as a Google Sheet Add-on. The Add-on, called Darwin Core Archive Assistant Add-on, can be installed in the user's Google Account from the G Suite MarketPlace and used in conjunction with the Google Sheets application. The Add-on assists the mapping of spreadsheet columns/fields to DwC terms (Fig. 1), similar to IPT, but with the advantage that it does not require the user to export the spreadsheet and import it into another software. Additionally, the Add-on facilitates the creation of a star schema in accordance with DwC-A, by the definition of a "CORE_ID" (e.g. occurrenceID, eventID, taxonID) field between sheets of a document (Fig. 2). The Add-on also provides an Ecological Metadata Language (EML) (Jones et al. 2019) editor (Fig. 3) with minimal fields to be filled in (i.e., mandatory fields required by IPT), and helps users to generate and share DwC-Archives stored in the user's Google Drive, which can be downloaded as a DwC-A or automatically uploaded to another public storage resource like a user's Zenodo Account (Fig. 4). We expect that the Google Sheet Add-on introduced here, in conjunction with IPT, will promote biodiversity data sharing in a standardized format, as it requires minimal training and simplifies the process of data sharing from the user's perspective, mainly for those users who are not familiar with IPT but have historically worked with spreadsheets.
Although the DwC-A generated by the add-on still needs to be published using IPT, it does provide a simpler interface (i.e., spreadsheet) for mapping data sets to DwC than IPT. Even though the IPT includes many more features than the Darwin Core Assistant Add-on, we expect that the Add-on can be a "starting point" for users unfamiliar with biodiversity informatics before they move on to more advanced data publishing tools. On the other hand, Zenodo integration allows users to share and cite their standardized data sets without publishing them via IPT, which can be useful for users without access to an IPT installation. Additionally, we are working on new features and future releases will include the automatic generation of Global Unique Identifiers for shared records, the possibility of adding additional data standards and DwC extensions, integration with GBIF REST API and with IPT REST API.
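The column mapping and star-schema linkage described above can be sketched in a few lines. This is not the Add-on's implementation, which the abstract does not show; the function names, spreadsheet headers and sample rows are hypothetical, while the target names (occurrenceID, scientificName, eventDate, measurementType, measurementValue) are existing Darwin Core terms.

```python
def map_columns(rows, mapping):
    """Rename spreadsheet headers to Darwin Core terms, dropping unmapped columns."""
    return [
        {mapping[header]: value for header, value in row.items() if header in mapping}
        for row in rows
    ]


def orphaned_extension_rows(core_rows, extension_rows, core_id="occurrenceID"):
    """Return extension rows whose core ID does not match any row in the core sheet."""
    core_ids = {row[core_id] for row in core_rows}
    return [row for row in extension_rows if row.get(core_id) not in core_ids]


# Hypothetical sheets: one core sheet of occurrences, one extension sheet of measurements.
occurrence_sheet = [
    {"ID": "occ-1", "Species": "Apis mellifera", "Date": "2020-03-01", "Notes": "x"},
    {"ID": "occ-2", "Species": "Bombus terrestris", "Date": "2020-03-02", "Notes": ""},
]
measurement_sheet = [
    {"ID": "occ-1", "Trait": "body length", "Value": "12"},
    {"ID": "occ-9", "Trait": "body length", "Value": "15"},  # no matching core row
]

occurrence_mapping = {"ID": "occurrenceID", "Species": "scientificName", "Date": "eventDate"}
measurement_mapping = {"ID": "occurrenceID", "Trait": "measurementType", "Value": "measurementValue"}

core = map_columns(occurrence_sheet, occurrence_mapping)
extension = map_columns(measurement_sheet, measurement_mapping)
orphans = orphaned_extension_rows(core, extension)

print(core[0])
print(orphans)
```

The shared core ID is what makes the sheets a star schema: every extension row points back to exactly one row in the core sheet, so checking for orphaned rows before packaging is a cheap sanity test.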

FinBIF: An all-embracing, integrated, cross-sectoral biodiversity data infrastructure

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37253 ◽

2019 ◽

Vol 3 ◽

Cited By ~ 1

Author(s):

Leif Schulman ◽

Aino Juslén ◽

Kari Lahti

Keyword(s):

Natural History ◽

Data Management ◽

Species Identification ◽

Large Scale ◽

DNA Barcode ◽

National Research Council ◽

Observation Data ◽

Biodiversity Data ◽

Research Infrastructures ◽

Biodiversity Information

The service model of the Global Biodiversity Information Facility (GBIF) is being implemented in an increasing number of national biodiversity (BD) data services. While GBIF already shares >10^9 data points, national initiatives are an essential component: increase in GBIF-mediated data relies on national data mobilisation and GBIF is not optimised to support local use. The Finnish Biodiversity Information Facility (FinBIF), initiated in 2012 and operational since late 2016, is one of the more recent examples of national BD research infrastructures (RIs) – and arguably among the most comprehensive. Here, we describe FinBIF’s development and service integration, and provide a model approach for the construction of all-inclusive national BD RIs. FinBIF integrates a wide array of BD RI approaches under the same umbrella. These include large-scale and multi-technology digitisation of natural history collections; building a national DNA barcode reference library and linking it to species occurrence data; citizen science platforms enabling recording, managing and sharing of observation data; management and sharing of restricted data among authorities; community-driven species identification support; an e-learning environment for species identification; and IUCN Red Listing (Fig. 1). FinBIF’s aims are to accelerate digitisation, mobilisation, and distribution of biodiversity data and to boost their use in research and education, environmental administration, and the private sector. The core functionalities of FinBIF were built in a 3.5-year project (01/2015–06/2018) by a consortium of four university-based natural history collection facilities led by the Finnish Museum of Natural History Luomus. Close to 30% of the total funding was granted through the Finnish Research Infrastructures programme (FIRI) governed by the national research council and based on scientific excellence.
Government funds for productivity enhancement in state administration covered c. 40% of the development and the rest was self-financed by the implementing consortium of organisations that have both a research and an education mission. The cross-sectoral scope of FinBIF has led to rapid uptake and a broad user base of its functionalities and services. Not only researchers but also administrative authorities, various enterprises and a large number of private citizens show a significant interest in the RI (Table 1). FinBIF is now in its second construction cycle (2019–2022), funded through the FIRI programme and, thus, focused on researcher services. The work programme includes integration of tools for data management in ecological restoration and e-Lab tools for spatial analyses, morphometric analysis of 3D images, species identification from sound recordings, and metagenomics analyses.

Outlook of Biodiversity Informatics in Benin: Main achievements

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37014 ◽

2019 ◽

Vol 3 ◽

Author(s):

Jean Ganglo

Keyword(s):

Climate Change ◽

Biodiversity Conservation ◽

Ecological Niche ◽

Regional Level ◽

Global Changes ◽

Biodiversity Informatics ◽

Biodiversity Data ◽

Master Program ◽

Occurrence Data ◽

Biodiversity Information

Benin became a member of the Global Biodiversity Information Facility (GBIF) in 2004 and acceded to the status of voting member in 2011. GBIF Benin, through the constant efforts of its node, is now very active in the GBIF community with respect to capacity building, data mobilization and data uses. GBIF Benin has published more than 400,000 occurrence records from about 125 datasets on the GBIF portal. As for capacity building, GBIF Benin organizes at least two workshops each year to enhance the capacities of national and regional partners in data mobilization and data uses. At the regional level, GBIF Benin is leading a consortium of many countries (Senegal, Côte-d’Ivoire, Niger, Democratic Republic of Congo, Guinea, Madagascar, etc.) to help overcome the challenges of data mobilization and data uses. From the academic year 2017-2018, GBIF Benin, through its node manager, successfully cooperated with the University of Kansas to create a master program in biodiversity informatics. Biodiversity informatics is a relatively new field of investigation in science and is concerned with the collection of massive occurrence data on biodiversity and on the environment, and with data treatment, analysis, and representation, so as to derive sound research products to inform decisions on biodiversity conservation and sustainable uses in the context of climate and global changes. In Benin, the master program in biodiversity informatics is a permanent two-year program structured in teaching units with the following contents: 1) Basic concepts of biodiversity; 2) Biodiversity data capture; 3) Biodiversity inventories; 4) Biodiversity data analysis; 5) Climate change and biodiversity; 6) Ecological niche modeling and strategies for biodiversity conservation; 7) Data-science-policy interface; 8) Public Health and Applications of biodiversity data, etc.
On completion of their studies, graduates of the program will be equipped to achieve the following objectives: 1) Use Geographic Information Systems to map the spatial distribution of species; 2) Model the current and the future ecological niche of species in the context of climate and global changes; 3) Characterize biodiversity on scales ranging from local to global; 4) Assess geographic patterns among suites of species (i.e., communities); 5) Refine the knowledge on particular taxonomic groups; 6) Define priority zones of biodiversity conservation; 7) Develop strategies of species conservation; 8) Implement biodiversity conservation strategies; 9) Predict the risks of propagation of infectious diseases (Lassa fever, Ebola fever, etc.) whose vectors are living organisms, so as to support preventive actions, etc. With such capacities, the graduates of the master program form a new generation of biodiversity information scientists who are able to address information needs so as to contribute to biodiversity conservation and its sustainable uses. Furthermore, in their respective countries and the rest of Africa, they will contribute to the achievement of the Sustainable Development Goals as defined by the United Nations in 2015. With respect to data uses, more and more research products are accumulating in Benin and are being integrated into the decision makers’ arena. In 2018, the results of our data uses were integrated in the elaboration of Benin’s second communication on climate change.

Data integration enables global biodiversity synthesis

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2018093118 ◽

2021 ◽

Vol 118 (6) ◽

pp. e2018093118

Author(s):

J. Mason Heberling ◽

Joseph T. Miller ◽

Daniel Noesgaard ◽

Scott B. Weingart ◽

Dmitry Schigel

Keyword(s):

Data Integration ◽

Species Interactions ◽

Large Scale ◽

Data Use ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Research Areas ◽

Global Biodiversity ◽

Biodiversity Information ◽

Global Data

The accessibility of global biodiversity information has surged in the past two decades, notably through widespread funding initiatives for museum specimen digitization and emergence of large-scale public participation in community science. Effective use of these data requires the integration of disconnected datasets, but the scientific impacts of consolidated biodiversity data networks have not yet been quantified. To determine whether data integration enables novel research, we carried out a quantitative text analysis and bibliographic synthesis of >4,000 studies published from 2003 to 2019 that use data mediated by the world’s largest biodiversity data network, the Global Biodiversity Information Facility (GBIF). Data available through GBIF increased 12-fold since 2007, a trend matched by global data use with roughly two publications using GBIF-mediated data per day in 2019. Data-use patterns were diverse by authorship, geographic extent, taxonomic group, and dataset type. Despite facilitating global authorship, legacies of colonial science remain. Studies involving species distribution modeling were most prevalent (31% of literature surveyed) but recently shifted in focus from theory to application. Topic prevalence was stable across the 17-y period for some research areas (e.g., macroecology), yet other topics proportionately declined (e.g., taxonomy) or increased (e.g., species interactions, disease). Although centered on biological subfields, GBIF-enabled research extends surprisingly across all major scientific disciplines. Biodiversity data mobilization through global data aggregation has enabled basic and applied research use at temporal, spatial, and taxonomic scales otherwise not possible, launching biodiversity sciences into a new era.

Sample data and training modules for cleaning biodiversity information

Biodiversity Informatics ◽

10.17161/bi.v13i0.7600 ◽

2018 ◽

Vol 13 ◽

pp. 49-50 ◽

Cited By ~ 4

Author(s):

Marlon E Cobos ◽

Laura Jiménez ◽

Claudia Nuñez-Penichet ◽

Daniel Romero-Alvarez ◽

Marianna Simoes

Keyword(s):

Large Scale ◽

Data Cleaning ◽

Empirical Models ◽

Ecological Niches ◽

Detailed Data ◽

Sample Data ◽

Crucial Information ◽

Training Modules ◽

And Training ◽

Biodiversity Information

Large-scale biodiversity databases have become crucial information sources in many analyses in biogeography, macroecology, and conservation biology, often involving development of empirical models of species’ ecological niches and predictions of their geographic distributions. These analyses, however, can be impaired by the presence of errors, particularly as regards taxonomic identifications and accurate geographic coordinates. Here, we present a detailed data-cleaning exercise based on two contrasting datasets; we link these example data with a step-by-step guide to overcoming these problems and improving data quality for analyses based on these data.
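To make the coordinate-related part of such cleaning concrete, here is a minimal sketch of a few widely used checks: missing values, out-of-range coordinates, the suspicious (0, 0) point, and exact duplicates. These checks are assumptions chosen for illustration and are not taken from the authors' step-by-step guide; the record keys and sample data are hypothetical, and taxonomic checks are not covered.

```python
def coordinate_flags(lat, lon):
    """Return a list of problem labels for one coordinate pair (empty if none)."""
    if lat is None or lon is None:
        return ["missing"]
    flags = []
    if not (-90 <= lat <= 90) or not (-180 <= lon <= 180):
        flags.append("out_of_range")
    if lat == 0 and lon == 0:
        # (0, 0) is a valid point, but often signals a placeholder value.
        flags.append("zero_zero")
    return flags


def clean_records(records):
    """Keep records with unflagged coordinates, dropping exact duplicates."""
    seen = set()
    kept = []
    for record in records:
        if coordinate_flags(record["lat"], record["lon"]):
            continue
        key = (record["species"], record["lat"], record["lon"])
        if key in seen:
            continue
        seen.add(key)
        kept.append(record)
    return kept


# Invented records with placeholder species names, for illustration only.
raw = [
    {"species": "sp_A", "lat": 12.5, "lon": -70.1},
    {"species": "sp_A", "lat": 12.5, "lon": -70.1},   # exact duplicate
    {"species": "sp_A", "lat": 0, "lon": 0},          # likely placeholder
    {"species": "sp_B", "lat": 95.0, "lon": 10.0},    # latitude out of range
    {"species": "sp_B", "lat": None, "lon": 10.0},    # missing latitude
    {"species": "sp_B", "lat": -3.2, "lon": 36.8},
]

cleaned = clean_records(raw)
print(len(raw), "->", len(cleaned))
```

Dropping flagged records is the simplest policy; in practice one might instead keep the flags alongside each record so that a curator can review borderline cases.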

Game of Tops: Trends in GBIF’s Community of Users

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37187 ◽

2019 ◽

Vol 3 ◽

Author(s):

Nora Escribano ◽

David Galicia ◽

Arturo H. Ariño

Keyword(s):

Information Exchange ◽

Full Range ◽

Open Data ◽

Biodiversity Informatics ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Research Areas ◽

Scientific Papers ◽

Opening Up ◽

Biodiversity Information

Building on the development of Biodiversity Informatics, the Global Biodiversity Information Facility (GBIF) undertook the task of enabling access to the world’s wealth of biodiversity data via the Internet. To date, GBIF has become, in many respects, the most extensive biodiversity information exchange infrastructure in the world, opening up a full range of possibilities for science. Science has benefited from such access to biodiversity data in research areas ranging from the effects of environmental change on biodiversity to the spread of invasive species, among many others. As of this writing, more than 7,000 published items (scientific papers, reviews, conference proceedings) have been indexed in the GBIF Secretariat’s literature tracking programme. On the basis of this database, we will represent trends in GBIF users’ behaviour over time regarding openness, social structure, and other features associated with such scientific production: what is the measurable impact of research using GBIF data? How is the GBIF community of users growing? Is the science made with, and enabled by, open data, actually open? Mapping GBIF users’ choices will show how biodiversity research is evolving through time, synthesising past and current priorities of this community in an attempt to forecast whether summer, or winter, is coming.

Biodiversity Portal of the Northern Part of West Siberia, Russia

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37067 ◽

2019 ◽

Vol 3 ◽

Author(s):

Nina Filippova ◽

Ilya Filippov ◽

Natalya Ivanova

Keyword(s):

Natural Resources ◽

West Siberia ◽

Database Management System ◽

Nature Reserves ◽

Biodiversity Informatics ◽

Biodiversity Data ◽

Biological Collections ◽

Using Data ◽

Different Sources ◽

Biodiversity Information

Biodiversity-related studies in the northern part of West Siberia are relatively recent, in line with intensive industrial development of the region in recent decades. The region possesses few biological collections within the universities and nature reserves. Still, the Department of Natural Resources pays considerable attention to the sustainable use of natural resources. On the global scale, the success of biodiversity informatics goals largely depends on local initiatives and progress in data mobilization and sharing. Therefore, organization of regional biodiversity portals is important to promote data mobilization, education and citizen science on the local scale. Previous experience with biodiversity information systems in the region was limited. The program on digitization of observations of Red Listed species was launched in 2010 with the support of the Department of Natural Resources of Yugra. The information system for Red Listed species registrations was developed through this project and currently includes about three thousand observations. Another example of digitization in Western Siberia was developed by the biological collection of Yugra State University. Its database is based on the database management system Specify and is available online through its web portal (http://bioportal.ugrasu.ru). Some collections of nature reserves have their catalogues in digital form. The need for biodiversity data mobilization is well understood and is discussed at regular workshops on biological collections management held in Khanty-Mansiysk. Recently, the biologists curating several biological collections in the region started a project to develop a regional biodiversity portal (https://nwsbios.org).
The portal has three major components: the database of collections based on Specify software (http://bioportal.ugrasu.ru), the metadata of different sources of biodiversity information in the region, and an educational platform for learning biodiversity informatics, using data published via GBIF and DwC standards. This initiative of biodiversity data mobilization in the region includes the organization of workshops, discussions and newsletters helping to reach potential data holders and coordinate work. Through this work, four different organizations from the Khanty-Mansi region have registered accounts in GBIF since 2019 and started uploading data to the GBIF portal. At present there are about 25,000 observations mobilized in GBIF from the Khanty-Mansi and Yamalo-Nenets regions. The integrated massive publishing of data in the portal will provide new opportunities for biodiversity research and sustainable management of natural resources in the northern part of West Siberia.
