DiSSCo, iDigBio and the Future of Global Collaboration

Author(s):  
Gil Nelson ◽  
Deborah L Paul

Integrated Digitized Biocollections (iDigBio) is the United States’ (US) national resource and coordinating center for biodiversity specimen digitization and mobilization. It was established in 2011 through the US National Science Foundation’s (NSF) Advancing Digitization of Biodiversity Collections (ADBC) program, an initiative that grew from a working group of museum-based and other biocollections professionals working in concert with NSF to make collections' specimen data accessible for science, education, and public consumption. The working group, Network Integrated Biocollections Alliance (NIBA), released two reports (Beach et al. 2010, American Institute of Biological Sciences 2013) that provided the foundation for iDigBio and ADBC. iDigBio is restricted in focus to the ingestion of data generated by public, non-federal museum and academic collections. Its focus is on specimen-based (as opposed to observational) occurrence records. iDigBio currently serves about 118 million transcribed specimen-based records and 29 million specimen-based media records from approximately 1600 datasets. These digital objects have been contributed by about 700 collections representing nearly 400 institutions. iDigBio is the most comprehensive biodiversity data aggregator in the US. Currently, iDigBio, DiSSCo (Distributed System of Scientific Collections), GBIF (Global Biodiversity Information Facility), and the Atlas of Living Australia (ALA) are collaborating on a global framework to harmonize technologies towards standardizing and synchronizing ingestion strategies, data models and standards, cyberinfrastructure, APIs (application programming interfaces), specimen record identifiers, etc. in service to a developing consolidated global data product that can provide a common source for the world’s digital biodiversity data.
The collaboration strives to harness and combine the unique strengths of its partners in ways that ensure the individual needs of each partner’s constituencies are met, design pathways for accommodating existing and emerging aggregators, simultaneously strengthen and enhance access to the world’s biodiversity data, and underscore the scope and importance of worldwide biodiversity informatics activities. Collaborators will share technology strategies and outputs, align conceptual understandings, and establish and draw from an international knowledge base. These collaborators, along with Biodiversity Information Standards (TDWG), will join iDigBio and the Smithsonian National Museum of Natural History as they host Biodiversity 2020 in Washington, DC. Biodiversity 2020 will combine an international celebration of the worldwide progress made in biodiversity data accessibility in the 21st century with a biodiversity data conference that extends the life of Biodiversity Next. It will provide a venue for the GBIF governing board meeting, TDWG annual meeting, and the annual iDigBio Summit as well as three days of plenary and concurrent sessions focused on the present and future of biodiversity data generation, mobilization, and use.

Author(s):  
Gerald Guala

Biodiversity Information Serving Our Nation (BISON - bison.usgs.gov) is the US Node application for the Global Biodiversity Information Facility (GBIF) and the most comprehensive source of species occurrence data for the United States of America. It currently contains more than 460 million records and provides significant augmentation and integration of US occurrence data in terrestrial, marine and freshwater systems. Publicly released in 2013, BISON has generated a large community of stakeholders, who have passed on many questions over the years through email, presentations and other means. In this presentation, some of the most common questions will be addressed in detail. For example: why not all BISON data are in GBIF; how BISON differs from GBIF; what the relationship is between BISON and other US providers to GBIF; and what exact role the Integrated Taxonomic Information System (ITIS - www.itis.gov) plays in BISON.


2018 ◽  
Vol 2 ◽  
pp. e25642
Author(s):  
Annie Simpson

Biodiversity Information Serving our Nation - BISON (bison.usgs.gov) is the U.S. node to the Global Biodiversity Information Facility (gbif.org), containing more than 375 million documented locations for all species in the U.S. It is hosted by the United States Geological Survey (USGS) and includes a web site and application programming interface for apps and other websites to use for free. With this massive database one can see not only the 15 million records for nearly 10 thousand non-native species in the U.S. and its territories, but also their relationship to all of the other species in the country as well as their full national range. Leveraging this huge resource and its enterprise level cyberinfrastructure, USGS BISON staff have created a value-added feature by labeling non-native species records, even where contributing datasets have not provided such labels. Based on our ongoing four-year compilation of non-native species scientific names from the literature, specific examples will be shared about the ambiguity and evolution of terms that have been discovered, as they relate to invasiveness, impact, dispersal, and management. The idea of incorporating these terms into an invasive species extension to Darwin Core has been discussed by Biodiversity Information Standards (TDWG) working group participants since at least 2005. One roadblock to the implementation of this standard's extension has been the diverse terminology used to describe the characteristics of biological invasions, terminology which has evolved significantly over the past decade.


Author(s):  
Paula Zermoglio ◽  
Anabela Plos ◽  
Néstor Acosta ◽  
Leisy Amaya ◽  
Dairo Escobar ◽  
...  

Historically, some of the most successful biodiversity data sharing initiatives have been developed in North America, Europe, and Australia. In parallel, and driven by necessity, tools, practices and standards were shared across other communities. In the last decade, great efforts have been made by countries in other regions to join the biodiversity data network and share their data worldwide. Although knowledge, tools, and documentation are broadly distributed, language is the main constraint on their use, as most of them are available only in English. English may be the most spoken language worldwide (Eberhard et al. 2020), but it is not native to most of the population, including a sizable proportion of the United States (Ryan 2013). For instance, Spanish is listed as the second most spoken native language worldwide, after Mandarin Chinese (Eberhard et al. 2020). While recognizing that English is currently considered the “universal language” for scientifically-related activities, it has been pointed out that a large proportion of biodiversity scientific knowledge is not produced in English, and that language constitutes a barrier to sharing knowledge (Amano et al. 2016). Actions to overcome this have been called for, for example by the 2nd Global Biodiversity Informatics Conference (GBIC2) in its list of ambitions for supporting international collaboration (Hobern et al. 2019), but are still largely missing in the broad community. Language affects the understanding and use of biodiversity data standards and related documentation for the whole community, both English and non-English speakers. Our findings in the Latin American region suggest that the availability of materials in other languages, namely Spanish and Portuguese, would greatly benefit the region and improve our involvement in biodiversity data sharing.
Conversely, the English-speaking community would benefit from a better understanding of knowledge produced in languages other than English, allowing broader use of data from all regions. This work also constitutes a plea from the Latin American and the Spanish-speaking community at large to the Biodiversity Information Standards (TDWG) to explore and incorporate other languages, fostering understanding and thereby widening the use of TDWG standards in our region. We provide a list of people supporting the petition as Supplementary Material (Suppl. material 1). In the petition we also identify people (more than 60% of the signatories) who are willing to contribute to translating TDWG resources into Spanish. There is no single, best mechanism to move this initiative forward, but the approaches of some other initiatives (e.g., the Global Biodiversity Information Facility (GBIF) translators network) are being explored, weighing the resources needed from both the volunteers' and the management perspectives. We will present the different options for the community to evaluate and decide upon a suitable action plan.


Author(s):  
Gil Nelson ◽  
Talia Karim ◽  
Rosemary Gillespie ◽  
Jose Fortes ◽  
Douglas Jones

Over the last decade, the Integrated Digitized Biocollections (iDigBio) organization and the Advancing the Digitization of Biodiversity Collections (ADBC) grant program, both funded by the US National Science Foundation (NSF), have made large strides in the aggregation of pre-existing siloed digital collections data as well as the new digitization of previously dark collections data across the United States. The impact of iDigBio leadership in community engagement (e.g., through discipline-specific workshops and webinars) and data mobilization (e.g., aggregation assistance, portal development) is widespread across all collection types and sizes. Moreover, the funding model for the ADBC program, which required the development of digitization-based Thematic Collection Networks (TCNs), facilitated engagement and community building across collections, which previously often worked independently from one another or with a smaller group of institutions and/or collaborators. The attempt to create ever-growing biodiversity data aggregators to improve global research access to digital biodiversity data has made huge progress over the past decade and has resulted in increased availability of biodiversity data from fewer, larger data stores. It has also motivated unselfish collaboration between major aggregators in search of strategies for merging these data silos into a consolidated global data product.
We describe an ongoing collaboration between the Global Biodiversity Information Facility (GBIF), The Atlas of Living Australia (ALA), Integrated Digitized Biocollections (iDigBio), and the Distributed System of Scientific Collections (DiSSCo) to establish a global framework for integrating technologies, processes, standards, Application Programming Interfaces (APIs), ingestion, data, and data services, with the goal of building a well-documented linked system that relies on the various areas of expertise of the initial partners but with definitive pathways for incorporating new and existing entities as they desire or are developed. We use the case of paleontological data as an exemplar of the potential impact of this collaboration. The iDigBio Paleontology Digitization Working Group, which was originally created by iDigBio as part of their community engagement program, has continued to be an active and engaged community of data providers and end-users, organizing numerous workshops and webinars. Currently, working group members, in collaboration with iDigBio staff and developers, are examining issues specific to paleontologic data aggregation that were identified by data providers; they are also working on a series of best-practices guidelines for sharing paleontologic data that will ideally help to reduce the number of mistakes made by downstream data aggregation manipulations. The focus of the working group is, and has been, largely community driven and supported by iDigBio through the provision of virtual meeting space for participants and by hosting the group's wiki-page of resources. Additionally, iDigBio has been proactive in working with other digitization initiatives in the paleontologic community (e.g., Paleobiology Database) on projects such as ePANDDA (enhancing Paleontological and Neontological Data Discovery API), which seeks to link existing digital resources through API development.


Author(s):  
Raul Sierra-Alcocer ◽  
Christopher Stephens ◽  
Juan Barrios ◽  
Constantino González‐Salazar ◽  
Juan Carlos Salazar Carrillo ◽  
...  

SPECIES (Stephens et al. 2019) is a tool to explore spatial correlations in biodiversity occurrence databases. The main idea behind the SPECIES project is that the geographical correlations between the distributions of taxa records carry useful information. The problem, however, is that if we have thousands of species (Mexico's National System of Biodiversity Information has records of around 70,000 species) then we have millions of potential associations, and exploring them is far from easy. Our goal with SPECIES is to facilitate the discovery and application of meaningful relations hiding in our data. The main variables in SPECIES are the geographical distributions of species occurrence records. Other types of variables, like the climatic variables from WorldClim (Hijmans et al. 2005), are explanatory data that serve for modeling. The system offers two modes of analysis. In the first, the user defines a target species and a selection of species and abiotic variables; the system then computes the spatial correlations between the target species and each of the other species and abiotic variables. The request from the user can be as small as comparing one species to another, or as large as comparing one species to all the species in the database. A user may wonder, for example, which species are usual neighbors of the jaguar; this mode could help answer that question. The second mode of analysis gives a network perspective. In it, the user defines two groups of taxa (and/or environmental variables), and the output is a correlation network in which the weight of a link between two nodes represents the spatial correlation between the variables that the nodes represent. For example, one group of taxa could be hummingbirds (Trochilidae family) and the second flowers of the Lamiaceae family. This output would help the user analyze which pairs of hummingbird and flower are highly correlated in the database.
The SPECIES data architecture is optimized to support fast hypothesis prototyping and testing with the analysis of thousands of biotic and abiotic variables. It has a visualization web interface that presents descriptive results to the user at different levels of detail. The methodology in SPECIES is relatively simple: it partitions the geographical space with a regular grid and treats a species occurrence distribution as a present/not present boolean variable over the cells. Given two species (or one species and one abiotic variable), it measures whether the number of co-occurrences between the two is more (or less) than expected. If it is more than expected, this signals a positive relation, whereas if it is less, it is evidence of disjoint distributions. SPECIES provides an open web application programming interface (API) to request the computation of correlations and statistical dependencies between variables in the database. Users can create applications that consume this 'statistical web service' or use it directly to further analyze the results in frameworks like R or Python. The project includes an interactive web application that does exactly that: it requests analyses from the web service and lets the user experiment with and visually explore the results. We believe this approach can be used, on one side, to augment the services provided by data repositories and, on the other, to facilitate the creation of specialized applications that are clients of these services. This scheme supports big-data-driven research for users from a wide range of backgrounds, because end users need neither the technical know-how nor the infrastructure to handle large databases. Currently, SPECIES hosts all records from Mexico's National Biodiversity Information System (CONABIO 2018) and a subset of Global Biodiversity Information Facility data that covers the contiguous USA (GBIF.org 2018b) and Colombia (GBIF.org 2018a).
It also includes discretizations of environmental variables from WorldClim, from the Environmental Rasters for Ecological Modeling project (Title and Bemmels 2018), from CliMond (Kriticos et al. 2012), and topographic variables (USGS EROS Center 1997b, USGS EROS Center 1997a). The long-term plan, however, is to incrementally include more data, especially all data from the Global Biodiversity Information Facility. The code of the project is open source, and the repositories are available online (Front-end, Web Services Application Programming Interface, Database Building scripts). This presentation is a demonstration of SPECIES' functionality and its overall design.
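The grid-based methodology described in this abstract can be illustrated with a minimal Python sketch. It is not the SPECIES implementation: the function names, the 1° cell size, and the particular score (a binomial z-score comparing observed co-occurrences with the count expected if the two distributions were independent) are assumptions chosen for illustration, and the coordinates are hypothetical.

```python
import math


def grid_cell(lon, lat, cell_deg=1.0):
    """Map a coordinate to the (column, row) index of a regular grid cell."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


def occupied_cells(records, cell_deg=1.0):
    """Reduce occurrence records [(lon, lat), ...] to the set of cells
    where the species is present (a boolean variable over the grid)."""
    return {grid_cell(lon, lat, cell_deg) for lon, lat in records}


def cooccurrence_score(cells_a, cells_b, n_cells):
    """Compare observed co-occurrences with the number expected under
    independence (assumed scoring; not necessarily what SPECIES uses).

    Returns (observed, expected, score). A positive score means more
    co-occurrences than expected; a negative score means fewer.
    """
    n_a = len(cells_a)
    n_ab = len(cells_a & cells_b)
    p_b = len(cells_b) / n_cells      # chance that a random cell holds species B
    expected = n_a * p_b              # expected co-occurrences among A's cells
    spread = math.sqrt(n_a * p_b * (1 - p_b))
    score = (n_ab - expected) / spread if spread > 0 else 0.0
    return n_ab, expected, score


# Hypothetical records on a 10 x 10 grid of 1-degree cells (100 cells).
species_a = occupied_cells([(0.5, 0.5), (1.5, 0.5), (2.5, 0.5), (3.5, 0.5)])
species_b = occupied_cells([(0.2, 0.9), (1.1, 0.1), (2.9, 0.3), (7.5, 7.5)])
observed, expected, score = cooccurrence_score(species_a, species_b, n_cells=100)
print(observed, round(expected, 2), score > 0)
```

In this toy case the two species share three of the four cells occupied by the first species, far more than the fraction of a cell expected by chance, so the score is positive. A species confined to a cell the first one never occupies would yield a negative score.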


2016 ◽  
Vol 11 ◽  
Author(s):  
Alex Asase ◽  
A. Townsend Peterson

Providing comprehensive, informative, primary, research-grade biodiversity information represents an important focus of biodiversity informatics initiatives. Recent efforts within Ghana have digitized >90% of primary biodiversity data records associated with specimen sheets in Ghanaian herbaria; additional herbarium data are available from other institutions via biodiversity informatics initiatives such as the Global Biodiversity Information Facility. However, data on the plants of Ghana have not as yet been integrated and assessed to establish how complete site inventories are, so that appropriate levels of confidence can be applied. In this study, we assessed inventory completeness and identified gaps in current Digital Accessible Knowledge (DAK) of the plants of Ghana, to prioritize areas for future surveys and inventories. We evaluated the completeness of inventories at ½° spatial resolution using statistics that summarize inventory completeness, and characterized gaps in coverage in terms of geographic distance and climatic difference from well-documented sites across the country. The southwestern and southeastern parts of the country held many well-known grid cells; the largest spatial gaps were found in central and northern parts of the country. Climatic difference showed contrasting patterns, with a dramatic gap in coverage in central-northern Ghana. This study provides a detailed case study of how to prioritize new botanical surveys and inventories based on existing DAK.


2018 ◽  
Vol 374 (1763) ◽  
pp. 20170391 ◽  
Author(s):  
Gil Nelson ◽  
Shari Ellis

The first two decades of the twenty-first century have seen a rapid rise in the mobilization of digital biodiversity data. This has thrust natural history museums into the forefront of biodiversity research, underscoring their central role in the modern scientific enterprise. The advent of mobilization initiatives such as the United States National Science Foundation's Advancing Digitization of Biodiversity Collections (ADBC), Australia's Atlas of Living Australia (ALA), Mexico's National Commission for the Knowledge and Use of Biodiversity (CONABIO), Brazil's Centro de Referência em Informação Ambiental (CRIA) and China's National Specimen Information Infrastructure (NSII) has led to a rapid rise in data aggregators and an exponential increase in digital data for scientific research, data that arguably provide the best evidence of where species live. The international Global Biodiversity Information Facility (GBIF) now serves about 131 million museum specimen records, and Integrated Digitized Biocollections (iDigBio) in the USA has amassed more than 115 million. These resources expose collections to a wider audience of researchers, provide the best biodiversity data in the modern era outside of nature itself and ensure the primacy of specimen-based research. Here, we provide a brief history of worldwide data mobilization, their impact on biodiversity research, challenges for ensuring data quality, their contribution to scientific publications and evidence of the rising profiles of natural history collections. This article is part of the theme issue ‘Biological collections for understanding biodiversity in the Anthropocene’.


1996 ◽  
Vol 15 (5) ◽  
pp. 436-444
Author(s):  
Steven D. Brynes ◽  
Richard H. Teske

The Free Trade Agreement between the United States and Canada (FTA) went into effect January 1, 1989. To implement certain provisions of the agreement on technical regulations and standards, the United States Center for Veterinary Medicine, the Canadian Bureau of Veterinary Drugs, and Agriculture Canada established the Working Group on Veterinary Drug Tolerances. The primary charges to the Working Group on Veterinary Drug Tolerances were (1) to harmonize the procedures used for evaluating new animal drugs, performing risk assessments and calculating tolerances, and (2) to harmonize the tolerances (or maximum residue levels, MRLs) for approved drugs, with the goal of having the same tolerances in each country. The first of these charges was met early in the negotiations. Both the US and Canada will use a 6-step evaluation procedure for the human food safety evaluation of new animal drugs. On September 29, 1990, Canada published a list of MRLs for 38 drugs that had been harmonized through the FTA. The progress of the working group and its continuing efforts to harmonize tolerances for approximately 15 other veterinary drugs will be discussed. This paper proposes use of the toxicologically determined acceptable daily intake (ADI) for the drug as the safety standard for reaching conclusions on the acceptability of residues in meat for human consumption. Specifically, the ‘equivalence’ of different MRLs for the same veterinary drug would be determined by considering whether they are likely to result in dietary residues that exceed the other country's ADI for the drug. Estimates are made for the veterinary drugs lasalocid and halofuginone hydrobromide. Based on these estimates, the US and Canadian MRLs for each drug would be considered ‘equivalent’ for trade purposes.
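The equivalence test proposed in this abstract amounts to a short piece of arithmetic, sketched below in Python. The abstract does not give the calculation, so everything here is an illustrative assumption: the function names, the two-tissue daily diet, the 60 kg body weight, and all MRL and ADI values are hypothetical and are not taken from the paper or from either country's regulations.

```python
def daily_intake_mg(mrls_mg_per_kg, diet_kg_per_day):
    """Estimated daily residue intake (mg) if every tissue sits at its MRL."""
    return sum(mrls_mg_per_kg[tissue] * diet_kg_per_day[tissue]
               for tissue in mrls_mg_per_kg)


def within_adi(mrls, diet, adi_mg_per_kg_bw, body_weight_kg):
    """True if the MRL-based intake does not exceed the ADI for this body weight."""
    return daily_intake_mg(mrls, diet) <= adi_mg_per_kg_bw * body_weight_kg


def mrls_equivalent(mrls_a, adi_a, mrls_b, adi_b, diet, body_weight_kg):
    """Treat two MRL sets as 'equivalent' when each stays within the
    other country's ADI (assumed reading of the proposal)."""
    return (within_adi(mrls_a, diet, adi_b, body_weight_kg)
            and within_adi(mrls_b, diet, adi_a, body_weight_kg))


# Hypothetical inputs: kg of each tissue eaten per day, MRLs in mg/kg,
# ADIs in mg per kg body weight per day.
diet = {"muscle": 0.3, "liver": 0.1}
mrls_country_a = {"muscle": 0.1, "liver": 0.5}
mrls_country_b = {"muscle": 0.2, "liver": 0.4}
adi_country_a = 0.005
adi_country_b = 0.002

equivalent = mrls_equivalent(mrls_country_a, adi_country_a,
                             mrls_country_b, adi_country_b,
                             diet, body_weight_kg=60)
print(equivalent)
```

With these made-up numbers, each country's MRLs imply an intake below the other country's ADI-based allowance, so the two MRL sets would count as equivalent; lowering either ADI far enough flips the result.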


2021 ◽  
Vol 9 ◽  
Author(s):  
Domingos Sandramo ◽  
Enrico Nicosia ◽  
Silvio Cianciullo ◽  
Bernardo Muatinte ◽  
Almeida Guissamulo

The collections of the Natural History Museum of Maputo have a crucial role in the safeguarding of Mozambique's biodiversity, representing an important repository of data and materials regarding the natural heritage of the country. In this paper, a dataset is described, based on the Museum’s Entomological Collection, recording 409 species belonging to seven orders and 48 families. Each specimen’s available data, such as geographical coordinates and taxonomic information, have been digitised to build the dataset. The specimens included in the dataset were obtained between 1914 and 2018 by collectors and researchers from the Natural History Museum of Maputo (once known as “Museu Álvaro de Castro”) in all the country’s provinces, with the exception of Cabo Delgado Province. This paper adds data to the Biodiversity Network of Mozambique and the Global Biodiversity Information Facility, within the objectives of the SECOSUD II Project and the Biodiversity Information for Development Programme. The aforementioned insect dataset is available on the GBIF Engine data portal (https://doi.org/10.15468/j8ikhb). Data were also shared on the Mozambican national portal of biodiversity data BioNoMo (https://bionomo.openscidata.org), developed by the SECOSUD II Project.

