Biodiversity Information Services: A (not-so-) little knowledge that acts

2018 ◽  
Vol 2 ◽  
pp. e25738 ◽  
Author(s):  
Arturo Ariño ◽  
Daniel Noesgaard ◽  
Angel Hjarding ◽  
Dmitry Schigel

Standards set up by Biodiversity Information Standards-Taxonomic Databases Working Group (TDWG), initially developed as a way to share taxonomical data, greatly facilitated the establishment of the Global Biodiversity Information Facility (GBIF) as the largest index to digitally-accessible primary biodiversity information records (PBR) held by many institutions around the world. The level of detail and coverage of the body of standards that later became the Darwin Core terms enabled increasingly precise retrieval of relevant records useful for increased digitally-accessible knowledge (DAK) which, in turn, may have helped to solve ecologically-relevant questions. After more than a decade of data accrual and release, an increasing number of papers and reports are citing GBIF either as a source of data or as a pointer to the original datasets. GBIF has curated a list of over 5,000 citations that were examined for contents, and to which tags describing those contents were applied as additional keywords. The list now provides a window on what users want to accomplish using such DAK. We performed a preliminary word frequency analysis of the literature that refers to GBIF as a resource, starting with titles. Through a standardization and mapping of terms, we examined how the facility-enabled data seem to have been used by scientists and other practitioners through time: what concepts/issues are pervasive, which taxon groups are mostly addressed, and whether data concentrate around specific geographical or biogeographical regions. We hoped to cast light on which types of ecological problems the community believes are amenable to study through the judicious use of this data commons and found that, indeed, a few themes were distinctly more frequently mentioned than others. Among those, generally-perceived issues such as climate change and its effect on biodiversity at global and regional scales seemed prevalent.
The taxonomic groups were also unevenly mentioned, with birds and plants being the most frequently named. However, the entire list of potential subjects that might have used GBIF-enabled data is now quite wide, showing that the availability of well-structured data has spawned a widening spectrum of possible use cases. Among them, some enjoy early and continuous presence (e.g. species, biodiversity, climate) while others have started to show up only later, once a critical mass of data seemed to have been attained (e.g. ecosystems, suitability, endemism). Biodiversity information in the form of standards-compliant DAK may thus already have become a commodity enabling insight into an increasingly more complex and diverse body of science. Paraphrasing Tennyson, more things were wrought by data than TDWG dreamt of.
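The title-level word frequency analysis described above could be approached along the following lines. This is only a minimal sketch in Python, not the authors' method: the stop-word list, the function name `title_word_frequencies`, and the sample titles are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real analysis would need a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "under"}


def title_word_frequencies(titles):
    """Count lower-cased words across all titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts


# Made-up titles, for illustration only.
sample_titles = [
    "Climate change and bird species distributions",
    "Plant species richness under climate change",
]
frequencies = title_word_frequencies(sample_titles)
```

A mapping step that merges variants (for example plural and singular forms) would be needed before counts like these could be compared across years.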

2018 ◽  
Vol 2 ◽  
pp. e26369
Author(s):  
Michael Trizna

As rapid advances in sequencing technology result in more branches of the tree of life being illuminated, there has actually been a decrease in the percentage of sequence records that are backed by voucher specimens (Trizna 2018b). The good news is that there are tools (Trizna 2017, NCBI 2005, Biocode LLC 2014) that use well-databased museum vouchers to automatically validate and format specimen and collection metadata for high-quality sequence records. Another problem is that there are millions of existing sequence records that are known to contain either incorrect or incomplete specimen data. I will show an end-to-end example of sequencing specimens from a museum, depositing their sequence records in NCBI's (National Center for Biotechnology Information) GenBank database, and then providing updates to GenBank as the museum database revises identifications. I will also talk about linking records from specimen databases. Over one million records in the Global Biodiversity Information Facility (GBIF; Trizna 2018a) contain a value in the Darwin Core term "associatedSequences", and I will examine what is currently contained in these entries, and how best to format them to ensure that a tight connection is made to sequence records.


ZooKeys ◽  
2018 ◽  
Vol 751 ◽  
pp. 129-146 ◽  
Author(s):  
Robert Mesibov

A total of ca 800,000 occurrence records from the Australian Museum (AM), Museums Victoria (MV) and the New Zealand Arthropod Collection (NZAC) were audited for changes in selected Darwin Core fields after processing by the Atlas of Living Australia (ALA; for AM and MV records) and the Global Biodiversity Information Facility (GBIF; for AM, MV and NZAC records). Formal taxon names in the genus- and species-groups were changed in 13–21% of AM and MV records, depending on dataset and aggregator. There was little agreement between the two aggregators on processed names, with names changed in two to three times as many records by one aggregator alone compared to records with names changed by both aggregators. The type status of specimen records did not change with name changes, resulting in confusion as to the name with which a type was associated. Data losses of up to 100% were found after processing in some fields, apparently due to programming errors. The taxonomic usefulness of occurrence records could be improved if aggregators included both original and the processed taxonomic data items for each record. It is recommended that end-users check original and processed records for data loss and name replacements after processing by aggregators.
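The kind of check recommended above, comparing original and processed records for name replacements and data loss, might be sketched as follows. This is an illustrative Python sketch, not the audit procedure used in the study: the function name `audit_field_changes`, the record identifiers, and the sample values are assumptions, while `scientificName`, `typeStatus`, and `recordedBy` are Darwin Core terms.

```python
def audit_field_changes(original, processed, fields):
    """Per field, count records whose value changed or was lost in processing.

    Both arguments map a record ID to a dict of Darwin Core field values.
    """
    report = {field: {"changed": 0, "lost": 0} for field in fields}
    for record_id, original_record in original.items():
        processed_record = processed.get(record_id)
        if processed_record is None:
            continue  # record missing after processing; not counted here
        for field in fields:
            before = original_record.get(field, "")
            after = processed_record.get(field, "")
            if before and not after:
                report[field]["lost"] += 1
            elif before != after:
                report[field]["changed"] += 1
    return report


# Made-up records, for illustration only.
original = {
    "r1": {"scientificName": "Aus bus", "typeStatus": "holotype", "recordedBy": "A. Collector"},
    "r2": {"scientificName": "Cus dus", "typeStatus": "", "recordedBy": "B. Collector"},
}
processed = {
    "r1": {"scientificName": "Aus cus", "typeStatus": "holotype", "recordedBy": ""},
    "r2": {"scientificName": "Cus dus", "typeStatus": "", "recordedBy": "B. Collector"},
}
report = audit_field_changes(original, processed, ["scientificName", "typeStatus", "recordedBy"])
```

In this toy data the name of `r1` changes while its type status does not, which is the situation the abstract identifies as a source of confusion.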


Author(s):  
Azra Velagić-Hajrudinović

Featuring a large variety of ecosystems, abundant freshwater and forest resources, unique extensive karstic systems, and a high level of biodiversity and endemism, Southeast Europe (SEE) plays a crucial role in the conservation of biodiversity in Europe and beyond. In order to conserve and sustainably use these biodiversity assets and valuable natural resources, a regional concerted approach in the field of biodiversity information management and reporting (BIMR) has been strengthened. This has enabled improvement in access, transparency and exchange of biodiversity data and reporting processes among the participating economies. Significant and visible progress among SEE economies and stakeholders is due to the knowledge gained about regional and national BIMR baselines, agreed and elaborated minimum Convention on Biological Diversity (CBD) and European Union (EU) requirements on BIMR among stakeholders and implemented BIMR tools (e.g., a regionally unified fundamental database for the Information System for Nature Conservation (ISNC), for instance in Montenegro (http://zasticenapodrucja-cg.tk//en), Bosnia and Herzegovina/entity of Republika Srpska (http://e-priroda.rs.ba/en/) and entity of Federation of Bosnia and Herzegovina and North Macedonia (Standard Data Form - SDF application for NATURA 2000) and compiled dataset on five taxonomic groups of endemic taxa using the Darwin Core standard). Therefore, BIMR activities/priorities from the region have become more evident and supported along with ownership of BIMR tools acquired by the partner institutions and recognized at the global level through the Global Biodiversity Information Facility (GBIF).


Author(s):  
Marie-Elise Lecoq ◽  
Anne-Sophie Archambeau ◽  
Rui Figueira ◽  
David Martin ◽  
Sophie Pamerlon ◽  
...  

The power and configurability of the Atlas of Living Australia tools have enabled more and more institutions and participants of the Global Biodiversity Information Facility to adapt and install biodiversity platforms. For six years, we have demonstrated that a community around this platform was needed and ready for its adoption. During the symposium organized for SPNHC+TDWG 2018, we started a discussion that has led us to the creation of a more structured and sustainable community of practice. We want to create a community that follows the structure of open-source technical projects such as Linux or the Apache foundation. After the GBIF Governing Board (GB25), the Kilkenny accord was agreed among 8 country or institution partners and early adopters of the ALA platform to outline the scope of the new Living Atlases community. Thanks to this accord, we have begun to set up a new structure based on the Community of Practice (CoP) model. In summary, the governance will be held by a Management committee and a Technical advisory committee. In addition, the Living Atlases community will have two coordinators with technical and administrative duties. This presentation will briefly summarise the community history leading up to the agreement of the Kilkenny accord and provide information and context on the key points it contains. Then, we will present and launch the new Living Atlases Community of Practice. Through this presentation, we aim to collect lessons learned and good practices from other CoPs on topics such as governance, communications, and sustainability, to incorporate them in the consolidation process of the Living Atlases community.


Author(s):  
Alexander Zizka ◽  
Fernanda Antunes Carvalho ◽  
Alice Calvente ◽  
Mabel Rocio Baez-Lizarazo ◽  
Andressa Cabral ◽  
...  

Species occurrence records provide the basis for many biodiversity studies. They derive from georeferenced specimens deposited in natural history collections and visual observations, such as those obtained through various mobile applications. Given the rapid increase in availability of such data, the control of quality and accuracy constitutes a particular concern. Automatic filtering is a scalable and reproducible means to identify potentially problematic records and tailor datasets from public databases such as the Global Biodiversity Information Facility (GBIF; www.gbif.org), for biodiversity analyses. However, it is unclear how much data may be lost by filtering, whether the same filters should be applied across all taxonomic groups, and what the effect of filtering is on common downstream analyses. Here, we evaluate the effect of 13 recently proposed filters on the inference of species richness patterns and automated conservation assessments for 18 Neotropical taxa, including terrestrial and marine animals, fungi, and plants downloaded from GBIF. We find that a total of 44.3% of the records are potentially problematic, with large variation across taxonomic groups (25–90%). A small fraction of records was identified as erroneous in the strict sense (4.2%), and a much larger proportion as unfit for most downstream analyses (41.7%). Filters of duplicated information, collection year, and basis of record, as well as coordinates in urban areas, or for terrestrial taxa in the sea or marine taxa on land, have the greatest effect. Automated filtering can help in identifying problematic records, but requires customization of which tests and thresholds should be applied to the taxonomic group and geographic area under focus. Our results stress the importance of thorough recording and exploration of the meta-data associated with species records for biodiversity research.
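Three of the filters named above (duplicated information, collection year, and basis of record) can be illustrated with a short Python sketch. It is not the filter set evaluated in the study: the function name `flag_records`, the year threshold of 1945, the excluded basis-of-record value, and the sample records are assumptions chosen for illustration, which is exactly the kind of customization the abstract says each taxonomic group and region requires.

```python
def flag_records(records, min_year=1945, excluded_basis=("FOSSIL_SPECIMEN",)):
    """Return one list of flag names per record, in input order.

    min_year and excluded_basis are assumed thresholds, not recommended values.
    """
    seen = set()
    all_flags = []
    for record in records:
        flags = []
        # Duplicate: same species, coordinates, and year as an earlier record.
        key = (
            record["species"],
            record["decimalLatitude"],
            record["decimalLongitude"],
            record["year"],
        )
        if key in seen:
            flags.append("duplicate")
        seen.add(key)
        # Collection year: missing or older than the chosen threshold.
        if record["year"] is None or record["year"] < min_year:
            flags.append("collection_year")
        # Basis of record: types considered unsuitable for the analysis.
        if record["basisOfRecord"] in excluded_basis:
            flags.append("basis_of_record")
        all_flags.append(flags)
    return all_flags


# Made-up records, for illustration only.
records = [
    {"species": "Aus bus", "decimalLatitude": -3.1, "decimalLongitude": -60.0,
     "year": 2001, "basisOfRecord": "PRESERVED_SPECIMEN"},
    {"species": "Aus bus", "decimalLatitude": -3.1, "decimalLongitude": -60.0,
     "year": 2001, "basisOfRecord": "PRESERVED_SPECIMEN"},
    {"species": "Aus bus", "decimalLatitude": -5.0, "decimalLongitude": -61.2,
     "year": 1902, "basisOfRecord": "PRESERVED_SPECIMEN"},
    {"species": "Cus dus", "decimalLatitude": -10.4, "decimalLongitude": -48.3,
     "year": 1999, "basisOfRecord": "FOSSIL_SPECIMEN"},
]
flags = flag_records(records)
```

Returning flags rather than dropping records keeps the decision about what to exclude separate from the tests themselves, so the share of records lost to each filter can be reported.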


2020 ◽  
Vol 15 (4) ◽  
pp. 411-437 ◽  
Author(s):  
Marcos Zárate ◽  
Germán Braun ◽  
Pablo Fillottrani ◽  
Claudio Delrieux ◽  
Mirtha Lewis

Great progress in digitizing the world’s available Biodiversity and Biogeography data has been made recently, but managing data from many different providers and research domains remains a challenge. A review of the current landscape of metadata standards and ontologies in Biodiversity sciences suggests that existing standards, such as the Darwin Core terminology, are inadequate for describing Biodiversity data in a semantically meaningful and computationally useful way. As a contribution to fill this gap, we present an ontology-based system, called BiGe-Onto, designed to manage Biodiversity and Biogeography data together. As data sources, we use two internationally recognized repositories: the Global Biodiversity Information Facility (GBIF) and the Ocean Biogeographic Information System (OBIS). The BiGe-Onto system is composed of (i) the BiGe-Onto architecture, (ii) a conceptual model called BiGe-Onto specified in OntoUML, (iii) an operational version of BiGe-Onto encoded in OWL 2, and (iv) an integrated dataset for its exploitation through a SPARQL endpoint. We will show use cases that allow researchers to answer questions that draw on information from both domains.


2018 ◽  
Vol 2 ◽  
pp. e25642
Author(s):  
Annie Simpson

Biodiversity Information Serving our Nation - BISON (bison.usgs.gov) is the U.S. node to the Global Biodiversity Information Facility (gbif.org), containing more than 375 million documented locations for all species in the U.S. It is hosted by the United States Geological Survey (USGS) and includes a web site and application programming interface for apps and other websites to use for free. With this massive database one can see not only the 15 million records for nearly 10 thousand non-native species in the U.S. and its territories, but also their relationship to all of the other species in the country as well as their full national range. Leveraging this huge resource and its enterprise level cyberinfrastructure, USGS BISON staff have created a value-added feature by labeling non-native species records, even where contributing datasets have not provided such labels. Based on our ongoing four-year compilation of non-native species scientific names from the literature, specific examples will be shared about the ambiguity and evolution of terms that have been discovered, as they relate to invasiveness, impact, dispersal, and management. The idea of incorporating these terms into an invasive species extension to Darwin Core has been discussed by Biodiversity Information Standards (TDWG) working group participants since at least 2005. One roadblock to the implementation of this standard's extension has been the diverse terminology used to describe the characteristics of biological invasions, terminology which has evolved significantly over the past decade.


Author(s):  
Takeru Nakazato

DNA barcoding and environmental DNA (eDNA) are increasing the need for the utilization of gene sequences in the field of biodiversity. GBIF (Global Biodiversity Information Facility) and GGBN (Global Genome Biodiversity Network) are taking action on the treatment of gene sequences in the field of biodiversity (Finstad et al. 2020). Gene sequences have been collected and published by INSDC (International Nucleotide Sequence Database Collaboration) for over 30 years (Arita et al. 2020). Biodiversity information has been collected using standards such as Darwin Core (Wieczorek et al. 2012), but INSDC gene sequences are stored in their own format. In the field of bioinformatics, researchers are also organizing the BioHackathon series, notably the NBDC/DBCLS BioHackathon and the spin-off Biohackathon Europe, to standardize data through the Semantic Web (Garcia Castro et al. 2021, Vos et al. 2020), but the linkage with biodiversity information has just begun. In this study, as an example of linking gene sequence information with biodiversity information, I attempted to construct an infrastructure for knowledge extraction by utilising gene sequence entries derived from museum specimens from GenBank (Sayers et al. 2020). I have previously surveyed the BOLD (The Barcode of Life Data System) (Ratnasingham and Hebert 2007) IDs listed in GenBank (Nakazato 2020). I downloaded the fish and insect data from the GenBank FTP (file transfer protocol) site. Then I extracted the descriptions in the "specimen_voucher" field and obtained 749,627 (28% of the fish entries in GenBank) and 1,621,890 (13%) specimen IDs, respectively. I also extracted from the "note" field approximately 1000 entries describing the type of the specimen, such as "holotype", "lectotype", and "paratype". These extracts include descriptions written in natural language. NCBI (National Center for Biotechnology Information) publishes the BioCollections database (Sharma et al. 2019), and these data may be able to refine the description. In the future, I plan to map these extracted IDs to the collection IDs in the biodiversity information database. This will enable us to enrich the biodiversity information with GenBank descriptions, for example, by adding articles listed in GenBank as references to the specimen data.
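The extraction step described above, pulling "specimen_voucher" values and type terms from "note" fields, could look roughly like the following Python sketch. It is an assumption-laden illustration rather than the author's pipeline: it treats the input as GenBank flat-file text in which qualifiers appear as `/specimen_voucher="..."` and `/note="..."`, and the function names and the sample entry (including the organism name and voucher ID) are invented.

```python
import re

# Qualifier values are quoted; [^"]+ also spans values wrapped across lines.
VOUCHER_RE = re.compile(r'/specimen_voucher="([^"]+)"')
NOTE_RE = re.compile(r'/note="([^"]+)"')
TYPE_TERMS = ("holotype", "lectotype", "paratype")


def extract_vouchers(flatfile_text):
    """Return specimen_voucher values with internal whitespace normalized."""
    return [" ".join(value.split()) for value in VOUCHER_RE.findall(flatfile_text)]


def extract_type_terms(flatfile_text):
    """Return the type terms mentioned in note qualifiers, in order found."""
    found = []
    for note in NOTE_RE.findall(flatfile_text):
        lowered = note.lower()
        found.extend(term for term in TYPE_TERMS if term in lowered)
    return found


# Invented flat-file fragment, for illustration only.
sample_entry = '''     source          1..658
                     /organism="Examplus fictus"
                     /specimen_voucher="XYZ:Fish:12345"
                     /note="holotype"
'''
vouchers = extract_vouchers(sample_entry)
type_terms = extract_type_terms(sample_entry)
```

Because many of these fields hold free text, simple substring matching like this would only be a first pass before any mapping to collection IDs.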


Author(s):  
Michael Trizna ◽  
Torsten Dikow

Taxonomic revisions contain crucial biodiversity data in the material examined sections for each species. In entomology, material examined lists minimally include the collecting locality, date of collection, and the number of specimens of each collection event. Insect species might be represented in taxonomic revisions by only a single specimen or hundreds to thousands of specimens. Furthermore, revisions of insect genera might treat small genera with few species or include tens to hundreds of species. Summarizing data from such large and complex material examined lists and revisions is cumbersome, time-consuming, and prone to errors. However, providing data on the seasonal incidence, abundance, and collecting period of species is an important way to mobilize primary biodiversity data to understand a species’s occurrence or rarity. Here, we present SpOccSum (Species Occurrence Summary), a tool to easily obtain metrics of seasonal incidence from specimen occurrence data in taxonomic revisions. SpOccSum is written in Python (Python Software Foundation 2019) and accessible through the Anaconda Python/R Data Science Platform as a Jupyter Notebook (Kluyver et al. 2016). The tool takes a simple list of specimen data containing species name, locality, date of collection (preferably separated by day, month, and year), and number of specimens in CSV format and generates a series of tables and graphs summarizing: number of specimens per species, number of specimens collected per month, number of unique collection events, as well as the earliest and most recent collecting year of each species. The results can be exported as graphics or as CSV-formatted tables and can easily be included in manuscripts for publication.
An example of an early version of the summary produced by SpOccSum can be viewed in Tables 1, 2 from Markee and Dikow (2018). To accommodate seasonality in the Northern and Southern Hemispheres, users can choose to start the data display with either January or July. When geographic coordinates are available and species have widespread distributions spanning, for example, the equator, the user can itemize particular regions such as North of Tropic of Cancer (23.5˚N), Tropic of Cancer to the Equator, Equator to Tropic of Capricorn, and South of Tropic of Capricorn (23.5˚S). Other features currently in development include the ability to produce distribution maps from the provided data (when geographic coordinates are included) and the option to export specimen occurrence data as a Darwin-Core Archive ready for upload to the Global Biodiversity Information Facility (GBIF).
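The summary metrics listed in this abstract can be illustrated with a short Python sketch using only the standard library. This is not SpOccSum's code: the column names (`species`, `locality`, `day`, `month`, `year`, `count`), the function names, and the sample rows are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def summarize(csv_text):
    """Summarize specimens per month, collection events, and year range per species."""
    per_month = defaultdict(lambda: [0] * 12)
    events = defaultdict(set)
    years = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        species = row["species"]
        year = int(row["year"])
        per_month[species][int(row["month"]) - 1] += int(row["count"])
        # A collection event is taken here as a unique locality and date.
        events[species].add((row["locality"], row["day"], row["month"], row["year"]))
        first, last = years.get(species, (year, year))
        years[species] = (min(first, year), max(last, year))
    return {
        species: {
            "per_month": months,
            "total": sum(months),
            "events": len(events[species]),
            "first_year": years[species][0],
            "last_year": years[species][1],
        }
        for species, months in per_month.items()
    }


def start_in_july(months):
    """Reorder a January-first list of 12 monthly values to begin with July."""
    return months[6:] + months[:6]


# Made-up specimen list, for illustration only.
sample_csv = """species,locality,day,month,year,count
Aus bus,Site A,3,1,1990,2
Aus bus,Site A,3,1,1990,1
Aus bus,Site B,15,7,2005,4
"""
summary = summarize(sample_csv)
```

The `start_in_july` helper mirrors the option of starting the display in July for Southern Hemisphere seasonality.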


Author(s):  
Laurence Bénichou ◽  
Isabelle Gerard ◽  
Chloé Chester ◽  
Donat Agosti

The European Journal of Taxonomy (EJT) was initiated by a consortium of European natural history publishers to take advantage of the shift from paper to electronic-only publishing (Benichou et al. 2011). Whilst publishing in PDF format was originally considered the state of the art, it has recently become obvious that complementary dissemination channels help to disseminate taxonomic data, one of the pillars of natural history institutions' research, more widely and efficiently (Côtez et al. 2018). The adoption of semantic markup and the assignment of persistent identifiers for content allow more comprehensive citations of the article, including elements therein, such as images, taxonomic treatments, and materials citations. It also allows more in-depth analyses and visualization of the contribution of collections, authors, or specimens to taxonomic output, and allows third parties, such as the Global Biodiversity Information Facility, to reuse the data or build the Catalogue of Life. In this presentation, EJT will be used to outline the nature of natural history publishers and their technical setup. This is followed by a description of the post-publishing workflow using the Plazi workflow and dissemination via the Biodiversity Literature Repository (BLR) and TreatmentBank. It outlines switching the publishing workflow to an increased use of Extensible Markup Language (XML) and visualization of the output, and concludes with publishing guidelines that enable more efficient text and data mining of the content of taxonomic publications.

