The European Journal of Taxonomy: Enhancing taxonomic publications for dynamic data exchange and navigation

Author(s):  
Laurence Bénichou
Isabelle Gerard
Chloé Chester
Donat Agosti

The European Journal of Taxonomy (EJT) was initiated by a consortium of European natural history publishers to take advantage of the shift from paper to electronic-only publishing (Benichou et al. 2011). Whilst publishing in PDF format was originally considered the state of the art, it has recently become obvious that complementary dissemination channels help to disseminate taxonomic data - one of the pillars of natural history institutions' research - more widely and efficiently (Côtez et al. 2018). The adoption of semantic markup and the assignment of persistent identifiers to content allow more comprehensive citation of the article, including elements therein such as images, taxonomic treatments, and materials citations. It also allows more in-depth analyses and visualizations of the contribution of collections, authors, or specimens to taxonomic output, and enables third parties, such as the Global Biodiversity Information Facility, to reuse the data or build the Catalogue of Life. In this presentation, EJT will be used to outline the nature of natural history publishers and their technical set-up. This is followed by a description of the post-publication workflow using the Plazi workflow and dissemination via the Biodiversity Literature Repository (BLR) and TreatmentBank. It outlines switching the publishing workflow to an increased use of the Extensible Markup Language (XML) and visualization of the output, and concludes with publishing guidelines that enable more efficient text and data mining of the content of taxonomic publications.
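The markup-driven workflow described above is easiest to picture with a concrete, if heavily simplified, document. The sketch below uses an invented, reduced XML structure (not the actual TaxPub/TreatmentBank schema) to show how persistent identifiers attached to a treatment, its materials citations and figures could be pulled out for reuse; all element names, taxon names and identifiers are illustrative assumptions.

```python
# Minimal sketch: extracting persistent identifiers from a semantically
# marked-up taxonomic treatment. Element names, the placeholder taxon
# "Aus bus", and all identifiers are invented for illustration only;
# they do not reproduce the TaxPub/Plazi schema.
import xml.etree.ElementTree as ET

SAMPLE = """
<article doi="10.9999/ejt.example">
  <treatment id="treat-1" httpUri="https://treatment.plazi.org/id/EXAMPLE">
    <taxonName>Aus bus</taxonName>
    <materialsCitation id="mc-1" specimenCode="MHNM-ENT-000123"/>
    <figure id="fig-1" httpUri="https://doi.org/10.9999/zenodo.example"/>
  </treatment>
</article>
"""

root = ET.fromstring(SAMPLE)
print("Article DOI:", root.get("doi"))
for treatment in root.iter("treatment"):
    print("Treatment:", treatment.findtext("taxonName"), "->", treatment.get("httpUri"))
    for mc in treatment.iter("materialsCitation"):
        print("  Materials citation:", mc.get("specimenCode"))
    for fig in treatment.iter("figure"):
        print("  Figure:", fig.get("httpUri"))
```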

Author(s):  
Katharine Barker
Jonas Astrin
Gabriele Droege
Jonathan Coddington
Ole Seberg

Most successful research programs depend on easily accessible and standardized research infrastructures. Until recently, access to tissue or DNA samples with standardized metadata and of sufficiently high quality has been a major bottleneck for genomic research. The Global Genome Biodiversity Network (GGBN) fills this critical gap by offering standardized, legal access to samples. Presently, GGBN’s core activity is enabling access to searchable DNA and tissue collections across natural history museums and botanic gardens. Activities are gradually being expanded to encompass all kinds of biodiversity biobanks, such as culture collections, zoological gardens, aquaria, arboreta, and environmental biobanks. Broadly speaking, these collections all provide long-term storage and standardized public access to samples useful for molecular research. GGBN facilitates sample search and discovery for its distributed member collections through a single entry point. It stores standardized information on mostly geo-referenced, vouchered samples, their physical location, availability, quality, and the necessary legal information on over 50,000 species of Earth’s biodiversity, from unicellular to multicellular organisms. The GGBN Data Portal and the GGBN Data Standard are complementary to existing infrastructures such as the Global Biodiversity Information Facility (GBIF) and the International Nucleotide Sequence Database Collaboration (INSDC). Today, many well-known open-source collection management databases, such as Arctos, Specify, and Symbiota, are implementing the GGBN Data Standard. GGBN continues to increase its collections strategically, based on the needs of the research community, adding over 1.3 million online records in 2018 alone; today, two million sample records are available through GGBN. Together with the Consortium of European Taxonomic Facilities (CETAF), the Society for the Preservation of Natural History Collections (SPNHC), Biodiversity Information Standards (TDWG), and Synthesis of Systematic Resources (SYNTHESYS+), GGBN provides best practices for biorepositories on meeting the requirements of the Nagoya Protocol on Access and Benefit Sharing (ABS). Through collaboration with the Biodiversity Heritage Library (BHL), GGBN is exploring options for tagging publications that reference GGBN collections and associated specimens, made searchable through GGBN’s document library. Through its collaborative efforts, standards, and best practices, GGBN aims to facilitate trust and transparency in the use of genetic resources.
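As a rough illustration of what "standardized information on vouchered samples" means in practice, the sketch below assembles a single sample record combining Darwin Core-style specimen fields with GGBN-style sample and permit fields. The field names are illustrative stand-ins, not the normative GGBN Data Standard vocabulary, and all values are invented.

```python
# Minimal sketch of a standardized tissue-sample record as a plain mapping.
# Field names only approximate Darwin Core and GGBN-style terms; consult
# the published GGBN Data Standard for the normative vocabulary.
from typing import Any

sample_record: dict[str, Any] = {
    # Specimen / voucher context (Darwin Core-style terms)
    "scientificName": "Aus bus",            # placeholder taxon
    "catalogNumber": "NHM-2018-000123",      # hypothetical voucher number
    "decimalLatitude": -0.123,
    "decimalLongitude": 11.456,
    # Sample-level information (GGBN-style terms, illustrative)
    "materialSampleType": "tissue",
    "preservationType": "frozen at -80C",
    "quality": "DNA of PCR-amplifiable quality",
    # Legal / Nagoya Protocol context
    "permitStatus": "Permit available",
    "permitText": "Collected under hypothetical permit ABC-2018-42",
}

def is_searchable(record: dict[str, Any]) -> bool:
    """A record is discoverable through a portal only if minimal specimen,
    sample, and legal fields are all populated."""
    required = ("scientificName", "catalogNumber", "materialSampleType", "permitStatus")
    return all(record.get(field) for field in required)

print(is_searchable(sample_record))  # True
```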


2018, Vol. 2, pp. e25738
Author(s):  
Arturo Ariño
Daniel Noesgaard
Angel Hjarding
Dmitry Schigel

Standards set up by Biodiversity Information Standards (TDWG, formerly the Taxonomic Databases Working Group), initially developed as a way to share taxonomic data, greatly facilitated the establishment of the Global Biodiversity Information Facility (GBIF) as the largest index to digitally accessible primary biodiversity records (PBR) held by many institutions around the world. The level of detail and coverage of the body of standards that later became the Darwin Core terms enabled increasingly precise retrieval of relevant records, useful for increased digitally accessible knowledge (DAK) which, in turn, may have helped to answer ecologically relevant questions. After more than a decade of data accrual and release, an increasing number of papers and reports are citing GBIF either as a source of data or as a pointer to the original datasets. GBIF has curated a list of over 5,000 such citations, which were examined for content and tagged with additional keywords describing that content. The list now provides a window on what users want to accomplish using such DAK. We performed a preliminary word frequency analysis of this literature, starting with the titles of works that refer to GBIF as a resource. Through standardization and mapping of terms, we examined how the facility-enabled data seem to have been used by scientists and other practitioners through time: which concepts and issues are pervasive, which taxon groups are mostly addressed, and whether data concentrate around specific geographical or biogeographical regions. We hoped to cast light on which types of ecological problems the community believes are amenable to study through the judicious use of this data commons, and found that, indeed, a few themes were distinctly more frequently mentioned than others. Among those, generally perceived issues such as climate change and its effect on biodiversity at global and regional scales seemed prevalent. Taxonomic groups were also unevenly mentioned, with birds and plants being the most frequently named. However, the entire list of potential subjects that might have used GBIF-enabled data is now quite wide, showing that the availability of well-structured data has spawned a widening spectrum of possible use cases. Among them, some enjoy early and continuous presence (e.g. species, biodiversity, climate) while others have started to show up only later, once a critical mass of data seemed to have been attained (e.g. ecosystems, suitability, endemism). Biodiversity information in the form of standards-compliant DAK may thus already have become a commodity enabling insight into an increasingly complex and diverse body of science. Paraphrasing Tennyson, more things were wrought by data than TDWG dreamt of.
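The title-level word frequency analysis described above can be reproduced in outline with a few lines of code. The sketch below is a generic illustration, assuming the curated citation list has been exported with a "title" column; the file name, column name, stopword list, and output formatting are assumptions, not the authors' actual pipeline.

```python
# Minimal sketch of a title word-frequency analysis over a curated citation
# list. The file name, column name, and stopword list are illustrative.
import csv
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "in", "on", "for", "to", "using", "with"}

def tokenize(title: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stopwords and short tokens."""
    words = re.findall(r"[a-z]+", title.lower())
    return [w for w in words if w not in STOPWORDS and len(w) > 2]

counts: Counter[str] = Counter()
with open("gbif_citations.csv", newline="", encoding="utf-8") as fh:  # hypothetical export
    for row in csv.DictReader(fh):
        counts.update(tokenize(row["title"]))

# Most pervasive terms across titles (actual counts depend on the export).
for term, n in counts.most_common(20):
    print(f"{term}\t{n}")
```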


2021, Vol. 9
Author(s):  
Domingos Sandramo
Enrico Nicosia
Silvio Cianciullo
Bernardo Muatinte
Almeida Guissamulo

The collections of the Natural History Museum of Maputo have a crucial role in safeguarding Mozambique's biodiversity, representing an important repository of data and materials regarding the natural heritage of the country. In this paper, a dataset is described, based on the Museum’s Entomological Collection, recording 409 species belonging to seven orders and 48 families. Each specimen’s available data, such as geographical coordinates and taxonomic information, have been digitised to build the dataset. The specimens included in the dataset were obtained between 1914 and 2018 by collectors and researchers from the Natural History Museum of Maputo (once known as the “Museu Álvaro de Castro”) in all the country’s provinces, with the exception of Cabo Delgado Province. This paper adds data to the Biodiversity Network of Mozambique and the Global Biodiversity Information Facility, within the objectives of the SECOSUD II Project and the Biodiversity Information for Development Programme. The insect dataset is available on the GBIF data portal (https://doi.org/10.15468/j8ikhb). Data were also shared on BioNoMo (https://bionomo.openscidata.org), the Mozambican national portal of biodiversity data developed by the SECOSUD II Project.
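For readers who want to pull the records themselves, a minimal sketch of retrieving a first page of occurrences from this dataset through the public GBIF API follows. The dataset-by-DOI lookup path and the shape of its response are assumptions; if they differ, the datasetKey can be copied manually from the dataset page reached via https://doi.org/10.15468/j8ikhb.

```python
# Minimal sketch: fetching records of the Maputo entomological dataset from
# the GBIF API. The DOI-to-datasetKey lookup path is an assumption; the
# occurrence search endpoint and its datasetKey parameter are standard.
import requests

API = "https://api.gbif.org/v1"
DOI = "10.15468/j8ikhb"

# Assumed lookup: resolve the dataset DOI to its GBIF datasetKey.
lookup = requests.get(f"{API}/dataset/doi/{DOI}", timeout=30).json()
dataset_key = lookup["results"][0]["key"]

# Fetch a first batch of occurrence records for that dataset.
resp = requests.get(
    f"{API}/occurrence/search",
    params={"datasetKey": dataset_key, "limit": 50},
    timeout=30,
).json()
print("Total records:", resp["count"])
for rec in resp["results"][:5]:
    print(rec.get("scientificName"), rec.get("order"), rec.get("year"))
```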


Author(s):  
Marie-Elise Lecoq
Anne-Sophie Archambeau
Rui Figueira
David Martin
Sophie Pamerlon
...  

The power and configurability of the Atlas of Living Australia (ALA) tools have enabled more and more institutions and participants of the Global Biodiversity Information Facility to adapt and install biodiversity platforms. Over the past six years, we have demonstrated that a community around this platform was needed and ready for adoption. During the symposium organized for SPNHC+TDWG 2018, we started a discussion that has led to the creation of a more structured and sustainable community of practice. We want to create a community that follows the structure of open-source technical projects such as the Linux or Apache foundations. After the GBIF Governing Board meeting (GB25), the Kilkenny accord was agreed among eight country and institutional partners and early adopters of the ALA platform to outline the scope of the new Living Atlases community. Thanks to this accord, we have begun to set up a new structure based on the Community of Practice (CoP) model. In summary, governance will be provided by a Management Committee and a Technical Advisory Committee. In addition, the Living Atlases community will have two coordinators with technical and administrative duties. This presentation will briefly summarise the community history leading up to the agreement of the Kilkenny accord and provide information and context on the key points it contains. We will then present and launch the new Living Atlases Community of Practice. Through this presentation, we aim to collect lessons learned and good practices from other CoPs on topics such as governance, communications, and sustainability, in order to incorporate them into the consolidation process of the Living Atlases community.


Author(s):  
Matt Woodburn
Sarah Vincent
Helen Hardy
Clare Valentine

The natural science collections community has identified an increasing need for shared, structured and interoperable data standards that can be used to describe the totality of institutional collection holdings, whether digitised or not. Major international initiatives - including the Global Biodiversity Information Facility (GBIF), the Distributed System of Scientific Collections (DiSSCo) and the Consortium of European Taxonomic Facilities (CETAF) - consider the current lack of standards to be a major barrier, which must be overcome to further their strategic aims and contribute to an open, discoverable catalogue of global collections. The Biodiversity Information Standards (TDWG) Collection Descriptions (CD) group is looking to address this issue with a new data standard for collection descriptions. At an institutional level, this concept of collection descriptions aligns strongly with the need to use a structured and more data-driven approach to assessing and working with collections, both to identify and prioritise investment and effort, and to monitor the impact of the work. Use cases include planning conservation and collection moves, prioritising specimen digitisation activities, and informing collection development strategy. The data can be integrated with the collection description framework for ongoing assessments of the state of the collection. This approach was pioneered with the ‘Move the Dots’ methodology by the Smithsonian National Museum of Natural History, started in 2009 and run annually since. The collection is broken down into several hundred discrete subcollections, for each of which the number of objects was estimated and a numeric rank allocated according to a range of assessment criteria. This method has since been adopted by several other institutions, including Naturalis Biodiversity Centre, Museum für Naturkunde and Natural History Museum, London (NHM). First piloted in 2016, and now implemented as a core framework, the NHM’s adaptation, ‘Join the Dots’, divides the collection into approximately 2,600 ‘collection units’. The breakdown uses formal controlled lists and hierarchies, primarily taxonomy, type of object, storage location and (where relevant) stratigraphy, which are mapped to external authorities such as the Catalogue of Life and Paleobiology Database. The collection breakdown is enhanced with estimations of number of items, and ranks from 1 to 5 for each collection unit against 17 different criteria. These are grouped into four categories of ‘Condition’, ‘Information’ (including digital records), ‘Importance and Significance’ and ‘Outreach’. Although requiring significant time investment from collections staff to provide the estimates and assessments, this methodology has yielded a rich dataset that supports both discoverability (collection descriptions) and management (collection assessment). Links to further datasets about the building infrastructure and environmental conditions also make it into a powerful resource for planning activities such as collections moves, pest monitoring and building work. We have developed dynamic dashboards to provide rich visualisations for exploring, analysing and communicating the data. As an ongoing, embedded activity for collections staff, there will also be a build-up of historical data going forward, enabling us to see trends, track changes to the collection, and measure the impact of projects and events. 
The concept of Join the Dots also offers a generic, institution-agnostic model for enhancing the collection description framework with additional metrics that add value for strategic management and resourcing of the collection. In the design and implementation, we have faced challenges that should be highly relevant to the TDWG CD group, such as managing the dynamic breakdown of collections across multiple dimensions. We also face some that are yet to be resolved, such as a robust model for managing the evolving dataset over time. We intend to contribute these use cases to the development of the new TDWG data standard and to be an early adopter and reference case. We envisage that this could constitute a common model that, where resources are available, provides the ability to add greater depth and utility to the world catalogue of collections.
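A rough data-model sketch may help make the "collection unit" idea concrete. The structure below is a simplified, hypothetical rendering of a Join the Dots-style assessment record (unit breakdown, item estimate, and 1-5 ranks against criteria grouped into the four categories); the specific criterion names, unit and values are invented, not the NHM's actual 17 criteria.

```python
# Simplified sketch of a "collection unit" assessment record in the spirit
# of Join the Dots. Criterion names and example values are illustrative.
from dataclasses import dataclass, field
from statistics import mean

CATEGORIES = {
    "Condition": ["physical_condition"],
    "Information": ["label_data_quality", "digital_record_coverage"],
    "Importance and Significance": ["type_specimen_holdings"],
    "Outreach": ["exhibition_potential"],
}

@dataclass
class CollectionUnit:
    name: str
    taxonomy: str
    storage_location: str
    estimated_items: int
    scores: dict[str, int] = field(default_factory=dict)  # criterion -> rank 1-5

    def category_score(self, category: str) -> float:
        """Average rank across the criteria belonging to one category."""
        criteria = CATEGORIES[category]
        return mean(self.scores[c] for c in criteria if c in self.scores)

unit = CollectionUnit(
    name="Herbarium: Rosaceae",        # hypothetical collection unit
    taxonomy="Plantae > Rosaceae",
    storage_location="Building 2, Floor 3",  # hypothetical location
    estimated_items=120_000,
    scores={
        "physical_condition": 4,
        "label_data_quality": 3,
        "digital_record_coverage": 2,
        "type_specimen_holdings": 5,
        "exhibition_potential": 3,
    },
)
print(unit.category_score("Information"))  # 2.5
```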


Author(s):  
David Shorthouse
Jocelyn Pender
Richard Rabeler
James Macklin

A discussion session held at a National Science Foundation-sponsored Herbarium Networks Workshop at Michigan State University in September 2004 resulted in a rallying objective: make all botanical specimen information in United States collections available online by 2020. Rabeler and Macklin (2006) outlined a toolkit for realizing this ambitious goal, which included a review of relevant and state-of-the-art web resources, data exchange standards, and mechanisms to maximize efficiencies while minimizing costs. Given that we are now in the year 2020, it seems appropriate to examine the progress towards the objective of making all US botanical specimen collections data available online. Our presentation will attempt to answer several questions: How close have we come to meeting the original objective? What fraction of “digitized” specimens are minimally represented by a catalog number, a determination, and/or a photograph? What fraction has been thoroughly transcribed? How close have we come to attaining a seamlessly integrated, comprehensive, and national view of botanical specimen data that guides a stakeholder to appropriate resources regardless of their entry point? What “holes” in this effort still exist and what might be required to fill them? Given our interest in the success of both the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio), as well as the overwhelming likelihood that one of these initiatives is the usual entry point for someone seeking US-based botanical data, we approached the answers to the above questions by first crafting a repeatable data download and processing workflow in early July 2020. This resulted in 25.6M records of plants, fungi, and Chromista from 216 datasets available through GBIF and 32.8M comparable records available through iDigBio from 525 recordsets. We attempted to align these seemingly discordant sets of records and also chose Darwin Core terms that were best suited to match the four hierarchical levels of digitization defined in the Minimum Information about a Digital Specimen (MIDS) specification (van Egmond et al. 2019). During the analysis and comparison of the datasets, we found several examples where the number of data records from an institution seemed much lower than expected. From a combination of analyzing record content in GBIF/iDigBio and consulting regional/taxonomic portals, it became evident that, besides datasets only being included in either GBIF or iDigBio, there was a significant number of records in regional/taxonomic portals that were not yet made available through either GBIF or iDigBio. Progress on digitization has benefited greatly from the US National Science Foundation's creation of the Advancing Digitization of Biodiversity Collections (ADBC) program and its funding of the 15 Thematic Collections Networks (TCNs).
The launching of new projects and the ensuing digitization of herbarium collections have led to a multitude of new specimen portals and the enhancement of existing software such as Symbiota (Gries et al. 2014). However, it has also led to insufficient data sharing among projects and inadequately aligned data synchronization practices between aggregators. Consistency in data availability and quality between GBIF and iDigBio is low, and the chronic lack of record-level identifiers consistently restricts the flow of enhancements made to records. We conclude that there remains substantial work to be done on the national infrastructure and on international best practices to help facilitate collaboration and to realize the original objective of making all US botanical specimen collections data available online.
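To make the idea of matching Darwin Core terms to hierarchical digitization levels concrete, here is a generic sketch that assigns a record to the highest level whose required terms (and those of all lower levels) are populated. The specific term-to-level mapping and the example record are invented illustrations, not the authors' actual mapping or the normative MIDS definitions.

```python
# Illustrative sketch: binning Darwin Core records into hierarchical
# digitization levels. The mapping below is an assumption for the example,
# not the published MIDS level definitions.
LEVELS = [
    ("Level 0: bare record", ["catalogNumber", "institutionCode"]),
    ("Level 1: basic", ["scientificName"]),
    ("Level 2: regular", ["recordedBy", "eventDate", "country"]),
    ("Level 3: extended", ["decimalLatitude", "decimalLongitude", "associatedMedia"]),
]

def digitization_level(record: dict) -> str:
    """Return the highest level whose terms, and all lower levels' terms, are filled."""
    achieved = "below Level 0"
    for name, terms in LEVELS:
        if all(record.get(t) not in (None, "") for t in terms):
            achieved = name
        else:
            break
    return achieved

example = {
    "catalogNumber": "MSC0123456",   # hypothetical specimen record
    "institutionCode": "MSC",
    "scientificName": "Aus bus",      # placeholder taxon
    "recordedBy": "A. Collector",
    "eventDate": "1987-06-01",
    "country": "United States",
}
print(digitization_level(example))  # Level 2: regular
```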


2018, Vol. 2, pp. e26060
Author(s):  
Pamela Soltis

Digitized natural history data are enabling a broad range of innovative studies of biodiversity. Large-scale data aggregators such as the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio) provide easy, global access to millions of specimen records contributed by thousands of collections. A developing community of eager users of specimen data – whether locality, image, trait, etc. – is perhaps unaware of the effort and resources required to curate specimens, digitize information, capture images, mobilize records, serve the data, and maintain the infrastructure (human and cyber) to support all of these activities. Tracking of specimen information throughout the research process is needed to provide appropriate attribution to the institutions and staff that have supplied and served the records. Such tracking may also allow for annotation and comment on particular records or collections by the global community. Detailed data tracking is also required for open, reproducible science. Despite growing recognition of the value and need for thorough data tracking, both technical and sociological challenges continue to impede progress. In this talk, I will present a brief vision of how application of a DOI to each iteration of a data set in a typical research project could provide attribution to the provider, opportunity for comment and annotation of records, and the foundation for reproducible science based on natural history specimen records. Sociological change – such as journal requirements for data deposition of all iterations of a data set – can be accomplished using community meetings and workshops, along with editorial efforts, as were applied to DNA sequence data two decades ago.
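As a rough illustration of this vision, the sketch below models a chain of data set iterations, each carrying its own hypothetical DOI and a pointer to the iteration it was derived from, so that credit can be traced back to the original specimen records. The identifiers and workflow steps are invented, not part of any existing service.

```python
# Illustrative sketch: a chain of data set iterations, each with its own DOI
# and a link to its predecessor, so attribution and provenance can be traced
# back to the source specimen records. All identifiers here are invented.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DatasetIteration:
    doi: str                      # hypothetical DOI minted for this iteration
    description: str
    derived_from: Optional["DatasetIteration"] = None

raw = DatasetIteration("10.9999/example.raw", "Occurrence download from aggregator")
cleaned = DatasetIteration("10.9999/example.cleaned", "Georeferences validated", raw)
analysed = DatasetIteration("10.9999/example.final", "Records used in analysis", cleaned)

def provenance(iteration: DatasetIteration) -> list[str]:
    """Walk back through derived_from links, most recent iteration first."""
    chain = []
    node: Optional[DatasetIteration] = iteration
    while node is not None:
        chain.append(f"{node.doi}: {node.description}")
        node = node.derived_from
    return chain

print("\n".join(provenance(analysed)))
```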


Author(s):  
Donald Hobern
Deborah L Paul
Tim Robertson
Quentin Groom
Barbara Thiers
...  

Information about natural history collections helps to map the complex landscape of research resources and assists researchers in locating and contacting the holders of specimens. Collection records contribute to the development of a fully interlinked biodiversity knowledge graph (Page 2016), showcasing the existence and importance of museums and herbaria and supplying context to available data on specimens. These records also potentially open new avenues for fresh use of these collections and for accelerating their full availability online. A number of international (e.g., Index Herbariorum, GRSciColl), regional (e.g., DiSSCo and CETAF), national (e.g., ALA and the Living Atlases, iDigBio US Collections Catalog), and institutional networks (e.g., The Field Museum) separately document subsets of the world's collections, and the Biodiversity Information Standards (TDWG) Collection Descriptions Interest Group is actively developing standards to support information sharing on collections. However, these efforts do not yet combine to deliver a comprehensive and connected view of all collections globally. The Global Biodiversity Information Facility (GBIF) received funding as part of the European Commission-funded SYNTHESYS+ project to explore development of a roadmap towards delivering such a view, in part as a contribution towards the establishment of DiSSCo services within a global ecosystem of collection catalogues. Between 17 and 29 April 2020, a coordination team comprising international representatives from multiple networks ran Advancing the Catalogue of the World’s Natural History Collections, a fully online consultation using the GBIF Discourse forum platform to guide discussion around 26 consultation topics identified in an initial Ideas Paper (Hobern et al. 2020). Discussions included support for contributions in Spanish, Chinese and French and were summarised daily throughout the consultation. The consultation confirmed broad agreement around the needs and goals for a comprehensive catalogue of the world’s natural history collections, along with possible strategies to overcome the challenges. This presentation will summarise the results and recommendations.


Author(s):  
Elie Tobi
Geovanne Aymar Nziengui Djiembi
Anna Feistner
Donald Midoko Iponga
Jean Felicien Liwouwou
...  

Language is a major barrier for researchers wanting to digitize and publish collection data in Africa. Although French is the fifth most spoken language on Earth and the second most common in Africa, resources in French about digitization, data management, and publishing are lacking. Furthermore, French-speaking regions of Africa (primarily Central/West Africa and Madagascar) host some of the highest biodiversity on the continent and are therefore of great importance to scientists and decision-makers. Without representation in online portals like the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio), these important collections are effectively invisible. Producing relevant, applicable resources about digitization in French will help shine a light on these valuable natural history records and allow the data holders in Africa to retain autonomy over their collections. Awarded a GBIF-BID (Biodiversity Information for Development) grant in 2021, an international, multilingual network of partners has undertaken the important task of digitizing and mobilizing Gabon’s vertebrate collections. There are an estimated 13,500 vertebrate specimens housed in five institutions in different parts of Gabon. To date, the group has mobilized more than 4,600 vertebrate records to our recently launched Gabon Biodiversity Portal (https://gabonbiota.org/). The portal also hosts French guides for using Symbiota-based portals to manage, georeference, and publish natural history databases. These resources can provide much-needed guidance for other Francophone countries, in Africa and beyond, working to maximize the accessibility and value of their biodiversity collections.


Author(s):  
Elspeth Haston
Lorna Mitchell

The specimens held in natural history collections around the world are the direct result of the effort of thousands of people over hundreds of years. However, the way that the names of these people have been recorded within the collections has never been fully standardised, and this makes the process of correctly assigning an event relating to a specimen to an individual difficult at best, and impossible at worst. The events in which people are related to specimens include collecting, identifying, naming, loaning and owning. Whilst there are resources in the botanical community that hold information on many collectors and authors of plant names, the residual number of unknown people and the effort required to disambiguate them is daunting. Moreover, in many cases, the work carried out within a collection to disambiguate the names relating to specimens is not recorded and made available, generally due to the lack of a system to do so. This situation makes it extremely difficult to search for collections within the main aggregators, such as the Global Biodiversity Information Facility (GBIF), and severely hampers our ability to link collections both within and between institutes and disciplines. When we look at the benefits of linking collections and people, the need to agree on and implement a system for managing people's names becomes increasingly urgent.

