Quantifying Institutional Reach Through the Human Network in Natural History Collections

Author(s):  
David Shorthouse ◽  
Roderic Page

Through the Bloodhound proof-of-concept (https://bloodhound-tracker.net), an international audience of collectors and determiners of natural history specimens are engaged in the emotive act of claiming their specimens and attributing other specimens to living and deceased mentors and colleagues. Behind the scenes, these claims build links between Open Researcher and Contributor Identifiers (ORCID, https://orcid.org) or Wikidata identifiers for people and Global Biodiversity Information Facility (GBIF) specimen identifiers, predicated by the Darwin Core terms recordedBy (collected) and identifiedBy (determined). Here we additionally describe the socio-technical challenge in unequivocally resolving people names in legacy specimen data and propose lightweight and reusable solutions. The unique identifiers for the affiliations of active researchers are obtained from ORCID, whereas the unique identifiers for institutions where specimens are actively curated are resolved through Wikidata. By constructing closed loops of links between person, specimen, and institution, an interesting suite of potential metrics emerges, all due to the activities of employees and their network of professional relationships. This approach balances a desire for individuals to receive formal recognition for their efforts in natural history collections with an institutional-level need to alter budgets in response to easily obtained numeric trends in national and international reach. If handled in a coordinated fashion, this reporting technique may be a significant new driver for specimen digitization efforts on par with Altmetric (https://www.altmetric.com), an important new tool that tracks the impact of publications and delights administrators and authors alike.
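The closed loop of person, specimen, and institution links described above can be illustrated with a minimal Python sketch. It is not the Bloodhound implementation: the `Attribution` record, the `institutional_reach` function, and the two lookup dictionaries are hypothetical names introduced here, and the metric shown (specimens held elsewhere that are linked to an institution's people) is only one plausible example of the "potential metrics" the abstract mentions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    """One claimed link between a person and a specimen (hypothetical record shape)."""
    person_id: str    # ORCID iD or Wikidata Q identifier
    predicate: str    # Darwin Core term: "recordedBy" or "identifiedBy"
    specimen_id: str  # GBIF specimen identifier


def institutional_reach(attributions, affiliation, holder):
    """Count distinct specimens held elsewhere that are linked to each institution's people.

    affiliation: person_id -> institution identifier (assumed to come from ORCID)
    holder: specimen_id -> curating institution identifier (assumed resolved via Wikidata)
    Returns: home institution -> {holding institution -> number of distinct specimens}
    """
    seen = defaultdict(lambda: defaultdict(set))
    for a in attributions:
        home = affiliation.get(a.person_id)
        held_at = holder.get(a.specimen_id)
        if home is None or held_at is None or home == held_at:
            continue  # skip unresolved links and specimens curated in-house
        seen[home][held_at].add(a.specimen_id)
    return {
        home: {inst: len(ids) for inst, ids in by_inst.items()}
        for home, by_inst in seen.items()
    }
```

Using a set per institution pair means a specimen that a person both collected and identified is counted once; whether that is the desired behaviour is a design choice this sketch assumes rather than one stated in the abstract.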

Author(s):  
Marcus De Almeida ◽  
Ângelo Pinto ◽  
Alcimar Carvalho

Natural history collections (NHC) are guardians of biodiversity (Lane 1996) and essential to understanding the natural world and its evolutionary processes. They hold samples of morphological and genetic heritages of living and extinct biotas, helping to reconstruct the timeline of life over the centuries (Gardner 2014). Primary data from specimens in NHC are crucial elements for research in many areas of biological sciences, considered the “bricks” of systematics and therefore one of the pillars for evolutionary studies (Troudet 2018). For this reason, studies carried out in NHC are essential for the development of scientific knowledge and are pivotal for the scientific-technological progress of a nation (Camargo 2015). The digitization and availability of primary data on biodiversity from NHC represent an inexpensive, practical and secure means of exchanging information, allowing collaboration between institutions and researchers. In this sense, initiatives such as the Sistema de Informação sobre a Biodiversidade Brasileira (SiBBr), a country-level branch of the Global Biodiversity Information Facility (GBIF) platform, aim to encourage and establish ways for the informatization of biological collections and their type specimens. Known for housing one of the largest and oldest collections of insects in the world focused on Neotropical fauna, the Entomological Collection of the Museu Nacional of Federal University of Rio de Janeiro (MNRJ) had more than 3,000 primary types and approximately 12,005,000 specimens, of which about 96% were lost in the tragic fire that occurred at the institution on September 2, 2018. The SiBBr project was active in that collection from 2016 to 2019 and enabled the digitization and preservation of data from the type material of many insect orders, including the charismatic dragonflies (order Odonata).
Due to the end of the agreement between SiBBr and the Museu Nacional, most of the obtained primary data are pending full curation and, therefore, are not yet available to the public and researchers. The MNRJ housed the biggest and most important collection of dragonflies among all Central and South American institutions. It assembled most of the physical records of neotropical dragonfly fauna gathered over the last 80 years, many of which are of undescribed taxa. Unfortunately, almost all material was permanently lost. This study aims to gather, analyze and publicize primary data of the type material of dragonflies housed in the MNRJ, ensuring the preservation of its history, as well as providing data on the taxonomy and diversity of this marvelous group of insects. A total of 11 families, 50 genera and 131 species were recorded, belonging to the suborders Anisoptera and Zygoptera with distributional records widespread in South America. The MNRJ housed 105 holotypes of dragonflies' nomina representing 11.7% of the richness of the Brazilian Odonata fauna (901 spp.), a country with the highest number of species of the biosphere. The impact of the loss of this collection to studies of these insects is unprecedented, since some enigmatic and monotypic genera such as Brasiliogomphus, Fluminagrion and Roppaneura lost 100% of their type series, while other, more diverse genera such as Lauromacromia, Oxyagrion and Neocordulia lost 50%, 35% and 31% of their holotypes, respectively. Therefore, due to the registration and preservation of primary biodiversity data, this work reiterates the importance of curating and digitizing biological scientific collections. Furthermore, it shows extreme relevance for preserving information on existing biodiversity permanently and providing support for future research. Digitizing and interconnecting digital extended specimen data prove to be one of the main and most effective ways to protect NHC heritage and their primary data against catastrophic events.


Author(s):  
Erica Krimmel ◽  
Austin Mast ◽  
Deborah Paul ◽  
Robert Bruhn ◽  
Nelson Rios ◽  
...  

Genomic evidence suggests that the causative virus of COVID-19 (SARS-CoV-2) was introduced to humans from horseshoe bats (family Rhinolophidae) (Andersen et al. 2020) and that species in this family as well as in the closely related Hipposideridae and Rhinonycteridae families are reservoirs of several SARS-like coronaviruses (Gouilh et al. 2011). Specimens collected over the past 400 years and curated by natural history collections around the world provide an essential reference as we work to understand the distributions, life histories, and evolutionary relationships of these bats and their viruses. While the importance of biodiversity specimens to emerging infectious disease research is clear, empowering disease researchers with specimen data is a relatively new goal for the collections community (DiEuliis et al. 2016). Recognizing this, a team from Florida State University is collaborating with partners at GEOLocate, Bionomia, University of Florida, the American Museum of Natural History, and Arizona State University to produce a deduplicated, georeferenced, vetted, and versioned data product of the world's specimens of horseshoe bats and relatives for researchers studying COVID-19. The project will serve as a model for future rapid data product deployments about biodiversity specimens. The project underscores the value of biodiversity data aggregators iDigBio and the Global Biodiversity Information Facility (GBIF), which are sources for 58,617 and 79,862 records, respectively, as of July 2020, of horseshoe bat and relative specimens held by over one hundred natural history collections. Although much of the specimen-based biodiversity data served by iDigBio and GBIF is high quality, it can be considered raw data and therefore often requires additional wrangling, standardizing, and enhancement to be fit for specific applications. 
The project will create efficiencies for the coronavirus research community by producing an enhanced, research-ready data product, which will be versioned and published through Zenodo, an open-access repository (see doi.org/10.5281/zenodo.3974999). In this talk, we highlight lessons learned from the initial phases of the project, including deduplicating specimen records, standardizing country information, and enhancing taxonomic information. We also report on our progress to date, related to enhancing information about agents (e.g., collectors or determiners) associated with these specimens, and to georeferencing specimen localities. We seek also to explore how much we can use the added agent information (i.e., ORCID iDs and Wikidata Q identifiers) to inform our georeferencing efforts and to support crediting those collecting and doing identifications. The project will georeference approximately one third of our specimen records, based on those lacking geospatial coordinates but containing textual locality descriptions. We furthermore provide an overview of our holistic approach to enhancing specimen records, which we hope will maximize the value of the bat specimens at the center of what has been recently termed the "extended specimen network" (Lendemer et al. 2020). The centrality of the physical specimen in the network reinforces the importance of archived materials for reproducible research. Recognizing this, we view the collections providing data to iDigBio and GBIF as essential partners, as we expect that they will be responsible for the long-term management of enhanced data associated with the physical specimens they curate. We hope that this project can provide a model for better facilitating the reintegration of enhanced data back into local specimen data management systems.


ZooKeys ◽  
2012 ◽  
Vol 209 ◽  
pp. 75-86 ◽  
Author(s):  
Riitta Tegelberg ◽  
Jaana Haapala ◽  
Tero Mononen ◽  
Mika Pajari ◽  
Hannu Saarenmaa

Digitarium is a joint initiative of the Finnish Museum of Natural History and the University of Eastern Finland. It was established in 2010 as a dedicated shop for the large-scale digitisation of natural history collections. Digitarium offers service packages based on the digitisation process, including tagging, imaging, data entry, georeferencing, filtering, and validation. During the process, all specimens are imaged, and distance workers take care of the data entry from the images. The customer receives the data in Darwin Core Archive format, as well as images of the specimens and their labels. Digitarium also offers the option of publishing images through Morphbank, sharing data through GBIF, and archiving data for long-term storage. Service packages can also be designed on demand to respond to the specific needs of the customer. The paper also discusses logistics, costs, and intellectual property rights (IPR) issues related to the work that Digitarium undertakes.


Author(s):  
Jeremy Miller ◽  
Donat Agosti ◽  
Marcus Guidoti ◽  
Francisco Andres Rivera Quiroz

Citing the specimens used to describe new species or augment existing taxa is integral to the scholarship of taxonomic and related biodiversity-oriented publications. These so-called material citations (Darwin Core Term MaterialCitation), linked to the natural history collections in which they are archived, are the mechanism by which readers may return to the source material upon which reported observations are based. This is integral to the scientific nature of the project of documenting global biodiversity. Material citation records typically contain such information as the location and date associated with the collection of a specimen, along with other data, and taxonomic identification. Thus, material citations are a key line of evidence for biodiversity informatics, along with other evidence classes such as database records of specimens archived in natural history collections, human observations not linked to specimens, and DNA sequences that may or may not be linked to a specimen. Natural history collections are not completely databased and records of some occurrences are only available as material citations. In other cases, material citations can be linked to the record of the physical specimen in a collections database. Taxonomic treatments, sections of publications documenting the features or distribution of a related group of organisms (Catapano 2019), may contain citations of DNA sequences, which can be linked to database records. There is potential for bidirectional linking that could contribute data elements or entire records to collections and DNA databases, based on content found in material citations. We compare material citations data to other major sources of biodiversity records (preserved specimens, human observations, and material samples). We present pilot project data that reconcile material citations with their database records, and track all material citations across the taxonomic history of a species.


Author(s):  
Matt Woodburn ◽  
Sarah Vincent ◽  
Helen Hardy ◽  
Clare Valentine

The natural science collections community has identified an increasing need for shared, structured and interoperable data standards that can be used to describe the totality of institutional collection holdings, whether digitised or not. Major international initiatives - including the Global Biodiversity Information Facility (GBIF), the Distributed System of Scientific Collections (DiSSCo) and the Consortium of European Taxonomic Facilities (CETAF) - consider the current lack of standards to be a major barrier, which must be overcome to further their strategic aims and contribute to an open, discoverable catalogue of global collections. The Biodiversity Information Standards (TDWG) Collection Descriptions (CD) group is looking to address this issue with a new data standard for collection descriptions. At an institutional level, this concept of collection descriptions aligns strongly with the need to use a structured and more data-driven approach to assessing and working with collections, both to identify and prioritise investment and effort, and to monitor the impact of the work. Use cases include planning conservation and collection moves, prioritising specimen digitisation activities, and informing collection development strategy. The data can be integrated with the collection description framework for ongoing assessments of the state of the collection. This approach was pioneered with the ‘Move the Dots’ methodology by the Smithsonian National Museum of Natural History, started in 2009 and run annually since. The collection is broken down into several hundred discrete subcollections, for each of which the number of objects was estimated and a numeric rank allocated according to a range of assessment criteria. This method has since been adopted by several other institutions, including Naturalis Biodiversity Centre, Museum für Naturkunde and Natural History Museum, London (NHM). 
First piloted in 2016, and now implemented as a core framework, the NHM’s adaptation, ‘Join the Dots’, divides the collection into approximately 2,600 ‘collection units’. The breakdown uses formal controlled lists and hierarchies, primarily taxonomy, type of object, storage location and (where relevant) stratigraphy, which are mapped to external authorities such as the Catalogue of Life and Paleobiology Database. The collection breakdown is enhanced with estimations of number of items, and ranks from 1 to 5 for each collection unit against 17 different criteria. These are grouped into four categories of ‘Condition’, ‘Information’ (including digital records), ‘Importance and Significance’ and ‘Outreach’. Although requiring significant time investment from collections staff to provide the estimates and assessments, this methodology has yielded a rich dataset that supports both discoverability (collection descriptions) and management (collection assessment). Links to further datasets about the building infrastructure and environmental conditions also make it into a powerful resource for planning activities such as collections moves, pest monitoring and building work. We have developed dynamic dashboards to provide rich visualisations for exploring, analysing and communicating the data. As an ongoing, embedded activity for collections staff, there will also be a build-up of historical data going forward, enabling us to see trends, track changes to the collection, and measure the impact of projects and events. The concept of Join the Dots also offers a generic, institution-agnostic model for enhancing the collection description framework with additional metrics that add value for strategic management and resourcing of the collection. In the design and implementation, we’ve faced challenges that should be highly relevant to the TDWG CD group, such as managing the dynamic breakdown of collections across multiple dimensions. 
We also face some that are yet to be resolved, such as a robust model for managing the evolving dataset over time. We intend to contribute these use cases into the development of the new TDWG data standard and be an early adopter and reference case. We envisage that this could constitute a common model that, where resources are available, provides the ability to add greater depth and utility to the world catalogue of collections.
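The assessment structure described above (collection units with an item estimate and a rank from 1 to 5 per criterion, grouped into four categories) can be sketched as a small data model. This is an illustrative sketch only, not the NHM's implementation: the class and function names are hypothetical, the four criteria in `CATEGORY_OF` are invented placeholders for the 17 real criteria, and weighting by estimated item count is an assumption made for the example.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical criteria mapped to the four categories named in the abstract.
CATEGORY_OF = {
    "physical condition": "Condition",
    "digital records": "Information",
    "scientific significance": "Importance and Significance",
    "public engagement": "Outreach",
}


@dataclass
class CollectionUnit:
    name: str
    estimated_items: int
    ranks: dict  # criterion name -> rank from 1 to 5

    def __post_init__(self):
        # Reject ranks outside the 1-5 scale described in the text.
        for criterion, rank in self.ranks.items():
            if not 1 <= rank <= 5:
                raise ValueError(f"rank for {criterion!r} must be between 1 and 5")


def category_scores(unit):
    """Average a unit's ranks within each category."""
    grouped = {}
    for criterion, rank in unit.ranks.items():
        grouped.setdefault(CATEGORY_OF[criterion], []).append(rank)
    return {category: mean(ranks) for category, ranks in grouped.items()}


def weighted_category_score(units, category):
    """Item-weighted mean of one category across units that were assessed for it."""
    total_items = 0
    weighted_sum = 0.0
    for unit in units:
        scores = category_scores(unit)
        if category in scores:
            total_items += unit.estimated_items
            weighted_sum += scores[category] * unit.estimated_items
    return weighted_sum / total_items if total_items else None
```

Storing one such snapshot per assessment round would give the historical series the abstract refers to, although how the evolving dataset should be modelled over time is, as the authors note, still unresolved.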


2018 ◽  
Vol 2 ◽  
pp. e26473
Author(s):  
Molly Phillips ◽  
Anne Basham ◽  
Marc Cubeta ◽  
Kari Harris ◽  
Jonathan Hendricks ◽  
...  

Natural history collections around the world are currently being digitized with the resulting data and associated media now shared online in aggregators such as the Global Biodiversity Information Facility and Integrated Digitized Biocollections (iDigBio). These collections and their resources are accessible and discoverable through online portals to not only researchers and collections professionals, but to educators, students, and other potential downstream users. Primary and secondary education (K-12) in the United States is going through its own revolution with many states adopting Next Generation Science Standards (NGSS https://www.nextgenscience.org/). The new standards emphasize science practices for analyzing and interpreting data and connect to cross-cutting concepts such as cause and effect and patterns. NGSS and natural history collections data portals seem to complement each other. Nevertheless, many educators and students are unaware of the digital resources available or are overwhelmed with working in aggregated databases created by scientists. To better address this challenge, participants within the National Science Foundation Advancing Digitization for Biodiversity Collections program (ADBC) have been working to increase awareness of, and scaffold learning for, digitized collections with K-12 educators and learners. They are accomplishing this through individual programs at institutions across the country as part of the Thematic Collections Networks and collaboratively through the iDigBio Education and Outreach Working Group. ADBC partners have focused on incorporating digital data and resources into K-12 classrooms through training workshops and webinars for both educators and collections professionals, as well as through creating educational resources, websites, and applications that use digital collections data. 
This presentation includes lessons learned from engaging K-12 audiences with digital data, summarizes available resources for both educators and collections professionals, shares how to become involved, and provides ways to facilitate transfer of educational resources to the K-12 community.


2018 ◽  
Vol 2 ◽  
pp. e25839
Author(s):  
Lise Stork ◽  
Andreas Weber ◽  
Eulàlia Miracle ◽  
Katherine Wolstencroft

Geographical and taxonomical referencing of specimens and documented species observations from within and across natural history collections is vital for ongoing species research. However, much of the historical data, such as field books, diaries and specimens, is challenging to work with. These sources are computationally inaccessible, refer to historical place names and taxonomies, and are written in a variety of languages. In order to address these challenges and elucidate historical species observation data, we developed a workflow to (i) crowd-source semantic annotations from handwritten species observations, (ii) transform them into RDF (Resource Description Framework) and (iii) store and link them in a knowledge base. Instead of full transcription, we directly annotate digital field book scans with key concepts that are based on Darwin Core standards. Our workflow stresses the importance of verbatim annotation. The interpretation of the historical content, such as resolving a historical taxon to a current one, can be done by individual researchers after the content is published as linked open data. Through the storage of annotation provenance (who created the annotation and when), we allow multiple interpretations of the content to exist in parallel, stimulating scientific discourse. The semantic annotation process is supported by a web application, the Semantic Field Book (SFB)-Annotator, driven by an application ontology. The ontology formally describes the content and meta-data required to semantically annotate species observations. It is based on the Darwin Core standard (DwC), Uberon and the Geonames ontology. The provenance of annotations is stored using the Web Annotation Data Model. Adhering to the principles of FAIR (Findable, Accessible, Interoperable & Reusable) and Linked Open Data, the content of the specimen collections can be interpreted homogeneously and aggregated across datasets. This work is part of the Making Sense project: makingsenseproject.org.
The project aims to disclose the content of a natural history collection: a 17,000 page account of the exploration of the Indonesian Archipelago between 1820 and 1850 (Natuurkundige Commissie voor Nederlands-Indie). With a knowledge base, researchers are given easy access to the primary sources of natural history collections. For their research, they can aggregate species observations, construct rich queries to browse through the data and add their own interpretations regarding the meaning of the historical content.
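Step (ii) of the workflow above, turning a verbatim annotation into RDF with provenance, might look roughly like the following sketch, which emits N-Triples lines using only the Python standard library. It is not the SFB-Annotator's actual output or ontology: the function names and the `example.org` identifiers are hypothetical, and the triple layout is an assumption that simply combines Darwin Core term IRIs with the Web Annotation vocabulary (`oa:Annotation`, `oa:hasTarget`, `oa:hasBody`) and Dublin Core terms for creator and creation time.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OA = "http://www.w3.org/ns/oa#"               # Web Annotation vocabulary
DCTERMS = "http://purl.org/dc/terms/"         # Dublin Core terms (provenance)
DWC = "http://rs.tdwg.org/dwc/terms/"         # Darwin Core terms
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, datatype=None):
    """Quote a string literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    suffix = f"^^{iri(datatype)}" if datatype else ""
    return f'"{escaped}"{suffix}'


def annotation_triples(annotation, target, body, dwc_term, verbatim, creator, created):
    """Return N-Triples lines for one verbatim annotation and its provenance.

    annotation, target, body, creator: IRIs (hypothetical identifiers in this sketch)
    dwc_term: local name of a Darwin Core term, e.g. "scientificName"
    verbatim: the text exactly as written on the page
    created: timestamp string in xsd:dateTime form
    """
    triples = [
        (iri(annotation), iri(RDF_TYPE), iri(OA + "Annotation")),
        (iri(annotation), iri(OA + "hasTarget"), iri(target)),
        (iri(annotation), iri(OA + "hasBody"), iri(body)),
        (iri(annotation), iri(DCTERMS + "creator"), iri(creator)),
        (iri(annotation), iri(DCTERMS + "created"), literal(created, XSD_DATETIME)),
        (iri(body), iri(DWC + dwc_term), literal(verbatim)),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]
```

Because each annotation carries its own creator and timestamp, two annotators can attach different bodies to the same page region without overwriting one another, which is the property the abstract relies on for parallel interpretations.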


Author(s):  
Donald Hobern ◽  
Deborah L Paul ◽  
Tim Robertson ◽  
Quentin Groom ◽  
Barbara Thiers ◽  
...  

Information about natural history collections helps to map the complex landscape of research resources and assists researchers in locating and contacting the holders of specimens. Collection records contribute to the development of a fully interlinked biodiversity knowledge graph (Page 2016), showcasing the existence and importance of museums and herbaria and supplying context to available data on specimens. These records also potentially open new avenues for fresh use of these collections and for accelerating their full availability online. A number of international (e.g., Index Herbariorum, GRSciColl), regional (e.g., DiSSCo and CETAF), national (e.g., ALA and the Living Atlases, iDigBio US Collections Catalog) and institutional networks (e.g., The Field Museum) separately document subsets of the world's collections, and the Biodiversity Information Standards (TDWG) Collection Descriptions Interest Group is actively developing standards to support information sharing on collections. However, these efforts do not yet combine to deliver a comprehensive and connected view of all collections globally. The Global Biodiversity Information Facility (GBIF) received funding as part of the European Commission-funded SYNTHESYS+ 7 project to explore development of a roadmap towards delivering such a view, in part as a contribution towards the establishment of DiSSCo services within a global ecosystem of collection catalogues. Between 17 and 29 April 2020, a coordination team comprising international representatives from multiple networks ran Advancing the Catalogue of the World’s Natural History Collections, a fully online consultation using the GBIF Discourse forum platform to guide discussion around 26 consultation topics identified in an initial Ideas Paper (Hobern et al. 2020). Discussions included support for contributions in Spanish, Chinese and French and were summarised daily throughout the consultation.
The consultation confirmed broad agreement around the needs and goals for a comprehensive catalogue of the world’s natural history collections, along with possible strategies to overcome the challenges. This presentation will summarise the results and recommendations.


Author(s):  
Tim Robertson ◽  
Marcos Gonzalez ◽  
Morten Høfft ◽  
Marie Grosjean

The Global Biodiversity Information Facility (GBIF) was established by governments in 2001, largely through the initiative and leadership of the natural history collections community, following the 1999 recommendation by a working group under the Megascience Forum (predecessor of the Global Science Forum) of the Organization for Economic Cooperation and Development (OECD). Over 20 years, GBIF has helped develop standards and convened a global community of data-publishing institutions, aggregating over one billion species occurrence records freely and openly available for use in research and policy making. Of these, more than 150 million records originate from specimens preserved by the collections community. The recent adoption of the Global Registry of Scientific Collections by GBIF (https://www.gbif.org/news/5kyAslpqTVxYqZTwYn1cub) is the first step by GBIF to better enable a picture of the natural history collections of the world along with the associated science that they have enabled and continue to enable. Recognising that other collection metadata initiatives exist, GBIF aims to discuss with the community and progress topics such as: synchronising with existing metadata catalogues to ensure accurate, up-to-date information is available without unnecessary burden for authors; defining, testing and formalizing the Collection Descriptions standard (https://github.com/tdwg/cd); providing clear guidelines of citation practice for collections, potentially building on the success of the Digital Object Identifier (DOI) approach used for datasets mediated through GBIF.org; tracking citations of use through both data downloads and through references in literature, such as materials examined in a taxonomic publication; improving the linkages and discoverability of specimen records derived from the same collecting event but preserved in multiple institutions; improving the linkages between the people involved in collecting, preserving, and identifying specimen records through the use of Open Researcher and Contributor IDs (ORCID); and lowering the technical threshold to deploy tools such as “data dashboards” and specimen search/download on collection related websites. The progress made to date will be summarised and a roadmap for the future will be introduced.


Author(s):  
Niels Raes ◽  
Emily van Egmond ◽  
Ana Casino ◽  
Matt Woodburn ◽  
Deborah L Paul

With digitisation of natural history collections over the past decades, their traditional roles — for taxonomic studies and public education — have been greatly expanded into the fields of biodiversity assessments, climate change impact studies, trait analyses, sequencing, 3D object analyses etc. (Nelson and Ellis 2019; Watanabe 2019). Initial estimates of the global natural history collection range between 1.2 and 2.1 billion specimens (Ariño 2010), of which 169 million (8-14% - as of April 2019) are available at some level of digitisation through the Global Biodiversity Information Facility (GBIF). With iDigBio (Integrated Digitized Biocollections) established in the United States and with the European DiSSCo (Distributed Systems of Scientific Collections) accepted on the ESFRI roadmap, it has become a priority to digitize natural history collections at an industrialized scale. Both iDigBio and DiSSCo aim at mobilising, unifying and delivering bio- and geo-diversity information at the scale, form and precision required by scientific communities, and thereby transform a fragmented landscape into a coherent and responsive research infrastructure. In order to prioritise digitisation based on scientific demand, and efficiency using industrial digitisation pipelines, it is required to arrive at a uniform and unambiguously accepted collection description standard that would allow comparing, grouping and analysing natural history collections at diverse levels. Several initiatives attempt to unambiguously describe natural history collections using taxonomic and storage classification schemes. These initiatives include One World Collection, Global Registry of Scientific Collections (GRSciColl), TDWG (Taxonomic Databases Working Group) Natural Collection Descriptions (NCD) and CETAF (Consortium of European Taxonomy Facilities) passports, among others. 
In a collaborative effort of DiSSCo, ICEDIG (Innovation and consolidation for large scale digitisation of natural heritage), iDigBio, TDWG and the Task Group Collection Digitisation Dashboards, the various schemes were compared in a cross-walk analysis to propose a preliminary natural collection description standard that is supported by the wider community. In the process, two main user groups of collection descriptions standards were identified: scientists and collection managers. The classification produced intends to meet requirements from them both, resulting in three classification schemes that exist in parallel to each other (van Egmond et al. 2019). For scientific purposes a ‘Taxonomic’ and ‘Stratigraphic’ classification were defined, and for management purposes a ‘Storage’ classification. The latter is derived from specimen preservation types (e.g. dried, liquid preserved) defining storage requirements and the physical location of specimens in collection holding facilities. The three parallel collection classifications can be cross-sectioned with a ‘Geographic’ classification to assign sub-collections to major terrestrial and marine regions, which allows scientists to identify particular taxonomic or stratigraphic (sub-)collections from major geographical or marine regions of interest. Finally, to measure the level of digitisation of institutional collections and progress of digitisation through time, the number of digitised specimens for each geographically cross-sectioned (sub-)collection can be derived from institutional collection management systems (CMS). As digitisation has different levels of completeness, a ‘Digitisation’ scheme has been adopted from Saarenmaa et al. 2019 to quantify the level of digitisation of a collection, ranging from ‘not digitised’ to ‘extensively digitised’, recorded in a progressive scale of MIDS (Minimal Information for Digital Specimen).
The applicability of this preliminary classification will be discussed and visualized in a Collection Digitisation Dashboard (CDD) to demonstrate how the implementation of a collection description standard allows the identification of existing gaps in taxonomic and geographic coverage and levels of digitisation of natural history collections. This set of common classification schemes and dashboard design (van Egmond et al. 2019) will be contributed to the TDWG Collection Description interest group to ultimately arrive at the common goal of a 'World Collection Catalogue'.
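The 8-14% digitisation share quoted earlier in this abstract follows directly from the 169 million digitised specimens and the 1.2 to 2.1 billion estimate for the global collection. A short check, with variable names chosen here for illustration:

```python
digitised = 169e6            # specimens available through GBIF (April 2019)
global_low = 1.2e9           # lower estimate of the global collection
global_high = 2.1e9          # upper estimate of the global collection

# A larger global total gives a smaller digitised share, and vice versa.
share_min = digitised / global_high
share_max = digitised / global_low

print(f"{share_min:.1%} to {share_max:.1%}")
```

The same ratio, computed per taxonomic, stratigraphic, or geographic (sub-)collection from counts held in a collection management system, is the kind of figure a digitisation dashboard would be expected to display.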

