Creating a National Biodiversity Database in Gabon and the Challenges of Mobilizing Natural History Data for Francophone Countries

Author(s):  
Elie Tobi ◽  
Geovanne Aymar Nziengui Djiembi ◽  
Anna Feistner ◽  
Donald Midoko Iponga ◽  
Jean Felicien Liwouwou ◽  
...  

Language is a major barrier for researchers wanting to digitize and publish collection data in Africa. Although French is the fifth most spoken language on Earth and the second most common in Africa, French-language resources about digitization, data management, and publishing are lacking. Furthermore, the French-speaking regions of Africa (primarily Central/West Africa and Madagascar) host some of the highest biodiversity on the continent and are therefore of great importance to scientists and decision-makers. Without representation in online portals like the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio), these important collections are effectively invisible. Producing relevant, applicable resources about digitization in French will help shine a light on these valuable natural history records and allow data-holders in Africa to retain autonomy over their collections. Awarded a GBIF-BID (Biodiversity Information for Development) grant in 2021, an international, multilingual network of partners has undertaken the important task of digitizing and mobilizing Gabon’s vertebrate collections. An estimated 13,500 vertebrate specimens are housed in five institutions in different parts of Gabon. To date, the group has mobilized >4,600 vertebrate records to our recently launched Gabon Biodiversity Portal (https://gabonbiota.org/). The portal also hosts French-language guides for using Symbiota-based portals to manage, georeference, and publish natural history databases. These resources can provide much-needed guidance for other Francophone countries, in Africa and beyond, working to maximize the accessibility and value of their biodiversity collections.

Author(s):  
Matt Woodburn ◽  
Sarah Vincent ◽  
Helen Hardy ◽  
Clare Valentine

The natural science collections community has identified an increasing need for shared, structured and interoperable data standards that can be used to describe the totality of institutional collection holdings, whether digitised or not. Major international initiatives, including the Global Biodiversity Information Facility (GBIF), the Distributed System of Scientific Collections (DiSSCo) and the Consortium of European Taxonomic Facilities (CETAF), consider the current lack of standards to be a major barrier, which must be overcome to further their strategic aims and contribute to an open, discoverable catalogue of global collections. The Biodiversity Information Standards (TDWG) Collection Descriptions (CD) group is looking to address this issue with a new data standard for collection descriptions. At an institutional level, this concept of collection descriptions aligns strongly with the need for a structured, data-driven approach to assessing and working with collections, both to identify and prioritise investment and effort, and to monitor the impact of the work. Use cases include planning conservation and collection moves, prioritising specimen digitisation activities, and informing collection development strategy. The data can be integrated with the collection description framework for ongoing assessments of the state of the collection. This approach was pioneered with the ‘Move the Dots’ methodology by the Smithsonian National Museum of Natural History, started in 2009 and run annually since. The collection was broken down into several hundred discrete subcollections; for each, the number of objects was estimated and a numeric rank allocated according to a range of assessment criteria. This method has since been adopted by several other institutions, including Naturalis Biodiversity Center, Museum für Naturkunde and the Natural History Museum, London (NHM). First piloted in 2016, and now implemented as a core framework, the NHM’s adaptation, ‘Join the Dots’, divides the collection into approximately 2,600 ‘collection units’. The breakdown uses formal controlled lists and hierarchies, primarily taxonomy, type of object, storage location and (where relevant) stratigraphy, which are mapped to external authorities such as the Catalogue of Life and the Paleobiology Database. The collection breakdown is enhanced with estimates of the number of items, and each collection unit is ranked from 1 to 5 against 17 different criteria. These criteria are grouped into four categories: ‘Condition’, ‘Information’ (including digital records), ‘Importance and Significance’ and ‘Outreach’. Although requiring significant time investment from collections staff to provide the estimates and assessments, this methodology has yielded a rich dataset that supports both discoverability (collection descriptions) and management (collection assessment). Links to further datasets about the building infrastructure and environmental conditions also make it a powerful resource for planning activities such as collection moves, pest monitoring and building work. We have developed dynamic dashboards to provide rich visualisations for exploring, analysing and communicating the data. As an ongoing, embedded activity for collections staff, the framework will also accumulate historical data over time, enabling us to see trends, track changes to the collection, and measure the impact of projects and events.
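As a rough, hedged illustration of the data model this breakdown implies, the sketch below represents a collection unit in Python; the class, field names and example values are invented for illustration and are not the NHM's actual schema.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class CollectionUnit:
    """One unit in a 'Join the Dots'-style breakdown (illustrative model only)."""
    name: str
    taxonomy: str                # mapped to an external authority, e.g. Catalogue of Life
    object_type: str
    storage_location: str
    estimated_items: int
    ranks: dict = field(default_factory=dict)   # criterion name -> rank from 1 to 5

    def category_score(self, criteria: list) -> float:
        """Average rank across the criteria of one assessment category,
        the kind of per-unit metric a dashboard might visualise."""
        scored = [self.ranks[c] for c in criteria if c in self.ranks]
        return mean(scored) if scored else 0.0

# Invented example unit and criteria (the real framework uses 17 criteria grouped
# into Condition, Information, Importance and Significance, and Outreach).
unit = CollectionUnit(
    name="Herbarium: Rosaceae",
    taxonomy="Rosaceae",
    object_type="herbarium sheet",
    storage_location="Building A, floor 3",
    estimated_items=120_000,
    ranks={"physical condition": 4, "digital records": 2, "type material": 5},
)
print(unit.category_score(["digital records"]))  # Information-category score -> 2
```

Keeping the ranks as a criterion-to-score mapping leaves the criteria list flexible while still allowing aggregation by category across all of the roughly 2,600 units.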
The concept of Join the Dots also offers a generic, institution-agnostic model for enhancing the collection description framework with additional metrics that add value for strategic management and resourcing of the collection. In its design and implementation we have faced challenges that should be highly relevant to the TDWG CD group, such as managing the dynamic breakdown of collections across multiple dimensions. We also face some that are yet to be resolved, such as a robust model for managing the evolving dataset over time. We intend to contribute these use cases to the development of the new TDWG data standard and to be an early adopter and reference case. We envisage that this could constitute a common model that, where resources are available, provides the ability to add greater depth and utility to the world catalogue of collections.


Author(s):  
Katharine Barker ◽  
Jonas Astrin ◽  
Gabriele Droege ◽  
Jonathan Coddington ◽  
Ole Seberg

Most successful research programs depend on easily accessible and standardized research infrastructures. Until recently, access to tissue or DNA samples with standardized metadata and of sufficiently high quality has been a major bottleneck for genomic research. The Global Genome Biodiversity Network (GGBN) fills this critical gap by offering standardized, legal access to samples. Presently, GGBN’s core activity is enabling access to searchable DNA and tissue collections across natural history museums and botanic gardens. Activities are gradually being expanded to encompass all kinds of biodiversity biobanks, such as culture collections, zoological gardens, aquaria, arboreta, and environmental biobanks. Broadly speaking, these collections all provide long-term storage and standardized public access to samples useful for molecular research. GGBN facilitates sample search and discovery for its distributed member collections through a single entry point. It stores standardized information on mostly geo-referenced, vouchered samples, their physical location, availability, quality, and the necessary legal information on over 50,000 species of Earth’s biodiversity, from unicellular to multicellular organisms. The GGBN Data Portal and the GGBN Data Standard are complementary to existing infrastructures such as the Global Biodiversity Information Facility (GBIF) and the International Nucleotide Sequence Database Collaboration (INSDC). Today, many well-known open-source collection management databases, such as Arctos, Specify, and Symbiota, are implementing the GGBN data standard. GGBN continues to increase its collections strategically, based on the needs of the research community, adding over 1.3 million online records in 2018 alone; today, two million sample records are available through GGBN. Together with the Consortium of European Taxonomic Facilities (CETAF), the Society for the Preservation of Natural History Collections (SPNHC), Biodiversity Information Standards (TDWG), and Synthesis of Systematic Resources (SYNTHESYS+), GGBN provides best practices for biorepositories on meeting the requirements of the Nagoya Protocol on Access and Benefit Sharing (ABS). Through collaboration with the Biodiversity Heritage Library (BHL), GGBN is exploring options for tagging publications that reference GGBN collections and associated specimens, made searchable through GGBN’s document library. Through its collaborative efforts, standards, and best practices, GGBN aims to facilitate trust and transparency in the use of genetic resources.


Author(s):  
Leonor Venceslau ◽  
Luis Lopes

Major efforts are being made to digitize natural history collections and make these data available online for retrieval and analysis (Beaman and Cellinese 2012). Georeferencing, an important part of the digitization process, consists of obtaining geographic coordinates from a locality description. For many natural history specimens, the coordinates of the sampling location were not recorded; instead, the labels contain a description of the site. Inaccurate georeferencing of sampling locations negatively impacts data quality and the accuracy of any geographic analysis of those data. In addition to latitude and longitude, it is important to define a degree of uncertainty for the coordinates, since in most cases it is impossible to pinpoint the exact location retrospectively. This is usually done by defining an uncertainty value, represented as a radius around the centre of the locality where the sampling took place. Georeferencing is a time-consuming process requiring manual validation; as such, a significant part of all natural history collection data available online is not georeferenced. Of the 161 million records of preserved specimens currently available in the Global Biodiversity Information Facility (GBIF), only 86 million (53.4%) include coordinates. It is therefore important to develop and optimize automatic tools that allow fast and accurate georeferencing. The objective of this work was to test existing automatic georeferencing services and evaluate their potential to accelerate the georeferencing of large collection datasets. To this end, several open-source georeferencing services are currently available that provide an application programming interface (API) for batch georeferencing. We evaluated five services: Google Maps, MapQuest, GeoNames, OpenStreetMap, and GEOLocate. A test dataset of 100 records (the reference dataset), which had previously been georeferenced individually following Chapman and Wieczorek (2006), was randomly selected from the insect collection catalogue of the Museu Nacional de História Natural e da Ciência, Universidade de Lisboa (Lopes et al. 2016). An R (R Core Team 2018) script was used to georeference these records using the five services. In cases where multiple results were returned, only the first was considered and compared with the manually obtained coordinates of the reference dataset. Two factors were considered in evaluating accuracy: the total number of results obtained, and the distance to the original location in the reference dataset. Of the five services tested, Google Maps yielded the most results (99) and was the most accurate, with 57 results within 1,000 m of the reference location and 79 within the uncertainty radius. GEOLocate provided results for 87 locations, of which 47 were within 1,000 m of the correct location and 57 were within the uncertainty radius. The other three services each returned fewer than 35 results within 1,000 m of the reference location and fewer than 50 results within the uncertainty radius. Google Maps and OpenStreetMap had the lowest average distance from the reference location, both around 5,500 m. Google Maps has a usage limit of around 40,000 free georeferencing requests per month, beyond which the service is paid, while GEOLocate is free with no usage limit. For large collections, this may be a factor to take into account. In the future, we hope to optimize these methods and test them on larger datasets.
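As a hedged sketch of this kind of batch comparison (the authors used an R script; the Python below is illustrative, the sample record is invented, and only the public OpenStreetMap/Nominatim endpoint is shown):

```python
import math
import requests

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"  # OpenStreetMap geocoder

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def geocode_first(locality):
    """Return (lat, lon) of the first hit, or None -- mirroring the rule that
    only the first of multiple results is considered."""
    resp = requests.get(NOMINATIM_URL,
                        params={"q": locality, "format": "json", "limit": 1},
                        headers={"User-Agent": "georef-benchmark"})
    hits = resp.json()
    if not hits:
        return None
    return float(hits[0]["lat"]), float(hits[0]["lon"])

# Invented test record with manually georeferenced reference coordinates.
records = [{"locality": "Sintra, Lisboa, Portugal",
            "ref_lat": 38.7972, "ref_lon": -9.3905, "uncertainty_m": 5000}]

for rec in records:
    hit = geocode_first(rec["locality"])
    if hit is None:
        print(rec["locality"], "-> no result")
        continue
    d = haversine_m(rec["ref_lat"], rec["ref_lon"], *hit)
    print(f"{rec['locality']}: {d:.0f} m from reference; "
          f"within uncertainty radius: {d <= rec['uncertainty_m']}")
```

The same loop, pointed at each service's API in turn, yields the per-service result counts and distance statistics reported above.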


2021 ◽  
Vol 9 ◽  
Author(s):  
Sónia Ferreira ◽  
Pjotr Oosterbroek ◽  
Jaroslav Starý ◽  
Pedro Sousa ◽  
Vanessa Mata ◽  
...  

The InBIO Barcoding Initiative (IBI) Diptera 02 dataset contains records of 412 crane fly specimens belonging to three Diptera families: Limoniidae, Pediciidae and Tipulidae. This dataset is the second release by IBI on Diptera and it greatly increases knowledge of the DNA barcodes and distribution of crane flies from Portugal. All specimens were collected in Portugal, including six specimens from the Azores and Madeira archipelagos. Sampling took place from 2003 to 2019. Specimens were morphologically identified to species level by taxonomists and belong to 83 species in total. The species represented in this dataset correspond to about 55% of all crane fly species known from Portugal and 22% of crane fly species known from the Iberian Peninsula. All DNA extractions and most specimens are deposited in the IBI collection at CIBIO, Research Center in Biodiversity and Genetic Resources. Fifty-three species were new additions to the Barcode of Life Data System (BOLD), and barcodes were added for another 18 species that were under-represented in BOLD. Furthermore, the submitted sequences were found to cluster in 88 BINs, 54 of which were new to BOLD. All specimens have their DNA barcodes publicly accessible through the BOLD online database, and their collection data can be accessed through the Global Biodiversity Information Facility (GBIF). One species, Gonomyia tenella (Limoniidae), is recorded for the first time from Portugal, raising the number of crane fly species recorded in the country to 145.


2021 ◽  
Vol 9 ◽  
Author(s):  
Domingos Sandramo ◽  
Enrico Nicosia ◽  
Silvio Cianciullo ◽  
Bernardo Muatinte ◽  
Almeida Guissamulo

The collections of the Natural History Museum of Maputo play a crucial role in safeguarding Mozambique's biodiversity, representing an important repository of data and materials regarding the natural heritage of the country. In this paper, a dataset is described, based on the Museum’s Entomological Collection, recording 409 species belonging to seven orders and 48 families. Each specimen’s available data, such as geographical coordinates and taxonomic information, have been digitised to build the dataset. The specimens included in the dataset were collected between 1914 and 2018 by collectors and researchers from the Natural History Museum of Maputo (once known as the “Museu Álvaro de Castro”) in all of the country’s provinces, with the exception of Cabo Delgado Province. This paper adds data to the Biodiversity Network of Mozambique and the Global Biodiversity Information Facility, within the objectives of the SECOSUD II Project and the Biodiversity Information for Development Programme. The insect dataset is available through the GBIF data portal (https://doi.org/10.15468/j8ikhb). Data were also shared on BioNoMo (https://bionomo.openscidata.org), the Mozambican national portal of biodiversity data developed by the SECOSUD II Project.


Author(s):  
Franck Theeten ◽  
Marielle Adam ◽  
Thomas Vandenberghe ◽  
Mathias Dillen ◽  
Patrick Semal ◽  
...  

The Royal Belgian Institute of Natural Sciences (RBINS), the Royal Museum for Central Africa (RMCA) and Meise Botanic Garden house more than 50 million specimens covering all fields of natural history. While many different research topics have their own specificities, it has become apparent over the years that, with regard to collection data management, data publication and exchange via community standards, collection-holding institutions face similar challenges (James et al. 2018, Rocha et al. 2014). In the past, these have been tackled in different ways by Belgian natural history institutions. In addition to local and national collaborations, there is a great need for a joint structure to share data between scientific institutions in Europe and beyond. It is the aim of large networks and infrastructures such as the Global Biodiversity Information Facility (GBIF), Biodiversity Information Standards (TDWG), the Distributed System of Scientific Collections (DiSSCo) and the Consortium of European Taxonomic Facilities (CETAF) to further implement and improve these efforts, thereby gaining ever-increasing efficiencies. In this context, the three institutions mentioned above submitted the NaturalHeritage project (http://www.belspo.be/belspo/brain-be/themes_3_HebrHistoScien_en.stm), granted in 2017 by the Belgian Science Policy Office and running from 2017 to 2020. The project provides links among databases and services. The unique qualities of each database are maintained, while the information can be concentrated and exposed in a structured way via one access point. This approach also aims to link data that are currently unconnected (e.g., the relationships between soil/substrate, vegetation and associated fauna) and to improve the cross-validation of data. (1) The NaturalHeritage prototype (http://www.naturalheritage.be) is a shared research portal with an open-access infrastructure, still in the development phase. Its backbone is an Elasticsearch catalogue, with Kibana, and a Python aggregator gathering several types of (re)sources: relational databases, REpresentational State Transfer (REST) services of object databases and bibliographical data, collections metadata, and the GBIF Integrated Publishing Toolkit (IPT) for observational and taxonomic data. Semi-structured data in English are semantically analysed and linked to a rich autocomplete mechanism. Keywords and identifiers are indexed and grouped in four categories (“what”, “who”, “where”, “when”). The portal can also act as an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) service and ease indexing of the original webpages on the internet with microdata enrichment. (2) The DaRWIN (Data Research Warehouse Information Network) collection data management system of RBINS and RMCA has been improved as well. External (meta)data requirements have been identified and evaluated, foremost publication into, or according to the practices and standards of, GBIF and OBIS (Ocean Biogeographic Information System: https://obis.org) for biodiversity data, and INSPIRE (https://inspire.ec.europa.eu) for geological data. New and extended data structures have been created to be compliant with these standards, and the necessary procedures have been developed to expose the data. Quality control tools for taxonomic and geographic names have been developed. Geographic names can be hard to confirm, as their lack of context often requires human validation; to address this, a similarity measure is used to help map the results.
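Returning to the portal backbone described in (1): a minimal sketch of the indexing step such a Python/Elasticsearch aggregator performs (the index name, document fields and record itself are invented, and the 8.x Python client is assumed):

```python
from elasticsearch import Elasticsearch  # assumes the 8.x Python client

es = Elasticsearch("http://localhost:9200")  # assumed local development instance

# A harvested record with its keywords grouped into the portal's four facets.
doc = {
    "title": "Coleoptera collection, drawer 42",   # invented record
    "source": "DaRWIN",
    "what": ["Coleoptera", "dry specimen"],
    "who": ["R. Example (collector)"],              # hypothetical person
    "where": ["Belgium", "Brussels-Capital Region"],
    "when": ["1932"],
}
es.index(index="naturalheritage", document=doc)

# The same facet fields can then drive search and autocomplete, e.g.:
hits = es.search(index="naturalheritage", query={"match": {"where": "Belgium"}})
print(hits["hits"]["total"])
```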
Species, locations, sampling devices and other properties have been mapped to the World Register of Marine Species (http://www.marinespecies.org) and Darwin Core, to Marine Regions and GeoNames, to the AGRO agronomy ontology (http://www.obofoundry.org/ontology/agro.html) and the Vertebrate Trait Ontology, and to the British Oceanographic Data Centre (BODC) vocabularies. Extensive mapping is necessary to make use of the ExtendedMeasurementOrFact extension of Darwin Core (https://tools.gbif.org/dwca-validator/extensions.do).
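For illustration, a single ExtendedMeasurementOrFact row linked to an occurrence might look like the following (expressed here as a Python dict; all identifiers and values are hypothetical, with measurementTypeID expected to point into a published vocabulary such as the BODC collections):

```python
# One eMoF record, attached to its parent occurrence via occurrenceID (all values hypothetical).
emof_record = {
    "occurrenceID": "RBINS:DaRWIN:12345",   # hypothetical specimen identifier
    "measurementType": "sampling device",
    "measurementTypeID": "http://vocab.nerc.ac.uk/collection/L05/current/EXAMPLE/",  # illustrative BODC concept URI
    "measurementValue": "beam trawl",
    "measurementUnit": "",                   # no unit applies to a device name
}
```

Mapping free-text fields such as the sampling device onto controlled concept URIs is precisely the extensive mapping effort referred to above.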


2018 ◽  
Vol 2 ◽  
pp. e26060
Author(s):  
Pamela Soltis

Digitized natural history data are enabling a broad range of innovative studies of biodiversity. Large-scale data aggregators such as the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio) provide easy, global access to millions of specimen records contributed by thousands of collections. A developing community of eager users of specimen data – whether locality, image, trait, etc. – is perhaps unaware of the effort and resources required to curate specimens, digitize information, capture images, mobilize records, serve the data, and maintain the infrastructure (human and cyber) to support all of these activities. Tracking of specimen information throughout the research process is needed to provide appropriate attribution to the institutions and staff that have supplied and served the records. Such tracking may also allow for annotation of, and comment on, particular records or collections by the global community. Detailed data tracking is also required for open, reproducible science. Despite growing recognition of the value of and need for thorough data tracking, both technical and sociological challenges continue to impede progress. In this talk, I will present a brief vision of how applying a DOI to each iteration of a data set in a typical research project could provide attribution to the provider, opportunities for comment and annotation of records, and the foundation for reproducible science based on natural history specimen records. Sociological change – such as journal requirements for deposition of all iterations of a data set – can be accomplished using community meetings and workshops, along with editorial efforts, as was done for DNA sequence data two decades ago.


Author(s):  
Donald Hobern ◽  
Deborah L Paul ◽  
Tim Robertson ◽  
Quentin Groom ◽  
Barbara Thiers ◽  
...  

Information about natural history collections helps to map the complex landscape of research resources and assists researchers in locating and contacting the holders of specimens. Collection records contribute to the development of a fully interlinked biodiversity knowledge graph (Page 2016), showcasing the existence and importance of museums and herbaria and supplying context to available data on specimens. These records also potentially open new avenues for fresh use of these collections and for accelerating their full availability online. A number of international (e.g., Index Herbariorum, GRSciColl), regional (e.g., DiSSCo and CETAF), national (e.g., ALA and the Living Atlases, iDigBio US Collections Catalog) and institutional networks (e.g., The Field Museum) separately document subsets of the world's collections, and the Biodiversity Information Standards (TDWG) Collection Descriptions Interest Group is actively developing standards to support information sharing on collections. However, these efforts do not yet combine to deliver a comprehensive and connected view of all collections globally. The Global Biodiversity Information Facility (GBIF) received funding as part of the European Commission-funded SYNTHESYS+ project to explore the development of a roadmap towards delivering such a view, in part as a contribution towards the establishment of DiSSCo services within a global ecosystem of collection catalogues. Between 17 and 29 April 2020, a coordination team comprising international representatives from multiple networks ran Advancing the Catalogue of the World’s Natural History Collections, a fully online consultation using the GBIF Discourse forum platform to guide discussion around 26 consultation topics identified in an initial Ideas Paper (Hobern et al. 2020). Discussions included support for contributions in Spanish, Chinese and French and were summarised daily throughout the consultation. The consultation confirmed broad agreement on the needs and goals for a comprehensive catalogue of the world’s natural history collections, along with possible strategies for overcoming the challenges. This presentation will summarise the results and recommendations.


Author(s):  
Laurence Bénichou ◽  
Isabelle Gerard ◽  
Chloé Chester ◽  
Donat Agosti

The European Journal of Taxonomy (EJT) was initiated by a consortium of European natural history publishers to take advantage of the shift from paper to electronic-only publishing (Benichou et al. 2011). Whilst publishing in PDF format was originally considered the state of the art, it has recently become obvious that complementary dissemination channels help to disseminate taxonomic data, one of the pillars of natural history institutions' research, more widely and efficiently (Côtez et al. 2018). The adoption of semantic markup and the assignment of persistent identifiers to content allow more comprehensive citation of an article, including elements within it such as images, taxonomic treatments, and materials citations. They also allow more in-depth analyses and visualization of the contribution of collections, authors, or specimens to taxonomic output, and enable third parties, such as the Global Biodiversity Information Facility, to reuse the data or build the Catalogue of Life. In this presentation, EJT will be used to outline the nature of natural history publishers and their technical set-up. This is followed by a description of the post-publishing workflow using the Plazi workflow and dissemination via the Biodiversity Literature Repository (BLR) and TreatmentBank. The presentation outlines switching the publishing workflow to increased use of the Extensible Markup Language (XML) and visualization of the output, and concludes with publishing guidelines that enable more efficient text and data mining of the content of taxonomic publications.


Author(s):  
Elspeth Haston ◽  
Lorna Mitchell

The specimens held in natural history collections around the world are the direct result of the efforts of thousands of people over hundreds of years. However, the way the names of these people have been recorded within collections has never been fully standardised, and this makes the process of correctly attributing a specimen-related event to an individual difficult at best, and impossible at worst. The events in which people are related to specimens include collecting, identifying, naming, loaning and owning. Whilst there are resources in the botanical community that hold information on many collectors and authors of plant names, the residual number of unknown people, and the effort required to disambiguate them, is daunting. Moreover, in many cases the work carried out within a collection to disambiguate the names relating to its specimens is not recorded and made available, generally due to the lack of a system to do so. This situation makes it extremely difficult to search for collections within the main aggregators, such as GBIF (the Global Biodiversity Information Facility), and severely hampers our ability to link collections both within and between institutes and disciplines. When we look at the benefits of linking collections and people, the need to agree and implement a system for managing people's names becomes increasingly urgent.
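As a small, hedged illustration of why this is hard to automate, the sketch below scores invented name strings for one collector by plain string similarity; a real disambiguation pipeline would also need dates, collection context and shared authority files:

```python
from difflib import SequenceMatcher

# Variant strings under which one (invented) collector might appear on labels.
variants = ["J. Smith", "Smith, J.", "John Smith", "Smith", "J.S."]
anchor = "John Smith"

def similarity(a: str, b: str) -> float:
    """Crude character-based similarity; context-free, hence error-prone."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for v in variants:
    print(f"{v!r}: {similarity(anchor, v):.2f}")
# 'Smith' scores high and 'J.S.' scores low, yet either may or may not be the
# same person -- which is why human validation and shared authority files matter.
```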

