Comparison of Automated Georeferencing Tools Using Insect Collection Data

Author(s):  
Leonor Venceslau ◽  
Luis Lopes

Major efforts are being made to digitize natural history collections to make these data available online for retrieval and analysis (Beaman and Cellinese 2012). Georeferencing, an important part of the digitization process, consists of obtaining geographic coordinates from a locality description. For many natural history collection specimens, the coordinates of the sampling location are not recorded; instead, the specimens carry only a description of the site. Inaccurate georeferencing of sampling locations negatively impacts data quality and the accuracy of any geographic analysis of those data. In addition to latitude and longitude, it is important to define the degree of uncertainty of the coordinates, since in most cases it is impossible to pinpoint the exact location retrospectively. This is usually done by defining an uncertainty value, represented as a radius around the center of the locality where the sampling took place. Georeferencing is a time-consuming process requiring manual validation; as a result, a significant part of all natural history collection data available online is not georeferenced. Of the 161 million records of preserved specimens currently available in the Global Biodiversity Information Facility (GBIF), only 86 million (53.4%) include coordinates. It is therefore important to develop and optimize automatic tools that allow fast and accurate georeferencing. The objective of this work was to test existing automatic georeferencing services and evaluate their potential to accelerate georeferencing of large collection datasets. To this end, several open-source georeferencing services are currently available that provide an application programming interface (API) for batch georeferencing. We evaluated five services: Google Maps, MapQuest, GeoNames, OpenStreetMap, and GEOLocate.
A test dataset of 100 records (reference dataset), which had previously been individually georeferenced following Chapman and Wieczorek 2006, was randomly selected from the Museu Nacional de História Natural e da Ciência, Universidade de Lisboa insect collection catalogue (Lopes et al. 2016). An R (R Core Team 2018) script was used to georeference these records using the five services. In cases where multiple results were returned, only the first one was considered and compared with the manually obtained coordinates of the reference dataset. Two factors were considered in evaluating accuracy: the total number of results obtained and the distance to the original location in the reference dataset. Of the five services tested, Google Maps yielded the most results (99) and was the most accurate, with 57 results less than 1000 m from the reference location and 79 within the uncertainty radius. GEOLocate provided results for 87 locations, of which 47 were within 1000 m of the correct location and 57 were within the uncertainty radius. The other three services tested all had fewer than 35 results within 1000 m of the reference location and fewer than 50 results within the uncertainty radius. Google Maps and OpenStreetMap had the lowest average distance from the reference location, both around 5500 m. Google Maps has a usage limit of around 40000 free georeferencing requests per month, beyond which the service is paid, while GEOLocate is free with no usage limit. For large collections, this may be a factor to take into account. In the future, we hope to optimize these methods and test them with larger datasets.
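The two accuracy criteria described above (number of results returned, and distance to the manually georeferenced location) can be illustrated with a short sketch. This is not the authors' R script: it is a Python illustration, it assumes the haversine great-circle formula for distance (the abstract does not state how distance was computed), and the function names, the 1000 m default threshold parameter, and the sample coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (a common approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def score_service(reference, results, threshold_m=1000):
    """Summarise one service against the reference dataset.

    reference: list of (lat, lon, uncertainty_radius_m) from manual georeferencing
    results:   list of (lat, lon) returned by the service, or None when nothing was returned
    """
    distances = []
    within_threshold = 0
    within_uncertainty = 0
    for (ref_lat, ref_lon, radius_m), result in zip(reference, results):
        if result is None:
            continue  # the service returned no coordinates for this record
        d = haversine_m(ref_lat, ref_lon, result[0], result[1])
        distances.append(d)
        if d < threshold_m:
            within_threshold += 1
        if d <= radius_m:
            within_uncertainty += 1
    return {
        "n_results": len(distances),
        "within_threshold": within_threshold,
        "within_uncertainty": within_uncertainty,
        "mean_distance_m": sum(distances) / len(distances) if distances else None,
    }


# Hypothetical records for illustration only (not data from the study).
reference = [(38.7169, -9.1399, 2000), (0.0, 0.0, 500), (41.15, -8.61, 1000)]
results = [(38.7169, -9.1399), (0.0, 0.01), None]
summary = score_service(reference, results)
print(summary)
```

Running the same function once per service, with only the first returned result kept for each record, would give the kind of per-service counts reported in the abstract.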

PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0261130
Author(s):  
Anton Güntsch ◽  
Quentin Groom ◽  
Marcus Ernst ◽  
Jörg Holetschek ◽  
Andreas Plank ◽  
...  

Natural history collection data available digitally on the web have so far only made limited use of the potential of semantic links among themselves and with cross-disciplinary resources. In a pilot study, botanical collections of the Consortium of European Taxonomic Facilities (CETAF) have therefore begun to semantically annotate their collection data, starting with data on people, and to link them via a central index system. As a result, it is now possible to query data on collectors across different collections and automatically link them to a variety of external resources. The system is being continuously developed and is already in production use in an international collection portal.


2021 ◽  
Vol 9 ◽  
Author(s):  
Fernanda Herrera Mesías ◽  
Alexander Weigand

Museums and other institutions curating natural history collections (NHCs) are fundamental entities to many scientific disciplines, as they house data and reference material for varied research projects. As such, biological specimens preserved in NHCs represent accessible physical records of the living world's history. They provide useful information regarding the presence and distribution of different taxonomic groups through space and time. Despite the importance of biological museum specimens, their potential to answer scientific questions pertinent to the necessities of our current historical context is often under-explored. The currently known wild bee fauna of Luxembourg comprises 341 registered species distributed amongst 38 different genera. However, specimens stored in the archives of local NHCs represent an untapped resource to update taxonomic lists, including potentially overlooked findings relevant to the development of national conservation strategies. We re-investigated the wild bee collection of the Zoology Department of the National Museum of Natural History Luxembourg by using morphotaxonomy and DNA barcoding. The collection revision led to the discovery of four species not previously reported for the country: Andrena lagopus (Latreille, 1809), Nomada furva (Panzer, 1798), Hoplitis papaveris (Latreille, 1799) and Sphecodes majalis (Pérez, 1903). Additionally, the presence of Nomada sexfasciata (Panzer, 1799), which inexplicably had been omitted from the most current species list, can be re-confirmed. Altogether, our findings increase the number of recorded wild bee species in Luxembourg to 346. Moreover, the results highlight the crucial role of NHCs as repositories of our knowledge of the natural world.


Author(s):  
Elie Tobi ◽  
Geovanne Aymar Nziengui Djiembi ◽  
Anna Feistner ◽  
Donald Midoko Iponga ◽  
Jean Felicien Liwouwou ◽  
...  

Language is a major barrier for researchers wanting to digitize and publish collection data in Africa. Although French is the fifth most spoken language on Earth and the second most common in Africa, resources in French about digitization, data management, and publishing are lacking. Furthermore, French-speaking regions of Africa (primarily Central/West Africa and Madagascar) host some of the highest biodiversity on the continent and are therefore of great importance to scientists and decision-makers. Without representation in online portals like the Global Biodiversity Information Facility (GBIF) and Integrated Digitized Biocollections (iDigBio), these important collections are effectively invisible. Producing relevant and applicable resources about digitization in French will help shine a light on these valuable natural history records and allow the data-holders in Africa to retain the autonomy of their collections. Awarded a GBIF-BID (Biodiversity Information for Development) grant in 2021, an international, multilingual network of partners has undertaken the important task of digitizing and mobilizing Gabon’s vertebrate collections. There are an estimated 13,500 vertebrate specimens housed in five institutions in different parts of Gabon. To date, the group has mobilized >4,600 vertebrate records to our recently launched Gabon Biodiversity Portal (https://gabonbiota.org/). The portal also hosts French guides for using Symbiota-based portals to manage, georeference, and publish natural history databases. These resources can provide much-needed guidance for other Francophone countries, in Africa and beyond, working to maximize the accessibility and value of their biodiversity collections.


Author(s):  
Niels Raes ◽  
Emily van Egmond ◽  
Ana Casino ◽  
Matt Woodburn ◽  
Deborah L Paul

With digitisation of natural history collections over the past decades, their traditional roles (taxonomic studies and public education) have been greatly expanded into the fields of biodiversity assessments, climate change impact studies, trait analyses, sequencing, 3D object analyses etc. (Nelson and Ellis 2019; Watanabe 2019). Initial estimates of the global natural history collection range between 1.2 and 2.1 billion specimens (Ariño 2010), of which 169 million (8-14%, as of April 2019) are available at some level of digitisation through the Global Biodiversity Information Facility (GBIF). With iDigBio (Integrated Digitized Biocollections) established in the United States and with the European DiSSCo (Distributed Systems of Scientific Collections) accepted on the ESFRI roadmap, it has become a priority to digitize natural history collections at an industrialized scale. Both iDigBio and DiSSCo aim at mobilising, unifying and delivering bio- and geo-diversity information at the scale, form and precision required by scientific communities, and thereby transform a fragmented landscape into a coherent and responsive research infrastructure. In order to prioritise digitisation based on scientific demand, and efficiency using industrial digitisation pipelines, it is required to arrive at a uniform and unambiguously accepted collection description standard that would allow comparing, grouping and analysing natural history collections at diverse levels. Several initiatives attempt to unambiguously describe natural history collections using taxonomic and storage classification schemes. These initiatives include One World Collection, Global Registry of Scientific Collections (GRSciColl), TDWG (Taxonomic Databases Working Group) Natural Collection Descriptions (NCD) and CETAF (Consortium of European Taxonomic Facilities) passports, among others.
In a collaborative effort of DiSSCo, ICEDIG (Innovation and consolidation for large scale digitisation of natural heritage), iDigBio, TDWG and the Task Group Collection Digitisation Dashboards, the various schemes were compared in a cross-walk analysis to propose a preliminary natural collection description standard that is supported by the wider community. In the process, two main user groups of collection description standards were identified: scientists and collection managers. The classification produced intends to meet the requirements of both, resulting in three classification schemes that exist in parallel to each other (van Egmond et al. 2019). For scientific purposes a ‘Taxonomic’ and a ‘Stratigraphic’ classification were defined, and for management purposes a ‘Storage’ classification. The latter is derived from specimen preservation types (e.g. dried, liquid preserved) defining storage requirements and the physical location of specimens in collection holding facilities. The three parallel collection classifications can be cross-sectioned with a ‘Geographic’ classification to assign sub-collections to major terrestrial and marine regions, which allows scientists to identify particular taxonomic or stratigraphic (sub-)collections from major geographical or marine regions of interest. Finally, to measure the level of digitisation of institutional collections and the progress of digitisation through time, the number of digitised specimens for each geographically cross-sectioned (sub-)collection can be derived from institutional collection management systems (CMS). As digitisation has different levels of completeness, a ‘Digitisation’ scheme has been adopted from Saarenmaa et al. 2019 to quantify the level of digitisation of a collection, ranging from ‘not digitised’ to ‘extensively digitised’ and recorded on a progressive scale of MIDS (Minimal Information for Digital Specimen).
The applicability of this preliminary classification will be discussed and visualized in a Collection Digitisation Dashboard (CDD) to demonstrate how the implementation of a collection description standard allows the identification of existing gaps in taxonomic and geographic coverage and levels of digitisation of natural history collections. This set of common classification schemes and dashboard design (van Egmond et al. 2019) will be contributed to the TDWG Collection Description interest group to ultimately arrive at the common goal of a 'World Collection Catalogue'.


2018 ◽  
Vol 6 ◽  
Author(s):  
Vaughn Shirey

Natural history collections contain estimated billions of records representing a large body of knowledge about the diversity and distribution of life on Earth. Assessments of various forms of bias within the aggregated data associated with specimens in these collections have been conducted across temporal, taxonomic, and spatial domains. Considering that these biases are the sum of biases across all contributing collections to aggregate datasets, the assessment of bias at the collection level is warranted. Interactive visualization provides a powerful tool for the assessment of these biases and insight into the historical development of natural history collections, providing context for where sources of bias may originate and developing historical narratives to clarify our understanding of our own knowledge about life on Earth. Here, I present a case study on using Sankey diagrams to illustrate the development of the entomology type collection at the Academy of Natural Sciences of Drexel University in Philadelphia, Pennsylvania with the hope that extensions of these practices among individual natural history collections are modified and adopted.

