A botanical demonstration of the potential of linking data using unique identifiers for people

PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0261130
Author(s):  
Anton Güntsch ◽  
Quentin Groom ◽  
Marcus Ernst ◽  
Jörg Holetschek ◽  
Andreas Plank ◽  
...  

Natural history collection data available digitally on the web have so far only made limited use of the potential of semantic links among themselves and with cross-disciplinary resources. In a pilot study, botanical collections of the Consortium of European Taxonomic Facilities (CETAF) have therefore begun to semantically annotate their collection data, starting with data on people, and to link them via a central index system. As a result, it is now possible to query data on collectors across different collections and automatically link them to a variety of external resources. The system is being continuously developed and is already in production use in an international collection portal.
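The benefit of annotating collection data with identifiers for people can be shown with a small sketch. It is not the CETAF index system. It only illustrates the general idea: once each specimen record carries a stable URI for its collector instead of a free-text name, a central index can answer cross-collection queries with a simple lookup. The `Specimen` type, the collection codes `HERB-A`/`HERB-B` and the `example.org` URIs are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    collection: str      # holding collection (hypothetical code)
    catalog_number: str
    collector_id: str    # stable URI for the collector, not a free-text name


def build_person_index(specimens):
    """Central index: collector URI -> specimens from any collection."""
    index = defaultdict(list)
    for s in specimens:
        index[s.collector_id].append(s)
    return index


def collections_for(index, person_id):
    """Sorted list of collections holding material gathered by one person."""
    # .get() avoids inserting an empty entry into the defaultdict
    return sorted({s.collection for s in index.get(person_id, [])})


# Hypothetical data: two collections hold material from the same collector,
# linked by a shared identifier rather than by matching name strings.
ALICE = "https://example.org/person/0001"
specimens = [
    Specimen("HERB-A", "A-100", ALICE),
    Specimen("HERB-B", "B-200", ALICE),
    Specimen("HERB-B", "B-201", "https://example.org/person/0002"),
]
index = build_person_index(specimens)
print(collections_for(index, ALICE))  # ['HERB-A', 'HERB-B']
```

Because the lookup key is the identifier, the same URI can also serve as the join key to external resources describing that person.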

Author(s):  
Leonor Venceslau ◽  
Luis Lopes

Major efforts are being made to digitize natural history collections and make their data available online for retrieval and analysis (Beaman and Cellinese 2012). Georeferencing, an important part of the digitization process, consists of obtaining geographic coordinates from a locality description. For many natural history specimens, the coordinates of the sampling location were never recorded; instead, only a description of the site is available. Inaccurate georeferencing of sampling locations degrades data quality and the accuracy of any geographic analysis based on those data. In addition to latitude and longitude, it is important to state the uncertainty of the coordinates, since in most cases the exact location cannot be pinpointed retrospectively. This is usually done by giving an uncertainty value, expressed as a radius around the center of the locality where sampling took place. Because georeferencing is time-consuming and requires manual validation, a significant part of the natural history collection data available online is not georeferenced: of the 161 million preserved-specimen records currently available in the Global Biodiversity Information Facility (GBIF), only 86 million (53.4%) include coordinates. It is therefore important to develop and optimize automatic tools for fast and accurate georeferencing. The objective of this work was to test existing automatic georeferencing services and evaluate their potential to accelerate the georeferencing of large collection datasets. Several georeferencing services, some of them open-source, currently provide an application programming interface (API) for batch georeferencing. We evaluated five of them: Google Maps, MapQuest, GeoNames, OpenStreetMap, and GEOLocate.
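The point-radius representation described above can be sketched as a center point plus an uncertainty radius, with a containment test based on great-circle distance. This is a minimal illustration, assuming a spherical Earth (the haversine formula), not an ellipsoidal geodesic. The field names loosely mirror the Darwin Core terms decimalLatitude, decimalLongitude and coordinateUncertaintyInMeters, and the coordinates are illustrative values near Lisbon, not records from the study.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points, treating Earth as a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing a slightly above 1 for near-antipodal points
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


@dataclass(frozen=True)
class PointRadius:
    """A georeference: center point plus an uncertainty radius in metres."""
    latitude: float
    longitude: float
    uncertainty_m: float

    def contains(self, lat, lon):
        """True if (lat, lon) lies inside the uncertainty circle."""
        return haversine_m(self.latitude, self.longitude, lat, lon) <= self.uncertainty_m


site = PointRadius(latitude=38.7223, longitude=-9.1393, uncertainty_m=2000.0)
print(site.contains(38.7300, -9.1393))  # True  (about 0.86 km north of the center)
print(site.contains(38.8000, -9.1393))  # False (about 8.6 km north of the center)
```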
A test dataset of 100 records (the reference dataset), previously georeferenced individually following Chapman and Wieczorek (2006), was randomly selected from the insect collection catalogue of the Museu Nacional de História Natural e da Ciência, Universidade de Lisboa (Lopes et al. 2016). An R (R Core Team 2018) script was used to georeference these records with each of the five services. Where multiple results were returned, only the first was considered and compared with the manually obtained coordinates in the reference dataset. Two factors were considered in evaluating accuracy: the total number of results obtained, and the distance to the original location in the reference dataset. Of the five services tested, Google Maps yielded the most results (99) and was the most accurate, with 57 results less than 1000 m from the reference location and 79 within the uncertainty radius. GEOLocate provided results for 87 locations, of which 47 were within 1000 m of the correct location and 57 within the uncertainty radius. The other three services each had fewer than 35 results within 1000 m of the reference location and fewer than 50 within the uncertainty radius. Google Maps and OpenStreetMap had the lowest average distance from the reference location, both around 5500 m. Google Maps has a usage limit of around 40,000 free georeferencing requests per month, beyond which the service is paid, whereas GEOLocate is free with no usage limit; for large collections, this may be a factor to take into account. In the future, we hope to optimize these methods and test them with larger datasets.
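Once each service's first result has been compared with the reference point, the evaluation criteria above reduce to simple counts. The sketch below is not the authors' R script. It assumes the distances have already been computed (for example with a great-circle formula) and uses a hypothetical `Comparison` type with toy values. It follows the abstract in using a strict "< 1000 m" threshold, and treats "within the uncertainty radius" as less than or equal to the radius.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional, Sequence


@dataclass(frozen=True)
class Comparison:
    """One reference record vs. the first result a service returned for it."""
    distance_m: Optional[float]  # None if the service returned no result
    uncertainty_m: float         # uncertainty radius of the reference georeference


def summarize(comparisons: Sequence[Comparison], threshold_m: float = 1000.0) -> dict:
    """Count results, results within the threshold and within the uncertainty radius."""
    found = [c for c in comparisons if c.distance_m is not None]
    return {
        "results": len(found),
        "within_threshold": sum(c.distance_m < threshold_m for c in found),
        "within_uncertainty": sum(c.distance_m <= c.uncertainty_m for c in found),
        "mean_distance_m": mean(c.distance_m for c in found) if found else None,
    }


# Toy values for one hypothetical service, not data from the study
toy = [
    Comparison(250.0, 500.0),
    Comparison(1500.0, 2000.0),
    Comparison(None, 1000.0),
    Comparison(12000.0, 3000.0),
]
print(summarize(toy))
```

Running `summarize` once per service gives a table comparable in form to the figures reported above.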


2021 ◽  
Vol 9 ◽  
Author(s):  
Fernanda Herrera Mesías ◽  
Alexander Weigand

Museums and other institutions curating natural history collections (NHCs) are fundamental to many scientific disciplines, as they house data and reference material for varied research projects. Biological specimens preserved in NHCs are accessible physical records of the history of the living world. They provide useful information on the presence and distribution of different taxonomic groups through space and time. Despite the importance of biological museum specimens, their potential to answer scientific questions relevant to present-day needs is often under-explored. The currently known wild bee fauna of Luxembourg comprises 341 registered species distributed among 38 genera. However, specimens stored in the archives of local NHCs are an untapped resource for updating taxonomic lists, potentially including overlooked findings relevant to the development of national conservation strategies. We re-investigated the wild bee collection of the Zoology Department of the National Museum of Natural History Luxembourg using morphotaxonomy and DNA barcoding. The revision of the collection led to the discovery of four species not previously recorded for the country: Andrena lagopus (Latreille, 1809), Nomada furva (Panzer, 1798), Hoplitis papaveris (Latreille, 1799) and Sphecodes majalis (Pérez, 1903). Additionally, the presence of Nomada sexfasciata (Panzer, 1799), which had inexplicably been omitted from the most recent species list, was re-confirmed. Altogether, these findings raise the number of recorded wild bee species in Luxembourg to 346 (the 341 previously listed, plus the four new records and N. sexfasciata). Moreover, the results highlight the crucial role of NHCs as repositories of our knowledge of the natural world.


2018 ◽  
Vol 6 ◽  
Author(s):  
Vaughn Shirey

Natural history collections are estimated to contain billions of records, representing a large body of knowledge about the diversity and distribution of life on Earth. Assessments of various forms of bias in the aggregated data associated with specimens in these collections have been conducted across temporal, taxonomic, and spatial domains. Because these biases are the sum of biases across all collections contributing to aggregate datasets, assessing bias at the collection level is warranted. Interactive visualization is a powerful tool for assessing these biases and gaining insight into the historical development of natural history collections: it provides context for where sources of bias may originate and helps develop historical narratives that clarify our understanding of our own knowledge about life on Earth. Here, I present a case study using Sankey diagrams to illustrate the development of the entomology type collection at the Academy of Natural Sciences of Drexel University in Philadelphia, Pennsylvania, in the hope that these practices will be adapted and adopted by other individual natural history collections.
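A Sankey diagram is drawn from weighted links between nodes. The following is a minimal sketch of how such links could be counted from specimen records. It is not the author's pipeline. It assumes each record is a dictionary with hypothetical fields (taxonomic order, decade of accession, type status), and that flows are counted between consecutive stages. The resulting (source, target, value) triples are the kind of input Sankey plotting tools generally accept, usually after mapping labels to node indices.

```python
from collections import Counter


def sankey_links(records, stages):
    """Count record flows between consecutive stages as (source, target, value) triples."""
    counts = Counter()
    for rec in records:
        # Prefix each label with its stage so equal values in different stages stay distinct nodes
        labels = [f"{stage}: {rec[stage]}" for stage in stages]
        for src, dst in zip(labels, labels[1:]):
            counts[(src, dst)] += 1
    return [(src, dst, n) for (src, dst), n in sorted(counts.items())]


# Hypothetical type-specimen records, not data from the Drexel collection
records = [
    {"order": "Coleoptera", "decade": "1850s", "type_status": "holotype"},
    {"order": "Coleoptera", "decade": "1850s", "type_status": "paratype"},
    {"order": "Diptera", "decade": "1900s", "type_status": "holotype"},
]
for link in sankey_links(records, ["order", "decade", "type_status"]):
    print(link)
```

Counting each record once per stage transition keeps the total flow through every stage equal to the number of records, which is what makes the band widths comparable.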


2019 ◽  
Vol 12 (3) ◽  
pp. 202-211
Author(s):  
Yuancheng Li ◽  
Rong Huang ◽  
Xiangqian Nie

Background: With the rapid development of the Internet, the amount of web spam has increased dramatically in recent years, wasting search engine storage and computing power on a massive scale. To identify web spam effectively, the content features, link features, hidden features and quality features of web pages are integrated to establish a web spam identification index system. However, this index system is high-dimensional and its indexes are highly correlated. Methods: An improved autoencoder method, the stacked autoencoder (SAE) neural network, is used to reduce the dimensionality of the web spam identification index system. Results: The experimental results show that our method effectively reduces the number of web spam indexes and significantly improves the recognition rate in subsequent detection. Conclusion: An autoencoder-based web spam index reduction method is proposed in this paper. The experimental results show that it greatly reduces the time and space complexity of subsequent web spam detection models.
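The abstract does not give the network's architecture or training details, so the following is only a minimal sketch of the general technique: greedy layer-wise training of sigmoid autoencoders with NumPy, compressing a hypothetical 12-feature index vector to 3 codes. The layer sizes, learning rate, epoch count and the random toy data are all assumptions. A full SAE would usually add a supervised fine-tuning stage, which is omitted here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder_layer(X, n_hidden, epochs=500, lr=0.5, seed=0):
    """Train one sigmoid autoencoder on X (values scaled to [0, 1]) by full-batch
    gradient descent on mean squared reconstruction error; return encoder weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))  # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))  # decoder weights
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)              # encode
        R = sigmoid(H @ W2 + b2)              # reconstruct
        # Backpropagate loss = (1/n) * sum((R - X) ** 2)
        dR = (2.0 / n) * (R - X) * R * (1.0 - R)
        dH = (dR @ W2.T) * H * (1.0 - H)      # uses W2 before it is updated
        W2 -= lr * (H.T @ dR)
        b2 -= lr * dR.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return W1, b1


def stacked_autoencoder_reduce(X, layer_sizes, **train_kwargs):
    """Greedy layer-wise stacking: each autoencoder learns to reconstruct the codes
    of the previous one. Returns the final low-dimensional codes and the encoders."""
    encoders, H = [], X
    for size in layer_sizes:
        W, b = train_autoencoder_layer(H, size, **train_kwargs)
        encoders.append((W, b))
        H = sigmoid(H @ W + b)
    return H, encoders


# Toy stand-in for a spam-feature matrix: 200 pages x 12 features in [0, 1]
rng = np.random.default_rng(42)
X = rng.random((200, 12))
Z, encoders = stacked_autoencoder_reduce(X, layer_sizes=[8, 3])
print(Z.shape)  # (200, 3)
```

The reduced codes `Z` would then replace the original indexes as input to a downstream spam classifier.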


World Futures ◽  
2018 ◽  
Vol 74 (6) ◽  
pp. 392-411
Author(s):  
Maria Letizia Cesana ◽  
Francesca Giordano ◽  
Diego Boerchi ◽  
Marta Rivolta ◽  
Cristina Castelli

Bibliosphere ◽  
2020 ◽  
pp. 49-60
Author(s):  
A. V. Glushanovskiy

The article analyzes changes in the bibliometric characteristics of the array of Russian physics publications indexed in the Web of Science Core Collection (WoS CC) in 2018, compared with the same characteristics in 2010. From a bibliometric point of view, the main parameter used to assess the quality (research level) of an array was the “Comprehensive index of quality” (CIQ) of the array of publications, calculated on the basis of one of the parameters in the “Method for calculating the qualitative indicator of the state task “the Comprehensive performance score publication”...”, used by the Ministry of Science and Higher Education of the Russian Federation. It was found that, while the volume of the array almost doubled, its quality in terms of CIQ decreased slightly in 2018 compared with 2010. The author also compared the characteristics of the 2018 Russian array with those of the physics publication arrays of Germany, India and Great Britain, which are close to Russia in the ranking by the number of publications included in WoS (Russia ranked fourth in this ranking in 2018). In the CIQ-based ranking, the arrays of these countries are significantly ahead of the Russian one, and our country ranks only sixth. The main reasons identified for this lag are a lower percentage of Russian publications in high-quartile journals and a greater number of publications from conference proceedings. The author concludes that bibliometric analysis is applicable for identifying trends in publishing activity in a scientific field.
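The abstract does not give the CIQ formula, so the sketch below does not compute it. It only shows how the two factors identified as driving the lag could be measured for a publication array: the share of publications in high-quartile journals and the share of conference-proceedings papers. Treating Q1 and Q2 as "high-quartile", the `Publication` type, its document-type labels and the toy array are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Publication:
    doc_type: str            # hypothetical labels: "article" or "proceedings"
    quartile: Optional[int]  # journal quartile 1-4, None if not applicable


def array_profile(pubs: Sequence[Publication], high_quartiles=(1, 2)) -> dict:
    """Shares of an array's publications in high-quartile journals and in proceedings."""
    if not pubs:
        raise ValueError("empty publication array")
    n = len(pubs)
    high = sum(p.quartile in high_quartiles for p in pubs)
    proceedings = sum(p.doc_type == "proceedings" for p in pubs)
    return {"n": n, "high_quartile_share": high / n, "proceedings_share": proceedings / n}


# Toy array, not real WoS data
toy = [
    Publication("article", 1),
    Publication("article", 3),
    Publication("proceedings", None),
    Publication("article", 2),
]
print(array_profile(toy))
# {'n': 4, 'high_quartile_share': 0.5, 'proceedings_share': 0.25}
```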

