arakno - An R package for effective spider nomenclature, distribution and trait data retrieval from online resources

2021 ◽  
Author(s):  
Pedro Cardoso ◽  
Stano Pekar

Online open databases are increasing in number, usefulness, and ease of use. There are currently two main global databases exclusively for spiders, the World Spider Catalogue (WSC) and the World Spider Trait (WST) database, both used regularly by thousands of researchers. Computational tools that allow effective processing of large data sets are now part of any researcher's workflow, and R has become a de facto standard for data manipulation, analysis, and presentation. Here we present an R package, arakno, that provides an interface to the two databases. Implemented tools include checking species names against WSC nomenclature, obtaining and mapping species distribution data from both the WST and the Global Biodiversity Information Facility (GBIF), and downloading trait data from the WST. A set of tools is also provided to prepare data for further statistical analysis.
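To make the described workflow concrete, the following is a minimal R sketch; the function names (checknames, map, traits) follow the package description and documentation, and exact arguments may differ between arakno versions.

```r
# Minimal sketch of the arakno workflow (function names per the package
# docs; arguments may vary by version).
# install.packages("arakno")
library(arakno)

spp <- c("Pardosa hortensis", "Araneus diadematus")

# 1. Check species names against World Spider Catalogue nomenclature
checknames(spp)

# 2. Map known distributions using WST and GBIF records
map(spp)

# 3. Download trait data from the World Spider Trait database
tr <- traits(spp)
head(tr)
```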

2021 ◽  
Author(s):  
Manuela Mejía Estrada ◽  
Luz Fernanda Jiménez-Segura ◽  
Iván Soto Calderón

DNA barcoding was proposed in response to the mismatch between the small number of taxonomists and the large number of species; the method requires the construction of reference collections of DNA sequences that represent existing biodiversity. Freshwater fishes are key indicators for understanding biogeography around the world. With 1,610 species of freshwater fishes, Colombia is the second-richest country in the world for this group. However, genetic information on these species remains limited. This contribution to a reference library of DNA barcodes for Colombian freshwater fishes highlights the importance of biological collections and seeks to strengthen the inventories and taxonomy of such collections in future studies. This dataset contributes to the knowledge of DNA barcodes and occurrence records of 96 species of freshwater fishes from Colombia; the species represented correspond to an addition of 39 species to the public BOLD databases. Forty-nine specimens were collected in the Atrato basin and 708 in the Magdalena-Cauca basin between 2010 and 2020. Two species (Loricariichthys brunneus and Poecilia sphenops) are considered exotic to the Atrato, Cauca and Magdalena basins, and four species (Oncorhynchus mykiss, Oreochromis niloticus, Parachromis friedrichsthalii and Xiphophorus helleri) are exotic to Colombian hydrographic regions. All specimens are deposited in the CIUA collection at the University of Antioquia, their DNA barcodes are publicly available in the Barcode of Life Data System (BOLD) online database, and the distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).
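Since the occurrence records are published through GBIF, they can be retrieved programmatically; the sketch below uses the rgbif package and searches by one of the listed species, because the abstract does not give the dataset key (in practice the actual dataset identifier would be used instead).

```r
# Hedged sketch: pulling Colombian occurrence records for one of the listed
# species from GBIF with rgbif; the dataset key of the CIUA resource is not
# given above, so the query is by species name and country.
library(rgbif)

key <- name_backbone(name = "Loricariichthys brunneus")$usageKey
occ <- occ_search(taxonKey = key, country = "CO", limit = 200)
head(occ$data[, c("scientificName", "decimalLatitude", "decimalLongitude")])
```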


Author(s):  
David Mitchell ◽  
Thomas Orrell

The Integrated Taxonomic Information System (ITIS) provides a regularly updated, global database that currently contains over 868,000 scientific names and their hierarchy. The program exists to communicate a comprehensive taxonomy of global species across 7 kingdoms that enables biodiversity information to be discovered, indexed, and connected across all human endeavors. ITIS partners with taxonomists and experts across the world to assemble scientific names and their taxonomic relationships, and then distributes that data through publicly available software. A single taxon may be represented by multiple scientific names, so ITIS makes it a priority to provide synonymy. Linking valid or accepted names with their subjective and objective synonyms is a key component of name translation and increases the precision of searches and the organization of information. ITIS and its partner Species 2000 create the Catalogue of Life (CoL) checklist, which provides quality scientific name data for over 2.2 million species. The CoL is the taxonomic backbone of the Global Biodiversity Information Facility (GBIF). Providing automated open access to complete, current, literature-referenced, and expert-validated taxonomic information enables biological data management systems and is fundamental to enhancing the utility of the scientific data amassed across the world. Fully leveraging this information for the public good is crucial for empowering the global digital society to confront the most pressing social and environmental challenges.
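As an illustration of the kind of programmatic access described, the sketch below queries the public ITIS JSON web service for a name and its synonyms; the endpoint paths and response fields follow the ITIS web-service documentation as recalled here and should be verified against itis.gov.

```r
# Hedged sketch: name lookup and synonymy via the ITIS JSON web service
# (endpoint paths and response fields are assumptions to verify).
library(httr)
library(jsonlite)

base <- "https://www.itis.gov/ITISWebService/jsonservice/"

# Find the Taxonomic Serial Number (TSN) for a scientific name
res  <- GET(paste0(base, "searchByScientificName"),
            query = list(srchKey = "Puma concolor"))
hits <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
tsn  <- hits$scientificNames$tsn[1]

# Retrieve synonym names linked to that TSN
syn <- GET(paste0(base, "getSynonymNamesFromTSN"), query = list(tsn = tsn))
str(fromJSON(content(syn, as = "text", encoding = "UTF-8")), max.level = 2)
```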


Author(s):  
Teresa Mayfield-Meyer ◽  
Phyllis Sharp ◽  
Dusty McDonald

The reality is that there is no single “taxonomic backbone”, there are many: the Global Biodiversity Information Facility (GBIF) Backbone Taxonomy, the World Register of Marine Species (WoRMS) and MolluscaBase, to name a few. We could view each one of these as a vertebra on the taxonomic backbone, but even that isn’t quite correct as some of these are nested within others (MolluscaBase contributes to WoRMS, which contributes to Catalogue of Life, which contributes to the GBIF Backbone Taxonomy). How is a collection manager without expertise in a given set of taxa and a limited amount of time devoted to finding the “most current” taxonomy supposed to maintain research grade identifications when there are so many seemingly authoritative taxonomic resources? And once a resource is chosen, how can they seamlessly use the information in that resource? This presentation will document how the Arctos community’s use of the taxon name matching service Global Names Architecture (GNA) led one volunteer team leader in a marine invertebrate collection to attempt to make use of WoRMS taxonomy and how her persistence brought better identifications and classifications to a community of collections. It will also provide insight into some of the technical and curatorial challenges involved in using an outside resource as well as the ongoing struggle to keep up with changes as they occur in the curated resource.
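As a sketch of what such name matching looks like in practice, the snippet below sends a verbatim identification to the WoRMS REST API; the endpoint, parameters and field names follow the WoRMS documentation as recalled here and should be checked at marinespecies.org/rest.

```r
# Hedged sketch: matching an identification against WoRMS via its REST API.
library(httr)
library(jsonlite)

name <- "Mytilus edulis"
url  <- paste0("https://www.marinespecies.org/rest/AphiaRecordsByName/",
               URLencode(name), "?like=false&marine_only=true")
rec  <- fromJSON(content(GET(url), as = "text", encoding = "UTF-8"))

# Each Aphia record carries the accepted name and rank used for curation
rec[, c("scientificname", "status", "valid_name", "rank")]
```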


Author(s):  
John Waller

I will cover how the Global Biodiversity Information Facility (GBIF) handles data quality issues, with a specific focus on coordinate location issues such as gridded datasets (Fig. 1) and country centroids. I will highlight the challenges GBIF faces in identifying potential data quality problems and what we and others (Zizka et al. 2019) are doing to discover and address them. GBIF is the largest open-data portal for biodiversity data, built on a large network of more than 40,000 individual datasets from various sources and publishers. Since these datasets vary both internally and from dataset to dataset, users wanting to combine data collected from museums, smartphones, atlases, satellite tracking, DNA sequencing, and other sources for research or analysis face a real challenge. Data quality at GBIF will always be a moving target (Chapman 2005), and GBIF already handles many obvious errors such as zero or impossible coordinates, empty or invalid data fields, and fuzzy taxon matching. Since GBIF primarily (but not exclusively) serves latitude-longitude location information, there is an expectation that occurrences fall reasonably close to where the species actually occurs. This is not always the case. Occurrence records can be hundreds of kilometers away from where the species naturally occurs, for several reasons that might not be obvious to users. One reason is that many GBIF datasets are gridded: gridded datasets have low resolution due to equally spaced sampling, which becomes a data quality issue when a user assumes an occurrence record was recorded exactly at its coordinates. Country centroids are another reason why a species occurrence record might be far from where the species occurs naturally; GBIF does not yet flag country centroids, which are records where the dataset publisher has entered the latitude-longitude center of a country instead of leaving the field blank. I will discuss the challenges of locating these issues and the current solutions (such as the CoordinateCleaner R package), and touch on how existing Darwin Core terms like coordinateUncertaintyInMeters and footprintWKT are being used to highlight low coordinate resolution. Finally, I will highlight some other emerging data quality issues and how GBIF is beginning to experiment with dataset-level flagging: around 500 datasets are currently flagged as gridded and around 400 as citizen science, but there are many more potential dataset flags.
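As a minimal illustration of the detection approaches mentioned, the sketch below uses the CoordinateCleaner package to flag country centroids and to test whether a dataset looks gridded; function and argument names follow the package documentation and may differ between versions.

```r
# Hedged sketch: flagging country centroids and gridded coordinate patterns
# with CoordinateCleaner (argument names per the package docs).
library(CoordinateCleaner)

# Synthetic, perfectly gridded occurrences (0.5-degree spacing)
grid_pts <- expand.grid(decimallongitude = seq(10, 15, by = 0.5),
                        decimallatitude  = seq(40, 45, by = 0.5))
occ <- data.frame(species = "Example species", grid_pts,
                  dataset = "example_dataset")

# Flag records sitting on country or province centroids
cen_flags <- cc_cen(occ, lon = "decimallongitude", lat = "decimallatitude",
                    value = "flagged")

# Test whether the dataset as a whole shows a gridded (rasterized) pattern
grid_test <- cd_round(occ, lon = "decimallongitude", lat = "decimallatitude",
                      ds = "dataset", graphs = FALSE, value = "dataset")
```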


2019 ◽  
Vol 7 ◽  
Author(s):  
Valéria da Silva ◽  
Manoel Aguiar-Neto ◽  
Dan Teixeira ◽  
Cleverson Santos ◽  
Marcos de Sousa ◽  
...  

We present a dataset with information from the Opiliones collection of the Museu Paraense Emílio Goeldi, northern Brazil. This collection currently holds 6,400 specimens distributed among 13 families, 30 genera and 32 species, including holotypes of four species: Imeri ajuba Coronato-Ribeiro, Pinto-da-Rocha & Rheims, 2013, Phareicranaus patauateua Pinto-da-Rocha & Bonaldo, 2011, Protimesius trocaraincola Pinto-da-Rocha, 1997 and Sickesia tremembe Pinto-da-Rocha & Carvalho, 2009. The material in the collection is exclusively from Brazil, mostly from the Amazon Region. The dataset is now available for public consultation on the Sistema de Informação sobre a Biodiversidade Brasileira (SiBBr) (https://ipt.sibbr.gov.br/goeldi/resource?r=museuparaenseemiliogoeldi-collection-aracnologiaopiliones). SiBBr is the Brazilian Biodiversity Information System, an initiative of the government and the Brazilian node of the Global Biodiversity Information Facility (GBIF), which aims to consolidate and make primary biodiversity data available on a single platform (Dias et al. 2017). Harvestmen (Opiliones) constitute the third-largest arachnid order, with approximately 6,500 described species. Brazil holds the greatest diversity in the world, with more than 1,000 described species, 95% (960 species) of which are endemic to the country. Of these, 32 species have been identified and deposited in the collection of the Museu Paraense Emílio Goeldi.
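For reuse, the published resource can be read directly as a Darwin Core Archive; the sketch below uses the finch package, and the archive URL follows the usual IPT download pattern (archive.do?r=<resource short name>), which is an assumption since the resource page above links to the actual download.

```r
# Hedged sketch: reading the Darwin Core Archive of this IPT resource.
library(finch)

archive <- paste0("https://ipt.sibbr.gov.br/goeldi/archive.do?",
                  "r=museuparaenseemiliogoeldi-collection-aracnologiaopiliones")
dwca <- dwca_read(archive, read = TRUE)

occ <- dwca$data[[1]]   # occurrence core as a data frame
names(occ)              # standard Darwin Core column names
```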


2021 ◽  
Author(s):  
Renato Augusto Ferreira Lima ◽  
Andrea Sanchez-Tapia ◽  
Sara R. Mortara ◽  
Hans ter Steege ◽  
Marinez F. Siqueira

Species records from biological collections are becoming increasingly available online. This unprecedented availability of records has largely supported recent studies in taxonomy, biogeography, macroecology, and biodiversity conservation. Biological collections vary in their documentation and notation standards, which have changed through time. For different reasons, neither collections nor data repositories perform the editing, formatting and standardization of the data, leaving these tasks to the final users of the species records (e.g. taxonomists, ecologists and conservationists). These tasks are challenging, particularly when working with millions of records from hundreds of biological collections. To help collection curators and final users perform those tasks, we introduce plantR, an open-source package that provides a comprehensive toolbox to manage species records from biological collections. The package is accompanied by a proposed reproducible workflow to manage this type of data in taxonomy, ecology and biodiversity conservation. It is implemented in R and designed to handle relatively large data sets as fast as possible. Although initially designed to handle plant species records, many plantR features also apply to other groups of organisms, provided the data structure is similar. The plantR workflow includes tools to (1) download records from different data repositories, (2) standardize typical fields associated with species records, (3) validate the locality, geographical coordinates, taxonomic nomenclature and species identifications, including the retrieval of duplicates across collections, and (4) summarize and export records, including the construction of species checklists with vouchers. Other R packages provide tools to tackle some of the workflow steps described above, but beyond the new features and resources for data editing and validation, the greatest strength of plantR is that it provides a comprehensive and user-friendly workflow in a single environment, performing all tasks from data retrieval to export. Thus, plantR can help researchers better assess data quality and avoid data leakage in a wide variety of studies using species records.
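The numbered steps above translate roughly into the following R sketch; the function names are taken from the plantR documentation as recalled here and are assumptions that may differ in the released package.

```r
# Hedged sketch of the four plantR workflow steps (function names assumed
# from the package documentation; check the vignette for current ones).
# remotes::install_github("LimaRAF/plantR")
library(plantR)

# (1) download records from a repository (here GBIF, via the package wrapper)
occs <- rgbif2(species = "Euterpe edulis", n.records = 500)

# (2) standardize typical fields (collectors, dates, localities, coordinates)
occs <- formatDwc(gbif_data = occs)
occs <- formatOcc(occs)
occs <- formatLoc(occs)
occs <- formatCoord(occs)
occs <- formatTax(occs)

# (3) validate localities, coordinates, taxonomy and duplicates
occs <- validateLoc(occs)
occs <- validateCoord(occs)
occs <- validateTax(occs)
occs <- validateDup(occs)

# (4) summarize and export, including a species checklist with vouchers
summaryData(occs)
checkList(occs)
```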


Author(s):  
Constance Rinaldo

This will be a short introduction to the symposium: Improving access to hidden scientific data in the Biodiversity Heritage Library. The symposium will present examples of how the Biodiversity Heritage Library (BHL) collaborates across the international consortium and with community partners around the world to help enhance access to the biodiversity literature. Literature repositories, particularly the BHL collections, have been recognized as critical to the global scientific community. A diverse global user community propels BHL and BHL users to develop access tools beyond the standard “title, author, subject” search. BHL utilizes the Global Names Recognition and Discovery (GNRD) service to identify taxonomic names within text rendered by Optical Character Recognition (OCR) software, enabling scientific name searches and creation of species-specific bibliographies, critical to systematics research. In this symposium, we will hear from international partners and creative users making data from the BHL globally accessible for the kinds of larger-scale analysis enabled by BHL’s full-text search capabilities and Application Program Interface (API) protocols. In addition to taxonomic name services already incorporated in BHL, the consortium has also begun exploring georeferencing strategies for better searching and potential connections with key biodiversity resources such as the Global Biodiversity Information Facility (GBIF). With many different institutions around the world participating, the ability to work together virtually is critical for a seamless end product that meets the demands of the international community as well as the needs of local institutions.
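As a sketch of the kind of programmatic access the symposium discusses, the snippet below runs a search against the BHL API; the operation and parameter names follow the BHL API v3 documentation as recalled here, are assumptions to verify, and a free API key is required.

```r
# Hedged sketch: a publication search against the BHL API (v3); operation
# and parameter names are assumptions to check against the BHL docs.
library(httr)
library(jsonlite)

res <- GET("https://www.biodiversitylibrary.org/api3",
           query = list(op         = "PublicationSearch",
                        searchterm = "Poa annua",
                        searchtype = "C",   # "C" = catalogue, "F" = full text
                        format     = "json",
                        apikey     = Sys.getenv("BHL_API_KEY")))
out <- fromJSON(content(res, as = "text", encoding = "UTF-8"))
str(out$Result, max.level = 1)
```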


Author(s):  
Elspeth Haston ◽  
Lorna Mitchell

The specimens held in natural history collections around the world are the direct result of the effort of thousands of people over hundreds of years. However, the way the names of these people have been recorded within the collections has never been fully standardised, and this makes the process of correctly assigning the event relating to a specimen to an individual difficult at best and impossible at worst. The events in which people are related to specimens include collecting, identifying, naming, loaning and owning. Whilst there are resources in the botanical community that hold information on many collectors and authors of plant names, the residual number of unknown people and the effort required to disambiguate them is daunting. Moreover, in many cases, the work carried out within the collection to disambiguate the names relating to the specimens is not recorded and made available, generally due to the lack of a system to do so. This situation makes it extremely difficult to search for collections within the main aggregators, such as GBIF (the Global Biodiversity Information Facility), and severely hampers our ability to link collections both within and between institutes and disciplines. When we look at the benefits of linking collections and people, the need to agree on and implement a system of managing people names becomes increasingly urgent.


Author(s):  
Agung Riyadi

One of the many ways for an Android application to connect to a database is to use Volley and a REST API. With a REST API, the Android application does not connect to the database directly; instead, an API acts as an intermediary. In Android development, Volley has the disadvantage of handling requests for large amounts of data poorly, so an evaluation is needed to test its capabilities. This research tested android-volley retrieving data through a REST API, presented in the form of an application for retrieving medicinal plant data. The test results show that Volley produces an error when the back button is pressed and another process is started while a previous Volley request has not finished loading. This error occurred on several Android versions, such as Lollipop and Marshmallow, and on some device brands. Developers using android-volley therefore need to check the request queue: if a data download started by Volley has not completed, the process should be cancelled so that an Application Not Responding (ANR) error does not occur.
Keywords: Android, Volley, WP REST API, ANR error


2021 ◽  
pp. 000276422110216
Author(s):  
Jasmine Lorenzini ◽  
Hanspeter Kriesi ◽  
Peter Makarov ◽  
Bruno Wüest

Protest event analysis is a key method for studying social movements, allowing researchers to systematically analyze protest events over time and space. However, the manual coding of protest events is time-consuming and resource intensive. Recently, advances in automated approaches offer opportunities to code multiple sources and create large data sets that span many countries and years. However, too often the procedures used are not discussed in detail and, therefore, researchers have limited capacity to assess the validity and reliability of the data. In addition, many researchers have highlighted biases associated with the study of protest events reported in the news. In this study, we ask how social scientists can build on electronic news databases and computational tools to create reliable protest event analysis (PEA) data covering a large number of countries over a long period of time. We provide a detailed description of our semiautomated approach and offer an extensive discussion of potential biases associated with the study of protest events identified in international news sources.

