"Look what they've done to our data!" — How Aggregators Change Data Items in Collection Records

2018 ◽  
Vol 2 ◽  
pp. e25906
Author(s):  
Robert Mesibov

Aggregators such as the Atlas of Living Australia (ALA) and the Global Biodiversity Information Facility (GBIF) have recently been criticised for imposing "backbone taxonomies" on records provided by museums, herbaria and other sources. Taxon names may be changed to suit the backbone, with the result that the taxon rank of the record may change and the originally provided name may no longer be searchable online through the aggregator. Aggregators may also delete data items, either by omitting entire fields or rejecting data items not conforming to aggregator-specific data standards. Modifications are more common than deletions and are particularly worrying in geospatial, date and recorder data fields. It can be difficult to locate originally provided data on aggregator websites, even for individual records, and bulk downloads from aggregators typically mask the changes made. In this presentation I document the loss and modification of biodiversity data items by aggregators and suggest strategies for museums and herbaria to counter data loss and modification.

ZooKeys ◽  
2018 ◽  
Vol 751 ◽  
pp. 129-146 ◽  
Author(s):  
Robert Mesibov

A total of ca 800,000 occurrence records from the Australian Museum (AM), Museums Victoria (MV) and the New Zealand Arthropod Collection (NZAC) were audited for changes in selected Darwin Core fields after processing by the Atlas of Living Australia (ALA; for AM and MV records) and the Global Biodiversity Information Facility (GBIF; for AM, MV and NZAC records). Formal taxon names in the genus- and species-groups were changed in 13–21% of AM and MV records, depending on dataset and aggregator. There was little agreement between the two aggregators on processed names, with names changed in two to three times as many records by one aggregator alone compared to records with names changed by both aggregators. The type status of specimen records did not change with name changes, resulting in confusion as to the name with which a type was associated. Data losses of up to 100% were found after processing in some fields, apparently due to programming errors. The taxonomic usefulness of occurrence records could be improved if aggregators included both the original and the processed taxonomic data items for each record. It is recommended that end-users check original and processed records for data loss and name replacements after processing by aggregators.
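
The kind of check recommended above can be sketched in a few lines of Python. This is a minimal illustration and not the audit procedure used in the paper: the function name `audit_fields`, the choice of `occurrenceID` as the matching key, and the sample records are all assumptions made for the example. Only the field names (`scientificName`, `recordedBy`) are real Darwin Core terms.

```python
from collections import Counter


def audit_fields(original, processed, fields, key="occurrenceID"):
    """Count, per field, how many records were changed or emptied by processing.

    Hypothetical helper: records are plain dicts keyed by Darwin Core terms,
    and original and processed records are matched on `key`.
    """
    processed_by_id = {rec[key]: rec for rec in processed}
    changed = Counter()
    lost = Counter()
    compared = 0
    for rec in original:
        after = processed_by_id.get(rec[key])
        if after is None:
            continue  # record absent after processing; not counted here
        compared += 1
        for field in fields:
            before_val = (rec.get(field) or "").strip()
            after_val = (after.get(field) or "").strip()
            if before_val and not after_val:
                lost[field] += 1  # data item present originally, empty afterwards
            elif before_val != after_val:
                changed[field] += 1  # data item modified (or newly filled in)
    return {
        f: {"changed": changed[f], "lost": lost[f], "compared": compared}
        for f in fields
    }


# Invented sample records, for illustration only.
original = [
    {"occurrenceID": "A1", "scientificName": "Aus bus", "recordedBy": "J. Smith"},
    {"occurrenceID": "A2", "scientificName": "Cus dus", "recordedBy": "K. Lee"},
    {"occurrenceID": "A3", "scientificName": "Eus fus", "recordedBy": "M. Ray"},
]
processed = [
    {"occurrenceID": "A1", "scientificName": "Aus", "recordedBy": ""},
    {"occurrenceID": "A2", "scientificName": "Cus dus", "recordedBy": ""},
    {"occurrenceID": "A3", "scientificName": "Eus fus", "recordedBy": ""},
]

report = audit_fields(original, processed, ["scientificName", "recordedBy"])
for field, counts in report.items():
    print(field, counts)
```

In this invented sample, one of three names is replaced and every `recordedBy` value is emptied, mirroring the two kinds of problem the abstract describes (name replacement and whole-field data loss). A real audit would read the provider's export and the aggregator's download rather than inline dicts, and would need to decide how to treat records that cannot be matched.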


2021 ◽  
Vol 9 ◽  
Author(s):  
Domingos Sandramo ◽  
Enrico Nicosia ◽  
Silvio Cianciullo ◽  
Bernardo Muatinte ◽  
Almeida Guissamulo

The collections of the Natural History Museum of Maputo have a crucial role in the safeguarding of Mozambique's biodiversity, representing an important repository of data and materials regarding the natural heritage of the country. In this paper, a dataset is described, based on the Museum’s Entomological Collection, recording 409 species belonging to seven orders and 48 families. Each specimen’s available data, such as geographical coordinates and taxonomic information, have been digitised to build the dataset. The specimens included in the dataset were obtained between 1914 and 2018 by collectors and researchers from the Natural History Museum of Maputo (once known as “Museu Álvaro de Castro”) in all the country’s provinces, with the exception of Cabo Delgado Province. This paper adds data to the Biodiversity Network of Mozambique and the Global Biodiversity Information Facility, within the objectives of the SECOSUD II Project and the Biodiversity Information for Development Programme. The insect dataset is available on the GBIF Engine data portal (https://doi.org/10.15468/j8ikhb). Data were also shared on the Mozambican national portal of biodiversity data, BioNoMo (https://bionomo.openscidata.org), developed by the SECOSUD II Project.


2019 ◽  
Vol 7 ◽  
Author(s):  
Valéria da Silva ◽  
Manoel Aguiar-Neto ◽  
Dan Teixeira ◽  
Cleverson Santos ◽  
Marcos de Sousa ◽  
...  

We present a dataset with information from the Opiliones collection of the Museu Paraense Emílio Goeldi, Northern Brazil. This collection currently has 6,400 specimens distributed in 13 families, 30 genera and 32 species, and includes holotypes of four species: Imeri ajuba Coronato-Ribeiro, Pinto-da-Rocha & Rheims, 2013, Phareicranaus patauateua Pinto-da-Rocha & Bonaldo, 2011, Protimesius trocaraincola Pinto-da-Rocha, 1997 and Sickesia tremembe Pinto-da-Rocha & Carvalho, 2009. The material of the collection is exclusively from Brazil, mostly from the Amazon Region. The dataset is now available for public consultation on the Sistema de Informação sobre a Biodiversidade Brasileira (SiBBr) (https://ipt.sibbr.gov.br/goeldi/resource?r=museuparaenseemiliogoeldi-collection-aracnologiaopiliones). SiBBr is the Brazilian Biodiversity Information System, an initiative of the government and the Brazilian node of the Global Biodiversity Information Facility (GBIF), which aims to consolidate and make primary biodiversity data available on a platform (Dias et al. 2017). Harvestmen or Opiliones constitute the third largest arachnid order, with approximately 6,500 described species. Brazil holds the greatest diversity in the world, with more than 1,000 described species, 95% (960 species) of which are endemic to the country. Of these, 32 species were identified and deposited in the collection of the Museu Paraense Emílio Goeldi.


2018 ◽  
Vol 2 ◽  
pp. e25488
Author(s):  
Anne-Sophie Archambeau ◽  
Fabien Cavière ◽  
Kourouma Koura ◽  
Marie-Elise Lecoq ◽  
Sophie Pamerlon ◽  
...  

The Atlas of Living Australia (ALA) (https://www.ala.org.au/) is the Global Biodiversity Information Facility (GBIF) node of Australia. They developed an open and free platform for sharing and exploring biodiversity data. All the modules are publicly available for reuse and customization on their GitHub account (https://github.com/AtlasOfLivingAustralia). GBIF Benin, hosted at the University of Abomey-Calavi, has published more than 338 000 occurrence records from 87 datasets and 2 checklists. Through the GBIF Capacity Enhancement Support Programme (https://www.gbif.org/programme/82219/capacity-enhancement-support-programme), GBIF Benin, with the help of GBIF France, is in the process of deploying the Beninese data portal using the GBIF France back-end architecture. Benin is the first African country to implement this module of the ALA infrastructure. In this presentation, we will give an overview of the registry and the occurrence search engine using the Beninese data portal. We will begin with the administration interface and how to manage metadata, then continue with the user interface of the registry and how to find Beninese occurrences through the hub.


2018 ◽  
Vol 2 ◽  
pp. e25486
Author(s):  
Nick dos Remedios ◽  
Marie-Elise Lecoq ◽  
David Martin ◽  
Sophia Ratcliffe

The Atlas of Living Australia (ALA) (https://www.ala.org.au/) is the Global Biodiversity Information Facility (GBIF) node of Australia. Since 2010, they have developed and improved a platform for sharing and exploring biodiversity information. All the modules are publicly available for reuse and customization on their GitHub account (https://github.com/AtlasOfLivingAustralia). The National Biodiversity Network, a registered charity, is the UK GBIF node and has been sharing biodiversity data since 2000. They have published more than 79 million occurrences from 818 datasets. In 2016, they launched the NBN Atlas Scotland (https://scotland.nbnatlas.org/), based on the Atlas of Living Australia infrastructure. Since then, they have released the NBN Atlas (https://nbnatlas.org/) and the NBN Atlas Wales (https://wales.nbnatlas.org/), and will soon release the NBN Atlas Isle of Man. In addition to the occurrence/species search engine and the metadata registry, they have put in place several tools that help users work with data published in the network: the spatial portal and the "explore your region" module. Both are based on Atlas of Living Australia developments. Because the Atlas of Living Australia platform is powerful and reusable, we want to show these two applications, which are used for geographical analyses. We will present the specifics of each component, with examples of its functionality.


2021 ◽  
Vol 118 (6) ◽  
pp. e2018093118
Author(s):  
J. Mason Heberling ◽  
Joseph T. Miller ◽  
Daniel Noesgaard ◽  
Scott B. Weingart ◽  
Dmitry Schigel

The accessibility of global biodiversity information has surged in the past two decades, notably through widespread funding initiatives for museum specimen digitization and emergence of large-scale public participation in community science. Effective use of these data requires the integration of disconnected datasets, but the scientific impacts of consolidated biodiversity data networks have not yet been quantified. To determine whether data integration enables novel research, we carried out a quantitative text analysis and bibliographic synthesis of >4,000 studies published from 2003 to 2019 that use data mediated by the world’s largest biodiversity data network, the Global Biodiversity Information Facility (GBIF). Data available through GBIF increased 12-fold since 2007, a trend matched by global data use with roughly two publications using GBIF-mediated data per day in 2019. Data-use patterns were diverse by authorship, geographic extent, taxonomic group, and dataset type. Despite facilitating global authorship, legacies of colonial science remain. Studies involving species distribution modeling were most prevalent (31% of literature surveyed) but recently shifted in focus from theory to application. Topic prevalence was stable across the 17-y period for some research areas (e.g., macroecology), yet other topics proportionately declined (e.g., taxonomy) or increased (e.g., species interactions, disease). Although centered on biological subfields, GBIF-enabled research extends surprisingly across all major scientific disciplines. Biodiversity data mobilization through global data aggregation has enabled basic and applied research use at temporal, spatial, and taxonomic scales otherwise not possible, launching biodiversity sciences into a new era.


2020 ◽  
Vol 15 (4) ◽  
pp. 411-437 ◽  
Author(s):  
Marcos Zárate ◽  
Germán Braun ◽  
Pablo Fillottrani ◽  
Claudio Delrieux ◽  
Mirtha Lewis

Great progress has been made recently in digitizing the world’s available Biodiversity and Biogeography data, but managing data from many different providers and research domains remains a challenge. A review of the current landscape of metadata standards and ontologies in Biodiversity sciences suggests that existing standards, such as the Darwin Core terminology, are inadequate for describing Biodiversity data in a semantically meaningful and computationally useful way. As a contribution to filling this gap, we present an ontology-based system, called BiGe-Onto, designed to manage Biodiversity and Biogeography data together. As data sources, we use two internationally recognized repositories: the Global Biodiversity Information Facility (GBIF) and the Ocean Biogeographic Information System (OBIS). The BiGe-Onto system is composed of (i) the BiGe-Onto Architecture, (ii) a conceptual model called BiGe-Onto specified in OntoUML, (iii) an operational version of BiGe-Onto encoded in OWL 2, and (iv) an integrated dataset for its exploitation through a SPARQL endpoint. We will show use cases that allow researchers to answer questions that draw on information from both domains.


Author(s):  
Marie-Elise Lecoq ◽  
Vicente Ruiz Jurado

Managers and developers from organizations within the Global Biodiversity Information Facility (GBIF) network nodes using the Atlas of Living Australia (ALA) modules have created the Living Atlases (LA) community. Since the beginning, two of our priorities have been the technical guides and communication inside and outside our network. A community cannot be sustainable without useful technical documentation, as members must work by themselves as much as possible. Without communication, a community cannot grow either. More than one year ago, the Living Atlases community hired a technical coordinator, Vicente J. Ruiz Jurado. With the help of other participants, he greatly improved our technical documentation with the Living Atlas Quick Start Guide and increased communication with remote support sessions. The helpdesk, through the use of the LA Slack channel, has been improved as well. We have also increased our visibility on the Internet with our website and our Twitter account. Over the last few years, we have focused our work on end-users, with dedicated workshops, including exercises made by participants for their users and two videos showing how a Living Atlas works (How to search and download biodiversity data in an Atlas and How to use regions/spatial module in an Atlas).


Author(s):  
Paula Zermoglio ◽  
Anabela Plos ◽  
Néstor Acosta ◽  
Leisy Amaya ◽  
Dairo Escobar ◽  
...  

Historically, some of the most successful biodiversity data sharing initiatives have been developed in North America, Europe, and Australia. In parallel, and driven by necessity, tools, practices and standards were shared across other communities. In the last decade, great efforts have been made by countries in other regions to join the biodiversity data network and share their data worldwide. Although knowledge, tools, and documentation are broadly distributed, language is the main constraint on their use, as most of this material is available only in English. English may be the most spoken language worldwide (Eberhard et al. 2020), but it is not native to most of the population, including a sizable proportion of the United States (Ryan 2013). For instance, Spanish is listed as the second most spoken native language worldwide, after Mandarin Chinese (Eberhard et al. 2020). While recognizing that English is currently considered the “universal language” for scientifically-related activities, it has been pointed out that a large proportion of biodiversity scientific knowledge is not produced in English, and that language constitutes a barrier to sharing knowledge (Amano et al. 2016). Actions to overcome this have been called for, for example by the 2nd Global Biodiversity Informatics Conference (GBIC2) in its list of ambitions for supporting international collaboration (Hobern et al. 2019), but are still largely missing in the broad community. Language affects the understanding and use of biodiversity data standards and related documentation for all the community, both English and non-English speakers. Our findings in the Latin American region suggest that the availability of materials in other languages, namely Spanish and Portuguese, would greatly benefit the region and improve our involvement in biodiversity data sharing.
Conversely, the English-speaking community would benefit from better access to knowledge in non-English languages, allowing broader use of data from all regions. This work also constitutes a plea from the Latin American and the Spanish-speaking community at large to the Biodiversity Information Standards (TDWG) to explore and incorporate other languages, hence fostering understanding, and therefore widening the use of TDWG standards in our region. We provide a list of people supporting the petition as Supplementary Material (Suppl. material 1). In the petition we also identify people (more than 60% of the signatories) who are willing to contribute to translating TDWG resources into Spanish. There is no single, best mechanism to move this initiative forward, but the approaches of some other initiatives (e.g., the Global Biodiversity Information Facility (GBIF) translators network) are being explored, weighing the resources needed from both the volunteer and the management perspectives. We will present the different options for the community to evaluate and decide upon a suitable action plan.

