A Google Sheet Add-on for Biodiversity Data Standardization and Sharing

Author(s): José Augusto Salim, Antonio Saraiva

For biologists and biodiversity data managers who are unfamiliar with information science practices of data standardization, the use of complex software to assist in the creation of standardized datasets can be a barrier to sharing data. Since the ratification of the Darwin Core Standard (DwC) (Darwin Core Task Group 2009) by Biodiversity Information Standards (TDWG) in 2009, many datasets have been published and shared through a variety of data portals. In the early stages of biodiversity data sharing, the protocol Distributed Generic Information Retrieval (DiGIR), progenitor of DwC, and later the protocols BioCASe and TDWG Access Protocol for Information Retrieval (TAPIR) (De Giovanni et al. 2010) were introduced for the discovery, search and retrieval of distributed data, simplifying data exchange between information systems. Although these protocols are still in use, they are known to be inefficient for transferring large amounts of data (GBIF 2017). Because of that, in 2011 the Global Biodiversity Information Facility (GBIF) introduced the Darwin Core Archive (DwC-A), which allows more efficient data transfer and has become the preferred format for publishing data in the GBIF network. DwC-A is a structured collection of text files that makes use of DwC terms to produce a single, self-contained dataset. Many tools for assisting data sharing using DwC-A have been introduced, such as the Integrated Publishing Toolkit (IPT) (Robertson et al. 2014), the Darwin Core Archive Assistant (GBIF 2010) and the Darwin Core Archive Validator. Despite these tools promoting and facilitating data sharing, many users have difficulties using them, mainly because of the lack of training in information science in the biodiversity curriculum (Convention on Biological Diversity 2012, Enke et al. 2012). Most users, however, are very familiar with spreadsheets for storing and organizing their data, yet the adoption of the available solutions requires data transformation and training in information science and, more specifically, biodiversity informatics. For an example of how spreadsheets can simplify data sharing, see Stoev et al. (2016). In order to provide a more "familiar" approach to data sharing using DwC-A, we introduce a new tool as a Google Sheet Add-on. The Add-on, called the Darwin Core Archive Assistant Add-on, can be installed in the user's Google Account from the G Suite Marketplace and used in conjunction with the Google Sheets application. The Add-on assists in the mapping of spreadsheet columns/fields to DwC terms (Fig. 1), similar to IPT, but with the advantage that it does not require the user to export the spreadsheet and import it into other software. Additionally, the Add-on facilitates the creation of a star schema in accordance with DwC-A, through the definition of a "CORE_ID" (e.g. occurrenceID, eventID, taxonID) field shared between sheets of a document (Fig. 2). The Add-on also provides an Ecological Metadata Language (EML) (Jones et al. 2019) editor (Fig. 3) with minimal fields to be filled in (i.e., the mandatory fields required by IPT), and helps users to generate and share DwC-Archives stored in the user's Google Drive, which can be downloaded as a DwC-A or automatically uploaded to another public storage resource, such as the user's Zenodo account (Fig. 4).
We expect that the Google Sheet Add-on introduced here, in conjunction with IPT, will promote biodiversity data sharing in a standardized format, as it requires minimal training and simplifies the process of data sharing from the user's perspective, mainly for those users who are not familiar with IPT but who have historically worked with spreadsheets. Although the DwC-A generated by the Add-on still needs to be published using IPT, the Add-on provides a simpler, spreadsheet-based interface for mapping datasets to DwC than IPT does. Even though IPT includes many more features than the Darwin Core Archive Assistant Add-on, we expect that the Add-on can be a "starting point" for users unfamiliar with biodiversity informatics before they move on to more advanced data publishing tools. On the other hand, the Zenodo integration allows users to share and cite their standardized datasets without publishing them via IPT, which can be useful for users without access to an IPT installation. Additionally, we are working on new features: future releases will include the automatic generation of globally unique identifiers for shared records, support for additional data standards and DwC extensions, and integration with the GBIF and IPT REST APIs.
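
To make the structure of a DwC-A concrete, the sketch below assembles a minimal archive in Python: a data table, an EML file and the meta.xml descriptor that records the column-to-term mapping. It illustrates the archive format only and is not the Add-on's code; the column mapping, file names and helper functions are hypothetical.

```python
# Illustrative sketch (not the Add-on's code): assembling a minimal Darwin Core
# Archive from a CSV whose columns have been mapped to DwC terms.
import zipfile
from xml.sax.saxutils import quoteattr

# Hypothetical mapping of spreadsheet columns (by position) to DwC terms.
column_mapping = [
    "http://rs.tdwg.org/dwc/terms/occurrenceID",
    "http://rs.tdwg.org/dwc/terms/scientificName",
    "http://rs.tdwg.org/dwc/terms/eventDate",
    "http://rs.tdwg.org/dwc/terms/decimalLatitude",
    "http://rs.tdwg.org/dwc/terms/decimalLongitude",
]

def build_meta_xml(mapping, core_file="occurrence.csv"):
    """Build the meta.xml descriptor that tells consumers how columns map to DwC terms."""
    fields = "\n".join(
        f'    <field index="{i}" term={quoteattr(term)}/>' for i, term in enumerate(mapping)
    )
    return f"""<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>{core_file}</location></files>
    <id index="0"/>
{fields}
  </core>
</archive>
"""

def build_archive(csv_path, eml_path, out_path="dwca.zip"):
    """Zip the data file, the EML metadata and meta.xml into a DwC-A."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as z:
        z.write(csv_path, "occurrence.csv")
        z.write(eml_path, "eml.xml")
        z.writestr("meta.xml", build_meta_xml(column_mapping))

# Example: build_archive("my_sheet_export.csv", "eml.xml")
```

The meta.xml descriptor is what allows consumers such as the IPT or GBIF to interpret each column as a specific DwC term.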

2010, Vol 62 (4/5), pp. 514-522
Author(s): K.K.S. Sarinder, L.H.S. Lim, A.F. Merican, K. Dimyati

2018, Vol 2, pp. e26367
Author(s): Yvette Umurungi, Samuel Kanyamibwa, Faustin Gashakamba, Beth Kaplin

Freshwater biodiversity is critically understudied in Rwanda, and to date there has not been an efficient mechanism to integrate freshwater biodiversity information or make it accessible to decision-makers, researchers, the private sector or communities, where it is needed for planning, management and the implementation of the National Biodiversity Strategy and Action Plan (NBSAP). A framework to capture and distribute freshwater biodiversity data is crucial to understanding how economic transformation and environmental change are affecting freshwater biodiversity and the resulting ecosystem services. To optimize conservation efforts for freshwater ecosystems, detailed information is needed regarding current and historical species distributions and abundances across the landscape. From these data, specific conservation concerns can be identified, analyzed and prioritized. The purpose of this project is to establish and implement a long-term strategy for freshwater biodiversity data mobilization, sharing, processing and reporting in Rwanda. The expected outcome of the project is to support the mandates of the Rwanda Environment Management Authority (REMA), the national agency in charge of environmental monitoring and the implementation of Rwanda’s NBSAP, and the Center of Excellence in Biodiversity and Natural Resources Management (CoEB). The project also aligns with the mission of the Albertine Rift Conservation Society (ARCOS) to enhance sustainable management of natural resources in the Albertine rift region. Specifically, organizational structures, technology platforms, and workflows for biodiversity data capture and mobilization are being enhanced to promote data availability and accessibility, to improve Rwanda’s NBSAP and to support other decision-making processes. The project is building the capacity of technical staff from relevant government and non-government institutions in biodiversity informatics and strengthening the capacity of CoEB to achieve its mission as Rwanda’s national biodiversity knowledge management center. Twelve institutions have been identified as data holders; digitization of these data using the Darwin Core standard is in progress, along with data cleaning for publication through the ARCOS Biodiversity Information System (http://arbmis.arcosnetwork.org/). The release of the first national State of Freshwater Biodiversity Report is the next step. CoEB is a registered publisher to the Global Biodiversity Information Facility (GBIF) and holds an Integrated Publishing Toolkit (IPT) account on the ARCOS portal. This project was developed for the African Biodiversity Challenge, a competition coordinated by the South African National Biodiversity Institute (SANBI) and funded by the JRS Biodiversity Foundation, which supports ongoing efforts to enhance the biodiversity information management activities of the GBIF Africa network. This project also aligns with SANBI’s Regional Engagement Strategy and endeavors to strengthen both emerging biodiversity informatics networks and data management capacity on the continent in support of sustainable development.


Author(s): Carrie Seltzer

Since 2008, iNaturalist has been crowdsourcing identifications for biodiversity observations collected by citizen scientists. Today iNaturalist has over 25 million records of wild biodiversity with photo or audio evidence, from every country, representing more than 230,000 species, collected by over 700,000 people, and with 90,000 people helping others with identifications. Hundreds of publications have used iNaturalist data to advance research, conservation, and policy. There are three key themes that iNaturalist has embraced: social interaction; shareability of data, tools, and code; and scalability of the platform and community. The keynote will share reflections on what has (and has not) worked for iNaturalist while drawing on other examples from biodiversity informatics and citizen science. Insights about user motivations, synergistic collaborations, and strategic decisions about scaling offer some transferable approaches to address the broadly applicable questions: Which species is represented? How do we make the best use of the available biodiversity information? And how do we build something viable and enduring in the process?


2016, Vol 11
Author(s): Alex Asase, A. Townsend Peterson

Providing comprehensive, informative, primary, research-grade biodiversity information represents an important focus of biodiversity informatics initiatives. Recent efforts within Ghana have digitized >90% of the primary biodiversity data records associated with specimen sheets in Ghanaian herbaria; additional herbarium data are available from other institutions via biodiversity informatics initiatives such as the Global Biodiversity Information Facility. However, data on the plants of Ghana have not yet been integrated and assessed to establish how complete site inventories are, so that appropriate levels of confidence can be applied. In this study, we assessed inventory completeness and identified gaps in the current Digital Accessible Knowledge (DAK) of the plants of Ghana, to prioritize areas for future surveys and inventories. We evaluated the completeness of inventories at ½° spatial resolution using statistics that summarize inventory completeness, and characterized gaps in coverage in terms of geographic distance and climatic difference from well-documented sites across the country. The southwestern and southeastern parts of the country held many well-known grid cells; the largest spatial gaps were found in the central and northern parts of the country. Climatic difference showed contrasting patterns, with a dramatic gap in coverage in central-northern Ghana. This study provides a detailed case study of how to prioritize new botanical surveys and inventories based on existing DAK.
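
As a rough illustration of this kind of gap analysis (not the authors' exact workflow, which uses its own completeness statistics), the sketch below bins occurrence records into half-degree cells and scores each cell with a simple completeness proxy: observed species richness divided by a bias-corrected Chao1 richness estimate.

```python
# Illustrative sketch of per-cell inventory completeness; the published study may
# use different completeness statistics.
import math
from collections import defaultdict, Counter

def cell_id(lat, lon, resolution=0.5):
    """Snap a coordinate to the lower-left corner of its half-degree grid cell."""
    return (math.floor(lat / resolution) * resolution,
            math.floor(lon / resolution) * resolution)

def completeness_by_cell(records, resolution=0.5):
    """records: iterable of (species, lat, lon). Returns {cell: completeness in (0, 1]}."""
    per_cell = defaultdict(Counter)          # cell -> species -> number of records
    for species, lat, lon in records:
        per_cell[cell_id(lat, lon, resolution)][species] += 1

    completeness = {}
    for cell, counts in per_cell.items():
        s_obs = len(counts)
        f1 = sum(1 for c in counts.values() if c == 1)   # singletons
        f2 = sum(1 for c in counts.values() if c == 2)   # doubletons
        chao1 = s_obs + (f1 * (f1 - 1)) / (2 * (f2 + 1)) # bias-corrected Chao1 estimate
        completeness[cell] = s_obs / chao1
    return completeness

# Example: completeness_by_cell([("Cola nitida", 6.7, -1.6), ...])
```

Cells with low scores (few species recorded relative to the estimated richness) would be the candidates for new surveys.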


Author(s): Filipi Soares, Benildes Maculan, Debora Drucker

Agricultural Biodiversity has been defined by the Convention on Biological Diversity as the set of elements of biodiversity that are relevant to agriculture and food production. These elements are arranged into an agro-ecosystem that encompasses "the variability among living organisms from all sources including terrestrial, marine and other aquatic ecosystems and the ecological complexes of which they are part: this includes diversity within species, between species and of ecosystems" (UNEP 1992). As with any other field in Biology, Agricultural Biodiversity work produces data. In order to publish data in a way that it can be efficiently retrieved on the web, one must describe it with proper metadata. A metadata record can be expressed as a set of statements made about something. Each statement has three elements: a subject (the thing being described), a predicate (the property being asserted) and an object (the value of that property). This representation is called a triple. For example, title is a metadata element. A book is the subject; title is the predicate; and The Chronicles of Narnia is the object. Some metadata standards have been developed to describe biodiversity data, such as the ABCD Data Schema, Darwin Core (DwC) and the Ecological Metadata Language (EML). DwC is said to be the most widely used metadata standard for publishing data about species occurrence worldwide (Global Biodiversity Information Facility 2019). "Darwin Core is a standard maintained by the Darwin Core maintenance group. It includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing identifiers, labels, and definitions. Darwin Core is primarily based on taxa, their occurrence in nature as documented by observations, specimens, samples, and related information" (Biodiversity Information Standards (TDWG) 2014). Within this thematic context, a master's research project is in progress at the Federal University of Minas Gerais in partnership with the Brazilian Agricultural Research Corporation (EMBRAPA). It aims to apply DwC to Brazil’s Agricultural Biodiversity data. A pragmatic analysis of DwC and the DwC Extensions demonstrated that important concepts and relations from Agricultural Biodiversity are not represented by DwC elements. For example, DwC does not have adequate metadata to describe biological interactions or to convey important information about relations between organisms from an ecological perspective. Pollination is one of the biological interactions relevant to Agricultural Biodiversity for which we need enhanced metadata. Given these problems, the principles of metadata construction of DwC will be followed in order to develop a metadata extension able to represent data about Agricultural Biodiversity. These principles derive from the Dublin Core Abstract Model, which presents propositions for creating the triples (subject-predicate-object). The standard format of DwC Extensions (see the Darwin Core Archive Validator) will be followed to shape the metadata extension. At the end of the research, we expect to present a model of a DwC metadata record for publishing data about Agricultural Biodiversity in Brazil, including metadata already existent in Simple DwC and the new metadata of Brazil’s Agricultural Biodiversity Metadata Extension. The resulting extension will be useful for representing Agricultural Biodiversity worldwide.
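
A minimal sketch of the triple model in practice is shown below, using the rdflib Python library: Darwin Core terms supply the predicates, and a hypothetical "agro" namespace stands in for the proposed Agricultural Biodiversity extension. The occurrence URIs, the values and the interactsWith term are illustrative assumptions, not terms of an existing standard.

```python
# Sketch of subject-predicate-object statements (triples) with Darwin Core terms.
from rdflib import Graph, Literal, Namespace, URIRef

DWC = Namespace("http://rs.tdwg.org/dwc/terms/")            # Darwin Core terms
AGRO = Namespace("http://example.org/agrobiodiversity/")    # hypothetical extension namespace

g = Graph()
g.bind("dwc", DWC)
g.bind("agro", AGRO)

occurrence = URIRef("http://example.org/occurrence/123")                # subject
g.add((occurrence, DWC.scientificName, Literal("Apis mellifera")))      # predicate, object
g.add((occurrence, DWC.eventDate, Literal("2019-10-21")))
# A relation Simple DwC cannot express directly: a pollination interaction.
g.add((occurrence, AGRO.interactsWith, URIRef("http://example.org/occurrence/456")))

print(g.serialize(format="turtle"))
```

The last statement illustrates the gap the proposed extension targets: the subject and object are both occurrences, and the predicate names the ecological relation between them.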


Author(s): Jean Ganglo

Benin became a member of the Global Biodiversity Information Facility (GBIF) in 2004 and acceded to the status of voting member in 2011. GBIF Benin, through the constant efforts of its node, is now very active in the GBIF community with respect to capacity building, data mobilization and data use. GBIF Benin has published more than 400,000 occurrence records from about 125 datasets on the GBIF portal. As for capacity building, GBIF Benin organizes at least two workshops each year to enhance the capacities of national and regional partners in data mobilization and data use. At the regional level, GBIF Benin is leading a consortium of many countries (Senegal, Côte d’Ivoire, Niger, the Democratic Republic of the Congo, Guinea, Madagascar, etc.) to help overcome the challenges of data mobilization and data use across the region. From the academic year 2017-2018, GBIF Benin, through its node manager, successfully cooperated with the University of Kansas to create a master program in biodiversity informatics. Biodiversity informatics is a relatively new field of investigation concerned with the large-scale collection of occurrence data on biodiversity and the environment, and with data processing, analysis and representation, so as to derive sound research products that inform decisions on biodiversity conservation and sustainable use in the context of climate and global change. In Benin, the master program in biodiversity informatics is a permanent two-year program structured in teaching units with the following contents: 1) basic concepts of biodiversity; 2) biodiversity data capture; 3) biodiversity inventories; 4) biodiversity data analysis; 5) climate change and biodiversity; 6) ecological niche modeling and strategies for biodiversity conservation; 7) the data-science-policy interface; 8) public health and applications of biodiversity data, etc. On completion of their studies, graduates of the program will be able to: 1) use Geographic Information Systems to map the spatial distribution of species; 2) model the current and future ecological niches of species in the context of climate and global change; 3) characterize biodiversity on scales ranging from local to global; 4) assess geographic patterns among suites of species (i.e., communities); 5) refine the knowledge of particular taxonomic groups; 6) define priority zones for biodiversity conservation; 7) develop strategies for species conservation; 8) implement biodiversity conservation strategies; 9) predict the risks of propagation of infectious diseases (Lassa fever, Ebola fever, etc.) whose vectors are living organisms, so as to support preventive actions. With such capacities, the graduates of the master program are the new generation of biodiversity information scientists, able to address the needs for information and thereby contribute to biodiversity conservation and its sustainable use. Furthermore, in their respective countries and the rest of Africa, they will contribute to the achievement of the Sustainable Development Goals as defined by the United Nations in 2015. With respect to data use, more and more research products are accumulating in Benin and are being integrated into the decision-makers’ arena. In 2018, the results of our data uses were integrated into the preparation of Benin’s second national communication on climate change.


2020, Vol 15 (4), pp. 411-437
Author(s): Marcos Zárate, Germán Braun, Pablo Fillottrani, Claudio Delrieux, Mirtha Lewis

Great progress has recently been made in digitizing the world’s available Biodiversity and Biogeography data, but managing data from many different providers and research domains still remains a challenge. A review of the current landscape of metadata standards and ontologies in the Biodiversity sciences suggests that existing standards, such as the Darwin Core terminology, are inadequate for describing Biodiversity data in a semantically meaningful and computationally useful way. As a contribution to filling this gap, we present an ontology-based system, called BiGe-Onto, designed to manage data from Biodiversity and Biogeography together. As data sources, we use two internationally recognized repositories: the Global Biodiversity Information Facility (GBIF) and the Ocean Biogeographic Information System (OBIS). The BiGe-Onto system is composed of (i) the BiGe-Onto architecture, (ii) a conceptual model called BiGe-Onto specified in OntoUML, (iii) an operational version of BiGe-Onto encoded in OWL 2, and (iv) an integrated dataset for its exploitation through a SPARQL endpoint. We show use cases that allow researchers to answer questions that draw on information from both domains.
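
As an illustration of how such an endpoint could be queried from a script, the sketch below issues a SPARQL query from Python with SPARQLWrapper. The endpoint URL and the use of plain Darwin Core terms as predicates are assumptions for illustration only; the abstract does not spell out BiGe-Onto's actual vocabulary or endpoint address.

```python
# Hypothetical query against an integrated biodiversity/biogeography SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://example.org/bigeonto/sparql"   # placeholder endpoint URL
QUERY = """
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
SELECT ?occurrence ?species ?lat ?lon
WHERE {
  ?occurrence dwc:scientificName   ?species ;
              dwc:decimalLatitude  ?lat ;
              dwc:decimalLongitude ?lon .
}
LIMIT 10
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

# Print species name and coordinates for each returned occurrence.
for row in results["results"]["bindings"]:
    print(row["species"]["value"], row["lat"]["value"], row["lon"]["value"])
```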


2018, Vol 2, pp. e27087
Author(s): Donald Hobern, Andrea Hahn, Tim Robertson

For more than a decade, the biodiversity informatics community has recognised the importance of stable, resolvable identifiers to enable unambiguous references to data objects and the associated concepts and entities, including museum/herbarium specimens and, more broadly, all records serving as evidence of species occurrence in time and space. Early efforts built on the Darwin Core institutionCode, collectionCode and catalogNumber terms, treated as a triple and expected to uniquely identify a specimen. Following a review of current technologies for globally unique identifiers, TDWG adopted Life Science Identifiers (LSIDs) (Pereira et al. 2009). Unfortunately, the key stakeholders in the LSID consortium soon withdrew support for the technology, leaving TDWG committed to a moribund technology. Subsequently, publishers of biodiversity data have adopted a range of technologies to provide unique identifiers, including (among others) HTTP Uniform Resource Identifiers (URIs), Universally Unique Identifiers (UUIDs), Archival Resource Keys (ARKs), and Handles. Each of these technologies has merit, but they do not provide consistent guarantees of persistence or resolvability. More importantly, the heterogeneity of these solutions hampers delivery of services that can treat all of these data objects as part of a consistent linked-open-data domain. The geoscience community has established the System for Earth Sample Registration (SESAR), which enables collections to publish standard metadata records for their samples and to associate each of these with an International Geo Sample Number (IGSN, http://www.geosamples.org/igsnabout). IGSNs follow a standard format, distribute responsibility for uniqueness between SESAR and the publishing collections, and support resolution via HTTP URIs or Handles. Each IGSN resolves to a standard metadata page, roughly equivalent in detail to a Darwin Core specimen record. The standardisation of identifiers has allowed the community to secure support from some journal publishers for the promotion and use of IGSNs within articles. The biodiversity informatics community encompasses a much larger number of publishers and greater pre-existing variation in identifier formats. Nevertheless, it would be possible to deliver a shared global identifier scheme with the same features as IGSNs by building on the aggregation services offered by the Global Biodiversity Information Facility (GBIF). The GBIF data index includes normalised Darwin Core metadata for all data records from registered data sources and could serve as a platform for resolution of HTTP URIs and/or Handles for all specimens and all occurrence records. The most significant trade-off requiring consideration would be between autonomy for collections and other publishers in how they format identifiers within their own data, and the benefits that may arise from greater consistency and predictability in the form of resolvable identifiers.
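
For context, the sketch below shows the institutionCode:collectionCode:catalogNumber convention discussed above, and one way such a combination can be looked up today against GBIF's aggregated index via its public occurrence search API. The example codes are placeholders, and the urn:catalog form is a community convention rather than a resolvable identifier.

```python
# Sketch of the Darwin Core specimen-identifier convention and a GBIF lookup.
import requests

def darwin_core_triple(institution_code, collection_code, catalog_number):
    """Format the classic (but not guaranteed unique) specimen identifier."""
    return f"urn:catalog:{institution_code}:{collection_code}:{catalog_number}"

def lookup_occurrence(institution_code, collection_code, catalog_number):
    """Search GBIF's occurrence index for records matching the three codes."""
    response = requests.get(
        "https://api.gbif.org/v1/occurrence/search",
        params={
            "institutionCode": institution_code,
            "collectionCode": collection_code,
            "catalogNumber": catalog_number,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("results", [])

print(darwin_core_triple("USNM", "ENT", "00123456"))   # placeholder codes
```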


2018, Vol 6
Author(s): A. Townsend Peterson, Alex Asase, Dora Canhos, Sidnei de Souza, John Wieczorek

The field of biodiversity informatics is in a massive, “grow-out” phase of creating and enabling large-scale biodiversity data resources. Because perhaps 90% of existing biodiversity data nonetheless remains unavailable for science and policy applications, the question arises as to how these existing and available data records can be mobilized most efficiently and effectively. This situation led to our analysis of several large-scale biodiversity datasets regarding birds and plants, detecting information gaps and documenting data “leakage” or attrition, in terms of data on taxon, time, and place, in each data record. We documented significant data leakage in each data dimension in each dataset. That is, significant numbers of data records are lacking crucial information in terms of taxon, time, and/or place; information on place was consistently the least complete, such that geographic referencing presently represents the most significant factor in degradation of usability of information from biodiversity information resources. Although the full process of digital capture, quality control, and enrichment is important to developing a complete digital record of existing biodiversity information, payoffs in terms of immediate data usability will be greatest with attention paid to the georeferencing challenge.
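
A simplified sketch of this kind of leakage audit is shown below: for a Simple Darwin Core occurrence table it reports the share of records lacking usable taxon, time or place information. The column names and missing-value checks are simplified assumptions, not the authors' actual criteria.

```python
# Illustrative audit of data "leakage" across the taxon, time and place dimensions.
import pandas as pd

def leakage_report(csv_path):
    """Return the percentage of records missing each dimension in a DwC CSV."""
    df = pd.read_csv(csv_path, dtype=str)
    n = len(df)
    missing = {
        "taxon": df["scientificName"].isna() | (df["scientificName"].str.strip() == ""),
        "time": df["eventDate"].isna(),
        "place": df["decimalLatitude"].isna() | df["decimalLongitude"].isna(),
    }
    return {dimension: round(100 * mask.sum() / n, 1) for dimension, mask in missing.items()}

# Example: leakage_report("occurrences.csv") returns a dict of percentages per dimension.
```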


Author(s): Michael Trizna, Torsten Dikow

Taxonomic revisions contain crucial biodiversity data in the material examined sections for each species. In entomology, material examined lists minimally include the collecting locality, date of collection, and the number of specimens of each collection event. Insect species might be represented in taxonomic revisions by only a single specimen or by hundreds to thousands of specimens. Furthermore, revisions of insect genera might treat small genera with few species or include tens to hundreds of species. Summarizing data from such large and complex material examined lists and revisions is cumbersome, time-consuming, and prone to errors. However, providing data on the seasonal incidence, abundance, and collecting period of species is an important way to mobilize primary biodiversity data to understand a species’s occurrence or rarity. Here, we present SpOccSum (Species Occurrence Summary), a tool to easily obtain metrics of seasonal incidence from specimen occurrence data in taxonomic revisions. SpOccSum is written in Python (Python Software Foundation 2019) and accessible through the Anaconda Python/R Data Science Platform as a Jupyter Notebook (Kluyver et al. 2016). The tool takes a simple list of specimen data containing species name, locality, date of collection (preferably separated by day, month, and year), and number of specimens in CSV format, and generates a series of tables and graphs summarizing: the number of specimens per species, the number of specimens collected per month, the number of unique collection events, as well as the earliest and most recent collecting year of each species. The results can be exported as graphics or as CSV-formatted tables and can easily be included in manuscripts for publication. An example of an early version of the summary produced by SpOccSum can be viewed in Tables 1, 2 from Markee and Dikow (2018). To accommodate seasonality in the Northern and Southern Hemispheres, users can choose to start the data display with either January or July. When geographic coordinates are available and species have widespread distributions spanning, for example, the equator, the user can itemize particular regions such as North of the Tropic of Cancer (23.5˚N), Tropic of Cancer to the Equator, Equator to Tropic of Capricorn, and South of the Tropic of Capricorn (23.5˚S). Other features currently in development include the ability to produce distribution maps from the provided data (when geographic coordinates are included) and the option to export specimen occurrence data as a Darwin Core Archive ready for upload to the Global Biodiversity Information Facility (GBIF).
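
A condensed sketch of the kind of summary SpOccSum produces is shown below, using pandas; it is not the tool's own code. The input column names are assumptions based on the description above, and the real tool offers additional options (hemisphere-aware month ordering, latitudinal regions, export formats) not reproduced here.

```python
# Sketch of a material-examined summary: specimens per species, monthly incidence,
# collection events, and earliest/latest collecting year.
import pandas as pd

def summarize(csv_path):
    """Assumed columns: species, locality, day, month, year, specimen_count."""
    df = pd.read_csv(csv_path)
    per_species = df.groupby("species").agg(
        specimens=("specimen_count", "sum"),
        collection_events=("species", "size"),   # assumes one row per collection event
        earliest_year=("year", "min"),
        latest_year=("year", "max"),
    )
    per_month = (
        df.groupby(["species", "month"])["specimen_count"].sum().unstack(fill_value=0)
    )
    return per_species, per_month

# per_species, per_month = summarize("material_examined.csv")
# per_species.to_csv("species_summary.csv"); per_month.to_csv("monthly_incidence.csv")
```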

