Biodiversity Informatics
Recently Published Documents

TOTAL DOCUMENTS: 132 (FIVE YEARS: 37)
H-INDEX: 16 (FIVE YEARS: 3)

2021
Author(s): Laura Brenskelle, John Wieczorek, Edward Davis, Kitty Emery, Neill J. Wallis, ...

Darwin Core, the data standard used for sharing modern biodiversity and paleodiversity occurrence records, has previously lacked proper mechanisms for reporting what is known about the estimated age range of specimens from deep time. This has led data providers to put these data in fields where they cannot easily be found by users, which impedes the reuse and improvement of these data by other researchers. Here we describe the development of the Chronometric Age Extension to Darwin Core, a ratified, community-developed extension that enables the reporting of ages of specimens from deep time and the evidence supporting these estimates. The extension standardizes reporting about the methods or assays used to determine an age and other critical information such as uncertainty. It gives data providers flexibility about the level of detail reported, focusing on the minimum information needed for reuse while still allowing for significant detail if providers have it. Providing a standardized format for reporting these data will make them easier to find and search, and will enable researchers to pinpoint specimens of interest for data improvement or accumulate more data for broad temporal studies. The Chronometric Age Extension was also the first community-managed vocabulary to undergo the new Biodiversity Information Standards (TDWG) review and ratification process, thus providing a blueprint for future Darwin Core extension development.
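As an illustration of the kind of record the extension makes possible, the minimal sketch below expresses a chronometric age as term/value pairs in Python. The term names follow the ratified extension, but the values and the occurrence identifier are invented for illustration; the authoritative term list should be taken from the TDWG documentation.

    # Hypothetical Chronometric Age record for a dated specimen; term names
    # follow the ratified extension, values are invented for illustration.
    chronometric_age = {
        "occurrenceID": "urn:example:specimen:1234",          # links the age to an occurrence record
        "chronometricAgeProtocol": "AMS radiocarbon dating",  # method or assay used
        "materialDated": "charcoal associated with the specimen",
        "earliestChronometricAge": 2100,                      # older bound (assumed years BP)
        "latestChronometricAge": 1850,                        # younger bound (assumed years BP)
        "chronometricAgeUncertaintyInYears": 30,              # reported uncertainty
        "chronometricAgeRemarks": "Calibrated range at 95% confidence.",
    }

A provider with only a verbatim age statement could share just that minimal information, while a full lab report could populate the complete set of terms, reflecting the extension's flexibility about level of detail.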


2021, Vol 9
Author(s): Fatima Parker-Allie, Francisco Pando, Anders Telenius, Jean Ganglo, Danny Vélez, ...

Biodiversity informatics is a new and evolving field, requiring efforts to develop capacity and curricula for this field of science. The main objective of this work was to summarise the level of activity and effort towards developing biodiversity informatics curricula, whether for work-based training or academic teaching at universities, within the Global Biodiversity Information Facility (GBIF) countries and their associated network. A survey approach was used to identify existing capacities and resources within the network. Most GBIF node survey respondents (80%) are engaged in onsite training activities, with a focus on work-based professionals, mostly researchers, policy-makers and students. Training topics include data mobilisation, digitisation, management, publishing, analysis and use, to enable the accessibility of analogue and digital biological data that currently reside as scattered datasets. An initial assessment of academic teaching activities highlighted that countries in most regions, to varying degrees, were already engaged in the conceptualisation, development and/or implementation of formal academic programmes in biodiversity informatics, including programmes in Benin, Colombia, Costa Rica, Finland, France, India, Norway, South Africa, Sweden, Taiwan and Togo. Digital e-learning platforms were an important tool for building capacity in many countries. In terms of potential within the nodes network, 60% of respondents expressed willingness to be recruited or commissioned for capacity enhancement purposes. Contributions and activities of various country nodes across the network are highlighted, and a working curriculum framework has been defined.


Author(s): Ian Engelbrecht, Hester Steyn

RESTful APIs (REpresentational State Transfer Application Programming Interfaces) are the most commonly used mechanism for biodiversity informatics databases to provide open access to their content. In its simplest form, an API provides an interface based on the HTTP protocol whereby any client can perform an action on a data resource identified by a URL, using an HTTP verb (GET, POST, PUT, DELETE) to specify the intended action. For example, a GET request to a particular URL (informally called an endpoint) will return data to the client, typically in JSON format, which the client converts to the format it needs. A client can be either custom-written software or a commonly used program for data analysis such as R, Microsoft Excel (everybody’s favorite data management tool), OpenRefine, or business intelligence software. APIs are therefore a valuable mechanism for making biodiversity data FAIR (findable, accessible, interoperable, reusable). There is currently no standard specifying how RESTful APIs should be designed, resulting in a variety of URL and response data formats across APIs. This presents a challenge for API users who are not technically proficient or familiar with programming if they have to work with many different and inconsistent data sources. We undertook a brief review of eight existing APIs that provide data about taxa to assess consistency and the extent to which the Darwin Core standard (Wieczorek et al. 2021) for data exchange is applied. We assessed each API based on aspects of URL construction and the format of the response data (Fig. 1). While only cursory and limited in scope, our survey suggests that consistency across APIs is low. For example, some APIs use nouns for their endpoints (e.g. ‘taxon’ or ‘species’), emphasising their content, whereas others use verbs (e.g. ‘search’), emphasising their functionality. Response data seldom use Darwin Core terms (two out of eight examples) and a wide range of terms can be used to represent the same concept (e.g. six different terms are used for dwc:scientificNameAuthorship). Terms that can be considered metadata for a response, such as pagination details, also vary considerably. Interestingly, the public interfaces of the majority of APIs assessed do not provide POST, PUT or DELETE endpoints that modify the database; POST is used only for providing more detailed request bodies to retrieve data than is possible with GET. This indicates that biodiversity informatics platforms use APIs primarily for data sharing. An API design guideline is a document that provides a set of rules or recommendations for how APIs should be designed in order to improve their consistency and usability. API design guidelines are typically created by organizations to standardize API development within the organization, or as a guideline for programmers using an organization’s software to build APIs (e.g., Microsoft and Google). The API Stylebook is an online resource that provides access to a wide range of existing design guidelines, and there is an abundance of other resources available online. This presentation will cover some of the general concepts of API design, demonstrate some examples of how existing APIs vary, and discuss potential options to encourage standardization. We hope our analysis, the available body of knowledge on API design, and the collective experience of the biodiversity informatics community working with APIs may help answer the question “Does TDWG need an API design guideline?”
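To make the GET pattern concrete, the sketch below queries a hypothetical taxon endpoint in Python. The base URL, path, parameters, and response field names are all invented for illustration; as the survey shows, real biodiversity APIs differ in exactly these details.

    import requests

    # GET request to a hypothetical noun-style endpoint; the URL and its
    # parameters are assumptions for illustration only.
    response = requests.get(
        "https://api.example.org/v1/taxon",
        params={"name": "Aloe ferox", "limit": 10},
    )
    response.raise_for_status()
    data = response.json()  # most biodiversity APIs respond in JSON

    # One API may call the authorship field "author", another "authorship",
    # a third the Darwin Core term "scientificNameAuthorship".
    for record in data.get("results", []):
        print(record.get("scientificName"), record.get("scientificNameAuthorship"))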


Author(s): Ben Norton

Web APIs (Application Programming Interfaces) facilitate the exchange of resources (data) between two functionally independent entities across a common programmatic interface. In more general terms, Web APIs can connect almost anything to the world wide web. Unlike traditional software, APIs are not compiled, installed, or run. Instead, data are read (or consumed, in API speak) through a web-based transaction, where a client makes a request and a server responds. Within the scope of biodiversity informatics, Web APIs can be loosely grouped into two categories based on purpose. First, Product APIs deliver data products to end-users; examples include the Global Biodiversity Information Facility (GBIF) and iNaturalist APIs. Second, Service APIs are designed and built to solve specific problems, and they are the focus of this presentation. Their primary function is to provide on-demand support to existing programmatic processes. Examples of this type include the Elasticsearch Suggester API and geolocation services, which deliver geographic place information from spatial input (latitude and longitude coordinates) (Pejic et al. 2010). Many challenges lie ahead for biodiversity informatics and the sharing of global biodiversity data (e.g., Blair et al. 2020). Service-driven, standardized Service APIs that adhere to best practices within the scope of biodiversity informatics can provide the transformational change needed to address many of these issues. This presentation will highlight several critical areas of interest in the biodiversity data community, describing how Service APIs can address each individually. The main topics are standardized vocabularies, interoperability of heterogeneous data sources, and data quality assessment and remediation. Fundamentally, the value of any innovative technical solution can be measured by the extent of community adoption. In the context of Service APIs, adoption takes two primary forms: financial and temporal investment in the construction of clients that utilize Service APIs, and willingness of the community to integrate Service APIs into their own systems and workflows. To achieve this, Service APIs must be simple, easy to use, pragmatic, and designed with all major stakeholder groups in mind, including users, providers, aggregators, and architects (Anderson et al. 2020; this study). Unfortunately, many innovative and promising technical solutions have fallen short not because of an inability to solve problems (Verner et al. 2008), but because they were difficult to use, built in isolation, and/or designed without effective communication with stakeholders. Fortunately, projects such as Darwin Core (Wieczorek et al. 2012), the Integrated Publishing Toolkit (Robertson et al. 2014), and Megadetector (Microsoft 2021) provide the blueprint for successful community adoption of a technological solution within the biodiversity community. The final section of this presentation will examine the often overlooked non-technical aspects of this technical endeavor: specifically, how following these models can broaden community engagement and bridge the knowledge gap between the major stakeholders, resulting in the successful implementation of Service APIs.
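As a sketch of what on-demand support for an existing process can look like, the Python fragment below wraps a hypothetical georeferencing Service API in a helper that a digitization workflow could call record by record. The endpoint, parameters, and response shape are assumptions for illustration, not a real service.

    import requests

    def georeference(latitude, longitude):
        """Ask a hypothetical geolocation service for place data at a coordinate."""
        response = requests.get(
            "https://services.example.org/v1/geolocate",   # invented endpoint
            params={"lat": latitude, "lon": longitude},
        )
        response.raise_for_status()
        return response.json()  # assumed shape, e.g. {"country": ..., "stateProvince": ...}

    # A digitization pipeline could enrich each occurrence as it is captured:
    place = georeference(-33.96, 18.46)
    print(place)

Because such a service is consumed through plain HTTP and JSON, any client language or workflow tool can integrate it without special tooling, which is the interoperability argument made above.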


Author(s): Thomas Chen

Biodiversity informatics has emerged as a key asset in wildlife and ecological conservation around the world. This is especially true in Antarctica, where climate change continues to threaten marine and terrestrial species. It is well documented that the polar regions experience the most drastic rate of climate change compared to the rest of the world (IPCC 2021). Research approaches within the scope of polar biodiversity informatics consist of computational architectures and systems, analysis and modelling methods, and human-computer interfaces, ranging from more traditional statistical techniques to more recent machine learning and artificial intelligence-based imaging techniques. Ongoing discussions include making datasets findable, accessible, interoperable and reusable (FAIR) (Wilkinson et al. 2016). The deployment of biodiversity informatics systems and the coordination of standards around their utilization in the Antarctic are important areas of consideration. To bring together scientists and practitioners working at the nexus of informatics and Antarctic biodiversity, the Expert Group on Antarctic Biodiversity Informatics (EG-ABI) was formed under the Scientific Committee on Antarctic Research (SCAR). EG-ABI was created during the SCAR Life Sciences Standing Scientific Group meeting at the SCAR Open Science Conference in Portland, Oregon, in July 2012, to advance work at this intersection by coordinating and participating in a range of projects across the SCAR biodiversity science portfolio. SCAR itself is a thematic organisation of the International Science Council (ISC) and the primary entity tasked with coordinating high-quality scientific research on all aspects of Antarctic sciences and humanities, including the Southern Ocean and the interplay between Antarctica and the other six continents. The expert group is led by an international steering committee of roughly ten members, who take an active role in leading related initiatives. Currently, researchers from Australia, Belgium, the United Kingdom, Chile, Germany, France, and the United States are represented on the committee. The current steering committee comprises a diverse range of scientists, including early-career researchers and scientists whose primary focuses span both the computational and the ecological aspects of Antarctic biodiversity informatics. Current projects coordinated or co-coordinated by EG-ABI include the SCAR/rOpenSci initiative, a collaboration with the rOpenSci community to improve resources for users of the R software environment in Antarctic and Southern Ocean science. Additionally, EG-ABI has contributed to POLA3R (Polar Omics Linkages Antarctic Arctic and Alpine Regions), an information system dedicated to aiding the access and discovery of molecular microbial diversity data generated by Antarctic scientists. Furthermore, EG-ABI has provided training and helped collate additional species trait information, such as feeding and diet, development, mobility, and importance to society, documented through Vulnerable Marine Ecosystem (VME) indicator taxa, in the Register of Antarctic Species (http://ras.biodiversity.aq/), a comprehensive inventory of Antarctic and Southern Ocean organisms that is also a component of the World Register of Marine Species (https://marinespecies.org/). The efforts highlighted here are only some of the projects the expert group has contributed to.
In our presentation, we discuss the previous accomplishments of EG-ABI from the perspective of a currently serving steering committee member and outline its current state, including collaborations and coordinated activities. We also highlight opportunities for engagement and the benefits for various stakeholders of interacting with EG-ABI on multiple levels, within the SCAR ecosystem and elsewhere. Developing consistent and practical standards for data use in Antarctic ecology, in addition to fostering the interdisciplinary and cross-sectoral collaborations needed for the successful deployment of conservation mechanisms, is key to a sustainable and biodiverse Antarctica, and EG-ABI is one of the premier organizations working towards these aims.


Author(s): Vincent Smith, Aino Juslén, Ana Casino, Francois Dusoulier, Lisa French, ...

In an effort to characterise the various dimensions of activity within the biodiversity informatics landscape, we developed a framework to survey these dimensions for ten major organisations, relative to both their current activities and long-term strategic ambitions. This survey assessed the overlap between these infrastructure organisations by capturing the breadth of activities for each infrastructure across five categories (data, standards, software, hardware and policy), for nine types of data (specimens, collection descriptions, opportunistic observations, systematic observations, taxonomies, traits, geological data, molecular data, and literature), and for seven phases of activity (creation, aggregation, access, annotation, interlinkage, analysis, and synthesis). This generated a dataset of 6,300 verified observations (10 organisations × 5 categories × 9 data types × 7 phases × 2 time horizons), which have been scored and validated by leading members of each infrastructure organisation. In this analysis of the resulting data, we address a set of high-level questions about the overall biodiversity informatics landscape, looking at the greatest gaps, overlaps and possible rate-limiting steps. Across the infrastructure organisations, we also explore how close each is to achieving its ambitions and the extent of its niche relative to other organisations. Our results show that, when viewed by scope, most infrastructures occupy a relatively narrow niche in the overall landscape of activity, with the notable exception of the Global Biodiversity Information Facility (GBIF) and possibly LifeWatch. Niches associated with molecular data and biological taxonomy are very well filled, suggesting there is still considerable room for growth in other areas, with the Distributed System of Scientific Collections (DiSSCo) and the Integrated European Long-Term Ecosystem Research Infrastructure (eLTER RI) showing the largest differences between their current activities and stated ambitions, potentially reflecting the relative youth of these organisations. iNaturalist, the Biodiversity Heritage Library and the Catalogue of Life all occupy narrow and tightly circumscribed niches; these organisations are also amongst the closest to achieving their stated ambitions within their respective areas of activity. The largest gaps in infrastructure activity relate to the development of hardware and standards, with many gaps set to be addressed if the stated ambitions of those surveyed come to fruition. Nevertheless, some gaps persist, outlining a potential role for this survey as a planning tool to help coordinate and align investment in future biodiversity informatics activities. GBIF and LifeWatch are the two infrastructures whose ambitions are most similar to DiSSCo's, with the greatest overlap concentrated on activities related to data/content, specimen data and their shared ambition to interlink information. While this overlap appears substantial, the analysis is limited by the resolution of the survey framework and ignores existing collaborations between infrastructures. In addition to presenting the results of this survey, we outline our plans to publish this work and a proposal to develop the methodology as an interactive web-based tool. This would allow other projects and infrastructures to self-score their activities and visualise their niche within the current landscape, encouraging better global alignment of activities.
For example, our results should make it easier for initiatives to strengthen collaboration and differentiate their work when their activities overlap. Likewise, this approach would be useful for funding agencies when targeting gaps in the informatics landscape or increasing the technical maturity of certain critical activities, e.g., improving immature data standards. While no framework is perfect, we hope to encourage a dialogue on the potential of taking an algorithmic approach to community alignment, and see this as a means of strengthening cooperation when addressing problems that require global coordination.


Author(s): Luiz M. R. Gadelha, Pedro C. Siracusa, Eduardo Couto Dalcin, Luís Alexandre Estevão Silva, Douglas A. Augusto, ...

Author(s): Matthew Yoder, Hernán Pereira, José Luis Pereira, Dmitry Dmitriev, Geoffrey Ower, ...

TaxonWorks is a web-based workbench facilitating curation of a broad cross-section of biodiversity informatics concepts. Its development is currently led by the Species File Group. TaxonWorks has a large, JSON-serving application programming interface (API), which is slowly being exposed for external use. The API is documented at https://api.taxonworks.org. Here we highlight some existing key features of the API, focusing on the TaxonWorks concepts of People, Sources, Collection Objects, Taxon Names, and Downloads, and provide a brief roadmap for upcoming additions. Highlights include the ability for data curators to produce shareable bibliographies, Darwin Core Archives (DwC-A), and Catalogue of Life-formatted datasets; access their nomenclature as autocompletes and via many filter facets; share Person metadata, including numerous identifier types; and perform basic GeoJSON and simple DwC-A parameter-based filtering on Collection Objects. As examples of what can be done with the API, we provide several visualizations that are straightforward to implement by those with basic R, Python, Javascript, or Ruby programming skills.
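As a hint of what such scripts involve, the sketch below pulls Taxon Names from a TaxonWorks instance and tallies them by rank in Python. The endpoint path and the project_token parameter follow the general pattern documented at https://api.taxonworks.org, but the instance URL, token, response shape, and field names used here are assumptions to be checked against that documentation.

    from collections import Counter
    import requests

    BASE = "https://sandbox.taxonworks.org/api/v1"    # hypothetical instance
    params = {"project_token": "YOUR_PROJECT_TOKEN"}  # grants read access to one project

    # Assumes the endpoint returns a JSON list of taxon name objects.
    taxon_names = requests.get(f"{BASE}/taxon_names", params=params).json()

    # "rank" as a field name is an assumption; inspect a real response first.
    ranks = Counter(t.get("rank") for t in taxon_names)
    for rank, count in ranks.most_common():
        print(rank, count)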


Author(s): Matthew Yoder

Specimen digitization software and tooling are moving well past their third decade of development, yet in many ways new tools have yet to leapfrog or overcome the initial innovation realized years ago. Here I argue that a biodiversity informatics bubble has emerged, creating demands on digitization tools that are not always in line with the requirements of physical specimen curators (or others doing actual science). Pressuring tools to keep up with concepts that have emerged from this bubble (for example, Life Science Identifiers (LSIDs)) and its parallels in the tech industry (for example, microservices) has detracted from advancements that could be made with respect to the day-to-day workflows and practices of the curators themselves. These advances in turn might provide a more enjoyable, intuitive, and ultimately sustainable foundation, perhaps more immune to inevitable bubble bursts, hype-based derailments, and changes in scientific goals. How then should development proceed? We can observe that existing digitization software largely falls into two sides of a spectrum: commercial monoliths like EMu, and "home-grown" efforts, e.g. Specify, Arctos, and Symbiota. I argue the latter are much more in tune with user needs, because they were first built by the users themselves. Our approach, therefore, should be to go back to the well: the curator, the digitizer, the hourly student worker, and the person who has to fulfill requests of those using the physical collection itself, to seek out their needs and understand their experiences. With this understanding in place, i.e., a solid user-interface/experience foundation, we can build out tooling (and standards) that developers will want to utilize in their own software. These arguments and ideas are contextualized against TaxonWorks (http://taxonworks.org) and the experiences of the five collections now using it to digitize their holdings, to illustrate its shortcomings and potentialities.


Author(s): José Augusto Salim, Antonio Saraiva

For biologists and biodiversity data managers who are unfamiliar with information science practices such as data standardization, the complex software needed to create standardized datasets can be a barrier to sharing data. Since the ratification of the Darwin Core Standard (DwC) (Darwin Core Task Group 2009) by Biodiversity Information Standards (TDWG) in 2009, many datasets have been published and shared through a variety of data portals. In the early stages of biodiversity data sharing, the protocol Distributed Generic Information Retrieval (DiGIR), progenitor of DwC, and later the protocols BioCASe and TDWG Access Protocol for Information Retrieval (TAPIR) (De Giovanni et al. 2010) were introduced for the discovery, search and retrieval of distributed data, simplifying data exchange between information systems. Although these protocols are still in use, they are known to be inefficient for transferring large amounts of data (GBIF 2017). Because of that, in 2011 the Global Biodiversity Information Facility (GBIF) introduced the Darwin Core Archive (DwC-A), which allows more efficient data transfer and has become the preferred format for publishing data in the GBIF network. DwC-A is a structured collection of text files that makes use of the DwC terms to produce a single, self-contained dataset. Many tools for assisting data sharing using DwC-A have been introduced, such as the Integrated Publishing Toolkit (IPT) (Robertson et al. 2014), the Darwin Core Archive Assistant (GBIF 2010) and the Darwin Core Archive Validator. Despite these tools promoting and facilitating data sharing, many users have difficulty using them, mainly because of the lack of information science training in the biodiversity curriculum (Convention on Biological Diversity 2012, Enke et al. 2012). Most users, however, are very familiar with spreadsheets for storing and organizing their data, yet adopting the available solutions requires data transformation as well as training in information science and, more specifically, biodiversity informatics. For an example of how spreadsheets can simplify data sharing, see Stoev et al. (2016). In order to provide a more "familiar" approach to data sharing using DwC-A, we introduce a new tool as a Google Sheets Add-on. The Add-on, called the Darwin Core Archive Assistant Add-on, can be installed in the user's Google Account from the G Suite Marketplace and used in conjunction with the Google Sheets application. The Add-on assists the mapping of spreadsheet columns/fields to DwC terms (Fig. 1), similar to IPT, but with the advantage that it does not require the user to export the spreadsheet and import it into other software. Additionally, the Add-on facilitates the creation of a star schema in accordance with DwC-A, through the definition of a "CORE_ID" (e.g. occurrenceID, eventID, taxonID) field between sheets of a document (Fig. 2). The Add-on also provides an Ecological Metadata Language (EML) (Jones et al. 2019) editor (Fig. 3) with minimal fields to be filled in (i.e., the mandatory fields required by IPT), and helps users generate and share DwC-Archives stored in the user's Google Drive, which can be downloaded as a DwC-A or automatically uploaded to another public storage resource, such as the user's Zenodo account (Fig. 4).
We expect that the Google Sheets Add-on introduced here, in conjunction with IPT, will promote biodiversity data sharing in a standardized format, as it requires minimal training and simplifies the process of data sharing from the user's perspective, mainly for users who are not familiar with IPT but have historically worked with spreadsheets. Although a DwC-A generated by the Add-on still needs to be published using IPT, the Add-on provides a simpler interface (i.e., a spreadsheet) than IPT for mapping datasets to DwC. Even though IPT includes many more features than the Darwin Core Archive Assistant Add-on, we expect that the Add-on can be a "starting point" for users unfamiliar with biodiversity informatics before they move on to more advanced data publishing tools. On the other hand, the Zenodo integration allows users to share and cite their standardized datasets without publishing them via IPT, which can be useful for users without access to an IPT installation. Additionally, we are working on new features: future releases will include the automatic generation of globally unique identifiers for shared records, support for additional data standards and DwC extensions, and integration with the GBIF REST API and the IPT REST API.
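To ground the description of DwC-A as "a structured collection of text files", the sketch below hand-builds a minimal single-core archive with the Python standard library: one tab-delimited occurrence file, a meta.xml mapping its columns to Darwin Core term URIs, and a placeholder for the EML metadata. The record is invented, and a real archive published via the IPT would carry a complete EML document rather than the stub shown here.

    import zipfile

    # One-record occurrence core, tab-delimited with a header row.
    occurrence_txt = (
        "occurrenceID\tscientificName\teventDate\n"
        "ex:1\tAloe ferox Mill.\t2021-03-14\n"
    )

    # meta.xml maps each column index to a Darwin Core term URI.
    meta_xml = """<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
      <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence" encoding="UTF-8"
            fieldsTerminatedBy="\\t" linesTerminatedBy="\\n" ignoreHeaderLines="1">
        <files><location>occurrence.txt</location></files>
        <id index="0"/>
        <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
        <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
        <field index="2" term="http://rs.tdwg.org/dwc/terms/eventDate"/>
      </core>
    </archive>"""

    eml_xml = "<eml>...</eml>"  # placeholder; a real dataset needs full EML metadata

    with zipfile.ZipFile("dwca.zip", "w") as z:
        z.writestr("occurrence.txt", occurrence_txt)
        z.writestr("meta.xml", meta_xml)
        z.writestr("eml.xml", eml_xml)

Tools like the Add-on described above automate exactly these steps, so the user only ever sees the spreadsheet.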

