Considerable Progress in Russian GBIF Community

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37015 ◽

2019 ◽

Vol 3 ◽

Author(s):

Maxim Shashkov ◽

Natalya Ivanova

Keyword(s):

Komi Republic ◽

Russian Language ◽

Data Sources ◽

Data Publishing ◽

Software Project ◽

Biodiversity Informatics ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Science System ◽

Global Biodiversity

Russia is a huge gap on the open access global biodiversity map of the Global Biodiversity Information Facility (GBIF). National biodiversity data are stored in various sources including museums, herbaria, scientific literature and reports, as well as in private collections and local databases. The best known and largest of the Russian herbarium collections are those held at the Komarov Botanical Institute of the Russian Academy of Sciences (>6 M sheets) and Moscow University (>1 M sheets). The largest zoological collection is located in the Zoological Institute of the Russian Academy of Sciences, with >60 M specimens. However, most of the national biodiversity data are not yet digitized. A national biodiversity portal and a list of Russian biodiversity data sources are still absent. Despite this, projects and other activities are being implemented to mobilize national data using international biodiversity data standards. Currently Russia is not a GBIF member, but in the last 5 years, more than 1.6 M occurrences were published by Russian publishers through GBIF.org (69 datasets at the end of March 2019). The largest GBIF data provider in Russia is the Lomonosov Moscow State University. The Digital Moscow University Herbarium includes 971,732 specimens collected from Russia and many other countries. The Russian GBIF community is steadily expanding (Fig. 1); this is reflected in an increase in the number of publishers and published datasets. The current GBIF network infrastructure in Russia includes 5 IPT (Integrated Publishing Toolkit) installations in Saint Petersburg (two), Pushchino (Moscow region), Moscow, and Syktyvkar (Komi Republic). Russian-language biodiversity informatics materials are collected and presented on an informal website, http://gbif.ru/, with three main sections: data publishing through GBIF, Russian GBIF activities, and Russian biodiversity data sources.
Additional sections are dedicated to the iNaturalist citizen science system and the Russian Specify Software Project community. We provide technical helpdesk support not only for Russian publishers, but also for Russian speakers from the former USSR. The national mailing list (via Google Groups) aims to provide a platform for news sharing. It now includes >240 subscribers. Since the end of 2014, regular biodiversity informatics events have been held in Russia. Last year, two data training courses, funded by GBIF (project ID Russia-02 - "GBIF.ru data mobilization activities") and ForBIO (Research school in biosystematics), were organized in Moscow and the Irkutsk region with the participation of 29 Russian researchers. National biodiversity informatics conferences were held in Apatity (2017) and Irkutsk (2018). We believe Russia already has a well-established community that can become the basis for further development when Russia becomes a GBIF member.


Mapping and Publishing Sequence-Derived Data through Biodiversity Data Platforms

Biodiversity Information Science and Standards ◽

10.3897/biss.4.59212 ◽

2020 ◽

Vol 4 ◽

Author(s):

Dmitry Schigel ◽

Anders Andersson ◽

Andrew Bissett ◽

Anders Finstad ◽

Frode Fossøy ◽

...

Keyword(s):

Policy Design ◽

Molecular Ecology ◽

Added Value ◽

Data Publishing ◽

Ecological Knowledge ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Wide Range ◽

Global Biodiversity ◽

Derived Data

Most users will foresee the use of genetic sequences in the context of molecular ecology or phylogenetic research; however, a sequence with coordinates and a timestamp is a valuable biodiversity occurrence that is useful in a much broader context than its original purpose. To uncover this potential, sequence-derived data need to become findable, accessible, interoperable, and reusable through generalist biodiversity data platforms. Stimulated by the Biodiversity_Next discussions in 2019, we have worked for about 10 months to bring together practical data mapping and data publishing experiences in Norway, Australia, Sweden, and Denmark, as well as in the UNITE and the GBIF (Global Biodiversity Information Facility) networks. The resulting guide provides practical instruction for mapping sequence-derived data. Biodiversity data communities remain dominated by the macroscopic, easily detectable, morphologically identifiable species. This is not only true for citizen science and other forms of biodiversity popularization, but is also visible in university and museum department structures, financial resource allocations, biodiversity legislation, and policy design. Recent decades of molecular advances have increased the power of genetic methods for detecting, describing, and documenting global biodiversity. We have yet to see a wide shift of data-generating efforts from the traditional taxonomic foci of biodiversity assessments to more balanced and inclusive systems focusing on all functionally important taxa and environments. These include soil, limnic and marine environments, decomposing plants and deadwood, and all life therein. Environmental DNA data enable recording of the present and past presence of micro- and macroscopic organisms with minimal effort and by non-invasive methods. The apparent ease of these methods requires a cautious approach to the resulting data and their interpretation.
It remains important to define and agree on the organism recording and reporting routines for genetic data. DNA data represent a major addition to the many ways in which GBIF and other biodiversity data platforms index the living world. Our guide rests on the shoulders of those who have been developing and improving MIxS (Minimum Information about any (x) Sequence), GGBN (Global Genome Biodiversity Network) and other data standards. The added value of publishing sequence-derived data through non-genetic biodiversity discovery platforms relates to spatio-temporal occurrences and sequence-based names. Reporting sequence-derived occurrences in an open and reproducible way has a wide range of benefits: notably, it increases citability, highlights the taxa concerned in the context of biological conservation, and contributes to taxonomic and ecological knowledge.
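As a rough illustration of the idea that a sequence with coordinates and a timestamp can be treated as an occurrence, the sketch below maps one sample record onto Darwin Core terms. It is not the mapping from the guide itself: the input field names (`sample_id`, `taxon`, `lat`, `lon`, `collected`, `accession_url`) and the example values are hypothetical, and the choice of `MaterialSample` as `basisOfRecord` is an assumption. Only the output keys are standard Darwin Core term names.

```python
def to_darwin_core(sample):
    """Map a hypothetical sequence-derived sample record to Darwin Core terms.

    The keys of the returned dict are Darwin Core terms; the keys read from
    `sample` are invented for this illustration.
    """
    return {
        "occurrenceID": sample["sample_id"],
        "basisOfRecord": "MaterialSample",  # assumption, not taken from the guide
        "scientificName": sample["taxon"],
        "decimalLatitude": sample["lat"],
        "decimalLongitude": sample["lon"],
        "eventDate": sample["collected"],   # ISO 8601 date string
        "associatedSequences": sample["accession_url"],
    }


# Hypothetical input record, for illustration only
example = {
    "sample_id": "demo-sample-001",
    "taxon": "Fungi",
    "lat": 59.95,
    "lon": 10.75,
    "collected": "2019-08-14",
    "accession_url": "https://example.org/sequence/demo-001",
}
record = to_darwin_core(example)
```

Keeping the sequence reference in `associatedSequences` leaves the core occurrence fields (where, when, what) usable by platforms that never inspect the sequence.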


BiGe-Onto: An ontology-based system for managing biodiversity and biogeography data

Applied Ontology ◽

10.3233/ao-200228 ◽

2020 ◽

Vol 15 (4) ◽

pp. 411-437 ◽

Cited By ~ 3

Author(s):

Marcos Zárate ◽

Germán Braun ◽

Pablo Fillottrani ◽

Claudio Delrieux ◽

Mirtha Lewis

Keyword(s):

Data Sources ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Sparql Endpoint ◽

Darwin Core ◽

Metadata Standards ◽

Great Progress ◽

Global Biodiversity ◽

Research Domains ◽

Biodiversity Information

Great progress has been made recently in digitizing the world’s available Biodiversity and Biogeography data, but managing data from many different providers and research domains remains a challenge. A review of the current landscape of metadata standards and ontologies in Biodiversity sciences suggests that existing standards, such as the Darwin Core terminology, are inadequate for describing Biodiversity data in a semantically meaningful and computationally useful way. As a contribution to fill this gap, we present an ontology-based system, called BiGe-Onto, designed to manage data from Biodiversity and Biogeography together. As data sources, we use two internationally recognized repositories: the Global Biodiversity Information Facility (GBIF) and the Ocean Biogeographic Information System (OBIS). The BiGe-Onto system is composed of (i) the BiGe-Onto Architecture, (ii) a conceptual model called BiGe-Onto specified in OntoUML, (iii) an operational version of BiGe-Onto encoded in OWL 2, and (iv) an integrated dataset for its exploitation through a SPARQL endpoint. We will show use cases that allow researchers to answer questions that draw on information from both domains.


Connecting data and expertise: a new alliance for biodiversity knowledge

Biodiversity Data Journal ◽

10.3897/bdj.7.e33679 ◽

2019 ◽

Vol 7 ◽

Cited By ~ 19

Author(s):

Donald Hobern ◽

Brigitte Baptiste ◽

Kyle Copas ◽

Robert Guralnick ◽

Andrea Hahn ◽

...

Keyword(s):

Research Policy ◽

Coordination Mechanism ◽

Environmental Research ◽

Biodiversity Informatics ◽

Biodiversity Data ◽

Global Alliance ◽

Global Biodiversity Information Facility ◽

Sustainable Solutions ◽

Biodiversity Knowledge ◽

Global Biodiversity

There has been major progress over the last two decades in digitising historical knowledge of biodiversity and in making biodiversity data freely and openly accessible. Interlocking efforts bring together international partnerships and networks, national, regional and institutional projects and investments and countless individual contributors, spanning diverse biological and environmental research domains, government agencies and non-governmental organisations, citizen science and commercial enterprise. However, current efforts remain inefficient and inadequate to address the global need for accurate data on the world's species and on changing patterns and trends in biodiversity. Significant challenges include imbalances in regional engagement in biodiversity informatics activity, uneven progress in data mobilisation and sharing, the lack of stable persistent identifiers for data records, redundant and incompatible processes for cleaning and interpreting data and the absence of functional mechanisms for knowledgeable experts to curate and improve data. Recognising the need for greater alignment between efforts at all scales, the Global Biodiversity Information Facility (GBIF) convened the second Global Biodiversity Informatics Conference (GBIC2) in July 2018 to propose a coordination mechanism for developing shared roadmaps for biodiversity informatics. GBIC2 attendees reached consensus on the need for a global alliance for biodiversity knowledge, learning from examples such as the Global Alliance for Genomics and Health (GA4GH) and the open software communities under the Apache Software Foundation. These initiatives provide models for multiple stakeholders with decentralised funding and independent governance to combine resources and develop sustainable solutions that address common needs. This paper summarises the GBIC2 discussions and presents a set of 23 complementary ambitions to be addressed by the global community in the context of the proposed alliance. 
The authors call on all who are responsible for describing and monitoring natural systems, all who depend on biodiversity data for research, policy or sustainable environmental management and all who are involved in developing biodiversity informatics solutions to register interest at https://biodiversityinformatics.org/ and to participate in the next steps to establishing a collaborative alliance. The supplementary materials include brochures in a number of languages (English, Arabic, Spanish, Basque, French, Japanese, Dutch, Portuguese, Russian, Traditional Chinese and Simplified Chinese). These summarise the need for an alliance for biodiversity knowledge and call for collaboration in its establishment.


APIs: A Common Interface for the Global Biodiversity Informatics Community

Biodiversity Information Science and Standards ◽

10.3897/biss.5.75267 ◽

2021 ◽

Vol 5 ◽

Author(s):

Ben Norton

Keyword(s):

Data Quality ◽

Quality Assessment ◽

Heterogeneous Data ◽

Data Sources ◽

Biodiversity Informatics ◽

Biodiversity Data ◽

Web Based ◽

Heterogeneous Data Sources ◽

Data Quality Assessment ◽

Global Biodiversity

Web APIs (Application Programming Interfaces) facilitate the exchange of resources (data) between two functionally independent entities across a common programmatic interface. In more general terms, Web APIs can connect almost anything to the world wide web. Unlike traditional software, APIs are not compiled, installed, or run. Instead, data are read (or consumed in API speak) through a web-based transaction, where a client makes a request and a server responds. Web APIs can be loosely grouped into two categories within the scope of biodiversity informatics, based on purpose. First, Product APIs deliver data products to end-users. Examples include the Global Biodiversity Information Facility (GBIF) and iNaturalist APIs. Designed and built to solve specific problems, web-based Service APIs are the second type and the focus of this presentation (referred to as Service APIs). Their primary function is to provide on-demand support to existing programmatic processes. Examples of this type include Elasticsearch Suggester API and geolocation, a service that delivers geographic locations from spatial input (latitude and longitude coordinates) (Pejic et al. 2010). Many challenges lie ahead for biodiversity informatics and the sharing of global biodiversity data (e.g., Blair et al. 2020). Service-driven, standardized web-based Service APIs that adhere to best practices within the scope of biodiversity informatics can provide the transformational change needed to address many of these issues. This presentation will highlight several critical areas of interest in the biodiversity data community, describing how Service APIs can address each individually. The main topics include: standardized vocabularies, interoperability of heterogeneous data sources, and data quality assessment and remediation.
Fundamentally, the value of any innovative technical solution can be measured by the extent of community adoption. In the context of Service APIs, adoption takes two primary forms: financial and temporal investment in the construction of clients that utilize Service APIs, and willingness of the community to integrate Service APIs into their own systems and workflows. To achieve this, Service APIs must be simple, easy to use, pragmatic, and designed with all major stakeholder groups in mind, including users, providers, aggregators, and architects (Anderson et al. 2020; this study). Unfortunately, many innovative and promising technical solutions have fallen short not because of an inability to solve problems (Verner et al. 2008), but because they were difficult to use, built in isolation, and/or designed without effective communication with stakeholders. Fortunately, projects such as Darwin Core (Wieczorek et al. 2012), the Integrated Publishing Toolkit (Robertson et al. 2014), and Megadetector (Microsoft 2021) provide the blueprint for successful community adoption of a technological solution within the biodiversity community. The final section of this presentation will examine the often overlooked non-technical aspects of this technical endeavor, specifically how following these models can broaden community engagement and bridge the knowledge gap between the major stakeholders, resulting in the successful implementation of Service APIs.
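The request/response transaction described in this abstract can be sketched in a few lines. The example below is only an illustration, not code from the presentation: it builds a GET request URL for the public GBIF occurrence search endpoint and parses a JSON response body. To stay runnable offline it does not contact the server; the `canned_body` string is a made-up, heavily trimmed stand-in for a real response, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Public GBIF occurrence search endpoint (a Product API in the abstract's terms)
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_request_url(base_url, **params):
    """Client side: encode query parameters into a GET request URL."""
    return f"{base_url}?{urlencode(sorted(params.items()))}"


def summarize_response(body):
    """Parse a JSON response body and return (total count, species names)."""
    payload = json.loads(body)
    names = [rec.get("species") for rec in payload.get("results", [])]
    return payload.get("count"), names


url = build_request_url(GBIF_OCCURRENCE_SEARCH, country="RU", limit=2)

# Made-up stand-in for the server's reply; a real response has many more fields
canned_body = (
    '{"count": 2, "results": ['
    '{"species": "Betula pendula"}, {"species": "Pinus sylvestris"}]}'
)
count, species = summarize_response(canned_body)
```

In a live client, the URL would be fetched with an HTTP library and the returned text passed to `summarize_response`; nothing is compiled or installed on the client beyond that.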


SEINet: A Centralized Specimen Resource Managed by a Distributed Network of Researchers

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37424 ◽

2019 ◽

Vol 3 ◽

Author(s):

Edward Gilbert ◽

Corinna Gries ◽

Nico Franz ◽

Leslie R. Landrum ◽

Thomas H. Nash III

Keyword(s):

Data Distribution ◽

Data Publishing ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Diverse Range ◽

Darwin Core ◽

Global Biodiversity ◽

Specimen Management ◽

Biodiversity Information

The SEINet Portal Network has a complex social and development history spanning nearly two decades. Initially established as a basic online search engine for a select handful of biological collections curated within the southwestern United States, SEINet has since matured into a biodiversity data network incorporating more than 330 institutions and 1,900 individual data contributors. Participating institutions manage and publish over 14 million specimen records, 215,000 observations, and 8 million images. Approximately 70% of the collections make use of the data portal as their primary "live" specimen management platform. The SEINet interface now supports 13 regional data portals distributed across the United States and northern Mexico (http://symbiota.org/docs/seinet/). Through many collaborative efforts, it has matured into a tool for biodiversity data exploration, which includes species inventories, interactive identification keys, specimen and field images, taxonomic information, species distribution maps, and taxonomic descriptions. SEINet’s initial developmental goals were to construct a read-only interface that integrated specimen records harvested from a handful of distributed natural history databases. Intermittent network connectivity and inconsistent data exchange protocols frequently restricted data persistence. National funding opportunities supported a complete redesign towards the development of a centralized data cache model with periodic "snapshot" updates from original data sources. A service-based management infrastructure was integrated into the interface to mobilize small- to medium-sized collections (<1 million specimen records) that commonly lack consistent infrastructure and technical expertise to maintain a standard-compliant specimen database. These developments were the precursors to the Symbiota software project (Gries et al. 2014).
Through further development of Symbiota, SEINet transformed into a robust specimen management system specifically geared toward specimen digitization with features including data entry from label images, harvesting data from specimen duplicates, batch georeferencing, data validation and cleaning, generating progress reports, and additional tools to improve the efficiency of the digitization process. The central developmental paradigm focused on data mobilization through the production of: a versatile import module capable of ingesting a diverse range of data structures, a robust toolkit to assist in digitizing and managing specimen data and images, and a Darwin Core Archive (DwC-A) compliant data publishing and export toolkit to facilitate data distribution to global aggregators such as the Global Biodiversity Information Facility (GBIF) and iDigBio. User interfaces consist of a decentralized network of regional data portals, all connecting to a centralized shared data source. Each of the 13 data portals is configured to present a regional perspective specifically tailored to represent the needs of the local research community. This infrastructure has supported the formation of regional consortia, which provide network support to aid local institutions in digitizing and publishing their collections within the network. The community-based infrastructure creates a sense of ownership (perhaps even good-natured competition) among the data providers and provides extra incentive to improve data quality and expand the network. Certain areas of development remain challenging in spite of the project's overall success.
For instance, data managers continuously struggle to maintain a current local taxonomic thesaurus used for name validation, data cleaning, and to resolve taxonomic discrepancies commonly encountered when integrating collection datasets. We will discuss the successes and challenges associated with the long-term sustainability model and explore potential future paths for SEINet that support the long-term goal of maintaining a data provider that is in full compliance with the FAIR use principles of making the datasets findable, accessible, interoperable, and reusable (Wilkinson et al. 2016).


BioDATA - Biodiversity Data for Internationalisation in Higher Education

Research Ideas and Outcomes ◽

10.3897/rio.5.e36276 ◽

2019 ◽

Vol 5 ◽

Author(s):

Oleh Prylutskyi ◽

Armine Abrahamyan ◽

Nina Voronova ◽

Tatevik Aloyan ◽

Oleg Borodin ◽

...

Keyword(s):

Higher Education ◽

Data Management ◽

Publishing Research ◽

Data Publishing ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Postgraduate Students ◽

Intensive Courses ◽

Global Biodiversity ◽

Biodiversity Information

BioDATA is an international project on developing skills in biodiversity data management and data publishing. Between 2018 and 2021, undergraduate and postgraduate students from Armenia, Belarus, Tajikistan, and Ukraine have an opportunity to take part in intensive courses to become certified professionals in biodiversity data management. They will gain practical skills and knowledge of international data standards (Darwin Core), data cleaning software, data publishing software such as the Integrated Publishing Toolkit (IPT), and the preparation of data papers. Working with databases, creating datasets, managing data for statistical analyses and publishing research papers are essential for the everyday tasks of a modern biologist. At the same time, these skills are rarely taught in higher education. Most contemporary biodiversity professionals have to gain these skills independently, through colleagues, or through supervision. In addition, all the participants familiarize themselves with one of the important international research data infrastructures, the Global Biodiversity Information Facility (GBIF). The project is coordinated by the University of Oslo (Norway) and supported by GBIF. The project is funded by the Norwegian Agency for International Cooperation and Quality Enhancement in Higher Education (DIKU).


Completeness of Digital Accessible Knowledge of the Plants of Ghana

Biodiversity Informatics ◽

10.17161/bi.v11i0.5860 ◽

2016 ◽

Vol 11 ◽

Cited By ~ 7

Author(s):

Alex Asase ◽

A. Townsend Peterson

Keyword(s):

Geographic Distance ◽

Northern Ghana ◽

Biodiversity Informatics ◽

Primary Research ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Herbarium Data ◽

Research Grade ◽

Biodiversity Information

Providing comprehensive, informative, primary, research-grade biodiversity information represents an important focus of biodiversity informatics initiatives. Recent efforts within Ghana have digitized >90% of primary biodiversity data records associated with specimen sheets in Ghanaian herbaria; additional herbarium data are available from other institutions via biodiversity informatics initiatives such as the Global Biodiversity Information Facility. However, data on the plants of Ghana have not yet been integrated and assessed to establish how complete site inventories are, so that appropriate levels of confidence can be applied. In this study, we assessed inventory completeness and identified gaps in current Digital Accessible Knowledge (DAK) of the plants of Ghana, to prioritize areas for future surveys and inventories. We evaluated the completeness of inventories at ½° spatial resolution using statistics that summarize inventory completeness, and characterized gaps in coverage in terms of geographic distance and climatic difference from well-documented sites across the country. The southwestern and southeastern parts of the country held many well-known grid cells; the largest spatial gaps were found in central and northern parts of the country. Climatic difference showed contrasting patterns, with a dramatic gap in coverage in central-northern Ghana. This study provides a detailed case study of how to prioritize new botanical surveys and inventories based on existing DAK.
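To make the gridded completeness assessment concrete, here is a minimal sketch of one way it could be computed. The abstract does not name the statistic used, so this is an assumption: the sketch bins records into ½° cells and computes completeness as C = S_obs / S_exp, with S_exp taken from the bias-corrected Chao1 estimator. Function names and the species labels in the sample records are hypothetical.

```python
import math
from collections import Counter, defaultdict

CELL_SIZE = 0.5  # degrees, matching the half-degree resolution in the abstract


def cell_of(lat, lon, size=CELL_SIZE):
    """Return the south-west corner of the grid cell containing a point."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def completeness_by_cell(records):
    """records: iterable of (species, lat, lon). Returns {cell: completeness}."""
    counts = defaultdict(Counter)
    for species, lat, lon in records:
        counts[cell_of(lat, lon)][species] += 1

    result = {}
    for cell, species_counts in counts.items():
        s_obs = len(species_counts)
        f1 = sum(1 for n in species_counts.values() if n == 1)  # singletons
        f2 = sum(1 for n in species_counts.values() if n == 2)  # doubletons
        # Bias-corrected Chao1 estimate of expected richness (assumed statistic)
        s_exp = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
        result[cell] = s_obs / s_exp
    return result


# Made-up records, all falling in the cell whose corner is (5.5, -0.5)
records = [
    ("sp_a", 5.6, -0.2), ("sp_a", 5.7, -0.1), ("sp_a", 5.9, -0.4),
    ("sp_b", 5.6, -0.3), ("sp_b", 5.8, -0.2),
    ("sp_c", 5.7, -0.4),
    ("sp_d", 5.6, -0.1),
]
completeness = completeness_by_cell(records)
```

Cells with many singleton species get a low C, flagging them as under-sampled; cells with no records do not appear in the result at all and would have to be treated as gaps separately.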


Towards a Post-Graduate Level Curriculum for Biodiversity Informatics. Perspectives from the Global Biodiversity Information Facility (GBIF) Community

Biodiversity Data Journal ◽

10.3897/bdj.9.e68010 ◽

2021 ◽

Vol 9 ◽

Author(s):

Fatima Parker-Allie ◽

Francisco Pando ◽

Anders Telenius ◽

Jean Ganglo ◽

Danny Vélez ◽

...

Keyword(s):

Biological Data ◽

Initial Assessment ◽

Biodiversity Informatics ◽

Global Biodiversity Information Facility ◽

Policy Makers ◽

Academic Teaching ◽

E Learning ◽

Learning Platforms ◽

Global Biodiversity ◽

Biodiversity Information

Biodiversity informatics is a new and evolving field, requiring efforts to develop capacity and a curriculum for this field of science. The main objective was to summarise the level of activity and the efforts towards developing biodiversity informatics curricula, for work-based training and/or academic teaching at universities, taking place within the Global Biodiversity Information Facility (GBIF) countries and its associated network. A survey approach was used to identify existing capacities and resources within the network. Most GBIF Nodes survey respondents (80%) are engaged in onsite training activities, with a focus on work-based professionals, mostly researchers, policy-makers and students. Training topics include data mobilisation, digitisation, management, publishing, analysis and use, to enable the accessibility of analogue and digital biological data that currently reside as scattered datasets. An initial assessment of academic teaching activities highlighted that countries in most regions, to varying degrees, were already engaged in the conceptualisation, development and/or implementation of formal academic programmes in biodiversity informatics, including programmes in Benin, Colombia, Costa Rica, Finland, France, India, Norway, South Africa, Sweden, Taiwan and Togo. Digital e-learning platforms were an important tool to help build capacity in many countries. In terms of the potential in the Nodes network, 60% of respondents expressed willingness to be recruited or commissioned for capacity enhancement purposes. Contributions and activities of various country nodes across the network have been highlighted and a working curriculum framework has been defined.


Data integration enables global biodiversity synthesis

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2018093118 ◽

2021 ◽

Vol 118 (6) ◽

pp. e2018093118

Author(s):

J. Mason Heberling ◽

Joseph T. Miller ◽

Daniel Noesgaard ◽

Scott B. Weingart ◽

Dmitry Schigel

Keyword(s):

Data Integration ◽

Species Interactions ◽

Large Scale ◽

Data Use ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Research Areas ◽

Global Biodiversity ◽

Biodiversity Information ◽

Global Data

The accessibility of global biodiversity information has surged in the past two decades, notably through widespread funding initiatives for museum specimen digitization and emergence of large-scale public participation in community science. Effective use of these data requires the integration of disconnected datasets, but the scientific impacts of consolidated biodiversity data networks have not yet been quantified. To determine whether data integration enables novel research, we carried out a quantitative text analysis and bibliographic synthesis of >4,000 studies published from 2003 to 2019 that use data mediated by the world’s largest biodiversity data network, the Global Biodiversity Information Facility (GBIF). Data available through GBIF increased 12-fold since 2007, a trend matched by global data use with roughly two publications using GBIF-mediated data per day in 2019. Data-use patterns were diverse by authorship, geographic extent, taxonomic group, and dataset type. Despite facilitating global authorship, legacies of colonial science remain. Studies involving species distribution modeling were most prevalent (31% of literature surveyed) but recently shifted in focus from theory to application. Topic prevalence was stable across the 17-y period for some research areas (e.g., macroecology), yet other topics proportionately declined (e.g., taxonomy) or increased (e.g., species interactions, disease). Although centered on biological subfields, GBIF-enabled research extends surprisingly across all major scientific disciplines. Biodiversity data mobilization through global data aggregation has enabled basic and applied research use at temporal, spatial, and taxonomic scales otherwise not possible, launching biodiversity sciences into a new era.


BIDDSAT: visualizing the content of biodiversity data publishers in the Global Biodiversity Information Facility network

Bioinformatics ◽

10.1093/bioinformatics/bts359 ◽

2012 ◽

Vol 28 (16) ◽

pp. 2207-2208 ◽

Cited By ~ 6

Author(s):

J. Otegui ◽

A. H. Arino

Keyword(s):

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Global Biodiversity ◽

Biodiversity Information
