APIs: A Common Interface for the Global Biodiversity Informatics Community

Biodiversity Information Science and Standards ◽

10.3897/biss.5.75267 ◽

2021 ◽

Vol 5 ◽

Author(s):

Ben Norton

Keyword(s):

Data Quality ◽

Quality Assessment ◽

Heterogeneous Data ◽

Data Sources ◽

Biodiversity Informatics ◽

Biodiversity Data ◽

Web Based ◽

Heterogeneous Data Sources ◽

Data Quality Assessment ◽

Global Biodiversity

Web APIs (Application Programming Interfaces) facilitate the exchange of resources (data) between two functionally independent entities across a common programmatic interface. In more general terms, Web APIs can connect almost anything to the world wide web. Unlike traditional software, APIs are not compiled, installed, or run. Instead, data are read (or "consumed," in API terminology) through a web-based transaction in which a client makes a request and a server responds. Within the scope of biodiversity informatics, Web APIs can be loosely grouped into two categories based on purpose. First, Product APIs deliver data products to end users; examples include the Global Biodiversity Information Facility (GBIF) and iNaturalist APIs. Second, web-based Service APIs (hereafter Service APIs), the focus of this presentation, are designed and built to solve specific problems. Their primary function is to provide on-demand support to existing programmatic processes. Examples of this type include the Elasticsearch Suggester API and geolocation, a service that delivers geographic locations from spatial input (latitude and longitude coordinates) (Pejic et al. 2010). Many challenges lie ahead for biodiversity informatics and the sharing of global biodiversity data (e.g., Blair et al. 2020). Service-driven, standardized Service APIs that adhere to best practices within biodiversity informatics can provide the transformational change needed to address many of these issues. This presentation will highlight several critical areas of interest in the biodiversity data community and describe how Service APIs can address each one. The main topics include standardized vocabularies, interoperability of heterogeneous data sources, and data quality assessment and remediation.
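
To make the request/response pattern of a Service API concrete, here is a minimal Python sketch. It is not an implementation from this presentation. It assumes a hypothetical `/coordinates` endpoint that receives latitude and longitude as query parameters and responds with a JSON data quality assessment. The endpoint name, the parameter names `lat` and `lon`, and the checks themselves are illustrative assumptions. The sketch uses only the standard-library WSGI tools, so a client request can be simulated in-process without a network.

```python
import json
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def check_coordinates(lat_text, lon_text):
    """Return a list of issues found in a verbatim latitude/longitude pair."""
    try:
        lat = float(lat_text)
        lon = float(lon_text)
    except (TypeError, ValueError):
        # float(None) raises TypeError; float("abc") raises ValueError.
        return ["coordinates are missing or not numeric"]
    issues = []
    if not -90.0 <= lat <= 90.0:
        issues.append("latitude out of range [-90, 90]")
    if not -180.0 <= lon <= 180.0:
        issues.append("longitude out of range [-180, 180]")
    if lat == 0.0 and lon == 0.0:
        issues.append("zero coordinates (possible placeholder)")
    return issues


def app(environ, start_response):
    """Hypothetical Service API: GET /coordinates?lat=..&lon=.. -> JSON assessment."""
    headers = [("Content-Type", "application/json")]
    if environ.get("PATH_INFO") != "/coordinates":
        start_response("404 Not Found", headers)
        return [json.dumps({"error": "unknown endpoint"}).encode("utf-8")]
    params = parse_qs(environ.get("QUERY_STRING", ""))
    lat = params.get("lat", [None])[0]
    lon = params.get("lon", [None])[0]
    issues = check_coordinates(lat, lon)
    body = {"lat": lat, "lon": lon, "valid": not issues, "issues": issues}
    start_response("200 OK", headers)
    return [json.dumps(body).encode("utf-8")]


def call_service(path, query):
    """Simulate one client request in-process and decode the JSON response."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the standard WSGI keys
    environ["PATH_INFO"] = path
    environ["QUERY_STRING"] = query
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)


status, result = call_service("/coordinates", "lat=95&lon=0")
print(status, result["valid"], result["issues"])
```

In a real deployment, the same `app` could be served with, for example, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. A client would then send a GET request such as `/coordinates?lat=-23.5&lon=-46.6` and consume the JSON response. Keeping the checks in a plain function (`check_coordinates`) separate from the HTTP layer is one way to let other processes reuse the same logic.
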
Fundamentally, the value of any innovative technical solution can be measured by the extent of its community adoption. In the context of Service APIs, adoption takes two primary forms: financial and temporal investment in building clients that use Service APIs, and the community's willingness to integrate Service APIs into their own systems and workflows. To achieve this, Service APIs must be simple, easy to use, pragmatic, and designed with all major stakeholder groups in mind, including users, providers, aggregators, and architects (Anderson et al. 2020; this study). Unfortunately, many innovative and promising technical solutions have fallen short not because they could not solve problems (Verner et al. 2008), but because they were difficult to use, built in isolation, and/or designed without effective communication with stakeholders. Fortunately, projects such as Darwin Core (Wieczorek et al. 2012), the Integrated Publishing Toolkit (Robertson et al. 2014), and Megadetector (Microsoft 2021) provide a blueprint for successful community adoption of a technological solution within the biodiversity community. The final section of this presentation will examine the often-overlooked non-technical aspects of this technical endeavor, specifically how following these models can broaden community engagement and bridge the knowledge gap between the major stakeholders, resulting in the successful implementation of Service APIs.

The Online Pollen Catalogs Network (RCPol) data quality assurance system

Biodiversity Information Science and Standards ◽

10.3897/biss.2.25657 ◽

2018 ◽

Vol 2 ◽

pp. e25657

Author(s):

Allan Veiga ◽

Antonio Saraiva ◽

Cláudia da Silva

Keyword(s):

Quality Assurance ◽

Data Quality ◽

Quality Assessment ◽

Quality Measurement ◽

Expert Assessment ◽

Web Based ◽

Data Quality Assessment ◽

Measurement Validation ◽

Machine Readable ◽

Quality Reports

The Online Pollen Catalogs Network (RCPol) (http://rcpol.org.br) was conceived to promote interaction among researchers and the integration of data from pollen collections, herbaria and bee collections. To structure RCPol's work, researchers and collaborators have organized palynological information into four branches: palynoecology, paleopalynology, palynotaxonomy and spores. This information is collaboratively digitized and managed using standardized Google Spreadsheets. These datasets are assessed by RCPol palynology experts, and when a dataset complies with the RCPol data quality policy, it is published to http://chaves.rcpol.org.br. Data quality assessment used to be performed manually by the experts; this was time-consuming and inconsistent in detecting data quality problems such as incomplete and inconsistent information. To support data quality assessment in a more automated and effective way, we are developing a data quality tool that implements a series of mechanisms to measure, validate and improve the completeness, consistency, conformity, accessibility and uniqueness of data prior to a manual expert assessment. The system was designed according to the conceptual framework proposed by Task Group 1 of the Biodiversity Data Quality Interest Group (Veiga et al. 2017). For each sheet in the Google Spreadsheet, the system generates a set of assertions of measures, validations and amendments for the records (rows) and datasets (sheets), according to a profile defined for RCPol. The profile follows the policies of data quality measurement, validation and enhancement. The data quality measurement policy encompasses the dimensions of completeness, consistency, conformity, accessibility and uniqueness. RCPol uses a quality assurance approach: only data that comply with all the quality requirements are published in the system.
Therefore, its data quality validation policy considers a dataset valid only if it has 100% completeness, consistency, conformity, accessibility and uniqueness. To improve quality in each relevant dimension, a set of enhancements was defined in the data quality enhancement policy. Based on this RCPol profile, the system can generate reports containing assertions of measures, validations and amendments, together with the method and tool used to generate each assertion. This web-based system can be tested at http://chaves.rcpol.org.br/admin/data-quality with the dataset https://docs.google.com/spreadsheets/u/1/d/1gH0aa2qqnAgfAixGom3Gnx6Qp91ZvWhUHPb_QeoIreQ. The system ensures that only data compliant with the data quality profile defined by RCPol are considered fit for use and can be published, and it significantly decreases the experts' workload. Some data may still contain values that cannot easily be assessed automatically, e.g. whether the content of an image matches the respective scientific name, so manual expert assessment remains necessary. After the system reports that data are compliant with the profile, the experts must perform a manual assessment, using the data quality report as support, and only then are the data published. The next steps include archiving the data quality reports in a database, improving the web interface to enable searching and sorting of assertions, and providing a machine-readable interface for the data quality reports.
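
The measure, validate and amend cycle described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the RCPol tool. The field names in `REQUIRED_FIELDS`, the choice of `scientificName` as the uniqueness key, the whitespace amendment (tagged here as "conformity"), and the `Assertion` structure are all assumptions. Only the completeness and uniqueness dimensions are covered. Each sheet is modeled as a list of row dictionaries, and a sheet counts as compliant only when every measured dimension reaches 100%, mirroring the validation policy described in the abstract.

```python
from dataclasses import dataclass

# Hypothetical profile: field names and the key field are illustrative only.
REQUIRED_FIELDS = ["scientificName", "family", "locality"]
KEY_FIELD = "scientificName"


@dataclass
class Assertion:
    kind: str       # "measure", "validation" or "amendment"
    dimension: str  # e.g. "completeness", "uniqueness", "conformity"
    target: str     # "dataset" or "record <row index>"
    value: object
    method: str     # how the assertion was produced, for the report


def is_filled(value):
    """A value counts as filled if it is not None and not blank."""
    return value is not None and str(value).strip() != ""


def assess_sheet(records):
    """Return (compliant, assertions) for one sheet given as a list of row dicts."""
    assertions = []
    complete_rows = 0
    for i, record in enumerate(records):
        target = f"record {i}"
        filled = sum(1 for f in REQUIRED_FIELDS if is_filled(record.get(f)))
        completeness = filled / len(REQUIRED_FIELDS)
        assertions.append(Assertion("measure", "completeness", target, completeness,
                                    "share of required fields filled"))
        if completeness == 1.0:
            complete_rows += 1
        # Amendment: propose trimmed text where a value has stray whitespace.
        for field, value in record.items():
            if isinstance(value, str) and value != value.strip():
                assertions.append(Assertion("amendment", "conformity", target,
                                            {field: value.strip()}, "strip whitespace"))
    total = len(records)
    keys = [str(r.get(KEY_FIELD) or "").strip() for r in records]
    dataset_completeness = complete_rows / total if total else 1.0
    uniqueness = len(set(keys)) / total if total else 1.0
    assertions.append(Assertion("measure", "completeness", "dataset",
                                dataset_completeness, "share of complete records"))
    assertions.append(Assertion("measure", "uniqueness", "dataset",
                                uniqueness, f"distinct {KEY_FIELD} values / records"))
    # Validation policy: the sheet complies only at 100% in every measured dimension.
    compliant = dataset_completeness == 1.0 and uniqueness == 1.0
    assertions.append(Assertion("validation", "all", "dataset", compliant,
                                "100% completeness and uniqueness required"))
    return compliant, assertions


compliant, assertions = assess_sheet([
    {"scientificName": "Mimosa pudica ", "family": "", "locality": "Campinas"},
    {"scientificName": "Mimosa pudica", "family": "Fabaceae", "locality": None},
])
for a in assertions:
    print(a.kind, a.dimension, a.target, a.value)
```

Each `Assertion` stores the `method` that produced it. This loosely corresponds to the reports described above, which list each assertion together with the method used to generate it.
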

A Data Quality Assessment Model and Its Application to Cybersecurity Data Sources

13th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2020) - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-030-57805-3_25 ◽

2020 ◽

pp. 263-272

Author(s):

Noemí DeCastro-García ◽

Enrique Pinto

Keyword(s):

Data Quality ◽

Quality Assessment ◽

Assessment Model ◽

Data Sources ◽

Data Quality Assessment

Considerable Progress in Russian GBIF Community

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37015 ◽

2019 ◽

Vol 3 ◽

Author(s):

Maxim Shashkov ◽

Natalya Ivanova

Keyword(s):

Komi Republic ◽

Russian Language ◽

Data Sources ◽

Data Publishing ◽

Software Project ◽

Biodiversity Informatics ◽

Biodiversity Data ◽

Global Biodiversity Information Facility ◽

Science System ◽

Global Biodiversity

Russia represents a large gap in the open-access global biodiversity map of the Global Biodiversity Information Facility (GBIF). National biodiversity data are stored in various sources, including museums, herbaria, scientific literature and reports, as well as private collections and local databases. The best known and largest Russian herbarium collections are held at the Komarov Botanical Institute of the Russian Academy of Sciences (>6 M sheets) and Moscow University (>1 M sheets). The largest zoological collection is held at the Zoological Institute of the Russian Academy of Sciences, with >60 M specimens. However, most national biodiversity data have not yet been digitized, and Russia still lacks both a national biodiversity portal and a list of Russian biodiversity data sources. Despite this, projects and other activities are under way to mobilize national data using international biodiversity data standards. Russia is currently not a GBIF member, but in the last 5 years more than 1.6 M occurrences were published by Russian publishers through GBIF.org (69 datasets at the end of March 2019). The largest GBIF data provider in Russia is Lomonosov Moscow State University. The Digital Moscow University Herbarium includes 971,732 specimens collected from Russia and many other countries. The Russian GBIF community is steadily expanding (Fig. 1), as reflected in the increasing number of publishers and published datasets. The current GBIF network infrastructure in Russia includes 5 IPT (Integrated Publishing Toolkit) installations: in Saint Petersburg (two), Pushchino (Moscow region), Moscow, and Syktyvkar (Komi Republic). Russian-language biodiversity informatics materials are collected and presented on an informal website, http://gbif.ru/, with three main sections: data publishing through GBIF, Russian GBIF activities, and Russian biodiversity data sources.
Additional sections are dedicated to the iNaturalist citizen science system and the Russian Specify Software Project community. We provide technical helpdesk support not only to Russian publishers but also to Russian speakers from the former USSR. The national mailing list (via Google Groups) provides a platform for sharing news and currently has >240 subscribers. Since the end of 2014, biodiversity informatics events have been held regularly in Russia. Last year, two data training courses, funded by GBIF (project ID Russia-02, "GBIF.ru data mobilization activities") and ForBIO (Research school in biosystematics), were organized in Moscow and the Irkutsk region with the participation of 29 Russian researchers. National biodiversity informatics conferences were held in Apatity (2017) and Irkutsk (2018). We believe Russia already has a well-established community that can serve as the basis for further development when Russia becomes a GBIF member.

TACTICS FOR DYNAMIC DATA CLEANSING AND DATA PROFILING USING DIMENSIONS FOR DATA QUALITY ASSESSMENT

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i4.271276 ◽

2018 ◽

Vol 6 (4) ◽

pp. 271-276

Author(s):

A. Ghouse Mohiddin ◽

S. Ramakrishna

Keyword(s):

Data Quality ◽

Quality Assessment ◽

Data Cleansing ◽

Dynamic Data ◽

Data Profiling ◽

Data Quality Assessment

Information Credibility Assessment and Meta Data Modeling in Integrating Heterogeneous Data Sources

10.21236/ada409695 ◽

2002 ◽

Author(s):

Peter P. Chen

Keyword(s):

Data Modeling ◽

Heterogeneous Data ◽

Data Sources ◽

Credibility Assessment ◽

Meta Data ◽

Heterogeneous Data Sources ◽

Information Credibility

Methodology of Big Data Integration from A Priori Unknown Heterogeneous Data Sources

Proceedings of the 2018 2nd International Conference on Computer Science and Artificial Intelligence - CSAI '18 ◽

10.1145/3297156.3297249 ◽

2018 ◽

Author(s):

Alexey Samoylov ◽

Nikolay Sergeev ◽

Margarita Kucherova ◽

Boris Denisov

Keyword(s):

Big Data ◽

Data Integration ◽

A Priori ◽

Heterogeneous Data ◽

Data Sources ◽

Heterogeneous Data Sources

Importance of GNSS data quality assessment with novel control criteria in professional soccer match-play

International Journal of Performance Analysis in Sport ◽

10.1080/24748668.2021.1947017 ◽

2021 ◽

pp. 1-11

Author(s):

Aman Singh Shergill ◽

Jamie Twist ◽

Craig Highton

Keyword(s):

Data Quality ◽

Quality Assessment ◽

Match Play ◽

Soccer Match ◽

Professional Soccer ◽

Data Quality Assessment ◽

Control Criteria ◽

GNSS Data

Matching disparate dimensions for analytical integration of heterogeneous data sources

Proceedings of the 11th International Conference on Management of Digital EcoSystems ◽

10.1145/3297662.3365809 ◽

2019 ◽

Author(s):

Anna Korobko ◽

Aleksei Korobko

Keyword(s):

Heterogeneous Data ◽

Data Sources ◽

Heterogeneous Data Sources ◽

Analytical Integration

Privacy Preserving Data Quality Assessment for High-Fidelity Data Sharing

Proceedings of the 2014 ACM Workshop on Information Sharing & Collaborative Security - WISCS '14 ◽

10.1145/2663876.2663885 ◽

2014 ◽

Author(s):

Julien Freudiger ◽

Shantanu Rane ◽

Alejandro E. Brito ◽

Ersin Uzun

Keyword(s):

Data Quality ◽

Quality Assessment ◽

Data Sharing ◽

Privacy Preserving ◽

High Fidelity ◽

Data Quality Assessment

Towards Configurable Composite Data Quality Assessment

2019 IEEE 21st Conference on Business Informatics (CBI) ◽

10.1109/cbi.2019.00035 ◽

2019 ◽

Author(s):

Paolo Ceravolo ◽

Emanuele Bellini

Keyword(s):

Data Quality ◽

Quality Assessment ◽

Data Quality Assessment ◽

Composite Data