Delivering Fit-for-Use Data: Quality control

Author(s):  
Felipe Simoes ◽  
Donat Agosti ◽  
Marcus Guidoti

Automatic data mining is not an easy task, and its success in the biodiversity world is deeply tied to the standardization and consistency of scientific journals' layout structure. The various formatting styles found in the over 500 million pages of published biodiversity information (Kalfatovich 2010) pose a remarkable challenge to the goal of automating the liberation of data currently trapped on the printed page. Regular expressions and other pattern-recognition strategies invariably fail to cope with this diverse landscape of academic publishing. Challenges such as incomplete data and taxonomic uncertainty add several additional layers of complexity. However, in the era of big data, the liberation of all the different facts contained in biodiversity literature is of crucial importance. Plazi tackles this daunting task by providing workflows and technology to automatically process biodiversity publications and annotate the information therein, all within the principles of FAIR (findable, accessible, interoperable, and reusable) data usage (Agosti and Egloff 2009). It uses the concept of taxonomic treatments (Catapano 2019) as the most fundamental unit of biodiversity literature to provide a framework that reflects the reality of taxonomic data, linking the different pieces of information contained in these treatments. Treatment citations, composed of a taxonomic name and a bibliographic reference, and material citations, carrying all specimen-related information, are additional conceptual cornerstones of this framework. The resulting enhanced data are added to TreatmentBank. Figures and treatments are made FAIR by depositing them, together with specific metadata, in the Biodiversity Literature Repository (BLR) community at Zenodo, the repository of the European Organization for Nuclear Research (CERN), and are then pushed to GBIF. The automation, however, is error prone due to the constraints explained above.
In order to cope with this remarkable task without compromising data quality, Plazi has established a quality control process based on logical rules that check the components of each extracted document, raising errors at four different levels of severity. These errors also feed a data-transit control mechanism, "the gatekeeper", which blocks certain data transits, such as the creation of deposits (e.g., BLR) or the reuse of data (e.g., GBIF), in the presence of specific errors. Finally, a set of automatic notifications was added to the plazi/community GitHub repository to provide a channel that empowers external users to report data issues directly to a dedicated team of data miners, who in turn fix these issues in a timely manner, improving data quality on demand. In this talk, we aim to explain Plazi's internal quality control process and its phases, the data transits that are potentially affected, statistics on the most common issues raised by this automated endeavor, and how we use the generated data to continuously improve this important step in Plazi's workflow.
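The abstract does not describe the rule engine itself, but the core idea of rule checks graded by four severity levels feeding a transit-blocking gatekeeper can be sketched as follows. All names, rules, and thresholds here are hypothetical illustrations, not Plazi's actual implementation:

```python
from enum import IntEnum

class Severity(IntEnum):
    NOTE = 1
    WARNING = 2
    ERROR = 3
    BLOCKER = 4

def check_treatment(treatment: dict) -> list[tuple[Severity, str]]:
    """Run simple logical rules over an extracted treatment and
    collect any issues together with their severity."""
    issues = []
    if not treatment.get("taxonomic_name"):
        issues.append((Severity.BLOCKER, "treatment lacks a taxonomic name"))
    for mc in treatment.get("material_citations", []):
        if not mc.get("collecting_country"):
            issues.append((Severity.WARNING, "material citation lacks a country"))
    return issues

def gatekeeper(issues: list[tuple[Severity, str]], destination: str) -> bool:
    """Allow a data transit only if no issue reaches the destination's
    blocking threshold (hypothetical per-destination thresholds)."""
    threshold = {"BLR": Severity.BLOCKER, "GBIF": Severity.ERROR}[destination]
    return all(sev < threshold for sev, _ in issues)
```

In a sketch like this, a treatment with only a missing-country warning would still transit to GBIF, while one lacking a taxonomic name would be blocked everywhere.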

1991 ◽  
Vol 75 (Appendix) ◽  
pp. 114-115
Author(s):  
Hiroshi Nakamura ◽  
Yasuko Koga ◽  
Yo Shibata ◽  
Yuichiro Ota ◽  
Toru Otsuru

Author(s):  
H. Visuri ◽  
J. Jokela ◽  
N. Mesterton ◽  
P. Latvala ◽  
T. Aarnio

Abstract. The amount and quality of 3D spatial data are growing constantly, but the data are collected and stored in a distributed fashion by various data collecting organizations. This may lead to problems regarding interoperability, usability, and availability of the data. Traditionally, national spatial data infrastructures have focused on 2D data, but recently there has been great progress towards also introducing 3D spatial data in governmental services. This paper studies the process of creating a country-wide 3D data repository in Finland and visualizing it for the public using an open-source map application. The 3D spatial data are collected and stored in one national topographic database that provides information for the whole of society. Data quality control is executed by an automated quality module as part of the import process into the database. The 3D spatial data are served from the database for visualization via a 3D service, and the visualization is piloted in the National Geoportal.
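The paper's automated quality module is not specified in the abstract; the general pattern of an import-time quality gate, run over each incoming feature before it enters the database, might be sketched as below. The rules shown are invented for illustration:

```python
def quality_gate(feature: dict, rules) -> list[str]:
    """Evaluate rule callables over an incoming feature; return a
    message for every rule it violates. An empty list means the
    feature may be imported into the database."""
    return [msg for rule, msg in rules if not rule(feature)]

# Hypothetical rules for a 3D building feature.
RULES = [
    (lambda f: f.get("height_m", 0) > 0, "building height must be positive"),
    (lambda f: len(f.get("footprint", [])) >= 3, "footprint needs >= 3 vertices"),
]
```

Features that fail the gate would be rejected or routed back to the producing organization rather than silently stored.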


Water ◽  
2021 ◽  
Vol 13 (20) ◽  
pp. 2820
Author(s):  
Gimoon Jeong ◽  
Do-Guen Yoo ◽  
Tae-Woong Kim ◽  
Jin-Young Lee ◽  
Joon-Woo Noh ◽  
...  

In our intelligent society, water resources are being managed using vast amounts of hydrological data collected through telemetric devices. Recently, advanced data quality control technologies for data refinement based on hydrological observation history, such as big data and artificial intelligence, have been studied. However, these remain impractical due to insufficient verification and implementation periods. In this study, a process to accurately identify missing and false-reading data was developed to efficiently validate hydrological data by combining various conventional validation methods. Here, false-reading data were reclassified into suspected and confirmed groups by combining the results of individual validation methods. Furthermore, an integrated quality control process that links data validation and reconstruction was developed. In particular, an iterative quality control feedback process was proposed to achieve highly reliable data quality; it was applied to precipitation and water level stations in the Daecheong Dam Basin, South Korea. The case study revealed that the proposed approach can improve the quality control procedure of hydrological databases and could be implemented in practice.
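The exact combination logic is not detailed in the abstract; one plausible reading, sketched here with hypothetical validators, is that a reading failing a single check is merely suspected, while agreement between two or more independent checks confirms it as false:

```python
def classify_readings(values, validators):
    """Combine independent validation checks over a series of sensor
    readings. Each validator returns True if the value passes. A value
    failing exactly one check is 'suspected'; failing two or more
    checks makes it 'confirmed' false-reading data."""
    suspected, confirmed = [], []
    for i, value in enumerate(values):
        fails = sum(1 for check in validators if not check(value))
        if fails >= 2:
            confirmed.append(i)
        elif fails == 1:
            suspected.append(i)
    return suspected, confirmed
```

For example, with a sentinel-value check and a physical-range check, a telemetry value of -999.0 fails both and is confirmed, while an implausible but well-formed value is only flagged as suspected for further review or reconstruction.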


2021 ◽  
pp. 1-11
Author(s):  
Song Gang ◽  
Wang Xiaoming ◽  
Wu Junfeng ◽  
Li Shufang ◽  
Liu Zhuowen ◽  
...  

In view of the production quality management of filter rods in the manufacturing and execution process of cigarette enterprises, this paper analyzes the necessity of implementing a manufacturing execution system (MES) in the filter rod production process. The MES-based filter rod quality system of a cigarette enterprise is studied in full, and the information management system requirements analysis, cigarette quality control process, system function module design, implementation, and test results are given. The paper uses the fuzzy analytic hierarchy process to select the optimal system for managing cigarette manufacturing. The implementation of the MES-based filter rod quality information management system at a cigarette enterprise ensures quality control in the cigarette production process. Through visualization in a real-time and dynamic way, the information management of cigarette production is completed, which greatly improves the quality of the enterprise's manufacturing process.
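The abstract does not say which fuzzy AHP variant the authors use; a common choice, Buckley's geometric-mean method with triangular fuzzy numbers, can be sketched as follows for weighting the criteria used to rank candidate systems:

```python
import math

def buckley_fuzzy_ahp(matrix):
    """Buckley's geometric-mean fuzzy AHP. `matrix[i][j]` is a
    triangular fuzzy number (l, m, u) comparing criterion i to j.
    Returns crisp, normalized criterion weights."""
    n = len(matrix)
    # Fuzzy geometric mean of each row, component-wise over (l, m, u).
    gm = [
        tuple(math.prod(tfn[k] for tfn in row) ** (1 / n) for k in range(3))
        for row in matrix
    ]
    # Divide each row mean by the fuzzy sum of all row means;
    # the fuzzy reciprocal swaps the lower and upper bounds.
    total = [sum(g[k] for g in gm) for k in range(3)]
    fuzzy_w = [(g[0] / total[2], g[1] / total[1], g[2] / total[0]) for g in gm]
    # Centroid defuzzification, then renormalize to sum to 1.
    crisp = [(l + m + u) / 3 for l, m, u in fuzzy_w]
    s = sum(crisp)
    return [c / s for c in crisp]
```

With two criteria judged equally important (all comparisons (1, 1, 1)) the weights come out equal; a fuzzy judgment such as (2, 3, 4) in favor of one criterion shifts weight towards it.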

