Use of an automatic data quality control algorithm for crude oil viscosity data

2004 ◽  
Vol 219 (2) ◽  
pp. 113-121 ◽  
Author(s):  
Maria A. Barrufet ◽  
Dominique Dexheimer

1991 ◽  
Vol 75 (Appendix) ◽  
pp. 114-115
Author(s):  
Hiroshi Nakamura ◽  
Yasuko Koga ◽  
Yo Shibata ◽  
Yuichiro Ota ◽  
Toru Otsuru

2015 ◽  
Vol 32 (6) ◽  
pp. 1209-1223 ◽  
Author(s):  
Valliappa Lakshmanan ◽  
Christopher Karstens ◽  
John Krause ◽  
Kim Elmore ◽  
Alexander Ryzhkov ◽  
...  

Abstract  
Recently, a radar data quality control algorithm has been devised to discriminate between weather echoes and echoes due to nonmeteorological phenomena, such as bioscatter, instrument artifacts, and ground clutter (Lakshmanan et al.), using the values of polarimetric moments at and around a range gate. Because the algorithm was created by optimizing its weights over a large reference dataset, statistical methods can be employed to examine the importance of the different variables in discriminating between weather and nonweather echoes. Among the variables studied for their impact on the ability to identify and censor nonmeteorological artifacts from weather radar data, the method of successive permutations ranks the variance of Zdr, the reflectivity structure of the virtual volume scan, and the range derivative of the differential propagation phase PhiDP (i.e., Kdp) as the most important. The same statistical framework can be used to study the impact of calibration errors in variables such as Zdr; the effects of Zdr calibration errors were found to be negligible.
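The successive-permutations ranking described above is closely related to standard permutation importance: shuffle one input variable at a time and measure how much the model's skill drops. A minimal sketch of that general idea, using a toy classifier rather than the paper's actual algorithm or polarimetric dataset, might look like:

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Estimate each feature's importance as the mean drop in accuracy
    when that feature's column is randomly permuted.
    `predict` maps an (n, d) array to n predicted class labels."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)  # skill on intact data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break feature j's relationship to y
            drops.append(baseline - np.mean(predict(Xp) == y))
        importances[j] = np.mean(drops)
    return importances
```

A feature the model relies on shows a large accuracy drop when permuted; an irrelevant feature shows a drop near zero, which is how variables such as the variance of Zdr can be ranked against the others.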


Author(s):  
Felipe Simoes ◽  
Donat Agosti ◽  
Marcus Guidoti

Automatic data mining is not an easy task, and its success in the biodiversity world is deeply tied to the standardization and consistency of scientific journals' layout structure. The various formatting styles found in the over 500 million pages of published biodiversity information (Kalfatovich 2010) pose a remarkable challenge to the goal of automating the liberation of data currently trapped on the printed page. Regular expressions and other pattern-recognition strategies invariably fail to cope with this diverse landscape of academic publishing. Challenges such as incomplete data and taxonomic uncertainty add several additional layers of complexity. However, in the era of big data, the liberation of all the different facts contained in biodiversity literature is of crucial importance. Plazi tackles this daunting task by providing workflows and technology to automatically process biodiversity publications and annotate the information therein, all within the principles of FAIR (findable, accessible, interoperable, and reusable) data usage (Agosti and Egloff 2009). It uses the concept of taxonomic treatments (Catapano 2019) as the most fundamental unit of biodiversity literature, providing a framework that reflects the reality of taxonomic data and links the different pieces of information contained in these treatments. Treatment citations, composed of a taxonomic name and a bibliographic reference, and material citations, carrying all specimen-related information, are additional conceptual cornerstones of this framework. The resulting enhanced data are added to TreatmentBank. Figures and treatments are made FAIR by depositing them, with specific metadata, in the Biodiversity Literature Repository (BLR) community at Zenodo, the European Organization for Nuclear Research (CERN) repository, and by pushing them to GBIF. The automation, however, is error prone due to the constraints explained above.
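The treatment-centric data model described above (treatments as the fundamental unit, treatment citations pairing a taxonomic name with a bibliographic reference, and material citations carrying specimen data) can be sketched with a few types. The field names here are illustrative assumptions, not Plazi's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TreatmentCitation:
    # A treatment citation pairs a taxonomic name with a bibliographic reference.
    taxonomic_name: str
    bibliographic_reference: str

@dataclass
class MaterialCitation:
    # Carries specimen-related information (fields are hypothetical examples).
    specimen_code: str
    collecting_country: str = ""

@dataclass
class Treatment:
    # The fundamental unit of biodiversity literature in this framework.
    taxonomic_name: str
    treatment_citations: List[TreatmentCitation] = field(default_factory=list)
    material_citations: List[MaterialCitation] = field(default_factory=list)
```

Structuring the extracted text this way is what allows the different pieces of information in a publication to be linked and later checked by rule-based quality control.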
In order to cope with this remarkable task without compromising data quality, Plazi has established a quality control process based on logical rules that check the components of the extracted document, raising errors at four different levels of severity. These errors also feed a data transit control mechanism, "the gatekeeper", which blocks certain data transits, such as the creation of deposits (e.g., in BLR) or the reuse of data (e.g., by GBIF), in the presence of specific errors. Finally, a set of automatic notifications was added to the plazi/community GitHub repository to provide a channel that empowers external users to report data issues directly to a dedicated team of data miners, who in turn fix these issues in a timely manner, improving data quality on demand. In this talk, we explain Plazi's internal quality control process and its phases, the data transits that are potentially affected, and statistics on the most common issues raised by this automated endeavor, as well as how we use the generated data to continuously improve this important step in Plazi's workflow.
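The pattern of rule-based checks with graded severities, plus a gatekeeper that blocks a data transit when an error is severe enough, could be sketched as follows. The rule names, severity labels, and document fields below are hypothetical illustrations, not Plazi's actual rules:

```python
from dataclasses import dataclass
from enum import IntEnum

class Severity(IntEnum):
    # Four graded levels, as in the process described above (names assumed).
    INFO = 1
    WARNING = 2
    ERROR = 3
    CRITICAL = 4

@dataclass
class Issue:
    rule: str
    severity: Severity

def run_checks(treatment: dict) -> list:
    """Apply simple logical rules to an extracted treatment (illustrative fields)."""
    issues = []
    if not treatment.get("taxonomic_name"):
        issues.append(Issue("missing-taxonomic-name", Severity.CRITICAL))
    if not treatment.get("bibliographic_reference"):
        issues.append(Issue("missing-reference", Severity.ERROR))
    if not treatment.get("material_citations"):
        issues.append(Issue("no-material-citations", Severity.WARNING))
    return issues

def gatekeeper(issues: list, blocking: Severity = Severity.ERROR) -> bool:
    """Allow a data transit (e.g., a deposit) only if no issue reaches
    the blocking severity; otherwise the transit is held back."""
    return all(i.severity < blocking for i in issues)
```

In this sketch a missing bibliographic reference blocks the transit, while a mere warning (no material citations) lets the data through, mirroring the idea that only specific error levels stop deposits or reuse.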

