Laying the Groundwork for Developing International Community Guidelines to Effectively Share and Reuse Digital Data Quality Information – Case Statement, Workshop Summary Report, and Path Forward

2020
Author(s):
Ge Peng
Carlo Lacagnina
Robert R. Downs
Ivana Ivanova
David F. Moroni
...

This document provides background for and summarizes the main takeaways of a workshop held virtually to kick off the development of community guidelines for consistently curating and representing dataset quality information in line with the FAIR principles.

2021
Author(s):
Carlo Lacagnina
Ge Peng
Robert R. Downs
Hampapuram Ramapriyan
Ivana Ivanova
...

The knowledge of data quality and the quality of the associated information, including metadata, is critical for data use and reuse. Assessment of data and metadata quality is key for ensuring credible available information, establishing a foundation of trust between the data provider and various downstream users, and demonstrating compliance with requirements established by funders and federal policies.

Data quality information should be consistently curated, traceable, and adequately documented to provide sufficient evidence to guide users to address their specific needs. The quality information is especially important for data used to support decisions and policies, and for enabling data to be truly findable, accessible, interoperable, and reusable (FAIR).

Clear documentation of the quality assessment protocols used can promote the reuse of quality assurance practices and thus support the generation of more easily comparable datasets and quality metrics. To enable interoperability across systems and tools, the data quality information should be machine-actionable. Guidance on the curation of dataset quality information can help to improve the practices of various stakeholders who contribute to the collection, curation, and dissemination of data.

This presentation outlines a global community effort to develop international guidelines to curate data quality information that is consistent with the FAIR principles throughout the entire data life cycle and inheritable by any derivative product.
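
To make the notion of machine-actionable dataset quality information more concrete, the minimal Python sketch below serializes a quality assessment as structured JSON. The field names and values (e.g., "quality_dimension", "assessment_protocol") are illustrative assumptions, not a schema defined by the guidelines described above.

```python
# Hypothetical sketch of machine-actionable dataset quality information.
# All field names and values are illustrative assumptions, not a schema
# prescribed by the community guidelines described in the abstract above.
import json

dataset_quality_record = {
    "dataset_id": "doi:10.0000/example-dataset",  # hypothetical identifier
    "quality_assessments": [
        {
            "quality_dimension": "completeness",
            "metric": "fraction_of_non_missing_values",
            "value": 0.97,
            "assessment_protocol": "https://example.org/protocols/completeness-v1",
            "assessed_by": "Example Data Center",
            "assessment_date": "2021-03-15",
        }
    ],
    "provenance": {
        # Quality information is intended to be inheritable by derivative products.
        "derived_from": [],
        "curation_standard": "FAIR-aligned community guidelines (in development)",
    },
}

# Serializing to JSON keeps the record both human-readable and machine-actionable.
print(json.dumps(dataset_quality_record, indent=2))
```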


Author(s):
Syed Mustafa Ali
Farah Naureen
Arif Noor
Maged Kamel N. Boulos
Javariya Aamir
...

Background Increasingly, healthcare organizations are using technology for the efficient management of data. The aim of this study was to compare the data quality of digital records with the quality of the corresponding paper-based records by using a data quality assessment framework. Methodology We conducted a desk review of paper-based and digital records at six enrolled TB clinics over the study period from April 2016 to July 2016. We entered all data fields of the patient treatment (TB01) card into a spreadsheet-based template to undertake a field-to-field comparison of the fields shared between the TB01 cards and the digital records. Findings A total of 117 TB01 cards were prepared at the six enrolled sites, but only 50% of the records (n=59 of 117 TB01 cards) had been digitized. There were 1,239 comparable data fields, of which 65% (n=803) matched correctly between the paper-based and digital records. The remaining 35% of data fields (n=436) had anomalies in either the paper-based or the digital records. On average, 1.9 data quality issues were found per digital patient record, compared with 2.1 issues per paper-based record. An analysis of valid data quality issues showed more issues in paper-based records (n=123) than in digital records (n=110). Conclusion There were fewer data quality issues in digital records than in the corresponding paper-based records. Greater use of mobile data capture and continued use of the data quality assessment framework can deliver more meaningful information for decision making.
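
The field-to-field comparison described above can be sketched in a few lines of Python. The shared fields, the normalization rule, and the example records below are hypothetical stand-ins, not the study's actual TB01 template or data.

```python
# Minimal sketch of a field-to-field comparison between paper-based and digital
# records. Field names and the matching rule are illustrative assumptions only.
from typing import Dict, List

SHARED_FIELDS = ["patient_name", "age", "sex", "treatment_start_date"]  # hypothetical

def compare_records(paper: Dict[str, str], digital: Dict[str, str]) -> Dict[str, bool]:
    """Return, for each shared field, whether the two records agree after normalization."""
    return {
        field: paper.get(field, "").strip().lower() == digital.get(field, "").strip().lower()
        for field in SHARED_FIELDS
    }

def summarize(pairs: List[tuple]) -> None:
    """Count matched fields across all record pairs and report the match rate."""
    total = matched = 0
    for paper, digital in pairs:
        result = compare_records(paper, digital)
        total += len(result)
        matched += sum(result.values())
    print(f"{matched}/{total} fields matched ({100 * matched / total:.1f}%)")

# Example with a single record pair containing one anomaly (age transposed).
summarize([(
    {"patient_name": "A. Khan", "age": "34", "sex": "M", "treatment_start_date": "2016-04-12"},
    {"patient_name": "A. Khan", "age": "43", "sex": "M", "treatment_start_date": "2016-04-12"},
)])
```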


Data quality is a central issue in quality information management. Data quality problems can occur anywhere in an information system and are addressed through Data Cleaning (DC). DC is the process of identifying inaccurate, incomplete, or unreasonable data and then improving quality by correcting the detected errors and omissions. Various DC processes have been discussed in previous studies, but there is no standard or formalized DC process. Domain Driven Data Mining (DDDM) is one of the KDD methodologies often used for this purpose. This paper reviews and emphasizes the importance of DC in data preparation and highlights directions for future work.
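
As a minimal illustration of what a rule-based DC pass might look like, the Python sketch below detects incomplete and unreasonable values and applies simple corrections. The fields, validity thresholds, and corrections are assumptions chosen for illustration; they are not a standard taken from the paper.

```python
# Minimal illustration of a rule-based data-cleaning pass: detect inaccurate,
# incomplete, or unreasonable values and correct or flag them. The fields,
# validity rules, and default corrections are assumptions for illustration only.
from typing import Any, Dict, List, Tuple

def clean_record(record: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    cleaned = dict(record)
    issues: List[str] = []

    # Incomplete data: missing or empty name is flagged for follow-up.
    if not str(cleaned.get("name", "")).strip():
        issues.append("missing name")

    # Unreasonable data: an age outside a plausible range is cleared for review.
    age = cleaned.get("age")
    if not isinstance(age, (int, float)) or not (0 <= age <= 120):
        issues.append(f"implausible age: {age!r}")
        cleaned["age"] = None

    # Inconsistent formatting: normalize whitespace and case in categorical fields.
    if isinstance(cleaned.get("sex"), str):
        cleaned["sex"] = cleaned["sex"].strip().upper()

    return cleaned, issues

record, problems = clean_record({"name": " ", "age": 430, "sex": " m "})
print(record)    # {'name': ' ', 'age': None, 'sex': 'M'}
print(problems)  # ['missing name', 'implausible age: 430']
```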


2021
Author(s):
Aurore Lafond
Maurice Ringer
Florian Le Blay
Jiaxu Liu
Ekaterina Millan
...

Abstract Abnormal surface pressure is typically the first indicator of a number of problematic events, including kicks, losses, washouts and stuck pipe. These events account for 60–70% of all drilling-related nonproductive time, so their early and accurate detection has the potential to save the industry billions of dollars. Detecting these events today requires an expert user watching multiple curves, which can be costly and subject to human error. The solution presented in this paper aims to augment traditional models with machine learning techniques that detect these events automatically and support monitoring of the well while drilling. Today’s real-time monitoring systems employ complex physical models to estimate surface standpipe pressure while drilling. These require many inputs and are difficult to calibrate. Machine learning is an alternative method for predicting pump pressure, but on its own it needs significant labelled training data, which is often lacking in the drilling world. The new system combines these approaches: a machine learning framework enables automated learning, while the physical models compensate for any gaps in the training data. The system uses only standard surface measurements, is fully automated, and is continuously retrained while drilling to ensure the most accurate pressure prediction. In addition, a stochastic (Bayesian) machine learning technique is used, which provides not only a prediction of the pressure but also the uncertainty and confidence of that prediction. Finally, the new system includes a data quality control workflow. It discards periods of low data quality before pressure anomaly detection, enabling smarter real-time event analysis. The new system has been tested on historical wells using a new test and validation framework. The framework runs the system automatically on large volumes of both historical and simulated data, enabling cross-referencing of the results with observations. In this paper, we show the results of the automated test framework as well as the capabilities of the new system in two specific case studies, one on land and one offshore. Moreover, large-scale statistics demonstrate the reliability and efficiency of the new detection workflow. The new system builds on the trend in our industry to better capture and utilize digital data for optimizing drilling.
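
The general idea of combining a physical-model estimate, a Bayesian machine-learning correction with predictive uncertainty, and interval-based anomaly flagging can be illustrated with the minimal numpy sketch below. This is not the system described in the paper: the synthetic data, the simple Bayesian linear model, and the 3-sigma rule are illustrative assumptions only.

```python
# Sketch of predicting standpipe pressure with uncertainty and flagging anomalies
# when observations leave the predictive interval. Synthetic data, features,
# model, and thresholds are illustrative assumptions, not the paper's system.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical surface measurements: flow rate and a physical-model pressure estimate.
n = 500
flow_rate = rng.uniform(1000, 2000, n)               # assumed units: L/min
physics_estimate = 0.05 * flow_rate + 20             # stand-in for a physical model
observed = physics_estimate + rng.normal(0, 3, n)    # measured standpipe pressure
observed[450:] += 25                                 # injected anomaly (e.g., kick-like)

# Bayesian linear regression (Gaussian prior, known noise variance): the ML layer
# learns a correction on top of the physical-model estimate, using "normal" data.
X = np.column_stack([np.ones(n), flow_rate, physics_estimate])
noise_var, prior_var = 3.0**2, 10.0**2
train = slice(0, 400)
A = X[train].T @ X[train] / noise_var + np.eye(X.shape[1]) / prior_var  # posterior precision
cov = np.linalg.inv(A)
mean_w = cov @ X[train].T @ observed[train] / noise_var                 # posterior mean weights

pred_mean = X @ mean_w
pred_std = np.sqrt(np.sum((X @ cov) * X, axis=1) + noise_var)           # predictive std

# Flag samples whose observed pressure leaves the 3-sigma predictive band.
anomalous = np.abs(observed - pred_mean) > 3 * pred_std
print(f"flagged {anomalous.sum()} of {n} samples as anomalous")
```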

