Missing Observations and Data Quality Improvement

Author(s):  
Carlos N. Bouza-Herrera
Author(s):  
Alla Andrianova ◽  
Maxim Simonov ◽  
Dmitry Perets ◽  
Andrey Margarit ◽  
Darya Serebryakova ◽  
...  

Author(s):  
Suranga C. H. Geekiyanage ◽  
Dan Sui ◽  
Bernt S. Aadnoy

Drilling industry operations depend heavily on digital information. Data analysis is the process of acquiring, transforming, interpreting, modelling, displaying and storing data with the aim of extracting useful information, so that decision-making, action execution, event detection and incident management can be handled efficiently and reliably. This paper provides an approach to understanding, cleansing, improving and interpreting post-well or real-time data in order to preserve or enhance data features such as accuracy, consistency, reliability and validity. Data quality management is a process with three major phases. Phase I is a pre-data-quality evaluation that identifies data issues such as missing or incomplete data, non-standard or invalid data, and redundant data. Phase II is the implementation of data quality management practices such as filtering, data assimilation and data reconciliation to improve data accuracy and discover useful information. Phase III is a post-data-quality evaluation, conducted to assure data quality and enhance system performance. In this study, a laboratory-scale drilling rig with a control system capable of drilling is used for data acquisition and quality improvement. Safe and efficient performance of such a control system relies heavily on the quality and sufficient availability of the data obtained while drilling. Pump pressure, top-drive rotational speed, weight on bit, drill string torque and bit depth are the available measurements. The data analysis is challenged by issues such as data corruption due to noise, time delays, missing or incomplete data, and external disturbances. To address these issues, different data quality improvement practices are applied and tested. These techniques help the intelligent system achieve better decision-making and quicker fault detection. The laboratory-scale drilling rig study clearly demonstrates the need for a proper data quality management process and a clear understanding of signal processing methods to carry out intelligent digitalization in the oil and gas industry.
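As a rough illustration of the three-phase workflow this abstract describes, the sketch below runs a Phase I completeness check and a Phase II filtering pass over a synthetic pump-pressure channel, then a Phase III verification. The column name, window size and z-score threshold are assumptions for illustration, not values from the paper.

```python
# Minimal sketch of the three-phase data quality workflow, assuming rig
# measurements arrive as a pandas time series at a fixed sampling rate.
# All names and thresholds here are illustrative assumptions.
import numpy as np
import pandas as pd

def pre_quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Phase I: count missing samples and duplicated rows per channel."""
    return pd.DataFrame({
        "missing": df.isna().sum(),
        "duplicates": [df.duplicated().sum()] * df.shape[1],
    })

def clean_signal(s: pd.Series, window: int = 11, z_max: float = 3.0) -> pd.Series:
    """Phase II: median filtering plus a z-score outlier mask."""
    smoothed = s.rolling(window, center=True, min_periods=1).median()
    resid = s - smoothed
    z = (resid - resid.mean()) / resid.std(ddof=0)
    cleaned = s.mask(z.abs() > z_max)                    # drop spikes
    return cleaned.interpolate(limit_direction="both")   # fill gaps

# Synthetic, noisy pump-pressure channel with an injected spike and gap
t = pd.date_range("2021-01-01", periods=600, freq="100ms")
raw = pd.Series(50 + np.random.normal(0, 0.5, 600), index=t)
raw.iloc[100] = 95.0
raw.iloc[200:205] = np.nan

print(pre_quality_report(raw.to_frame("pump_pressure")))  # Phase I
clean = clean_signal(raw)                                 # Phase II
assert clean.isna().sum() == 0                            # Phase III check
```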


2021 ◽  
Author(s):  
Qing Xie ◽  
Chengong Han ◽  
Victor Jin ◽  
Shili Lin

Single-cell Hi-C techniques enable the study of cell-to-cell variability in chromatin interactions. However, single-cell Hi-C (scHi-C) data suffer severely from sparsity, that is, an excess of zeros due to insufficient sequencing depth. Complicating things further is the fact that not all zeros are created equal: some arise because loci truly do not interact under the underlying biological mechanism (structural zeros), whereas others are indeed due to insufficient sequencing depth (sampling zeros), especially for loci that interact infrequently. Differentiating between structural and sampling zeros is important, since correct inference improves downstream analyses such as clustering and the discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the single-cell Hi-C literature, where sparsity has been addressed mainly as a data quality improvement problem. To fill this gap, in this paper we propose HiCImpute, a Bayesian hierarchical model that goes beyond data quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes the spatial dependencies of the scHi-C 2D data structure into account while also borrowing information from similar single cells and from bulk data, when available. Through an extensive set of analyses of synthetic and real data, we demonstrate the ability of HiCImpute to identify structural zeros with high sensitivity and to accurately impute dropout values in sampling zeros. Downstream analyses using data improved by HiCImpute yielded much more accurate clustering of cell types than using observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data led to the identification of subtypes within each of the L4 and L5 excitatory neuronal cells of the prefrontal cortex.
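The abstract does not spell out the HiCImpute model itself, but the core idea of separating structural from sampling zeros can be illustrated with a much simpler stand-in: a zero-inflated Poisson fitted by EM to the counts observed for one locus pair across cells. This toy sketch is not the authors' Bayesian hierarchical model, and every name and number in it is hypothetical.

```python
# Toy zero-inflated Poisson, NOT the HiCImpute model: with probability pi
# a zero is structural; otherwise counts follow Poisson(lam), so an
# observed zero may still be a sampling zero.
import numpy as np

def fit_zip_em(y, n_iter=200, tol=1e-8):
    """Return (pi, lam, tau), where tau[i] is the posterior probability
    that an observed zero at cell i is structural."""
    y = np.asarray(y, dtype=float)
    pi, lam = 0.5, max(y.mean(), 1e-3)
    for _ in range(n_iter):
        # E-step: responsibility of the structural-zero component
        tau = np.where(y == 0, pi / (pi + (1 - pi) * np.exp(-lam)), 0.0)
        # M-step: update mixing weight and Poisson rate
        pi_new = tau.mean()
        lam_new = ((1 - tau) * y).sum() / max((1 - tau).sum(), 1e-12)
        if abs(pi_new - pi) + abs(lam_new - lam) < tol:
            pi, lam = pi_new, lam_new
            break
        pi, lam = pi_new, lam_new
    return pi, lam, tau

# Hypothetical counts for one locus pair across 10 single cells
counts = np.array([0, 3, 0, 2, 4, 0, 0, 5, 1, 0])
pi, lam, tau = fit_zip_em(counts)
print(f"P(structural)={pi:.2f}, Poisson rate={lam:.2f}")
print("posterior structural prob. of each observed zero:",
      tau[counts == 0].round(2))
```

Zeros judged likely structural would be left at zero, while likely sampling zeros would be candidates for imputation, which is the decision HiCImpute makes within its richer hierarchical model.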


Tunas Agraria ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 168-174
Author(s):  
Maslusatun Mawadah

The South Jakarta Administrative City Land Office is one of the offices targeted to achieve complete land administration status in 2020. The current condition of its land parcel data demands an update, namely improving the quality of data in classes KW1 through KW6 towards valid KW1. The purpose of this study is to determine the condition of land data quality in South Jakarta, the implementation of data quality improvement, and the problems and solutions encountered in that implementation. The research method used is qualitative with a descriptive approach. The results show that after the improvement was implemented, the share of KW1 data increased from 86.45% to 87.01%. The roles of man, material, machine and method have been fulfilled, yet the implementation of data quality improvement does not comply with the 2019 Complete City Guidelines with respect to the territorial boundary inventory. Obstacles also remain in improving the quality of land parcel data: the absence of the buku tanah (land book), surat ukur (survey document) and gambar ukur (survey drawing) at the land office; ongoing regional division; sub-district boundaries that are not yet certain; and land parcels that were detached from the map without the office administrator's knowledge.


2019 ◽  
Vol 8 (3) ◽  
pp. e000490 ◽  
Author(s):  
Aidan Christopher Tan ◽  
Elizabeth Armstrong ◽  
Jacqueline Close ◽  
Ian Andrew Harris

Objectives: The value of a clinical quality registry is contingent on the quality of its data. This study aims to pilot a methodology for data quality audits of the Australian and New Zealand Hip Fracture Registry, a clinical quality registry of hip fracture clinical care and secondary fracture prevention. Methods: A data quality audit was performed by independently replicating the data collection and entry process for 163 randomly selected patient records from three contributing hospitals, and then comparing the replicated data set to the registry data set. Data agreement, as a proxy indicator of data accuracy, and data completeness were assessed. Results: An overall data agreement of 82.3% and overall data completeness of 95.6% were found, reflecting a moderate level of data accuracy and a very high level of data completeness. Half of all data disagreements were caused by information discrepancies, a quarter by missing-data discrepancies, and a quarter by time, date and number discrepancies. Transcription discrepancies accounted for only 1 in every 50 data disagreements. The sources of inaccurate and incomplete data have been identified with the intention of implementing data quality improvement. Conclusions: Regular audits of data abstraction are necessary to improve data quality, assure data validity and reliability, and guarantee the integrity and credibility of registry outputs. A generic framework and model for data quality audits of clinical quality registries is proposed, consisting of a three-step data abstraction audit, a registry coverage audit and a four-step data quality improvement process. Factors to consider for data abstraction audits include: central, remote or local implementation; single-stage or multistage random sampling; absolute, proportional, combination or alternative sample size calculation; data quality indicators; regular or ad hoc frequency; and qualitative assessment.
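The audit's two headline metrics, data agreement and data completeness, reduce to simple cell-wise fractions once the registry extract and the independently re-abstracted audit data are aligned. A minimal sketch follows, assuming both sources fit in pandas DataFrames with matching rows and columns; the patients, fields and values are invented for illustration.

```python
# Hypothetical registry extract and the auditor's independent
# re-abstraction for three patients and three fields.
import pandas as pd

registry = pd.DataFrame(
    {"surgery_date": ["2018-03-01", "2018-03-04", None],
     "asa_grade": [2, 3, 2],
     "walks_unaided": [True, False, True]},
    index=pd.Index([101, 102, 103], name="patient_id"),
)
audit = pd.DataFrame(
    {"surgery_date": ["2018-03-01", "2018-03-05", None],
     "asa_grade": [2, 3, 2],
     "walks_unaided": [True, True, True]},
    index=registry.index,
)

# Completeness: share of registry cells that are not missing.
completeness = registry.notna().to_numpy().mean()

# Agreement (proxy for accuracy): share of cells where both sources
# match, counting two missing values as agreement.
both_missing = registry.isna() & audit.isna()
agreement = ((registry == audit) | both_missing).to_numpy().mean()

print(f"completeness: {completeness:.1%}")  # 8/9 cells filled
print(f"agreement:    {agreement:.1%}")     # 7/9 cells agree
```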


Author(s):  
Brian Andrew Cattle ◽  
Paul D Baxter ◽  
Thomas J Flemming ◽  
Christopher Peter Gale ◽  
David C Mitchell ◽  
...  
