Extending statistical data quality improvement with explicit domain models

Author(s):  
Nina Solomakhina ◽  
Thomas Hubauer ◽  
Steffen Lamparter ◽  
Mikhail Roshchin ◽  
Stephan Grimm
Author(s):  
Alla Andrianova ◽  
Maxim Simonov ◽  
Dmitry Perets ◽  
Andrey Margarit ◽  
Darya Serebryakova ◽  
...  

Author(s):  
Suranga C. H. Geekiyanage ◽  
Dan Sui ◽  
Bernt S. Aadnoy

Drilling industry operations depend heavily on digital information. Data analysis is the process of acquiring, transforming, interpreting, modelling, displaying and storing data in order to extract useful information, so that a system's decision-making, execution of actions, event detection and incident management can be handled efficiently and with confidence. This paper aims to provide an approach to understanding, cleansing, improving and interpreting post-well or real-time data so as to preserve or enhance data features such as accuracy, consistency, reliability and validity. Data quality management is a process with three major phases. Phase I is a pre-improvement evaluation of data quality that identifies issues such as missing or incomplete data, non-standard or invalid data, and redundant data. Phase II is the implementation of data quality management practices such as filtering, data assimilation and data reconciliation to improve data accuracy and discover useful information. Phase III is a post-improvement evaluation of data quality, conducted to assure data quality and enhance system performance. In this study, a laboratory-scale drilling rig with a control system capable of drilling is used for data acquisition and quality improvement. The safe and efficient performance of such a control system relies heavily on the quality and sufficient availability of the data obtained while drilling. The available measurements are pump pressure, top-drive rotational speed, weight on bit, drill string torque and bit depth. The data analysis is challenged by issues such as corruption of data by noise, time delays, missing or incomplete data, and external disturbances. To address these issues, different data quality improvement practices are applied in the tests. These techniques help the intelligent system to achieve better decision-making and quicker fault detection.
The study on the laboratory-scale drilling rig clearly demonstrates the need for a proper data quality management process and a clear understanding of signal processing methods in order to carry out intelligent digitalization in the oil and gas industry.
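
The three phases described in this abstract can be illustrated with a minimal sketch. The abstract does not say which filter or gap-filling method the authors used, so the linear interpolation and sliding median below are stand-in assumptions, and the function names and sample values are hypothetical.

```python
import math
from statistics import median


def is_missing(x):
    """A sample counts as missing if it is None or NaN."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def count_missing(signal):
    """Phase I / Phase III check: number of missing samples."""
    return sum(1 for x in signal if is_missing(x))


def fill_gaps(signal):
    """Phase II, step 1 (assumed method): linear interpolation over gaps.
    Gaps at either end are filled with the nearest valid value."""
    vals = [None if is_missing(x) else float(x) for x in signal]
    known = [i for i, x in enumerate(vals) if x is not None]
    if not known:
        raise ValueError("signal has no valid samples")
    out = list(vals)
    for i, x in enumerate(vals):
        if x is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = vals[right]
        elif right is None:
            out[i] = vals[left]
        else:
            w = (i - left) / (right - left)
            out[i] = vals[left] + w * (vals[right] - vals[left])
    return out


def median_filter(signal, window=3):
    """Phase II, step 2 (assumed method): sliding median to suppress spikes."""
    half = window // 2
    n = len(signal)
    return [median(signal[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]


# Hypothetical pump-pressure samples (arbitrary units): one gap, one spike.
raw = [10.0, 10.2, None, 10.6, 50.0, 10.8, 11.0]

missing_before = count_missing(raw)        # Phase I: pre-improvement evaluation
filled = fill_gaps(raw)                    # Phase II: improvement
cleaned = median_filter(filled, window=3)
missing_after = count_missing(cleaned)     # Phase III: post-improvement evaluation
```

A median filter is chosen here only because it removes isolated spikes without smearing them into neighbouring samples. Data assimilation and data reconciliation, which the abstract also names, would need a process model and are not sketched.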


2021 ◽  
Author(s):  
Qing Xie ◽  
Chengong Han ◽  
Victor Jin ◽  
Shili Lin

Single cell Hi-C techniques enable one to study cell-to-cell variability in chromatin interactions. However, single cell Hi-C (scHi-C) data suffer severely from sparsity, that is, the existence of excess zeros due to insufficient sequencing depth. Complicating things further is the fact that not all zeros are created equal: some are due to loci truly not interacting because of the underlying biological mechanism (structural zeros), whereas others are indeed due to insufficient sequencing depth (sampling zeros), especially for loci that interact infrequently. Differentiating between structural zeros and sampling zeros is important, since correct inference would improve downstream analyses such as clustering and discovery of subtypes. Nevertheless, distinguishing between these two types of zeros has received little attention in the single cell Hi-C literature, where the issue of sparsity has been addressed mainly as a data quality improvement problem. To fill this gap, we propose HiCImpute, a Bayesian hierarchical model that goes beyond data quality improvement by also identifying observed zeros that are in fact structural zeros. HiCImpute takes the spatial dependencies of the 2D scHi-C data structure into account while also borrowing information from similar single cells and from bulk data, when available. Through an extensive set of analyses of synthetic and real data, we demonstrate the ability of HiCImpute to identify structural zeros with high sensitivity and to accurately impute dropout values in sampling zeros. Downstream analyses using data improved by HiCImpute yielded much more accurate clustering of cell types than analyses using the observed data or data improved by several comparison methods. Most significantly, HiCImpute-improved data have led to the identification of subtypes within each of the excitatory neuronal cell types L4 and L5 in the prefrontal cortex.
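
The distinction between structural and sampling zeros can be made concrete with a generic zero-inflated Poisson calculation. This is only an illustrative sketch and not the HiCImpute model, which the abstract describes as hierarchical and as using neighbourhood, similar-cell and bulk information. The function name and the parameters `pi` and `lam` are assumptions introduced here.

```python
import math


def prob_structural_zero(pi, lam):
    """Posterior probability that an observed zero is structural under a
    zero-inflated Poisson model (an assumed stand-in, not HiCImpute itself).

    pi  : prior probability that the locus pair never interacts
    lam : expected contact count if the pair does interact
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must be in [0, 1]")
    if lam < 0.0:
        raise ValueError("lam must be non-negative")
    # Probability of seeing a zero from an interacting pair (sampling zero).
    sampling = (1.0 - pi) * math.exp(-lam)
    # Bayes' rule: P(structural | zero) = pi / (pi + (1 - pi) * exp(-lam)).
    return pi / (pi + sampling)


# Hypothetical values: same prior, different expected contact counts.
p_rare = prob_structural_zero(pi=0.3, lam=0.1)      # pair interacts infrequently
p_frequent = prob_structural_zero(pi=0.3, lam=5.0)  # pair interacts often
```

Under these assumptions, a zero at a pair that would otherwise be expected to interact often is much more likely to be structural than a zero at a rarely interacting pair, which matches the abstract's remark that sampling zeros arise especially for loci that interact infrequently.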


Tunas Agraria ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 168-174
Author(s):  
Maslusatun Mawadah

The South Jakarta Administrative City Land Office serves one of the cities targeted to achieve complete land administration in 2020. The current condition of land parcel data requires updating, namely improving the quality of data from KW1 to KW6 towards valid KW1. The purpose of this study is to determine the condition of land data quality in South Jakarta, how data quality improvement has been implemented, and the problems encountered and their solutions. The research method is qualitative with a descriptive approach. The results show that data quality improved after implementation: KW1 increased from 86.45% to 87.01%. The roles of man, material, machine, and method have been fulfilled, but the implementation of data quality improvement is not in accordance with the 2019 Complete City Guidelines in terms of territorial boundary inventory. Obstacles to improving the quality of land parcel data remain: the absence of buku tanah (land books), surat ukur (measurement letters), and gambar ukur (measurement drawings) at the land office; regional division; sub-district boundaries that are not yet certain; and land parcels that have been separated from mapping without being noticed by the office administrator.

