Automated Drilling Data Quality Control Using Application of AI Technologies

2021
Author(s):
Francesco Battocchio
Jaijith Sreekantan
Arghad Arnaout
Abed Benaichouche
Juma Sulaiman Al Shamsi
...  

Abstract Drilling data quality is notoriously a challenge for any analytics application, due to the complexity of the real-time data acquisition system, which routinely generates: (i) time-related issues caused by irregular sampling, (ii) channel-related issues in the form of non-uniform names and units and missing or wrong values, and (iii) depth-related issues caused by block position resets and depth compensation (for floating rigs). On the other hand, artificial intelligence drilling applications typically require a consistent stream of high-quality data as input for their algorithms, as well as for visualization. In this work we present an automated workflow, enhanced by data-driven techniques, that resolves complex quality issues, harmonizes sensor drilling data, and reports the quality of the dataset to be used for advanced analytics. The approach proposes an automated data quality workflow which formalizes the characteristics, requirements, and constraints of sensor data within the context of drilling operations. The workflow leverages machine learning algorithms, statistics, signal processing, and rule-based engines to detect data quality issues including erroneous values, outliers, bias, drifts, noise, and missing values. Once data quality issues are classified, they are scored and treated on a context-specific basis in order to recover the maximum volume of data while avoiding information loss. The result is a data quality and preparation engine that organizes drilling data for further advanced analytics and reports the quality of the dataset through key performance indicators. This novel data processing workflow allowed more than 90% of a drilling dataset of 18 offshore wells, which otherwise could not be used for analytics, to be recovered. This was achieved by resolving specific issues, including resampling time series with gaps and different sampling rates, and smart imputation of wrong/missing data while preserving the consistency of the dataset across all channels. Additional improvements would include recovering data values that fell outside a meaningful range because of sensor drift or depth resets. The present work automates the end-to-end workflow for data quality control of drilling sensor data leveraging advanced Artificial Intelligence (AI) algorithms. It allows patterns of wrong/missing data to be detected and classified, and then recovered through a context-driven approach that prevents information loss. As a result, the maximum amount of data is recovered for artificial intelligence drilling applications. The workflow also enables optimal time synchronization of different sensors streaming data at different frequencies, within discontinuous time intervals.
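The resampling and imputation steps mentioned above can be illustrated with a short sketch. This is a minimal example, assuming a pandas DataFrame indexed by timestamp; the channel names, the 1-second target rate, and the 30-second gap limit are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np
import pandas as pd

def resample_and_impute(df, rate="1s", max_gap="30s"):
    """Resample irregularly sampled drilling channels onto a uniform time grid.

    Short gaps are interpolated in time; `limit` caps how many consecutive
    missing points may be filled, so long outages are not fully fabricated.
    """
    uniform = df.resample(rate).mean()                       # snap to a fixed rate
    limit = int(pd.Timedelta(max_gap) / pd.Timedelta(rate))  # max points to fill
    return uniform.interpolate(method="time", limit=limit)

# Two channels streaming at different, irregular rates (hypothetical values).
idx = pd.to_datetime(["2021-01-01 00:00:00", "2021-01-01 00:00:03",
                      "2021-01-01 00:00:04", "2021-01-01 00:01:30"])
raw = pd.DataFrame({"hook_load": [210.0, 212.5, np.nan, 215.0],
                    "rpm": [120.0, 118.0, 119.0, 121.0]}, index=idx)
print(resample_and_impute(raw).head())
```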

2021
Author(s):
S. H. Al Gharbi
A. A. Al-Majed
A. Abdulraheem
S. Patil
S. M. Elkatatny

Abstract Due to the high demand for energy, oil and gas companies have started to drill wells in remote areas and unconventional environments. This has raised the complexity of drilling operations, which were already challenging and complex. To adapt, drilling companies expanded their use of the real-time operation center (RTOC) concept, in which real-time drilling data are transmitted from remote sites to companies' headquarters. In the RTOC, groups of subject matter experts monitor the drilling live and provide real-time advice to improve operations. With the increase in drilling operations, processing the volume of generated data is beyond a human's capability, limiting the RTOC's impact on certain components of drilling operations. To overcome this limitation, artificial intelligence and machine learning (AI/ML) technologies were introduced to monitor and analyze the real-time drilling data, discover hidden patterns, and provide fast decision-support responses. AI/ML technologies are data-driven, and their quality relies on the quality of the input data: if the input data is good, the generated output will be good; if not, it will be poor. Unfortunately, due to the harsh environments of drilling sites and the transmission setups, not all of the drilling data is good, which negatively affects the AI/ML results. The objective of this paper is to utilize AI/ML technologies to improve the quality of real-time drilling data. The paper fed a large real-time drilling dataset, consisting of over 150,000 raw data points, into Artificial Neural Network (ANN), Support Vector Machine (SVM), and Decision Tree (DT) models. The models were trained on valid and invalid data points. A confusion matrix was used to evaluate the different AI/ML models, including different internal architectures. Despite its slower training, the ANN achieved the best result with an accuracy of 78%, compared to 73% and 41% for the DT and SVM, respectively. The paper concludes by presenting a process for using AI technology to improve real-time drilling data quality. To the authors' knowledge, based on literature in the public domain, this paper is one of the first to compare the use of multiple AI/ML techniques for quality improvement of real-time drilling data. The paper provides a guide for improving the quality of real-time drilling data.
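A generic sketch of the model comparison described here is given below, using scikit-learn. The synthetic features and labels, the hidden-layer sizes, and the tree depth are assumptions for illustration only; they are not the architectures or the dataset used in the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic stand-in for real-time drilling channels labelled valid (1) / invalid (0).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 6))      # e.g. WOB, RPM, torque, flow, SPP, ROP (hypothetical)
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=5000) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "ANN": make_pipeline(StandardScaler(),
                         MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "DT": DecisionTreeClassifier(max_depth=8, random_state=0),
}

# Fit each model and report accuracy plus the confusion matrix on held-out data.
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))
```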


2017
Author(s):
Stefan Hunziker
Stefan Brönnimann
Juan Marcos Calle
Isabel Moreno
Marcos Andrade
...  

Abstract. Systematic data quality issues may occur at various stages of the data generation process. They may affect large fractions of observational datasets and remain largely undetected with standard data quality control. This study investigates the effects of such undetected data quality issues on the results of climatological analyses. For this purpose, we quality controlled daily observations of manned weather stations from the Central Andean area with a standard and an enhanced approach. The climate variables analysed are minimum and maximum temperature, and precipitation. About 40 % of the observations are inappropriate for the calculation of monthly temperature means and precipitation sums due to data quality issues. These quality problems, which remain undetected with the standard quality control method, strongly affect climatological analyses: they reduce the correlation coefficients of station pairs, deteriorate the performance of data homogenization methods, increase the spread of individual station trends, and significantly bias regional temperature trends. Our findings indicate that undetected data quality issues are included in important and frequently used observational datasets, and hence may affect a high number of climatological studies. It is of utmost importance to apply comprehensive and adequate data quality control approaches to manned weather station records in order to avoid biased results and large uncertainties.
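As one plausible example of a systematic issue that standard checks may miss, an enhanced quality control could screen a record for an excess of whole-degree values, which would hint at systematic rounding or truncation of readings. The 50% threshold and the focus on the decimal part are illustrative assumptions, not the procedure used in the study.

```python
import numpy as np

def fraction_rounded(values, decimals=1):
    """Fraction of observations whose decimal part is exactly .0.

    A fraction far above what the instrument resolution would suggest
    points to systematic rounding or truncation in the record.
    """
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]
    frac = np.round(values % 1.0, decimals)
    return np.mean(frac == 0.0)

# Hypothetical daily maximum temperatures from one station.
tmax = np.array([17.0, 18.0, 18.3, 19.0, 20.0, 21.0, 17.5, 18.0])
if fraction_rounded(tmax) > 0.5:   # illustrative threshold
    print("Suspiciously many whole-degree values: flag station for review")
```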


2012
Vol 9 (12)
pp. 18175-18210
Author(s):  
J. R. Taylor
H. L. Loescher

Abstract. National and international networks and observatories of terrestrial-based sensors are emerging rapidly. As such, there is demand for a standardized approach to data quality control, as well as interoperability of data among sensor networks. The National Ecological Observatory Network (NEON) has begun constructing their first terrestrial observing sites, with 60 locations expected to be distributed across the US by 2017. This will result in over 14 000 automated sensors recording more than 100 TB of data per year. These data are then used to create other datasets and subsequent "higher-level" data products. In anticipation of this challenge, an overall data quality assurance plan has been developed and the first suite of data quality control measures defined. This data-driven approach focuses on automated methods for defining a suite of plausibility test parameter thresholds. Specifically, these plausibility tests scrutinize data range, persistence, and stochasticity on each measurement type by employing a suite of binary checks. The statistical basis for each of these tests is developed and the methods for calculating test parameter thresholds are explored here. While these tests have been used elsewhere, we apply them in a novel approach by calculating their relevant test parameter thresholds. Finally, implementing automated quality control is demonstrated with preliminary data from a NEON prototype site.
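The binary plausibility checks described (range, persistence, and related step tests) can be sketched as simple flag functions. The thresholds below are placeholders, not the statistically derived parameters developed in the paper.

```python
import numpy as np

def range_test(x, lo, hi):
    """Flag values outside a plausible physical range (1 = fail)."""
    return ((x < lo) | (x > hi)).astype(int)

def persistence_test(x, window=10, min_var=1e-6):
    """Flag windows where the value barely changes, suggesting a stuck sensor."""
    flags = np.zeros(len(x), dtype=int)
    for i in range(len(x) - window + 1):
        if np.nanvar(x[i:i + window]) < min_var:
            flags[i:i + window] = 1
    return flags

def step_test(x, max_step=5.0):
    """Flag implausibly large jumps between consecutive observations."""
    diffs = np.abs(np.diff(x, prepend=x[0]))
    return (diffs > max_step).astype(int)

# Illustrative air-temperature stream with a spike and a stuck stretch.
temp = np.array([12.1, 12.3, 12.2, 40.0, 12.4] + [12.4] * 12 + [12.6])
flags = range_test(temp, -40, 60) | persistence_test(temp) | step_test(temp)
print(flags)   # combined binary quality flag per observation
```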


2012
Vol 2012
pp. 1-8
Author(s):  
Janet E. Squires
Alison M. Hutchinson
Anne-Marie Bostrom
Kelly Deis
Peter G. Norton
...  

Researchers strive to optimize data quality in order to ensure that study findings are valid and reliable. In this paper, we describe a data quality control program designed to maximize the quality of survey data collected using computer-assisted personal interviews. The quality control program comprised three phases: (1) software development, (2) an interviewer quality control protocol, and (3) a data cleaning and processing protocol. To illustrate the value of the program, we assess its use in the Translating Research in Elder Care Study. We utilize data collected annually for two years from computer-assisted personal interviews with 3004 healthcare aides. Data quality was assessed using both survey and process data. Missing data and data errors were minimal. Mean and median values and standard deviations were within acceptable limits. Process data indicated that in only 3.4% and 4.0% of cases was the interviewer unable to conduct interviews in accordance with the details of the program. Interviewers’ perceptions of interview quality also significantly improved between Years 1 and 2. While this data quality control program was demanding in terms of time and resources, we found that the benefits clearly outweighed the effort required to achieve high-quality data.
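The data cleaning and processing phase lends itself to simple automated checks of the kind reported here (missing data, data errors, and descriptive statistics within acceptable limits). The sketch below is hypothetical: the item names and valid ranges are illustrative, not variables from the study's survey instrument.

```python
import pandas as pd

def survey_quality_report(df, valid_ranges):
    """Summarize missing data and out-of-range responses for each survey item."""
    report = {}
    for item, (lo, hi) in valid_ranges.items():
        col = df[item]
        report[item] = {
            "missing_pct": col.isna().mean() * 100,
            "out_of_range_pct": (~col.between(lo, hi) & col.notna()).mean() * 100,
            "mean": col.mean(),
            "sd": col.std(),
        }
    return pd.DataFrame(report).T

# Hypothetical items scored on a 1-5 Likert scale.
responses = pd.DataFrame({"job_satisfaction": [4, 5, None, 3, 7],
                          "burnout": [2, 2, 3, None, 1]})
print(survey_quality_report(responses,
                            {"job_satisfaction": (1, 5), "burnout": (1, 5)}))
```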


2017
Vol 46 (2)
pp. 69-77
Author(s):  
Beth A Reid
Lee Ridoutt
Paul O’Connor
Deirdre Murphy

Introduction: This article presents some of the results of a year-long project in the Republic of Ireland to review the quality of the hospital inpatient enquiry data for its use in activity-based funding (ABF). This is the first of two papers regarding best practice in the management of clinical coding services. Methods: Four methods were used to address this aspect of the project, namely a literature review, a workshop, an assessment of the coding services in 12 Irish hospitals by structured interviews of the clinical coding managers, and a medical record audit of the clinical codes in 10 hospitals. Results: The results included here are those relating to the quality of the medical records, coding work allocation and supervision processes, data quality control measures, communication with clinicians, and the visibility of clinical coders, their managers, and the coding service. Conclusion: The project found instances of best practice in the study hospitals but also found several areas needing improvement. These included improving the structure and content of the medical record, clinician engagement with the clinical coding teams and the ABF process, and the use of data quality control measures.


Author(s):  
C. X. Chen
H. Zhang
K. Jiang
H. T. Zhao
W. Xie
...  

Abstract. In recent years, China has promulgated the "Civil Code of the People's Republic of China", the "Implementation Rules of the Provisional Regulations on Real Estate Registration", and other laws and regulations, which protect citizens' real estate rights and obligations at the legal level. This shows that the quality of real estate registration data is very important. At present, however, there is no set of standards for evaluating the quality of real estate registration data. This article sorts out the production process of real estate registration data and focuses on its four production stages: digitization results, field survey and mapping results, group building results, and integration and association. On this basis, the main points of real estate registration data quality control are put forward and a quality evaluation model is developed. Taking the quality inspection of Beijing's integrated historical real estate registration archives as an application case, the quality evaluation model is shown to have been successfully applied in an actual project, ensuring the quality of Beijing's real estate registration data. It also provides a reference for the next step, quality control of the data for China's unified registration and confirmation of natural resources.
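The abstract does not describe the evaluation model itself, but a weighted scoring scheme over the four named production stages is one plausible shape for it. Everything below (weights, pass mark, per-stage scores) is hypothetical.

```python
# Hypothetical weighted scoring across the four production stages named in the abstract.
STAGE_WEIGHTS = {
    "digitization": 0.25,
    "field_survey_and_mapping": 0.25,
    "group_building": 0.25,
    "integration_and_association": 0.25,
}

def overall_quality(stage_scores, weights=STAGE_WEIGHTS, pass_mark=90.0):
    """Combine per-stage scores (0-100) into a weighted overall score and verdict."""
    score = sum(weights[s] * stage_scores[s] for s in weights)
    return score, "pass" if score >= pass_mark else "rework"

# Illustrative scores for one batch of archive records.
batch = {"digitization": 96, "field_survey_and_mapping": 92,
         "group_building": 88, "integration_and_association": 94}
print(overall_quality(batch))   # e.g. (92.5, 'pass')
```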


2018
Vol 13 (2)
pp. 131-146
Author(s):  
Mirwan Rofiq Ginanjar
Sri Mulat Yuningsih

Planning and management of water resources depend on the quality of hydrological data, which plays an important role in hydrological analysis. The availability of good, reliable hydrological data is one of the determinants of the quality of hydrological analysis results. The facts, however, indicate that much of the available data does not match this ideal state. To solve this problem, a hydrological data quality control model should be established in order to improve the quality of national hydrological data. The scope includes quality control of rainfall and discharge data. Quality control of rainfall data was analyzed for 58 rainfall stations spread across the island of Java. The analysis shows that 41 stations are categorized as good, 14 as moderate, and 3 as bad. Based on these results, a light improvement scenario was applied: the good category increased to 46 stations, the moderate category decreased to 11 stations, and the bad category was reduced to 1 station. Quality control of discharge data was analyzed for 14 discharge stations spread across Java Island. Analyses were performed for QC1, QC2, and QC3 to obtain a final QC value. The final QC results show no stations in the good category, 2 stations in the moderate category, and 12 stations in the bad category. Based on these results, a light improvement scenario was applied, with 5 stations improving from the bad to the good category, 7 stations from bad to moderate, and 1 station improving from the moderate category.
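The combination of QC1, QC2, and QC3 into a final QC value and a good/moderate/bad category could be sketched as follows; the equal weights and the category cut-offs are assumptions, since the abstract does not specify them.

```python
def final_qc(qc1, qc2, qc3, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted final QC score from three component scores on a 0-100 scale."""
    return weights[0] * qc1 + weights[1] * qc2 + weights[2] * qc3

def categorize(score, good=80.0, moderate=60.0):
    """Map a final QC score to the good / moderate / bad categories."""
    if score >= good:
        return "good"
    if score >= moderate:
        return "moderate"
    return "bad"

# Hypothetical component scores for two discharge stations.
stations = {"Station_A": (85, 90, 70), "Station_B": (55, 60, 40)}
for name, scores in stations.items():
    s = final_qc(*scores)
    print(name, round(s, 1), categorize(s))
```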


2013
Vol 10 (7)
pp. 4957-4971
Author(s):  
J. R. Taylor
H. L. Loescher

Abstract. National and international networks and observatories of terrestrial-based sensors are emerging rapidly. As such, there is demand for a standardized approach to data quality control, as well as interoperability of data among sensor networks. The National Ecological Observatory Network (NEON) has begun constructing their first terrestrial observing sites, with 60 locations expected to be distributed across the US by 2017. This will result in over 14 000 automated sensors recording more than 100 TB of data per year. These data are then used to create other datasets and subsequent "higher-level" data products. In anticipation of this challenge, an overall data quality assurance plan has been developed and the first suite of data quality control measures defined. This data-driven approach focuses on automated methods for defining a suite of plausibility test parameter thresholds. Specifically, these plausibility tests scrutinize the data range and variance of each measurement type by employing a suite of binary checks. The statistical basis for each of these tests is developed, and the methods for calculating test parameter thresholds are explored here. While these tests have been used elsewhere, we apply them in a novel approach by calculating their relevant test parameter thresholds. Finally, implementing automated quality control is demonstrated with preliminary data from a NEON prototype site.
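The data-driven derivation of test parameter thresholds can be illustrated with a simple variance-based range rule applied to historical observations. The six-sigma multiplier and the synthetic temperature series are illustrative assumptions, not the statistical procedure published by NEON.

```python
import numpy as np

def range_thresholds(history, n_sigma=6.0):
    """Derive plausible minimum/maximum limits from historical observations.

    Thresholds are set at the historical mean plus/minus `n_sigma` standard
    deviations, an illustrative stand-in for a statistically derived limit.
    """
    mu = np.nanmean(history)
    sigma = np.nanstd(history)
    return mu - n_sigma * sigma, mu + n_sigma * sigma

# Illustrative: one year of half-hourly air temperature from a prototype site.
rng = np.random.default_rng(1)
history = 10 + 12 * np.sin(np.linspace(0, 2 * np.pi, 17520)) + rng.normal(0, 2, 17520)
lo, hi = range_thresholds(history)

new_obs = np.array([14.2, 85.0, -3.1])   # 85.0 is an implausible spike
print((new_obs < lo) | (new_obs > hi))   # [False  True False]
```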

