How to Land Modern Data Science in Petroleum Engineering

2021
Author(s):  
Hongbao Zhang ◽  
Yijin Zeng ◽  
Lulu Liao ◽  
Ruiyao Wang ◽  
Xutian Hou ◽  
...  

Abstract Digitalization and intelligence are attracting increasing attention in petroleum engineering. A large body of published research indicates that modern data science has been applied in almost every corner of petroleum engineering where data is generated; however, mature products are few, or their performance falls short of expectations. Despite great success in other industries (internet, transportation, finance, etc.), the "amazing" data science algorithms seem to be challenged when "landing" in petroleum engineering. It is time to calmly analyze the current situation and discuss a methodology for applying modern data science in petroleum engineering, in order to ensure safety, improve efficiency and save costs. Based on the experience of several data products in petroleum engineering and a wide survey of the literature, the methodology is summarized by answering some important questions: What distinguishes petroleum engineering from other industries, and what are the greatest challenges for algorithms "landing"? How should a data product development team be built? Why do machine learning models derived by typical textbook procedures often fail in the real world? Are current artificial intelligence algorithms perfect, and where are their limits? How should the relationship between prior knowledge and data-driven methods be handled? What is the key to keeping a data product competitive? Several specific scenarios are introduced as examples, such as ROP modelling, drilling parameters optimization, text mining of drilling reports and well production prediction, where deep learning, traditional machine learning, incremental learning and natural language processing methods are used. 
Besides the detailed discussions in the paper, the conclusions are summarized as follows: 1) the strengths and weaknesses of current artificial intelligence should be viewed objectively, and practical suggestions to compensate for the weaknesses are provided; 2) combining prior knowledge (from lab tests or expert experience) with data-driven methods is always necessary, and methods for the combination are summarized; 3) data volume and solution portability are the key points for improving data product competitiveness; 4) suggestions on how to build a multi-disciplinary R&D team and how to plan a product are provided. This paper conducts an objective analysis of the challenges of applying modern data science in petroleum engineering and provides a clear methodology and specific suggestions for improving the success rate of R&D projects that apply data science to solve problems in petroleum engineering.

Author(s):  
Ekaterina Kochmar ◽  
Dung Do Vu ◽  
Robert Belfer ◽  
Varun Gupta ◽  
Iulian Vlad Serban ◽  
...  

Abstract Intelligent tutoring systems (ITS) have been shown to be highly effective at promoting learning as compared to other computer-based instructional approaches. However, many ITS rely heavily on expert design and hand-crafted rules. This makes them difficult to build and transfer across domains and limits their potential efficacy. In this paper, we investigate how feedback in a large-scale ITS can be automatically generated in a data-driven way, and more specifically how personalization of feedback can lead to improvements in student performance outcomes. First, we propose a machine learning approach to generate personalized feedback in an automated way, which takes the individual needs of students into account while alleviating the need for expert intervention and hand-crafted rules. We leverage state-of-the-art machine learning and natural language processing techniques to provide students with personalized feedback using hints and Wikipedia-based explanations. Second, we demonstrate that personalized feedback leads to improved success rates at solving exercises in practice: our personalized feedback model is used in a large-scale dialogue-based ITS with around 20,000 students, launched in 2019. We present the results of experiments with students and show that the automated, data-driven, personalized feedback leads to a significant overall improvement of 22.95% in student performance outcomes and substantial improvements in the subjective evaluation of the feedback.


2021
Author(s):  
Hongbao Zhang ◽  
Baoping Lu ◽  
Lulu Liao ◽  
Hongzhi Bao ◽  
Zhifa Wang ◽  
...  

Abstract Theoretically, the rate of penetration (ROP) model is fundamental to drilling parameter design, ROP improvement tool selection, and drilling time and cost estimation. Currently, ROP modelling is mainly conducted by two approaches: equation-based and machine learning, and machine learning performs better because of its capacity for high-dimensional and non-linear process modelling. However, in deep or deviated wells, the ROP prediction accuracy of machine learning is often unsatisfactory, mainly because the energy loss along the wellbore and drill string is non-negligible, and it is difficult to capture the effect of wellbore geometry in machine learning models by purely data-driven methods. It is therefore necessary to develop robust ROP modelling methods for different scenarios. In this paper, the performance of several equation-based and machine learning methods is evaluated on data from 82 wells, and the technical features and applicable scopes of the different methods are analysed. A new machine-learning-based ROP modelling method suitable for different well path types is proposed. An integrated data processing pipeline was designed to deal with noisy data, missing data, and discrete variables. Factors affecting ROP were analysed, including mechanical parameters, hydraulic parameters, bit characteristics, rock properties, and wellbore geometry. Several new features were created from classic drilling theories, such as downhole weight on bit (DWOB), hydraulic impact force, and a formation heterogeneity index, to improve the efficiency of learning from data. A random forest model was trained with cross-validation and hyperparameter optimization. Field test results show that the model can predict ROP in different hole sections (vertical, deviated and horizontal) and different drilling modes (sliding and rotating drilling), and the average accuracy meets the requirements of well planning. 
A novel data processing and feature engineering workflow was designed according to the characteristics of ROP modelling in different well path types. An integrated data-driven ROP modelling and optimization software package was developed, including mechanical specific energy analysis, bit wear analysis and prediction, 2D and 3D ROP sensitivity analysis, offset well benchmarking, ROP prediction, drilling parameter constraint analysis, and cost-per-meter prediction, providing quantitative evidence for drilling parameter optimization, drilling tool selection and well time estimation.
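The abstract lists mechanical specific energy (MSE) analysis among the software's functions, and MSE is also typical of the physics-derived features the authors describe creating from classic drilling theory. As a minimal sketch of such a feature, here is Teale's classic MSE formula in common oilfield units; the function name and example values are illustrative, not taken from the paper:

```python
import math

def mechanical_specific_energy(wob, torque, rpm, rop, bit_diameter):
    """Teale's mechanical specific energy (MSE) in psi, field units.

    wob: weight on bit [lbf]; torque: bit torque [ft-lbf];
    rpm: rotary speed [rev/min]; rop: rate of penetration [ft/hr];
    bit_diameter: bit diameter [in].
    """
    area = math.pi * bit_diameter ** 2 / 4.0  # bit face area [in^2]
    axial_term = wob / area                   # thrust contribution [psi]
    # 480 folds in the unit conversions (ft-lbf -> in-lbf, ft/hr -> in/min)
    rotary_term = 480.0 * rpm * torque / (bit_diameter ** 2 * rop)
    return axial_term + rotary_term

# Illustrative case: an 8.5-in bit drilling at 60 ft/hr
mse = mechanical_specific_energy(wob=30_000, torque=8_000, rpm=120, rop=60,
                                 bit_diameter=8.5)
```

A feature like this can be appended to each drilling record before training, letting the random forest learn from a physically meaningful quantity rather than from raw parameters alone.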


2019
Vol 3 (Supplement_1)
pp. S480-S480
Author(s):  
Robert Lucero ◽  
Ragnhildur Bjarnadottir

Abstract Two hundred and fifty thousand older adults die annually in United States hospitals because of iatrogenic conditions (ICs). Clinicians, aging experts, patient advocates and federal policy makers agree that there is a need to enhance the safety of hospitalized older adults through improved identification and prevention of ICs. To this end, we are building a research program with the goal of enhancing the safety of hospitalized older adults by reducing ICs through an effective learning health system. Leveraging unique electronic data and healthcare system and human resources at the University of Florida, we are applying a state-of-the-art practice-based data science approach to identify risk factors of ICs (e.g., falls) from structured (i.e., nursing, clinical, administrative) and unstructured or text (i.e., registered nurse’s progress notes) data. Our interdisciplinary academic-clinical partnership includes scientific and clinical experts in patient safety, care quality, health outcomes, nursing and health informatics, natural language processing, data science, aging, standardized terminology, clinical decision support, statistics, machine learning, and hospital operations. Results to date have uncovered previously unknown fall risk factors within nursing (i.e., physical therapy initiation), clinical (i.e., number of fall risk increasing drugs, hemoglobin level), and administrative (i.e., Charlson Comorbidity Index, nurse skill mix, and registered nurse staffing ratio) structured data as well as patient cognitive, environmental, workflow, and communication factors in text data. The application of data science methods (i.e., machine learning and text-mining) and findings from this research will be used to develop text-mining pipelines to support sustained data-driven interdisciplinary aging studies to reduce ICs.


10.2196/16607
2019
Vol 21 (11)
pp. e16607
Author(s):  
Christian Lovis

Data-driven science and its corollaries in machine learning and the wider field of artificial intelligence have the potential to drive important changes in medicine. However, medicine is not a science like any other: it is deeply and tightly bound to a large and wide network of legal, ethical, regulatory, economic, and societal dependencies. As a consequence, scientific and technological progress in handling information and its further processing and cross-linking for decision support and predictive systems must be accompanied by parallel changes in the global environment, involving numerous stakeholders, including citizens and society. What can be seen at first glance as a barrier and a mechanism slowing down the progression of data science must, however, be considered an important asset. Only global adoption can transform the potential of big data and artificial intelligence into effective breakthroughs in health and medicine. This requires science and society, scientists and citizens, to progress together.


JAMIA Open
2021
Vol 4 (1)
Author(s):  
Fuchiang R Tsui ◽  
Lingyun Shi ◽  
Victor Ruiz ◽  
Neal D Ryan ◽  
Candice Biernesser ◽  
...  

Abstract Objective Limited research exists in predicting first-time suicide attempts, which account for two-thirds of suicide decedents. We aimed to predict first-time suicide attempts using a large data-driven approach that applies natural language processing (NLP) and machine learning (ML) to unstructured (narrative) clinical notes and structured electronic health record (EHR) data. Methods This case-control study included patients aged 10–75 years who were seen between 2007 and 2016 in emergency departments and inpatient units. Cases were first-time suicide attempts identified from coded diagnoses; controls were randomly selected patients without suicide attempts regardless of demographics, at a ratio of nine controls per case. Four data-driven ML models were evaluated using 2 years of historical EHR data prior to the suicide attempt or control index visit, with prediction windows from 7 to 730 days. Patients without any historical notes were excluded. Model accuracy and robustness were evaluated on a blind dataset (30% of the cohort). Results The study cohort included 45 238 patients (5099 cases, 40 139 controls) comprising 54 651 variables from 5.7 million structured records and 798 665 notes. Using both unstructured and structured data resulted in significantly greater accuracy than structured data alone (area under the curve [AUC]: 0.932 vs 0.901; P < .001). The best-performing model used 1726 variables, with AUC = 0.932 (95% CI, 0.922–0.941). The model was robust across multiple prediction windows and across subgroups by demographics, most recent point of historical clinical contact, and depression diagnosis history. Conclusions Our large data-driven approach using both structured and unstructured EHR data demonstrated accurate and robust first-time suicide attempt prediction, and has the potential to be deployed across various populations and clinical settings.
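The headline comparison in this study (AUC 0.932 vs 0.901) rests on the area under the ROC curve, which for a binary outcome is equivalent to the Mann-Whitney U statistic: the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. A minimal, library-free sketch of that definition (in practice one would use a library routine such as scikit-learn's `roc_auc_score`):

```python
def auc_score(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: iterable of 0/1 ground truth; scores: predicted risk scores.
    Score ties between a case and a control count 0.5, matching the
    standard definition of AUC.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Fraction of case/control pairs ranked correctly
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Two cases, two controls; 3 of 4 pairs ranked correctly -> AUC 0.75
example_auc = auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.932 therefore means the model ranks a true first-attempt case above a control in about 93% of such pairs.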


2020
Vol 73 (4)
pp. 285-295
Author(s):  
Dongwoo Chae

Machine learning (ML) is revolutionizing anesthesiology research. Unlike classical research methods that are largely inference-based, ML is geared more towards making accurate predictions. ML is a field of artificial intelligence concerned with developing algorithms and models to perform prediction tasks in the absence of explicit instructions. Most ML applications, despite being highly variable in the topics that they deal with, generally follow a common workflow. For classification tasks, a researcher typically tests various ML models and compares the predictive performance with the reference logistic regression model. The main advantage of ML lies in its ability to deal with many features with complex interactions and its specific focus on maximizing predictive performance. However, emphasis on data-driven prediction can sometimes neglect mechanistic understanding. This article mainly focuses on the application of supervised ML to electronic health record (EHR) data. The main limitation of EHR-based studies is in the difficulty of establishing causal relationships. However, the associated low cost and rich information content provide great potential to uncover hitherto unknown correlations. In this review, the basic concepts of ML are introduced along with important terms that any ML researcher should know. Practical tips regarding the choice of software and computing devices are also provided. Towards the end, several examples of successful ML applications in anesthesiology are discussed. The goal of this article is to provide a basic roadmap to novice ML researchers working in the field of anesthesiology.
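The "common workflow" this review describes, fitting several candidate models and comparing them against a logistic-regression reference, rests on a resampling scheme such as k-fold cross-validation so that predictive performance is measured on data the model has not seen. A minimal, library-free sketch of the fold construction (scikit-learn's `KFold` provides this in practice; the function name here is illustrative):

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Every sample appears in exactly one test fold; the remaining
    samples form that fold's training set.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)        # reproducible shuffle
    folds = [idx[i::k] for i in range(k)]   # k near-equal folds
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# Each candidate model is then fit on `train` and scored on `test`,
# and the scores averaged across the k folds.
splits = list(k_fold_indices(10, k=5))
```

Comparing the cross-validated score of each ML model against the same folds' logistic-regression score keeps the comparison fair.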



2020
Vol 38 (02)
Author(s):  
TẠ DUY CÔNG CHIẾN

Ontologies have been applied in many applications in recent years, such as information retrieval, information extraction, and text document classification. The purpose of a domain-specific ontology is to enrich the identification of concepts and their interrelationships. In our research, we use an ontology to specify a set of generic subjects (concepts) that characterize the domain, together with their definitions and interrelationships. This paper introduces a system for labeling the subjects of text documents based on the differential layers of a domain-specific ontology containing the information and vocabulary related to the computer domain. A document can cover several subjects, such as data science, databases, and machine learning. The subjects in text document classification are determined based on the differential layers of the domain-specific ontology. We combine natural language processing methodologies with the domain ontology to determine the subjects in a text document. To increase performance, we use a graph database to store and access the ontology. The paper also evaluates our proposed algorithm against several other methods. Experimental results show that our proposed algorithm yields significantly better performance.
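The core idea, matching a document's vocabulary against the concept layers of the ontology to assign subject labels, can be sketched without the graph database. The toy ontology, term lists, and threshold below are all invented for illustration; they are not the authors' actual computer-domain ontology or algorithm:

```python
# Toy ontology: each subject (concept) maps to a vocabulary layer.
# In the paper this structure lives in a graph database.
ONTOLOGY = {
    "machine learning": {"classifier", "training", "model", "feature"},
    "database": {"query", "table", "index", "transaction"},
    "data science": {"dataset", "statistics", "visualization", "model"},
}

def label_subjects(document, ontology=ONTOLOGY, min_hits=2):
    """Assign every subject whose vocabulary layer overlaps the
    document in at least `min_hits` distinct terms."""
    words = set(document.lower().split())
    return sorted(subject for subject, vocab in ontology.items()
                  if len(vocab & words) >= min_hits)

labels = label_subjects(
    "We train a classifier model on a labeled training dataset")
```

A real system would tokenize and lemmatize with an NLP pipeline rather than splitting on whitespace, and would traverse ontology layers rather than flat term sets, but the labeling decision has the same shape.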


Author(s):  
Steven F. Lehrer ◽  
Tian Xie ◽  
Guanxi Yi

Abstract This chapter first illustrates the benefits of using machine learning for forecasting relative to traditional econometric strategies. We consider the short-term volatility of the Bitcoin market as measured by realized volatility observations. Our analysis highlights the importance of accounting for nonlinearities to explain the gains of machine learning algorithms and examines the robustness of our findings to the selection of hyperparameters. This illustrates how different machine learning estimators improve the development of forecast models by relaxing the functional form assumptions that are made explicit when writing up an econometric model. Our second contribution is to illustrate how deep learning can be used to measure market-level sentiment from a 10% random sample of Twitter users. This sentiment variable significantly improves forecast accuracy for every econometric estimator and machine learning algorithm considered in our forecasting application. This illustrates the benefits of new tools from the natural language processing literature for creating variables that can improve the accuracy of forecasting models.
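The forecast target here, realized volatility, is itself a simple data-driven construct: the square root of the sum of squared high-frequency log returns over the period. A minimal sketch of the standard estimator (the return values are illustrative, not from the chapter's data):

```python
import math

def realized_volatility(intraday_returns):
    """Realized volatility over a period: the square root of the sum
    of squared intraday log returns (the realized-variance estimator)."""
    return math.sqrt(sum(r * r for r in intraday_returns))

# Illustrative: two intraday log returns of 3% and 4%
rv = realized_volatility([0.03, 0.04])
```

Each day's realized volatility becomes one observation of the series that the econometric and machine learning models are then asked to forecast.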

