Data Driven Solution for Enhancing Workover Intervention Activities

2021
Author(s): Saniya Karnik, Supriya Gupta, Jason Baihly, David Saier

Abstract Recent advancements in the field of natural language processing (NLP) and machine learning have created the potential to ingest decades of field history and heterogeneous production records. This paper proposes an analytics workflow that leverages artificial intelligence to process thousands of historical workover reports (handwritten and electronic), extract important information, learn patterns in production activity, and train machines to quantify workover impact and derive best practices for field operations. Natural language processing libraries were developed to ingest and catalog gigabytes of field data, identify rich sources of workover information, and extract workover and cost information from unstructured reports. A clustering-based architecture was developed and trained to categorize documents based on the free text describing the activities found in reports. This machine learning model learned the pattern and context of repeating words and clustered documents with similar content together, enabling the user to find a category of documents (e.g., workover intervention reports) instantaneously. Statistical models were built to determine return on investment from workovers and rank them based on production improvement and payout time.

Today, 80% of an oilfield expert's time can be spent manually organizing data. When processing decades of historical oilfield production data spread across both structured (production timeseries) and unstructured records (e.g., workover reports), experts often face two major challenges: 1) how to rapidly analyze field data comprising thousands of historical records, and 2) how to use the rich historical information to generate effective insights and take the proper actions to optimize production.

In this paper, we analyzed multiple field datasets in a heterogeneous file environment with 20 different file formats (PDF, Excel, and others), 2,000+ files, and production history spanning 50+ years across 2,000+ producing wells. Libraries were developed to extract files from complex folder hierarchies, and machine learning architectures assisted in finding the workover reports among the myriad documents. Information from reports was extracted through Python libraries and optical character recognition technology to build a master data source with production history, workover, and cost information. The rich dataset was then used to analyze episodic workover activity by well and compute key performance indicators (KPIs) to identify well candidates for production enhancement. The building blocks included quantifying production upside and calculating return on investment for various workover classes.

O&G companies have vast volumes of unstructured data and use less than 1% of it to uncover meaningful insights about field operations. Our workflow describes a methodology to ingest both structured and unstructured documents, capture knowledge, quantify production upside, understand capital spending, and learn best practices in workover operations through an automated process. This process helps optimize forward operating expense (OPEX) plans with a focus on cost reduction and shortened turnaround time for decision making.
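The clustering step described above lends itself to a short illustration. The abstract does not specify the feature representation or algorithm, so the sketch below assumes TF-IDF features and k-means; the sample report texts and cluster count are invented for demonstration.

```python
# Minimal sketch of the document-clustering idea: TF-IDF features plus
# k-means. Feature choices, cluster count, and texts are illustrative,
# not the paper's actual configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "Pulled tubing, replaced rod pump, returned well to production",
    "Squeeze cement job to repair casing leak at 4,200 ft",
    "Monthly production summary: oil 1,250 bbl, water cut 38%",
]

# Repeated domain terms (pump, casing, production) dominate the TF-IDF
# representation, so reports with similar content land in the same cluster.
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(documents)

# On real data the cluster count would be tuned (e.g., silhouette score).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

for doc, label in zip(documents, labels):
    print(label, doc[:60])
```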

2021
Author(s): Saniya Karnik, Supriya Gupta, Jason Baihly

Abstract Because of recent advancements in the field of natural language processing (NLP) and machine learning, there is potential to ingest decades of field history and heterogeneous production records. This paper proposes an analytics workflow that leverages artificial intelligence to process thousands of historical workover reports (handwritten and electronic), extract important information, learn patterns in production activity, and train machines to quantify workover impact and derive best practices for field operations. Natural language processing libraries were developed to ingest and catalog gigabytes of field data, identify rich sources of workover information, and extract workover and cost information from unstructured reports. A machine learning (ML) model was developed and trained to predict well intervention categories based on the free text describing workovers found in reports. This ML model learned the pattern and context of repeating words pertaining to each workover type (e.g., Artificial Lift, Well Integrity) and classified reports accordingly. Statistical models were built to determine return on investment from workovers and rank them based on production improvement and payout time.

Today, 80% of an oilfield expert's time can be spent manually organizing data. When processing decades of historical oilfield production data spread across both structured (production timeseries) and unstructured records (e.g., workover reports), experts often face two major challenges: 1) how to rapidly analyze field data comprising thousands of historical records, and 2) how to use the rich historical information to generate effective insights to optimize production.

In this paper, we analyzed multiple field datasets in a heterogeneous file environment with 20 different file formats (PDF, Excel, and others), 2,000+ files, and production history spanning 50+ years across 2,000+ producing wells. Libraries were developed to extract workover files from complex folder hierarchies through an intelligent automated search. Information from reports was extracted through Python libraries and optical character recognition technology to build a master data source with production history, workover, and cost information. A neural network model was trained to predict the workover class for each report with >85% accuracy. The rich dataset was then used to analyze episodic workover activity by well and compute key performance indicators (KPIs) to identify well candidates for production enhancement. The building blocks included quantifying production upside and calculating return on investment for various workover classes.

O&G companies have vast volumes of unstructured data and use less than 1% of it to uncover meaningful insights about field operations. Our workflow describes a methodology to ingest both structured and unstructured documents, capture knowledge, quantify production upside, understand capital spending, and learn best practices in workover operations through an automated process. This process helps optimize forward operating expense (OPEX) plans with a focus on cost reduction and shortens turnaround time for decision making.
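As a rough illustration of the report-classification step: the abstract names only a neural network trained on free text, so the sketch below stands in a small multilayer perceptron over TF-IDF features, with class labels taken from the examples in the abstract. The training texts are invented.

```python
# Hedged sketch of workover-class prediction. The paper's actual network
# architecture is not given; an MLP over TF-IDF features stands in here.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

train_texts = [
    "replaced downhole rod pump and polished rod",
    "installed new ESP and variable speed drive",
    "cement squeeze to isolate casing leak",
    "pressure tested casing and set bridge plug",
]
train_labels = ["Artificial Lift", "Artificial Lift",
                "Well Integrity", "Well Integrity"]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
)
model.fit(train_texts, train_labels)

# Expected to land in "Artificial Lift" given the shared pump vocabulary.
print(model.predict(["pulled tubing and changed out the pump"]))
```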


2021, Vol 28 (1), pp. e100262
Author(s): Mustafa Khanbhai, Patrick Anyadi, Joshua Symons, Kelsey Flott, Ara Darzi, ...

Objectives: Unstructured free-text patient feedback contains rich information, and analysing these data manually would require personnel resources that are not available in most healthcare organisations. We undertook a systematic review of the literature on the use of natural language processing (NLP) and machine learning (ML) to process and analyse free-text patient experience data.

Methods: Databases were systematically searched to identify articles published between January 2000 and December 2019 examining NLP to analyse free-text patient feedback. Due to the heterogeneous nature of the studies, a narrative synthesis was deemed most appropriate. Data related to the study purpose, corpus, methodology, performance metrics and indicators of quality were recorded.

Results: Nineteen articles were included. The majority (80%) of studies applied language analysis techniques to patient feedback from social media sites (unsolicited), followed by structured surveys (solicited). Supervised learning was the most frequently used approach (n=9), followed by unsupervised (n=6) and semisupervised (n=3) learning. Comments extracted from social media were typically analysed using an unsupervised approach, whereas free-text comments held within structured surveys were analysed using a supervised approach. Reported performance metrics included precision, recall and F-measure, with support vector machine and Naïve Bayes being the best-performing ML classifiers.

Conclusion: NLP and ML have emerged as important tools for processing unstructured free text. Both supervised and unsupervised approaches have their role depending on the data source. With the advancement of data analysis tools, these techniques may help healthcare organisations generate insight from their volumes of unstructured free-text data.
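For readers unfamiliar with the supervised setup the review describes, here is a minimal sketch: support vector machine and Naïve Bayes classifiers over TF-IDF features, scored with precision, recall, and F-measure. The feedback comments and labels are invented, and the in-sample scoring is for illustration only.

```python
# Minimal sketch of the supervised approach: SVM and Naive Bayes over
# patient feedback text, scored with precision/recall/F-measure.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.metrics import precision_recall_fscore_support

comments = [
    "staff were kind and explained everything",
    "waited four hours with no update",
    "nurse was attentive and reassuring",
    "discharge process was confusing and slow",
]
labels = ["positive", "negative", "positive", "negative"]

for clf in (LinearSVC(), MultinomialNB()):
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(comments, labels)
    pred = model.predict(comments)  # in-sample, for illustration only
    p, r, f, _ = precision_recall_fscore_support(labels, pred, average="macro")
    print(type(clf).__name__, f"P={p:.2f} R={r:.2f} F={f:.2f}")
```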


JAMIA Open, 2019, Vol 2 (1), pp. 139-149
Author(s): Meijian Guan, Samuel Cho, Robin Petro, Wei Zhang, Boris Pasche, ...

Abstract Objectives: Natural language processing (NLP) and machine learning approaches were used to build classifiers to identify genomic-related treatment changes in the free-text visit progress notes of cancer patients. Methods: We obtained 5889 deidentified progress reports (2439 words on average) for 755 cancer patients who had undergone clinical next-generation sequencing (NGS) testing at Wake Forest Baptist Comprehensive Cancer Center for our data analyses. An NLP system was implemented to process the free-text data and extract NGS-related information. Three types of recurrent neural networks (RNNs), namely the gated recurrent unit, long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi), were applied to classify documents into the treatment-change and no-treatment-change groups. Further, we compared the performance of the RNNs to that of 5 machine learning algorithms: Naive Bayes, K-nearest Neighbor, Support Vector Machine for classification, Random Forest, and Logistic Regression. Results: Our results suggested that, overall, RNNs outperformed the traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score. In addition, pretrained word embeddings improved the accuracy of the LSTM by 3.4% and reduced the training time by more than 60%. Discussion and Conclusion: NLP and RNN-based text mining solutions have demonstrated advantages in information retrieval and document classification tasks for unstructured clinical progress notes.
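A minimal sketch of a bidirectional LSTM classifier of the kind described (LSTM_Bi), written in Keras. The vocabulary size, sequence length, embedding dimension, and hidden width are placeholders, and the pretrained-embedding initialization the study used is noted only as a comment.

```python
# Sketch of an LSTM_Bi-style document classifier; all sizes are placeholders.
from tensorflow.keras import layers, models

vocab_size, seq_len, embed_dim = 20000, 500, 100  # placeholders

model = models.Sequential([
    layers.Input(shape=(seq_len,)),
    # The study initialized this layer from pretrained word vectors;
    # here it is left randomly initialized for brevity.
    layers.Embedding(vocab_size, embed_dim),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(1, activation="sigmoid"),  # treatment-change vs. no change
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```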


2020, Vol 10 (4), pp. 286
Author(s): Tak Sung Heo, Yu Seop Kim, Jeong Myeong Choi, Yeong Seok Jeong, Soo Young Seo, ...

Brain magnetic resonance imaging (MRI) is useful for predicting the outcome of patients with acute ischemic stroke (AIS). Although deep learning (DL) using brain MRI with certain image biomarkers has shown satisfactory results in predicting poor outcomes, no study has assessed the usefulness of natural language processing (NLP)-based machine learning (ML) algorithms using the free-text brain MRI reports of AIS patients. Therefore, we aimed to assess whether NLP-based ML algorithms using brain MRI text reports could predict poor outcomes in AIS patients. This study included only English text reports of brain MRIs examined during admission of AIS patients. Poor outcome was defined as a modified Rankin Scale score of 3–6, and the data were captured by trained nurses and physicians. We included only the text report of the first MRI scan performed during the admission. The text dataset was randomly divided into training and test datasets at a 7:3 ratio. Text was vectorized at the word, sentence, and document levels. In the word-level approach, which did not consider the sequence of words, the "bag-of-words" model was used to reflect the number of repetitions of each text token. The "sent2vec" method was used in the sentence-level approach, which considers the sequence of words, and word embeddings were used in the document-level approach. In addition to conventional ML algorithms, DL algorithms such as the convolutional neural network (CNN), long short-term memory, and multilayer perceptron were used to predict poor outcomes, with 5-fold cross-validation and grid search techniques. The performance of each ML classifier was compared using the area under the receiver operating characteristic curve (AUROC). Among 1840 subjects with AIS, 645 patients (35.1%) had a poor outcome 3 months after stroke onset. Random forest was the best classifier (AUROC 0.782) using the word-level approach. Overall, the document-level approach exhibited better performance than the word- or sentence-level approaches. Among all the ML classifiers, the multi-CNN algorithm demonstrated the best classification performance (AUROC 0.805), followed by the CNN (0.799). When predicting future clinical outcomes using NLP-based ML of free-text radiology reports of brain MRI, DL algorithms showed performance superior to that of the other ML algorithms. In particular, the prediction of poor outcomes with document-level NLP was improved more by the multi-CNN and CNN than by recurrent neural network-based algorithms. NLP-based DL algorithms can serve as an important digital marker for DL prediction from unstructured electronic health record data.
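The "multi-CNN" the study found best is commonly realized as parallel convolutions with several kernel widths over word embeddings, max-pooled and concatenated; below is a hedged sketch under that assumption, with all sizes as placeholders.

```python
# Sketch of a multi-kernel text CNN ("multi-CNN"): parallel Conv1D branches
# with different kernel widths over word embeddings. Sizes are placeholders.
from tensorflow.keras import layers, models

vocab_size, seq_len, embed_dim = 10000, 300, 100  # placeholders

inp = layers.Input(shape=(seq_len,))
emb = layers.Embedding(vocab_size, embed_dim)(inp)

# One branch per kernel width, each capturing n-gram-like patterns
# of a different size.
branches = []
for k in (3, 4, 5):
    conv = layers.Conv1D(filters=64, kernel_size=k, activation="relu")(emb)
    branches.append(layers.GlobalMaxPooling1D()(conv))

out = layers.Dense(1, activation="sigmoid")(layers.Concatenate()(branches))
model = models.Model(inp, out)  # output: probability of poor outcome
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```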


Author(s): Filipe R Lucini, Karla D Krewulak, Kirsten M Fiest, Sean M Bagshaw, Danny J Zuege, ...

Abstract Objective To apply natural language processing (NLP) techniques to identify individual events and modes of communication between healthcare professionals and families of critically ill patients from electronic medical records (EMR). Materials and Methods Retrospective cohort study of 280 randomly selected adult patients admitted to 1 of 15 intensive care units (ICUs) in Alberta, Canada from June 19, 2012 to June 11, 2018. Individual events and modes of communication were independently abstracted using NLP and manual chart review (reference standard). Preprocessing techniques and 2 NLP approaches (rule-based and machine learning) were evaluated using sensitivity, specificity, and area under the receiver operating characteristic curves (AUROC). Results Over 2,700 combinations of NLP methods and hyperparameters were evaluated for each mode of communication using a holdout subset. The rule-based approach had the highest AUROC in 65 datasets compared to the machine learning approach in 21 datasets. Both approaches had similar performance in 17 datasets. The rule-based AUROC for the grouped categories of patient documented to have family or friends (0.972, 95% CI 0.934–1.000), visit by family/friend (0.882, 95% CI 0.820–0.943) and phone call with family/friend (0.975, 95% CI 0.952–0.998) were high. Discussion We report an automated method to quantify communication between healthcare professionals and family members of adult patients from free-text EMRs. A rule-based NLP approach had better overall operating characteristics than a machine learning approach. Conclusion NLP can automatically and accurately measure frequency and mode of documented family visitation and communication from unstructured free-text EMRs, to support patient- and family-centered care initiatives.
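A toy sketch of the rule-based side of this comparison: keyword and phrase patterns matched against a free-text note, one pattern per mode of communication. The study evaluated thousands of method and hyperparameter combinations; the three regexes below are invented for illustration only.

```python
# Illustrative rule-based event detection over a free-text EMR note.
# Patterns are invented examples, not the study's actual rules.
import re

RULES = {
    "family_documented": re.compile(
        r"\b(wife|husband|daughter|son|family|friend)\b", re.I),
    "visit": re.compile(
        r"\b(visit(ed|ing)?|at (the )?bedside)\b", re.I),
    "phone_call": re.compile(
        r"\b(phone(d)?|telephone(d)?|phone call|called)\b", re.I),
}

note = ("Spoke with patient's wife by phone; "
        "daughter visited at bedside this afternoon.")

# Each rule fires independently, yielding one binary label per event type.
for event, pattern in RULES.items():
    print(event, bool(pattern.search(note)))
```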


2019, Vol 5 (suppl), pp. 49-49
Author(s): Christi French, Dax Kurbegov, David R. Spigel, Maciek Makowski, Samantha Terker, ...

Background: Pulmonary nodule incidental findings challenge providers to balance resource efficiency and high clinical quality. Incidental findings tend to be under-evaluated, with studies reporting appropriate follow-up rates as low as 29%. The efficient identification of patients with high-risk nodules is foundational to ensuring appropriate follow-up and requires the clinical reading and classification of radiology reports. We tested the feasibility of automating this process with natural language processing (NLP) and machine learning (ML). Methods: In cooperation with Sarah Cannon, the Cancer Institute of HCA Healthcare, we conducted a series of experiments on 8,879 free-text, narrative CT radiology reports. A representative sample of health system emergency department, inpatient, and outpatient reports dated from Dec 2015 to April 2017 was divided into a development set for model training and validation, and a test set to evaluate model performance. A "Nodule Model" was trained to detect the reported presence of a pulmonary nodule, and a rules-based "Size Model" was developed to extract the size of the nodule in millimeters. Reports were bucketed into three prediction groups: ≥6 mm, <6 mm, and no size indicated. Nodules were placed in a queue for follow-up if the nodule was predicted ≥6 mm, or if the nodule had no size indicated and the report contained the word "mass." The Fleischner Society Guidelines and clinical review informed these definitions. Results: Precision and recall metrics were calculated for multiple model thresholds. A threshold was selected based on the validation set calculations, and a success criterion of 90% queue precision was chosen to minimize false positives. On the test dataset, the F1 measure of the entire pipeline was 72.9%, recall was 60.3%, and queue precision was 90.2%, exceeding the success criterion. Conclusions: The experiments demonstrate the feasibility of technology to automate the detection and classification of pulmonary nodule incidental findings in radiology reports. This approach promises to improve healthcare quality by increasing the rate of appropriate lung nodule incidental finding follow-up and treatment without excessive labor or risking overutilization.
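The rules-based "Size Model" and queueing logic translate naturally into a short sketch. The regex and bucketing below follow the thresholds stated above but are an illustrative reconstruction, not the production system's actual rules.

```python
# Sketch of the "Size Model" step: pull nodule sizes (in mm) from report
# text and bucket the report per the stated thresholds. Illustrative only.
import re

SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm)", re.I)

def size_bucket(report_text: str) -> str:
    # Normalize any cm measurements to mm before comparing to the threshold.
    sizes_mm = [
        float(value) * (10 if unit.lower() == "cm" else 1)
        for value, unit in SIZE_RE.findall(report_text)
    ]
    if not sizes_mm:
        # No size indicated: queue for follow-up only if "mass" appears.
        return "queue" if "mass" in report_text.lower() else "no size indicated"
    return "queue (>=6 mm)" if max(sizes_mm) >= 6 else "<6 mm"

print(size_bucket("8 mm nodule in the right upper lobe"))  # queue (>=6 mm)
print(size_bucket("subcentimeter nodule, approx 4 mm"))    # <6 mm
print(size_bucket("soft tissue mass, size not measured"))  # queue
```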


Author(s): Jose L. Izquierdo, Julio Ancochea, Joan B. Soriano

Abstract There remain many unknowns regarding the onset and clinical course of the ongoing COVID-19 pandemic. We used a combination of classic epidemiological methods, natural language processing (NLP), and machine learning (for predictive modeling) to analyse the electronic health records (EHRs) of patients with COVID-19. We explored the unstructured free text in the EHRs within the SESCAM Healthcare Network (Castilla La-Mancha, Spain) from the entire population with available EHRs (1,364,924 patients) from January 1st to March 29th, 2020. We extracted related clinical information upon diagnosis, progression, and outcome for all COVID-19 cases, focusing on those requiring ICU admission. A total of 10,504 patients with a clinical or PCR-confirmed diagnosis of COVID-19 were identified, 52.5% male, with a mean age of 58.2±19.7 years. Upon admission, the most common symptoms were cough, fever, and dyspnoea, each present in less than half of cases. Overall, 6% of hospitalized patients required ICU admission. Using a machine-learning, data-driven algorithm, we identified that a combination of age, fever, and tachypnoea was the most parsimonious predictor of ICU admission: those younger than 56 years, without tachypnoea, and with temperature <39°C (or >39°C without respiratory crackles) were free of ICU admission. Conversely, COVID-19 patients aged 40 to 79 years were likely to be admitted to the ICU if they had tachypnoea and delayed their visit to the ER after being seen in primary care. Our results show that a combination of easily obtainable clinical variables (age, fever, and tachypnoea with/without respiratory crackles) predicts which COVID-19 patients require ICU admission.
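The reported decision rule can be transcribed as a small function for clarity; the variable names and the boundary handling at exactly 39°C are assumptions, since the abstract gives the rule only in prose.

```python
# The abstract's low-risk rule, transcribed as a function. Boundary
# handling at exactly 39 degrees C is an assumption.
def icu_risk_flag(age: int, temp_c: float,
                  tachypnoea: bool, crackles: bool) -> bool:
    """Return True if the patient does NOT meet the stated low-risk criteria."""
    low_risk = (
        age < 56
        and not tachypnoea
        and (temp_c < 39 or (temp_c > 39 and not crackles))
    )
    return not low_risk

print(icu_risk_flag(48, 38.2, tachypnoea=False, crackles=False))  # False: low risk
print(icu_risk_flag(67, 39.4, tachypnoea=True, crackles=True))    # True: flagged
```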

