Fracturing Productivity Prediction Model and Optimization of the Operation Parameters of Shale Gas Well Based on Machine Learning

Lithosphere ◽  
2021 ◽  
Vol 2021 (Special 4) ◽  
Author(s):  
Chaodong Tan ◽  
Junzheng Yang ◽  
Mingyue Cui ◽  
Hua Wu ◽  
Chunqiu Wang ◽  
...  

Abstract Based on the massive static and dynamic data of 137 fractured wells in the WY shale gas block in Sichuan, China, this paper analyzes the factors influencing shale gas fracturing production and develops a production prediction model and a fracturing parameter optimization model. Taking the geological, engineering, fracturing operation, and production data of fractured wells in the WY block as the data set, the main control analysis method is used to identify the factors influencing shale gas fracturing production, which form the sample set. A production prediction model is established with each of six machine learning (ML) algorithms: random forest (RF), back propagation (BP) neural network, support vector regression (SVR), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and multivariable linear regression (LR). The evaluation results show that the XGBoost model performs best on this sample set. A method for selecting the set of candidate fracturing operation schemes for a shale gas well is studied; the production rate and the ratio of cost and profit (ROCP) are considered together to select the final fracturing operation scheme. The results show that the data-driven production prediction model and fracturing parameter optimization model can be used not only to predict shale gas fracturing production and optimize operation parameters but also to perform sensitivity analysis of fracturing parameters and compare the effects of fracturing operation schemes, giving them good field application value.

2019 ◽  
Vol 12 (1) ◽  
Author(s):  
Daichi Shigemizu ◽  
Shintaro Akiyama ◽  
Yuya Asanomi ◽  
Keith A. Boroevich ◽  
Alok Sharma ◽  
...  

Abstract Background Dementia with Lewy bodies (DLB) is the second most common subtype of neurodegenerative dementia in humans following Alzheimer’s disease (AD). Present clinical diagnosis of DLB has high specificity and low sensitivity and finding potential biomarkers of prodromal DLB is still challenging. MicroRNAs (miRNAs) have recently received a lot of attention as a source of novel biomarkers. Methods In this study, using serum miRNA expression of 478 Japanese individuals, we investigated potential miRNA biomarkers and constructed an optimal risk prediction model based on several machine learning methods: penalized regression, random forest, support vector machine, and gradient boosting decision tree. Results The final risk prediction model, constructed via a gradient boosting decision tree using 180 miRNAs and two clinical features, achieved an accuracy of 0.829 on an independent test set. We further predicted candidate target genes from the miRNAs. Gene set enrichment analysis of the miRNA target genes revealed 6 functional genes included in the DHA signaling pathway associated with DLB pathology. Two of them were further supported by gene-based association studies using a large number of single nucleotide polymorphism markers (BCL2L1: P = 0.012, PIK3R2: P = 0.021). Conclusions Our proposed prediction model provides an effective tool for DLB classification. Also, a gene-based association test of rare variants revealed that BCL2L1 and PIK3R2 were statistically significantly associated with DLB.


2020 ◽  
Vol 4 (Supplement_1) ◽  
Author(s):  
Akihiro Nomura ◽  
Sho Yamamoto ◽  
Yuta Hayakawa ◽  
Kouki Taniguchi ◽  
Takuya Higashitani ◽  
...  

Abstract Diabetes mellitus (DM) is a chronic disorder characterized by impaired glucose metabolism. It is linked to increased risks of several diseases such as atrial fibrillation, cancer, and cardiovascular diseases. Therefore, DM prevention is essential. However, the traditional regression-based DM-onset prediction methods are incapable of investigating future DM for generally healthy individuals without DM. Employing gradient-boosting decision trees, we developed a machine learning-based prediction model to identify DM signatures prior to the onset of DM. We employed the nationwide annual specific health checkup records, collected during the years 2008 to 2018, from Kanazawa city, Ishikawa, Japan. The data included physical examinations, blood and urine tests, and participant questionnaires. Individuals without DM at baseline who underwent more than two annual health checkups during this period were included. New cases of DM onset were recorded when participants were diagnosed with DM in the annual checkups. The dataset was divided into three subsets in a 6:2:2 ratio to constitute the training, tuning (internal validation), and testing datasets. Using the testing dataset, we evaluated the trained prediction model by the area under the curve (AUC), precision, recall, F1 score, and overall accuracy. A two-sided 95% confidence interval (CI) was computed for every performance measure using a 1,000-iteration bootstrap method. We included 509,153 annual health checkup records of 139,225 participants. Among them, 65,505 participants without DM were included, which constituted 36,303 participants in the training dataset and 13,101 participants in each of the tuning and testing datasets. We identified a total of 4,696 new DM-onset patients (7.2%) in the study period. Our trained model predicted the future incidence of DM with an AUC, precision, recall, F1 score, and overall accuracy of 0.71 (95% CI 0.69-0.72), 75.3% (71.6-78.8), 42.2% (39.3-45.2), 54.1% (51.2-56.7), and 94.9% (94.5-95.2), respectively. In conclusion, the machine learning-based prediction model satisfactorily identified DM onset prior to the actual incidence.
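The 1,000-iteration bootstrap mentioned above can be illustrated with a minimal sketch. It assumes a percentile bootstrap over the test set, which the abstract does not specify; the function names (`bootstrap_ci`, `accuracy`) and the toy labels are hypothetical and are not the study's code or data.

```python
import random

def bootstrap_ci(y_true, y_pred, metric, n_iter=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, y_pred).

    Assumption: resample test records with replacement and take the
    empirical alpha/2 and 1 - alpha/2 quantiles of the resampled metric.
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]  # one bootstrap resample
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_iter)]
    upper = stats[min(n_iter - 1, int((1 - alpha / 2) * n_iter))]
    return lower, upper

def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels for illustration only (not data from the study).
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 88 + [1] * 2 + [1] * 6 + [0] * 4

point = accuracy(y_true, y_pred)
low, high = bootstrap_ci(y_true, y_pred, accuracy)
print(f"accuracy = {point:.3f} (95% CI {low:.3f}-{high:.3f})")
```

The same `bootstrap_ci` helper accepts any metric with the signature `metric(y_true, y_pred)`, so precision, recall, or F1 could be passed in the same way.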


Author(s):  
Sooyoung Yoo ◽  
Jinwook Choi ◽  
Borim Ryu ◽  
Seok Kim

Abstract Background Unplanned hospital readmission after discharge reflects low satisfaction and reliability in care and the possibility of potential medical accidents, and is thus indicative of the quality of patient care and the appropriateness of discharge plans. Objectives The purpose of this study was to develop and validate prediction models for all-cause unplanned hospital readmissions within 30 days of discharge, based on a common data model (CDM), which can be applied to multiple institutions for efficient readmission management. Methods Retrospective patient-level prediction models were developed based on clinical data of two tertiary general university hospitals converted into a CDM developed by the Observational Medical Outcomes Partnership. Machine learning classification models based on the LASSO logistic regression model, decision tree, AdaBoost, random forest, and gradient boosting machine (GBM) were developed and tested by manipulating a set of CDM variables. An internal 10-fold cross-validation was performed on the target data of the model. To examine its transportability, the model was externally validated. Model performance was evaluated by the area under the curve (AUC). Results Based on the time interval for outcome prediction, it was confirmed that the prediction model targeting the variables obtained within 30 days of discharge was the most efficient (AUC of 82.75). The external validation showed that the model is transferable, with the combination of various clinical covariates. Above all, the prediction model based on the GBM showed the highest AUC performance of 84.14 ± 0.015 for the Seoul National University Hospital cohort, and 78.33 in external validation. Conclusions This study showed that readmission prediction models developed using machine-learning techniques and CDM can be a useful tool to compare two hospitals in terms of patient-data features.


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0253988
Author(s):  
Akihiro Shimoda ◽  
Yue Li ◽  
Hana Hayashi ◽  
Naoki Kondo

Due to difficulty in early diagnosis of Alzheimer’s disease (AD) related to cost and differentiated capability, it is necessary to identify low-cost, accessible, and reliable tools for identifying AD risk in the preclinical stage. We hypothesized that cognitive ability, as expressed in the vocal features in daily conversation, is associated with AD progression. Thus, we have developed a novel machine learning prediction model to identify AD risk by using the rich voice data collected from daily conversations, and evaluated its predictive performance in comparison with a classification method based on the Japanese version of the Telephone Interview for Cognitive Status (TICS-J). We used 1,465 audio data files from 99 healthy controls (HC) and 151 audio data files recorded from 24 AD patients derived from a dementia prevention program conducted by Hachioji City, Tokyo, between March and May 2020. After extracting vocal features from each audio file, we developed machine-learning models based on extreme gradient boosting (XGBoost), random forest (RF), and logistic regression (LR), using each audio file as one observation. We evaluated the predictive performance of the developed models by plotting the receiver operating characteristic (ROC) curve and calculating the area under the curve (AUC), sensitivity, and specificity. Further, we conducted classifications by considering each participant as one observation, averaging the predicted values of their audio files, and making comparisons with the predictive performance of the TICS-J based questionnaire. Of 1,616 audio files in total, 1,308 (81.0%) were randomly allocated to the training data and 308 (19.1%) to the validation data. For audio file-based prediction, the AUCs for XGBoost, RF, and LR were 0.863 (95% confidence interval [CI]: 0.794–0.931), 0.882 (95% CI: 0.840–0.924), and 0.893 (95% CI: 0.832–0.954), respectively. For participant-based prediction, the AUCs for XGBoost, RF, LR, and TICS-J were 1.000 (95% CI: 1.000–1.000), 1.000 (95% CI: 1.000–1.000), 0.972 (95% CI: 0.918–1.000) and 0.917 (95% CI: 0.918–1.000), respectively. The difference in predictive accuracy between XGBoost and TICS-J approached significance (p = 0.065). Our novel prediction model using the vocal features of daily conversations demonstrated the potential to be useful for AD risk assessment.
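The participant-based step described above (treating each participant as one observation by averaging the predictions of their audio files) might look like the following sketch. The participant identifiers, probabilities, and the 0.5 cut-off are made up for illustration; the abstract does not state the threshold used.

```python
from collections import defaultdict

def participant_scores(file_preds):
    """Average per-file predicted probabilities within each participant.

    file_preds: iterable of (participant_id, predicted_probability) pairs,
    one pair per audio file.
    """
    grouped = defaultdict(list)
    for participant_id, prob in file_preds:
        grouped[participant_id].append(prob)
    return {pid: sum(probs) / len(probs) for pid, probs in grouped.items()}

# Hypothetical per-file predictions (not data from the study).
file_preds = [
    ("p01", 0.10), ("p01", 0.30),
    ("p02", 0.80), ("p02", 0.90), ("p02", 0.70),
]

scores = participant_scores(file_preds)
# Assumed decision threshold of 0.5, chosen only for this example.
labels = {pid: int(score >= 0.5) for pid, score in scores.items()}
print(scores, labels)
```

Averaging pools evidence from several recordings of the same person, which is one plausible reason participant-level AUCs can exceed file-level AUCs.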


2021 ◽  
Author(s):  
Freddy J. Marquez

Abstract Machine learning is a subfield of artificial intelligence that automatically and quickly applies mathematical calculations to data in order to build models used to make predictions. Technical papers on applications of machine learning algorithms have been published at an increasing rate in many oil and gas disciplines over the last five years, changing the way engineers approach their work and sharing innovative solutions that contribute to increased efficiency. In this paper, machine learning models are built to predict inverse rate of penetration (ROPI) and surface torque for a well located in the shallow waters of the Gulf of Mexico. Three types of analysis were performed. In the pre-drill analysis, the parameters were predicted without any data from the target well in the database. In the drilling analysis, the model was run every sixty meters, updating the database with information from the target well and predicting the parameters ahead of the bit. In the sensitivity parameter optimization analysis, weight on bit and rotary speed values were iterated as model inputs in order to identify the optimum combination to deliver the best drilling performance under the given conditions. The Extreme Gradient Boosting (XGBoost) library in the Python programming language was used to build the models. Model performance was satisfactory, overcoming the challenge of using drilling parameters input manually by drilling bit engineers. The database was built with data from different fields and wells. Two databases were created to build the models; one of them excluded logging while drilling (LWD) data in order to determine its importance to the predictions. Pre-drill surface torque prediction showed better performance than ROPI. Prediction performance ahead of the bit was good for both torque and ROPI. Sensitivity parameter optimization showed better resolution with the database that includes LWD data.
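The sensitivity parameter optimization described above, iterating weight on bit and rotary speed as model inputs, can be sketched as a plain grid sweep. Everything here is an assumption made for illustration: `predicted_ropi` is a placeholder surrogate standing in for a trained model (it is not the paper's XGBoost model), the grid values are arbitrary, and "best performance" is taken to mean the lowest predicted ROPI.

```python
from itertools import product

def predicted_ropi(wob, rpm):
    """Placeholder surrogate, NOT a trained model.

    A smooth bowl whose minimum sits at wob=12, rpm=140, chosen only so
    that the sweep below has a known answer.
    """
    return 1.0 + 0.01 * (wob - 12.0) ** 2 + 0.0001 * (rpm - 140.0) ** 2

def best_setting(model, wob_values, rpm_values):
    """Return the (wob, rpm) pair with the lowest predicted ROPI."""
    return min(product(wob_values, rpm_values), key=lambda pair: model(*pair))

# Hypothetical candidate values for the two controllable inputs.
wob_grid = [6, 8, 10, 12, 14, 16]
rpm_grid = [80, 100, 120, 140, 160]

best_wob, best_rpm = best_setting(predicted_ropi, wob_grid, rpm_grid)
print(best_wob, best_rpm)
```

In practice the `model` argument would wrap a fitted regressor's prediction call with the remaining inputs held at the current drilling conditions.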


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e16801-e16801
Author(s):  
Daniel R Cherry ◽  
Qinyu Chen ◽  
James Don Murphy

e16801 Background: Pancreatic cancer has an insidious presentation with four-in-five patients presenting with disease not amenable to potentially curative surgery. Efforts to screen patients for pancreatic cancer using population-wide strategies have proven ineffective. We applied a machine learning approach to create an early prediction model drawing on the content of patients’ electronic health records (EHRs). Methods: We used patient data from OptumLabs which included de-identified data extracted from patient EHRs collected between 2009 and 2017. We identified patients diagnosed with pancreatic cancer at age 40 or later, which we categorized into early-stage pancreatic cancer (ESPC; n = 3,322) and late-stage pancreatic cancer (LSPC; n = 25,908) groups. ESPC cases were matched to non-pancreatic cancer controls in a ratio of 1:16 based on diagnosis year and geographic division, and the cohort was divided into training (70%) and test (30%) sets. The prediction model was built using an eXtreme Gradient Boosting machine learning algorithm of ESPC patients’ EHRs in the year preceding diagnosis, with features including patient demographics, procedure and clinical diagnosis codes, clinical notes and medications. Model discrimination was assessed with sensitivity, specificity, positive predictive value (PPV) and area under the curve (AUC) with a score of 1.0 indicating perfect prediction. Results: The final AUC in the test set was 0.841, and the model included 583 features, of which 248 (42.5%) were physician note elements, 146 (25.0%) were procedure codes, 91 (15.6%) were diagnosis codes, 89 (15.3%) were medications and 9 (1.54%) were demographic features. The most important features were history of pancreatic disorders (not diabetes or cancer), age, income, biliary tract disease, education level, obstructive jaundice and abdominal pain. We evaluated model performance at varying classification thresholds. 
When applied to patients over 40, choosing a threshold with a sensitivity of 20% produced a specificity of 99.9% and a PPV of 2.5%. The model PPV increased with age; for patients over 80, PPV was 8.0%. LSPC patients identified by the model would have been detected a median of 4 months before their actual diagnosis, with a quarter of these patients identified at least 14 months earlier. Conclusions: Using EHR data to identify early-stage pancreatic cancer patients shows promise. While widespread use of this approach on an unselected population would produce high rates of false positives, this technique could be employed among high risk patients, or paired with other screening tools.
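The low PPV at high specificity follows from Bayes' rule when the condition is rare. The sketch below uses the reported sensitivity (20%) and specificity (99.9%); the prevalence values are hypothetical inputs chosen to show the trend and are not figures from the abstract.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule.

    PPV = TP / (TP + FP), with rates expressed per unit of population.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical prevalence values; PPV rises as the condition becomes
# more common in the screened group.
for prevalence in (0.0001, 0.001, 0.01):
    value = ppv(0.20, 0.999, prevalence)
    print(f"prevalence={prevalence:.4f}  PPV={value:.3f}")
```

This is consistent with the abstract's observation that PPV increases in older patients and that the approach suits higher-risk groups.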


2020 ◽  
Author(s):  
Toru Shirakawa ◽  
Tomohiro Sonoo ◽  
Kentaro Ogura ◽  
Ryo Fujimori ◽  
Konan Hara ◽  
...  

BACKGROUND Although multiple prediction models have been developed to predict hospital admission to emergency departments (EDs) to address overcrowding and patient safety, only a few studies have examined prediction models for prehospital use. Development of institution-specific prediction models is feasible in this age of data science, provided that predictor-related information is readily collectable. OBJECTIVE We aimed to develop a hospital admission prediction model based on patient information that is commonly available during ambulance transport before hospitalization. METHODS Patients transported by ambulance to our ED from April 2018 through March 2019 were enrolled. Candidate predictors were age, sex, chief complaint, vital signs, and patient medical history, all of which were recorded by emergency medical teams during ambulance transport. Patients were divided into two cohorts for derivation (3601/5145, 70.0%) and validation (1544/5145, 30.0%). For statistical models, logistic regression, logistic lasso, random forest, and gradient boosting machine were used. Prediction models were developed in the derivation cohort. Model performance was assessed by area under the receiver operating characteristic curve (AUROC) and association measures in the validation cohort. RESULTS Of 5145 patients transported by ambulance, including deaths in the ED and hospital transfers, 2699 (52.5%) required hospital admission. Prediction performance was higher with the addition of predictive factors, attaining the best performance with an AUROC of 0.818 (95% CI 0.792-0.839) with a machine learning model and predictive factors of age, sex, chief complaint, and vital signs. Sensitivity and specificity of this model were 0.744 (95% CI 0.716-0.773) and 0.745 (95% CI 0.709-0.776), respectively. CONCLUSIONS For patients transferred to EDs, we developed a well-performing hospital admission prediction model based on routinely collected prehospital information including chief complaints.
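The evaluation measures named above (AUROC, sensitivity, specificity) can be computed from predicted scores as in this minimal sketch. It uses the pairwise-ranking definition of AUROC and an assumed 0.5 threshold; the labels and scores are toy values, not study data.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(y_true, scores, threshold):
    """Sensitivity and specificity when score >= threshold means 'admit'."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Toy validation labels (1 = admitted) and model scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]

auc_value = auroc(y_true, scores)
sensitivity, specificity = sens_spec(y_true, scores, threshold=0.5)
print(auc_value, sensitivity, specificity)
```

The pairwise form is quadratic in the number of records, which is fine for a sketch; a rank-based implementation would be preferable for a cohort of several thousand patients.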

