Supervised Learning Predictive Models for Automated Fracturing Treatment Design: A Workflow Based on Algorithm Comparison and Multiphysics Model Validation

Abstract: Diagnostic pumping techniques are used routinely in proppant fracturing design. The pumping process can be time-consuming; however, it yields technical confidence in treatment and productivity optimization. Recent developments in data analytics and machine learning can help shorten operational workflows and improve project economics. Supervised learning was applied to an existing database to streamline the process and affect the design framework. The database was constructed from injection/falloff outputs across heterogeneous reservoir plays. Five classification algorithms were used in this study: support vector machine, decision tree, random forest, multinomial, and XGBoost. The number of classes was sensitized to balance model accuracy against prediction granularity, and fifteen cases were developed for a comprehensive comparison. A complete machine learning framework was constructed to work through each case set, with hyperparameter tuning to maximize accuracy. After the model was finalized, an extensive field validation workflow was deployed. The target outputs selected for the model were crosslinked fluid efficiency, total proppant mass, and maximum proppant concentration. An unsupervised clustering technique using the t-SNE algorithm was tried first but lacked accuracy; supervised classification models gave better predictions. Cross-validation techniques showed an increasing trend in prediction accuracy. Feature selection was done using one-variable-at-a-time (OVAT) analysis and a simple feature correlation study. Because the number of features and the dataset size were small, no features were eliminated from the final model building. Accuracy and F1 score, both calculated from the confusion matrix, were used for evaluation. XGBoost showed excellent results, with an accuracy of 74 to 95% for the output parameters. Fluid efficiency was categorized into three classes and yielded an accuracy of 96%. Proppant concentration and proppant mass predictions showed 77% and 86% accuracy, respectively, for the six-class case. The combination of high accuracy and fine granularity confirmed the potential application of machine learning models. The ratio of training to testing (holdout) data across all cases ranged from 80:20 to 70:30. The model was validated through an inverse problem: predicting and matching the fracture geometry and treatment pressures from the machine learning model design against the actual net pressure match. The simulations were run with advanced multiphysics models. The design approach showed improvement in four areas: a 30% reduction in polymer consumption, a 25% reduction in flowback time, a 30% reduction in water usage, and a 60 to 65% gain in operational efficiency.
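As a rough illustration of the evaluation step described in the abstract, the sketch below derives accuracy and an F1 score from a multi-class confusion matrix. The abstract does not say how F1 was averaged across classes, so the macro (unweighted) average used here is an assumption, and the three-class matrix is made-up data rather than a result from the study.

```python
from typing import List


def accuracy(cm: List[List[int]]) -> float:
    """Share of samples on the diagonal (rows = true class, columns = predicted)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of the per-class F1 scores (averaging scheme is an assumption)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Hypothetical three-class confusion matrix, for illustration only.
cm = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 1, 9],
]

print(f"accuracy = {accuracy(cm):.3f}")
print(f"macro F1 = {macro_f1(cm):.3f}")
```

Reporting both numbers is useful when classes are unbalanced, because accuracy alone can hide poor performance on a small class.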

Application of Machine Learning in Animal Disease Analysis and Prediction

Current Bioinformatics, 2020, Vol 15. DOI: 10.2174/1574893615999200728195613

Author(s): Shuwen Zhang, Qiang Su, Qin Chen

Keyword(s): Machine Learning, Unsupervised Learning, Supervised Learning, Clustering Algorithm, Principal Component, Support Vector, Animal Disease, Human Beings, Animal Diseases, Disease Analysis

Abstract: Major animal diseases pose a great threat to animal husbandry and to human beings. With deepening globalization and the abundance of data resources, the use of big data to predict and analyze animal diseases is becoming increasingly important. Machine learning focuses on enabling computers to learn from data and to use what they have learned for analysis and prediction. This paper first introduces the animal epidemic situation and machine learning, then briefly reviews the application of machine learning to animal disease analysis and prediction. Machine learning is mainly divided into supervised and unsupervised learning. The supervised methods covered include support vector machines, naive Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning, and AdaBoost. The unsupervised methods covered include the expectation-maximization algorithm, principal component analysis, hierarchical clustering, and maxent. The discussion aims to give readers a clearer concept of machine learning and of its application prospects in animal diseases.

Predictive Modelling of Employee Turnover in Indian IT Industry Using Machine Learning Techniques

Vision: The Journal of Business Perspective, 2019, Vol 23 (1), pp. 12-21. DOI: 10.1177/0972262918821221

Cited By ~ 2

Author(s): Shikha N. Khera, Divya

Keyword(s): Machine Learning, Learning Algorithm, Confusion Matrix, Predictive Modelling, Supervised Machine Learning, Machine Learning Techniques, Support Vector, IT Industry, Knowledge Based, Employee Attrition

The information technology (IT) industry in India has faced a systemic problem of high attrition in the past few years, resulting in monetary and knowledge-based losses to companies. The aim of this research is to develop a model that predicts employee attrition and gives organizations the opportunity to address issues and improve retention. A predictive model was developed using a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the human resource databases of three IT companies in India, including each employee's employment status (the response variable) at the time of collection. The confusion matrix for the SVM model showed an accuracy of 85 per cent. The results also show that the model is better at predicting who will leave the firm than at predicting who will stay.

Machine Learning for Design Optimization of Electromagnetic Devices: Recent Developments and Future Directions

Applied Sciences, 2021, Vol 11 (4), pp. 1627. DOI: 10.3390/app11041627

Author(s): Yanbin Li, Gang Lei, Gerd Bramerdorfer, Sheng Peng, Xiaodong Sun, ...

Keyword(s): Machine Learning, Design Optimization, Optimization Methods, Machine Learning Algorithms, Cloud Services, Robust Design Optimization, Support Vector, Future Directions, Electromagnetic Devices, Recent Developments

This paper reviews recent developments in design optimization methods for electromagnetic devices, with a focus on machine learning methods. First, recent advances in multi-objective, multidisciplinary, multilevel, topology, fuzzy, and robust design optimization of electromagnetic devices are surveyed. Second, the paper reviews performance prediction and design optimization of electromagnetic devices based on machine learning algorithms, including the artificial neural network, support vector machine, extreme learning machine, random forest, and deep learning. Last, to meet modern requirements for high manufacturing/production quality and lifetime reliability, several promising topics, including the application of cloud services and digital twins, are discussed as future directions for design optimization of electromagnetic devices.

Fault detection for air conditioning system using machine learning

IAES International Journal of Artificial Intelligence (IJ-AI), 2020, Vol 9 (1), pp. 109. DOI: 10.11591/ijai.v9.i1.pp109-116

Author(s): Noor Asyikin Sulaiman, Md Pauzi Abdullah, Hayati Abdullah, Muhammad Noorazlan Shah Zainudin, Azdiana Md Yusop

Keyword(s): Machine Learning, Supervised Learning, Air Conditioning, Machine Learning Algorithms, Coefficient Of Performance, Support Vector, Air Conditioning System, Learning Classifier, Negative Impacts, The Impact

An air conditioning system is a complex system and consumes the most energy in a building. Any fault in system operation, such as a faulty cooling tower fan, a compressor failure, or a stuck damper, can lead to energy wastage and a reduction in the system's coefficient of performance (COP). Because of the complexity of the air conditioning system, detecting those faults is hard and requires exhaustive inspections. This paper has two parts: i) to investigate the impact of different air conditioning faults on the COP, and ii) to analyse the performance of machine learning algorithms in classifying those faults. Three supervised learning classifier models were developed: deep learning, support vector machine (SVM), and multi-layer perceptron (MLP). The performance of each classifier was investigated on six different classes of faults. Results showed that different faults have different negative impacts on the COP. In addition, all three supervised learning classifiers were able to classify the faults with more than 94% accuracy, and MLP produced the highest accuracy and precision of the three.

Fault-Guided Seismic Stratigraphy Interpretation via Semi-Supervised Learning

2021. DOI: 10.2118/207218-ms

Author(s): Haibin Di, Chakib Kada Kloucha, Cen Li, Aria Abubakar, Zhun Li, ...

Keyword(s): Machine Learning, Supervised Learning, Model Building, Structural Information, Mapping Function, Seismic Stratigraphy, Training Data, Entire Study, Depositional Process, Convolutional Autoencoder

Abstract: Delineating seismic stratigraphic features and depositional facies is important for successful reservoir mapping and identification in the subsurface. Robust seismic stratigraphy interpretation faces two major challenges. The first is to automate the process as far as possible, particularly given the increasing size of seismic data and the complexity of target stratigraphies; the second is to efficiently incorporate available structures into stratigraphy model building. Machine learning, particularly the convolutional neural network (CNN), has been introduced to assist seismic stratigraphy interpretation through supervised learning. However, the small amount of available expert labels greatly restricts the performance of such a supervised CNN. Moreover, most existing CNN implementations are based only on amplitude and so fail to use necessary structural information, such as faults, to constrain the machine learning. To address both challenges, this paper presents a semi-supervised learning workflow for fault-guided seismic stratigraphy interpretation, which consists of two components. The first component is seismic feature engineering (SFE), which learns from the provided seismic and fault data through an unsupervised convolutional autoencoder (CAE). The second is stratigraphy model building (SMB), which builds an optimal mapping function between the features extracted by the SFE CAE and the target stratigraphic labels provided by an experienced interpreter through a supervised CNN. The two components are connected by embedding the encoder of the SFE CAE into the SMB CNN, which forces SMB learning to rely on features that exist throughout the entire study area rather than only in the limited training data; correspondingly, the risk of overfitting is greatly reduced. In addition, the fault constraint is introduced by customizing the SMB CNN with two output branches, one to match the target stratigraphies and the other to reconstruct the input fault, so that the fault continues to contribute to SMB learning. The performance of this fault-guided seismic stratigraphy interpretation is validated by an application to a real seismic dataset, in which the machine prediction not only matches the manual interpretation accurately but also clearly illustrates the depositional process in the study area.

Supervised Learning Based Classification of Cardiovascular Diseases

Proceedings of Engineering and Technology Innovation, 2021, Vol 20, pp. 24-34. DOI: 10.46604/peti.2021.7217

Author(s): Arif Hussain, Hassaan Malik, Muhammad Umar Chaudhry

Keyword(s): Machine Learning, Model Building, Matthews Correlation Coefficient, Early Stage, Support Vector, Logistics Regression, Efficiency And Effectiveness, Sensitivity Specificity, Better Than

Detecting cardiovascular disease (CVD) at an early stage is a difficult and crucial process. The objective of this study is to test the capability of machine learning (ML) methods to accurately diagnose CVD outcomes. The efficiency and effectiveness of four well-known ML classifiers, namely support vector machine (SVM), logistic regression (LR), naive Bayes (NB), and decision tree (J48), are measured in terms of precision, sensitivity, specificity, accuracy, Matthews correlation coefficient (MCC), correctly and incorrectly classified instances, and model building time. These ML classifiers are applied to a publicly available CVD dataset. According to the measured results, J48 performs better than the competing classifiers, providing significant assistance to cardiologists.
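Several of the metrics named in this abstract follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard definitions of sensitivity, specificity, precision, accuracy, and MCC; the function name `binary_metrics` and the counts passed to it are hypothetical and are not taken from the study.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    # MCC denominator: product of the four marginal totals, square-rooted.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,  # true positive rate
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,  # true negative rate
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name:12s} {value:.3f}")
```

MCC uses all four cells, so it stays informative when one class is much larger than the other, which is one reason it is often reported alongside accuracy.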

Comparison of the Accuracy and Processing Time of the K-NN and SVM Algorithms in Twitter Sentiment Analysis

Jurnal Informatika, 2019, Vol 6 (2), pp. 226-235. DOI: 10.31311/ji.v6i2.5129

Author(s): Muhammad Rangga Aziz Nasution, Mardhiya Hayaty

Keyword(s): Machine Learning, Support Vector Machine, Unsupervised Learning, Supervised Learning, Cross Validation, Nearest Neighbor, Support Vector, K Nearest Neighbor, Fold Cross Validation

Machine learning, a branch of computer science, has become a trend in recent years. Machine learning uses data and algorithms to build models from the patterns in a dataset. It also studies how the resulting model can predict outputs based on those patterns. Two types of machine learning methods can be used for sentiment analysis: supervised learning and unsupervised learning. This study compares two supervised classification algorithms, K-Nearest Neighbor and Support Vector Machine, by building a model with each algorithm on sentiment text. The comparison determines which algorithm is better in terms of accuracy and processing time. The accuracy results show that the Support Vector Machine method is superior, with 89.70% without K-Fold Cross Validation and 88.76% with K-Fold Cross Validation. For processing time, the K-Nearest Neighbor method is superior, with 0.0160 s without K-Fold Cross Validation and 0.1505 s with K-Fold Cross Validation.
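To make the K-Fold Cross Validation setup concrete, here is a minimal pure-Python sketch of a k-nearest-neighbor classifier evaluated with k-fold cross-validation. It is an illustration under stated assumptions, not the study's pipeline: the paper classifies tweet text, whereas this toy example uses made-up two-dimensional numeric features, an interleaved fold assignment, and the hypothetical helper names `knn_predict` and `k_fold_accuracy`.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_predict(train_x: Sequence[Point], train_y: Sequence[str],
                x: Point, k: int = 3) -> str:
    """Majority label among the k training points closest to x (squared Euclidean)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def k_fold_accuracy(xs: Sequence[Point], ys: Sequence[str],
                    n_folds: int = 5, k: int = 3) -> float:
    """Pooled accuracy over all held-out samples across the folds."""
    n = len(xs)
    correct = 0
    for fold in range(n_folds):
        # Interleaved folds: sample i is held out in fold i % n_folds.
        test_idx = set(range(fold, n, n_folds))
        train_x: List[Point] = [xs[i] for i in range(n) if i not in test_idx]
        train_y: List[str] = [ys[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            if knn_predict(train_x, train_y, xs[i], k) == ys[i]:
                correct += 1
    return correct / n


# Made-up, well-separated toy data standing in for vectorized text features.
xs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2), (0.2, 0.2),
      (2.0, 2.1), (2.2, 1.9), (1.9, 2.2), (2.1, 2.0), (2.3, 2.2)]
ys = ["neg"] * 5 + ["pos"] * 5

print(k_fold_accuracy(xs, ys, n_folds=5, k=3))
```

Every sample is held out exactly once, so the score reflects predictions on data the model did not see in that fold, at the cost of fitting the model once per fold, which is consistent with the longer processing times reported with cross-validation.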

Prediction of Liver Diseases by Using Few Machine Learning Based Approaches

Australian Journal of Engineering and Innovative Technology, 2020, pp. 85-90. DOI: 10.34104/ajeit.020.085090

Keyword(s): Machine Learning, Comparative Analysis, Liver Diseases, Model Building, Medical Science, Machine Learning Techniques, Support Vector, Classification Algorithms, K Nearest Neighbors, Learning Techniques

Advancement in medical science has always been one of the most vital concerns of the human race. As technology progresses, modern techniques and equipment are increasingly applied to treatment. Machine learning techniques are now widely used in medical science to improve accuracy. In this work, we constructed computational models for accurate liver disease prediction. We used several efficient classification algorithms to predict liver diseases: Random Forest, Perceptron, Decision Tree, K-Nearest Neighbors (KNN), and Support Vector Machine (SVM). Our work provides an implementation of hybrid model construction and a comparative analysis for improving prediction performance. First, the classification algorithms were applied to the original liver patient datasets collected from the UCI repository. We then analyzed and adjusted the features to improve the performance of our predictor and compared the classifiers. We found that the KNN algorithm with feature selection outperformed all other techniques.

A Smart Terrain Identification Technique Based on Electromyography, Ground Reaction Force, and Machine Learning for Lower Limb Rehabilitation

Applied Sciences, 2020, Vol 10 (8), pp. 2638. DOI: 10.3390/app10082638

Cited By ~ 2

Author(s): Shuo Gao, Yixuan Wang, Chaoming Fang, Lijun Xu

Keyword(s): Machine Learning, Ground Reaction Force, Lower Limb, Confusion Matrix, Reaction Force, Simple System, Support Vector, Terrain Classification, Limb Rehabilitation, Lower Limb Rehabilitation

Automatic terrain classification in lower limb rehabilitation systems has gained worldwide attention. In this field, a simple system architecture and high classification accuracy are two desired attributes. This article reports a smart terrain classification technique based on neuromuscular–mechanical fusion and machine learning, which uses only two electromyography (EMG) sensors and two ground reaction force (GRF) sensors to classify three terrains (downhill, level, and uphill). The EMG and GRF signals from ten healthy subjects were collected, preprocessed, and segmented to obtain the EMG and GRF profiles for each stride, from which twenty-one statistical features (9 GRF features and 12 EMG features) were extracted. A support vector machine (SVM) model was built and trained separately on the extracted EMG features, the GRF features, and the fusion of the two. Several methods and statistical metrics were used to evaluate the proposed technique: a paired t-test and a Kruskal–Wallis test for correlation analysis of the selected features, and ten-fold cross-validation accuracy, the confusion matrix, sensitivity, and specificity for the performance of the SVM model. The results show that the extracted features are highly correlated with terrain changes and that the fusion of the EMG and GRF features produces the highest accuracy, 96.8%. The presented technique allows a simple system construction to achieve precise detection of outcomes, potentially advancing the development of terrain classification techniques for rehabilitation.
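The per-stride feature extraction and feature-level fusion described above can be sketched as follows. The abstract does not list the twenty-one features, so the four summary statistics per signal used here (mean, standard deviation, maximum, minimum) are placeholders, and the sample values and the names `stride_features` and `fuse` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence


def stride_features(signal: Sequence[float], prefix: str) -> Dict[str, float]:
    """Placeholder summary statistics for one stride of one sensor signal."""
    return {
        f"{prefix}_mean": mean(signal),
        f"{prefix}_std": pstdev(signal),  # population standard deviation
        f"{prefix}_max": max(signal),
        f"{prefix}_min": min(signal),
    }


def fuse(emg: Sequence[float], grf: Sequence[float]) -> Dict[str, float]:
    """Feature-level fusion: concatenate the EMG and GRF feature sets."""
    fused: Dict[str, float] = {}
    fused.update(stride_features(emg, "emg"))
    fused.update(stride_features(grf, "grf"))
    return fused


# Made-up samples for a single stride, for illustration only.
emg_stride = [0.1, 0.4, 0.9, 0.3]
grf_stride = [0.0, 500.0, 700.0, 100.0]

features = fuse(emg_stride, grf_stride)
for name, value in features.items():
    print(f"{name:10s} {value:.3f}")
```

The fused dictionary corresponds to one row of the classifier's input table; training on the EMG-only, GRF-only, and fused variants separately is what allows the three configurations to be compared.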

A machine learning based prediction model of anti-PD-1 therapy response using noninvasive clinical information and blood markers of lung cancer patients.

Journal of Clinical Oncology, 2019, Vol 37 (15_suppl), pp. e14138-e14138. DOI: 10.1200/jco.2019.37.15_suppl.e14138

Author(s): Beung-Chul AHN, Kyoung Ho Pyo, Dongmin Jung, Chun-Feng Xin, Chang Gon Kim, ...

Keyword(s): Machine Learning, Flow Cytometry, Supervised Learning, Clinical Data, Ridge Regression, Predictive Score, Support Vector, Data Set, Test Set, Flow Cytometry Data

e14138 Background: Immune checkpoint inhibitors have become a breakthrough therapy for various types of cancer. However, given their overall response rate of around 20% in clinical trials, accurate prediction of the aPD-1 response for an individual patient has not been established. PD-L1 expression or tumor-infiltrating lymphocytes may be used as indicators of response, but they are limited. We developed models using machine learning methods to predict the aPD-1 response. Methods: A total of 126 advanced NSCLC patients treated with aPD-1 were enrolled. Their clinical characteristics, treatment outcomes, and adverse events were collected. The clinical data (n = 126), consisting of 15 variables, were divided into two subsets: a discovery set (n = 63) and a test set (n = 63). Thirteen supervised learning algorithms, including support vector machine and regularized regression (lasso, ridge, elastic net), were applied to the discovery set for model development and to the test set for validation. Each model was evaluated using the ROC curve and cross-validation. The same methods were applied to the subset that had additional flow cytometry data (n = 40). Results: The median age was 64 and 69.8% were male. Adenocarcinoma was predominant (69.8%) and twenty patients (15.1%) were driver mutation positive. On the clinical data set (n = 126), ridge regression (AUC: 0.79) was the best model for prediction. Of the 15 clinical variables, tumor burden, age, ECOG PS, and PD-L1 were the most important according to the random forest algorithm. When the clinical and flow cytometry data were merged, the ridge regression model (AUC: 0.82) performed better than with clinical data only. Among the 52 variables of the merged set, the most important immune markers were CD3+CD8+CD25+/Teff-CD28, CD3+CD8+CD25-/Teff-Ki-67, and CD3+CD8+CD25+/Teff-NY-ESO/Teff-PD-1, which indicate an activated tumor-specific T cell subset. Conclusions: Our machine learning based model is of benefit in predicting aPD-1 responses. After further validation in an independent patient cohort, a non-invasive predictive score based on supervised learning can be established to predict aPD-1 response.
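The AUC values quoted above summarize a ROC curve as a single number. A minimal sketch of one standard way to compute it is shown below: the AUC equals the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder, with ties counted as one half. The labels and scores are made-up values rather than patient data, and the function name `roc_auc` is an assumption.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = responder) and model scores, for illustration only.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.8, 0.1]

print(f"AUC = {roc_auc(labels, scores):.3f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of this size; a rank-based formulation gives the same value more efficiently on large datasets.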
