Effective prediction of lost circulation from multiple drilling variables: a class imbalance problem for machine and deep learning algorithms

Abstract Multiple machine learning (ML) and deep learning (DL) models are evaluated and their prediction performance compared in classifying five wellbore fluid-loss classes from a 20-well drilling dataset (Azadegan oil field, Iran). The dataset includes 65,376 data records with seventeen drilling variables. Its fluid-loss classes are heavily imbalanced (> 95% of data records belong to the less significant loss classes 1 and 2; only 0.05% of the data records belong to the complete-loss class 5). Class imbalance and the lack of high correlations between the drilling variables and fluid-loss classes pose challenges for ML/DL models. Tree-based and data-matching ML algorithms outperform DL and regression-based ML algorithms in predicting the fluid-loss classes. Random forest (RF), after training and testing, makes only 35 prediction errors across all data records. Precision, recall and F1-scores, together with expanded confusion matrices, show that the RF model provides the best predictions for fluid-loss classes 1 to 3, but that AdaBoost (ADA) outperforms RF for class 4 and decision tree (DT) outperforms RF for class 5. This suggests that an ensemble of the fast-to-execute RF, ADA and DT models may be the most practical way to achieve reliable wellbore fluid-loss predictions. DL models underperform several of the ML models evaluated and are particularly poor at predicting the least represented classes 4 and 5. The DL models also require much longer execution times than the ML models, making them less attractive for field operations that need prompt information to support rapid real-time decisions on pending class-4 and class-5 fluid-loss events.
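
The per-class scores named in the abstract can be derived directly from a confusion matrix. The following is a minimal Python sketch, not the authors' code; the function names and the three-class counts are hypothetical and chosen only to show how a rare class can score poorly while overall accuracy stays high.

```python
def accuracy(confusion):
    """Fraction of records on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total if total else 0.0


def per_class_metrics(confusion):
    """Return precision, recall and F1 for each class.

    confusion[i][j] is the number of records whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # missed records of class k
        fp = sum(confusion[i][k] for i in range(n)) - tp  # wrongly assigned to class k
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


# Hypothetical, made-up counts (not taken from the paper): class 2 is rare.
example = [
    [945, 5, 0],
    [5, 40, 0],
    [3, 1, 1],
]
scores = per_class_metrics(example)
print(round(accuracy(example), 3))    # 0.986
print(round(scores[2]["recall"], 3))  # 0.2
```

In this made-up case the overall accuracy is 98.6% although only one of the five rare-class records is recovered, which is why per-class scores are reported alongside the error count.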

Data Sampling Methods to Deal With the Big Data Multi-Class Imbalance Problem

Applied Sciences ◽

10.3390/app10041276 ◽

2020 ◽

Vol 10 (4) ◽

pp. 1276 ◽

Cited By ~ 4

Author(s):

Eréndira Rendón ◽

Roberto Alejo ◽

Carlos Castorena ◽

Frank J. Isidro-Ortega ◽

Everardo E. Granda-Gutiérrez

Keyword(s):

Neural Network ◽

Big Data ◽

Deep Learning ◽

Sampling Methods ◽

Hybrid Approach ◽

Class Imbalance ◽

Data Sampling ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Output Noise

The class imbalance problem has been a hot topic in the machine learning community in recent years, and it remains relevant in the era of big data and deep learning. Much work has been done to deal with the class imbalance problem, with random sampling methods (over- and under-sampling) being the most widely employed approaches. More sophisticated sampling methods have also been developed, including the Synthetic Minority Over-sampling Technique (SMOTE), and these have been combined with cleaning techniques such as Editing Nearest Neighbor or Tomek’s Links (SMOTE+ENN and SMOTE+TL, respectively). In the big data context, the class imbalance problem has mostly been addressed by adapting traditional techniques, with intelligent approaches relatively ignored. This work therefore analyzes the capabilities and possibilities of heuristic sampling methods for deep learning neural networks in the big data domain, with particular attention to the cleaning strategies. The study uses big data, multi-class imbalanced datasets obtained from hyper-spectral remote sensing images. The effectiveness of a hybrid approach on these datasets is analyzed: the dataset is first preprocessed with SMOTE and an Artificial Neural Network (ANN) is trained on those data; the neural network output is then processed with ENN to eliminate output noise; finally, the ANN is trained again with the resulting dataset. The results suggest that the best classification outcome is achieved when the cleaning strategies are applied to the ANN output rather than only to the input feature space. Consequently, the classifier’s nature clearly needs to be considered when classical class imbalance approaches are adapted to deep learning and big data scenarios.
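
The two building blocks named above, SMOTE-style oversampling and Edited Nearest Neighbor cleaning, can be sketched in a few lines. This is a simplified illustration in plain Python under stated assumptions, not the authors' pipeline: the function names are hypothetical, distances are Euclidean, and the cleaning step is applied to whatever points are passed in (the abstract applies it to the ANN output rather than to the input features).

```python
import math
import random
from collections import Counter


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic


def edited_nearest_neighbours(points, labels, k=3):
    """Return indices of samples whose label agrees with the most common
    label among their k nearest neighbours; the rest are treated as noise."""
    keep = []
    for i, (p, y) in enumerate(zip(points, labels)):
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))[:k]
        votes = Counter(labels[j] for j in order)
        if votes.most_common(1)[0][0] == y:
            keep.append(i)
    return keep


# Hypothetical toy data: one point labelled 1 sits inside the class-0 cluster.
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
       (10, 10), (10, 11), (11, 10), (11, 11)]
lbl = [0, 0, 0, 0, 1, 1, 1, 1, 1]
kept = edited_nearest_neighbours(pts, lbl)   # index 4 is dropped
new_pts = smote_like([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)], 10)
```

A production workflow would more likely rely on an established library than on hand-written loops; the sketch is only meant to make the interpolation and the neighbour vote concrete.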

A Framework for Pedestrian Attribute Recognition Using Deep Learning

Applied Sciences ◽

10.3390/app12020622 ◽

2022 ◽

Vol 12 (2) ◽

pp. 622

Author(s):

Saadman Sakib ◽

Kaushik Deb ◽

Pranab Kumar Dhar ◽

Oh-Jin Kwon

Keyword(s):

Deep Learning ◽

Transfer Learning ◽

Class Imbalance ◽

Recognition Task ◽

Fine Tuning ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Technological Advances ◽

Learning Techniques ◽

Attribute Recognition

Pedestrian attribute recognition is becoming increasingly popular because of its significant role in surveillance scenarios. With recent technological advances, deep learning has come to the forefront of computer vision. Previous works applied deep learning in different ways to recognize pedestrian attributes. The results are satisfactory, but there is still scope for improvement. Transfer learning is becoming more popular because it performs well while reducing computation cost and mitigating data scarcity. This paper proposes a framework that can work in surveillance scenarios to recognize pedestrian attributes. The Mask R-CNN object detector extracts the pedestrians. Additionally, we applied transfer learning techniques on different CNN architectures, i.e., Inception ResNet v2, Xception, ResNet 101 v2, ResNet 152 v2. The main contribution of this paper is fine-tuning the ResNet 152 v2 architecture with different layer-freezing configurations (last 4, 8, 12, 14, 20, none, and all). Moreover, a data balancing technique, namely oversampling, is applied to resolve the class imbalance problem of the dataset, and the usefulness of this technique is analyzed. Our proposed framework outperforms state-of-the-art methods, achieving 93.41% mA and 89.24% mA on the RAP v2 and PARSE100K datasets, respectively.
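
The abstract does not say which oversampling variant was used. As a rough illustration only, random oversampling for a single-label dataset could look like the Python sketch below; the function name is hypothetical, and the single-label simplification is an assumption (pedestrian attributes are usually multi-label, which needs a more careful scheme).

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of the smaller classes until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: five samples of class "a", one of class "b".
xs, ys = random_oversample(["a1", "a2", "a3", "a4", "a5", "b1"],
                           ["a", "a", "a", "a", "a", "b"])
balanced_counts = Counter(ys)   # both classes now have 5 samples
```

Because the added samples are exact duplicates, oversampling should be applied to the training split only, so that copies of a sample do not leak into validation or test data.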

Addressing the Big Data Multi-class Imbalance Problem with Oversampling and Deep Learning Neural Networks

Pattern Recognition and Image Analysis - Lecture Notes in Computer Science ◽

10.1007/978-3-030-31332-6_19 ◽

2019 ◽

pp. 216-224 ◽

Cited By ~ 1

Author(s):

V. M. González-Barcenas ◽

E. Rendón ◽

R. Alejo ◽

E. E. Granda-Gutiérrez ◽

R. M. Valdovinos

Keyword(s):

Neural Networks ◽

Big Data ◽

Deep Learning ◽

Class Imbalance ◽

Class Imbalance Problem ◽

Imbalance Problem

The application of deep learning algorithms to classify subsurface drilling lost circulation severity in large oil field datasets

SN Applied Sciences ◽

10.1007/s42452-021-04769-0 ◽

2021 ◽

Vol 3 (9) ◽

Author(s):

Sajjad Mardanirad ◽

David A. Wood ◽

Hassan Zakeri

Keyword(s):

Neural Network ◽

Deep Learning ◽

Learning Algorithms ◽

Carbonate Reservoir ◽

Oil Field ◽

Superior Performance ◽

Large Field ◽

Fluid Loss ◽

List Type ◽

Lost Circulation

Abstract In this paper, we present how precisely deep learning algorithms can distinguish lost circulation severities in oil drilling operations. Lost circulation is one of the costliest downhole problems encountered during oil and gas well construction. Applying artificial intelligence can help drilling teams to be forewarned of pending lost circulation events and thereby mitigate their consequences. Data-driven methods are traditionally employed to quantify fluid-loss complexity but are not able to achieve reliable predictions for field cases with large quantities of data. This paper investigates the performance of the deep learning (DL) approach in classifying the types of fluid loss from a very large field dataset. Three DL classification models are evaluated: Convolutional Neural Network (CNN), Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM). Five fluid-loss classes are considered: No Loss, Seepage, Partial, Severe, and Complete Loss. Twenty wells drilled into the giant Azadegan oil field (Iran) provide 65,376 data records that are used to predict the fluid-loss classes. The results obtained, based on multiple statistical performance measures, identify the CNN model as achieving superior performance (98% accuracy) compared to the LSTM and GRU models (94% accuracy). Confusion matrices provide further insight into the prediction accuracies achieved. The three DL models evaluated were all able to classify different types of lost circulation events with reasonable prediction accuracy. Future work is required to evaluate the performance of the proposed DL approach with additional large datasets. The proposed method helps drilling teams deal with lost circulation events efficiently. Article Highlights Three deep learning models classify fluid loss severity in an oil field carbonate reservoir. Deep learning algorithms advance machine learning on a large resource dataset with 65,376 data records. Convolutional neural network outperformed other deep learning methods.

Classifying COVID-19 variants based on genetic sequences using deep learning models

10.1101/2021.06.29.450335 ◽

2021 ◽

Author(s):

Sayantani Basu ◽

Roy H. Campbell

Keyword(s):

Deep Learning ◽

Short Term Memory ◽

Sequence Data ◽

Virus Disease ◽

Class Imbalance ◽

Fixed Number ◽

Class Imbalance Problem ◽

Imbalance Problem ◽

Batch Sizes ◽

The One

The COrona VIrus Disease (COVID-19) pandemic has led to the emergence of several variants over time, which has increased the importance of understanding sequence data related to COVID-19. In this chapter, we propose an alignment-free k-mer based LSTM (Long Short-Term Memory) deep learning model that can classify 20 different variants of COVID-19. We handle the class imbalance problem by sampling a fixed number of sequences for each class label. We handle the vanishing gradient problem in LSTMs arising from long sequences by dividing the sequence into fixed lengths and obtaining results on individual runs. Our results show that one-vs-all classifiers have test accuracies as high as 92.5% with tuned hyperparameters compared to the multi-class classifier model. Our experiments show higher overall accuracies for B.1.1.214, B.1.177.21, B.1.1.7, B.1.526, and P.1 on the one-vs-all classifiers, suggesting the presence of distinct mutations in these variants. Our results show that embedding vector size and batch size yield insignificant improvements in accuracy, but changing from 2-mers to 3-mers mostly improves accuracy. We also studied individual runs, which show that most accuracies improved after the 20th run, indicating that these sequence positions may contribute more to distinguishing among different COVID-19 variants.
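
The preprocessing steps described above (k-mer tokenization, splitting long sequences into fixed lengths, and sampling a fixed number of sequences per class) can be sketched as follows. This is a minimal Python illustration, not the authors' code: the function names are hypothetical, overlapping k-mers with stride 1 are an assumption, and classes with fewer sequences than requested are simply kept whole.

```python
import random
from collections import defaultdict


def kmers(sequence, k=3):
    """Split a sequence into overlapping k-mers (stride 1, an assumption)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def fixed_length_chunks(tokens, length):
    """Divide a token list into consecutive chunks of at most `length`."""
    return [tokens[i:i + length] for i in range(0, len(tokens), length)]


def sample_fixed_per_class(records, n_per_class, seed=0):
    """Draw up to n_per_class (sequence, label) records for each label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for sequence, label in records:
        by_label[label].append(sequence)
    sampled = []
    for label, sequences in by_label.items():
        chosen = rng.sample(sequences, min(n_per_class, len(sequences)))
        sampled.extend((s, label) for s in chosen)
    return sampled


tokens = kmers("ATGCG", k=3)             # ['ATG', 'TGC', 'GCG']
runs = fixed_length_chunks(tokens, 2)    # [['ATG', 'TGC'], ['GCG']]

# Hypothetical labels; the sequences are placeholders, not real genomes.
records = [("ATGC", "X"), ("ATGG", "X"), ("ATGA", "X"), ("TTGC", "Y")]
balanced = sample_fixed_per_class(records, n_per_class=2)
```

Each chunk would then be fed to the model as one "run", which matches the abstract's description of obtaining results on individual runs.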

Detection of Myocardial Infarction Using ECG and Multi-Scale Feature Concatenate

Sensors ◽

10.3390/s21051906 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1906

Author(s):

Jia-Zheng Jian ◽

Tzong-Rong Ger ◽

Han-Hua Lai ◽

Chi-Ming Ku ◽

Chiung-An Chen ◽

...

Keyword(s):

Myocardial Infarction ◽

Network Structure ◽

Class Imbalance ◽

Class Imbalance Problem ◽

Multi Scale ◽

Imbalance Problem ◽

Average Accuracy ◽

Significant Difference ◽

Electrocardiogram Ecg

Diverse computer-aided diagnosis systems based on convolutional neural networks have been applied to automate the detection of myocardial infarction (MI) in electrocardiograms (ECG) for early diagnosis and prevention. However, issues such as overfitting and underfitting have not been taken into account; in other words, it is unclear whether the network structure is too simple or too complex. To address this, the proposed models were developed by starting with the simplest structure: a multi-lead features-concatenate narrow network (N-Net) in which only two convolutional layers were included in each lead branch. Additionally, multi-scale features-concatenate networks (MSN-Net) were implemented, in which larger features were extracted by pooling the signals. The best structure was obtained by tuning both the number of filters in the convolutional layers and the number of input signal scales. As a result, the N-Net reached an accuracy of 95.76% in the MI detection task, whereas the MSN-Net reached an accuracy of 61.82% in the MI locating task. Both networks give a higher average accuracy and a significant difference of p < 0.001, evaluated by the U test, compared with the state-of-the-art. The models are also smaller in size and are thus suitable for wearable devices for offline monitoring. In conclusion, testing across simple and complex network structures is indispensable. However, the way of dealing with the class imbalance problem and the quality of the extracted features are yet to be discussed.

A Novel Focal Phi Loss for Power Line Segmentation with Auxiliary Classifier U-Net

Sensors ◽

10.3390/s21082803 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2803

Author(s):

Rabeea Jaffari ◽

Manzoor Ahmed Hashmani ◽

Constantino Carlos Reyes-Aldasoro

Keyword(s):

Loss Function ◽

Class Imbalance ◽

Power Line ◽

Aerial Images ◽

Class Imbalance Problem ◽

Trade Off ◽

Urban Scenes ◽

Imbalance Problem ◽

A Minor ◽

Evaluation Parameters

The segmentation of power lines (PLs) from aerial images is a crucial task for the safe navigation of unmanned aerial vehicles (UAVs) operating at low altitudes. Despite the advances in deep learning-based approaches for PL segmentation, these models are still vulnerable to the class imbalance present in the data. The PLs occupy only a minimal portion (1–5%) of the aerial images as compared to the background region (95–99%). Generally, this class imbalance problem is addressed via the use of PL-specific detectors in conjunction with the popular class balanced cross entropy (BBCE) loss function. However, these PL-specific detectors do not work outside their application areas, and a BBCE loss requires hyperparameter tuning for class-wise weights, which is not trivial. Moreover, the BBCE loss results in low dice scores and precision values and thus fails to achieve an optimal trade-off between dice scores, model accuracy, and precision–recall values. In this work, we propose a generalized focal loss function based on the Matthews correlation coefficient (MCC), or Phi coefficient, to address the class imbalance problem in PL segmentation while utilizing a generic deep segmentation architecture. We evaluate our loss function by improving the vanilla U-Net model with an additional convolutional auxiliary classifier head (ACU-Net) for better learning and faster model convergence. The evaluation on two PL datasets, namely the Mendeley Power Line Dataset and the Power Line Dataset of Urban Scenes (PLDU), where PLs occupy around 1% and 2% of the aerial image area, respectively, reveals that our proposed loss function outperforms the popular BBCE loss by 16% in PL dice scores on both datasets, by 19% in precision and false detection rate (FDR) values for the Mendeley PL dataset, and by 15% in precision and FDR values for the PLDU, with a minor degradation in the accuracy and recall values. Moreover, our proposed ACU-Net outperforms the baseline vanilla U-Net on the characteristic evaluation parameters by 1–10% for both PL datasets. Thus, our proposed loss function with ACU-Net achieves an optimal trade-off for the characteristic evaluation parameters without any bells and whistles. Our code is available on GitHub.
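
To make the role of the Matthews correlation coefficient concrete, the Python sketch below computes accuracy, dice score and MCC from binary confusion counts. It illustrates only the standard metrics, not the paper's focal Phi loss, whose exact form is not given in the abstract; the function names and pixel counts are hypothetical.

```python
import math


def accuracy(tp, fp, fn, tn):
    """Share of all pixels that are classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def dice(tp, fp, fn):
    """Dice score of the foreground (power line) class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient (Phi coefficient); defined here
    as 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical image of 10,000 pixels in which 1% belong to power lines.
# Case 1: the model predicts background everywhere.
all_background = dict(tp=0, fp=0, fn=100, tn=9900)
# Case 2: the model finds 80 of the 100 line pixels with 20 false alarms.
decent = dict(tp=80, fp=20, fn=20, tn=9880)

print(accuracy(**all_background), mcc(**all_background))  # 0.99 0.0
print(round(dice(80, 20, 20), 2), round(mcc(**decent), 3))  # 0.8 0.798
```

The first case shows why accuracy alone is misleading at a 1% foreground share: a model that never detects a line still scores 99%, while MCC and dice both drop to zero.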

Threshold Moving Approaches for Addressing the Class Imbalance Problem and their Application to Multi-label Classification

2020 4th International Conference on Advances in Image Processing ◽

10.1145/3441250.3441274 ◽

2020 ◽

Author(s):

Xingfu Zhang ◽

Hyukjun Gweon ◽

Serge Provost

Keyword(s):

Class Imbalance ◽

Class Imbalance Problem ◽

Imbalance Problem

A systematic study of the class imbalance problem: Automatically identifying empty camera trap images using convolutional neural networks

Ecological Informatics ◽

10.1016/j.ecoinf.2021.101350 ◽

2021 ◽

pp. 101350

Author(s):

Deng-Qi Yang ◽

Tao Li ◽

Meng-Tao Liu ◽

Xiao-Wei Li ◽

Ben-Hui Chen

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Systematic Study ◽

Class Imbalance ◽

Camera Trap ◽

Class Imbalance Problem ◽

Imbalance Problem

Hybridization of ring theory-based evolutionary algorithm and particle swarm optimization to solve class imbalance problem

Complex & Intelligent Systems ◽

10.1007/s40747-021-00314-z ◽

2021 ◽

Author(s):

Sayan Surya Shaw ◽

Shameem Ahmed ◽

Samir Malakar ◽

Laura Garcia-Hernandez ◽

Ajith Abraham ◽

...

Keyword(s):

Particle Swarm Optimization ◽

Real Life ◽

Class Imbalance ◽

Ring Theory ◽

Class Imbalance Problem ◽

Minority Class ◽

Swarm Optimization ◽

Imbalance Problem ◽

Representative Samples ◽

Selection Of

Abstract Many real-life datasets are imbalanced in nature, which means that the number of samples in one class (the minority class) is far smaller than the number of samples in the other class (the majority class). Hence, if these datasets are fitted directly to a standard classifier for training, the classifier often overlooks the minority class samples while estimating the class-separating hyperplane(s) and, as a result, misclassifies the minority class samples. Over the years, many researchers have followed different approaches to solve this problem. However, the selection of truly representative samples from the majority class is still considered an open research problem. A better solution to this problem would be helpful in many applications such as fraud detection, disease prediction and text classification. Recent studies also show that it is necessary to analyze not only the disproportion between classes but also other difficulties rooted in the nature of the data, which calls for a more flexible, self-adaptable, computationally efficient and real-time method for selecting majority class samples without losing much important data. Keeping this in mind, we have proposed a hybrid model combining Particle Swarm Optimization (PSO), a popular swarm intelligence-based meta-heuristic algorithm, and the Ring Theory (RT)-based Evolutionary Algorithm (RTEA), a recently proposed physics-based meta-heuristic algorithm. We have named the algorithm RT-based PSO, or RTPSO for short. RTPSO can select the most representative samples from the majority class because it takes advantage of the efficient exploration and exploitation phases of its parent algorithms to strengthen the search process. We have used an AdaBoost classifier to observe the final classification results of our model. The effectiveness of our proposed method has been evaluated on 15 standard real-life datasets with low to extreme imbalance ratios. The performance of RTPSO has been compared with PSO, RTEA and other standard undersampling methods. The obtained results demonstrate the superiority of RTPSO over the state-of-the-art class imbalance problem-solvers considered here for comparison. The source code of this work is available at https://github.com/Sayansurya/RTPSO_Class_imbalance.
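
For orientation, the Python sketch below shows the imbalance ratio and plain random undersampling, the kind of standard baseline the abstract compares against. It is not RTPSO and does not reproduce the linked repository; the function names are hypothetical, and the imbalance ratio is taken here as majority count divided by minority count, which is a common but assumed definition.

```python
import random
from collections import Counter, defaultdict


def imbalance_ratio(labels):
    """Largest class size divided by smallest class size (assumed definition)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(samples, labels, seed=0):
    """Randomly keep as many samples of each class as the smallest class has."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = min(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        for sample in rng.sample(items, target):
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: 90 majority samples and 10 minority samples.
data = list(range(100))
labels = [0] * 90 + [1] * 10
ratio_before = imbalance_ratio(labels)            # 9.0
kept_x, kept_y = random_undersample(data, labels)
ratio_after = imbalance_ratio(kept_y)             # 1.0
```

Random selection discards 80 majority samples here without regard to how informative they are, which is the loss of information that a guided selection of representative majority samples aims to reduce.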
