A Comparison of Machine Learning Algorithms in Predicting Lithofacies: Case Studies from Norway and Kazakhstan

Defining distinctive areas of the physical properties of rocks plays an important role in reservoir evaluation and hydrocarbon production as core data are challenging to obtain from all wells. In this work, we study the evaluation of lithofacies values using the machine learning algorithms in the determination of classification from various well log data of Kazakhstan and Norway. We also use the wavelet-transformed data in machine learning algorithms to identify geological properties from the well log data. Numerical results are presented for the multiple oil and gas reservoir data which contain more than 90 released wells from Norway and 10 wells from the Kazakhstan field. We have compared the the machine learning algorithms including KNN, Decision Tree, Random Forest, XGBoost, and LightGBM. The evaluation of the model score is conducted by using metrics such as accuracy, Hamming loss, and penalty matrix. In addition, the influence of the dataset features on the prediction is investigated using the machine learning algorithms. The result of research shows that the Random Forest model has the best score among considered algorithms. In addition, the results are consistent with outcome of the SHapley Additive exPlanations (SHAP) framework.

Download Full-text

Direct prediction of petrophysical and petroelastic reservoir properties from seismic and well-log data using nonlinear machine learning algorithms

The Leading Edge ◽

10.1190/tle38120949.1 ◽

2019 ◽

Vol 38 (12) ◽

pp. 949-958 ◽

Cited By ~ 1

Author(s):

I. I. Priezzhev ◽

P. C. H. Veeken ◽

S. V. Egorov ◽

U. Strecker

Keyword(s):

Machine Learning ◽

Bulk Density ◽

Learning Algorithms ◽

Seismic Inversion ◽

Machine Learning Algorithms ◽

Well Log ◽

Convolution Operators ◽

Reservoir Properties ◽

Log Data ◽

Linear Convolution

An analytical comparison of seismic inversion with several multivariate predictive techniques is made. Statistical data reduction techniques are examined that incorporate various machine learning algorithms, such as linear regression, alternating conditional expectation regression, random forest, and neural network. Seismic and well-log data are combined to estimate petrophysical or petroelastic properties, like bulk density. Currently, spatial distribution and estimation of reservoir properties is leveraged by inverting 3D seismic data calibrated to elastic properties (VP, VS, and bulk density) obtained from well-log data. Most commercial seismic inversions are based on linear convolution, i.e., one-dimensional models that involve a simplified plane-parallel medium. However, in cases that are geophysically more complex, such as fractured and/or fluid-rich layers, the conventional straightforward prediction relationship breaks down. This is because linear convolution operators no longer adequately describe seismic wavefield propagation due to nonlinear energy absorption. Such nonlinearity is also suggested by the seismic nonstationarity phenomenon, expressed by vertical and horizontal changes in the shape of the seismic wavelet (amplitude and frequency variations). The nonlinear predictive operator, extracted by machine learning algorithms, makes it possible in certain cases to estimate petrophysical reservoir properties more accurately and with less influence of interpretational bias.

Download Full-text

Application of machine learning algorithms to predict permeability in tight sandstone formations

Nafta-Gaz ◽

10.18668/ng.2021.05.01 ◽

2021 ◽

Vol 77 (5) ◽

pp. 283-292

Author(s):

Tomasz Topór ◽

Keyword(s):

Machine Learning ◽

Oil And Gas ◽

Learning Algorithms ◽

Confining Pressure ◽

Machine Learning Algorithms ◽

Core Material ◽

Confining Stress ◽

Tight Sandstone ◽

Oil And Gas Exploration ◽

Better Than

The application of machine learning algorithms in petroleum geology has opened a new chapter in oil and gas exploration. Machine learning algorithms have been successfully used to predict crucial petrophysical properties when characterizing reservoirs. This study utilizes the concept of machine learning to predict permeability under confining stress conditions for samples from tight sandstone formations. The models were constructed using two machine learning algorithms of varying complexity (multiple linear regression [MLR] and random forests [RF]) and trained on a dataset that combined basic well information, basic petrophysical data, and rock type from a visual inspection of the core material. The RF algorithm underwent feature engineering to increase the number of predictors in the models. In order to check the training models’ robustness, 10-fold cross-validation was performed. The MLR and RF applications demonstrated that both algorithms can accurately predict permeability under constant confining pressure (R2 0.800 vs. 0.834). The RF accuracy was about 3% better than that of the MLR and about 6% better than the linear reference regression (LR) that utilized only porosity. Porosity was the most influential feature of the models’ performance. In the case of RF, the depth was also significant in the permeability predictions, which could be evidence of hidden interactions between the variables of porosity and depth. The local interpretation revealed the common features among outliers. Both the training and testing sets had moderate-low porosity (3–10%) and a lack of fractures. In the test set, calcite or quartz cementation also led to poor permeability predictions. The workflow that utilizes the tidymodels concept will be further applied in more complex examples to predict spatial petrophysical features from seismic attributes using various machine learning algorithms.

Download Full-text

Feature Selection and Comparison of Machine Learning Algorithms in Classification of Grazing and Rumination Behaviour in Sheep

Sensors ◽

10.3390/s18103532 ◽

2018 ◽

Vol 18 (10) ◽

pp. 3532 ◽

Cited By ~ 16

Author(s):

Nicola Mansbridge ◽

Jurgen Mitsch ◽

Nicola Bollard ◽

Keith Ellis ◽

Giuliana Miguel-Pacheco ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Time Budget ◽

Learning Algorithms ◽

Eating Behaviour ◽

Machine Learning Algorithms ◽

Support Vector ◽

Optimum Number ◽

Eating Behaviours ◽

Adaptive Boosting

Grazing and ruminating are the most important behaviours for ruminants, as they spend most of their daily time budget performing these. Continuous surveillance of eating behaviour is an important means for monitoring ruminant health, productivity and welfare. However, surveillance performed by human operators is prone to human variance, time-consuming and costly, especially on animals kept at pasture or free-ranging. The use of sensors to automatically acquire data, and software to classify and identify behaviours, offers significant potential in addressing such issues. In this work, data collected from sheep by means of an accelerometer/gyroscope sensor attached to the ear and collar, sampled at 16 Hz, were used to develop classifiers for grazing and ruminating behaviour using various machine learning algorithms: random forest (RF), support vector machine (SVM), k nearest neighbour (kNN) and adaptive boosting (Adaboost). Multiple features extracted from the signals were ranked on their importance for classification. Several performance indicators were considered when comparing classifiers as a function of algorithm used, sensor localisation and number of used features. Random forest yielded the highest overall accuracies: 92% for collar and 91% for ear. Gyroscope-based features were shown to have the greatest relative importance for eating behaviours. The optimum number of feature characteristics to be incorporated into the model was 39, from both ear and collar data. The findings suggest that one can successfully classify eating behaviours in sheep with very high accuracy; this could be used to develop a device for automatic monitoring of feed intake in the sheep sector to monitor health and welfare.

Download Full-text

Techniques for Detecting Malware Traffic: A Comprehensive Approach to Feature Selection and Classification

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39088 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1-10

Author(s):

Harsha A K

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Learning Algorithms ◽

Malware Detection ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Steady Increase ◽

Extreme Gradient Boosting

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detect malware like packet content analysis are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features like packet size, arrival time, source and destination addresses and other such metadata to detect malware. Such information can be used to train machine learning classifiers in order to classify malicious and benign packets. In this paper, we offer an efficient malware detection approach using classification algorithms in machine learning such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyper parameters of the algorithms, in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.

Download Full-text

PSIX-15 Assessment of machine learning algorithms for prediction of Aleutian disease in American mink

Journal of Animal Science ◽

10.1093/jas/skab235.484 ◽

2021 ◽

Vol 99 (Supplement_3) ◽

pp. 264-265

Author(s):

Duy Ngoc Do ◽

Guoyu Hu ◽

Younes Miar

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Models ◽

American Mink ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Training Data ◽

Enzyme Linked Immunosorbent Assay ◽

Linear Discriminant ◽

Machine Learning Classification

Abstract American mink (Neovison vison) is the major source of fur for the fur industries worldwide and Aleutian disease (AD) is causing severe financial losses to the mink industry. Different methods have been used to diagnose the AD in mink, but the combination of several methods can be the most appropriate approach for the selection of AD resilient mink. Iodine agglutination test (IAT) and counterimmunoelectrophoresis (CIEP) methods are commonly employed in test-and-remove strategy; meanwhile, enzyme-linked immunosorbent assay (ELISA) and packed-cell volume (PCV) methods are complementary. However, using multiple methods are expensive; and therefore, hindering the corrected use of AD tests in selection. This research presented the assessments of the AD classification based on machine learning algorithms. The Aleutian disease was tested on 1,830 individuals using these tests in an AD positive mink farm (Canadian Centre for Fur Animal Research, NS, Canada). The accuracy of classification for CIEP was evaluated based on the sex information, and IAT, ELISA and PCV test results implemented in seven machine learning classification algorithms (Random Forest, Artificial Neural Networks, C50Tree, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) using the Caret package in R. The accuracy of prediction varied among the methods. Overall, the Random Forest was the best-performing algorithm for the current dataset with an accuracy of 0.89 in the training data and 0.94 in the testing data. Our work demonstrated the utility and relative ease of using machine learning algorithms to assess the CIEP information, and consequently reducing the cost of AD tests. However, further works require the inclusion of production and reproduction information in the models and extension of phenotypic collection to increase the accuracy of current methods.

Download Full-text

Abstract 15895: Machine Learning Algorithms to Predict Major Adverse Cardiovascular Events in Patients Undergoing Orthotopic Liver Transplantation: A Retrospective Cohort Study

Circulation ◽

10.1161/circ.142.suppl_3.15895 ◽

2020 ◽

Vol 142 (Suppl_3) ◽

Author(s):

vardhmaan jain ◽

Vikram Sharma ◽

Agam Bansal ◽

Cerise Kleb ◽

Chirag Sheth ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Cardiovascular Events ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Major Adverse Cardiovascular Events ◽

Support Vector ◽

Post Transplant ◽

Extreme Gradient Boosting ◽

All Cause Mortality

Background: Post-transplant major adverse cardiovascular events (MACE) are amongst the leading cause of death amongst orthotopic liver transplant(OLT) recipients. Despite years of guideline directed therapy, there are limited data on predictors of post-OLT MACE. We assessed if machine learning algorithms (MLA) can predict MACE and all-cause mortality in patients undergoing OLT. Methods: We tested three MLA: support vector machine, extreme gradient boosting(XG-Boost) and random forest with traditional logistic regression for prediction of MACE and all-cause mortality on a cohort of consecutive patients undergoing OLT at our center between 2008-2019. The cohort was randomly split into a training (80%) and testing (20%) cohort. Model performance was assessed using c-statistic or AUC. Results: We included 1,459 consecutive patients with mean ± SD age 54.2 ± 13.8 years, 32% female who underwent OLT. There were 199 (13.6%) MACE and 289 (20%) deaths at a mean follow up of 4.56 ± 3.3 years. The random forest MLA was the best performing model for predicting MACE [AUC:0.78, 95% CI: 0.70-0.85] as well as mortality [AUC:0.69, 95% CI: 0.61-0.76], with all models performing better when predicting MACE vs mortality. See Table and Figure. Conclusion: Random forest machine learning algorithms were more predictive and discriminative than traditional regression models for predicting major adverse cardiovascular events and all-cause mortality in patients undergoing OLT. Validation and subsequent incorporation of MLA in clinical decision making for OLT candidacy could help risk stratify patients for post-transplant adverse cardiovascular events.

Download Full-text

Modeling Barrier Island Habitats Using Landscape Position Information

Remote Sensing ◽

10.3390/rs11080976 ◽

2019 ◽

Vol 11 (8) ◽

pp. 976

Author(s):

Nicholas M. Enwright ◽

Lei Wang ◽

Hongqing Wang ◽

Michael J. Osland ◽

Laura C. Feher ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Barrier Island ◽

Barrier Islands ◽

Machine Learning Algorithms ◽

Landscape Position ◽

K Nearest Neighbor ◽

Island Habitats

Barrier islands are dynamic environments because of their position along the marine–estuarine interface. Geomorphology influences habitat distribution on barrier islands by regulating exposure to harsh abiotic conditions. Researchers have identified linkages between habitat and landscape position, such as elevation and distance from shore, yet these linkages have not been fully leveraged to develop predictive models. Our aim was to evaluate the performance of commonly used machine learning algorithms, including K-nearest neighbor, support vector machine, and random forest, for predicting barrier island habitats using landscape position for Dauphin Island, Alabama, USA. Landscape position predictors were extracted from topobathymetric data. Models were developed for three tidal zones: subtidal, intertidal, and supratidal/upland. We used a contemporary habitat map to identify landscape position linkages for habitats, such as beach, dune, woody vegetation, and marsh. Deterministic accuracy, fuzzy accuracy, and hindcasting were used for validation. The random forest algorithm performed best for intertidal and supratidal/upland habitats, while the K-nearest neighbor algorithm performed best for subtidal habitats. A posteriori application of expert rules based on theoretical understanding of barrier island habitats enhanced model results. For the contemporary model, deterministic overall accuracy was nearly 70%, and fuzzy overall accuracy was over 80%. For the hindcast model, deterministic overall accuracy was nearly 80%, and fuzzy overall accuracy was over 90%. We found machine learning algorithms were well-suited for predicting barrier island habitats using landscape position. Our model framework could be coupled with hydrodynamic geomorphologic models for forecasting habitats with accelerated sea-level rise, simulated storms, and restoration actions.

Download Full-text

Development and Evaluation of the Combined Machine Learning Models for the Prediction of Dam Inflow

Water ◽

10.3390/w12102927 ◽

2020 ◽

Vol 12 (10) ◽

pp. 2927

Author(s):

Jiyeong Hong ◽

Seoro Lee ◽

Joo Hyun Bae ◽

Jimin Lee ◽

Woon Ji Park ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Multilayer Perceptron ◽

Short Term Memory ◽

Learning Algorithms ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Dam Inflow

Predicting dam inflow is necessary for effective water management. This study created machine learning algorithms to predict the amount of inflow into the Soyang River Dam in South Korea, using weather and dam inflow data for 40 years. A total of six algorithms were used, as follows: decision tree (DT), multilayer perceptron (MLP), random forest (RF), gradient boosting (GB), recurrent neural network–long short-term memory (RNN–LSTM), and convolutional neural network–LSTM (CNN–LSTM). Among these models, the multilayer perceptron model showed the best results in predicting dam inflow, with the Nash–Sutcliffe efficiency (NSE) value of 0.812, root mean squared errors (RMSE) of 77.218 m3/s, mean absolute error (MAE) of 29.034 m3/s, correlation coefficient (R) of 0.924, and determination coefficient (R2) of 0.817. However, when the amount of dam inflow is below 100 m3/s, the ensemble models (random forest and gradient boosting models) performed better than MLP for the prediction of dam inflow. Therefore, two combined machine learning (CombML) models (RF_MLP and GB_MLP) were developed for the prediction of the dam inflow using the ensemble methods (RF and GB) at precipitation below 16 mm, and the MLP at precipitation above 16 mm. The precipitation of 16 mm is the average daily precipitation at the inflow of 100 m3/s or more. The results show the accuracy verification results of NSE 0.857, RMSE 68.417 m3/s, MAE 18.063 m3/s, R 0.927, and R2 0.859 in RF_MLP, and NSE 0.829, RMSE 73.918 m3/s, MAE 18.093 m3/s, R 0.912, and R2 0.831 in GB_MLP, which infers that the combination of the models predicts the dam inflow the most accurately. CombML algorithms showed that it is possible to predict inflow through inflow learning, considering flow characteristics such as flow regimes, by combining several machine learning algorithms.

Download Full-text

Towards Near-Real-Time Intrusion Detection for IoT Devices using Supervised Learning and Apache Spark

Electronics ◽

10.3390/electronics9030444 ◽

2020 ◽

Vol 9 (3) ◽

pp. 444 ◽

Cited By ~ 1

Author(s):

Valerio Morfino ◽

Salvatore Rampone

Keyword(s):

Machine Learning ◽

Random Forest ◽

Learning Algorithms ◽

Hybrid Approach ◽

Cyber Attacks ◽

Machine Learning Algorithms ◽

Apache Spark ◽

Identification Accuracy ◽

Supervised Machine Learning ◽

Iot Devices

In the fields of Internet of Things (IoT) infrastructures, attack and anomaly detection are rising concerns. With the increased use of IoT infrastructure in every domain, threats and attacks in these infrastructures are also growing proportionally. In this paper the performances of several machine learning algorithms in identifying cyber-attacks (namely SYN-DOS attacks) to IoT systems are compared both in terms of application performances, and in training/application times. We use supervised machine learning algorithms included in the MLlib library of Apache Spark, a fast and general engine for big data processing. We show the implementation details and the performance of those algorithms on public datasets using a training set of up to 2 million instances. We adopt a Cloud environment, emphasizing the importance of the scalability and of the elasticity of use. Results show that all the Spark algorithms used result in a very good identification accuracy (>99%). Overall, one of them, Random Forest, achieves an accuracy of 1. We also report a very short training time (23.22 sec for Decision Tree with 2 million rows). The experiments also show a very low application time (0.13 sec for over than 600,000 instances for Random Forest) using Apache Spark in the Cloud. Furthermore, the explicit model generated by Random Forest is very easy-to-implement using high- or low-level programming languages. In light of the results obtained, both in terms of computation times and identification performance, a hybrid approach for the detection of SYN-DOS cyber-attacks on IoT devices is proposed: the application of an explicit Random Forest model, implemented directly on the IoT device, along with a second level analysis (training) performed in the Cloud.

Download Full-text

Comparison of Machine Learning Algorithms for Wildland-Urban Interface Fuelbreak Planning Integrating ALS and UAV-Borne LiDAR Data and Multispectral Images

Drones ◽

10.3390/drones4020021 ◽

2020 ◽

Vol 4 (2) ◽

pp. 21 ◽

Cited By ~ 1

Author(s):

Francisco Rodríguez-Puerta ◽

Rafael Alonso Ponce ◽

Fernando Pérez-Rodríguez ◽

Beatriz Águeda ◽

Saray Martín-García ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Random Forest ◽

Learning Algorithms ◽

Remote Sensing Data ◽

Machine Learning Algorithms ◽

Data Sources ◽

Lidar Data ◽

Sensing Data ◽

Sentinel 2

Controlling vegetation fuels around human settlements is a crucial strategy for reducing fire severity in forests, buildings and infrastructure, as well as protecting human lives. Each country has its own regulations in this respect, but they all have in common that by reducing fuel load, we in turn reduce the intensity and severity of the fire. The use of Unmanned Aerial Vehicles (UAV)-acquired data combined with other passive and active remote sensing data has the greatest performance to planning Wildland-Urban Interface (WUI) fuelbreak through machine learning algorithms. Nine remote sensing data sources (active and passive) and four supervised classification algorithms (Random Forest, Linear and Radial Support Vector Machine and Artificial Neural Networks) were tested to classify five fuel-area types. We used very high-density Light Detection and Ranging (LiDAR) data acquired by UAV (154 returns·m−2 and ortho-mosaic of 5-cm pixel), multispectral data from the satellites Pleiades-1B and Sentinel-2, and low-density LiDAR data acquired by Airborne Laser Scanning (ALS) (0.5 returns·m−2, ortho-mosaic of 25 cm pixels). Through the Variable Selection Using Random Forest (VSURF) procedure, a pre-selection of final variables was carried out to train the model. The four algorithms were compared, and it was concluded that the differences among them in overall accuracy (OA) on training datasets were negligible. Although the highest accuracy in the training step was obtained in SVML (OA=94.46%) and in testing in ANN (OA=91.91%), Random Forest was considered to be the most reliable algorithm, since it produced more consistent predictions due to the smaller differences between training and testing performance. Using a combination of Sentinel-2 and the two LiDAR data (UAV and ALS), Random Forest obtained an OA of 90.66% in training and of 91.80% in testing datasets. The differences in accuracy between the data sources used are much greater than between algorithms. LiDAR growth metrics calculated using point clouds in different dates and multispectral information from different seasons of the year are the most important variables in the classification. Our results support the essential role of UAVs in fuelbreak planning and management and thus, in the prevention of forest fires.

Download Full-text