Novel hybrid machine learning optimizer algorithms to prediction of fracture density by petrophysical data

Author(s):  
Meysam Rajabi ◽  
Saeed Beheshtian ◽  
Shadfar Davoodi ◽  
Hamzeh Ghorbani ◽  
Nima Mohamadian ◽  
...  

Abstract: One of the challenges in reservoir management is determining the fracture density (FVDC) of reservoir rock. Given the high cost of coring operations and image logs, the ability to predict FVDC from petrophysical input variables, using supervised learning calibrated to a standard well, is extremely useful. In this study, a novel machine learning approach is developed to predict FVDC from 12 well-log input variables chosen through feature selection. FVDC is predicted with hybrid models that combine two networks, a multiple extreme learning machine (MELM) and a multi-layer perceptron (MLP), with two optimizers, a genetic algorithm (GA) and particle swarm optimization (PSO). The MELM-PSO/GA combination has not been used before; the best-performing model, MELM-PSO, achieves RMSE = 0.0047 1/m and R2 = 0.9931 on the test data. In terms of prediction accuracy, the models rank as MLP-PSO < MLP-GA < MELM-GA < MELM-PSO. The method can be applied in other fields, but it must be recalibrated with at least one well. Furthermore, the developed method provides insight into how machine learning can reduce errors and avoid overfitting in order to achieve the best possible FVDC prediction performance.
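To make the hybrid optimizer idea concrete, the following is a minimal sketch of training a neural-network regressor with particle swarm optimization instead of backpropagation. It is not the authors' MELM-PSO implementation: the synthetic data, single hidden layer, and PSO settings are illustrative assumptions standing in for the 12 petrophysical inputs and the published architecture.

```python
# Minimal sketch: a one-hidden-layer MLP regressor whose weights are found by a
# plain global-best PSO. Synthetic data stands in for the 12 well-log inputs.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 500 samples, 12 "well-log" features, one FVDC-like target.
X = rng.normal(size=(500, 12))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

n_in, n_hidden = X.shape[1], 10
n_weights = n_in * n_hidden + n_hidden + n_hidden + 1  # W1, b1, W2, b2

def mlp_predict(params, X):
    """Forward pass of a single-hidden-layer MLP with tanh activation."""
    i = 0
    W1 = params[i:i + n_in * n_hidden].reshape(n_in, n_hidden); i += n_in * n_hidden
    b1 = params[i:i + n_hidden]; i += n_hidden
    W2 = params[i:i + n_hidden]; i += n_hidden
    b2 = params[i]
    return np.tanh(X @ W1 + b1) @ W2 + b2

def rmse(params):
    return np.sqrt(np.mean((mlp_predict(params, X) - y) ** 2))

# Global-best PSO over the flattened MLP weight vector.
n_particles, n_iters = 40, 200
w, c1, c2 = 0.7, 1.5, 1.5                      # inertia and acceleration factors
pos = rng.normal(scale=0.5, size=(n_particles, n_weights))
vel = np.zeros_like(pos)
pbest, pbest_cost = pos.copy(), np.array([rmse(p) for p in pos])
gbest = pbest[np.argmin(pbest_cost)].copy()

for _ in range(n_iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    cost = np.array([rmse(p) for p in pos])
    improved = cost < pbest_cost
    pbest[improved], pbest_cost[improved] = pos[improved], cost[improved]
    gbest = pbest[np.argmin(pbest_cost)].copy()

print(f"PSO-trained MLP training RMSE: {rmse(gbest):.4f}")
```

Swapping the PSO loop for a GA-based search over the same weight vector would give the GA-optimized counterpart compared in the abstract.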

2019 ◽  
Vol 11 (8) ◽  
pp. 920 ◽  
Author(s):  
Syed Haleem Shah ◽  
Yoseline Angel ◽  
Rasmus Houborg ◽  
Shawkat Ali ◽  
Matthew F. McCabe

Developing rapid and non-destructive methods for chlorophyll estimation over large spatial areas is a topic of much interest, as it would provide an indirect measure of plant photosynthetic response, be useful in monitoring soil nitrogen content, and offer the capacity to assess vegetation structural and functional dynamics. Traditional methods of direct tissue analysis or the use of handheld meters are not able to capture chlorophyll variability at anything beyond point scales, so they are not particularly useful for informing decisions on plant health and status at the field scale. Examining the spectral response of plants via remote sensing has shown much promise as a means to capture variations in vegetation properties, while offering a non-destructive and scalable approach to monitoring. However, determining the optimum combination of spectra or spectral indices to inform plant response remains an active area of investigation. Here, we explore the use of a machine learning approach to enhance the estimation of leaf chlorophyll (Chlt), defined as the sum of chlorophyll a and b, from spectral reflectance data. Using an ASD FieldSpec 4 Hi-Res spectroradiometer, 2700 individual leaf hyperspectral reflectance measurements were acquired from wheat plants grown across a gradient of soil salinity and nutrient levels in a greenhouse experiment. The extractable Chlt was determined from laboratory analysis of 270 collocated samples, each composed of three leaf discs. A random forest regression algorithm was trained against these data, with input predictors based upon (1) reflectance values from 2102 bands across the 400–2500 nm spectral range; and (2) 45 established vegetation indices. As a benchmark, a standard univariate regression analysis was performed to model the relationship between measured Chlt and the selected vegetation indices. Results show that the root mean square error (RMSE) was significantly reduced when using the machine learning approach compared to standard linear regression. When exploiting the entire spectral range of individual bands as input variables, the random forest estimated Chlt with an RMSE of 5.49 µg·cm−2 and an R2 of 0.89. Model accuracy was improved when using vegetation indices as input variables, producing an RMSE ranging from 3.62 to 3.91 µg·cm−2, depending on the particular combination of indices selected. In further analysis, input predictors were ranked according to their importance level, and a step-wise reduction in the number of input features (from 45 down to 7) was performed. Implementing this had no significant effect on the RMSE, and showed that much the same prediction accuracy could be obtained from a smaller subset of indices. Importantly, the random forest regression approach identified many important variables that were not good predictors according to their linear regression statistics. Overall, the research illustrates the promise in using established vegetation indices as input variables in a machine learning approach for the enhanced estimation of Chlt from hyperspectral data.
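A minimal sketch of the benchmarking idea follows: a random forest regressor trained on vegetation-index features is compared with a univariate linear regression on a single index, and the forest's feature importances are used to rank indices. The synthetic data and placeholder index columns are assumptions; they are not the 45 indices or the measured Chlt values used in the study.

```python
# Random forest vs. univariate linear regression for chlorophyll estimation,
# plus importance-based ranking of the index features (synthetic example).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(42)
n_samples, n_indices = 270, 45

# Synthetic vegetation-index matrix and a chlorophyll-like target (µg cm^-2).
X = rng.uniform(0.0, 1.0, size=(n_samples, n_indices))
chl = 60 * X[:, 0] + 20 * X[:, 1] * X[:, 2] + rng.normal(0, 3, n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, chl, test_size=0.2, random_state=0)

# Random forest trained on all indices.
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)
rmse_rf = np.sqrt(mean_squared_error(y_test, rf.predict(X_test)))

# Univariate linear benchmark on a single index column.
lr = LinearRegression().fit(X_train[:, [0]], y_train)
rmse_lr = np.sqrt(mean_squared_error(y_test, lr.predict(X_test[:, [0]])))

# Rank indices by importance, mimicking the step-wise feature reduction.
ranking = np.argsort(rf.feature_importances_)[::-1]
print(f"RF RMSE: {rmse_rf:.2f}  linear RMSE: {rmse_lr:.2f}")
print("Top 7 index columns by importance:", ranking[:7])
```

Retraining the forest on only the top-ranked columns reproduces the step-wise reduction from 45 to 7 predictors described in the abstract.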


2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Rafał Kozik ◽  
Marek Pawlicki ◽  
Michał Choraś

Recent advances in malicious techniques have rendered the traditional signature-based approach to cyberattack detection ineffective. New, improved, and potent solutions are needed, incorporating Big Data technologies, effective distributed machine learning, and algorithms that counter the data imbalance problem. The major contribution of this paper is therefore the proposal of a cost-sensitive distributed machine learning approach for cybersecurity. In particular, we propose and implement cost-sensitive distributed machine learning by means of distributed Extreme Learning Machines (ELM), distributed Random Forests, and distributed Random Boosted-Trees to detect botnets. The system’s concept and architecture are based on a Big Data processing framework with data mining and machine learning techniques. As a practical use case, we consider the problem of botnet detection by analysing data in the form of NetFlows. The reported results are promising and show that the proposed system can be considered a useful tool for improving cybersecurity.
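The sketch below illustrates the cost-sensitive idea on a single node: a class-weighted random forest trained on imbalanced, NetFlow-like features. It is a simplified stand-in, not the paper's distributed ELM/Random Forest pipeline on a Big Data framework; the feature count and class ratio are assumptions.

```python
# Cost-sensitive classification on imbalanced "NetFlow" data via class weights.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Imbalanced synthetic data: roughly 2% "botnet" flows, 98% benign flows.
X, y = make_classification(n_samples=20000, n_features=15, weights=[0.98, 0.02],
                           class_sep=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0)

# class_weight="balanced" penalises mistakes on the rare botnet class more,
# which is one simple way to make the learner cost-sensitive.
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0)
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test),
                            target_names=["benign", "botnet"]))
```

In the distributed setting described in the paper, the same weighting idea is applied within each worker's partition of the NetFlow data rather than on a single in-memory dataset.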


2018 ◽  
Author(s):  
Katsutoshi Maeta ◽  
Yu Nishiyama ◽  
Kazutoshi Fujibayashi ◽  
Toshiaki Gunji ◽  
Noriko Sasabe ◽  
...  

BACKGROUND A 75-g oral glucose tolerance test (OGTT) provides important information about glucose metabolism, although the test is expensive and invasive. Complete OGTT information, such as 1-hour and 2-hour postloading plasma glucose and immunoreactive insulin levels, may be useful for predicting the future risk of diabetes or glucose metabolism disorders (GMD), which includes both diabetes and prediabetes.

OBJECTIVE We trained several classification models for predicting the risk of developing diabetes or GMD using data from thousands of OGTTs and a machine learning technique (XGBoost). The receiver operating characteristic (ROC) curves and their area under the curve (AUC) values for the trained classification models are reported, along with the sensitivity and specificity determined by the cutoff values of the Youden index. We compared the performance of the machine learning techniques with logistic regressions (LR), which are traditionally used in medical research studies.

METHODS Data were collected from subjects who underwent multiple OGTTs during comprehensive check-up medical examinations conducted at a single facility in Tokyo, Japan, from May 2006 to April 2017. For each examination, a subject was diagnosed with diabetes or prediabetes according to the American Diabetes Association guidelines. Given the data, 2 studies were conducted: predicting the risk of developing diabetes (study 1) or GMD (study 2). For each study, to apply supervised machine learning methods, the required label data was prepared. If a subject was diagnosed with diabetes or GMD at least once during the period, then that subject’s data obtained in previous trials were classified into the risk group (y=1). After data processing, 13,581 and 6760 OGTTs were analyzed for study 1 and study 2, respectively. For each study, a randomly chosen subset representing 80% of the data was used for training 9 classification models and the remaining 20% was used for evaluating the models. Three classification models, A to C, used XGBoost with various input variables, some including OGTT data. The other 6 classification models, D to I, used LR for comparison.

RESULTS For study 1, the AUC values ranged from 0.78 to 0.93. For study 2, the AUC values ranged from 0.63 to 0.78. The machine learning approach using XGBoost showed better performance compared with traditional LR methods. The AUC values increased when the full OGTT variables were included. In our analysis using a particular setting of input variables, XGBoost showed that the OGTT variables were more important than fasting plasma glucose or glycated hemoglobin.

CONCLUSIONS A machine learning approach, XGBoost, showed better prediction accuracy compared with LR, suggesting that advanced machine learning methods are useful for detecting the early signs of diabetes or GMD. The prediction accuracy increased when all OGTT variables were added. This indicates that complete OGTT information is important for predicting the future risk of diabetes and GMD accurately.
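The evaluation recipe described here (XGBoost vs. logistic regression, ROC AUC, and a Youden-index cutoff) can be sketched as follows. The synthetic features stand in for the OGTT variables, and the hyperparameters are illustrative defaults rather than the models A to I reported in the study.

```python
# Compare XGBoost and logistic regression by ROC AUC, and select the
# sensitivity/specificity cutoff via the Youden index (max TPR - FPR).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score
from xgboost import XGBClassifier

# Synthetic stand-in for OGTT-derived features and a risk label (y=1).
X, y = make_classification(n_samples=6760, n_features=10, weights=[0.8, 0.2],
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

models = {
    "XGBoost": XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                             eval_metric="logloss"),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    prob = model.predict_proba(X_te)[:, 1]
    fpr, tpr, thresholds = roc_curve(y_te, prob)
    j = np.argmax(tpr - fpr)                     # Youden index cutoff
    print(f"{name}: AUC={roc_auc_score(y_te, prob):.3f} "
          f"sensitivity={tpr[j]:.2f} specificity={1 - fpr[j]:.2f} "
          f"cutoff={thresholds[j]:.2f}")
```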


2021 ◽  
Author(s):  
Fatai Adesina Anifowose ◽  
Saeed Saad Alshahrani ◽  
Mokhles Mustafa Mezghani

Abstract: Wireline logs have been utilized to indirectly estimate various reservoir properties, such as porosity, permeability, saturation, cementation factor, and lithology. Attempts have been made to correlate gamma-ray, density, neutron, spontaneous potential, and resistivity logs with lithology. The current approach to estimating grain size, traditional core description, is time-consuming, labor-intensive, qualitative, and subjective. An alternative approach is essential given the utility of grain size in petrophysical characterization and in identifying depositional environments. This paper proposes to fill the gap by studying the linear and nonlinear influences of wireline logs on reservoir rock grain size. We used the observed influences to develop and optimize linear and machine learning models to estimate reservoir rock grain size for a new well or targeted reservoir sections. The linear models comprised logistic regression and linear discriminant analysis, while the machine learning method was random forest (RF). We present preliminary results comparing the linear and machine learning methods. We used anonymized wireline and archival core description datasets from nine wells in a clastic reservoir. Seven wells were used to train the models and the remaining two to test their classification performance. The grain-size types range from clay to granules. While sedimentologists have used gamma-ray logs to guide grain-size qualification, the RF model identified sonic, neutron, and density logs as having the most significant influence on grain size in the nonlinear domain. The performance comparison showed that, considering the subjectivity and bias associated with the visual core description approach, the RF model gave up to an 89% correct classification rate. This suggests looking beyond the linear influences of the wireline logs on reservoir rock grain size. The relative stability of the RF model compared to the linear ones also confirms the feasibility of the machine learning approach. This is an acceptable and promising result. Future research will focus on conducting more rigorous quality checks on the grain-size data, possibly introducing more heterogeneity, and exploring more advanced algorithms. This will help to address the uncertainty in the grain-size data more effectively and improve the models' performance. The outcome of this study will reduce the limitations of traditional core description and may eventually reduce the need for extensive core description processes.
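As a rough illustration of the nonlinear workflow, the following sketch classifies grain-size classes from wireline-log features with a random forest and reports the feature-importance ranking. The log mnemonics (GR, RHOB, NPHI, DT, SP, RT), the five grain-size classes, and the synthetic values are assumptions for illustration, not the anonymized nine-well dataset.

```python
# Random forest grain-size classification from synthetic wireline-log features.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(7)
n = 2000
logs = pd.DataFrame({
    "GR": rng.uniform(20, 150, n),      # gamma-ray (API)
    "RHOB": rng.uniform(2.0, 2.8, n),   # bulk density (g/cm3)
    "NPHI": rng.uniform(0.05, 0.4, n),  # neutron porosity (v/v)
    "DT": rng.uniform(50, 120, n),      # sonic slowness (us/ft)
    "SP": rng.uniform(-80, 20, n),      # spontaneous potential (mV)
    "RT": rng.uniform(1, 200, n),       # deep resistivity (ohm-m)
})
# Fake grain-size classes (clay .. granule) loosely tied to a few of the logs.
score = -0.02 * logs["GR"] + 3 * (2.8 - logs["RHOB"]) - 2 * logs["NPHI"]
grain_size = pd.cut(score, bins=5,
                    labels=["clay", "silt", "fine_sand", "coarse_sand", "granule"])

X_tr, X_te, y_tr, y_te = train_test_split(logs, grain_size, test_size=0.25,
                                          random_state=0, stratify=grain_size)
rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(X_tr, y_tr)

print(f"Correct classification rate: {accuracy_score(y_te, rf.predict(X_te)):.2f}")
print(pd.Series(rf.feature_importances_, index=logs.columns)
      .sort_values(ascending=False))
```

In practice the train/test split would follow the paper's design, holding out entire wells (two of nine) rather than random rows, to test generalization to a new well.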


Author(s):  
Kindson Munonye ◽  
Martinek Péter

Abstract: Technologies for integrating enterprise web applications have improved rapidly over the years. The OAuth framework provides authentication and authorization using a user's profile and credentials from an existing identity provider. This makes it possible for attackers to exploit any vulnerability arising from the exchange of data with the provider. A vulnerability in the OAuth authorization flow allows an attacker to alter the normal flow sequence of the OAuth protocol. In this paper, a machine learning-based approach was applied to the detection of potential vulnerabilities in the OAuth authentication and authorization flow by analyzing the relationship between changes in the OAuth parameters and the final output. This research models the OAuth protocol as a supervised learning problem in which seven classification models were developed, tuned, and evaluated. Exploratory Data Analytics (EDA) techniques were applied to the extraction and analysis of specific OAuth features so that each output class could be evaluated to determine the effect of the identified OAuth features. The trained and tuned models attained a classification accuracy above 90% for detecting vulnerabilities in the OAuth authentication and authorization flow. Comparison with known vulnerabilities resulted in a 54% match.
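A minimal sketch of framing OAuth flows as a supervised classification problem follows: several candidate classifiers are compared by cross-validation on features derived from OAuth parameters. The feature names (state_present, redirect_uri_match, pkce_used, and so on) and the synthetic labels are hypothetical stand-ins for the features extracted in the paper's EDA step.

```python
# Compare a few classifiers on synthetic OAuth-flow features labelled as
# vulnerable / not vulnerable.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 1500
flows = pd.DataFrame({
    "state_present": rng.integers(0, 2, n),        # state parameter included?
    "redirect_uri_match": rng.integers(0, 2, n),   # redirect_uri matches registration?
    "pkce_used": rng.integers(0, 2, n),            # PKCE code_challenge present?
    "response_type_code": rng.integers(0, 2, n),   # authorization-code flow?
    "token_in_fragment": rng.integers(0, 2, n),    # token exposed in URL fragment?
})
# Label a flow as vulnerable when several protections are missing at once.
vulnerable = ((flows["state_present"] + flows["redirect_uri_match"]
               + flows["pkce_used"]) < 2).astype(int)

models = {
    "LogisticRegression": LogisticRegression(),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(),
}
for name, model in models.items():
    scores = cross_val_score(model, flows, vulnerable, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```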

