Machine Learning for Formation Tightness Prediction and Mobility Prediction

Abstract. From the perspective of the wireline formation test (WFT), formation tightness reflects the "speed" of pressure buildup while the pressure test is being conducted. A pressure test point with a very low pressure-buildup speed is usually defined as a tight point. The mobility derived from this kind of pressure point is usually less than 0.01 md/cP; otherwise, the pressure point is defined as a valid point with valid formation pressure and mobility. Formation tightness reflects formation permeability and can serve as an indicator of the difficulty of WFT pumping and sampling operations. Mobility, as compared to permeability, reflects the dynamic supply capacity of the formation. A rapid and reliable mobility prediction based on petrophysical logging can not only directly provide a valid estimate of formation productivity but also help evaluate the feasibility of the WFT and allow optimization work to be done in advance. Compared to a time-consuming and costly drillstem test (DST) operation, the WFT is the most efficient and cost-saving method to confirm hydrocarbon presence. However, the success rate of WFT sampling operations in the deep Kuqa formation is less than 50% overall, mostly because the formation tightness exceeds the capability of the tools. Therefore, a rapid mobility evaluation is necessary for WFT feasibility analysis. As companion work to a previous WFT optimization study (SPE-195932-MS), we further study and discuss machine learning for mobility prediction. In the previous study, we formed a mobility prediction workflow through statistical analysis of more than 1000 pressure test points using several statistical methods, such as univariate linear regression (ULR), multivariate linear regression (MLR), neural network regression analysis (NNA), and decision tree classification analysis (DTA). In this paper, the methods and principles of machine learning are expounded. A series of machine learning methods were tested, and the algorithms appropriate for this specific data set were selected. These include DTA, discriminant analysis (DA), logistic regression, support vector machine (SVM), and K-nearest neighbor (KNN) for formation tightness prediction, and linear regression, DTA, SVM, Gaussian process regression (GPR), random tree, and neural network analysis for mobility prediction. Contrastive analysis reveals that the SVM classifier gives the best result among the tested methods for formation tightness probability prediction. Based on R-squared and RMSE analysis, linear regression, GPR, and NNA delivered relatively good results compared with the other mobility prediction methods. An optimized data processing workflow was proposed, and it delivered a better result than the workflow proposed in SPE-195932-MS under the same training and testing data set conditions. The comparison between measured and predicted mobility reveals that, in most situations, the two matched very well. WFTs were conducted in newly drilled wells. The sampling success rate reached 100% in these wells by optimizing the WFT tool string and sampling station selection in advance, and NPT was significantly reduced.
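The abstract above ranks mobility regressors by R-squared and RMSE. As a minimal sketch of how those two scores are computed, the snippet below uses only the standard definitions; the function names and the sample values are illustrative assumptions, not data from the study.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    if ss_tot == 0:
        raise ValueError("measured values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(rmse(measured, predicted), r_squared(measured, predicted))
```

A lower RMSE and an R-squared closer to 1 both indicate a closer match between predicted and measured values, which is the sense in which the abstract compares the regressors.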

Improving Machine Learning Identification of Unsafe Driver Behavior by Means of Sensor Fusion

Applied Sciences ◽

10.3390/app10186417 ◽

2020 ◽

Vol 10 (18) ◽

pp. 6417 ◽

Cited By ~ 1

Author(s):

Emanuele Lattanzi ◽

Giacomo Castellucci ◽

Valerio Freschi

Keyword(s):

Neural Network ◽

Machine Learning ◽

Driver Behavior ◽

Ground Truth ◽

Support Vector ◽

Svm Classifier ◽

Learning Technology ◽

Average Accuracy ◽

Unsafe Behaviors ◽

Vehicle Sensors

Most road accidents occur due to human fatigue, inattention, or drowsiness. Recently, machine learning technology has been successfully applied to identifying driving styles and recognizing unsafe behaviors starting from in-vehicle sensor signals such as vehicle and engine speed, throttle position, and engine load. In this work, we investigated the fusion of different external sensors, such as a gyroscope and a magnetometer, with in-vehicle sensors, to improve machine learning identification of unsafe driver behavior. Starting from those signals, we computed a set of features capable of accurately describing the behavior of the driver. A support vector machine and an artificial neural network were then trained and tested using several features calculated over more than 200 km of travel. The ground truth used to evaluate classification performance was obtained by means of an objective methodology based on the relationship between the speed and the lateral and longitudinal acceleration of the vehicle. The classification results showed an average accuracy of about 88% using the SVM classifier and of about 90% using the neural network, demonstrating the potential of the proposed methodology to identify unsafe driver behaviors.

Deep Learning Assisted Neonatal Cry Classification via Support Vector Machine Models

Frontiers in Public Health ◽

10.3389/fpubh.2021.670352 ◽

2021 ◽

Vol 9 ◽

Author(s):

Ashwini K ◽

P. M. Durai Raj Vincent ◽

Kathiravan Srinivasan ◽

Chuan-Yu Chang

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Feature Extraction ◽

Deep Learning ◽

Convolutional Neural Network ◽

Support Vector ◽

Svm Classifier ◽

Infant Cry ◽

Learning Techniques

Neonatal infants communicate with us through cries. Infant cry signals have distinct patterns depending on the purpose of the cries. For audio signals, preprocessing, feature extraction, and feature selection need expert attention and take much effort. Deep learning techniques extract and select the most important features automatically, but they require an enormous amount of data for effective classification. This work mainly discriminates neonatal cries into pain, hunger, and sleepiness. The neonatal cry auditory signals are transformed into spectrogram images using the short-time Fourier transform (STFT). The deep convolutional neural network (DCNN) takes the spectrogram images as input. The features obtained from the convolutional neural network are passed to a support vector machine (SVM) classifier, a machine learning technique that classifies the neonatal cries. This work combines the advantages of machine learning and deep learning techniques to get the best results even with a moderate number of data samples. The experimental results show that CNN-based feature extraction combined with an SVM classifier provides promising results. Comparing the SVM kernels, namely radial basis function (RBF), linear, and polynomial, it is found that SVM-RBF provides the highest accuracy among the kernel-based infant cry classification systems, 88.89%.
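The spectrogram step described above can be sketched as a short-time Fourier transform that slices the signal into overlapping frames and takes the magnitude spectrum of each. This is a minimal standard-library illustration, not the authors' pipeline: the Hann window, the frame length, the hop size, and the synthetic test tone are all assumptions.

```python
import cmath
import math


def stft_magnitude(signal, frame_len, hop):
    """Magnitude spectrogram: one list of bin magnitudes per frame.

    Each frame is Hann-windowed and transformed with a direct DFT;
    only the non-negative frequency bins 0..frame_len // 2 are kept.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Synthetic tone that falls exactly on DFT bin 8 of a 32-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 32) for n in range(128)]
spectrogram = stft_magnitude(tone, frame_len=32, hop=16)
peak_bins = [frame.index(max(frame)) for frame in spectrogram]
print(len(spectrogram), peak_bins)
```

Stacking the per-frame spectra side by side gives the time-frequency image that a convolutional network can take as input. The direct DFT is quadratic in the frame length; an FFT routine would be the usual choice for real audio.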

A comparative study of multiple linear regression, artificial neural network and support vector machine for the prediction of dissolved oxygen

Hydrology Research ◽

10.2166/nh.2016.149 ◽

2016 ◽

Vol 48 (5) ◽

pp. 1214-1225 ◽

Cited By ~ 22

Author(s):

Xue Li ◽

Jian Sha ◽

Zhong-liang Wang

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Linear Regression ◽

Dissolved Oxygen ◽

Multiple Linear Regression ◽

Coefficient Of Determination ◽

Support Vector ◽

Physical Factors ◽

Data Set ◽

Important Indicator

Dissolved oxygen (DO) is an important indicator reflecting the healthy state of aquatic ecosystems. The balance between oxygen supply and consumption in the water body is significantly influenced by physical and chemical parameters. This study aimed to evaluate and compare the performance of multiple linear regression (MLR), back propagation neural network (BPNN), and support vector machine (SVM) for the prediction of DO concentration based on multiple water quality parameters. The data set included 969 samples collected from rivers in China, and the 16 predictor variables involved physical factors, nutrients, organic substances, and metal ions, which would affect the DO concentrations directly or indirectly by influencing the water–air exchange, the growth of water plants, and the lives of aquatic animals. The models optimized by the particle swarm optimization (PSO) algorithm were calibrated and tested with nearly 80% and 20% of the data, respectively. The results showed that the PSO-BPNN and PSO-SVM had better predictive performance than linear regression methods. All of the evaluation criteria, including coefficient of determination, mean squared error, and absolute relative errors, suggested that the PSO-SVM model was superior to the MLR and PSO-BPNN for DO prediction in the rivers of China with limited knowledge of other information.

Retrieval of aerosol optical depth from surface solar radiation measurements using machine learning algorithms, non-linear regression and a radiative transfer-based look-up table

Atmospheric Chemistry and Physics ◽

10.5194/acp-16-8181-2016 ◽

2016 ◽

Vol 16 (13) ◽

pp. 8181-8191 ◽

Cited By ~ 10

Author(s):

Jani Huttunen ◽

Harri Kokkola ◽

Tero Mielonen ◽

Mika Esa Juhani Mononen ◽

Antti Lipponen ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Linear Regression ◽

Support Vector ◽

Learning Methods ◽

Surface Solar Radiation ◽

Machine Learning Methods ◽

Look Up Table ◽

Non Linear

Abstract. In order to have a good estimate of the current forcing by anthropogenic aerosols, knowledge of past aerosol levels is needed. Aerosol optical depth (AOD) is a good measure of aerosol loading. However, dedicated measurements of AOD are only available from the 1990s onward. One option to lengthen the AOD time series beyond the 1990s is to retrieve AOD from surface solar radiation (SSR) measurements taken with pyranometers. In this work, we have evaluated several inversion methods designed for this task. We compared a look-up table (LUT) method based on radiative transfer modelling, a non-linear regression method and four machine learning methods (Gaussian process, neural network, random forest and support vector machine) with AOD observations carried out with a sun photometer at an Aerosol Robotic Network (AERONET) site in Thessaloniki, Greece. Our results show that most of the machine learning methods produce AOD estimates comparable to the look-up table and non-linear regression methods. All of the applied methods produced AOD values that corresponded well to the AERONET observations, with the lowest correlation coefficient value being 0.87 for the random forest method. While many of the methods tended to slightly overestimate low AODs and underestimate high AODs, neural network and support vector machine showed overall better correspondence for the whole AOD range. The differences in reproducing both ends of the AOD range seem to be caused by differences in the aerosol composition. High AODs were in most cases those with high water vapour content, which might affect the aerosol single scattering albedo (SSA) through uptake of water into aerosols. Our study indicates that machine learning methods benefit from the fact that they do not constrain the aerosol SSA in the retrieval, whereas the LUT method assumes a constant value for it. This would also mean that machine learning methods could have potential in reproducing AOD from SSR even if SSA had changed during the observation period.

Global Warming Prediction in India using Machine Learning

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1301.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 4061-4065

Keyword(s):

Machine Learning ◽

Global Warming ◽

Linear Regression ◽

Greenhouse Gases ◽

Learning Algorithm ◽

Regression Tree ◽

Global Temperature ◽

Support Vector ◽

Data Set ◽

Average Annual Temperature

Long-term global warming prediction can be of major importance in various sectors such as climate-related studies, agriculture, energy, medicine and many more. This paper evaluates the performance of several machine learning algorithms (Linear Regression, Multi-Regression tree, Support Vector Regression (SVR), lasso) on the problem of annual global warming prediction from previously measured values over India. The first challenge lies in creating a reliable, efficient statistical data model on a large data set that accurately captures the relationship between average annual temperature and potential factors such as the concentrations of carbon dioxide, methane, and nitrous oxide. The data are predicted and forecast by linear regression because it obtains the highest accuracy for greenhouse gases and temperature among all the techniques considered. It was also found that CO2 is the major contributor to temperature change, followed by CH4 and then N2O. Based on the analysed and predicted data for the greenhouse gases and temperature, global warming could be reduced comparatively within a few years. A reduction of global temperature can help the whole world, because not only humans but also different animals are suffering from the rise in global temperature.

Author identification for Under-Resourced language (KadazanDusun)

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v17.i1.pp248-255 ◽

2020 ◽

Vol 17 (1) ◽

pp. 248 ◽

Cited By ~ 1

Author(s):

Nursyahirah Tarmizi ◽

Suhaila Saee ◽

Dayang Hanani Abang Ibrahim

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Machine Learning Algorithms ◽

Support Vector ◽

Svm Classifier ◽

Identification Task ◽

Data Set ◽

Short Text ◽

Author Identification

This paper presents the task of Author Identification for the KadazanDusun language, using tweets as the data source to perform Author Identification on short text in KadazanDusun, which is considered one of the under-resourced languages in Malaysia. The aim of this paper is to demonstrate Author Identification of short text in KadazanDusun. This paper also examines the performance of two machine learning algorithms on the KadazanDusun data set by analyzing stylometric features. Stylometric features, which include character n-grams and word n-grams, are used to quantify the writing styles of the authors. The Author Identification workflow implements a machine learning approach to solve the single-labelled multi-class problem and predict the author of a given message in KadazanDusun. Two classifiers, Naïve Bayes and Support Vector Machine (SVM), are compared for accuracy. The results show that the combination of word-level unigrams and {1-5}-grams with character 3-grams gives the most relevant stylometric features for identifying the author of a KadazanDusun message, with an accuracy of 80.17%. The results also show that the SVM classifier outperformed Naïve Bayes in this Author Identification task, with an accuracy of 80.17%.
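The stylometric features mentioned above, character n-grams and word n-grams, can be sketched as simple count vectors. This is a minimal illustration only: the helper names, the lowercase whitespace tokenization, the feature prefixes, and the English sample sentence are assumptions and do not come from the paper.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count every run of n consecutive characters in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count every run of n consecutive lowercase whitespace-separated tokens."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))


def stylometric_features(text):
    """Combine character 3-grams and word unigrams into one feature bag."""
    features = Counter()
    for gram, count in char_ngrams(text, 3).items():
        features["c3:" + gram] = count
    for gram, count in word_ngrams(text, 1).items():
        features["w1:" + gram] = count
    return features


# Placeholder English message; the study itself uses KadazanDusun tweets.
message = "to be or not to be"
print(word_ngrams(message, 2).most_common(1))
print(len(stylometric_features(message)))
```

Feature bags like these would then be turned into fixed-length vectors and passed to a classifier such as Naïve Bayes or an SVM.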

Using Support Vector Machine Detection of Breast Cancer in Early stage

International Journal for Research in Engineering Application & Management ◽

10.35291/2454-9150.2020.0465 ◽

2020 ◽

pp. 213-216

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Support Vector Machine ◽

Early Stage ◽

Breast Cancer Diagnosis ◽

Support Vector ◽

Svm Classifier ◽

K Nearest Neighbors ◽

Data Set ◽

Sensitivity Specificity

Breast cancer is a disease whose incidence among women has increased tremendously in recent years. Mammography is a low-powered X-ray technique for the detection and diagnosis of cancer at an early stage. The proposed system addresses two problems: first, detecting tumors as suspicious regions with weak contrast to their background, and second, extracting features that categorize tumors. This classification can be done with SVM, a statistical learning method that has achieved significant results in various fields. It was introduced in the early 1990s and led to growing interest in machine learning. Here, the different image types, Benign, Malignant, or Normal, are classified using the SVM classifier. This technique shows how easily the tumor region in mammogram images can be detected, with accuracy rates of more than 80% for linear classification using SVM. The proposed system uses 10-fold cross-validation to obtain an accurate outcome. The Wisconsin breast cancer diagnosis data set is taken from the UCI machine learning repository. Accuracy, sensitivity, specificity, false discovery rate, false omission rate, and Matthews correlation coefficient are evaluated in the proposed system. It provides good results for both the training and testing phases. The technique also shows accuracies of 98.57% and 97.14% using Support Vector Machine and K-Nearest Neighbors, respectively.
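The evaluation measures listed above all derive from the four cells of a binary confusion matrix. The sketch below applies the standard textbook definitions; the function name and the example counts are made up for illustration and are not results from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier.

    Assumes every denominator is non-zero, i.e. the test set contains
    both classes and the classifier predicts both classes at least once.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),           # true positive rate
        "specificity": tn / (tn + fp),           # true negative rate
        "false_discovery_rate": fp / (fp + tp),  # wrong share of positive calls
        "false_omission_rate": fn / (fn + tn),   # wrong share of negative calls
        "mcc": (tp * tn - fp * fn) / math.sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }


# Hypothetical counts: 45 true positives, 5 false positives,
# 40 true negatives, 10 false negatives.
for name, value in binary_metrics(tp=45, fp=5, tn=40, fn=10).items():
    print(f"{name}: {value:.4f}")
```

In a 10-fold cross-validation setting, these quantities would typically be computed per fold and then averaged.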

Analysis of Machine Learning Algorithms for Diagnosis of Diffuse Lung Diseases

Methods of Information in Medicine ◽

10.1055/s-0039-1681086 ◽

2018 ◽

Vol 57 (05/06) ◽

pp. 272-279 ◽

Cited By ~ 1

Author(s):

Isadora Cardoso ◽

Eliana Almeida ◽

Hector Allende-Cid ◽

Alejandro Frery ◽

Rangaraj Rangayyan ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Lung Diseases ◽

Gaussian Mixture ◽

Machine Learning Algorithms ◽

Support Vector ◽

Data Set ◽

Linear Discriminant ◽

Diffuse Lung Diseases ◽

Diffuse Lung

Background: Diffuse lung diseases (DLDs) are a diverse group of pulmonary disorders, characterized by inflammation of lung tissue, which may lead to permanent loss of the ability to breathe and death. Distinguishing among these diseases is challenging for physicians due to their wide variety and unknown causes. Computer-aided diagnosis (CAD) is a useful approach to improve diagnostic accuracy, by combining information provided by experts with Machine Learning (ML) methods. Objectives: Exploring the potential of dimensionality reduction combined with ML methods for diagnosis of DLDs; improving the classification accuracy over state-of-the-art methods. Methods: A data set composed of 3252 regions of interest (ROIs) was used, from which 28 features were extracted per ROI. We used Principal Component Analysis, Linear Discriminant Analysis (LDA), and Stepwise Selection (Forward, Backward, and Forward-Backward) to reduce feature dimensionality. The feature subsets obtained were used as input to the following ML methods: Support Vector Machine, Gaussian Mixture Model, k-Nearest Neighbor, and Deep Feedforward Neural Network (DFNN). We also applied a Deep Convolutional Neural Network directly to the ROIs. Results: We achieved the maximum reduction from 28 to 5 dimensions using LDA. The best classification results were obtained by DFNN, with 99.60% overall accuracy. Conclusions: This work contributes to the analysis and selection of features that can efficiently characterize the DLDs studied.

Prediction of the Temperature of Liquid Aluminum and the Dissolved Hydrogen Content in Liquid Aluminum with a Machine Learning Approach

Metals ◽

10.3390/met10030330 ◽

2020 ◽

Vol 10 (3) ◽

pp. 330 ◽

Cited By ~ 1

Author(s):

Moon-Jo Kim ◽

Jong Pil Yun ◽

Ji-Ba-Reum Yang ◽

Seung-Jun Choi ◽

DongEung Kim

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Hydrogen Content ◽

Liquid Aluminum ◽

Support Vector ◽

Learning Models ◽

Data Set ◽

Window Method ◽

Dissolved Hydrogen ◽

Machine Learning Models

In aluminum casting, the temperature of liquid aluminum and the dissolved hydrogen density are crucial factors to be controlled for both quality control of the molten metal and cost efficiency. However, the empirical and numerical approaches to predicting these parameters are quite complex and time consuming, and it is necessary to develop an alternative method for rapid prediction with a small number of experiments. In this study, machine learning models were developed to predict the temperature of liquid aluminum and the dissolved hydrogen content in liquid aluminum. The experimental data obtained were preprocessed by the sliding time window method for use in constructing the machine learning models. Linear regression, regression tree, Gaussian process regression (GPR), support vector machine (SVM), and ensembles of regression trees were compared to find the model with the highest performance in predicting the target properties. For the prediction of the temperature of liquid aluminum and the dissolved hydrogen content in liquid aluminum, the linear regression and GPR models, respectively, were selected for their high prediction accuracy. In comparison to numerical modeling, machine learning modeling had better performance and was more effective for predicting the target property, even with a limited data set, when the characteristics of the data were properly considered in data preprocessing.
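The sliding time window preprocessing mentioned above can be sketched as turning one time series into (past values, future value) training pairs. The abstract does not give the window length, the prediction horizon, or the input variables, so the parameters and the temperature readings below are purely illustrative assumptions.

```python
def sliding_windows(series, window, horizon=1):
    """Build supervised samples from a time series.

    Each sample pairs `window` consecutive readings with the reading
    `horizon` steps after the end of that window.
    """
    samples = []
    for end in range(window, len(series) - horizon + 1):
        features = series[end - window:end]
        target = series[end + horizon - 1]
        samples.append((features, target))
    return samples


# Hypothetical melt temperature readings in degrees Celsius.
readings = [700, 702, 705, 703, 706, 710]
for features, target in sliding_windows(readings, window=3):
    print(features, "->", target)
```

Each pair can then be fed to any of the regressors compared in the abstract, with the windowed readings as inputs and the later reading as the target.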

Predicting Future Products Rate using Machine Learning Algorithms

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2020.05.04 ◽

2020 ◽

Vol 12 (5) ◽

pp. 41-51

Author(s):

Shaimaa Mahmoud ◽

Mahmoud Hussein ◽

Arabi Keshk

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Mean Squared Error ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Regression ◽

Data Set ◽

Squared Error

Opinion mining in social network data is considered one of the most important research areas because a large number of users interact with different topics on these networks. This paper discusses the problem of predicting future product ratings according to users' comments. Researchers have addressed this problem using machine learning algorithms (e.g., Logistic Regression, Random Forest Regression, Support Vector Regression, Simple Linear Regression, Multiple Linear Regression, Polynomial Regression and Decision Tree). However, the accuracy of these techniques still needs to be improved. In this study, we introduce an approach for predicting future product ratings using LR, RFR, and SVR. Our data set consists of tweets and their ratings from 1 to 5. The main goal of our approach is to improve prediction accuracy over existing techniques. SVR can predict future product ratings with a Mean Squared Error (MSE) of 0.4122, the Linear Regression model with an MSE of 0.4986, and Random Forest Regression with an MSE of 0.4770. These results are better than the accuracy of existing approaches.
