Machine-learning algorithm for estimating oil-recovery factor using a combination of engineering and stratigraphic dependent parameters

2019 ◽  
Vol 7 (3) ◽  
pp. SE151-SE159 ◽  
Author(s):  
Kachalla Aliyuda ◽  
John Howell

The methods used to estimate recovery factor change through the life cycle of a field. During appraisal, prior to development when there are no production data, we typically rely on analog fields and empirical methods. Given the absence of a perfect analog, these methods are typically associated with a wide range of uncertainty. During plateau, recovery factors are typically associated with simulation and dynamic modeling, whereas in later field life, once the field drops off the plateau, decline curve analysis is also used. The use of different methods during different stages of the field life leads to uncertainty and potential inconsistencies in recovery estimates. A wide range of interacting, partially related reservoir and production variables controls the production and recovery factor. Machine learning allows more complex multivariate analysis that can be used to investigate the roles of these variables using a training data set and then to ultimately predict future performance in fields. To investigate this approach, we used a data set consisting of producing reservoirs, all of which are at plateau or in decline, to train a series of machine-learning algorithms that can potentially predict the recovery factor with minimal percentage error. The database for this study consists of categorical and numerical properties for 93 reservoirs from the Norwegian Continental Shelf. Of these, 75 are from the Norwegian Sea, the Norwegian North Sea, and the Barents Sea, whereas the remaining 18 reservoirs are from the Viking Graben in the UK sector of the North Sea. The data set was divided into training and testing sets: the training set comprised approximately 80% of the total data, and the remaining 20% was the testing set. Linear regression and support vector machine (SVM) models were trained first with all 30 parameters in the data set and then with the 16 most influential parameters, and the performance of these models was compared using the results of fivefold cross-validation. The SVM model trained on a combination of 16 geologic/engineering parameters with a Gaussian kernel function has a root-mean-square error of 0.12, a mean square error of 0.01, and an R² of 0.76. This model was tested on 18 reservoirs from the testing set; the test results are very similar to the cross-validation results from the model training phase, suggesting that this method can potentially be used to predict the future recovery factor.
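As a minimal sketch of the workflow described above (an 80/20 split, a Gaussian-kernel SVM regressor, and fivefold cross-validation scored with RMSE, MSE, and R²), the Python snippet below uses placeholder data; the feature matrix, target values, and dimensions are assumptions, not the authors' data or code.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(93, 16))        # placeholder: 93 reservoirs, 16 parameters
y = rng.uniform(0.1, 0.7, size=93)   # placeholder recovery factors

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))  # Gaussian kernel
cv = cross_validate(model, X_train, y_train, cv=5,
                    scoring=("neg_mean_squared_error", "r2"))
print("CV MSE:", -cv["test_neg_mean_squared_error"].mean())
print("CV R2 :", cv["test_r2"].mean())

model.fit(X_train, y_train)
pred = model.predict(X_test)
mse = mean_squared_error(y_test, pred)
print("test RMSE:", np.sqrt(mse), "MSE:", mse, "R2:", r2_score(y_test, pred))
```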

2020 ◽  
Author(s):  
Mazin Mohammed ◽  
Karrar Hameed Abdulkareem ◽  
Mashael S. Maashi ◽  
Salama A. Mostafa ◽  
Abdullah Baz ◽  
...  

BACKGROUND In recent times, global concern has been caused by the coronavirus (COVID-19), which is considered a global health threat due to its rapid spread across the globe. Machine learning (ML) is a computational method that can be used to automatically learn from experience and improve the accuracy of predictions. OBJECTIVE In this study, machine learning was applied to a coronavirus dataset of 50 X-ray images to enable the development of detection modalities and directions with associated risk factors. The dataset contains a wide range of samples of COVID-19 cases alongside SARS, MERS, and ARDS. The experiment was carried out using a total of 50 X-ray images, of which 25 were positive COVID-19 cases and the other 25 were normal cases. METHODS The Orange data mining tool was used for data manipulation. To classify patients as coronavirus carriers or non-carriers, this tool was employed in developing and analysing seven types of predictive models. Models such as artificial neural network (ANN), support vector machine (SVM), linear kernel and radial basis function (RBF), k-nearest neighbour (k-NN), decision tree (DT), and CN2 rule inducer were used in this study. Furthermore, the standard InceptionV3 model was used for feature extraction. RESULTS The various machine learning techniques were trained on the coronavirus disease 2019 (COVID-19) dataset with improved ML technique parameters. The dataset was divided into two parts, training and testing: the models were trained using 70% of the dataset, while the remaining 30% was used for testing. The results show that the improved SVM achieved an F1 score of 97% and an accuracy of 98%. CONCLUSIONS In this study, seven models were developed to aid the detection of coronavirus. In such cases, the learning performance can be improved through knowledge transfer, whereby time-consuming data labelling efforts are not required. The evaluations of all the models were done in terms of different parameters. It can be concluded that all the models performed well, but the SVM demonstrated the best result for the accuracy metric. Future work will compare classical approaches with deep learning ones and try to obtain better results. CLINICALTRIAL None
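As an illustration only: the study used the Orange tool, but the same pipeline (InceptionV3 feature extraction followed by an SVM classifier on a 70/30 split) can be sketched in Python as below. The images, labels, and array shapes are placeholders, not the study's X-ray data.

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

backbone = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

rng = np.random.default_rng(0)
images = rng.random((50, 299, 299, 3)) * 255.0   # placeholder X-ray images
labels = np.array([1] * 25 + [0] * 25)           # 25 COVID-19, 25 normal

# Extract deep features once, then fit a classical classifier on top.
features = backbone.predict(preprocess_input(images.astype("float32")))
X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.3,
                                          stratify=labels, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred), "F1:", f1_score(y_te, pred))
```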


2021 ◽  
Author(s):  
Md Hamidul Haque ◽  
Mushtari Sadia ◽  
Mashiat Mustaq

Floods are natural disasters caused mainly by heavy or excessive rainfall. They induce massive economic losses in Bangladesh every year. Physically-based flood prediction models have been used over the years, in which simplified forms of physical laws are applied to reduce the complexity of the calculations. This sometimes leads to oversimplification and inaccuracy in the prediction. Moreover, a physically-based model requires intensive monitoring datasets for calibration, accurate soil property information, and a heavy computational facility, creating an impediment to quick, economical, and precise short-term prediction. Researchers have tried different approaches, such as empirical data-driven models, especially machine learning-based models, to offer an alternative to physically-based models, but have focused on developing only one machine learning (ML) technique at a time (e.g., ANN, MLP). Many other techniques, algorithms, and models in ML have the potential to be effective and efficient in flood forecasting. In this study, five different machine learning algorithms, namely exponent back propagation neural network (EBPNN), multilayer perceptron (MLP), support vector regression (SVR), decision tree regression (DTR), and extreme gradient boosting (XGBoost), were used to develop a total of 180 independent models based on different combinations of time lags for the input data and lead times for the forecast. Models were developed for the Someshwari-Kangsa sub-watershed of Bangladesh's North Central hydrological region, with a 5772 km² drainage area. It is also a data-scarce region, with only three hydrological and hydro-meteorological stations for the whole sub-watershed. This region mostly suffers flooding driven by extreme meteorological events. Therefore, satellite-based precipitation, temperature, relative humidity, and wind speed data, together with observed water level data from the Bangladesh Water Development Board (BWDB), were used as input and response variables.

For comparison, the accuracy of these models was evaluated using different statistical indices: coefficient of determination (R²), mean square error (MSE), mean absolute error (MAE), mean relative error (MRE), explained variance score, and normalized centred root mean square error (NCRMSE). Developed models were ranked based on their R² values. All the models performed well, with R² greater than 0.85 in most cases. Further analysis of the model results showed that most of the models performed well for forecasting water level at a 24-hour lead time. Models developed using the XGBoost algorithm outperformed the other models in all metrics. Moreover, each algorithm's best-performing model was extended further, up to a 20-day lead time, to generate a forecasting horizon. The models demonstrated remarkable consistency in their performance, with R² greater than 0.70 at a 20-day lead time in most cases, except for the DTR-based model. For 10- and 5-day lead times, R² was greater than 0.75 and 0.80, respectively, for all the extended models. This study concludes that machine learning-based data-driven models can be a powerful tool for flood forecasting in data-scarce regions, with excellent accuracy, quick building and running times, and economic feasibility.
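A minimal sketch of the lag/lead construction behind such models, assuming a water level series and using XGBoost as the representative algorithm; the lag count, lead time, and synthetic data are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.metrics import r2_score

def make_lagged(series, n_lags, lead):
    """Build (X, y): X holds the last n_lags values, y the value lead steps ahead."""
    X, y = [], []
    for t in range(n_lags, len(series) - lead):
        X.append(series[t - n_lags:t])
        y.append(series[t + lead])
    return np.array(X), np.array(y)

rng = np.random.default_rng(0)
level = np.sin(np.linspace(0, 60, 2000)) + rng.normal(scale=0.1, size=2000)

X, y = make_lagged(level, n_lags=7, lead=1)   # e.g. 24 h ahead at a daily step

split = int(0.8 * len(X))                     # chronological 80/20 split
model = XGBRegressor(n_estimators=300, learning_rate=0.05)
model.fit(X[:split], y[:split])
print("R2:", r2_score(y[split:], model.predict(X[split:])))
```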


2018 ◽  
Vol 14 (2) ◽  
pp. 225
Author(s):  
Indriyanti Indriyanti ◽  
Agus Subekti

The increasing energy consumption of buildings has encouraged researchers to build prediction models by applying machine learning methods, but the most accurate model is still unknown. Predictive models for commercial building energy consumption are important for energy conservation. With the right model, we can create building designs that are more energy-efficient. In this paper, we propose a predictive model based on machine learning methods to obtain the best model for predicting total energy consumption. The algorithms used are SMOreg and LibSVM from the Support Vector Machine class, and the models are evaluated based on their Mean Absolute Error and Root Mean Square Error values. Using a publicly available dataset, we developed models based on support vector machines for regression. Testing of the two algorithms showed that the SMOreg algorithm has better accuracy, with MAE and RMSE values of 4.70 and 10.15, whereas the LibSVM model has MAE and RMSE values of 9.37 and 14.45. We propose the method based on the SMOreg algorithm because of its better performance.
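SMOreg and LibSVM are WEKA implementations; a rough Python analogue of the comparison, assuming a linear-kernel SVR stands in for SMOreg's default configuration and an RBF SVR for LibSVM, might look like the sketch below. The features and targets are synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR, LinearSVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))                  # placeholder building features
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)
candidates = [("linear SVR (SMOreg-like)", LinearSVR(max_iter=10000)),
              ("RBF SVR (LibSVM-like)", SVR(kernel="rbf"))]
for name, estimator in candidates:
    model = make_pipeline(StandardScaler(), estimator).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(name, "MAE:", mean_absolute_error(y_te, pred),
          "RMSE:", np.sqrt(mean_squared_error(y_te, pred)))
```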


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Shihui Shen ◽  
Zihao Liu ◽  
Jian Wang ◽  
Linfeng Fan ◽  
Fang Ji ◽  
...  

Abstract Background Recently, the dental age estimation method developed by Cameriere has been widely recognized and accepted. Although machine learning (ML) methods can improve the accuracy of dental age estimation, no machine learning research exists on the use of the Cameriere dental age estimation method, making this research innovative and meaningful. Aim The purpose of this research is to use seven lower left permanent teeth and three models [random forest (RF), support vector machine (SVM), and linear regression (LR)] based on the Cameriere method to predict children's dental age, and to compare them with the traditional Cameriere age estimation. Subjects and methods This was a retrospective study that collected and analyzed orthopantomograms of 748 children (356 females and 392 males) aged 5–13 years. Data were randomly divided into training and test datasets in an 80–20% proportion for the ML algorithms. The procedure, starting with randomly creating new training and test datasets, was repeated 20 times. Seven permanent developing teeth on the left mandible (excluding wisdom teeth) were recorded using the Cameriere method. Then, the traditional Cameriere formula and the three models (RF, SVM, and LR) were used to estimate dental age. The age prediction accuracy was measured by five indicators: the coefficient of determination (R²), mean error (ME), root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE). Results The research showed that the ML models have better accuracy than the traditional Cameriere formula. The ME, MAE, MSE, and RMSE values of the SVM model (0.004, 0.489, 0.392, and 0.625, respectively) and the RF model (−0.004, 0.495, 0.389, and 0.623, respectively) were the lowest, corresponding to the highest accuracy. In contrast, the ME, MAE, MSE, and RMSE of the European Cameriere formula were 0.592, 0.846, 0.755, and 0.869, respectively, and those of the Chinese Cameriere formula were 0.748, 0.812, 0.890, and 0.943, respectively. Conclusions Compared to the Cameriere formula, ML methods based on Cameriere's maturation stages were more accurate in estimating dental age. These results support the use of ML algorithms instead of the traditional Cameriere formula.
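A minimal sketch of the evaluation loop described above (20 repeated random 80/20 splits of RF, SVM, and LR regressors, scored with ME, MAE, MSE, and RMSE); the maturity-ratio features and ages below are synthetic placeholders, not the study's radiographic data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(size=(748, 7))     # placeholder maturity ratios for 7 teeth
y = 5 + 8 * X.mean(axis=1) + rng.normal(scale=0.5, size=748)  # ages ~5-13

models = {"RF": RandomForestRegressor(random_state=0),
          "SVM": SVR(kernel="rbf"),
          "LR": LinearRegression()}

for name, model in models.items():
    me, mae, mse = [], [], []
    for seed in range(20):         # 20 repeated random 80/20 splits
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                                  random_state=seed)
        pred = model.fit(X_tr, y_tr).predict(X_te)
        me.append(np.mean(pred - y_te))       # mean (signed) error
        mae.append(mean_absolute_error(y_te, pred))
        mse.append(mean_squared_error(y_te, pred))
    print(name, "ME:", np.mean(me), "MAE:", np.mean(mae),
          "MSE:", np.mean(mse), "RMSE:", np.sqrt(np.mean(mse)))
```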


Author(s):  
Shler Farhad Khorshid ◽  
Adnan Mohsin Abdulazeez ◽  
Amira Bibo Sallow

Breast cancer is one of the most common diseases among women, accounting for many deaths each year. Even though cancer can be treated and cured in its early stages, many patients are diagnosed at a late stage. Data mining is the method of finding or extracting information from massive databases or datasets, and it is a field of computer science with a lot of potential. It covers a wide range of areas, one of which is classification. Classification can be accomplished using a variety of methods or algorithms. With the aid of MATLAB, five classification algorithms were compared. This paper presents a performance comparison among the classifiers: Support Vector Machine (SVM), Logistic Regression (LR), K-Nearest Neighbors (K-NN), Weighted K-Nearest Neighbors (Weighted K-NN), and Gaussian Naïve Bayes (Gaussian NB). The data set was taken from the UCI Machine Learning Repository. The main objective of this study is to classify breast cancer in women by applying machine learning algorithms and comparing them based on their accuracy. The results have revealed that Weighted K-NN (96.7%) has the highest accuracy among all the classifiers.
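The comparison was done in MATLAB; a hedged scikit-learn equivalent is sketched below, with weighted K-NN expressed as distance-weighted neighbors and accuracy estimated by cross-validation. The built-in scikit-learn copy of the UCI breast cancer dataset stands in for the exact data used.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)   # stand-in for the UCI dataset

classifiers = {
    "SVM": SVC(),
    "LR": LogisticRegression(max_iter=5000),
    "K-NN": KNeighborsClassifier(),
    "Weighted K-NN": KNeighborsClassifier(weights="distance"),
    "Gaussian NB": GaussianNB(),
}
for name, clf in classifiers.items():
    # Standardize features, then score each classifier with 10-fold CV.
    acc = cross_val_score(make_pipeline(StandardScaler(), clf), X, y, cv=10)
    print(f"{name}: mean accuracy {acc.mean():.3f}")
```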


Electronics ◽  
2021 ◽  
Vol 10 (18) ◽  
pp. 2266
Author(s):  
Shih-Lin Lin

In recent years, artificial intelligence technology has been widely used in fault prediction and health management (PHM). Machine learning algorithms are widely used in the condition monitoring of rotating machines, and normal and fault data can be obtained through a data acquisition and monitoring system. After analyzing the data and establishing a model, the system can automatically learn features from the input data to predict equipment failures for maintenance and diagnosis, which is important for motor maintenance. This research proposes a medium Gaussian support vector machine (SVM) method as an application of machine learning and constructs a feature space by extracting, based on experience, the characteristics of vibration signals collected in the field. Different methods were used to cluster and classify the features in order to classify motor health. The influence of different Gaussian kernel functions, such as fine, medium, and coarse, on the performance of the SVM algorithm was analyzed. The performance of the various models was verified experimentally using the data set released by the Case Western Reserve University Motor Bearing Data Center. As motors often face noise interference in real application environments, simulated Gaussian white noise was added to the original vibration data in order to verify the performance of the proposed method in a noisy environment. The results are compared with recently published classification results on related motor data sets obtained using different machine learning algorithms for motor fault detection and diagnosis. The results show that the medium Gaussian SVM method improves the reliability and accuracy of motor bearing fault estimation, detection, and identification under variable crack-size and load conditions. This paper also provides a detailed discussion of the predictive analytical capabilities of machine learning algorithms, which can serve as a reference for future predictive maintenance analysis of electric vehicle motors.
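A minimal sketch of two ideas from the abstract: fine/medium/coarse Gaussian kernels correspond roughly to different RBF kernel widths (here swept via gamma, an assumption about the mapping), and robustness can be probed by adding Gaussian white noise at a target SNR. The vibration features and labels below are synthetic placeholders, not the Case Western Reserve data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def add_awgn(signal, snr_db):
    """Add white Gaussian noise so the result has the requested SNR in dB."""
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10))
    return signal + rng.normal(scale=np.sqrt(p_noise), size=signal.shape)

X = rng.normal(size=(600, 12))                # placeholder vibration features
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)  # placeholder health label
X_noisy = add_awgn(X, snr_db=10)              # simulate a noisy environment

X_tr, X_te, y_tr, y_te = train_test_split(X_noisy, y, test_size=0.3,
                                          random_state=0)
# Fine/medium/coarse kernels as decreasing gamma (increasing kernel width).
for name, gamma in [("fine", 1.0), ("medium", 0.1), ("coarse", 0.01)]:
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=gamma))
    print(name, "accuracy:", clf.fit(X_tr, y_tr).score(X_te, y_te))
```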


Energies ◽  
2019 ◽  
Vol 12 (6) ◽  
pp. 1094 ◽  
Author(s):  
Moting Su ◽  
Zongyi Zhang ◽  
Ye Zhu ◽  
Donglan Zha

Natural gas is often described as the cleanest fossil fuel, and its consumption is increasing rapidly. Accurate prediction of natural gas spot prices would significantly benefit energy management, economic development, and environmental conservation. In this study, the least squares regression boosting (LSBoost) algorithm was used to forecast natural gas spot prices. LSBoost can fit regression ensembles well by minimizing the mean squared error. Henry Hub natural gas spot prices were investigated, and a long time series from January 2001 to December 2017 was selected. The LSBoost method was adopted to analyze the data series at daily, weekly, and monthly frequencies. An empirical study verified that the proposed prediction model achieves a high goodness of fit. Compared with existing approaches such as linear regression, linear support vector machine (SVM), quadratic SVM, and cubic SVM, the proposed LSBoost-based model showed better performance, with a higher R² and lower mean absolute error, mean square error, and root-mean-square error.
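LSBoost fits a regression ensemble stage-wise by least squares, so a close Python analogue is gradient boosting with squared-error loss; the sketch below applies it to a synthetic lagged price series, not the Henry Hub data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

rng = np.random.default_rng(0)
t = np.arange(3000)                                     # ~daily-scale series
price = 3 + np.sin(t / 200) + rng.normal(scale=0.2, size=t.size)

n_lags = 5                                              # predict the next value
X = np.column_stack([price[i:len(price) - n_lags + i] for i in range(n_lags)])
y = price[n_lags:]

split = int(0.8 * len(X))                               # chronological split
model = GradientBoostingRegressor(loss="squared_error")  # least-squares boosting
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
print("R2:", r2_score(y[split:], pred),
      "MAE:", mean_absolute_error(y[split:], pred),
      "RMSE:", np.sqrt(mean_squared_error(y[split:], pred)))
```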


PLoS ONE ◽  
2022 ◽  
Vol 17 (1) ◽  
pp. e0261659
Author(s):  
Friska Natalia ◽  
Julio Christian Young ◽  
Nunik Afriliana ◽  
Hira Meidia ◽  
Reyhan Eddy Yunus ◽  
...  

Abnormalities and defects that can cause lumbar spinal stenosis often occur in the Intervertebral Disc (IVD) of the patient's lumbar spine. Their automatic detection and classification require the application of an image analysis algorithm on suitable images, such as mid-sagittal images or traverse mid-height intervertebral disc slices, as inputs. Hence, the process of selecting and separating these images from the other medical images in the patient's set of scans is necessary. However, technological progress in automating this process still lags behind other areas of medical image classification research. In this paper, we report the results of our investigation into the suitability and performance of different machine learning approaches for automatically selecting the traverse plane that cuts closest to the half-height of an IVD from a database of lumbar spine MRI images. This study considers image features extracted using eleven different pre-trained Deep Convolutional Neural Network (DCNN) models. We investigate the effectiveness of three dimensionality-reduction techniques and three feature-selection techniques on the classification performance. We also investigate the performance of five different Machine Learning (ML) algorithms and three Fully Connected (FC) neural network learning optimizers used to train an image classifier with hyperparameter optimization over a wide range of hyperparameter options and values. The different combinations of methods are tested on a publicly available lumbar spine MRI dataset consisting of MRI studies of 515 patients with symptomatic back pain. Our experiments show that applying the Support Vector Machine algorithm with a short Gaussian kernel to full-length image features extracted using a pre-trained DenseNet201 model is the best approach. This approach gives a minimum per-class classification performance of around 0.88 when measured using the precision and recall metrics. The median performance measured using the precision metric ranges from 0.95 to 0.99, whereas that using the recall metric ranges from 0.93 to 1.0. When only considering the L3/L4, L4/L5, and L5/S1 classes, the minimum F1-scores range between 0.93 and 0.95, whereas the median F1-scores range between 0.97 and 0.99.
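A hedged sketch of the reported best combination (full-length DenseNet201 features feeding a Gaussian-kernel SVM, scored per class with precision and recall); the images, labels, and class count below are placeholders, not the lumbar spine dataset.

```python
import numpy as np
from tensorflow.keras.applications.densenet import DenseNet201, preprocess_input
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

backbone = DenseNet201(weights="imagenet", include_top=False, pooling="avg")

rng = np.random.default_rng(0)
images = rng.random((60, 224, 224, 3)) * 255.0   # placeholder MRI slices
labels = rng.integers(0, 5, size=60)             # placeholder IVD level classes

# Full-length pooled DCNN features, then a classical SVM classifier on top.
features = backbone.predict(preprocess_input(images.astype("float32")))
X_tr, X_te, y_tr, y_te = train_test_split(features, labels, test_size=0.2,
                                          stratify=labels, random_state=0)
clf = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)  # Gaussian kernel
print(classification_report(y_te, clf.predict(X_te)))   # per-class P/R/F1
```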


Author(s):  
Hui Wang ◽  
Tirusew Asefa ◽  
Jack Thornburgh

Abstract Understanding the relationship between raw water quality and chemical dosage is especially important for drinking water treatment plants (DWTP) that have multiple water sources, where the ratio of different supply sources can change with the seasons or in a matter of weeks in response to changing hydrologic conditions. In this study, the potential for deploying machine learning algorithms, including principal component regression (PCR), support vector regression (SVR), and long short-term memory (LSTM) neural networks, is tested by building predictive models. These tools were used to estimate chemical dosage at a daily time scale. Influent water quality variables such as pH, color, turbidity, and alkalinity, as well as chemical dosages including sulfuric acid, ferric sulfate, and liquid oxygen, were used to build and test the models. An 80/20 percent data split was used for training and testing, and model performance was evaluated using correlation coefficients, relative mean square error, relative root mean square error, and Nash-Sutcliffe efficiency. Results indicate that, compared to PCR, both SVR and LSTM were able to capture the nonlinear relationship between chemical dose and source water quality changes and displayed higher predictive skill. These types of models have application in real-time operational support without requiring computationally expensive physics-based models.
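A minimal sketch of the LSTM variant, assuming a sliding window of daily influent quality readings as input and a single chemical dose as target; the variable names, window length, and data are illustrative assumptions, not the plant's records.

```python
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

rng = np.random.default_rng(0)
n_days, window, n_features = 1000, 7, 4    # 7-day history of 4 quality variables
quality = rng.normal(size=(n_days, n_features))  # pH, color, turbidity, alkalinity

# Windowed inputs: each sample is the previous `window` days of quality data.
X = np.stack([quality[t - window:t] for t in range(window, n_days)])
# Placeholder dose that depends on the recent quality window (so it is learnable).
y = 0.5 * X[:, -1, 2] + 0.2 * X.mean(axis=(1, 2)) + rng.normal(scale=0.05,
                                                               size=len(X))
split = int(0.8 * len(X))                  # 80/20 train/test split

model = Sequential([LSTM(32, input_shape=(window, n_features)), Dense(1)])
model.compile(optimizer="adam", loss="mse")
model.fit(X[:split], y[:split], epochs=5, verbose=0)
print("test MSE:", model.evaluate(X[split:], y[split:], verbose=0))
```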


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict the Ki values of thrombin inhibitors based on a large data set using machine learning methods. Owing to its ability to find non-intuitive regularities in high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT), and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R²=0.84 and MSE=0.55 for the training set and R²=0.83 and MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
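A brief sketch of the y-randomization check mentioned above: the model is refit on shuffled targets, and its cross-validated R² should collapse toward or below zero, showing the real model did not learn chance correlations. Descriptors and targets here are synthetic stand-ins, not the thrombin data.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))          # placeholder selected descriptors
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.3, size=300)  # placeholder target

# Baseline: cross-validated R2 on the real targets.
true_r2 = cross_val_score(SVR(kernel="rbf"), X, y, cv=5, scoring="r2").mean()

# y-randomization: repeat the fit with shuffled targets.
rand_r2 = []
for _ in range(10):
    y_shuffled = rng.permutation(y)
    rand_r2.append(cross_val_score(SVR(kernel="rbf"), X, y_shuffled,
                                   cv=5, scoring="r2").mean())

print("real R2:", true_r2)
print("y-randomized R2 (mean):", np.mean(rand_r2))  # should be near/below zero
```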

