Evaluation of Machine Learning Methods for Prediction of Multiphase Production Rates

Abstract. Multiphase flow metering is an important tool for production monitoring and optimization. Although there are many technologies available on the market, the existing multiphase meters are only accurate to a certain extent and are generally expensive to purchase and maintain. Virtual flow metering (VFM) is a low-cost alternative to conventional production monitoring tools, which relies on mathematical modelling rather than the use of hardware instrumentation. Supported by the availability of data from different sensors and production history, the development of virtual flow metering systems has become a focal point for many companies. This paper discusses the importance of flow modelling for virtual flow metering. In addition, the main data-driven algorithms are introduced for the analysis of several dynamic production data sets. Artificial Neural Networks (ANN), together with advanced machine learning methods such as GRU and XGBoost, have been considered as possible candidates for virtual flow metering. The obtained results indicate that the machine learning algorithms estimate oil, gas and water rates with acceptable accuracy. The feasibility of the data-driven virtual metering approach for continuous production monitoring purposes has been demonstrated via a series of simulation-based cases. Among the algorithms used, the deep learning methods provided the most accurate results combined with a reasonable model training time.

Predicting rice blast disease: machine learning versus process-based models

BMC Bioinformatics ◽

10.1186/s12859-019-3065-1 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 1

Author(s):

David F. Nettleton ◽

Dimitrios Katsantonis ◽

Argyris Kalaitzidis ◽

Natasa Sarafijanovic-Djukic ◽

Pau Puigdollers ◽

...

Keyword(s):

Machine Learning ◽

Rice Blast ◽

Machine Learning Algorithms ◽

Rice Blast Disease ◽

Blast Disease ◽

Data Driven ◽

Learning Methods ◽

Machine Learning Methods ◽

Plant Disease Management ◽

Process Based Models

Abstract. Background: In this study, we compared four models for predicting rice blast disease: two operational process-based models (Yoshino and Water Accounting Rice Model (WARM)) and two approaches based on machine learning algorithms (M5Rules and Recurrent Neural Networks (RNN)), the former inducing a rule-based model and the latter building a neural network. In situ telemetry is important to obtain quality in-field data for predictive models and this was a key aspect of the RICE-GUARD project on which this study is based. According to the authors, this is the first time process-based and machine learning modelling approaches for supporting plant disease management are compared. Results: The models succeeded in providing a warning of rice blast onset and presence, thus representing suitable solutions for preventive remedial actions targeting the mitigation of yield losses and the reduction of fungicide use. All methods gave significant “signals” during the “early warning” period, with a similar level of performance. M5Rules and WARM gave the maximum average normalized scores of 0.80 and 0.77, respectively, whereas Yoshino gave the best score for one site (Kalochori 2015). The best average values of r, r² and %MAE (Mean Absolute Error) for the machine learning models were 0.70, 0.50 and 0.75, respectively, and for the process-based models the corresponding values were 0.59, 0.40 and 0.82. Thus the ML models are competitive with the process-based models. This result has relevant implications for the operational use of the models, since most of the available studies are limited to the analysis of the relationship between the model outputs and the incidence of rice blast. Results also showed that machine learning methods approximated the performances of two process-based models used for years in operational contexts.
Conclusions: Process-based and data-driven models can be used to provide early warnings to anticipate rice blast and detect its presence, thus supporting fungicide applications. Data-driven models derived from machine learning methods are a viable alternative to process-based approaches and, in cases when training datasets are available, offer a potentially greater adaptability to new contexts.
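
The agreement scores named in the abstract (r, r² and a mean absolute error) can be computed as in the following minimal sketch. The observed and predicted series are hypothetical illustration values, not data from the study, and the abstract does not say how "%MAE" is normalized, so the sketch computes a plain, unnormalized MAE.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observations and predictions."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical disease-severity series (assumed values, for illustration only).
observed = [0.0, 0.1, 0.4, 0.8, 1.0]
predicted = [0.1, 0.1, 0.3, 0.6, 0.9]

r = pearson_r(observed, predicted)
print("r   =", round(r, 3))
print("r^2 =", round(r * r, 3))
print("MAE =", round(mean_absolute_error(observed, predicted), 3))
```

Reporting r² alongside MAE is useful because a model can correlate well with the observations while still being biased in absolute terms.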

Towards low-cost and high-performance air pollution measurements using machine learning calibration techniques

10.5194/amt-2020-473 ◽

2020 ◽

Author(s):

Peer Nowack ◽

Lev Konstantinovskiy ◽

Hannah Gardiner ◽

John Cant

Keyword(s):

Machine Learning ◽

Air Pollution ◽

Ridge Regression ◽

Low Cost ◽

Reference Signal ◽

Machine Learning Algorithms ◽

Public Health Issue ◽

Learning Methods ◽

Machine Learning Methods ◽

Calibration Techniques

Abstract. Air pollution is a key public health issue in urban areas worldwide. The development of low-cost air pollution sensors is consequently a major research priority. However, low-cost sensors often fail to attain sufficient measurement performance compared to state-of-the-art measurement stations, and typically require calibration procedures in expensive laboratory settings. As a result, there has been much debate about calibration techniques that could make their performance more reliable, while also developing calibration procedures that can be carried out without access to advanced laboratories. One repeatedly proposed strategy is low-cost sensor calibration through co-location with public measurement stations. The idea is that, using a regression function, the low-cost sensor signals can be calibrated against the station reference signal, to be then deployed separately with performances similar to the original stations. Here we test the idea of using machine learning algorithms for such regression tasks using hourly-averaged co-location data for nitrogen dioxide (NO2) and particulate matter of particle sizes smaller than 10 μm (PM10) at three different locations in the urban area of London, UK. Specifically, we compare the performance of Ridge regression, a linear statistical learning algorithm, to two non-linear algorithms in the form of Random Forest (RF) regression and Gaussian Process regression (GPR). We further benchmark the performance of all three machine learning methods to the more common Multiple Linear Regression (MLR). We obtain very good out-of-sample R2-scores (coefficient of determination) > 0.7, frequently exceeding 0.8, for the machine learning calibrated low-cost sensors. In contrast, the performance of MLR is more dependent on random variations in the sensor hardware and co-located signals, and is also more sensitive to the length of the co-location period. 
We find that, subject to certain conditions, GPR is typically the best performing method in our calibration setting, followed by Ridge regression and RF regression. However, we also highlight several key limitations of the machine learning methods, which will be crucial to consider in any co-location calibration. In particular, none of the methods is able to extrapolate to pollution levels well outside those encountered at the training stage. Ultimately, this is one of the key limiting factors when sensors are deployed away from the co-location site itself. Consequently, we find that the linear Ridge method, which best mitigates such extrapolation effects, typically performs as well as, or even better than, GPR after sensor re-location. Overall, our results highlight the potential of co-location methods paired with machine learning calibration techniques to reduce costs of air pollution measurements, subject to careful consideration of the co-location training conditions, the choice of calibration variables, and the features of the calibration algorithm.
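
The co-location idea described above (fit a regression from the low-cost sensor signal to the station reference, then score it out of sample with R²) can be sketched as follows. This is a simplified illustration under stated assumptions: a single calibration variable instead of the multiple signals a real setup would use, a hand-picked penalty `lam`, and made-up readings. The function names are hypothetical and are not taken from the paper.

```python
def fit_ridge_1d(x, y, lam):
    """Ridge regression with one feature and an unpenalized intercept.

    Minimizes sum((y - a - b*x)^2) + lam * b^2 in closed form.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / (sxx + lam)          # the penalty shrinks the slope towards zero
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Assumed hourly readings: raw low-cost sensor signal vs. station reference.
sensor_train = [12.0, 18.0, 25.0, 31.0, 40.0, 47.0]
station_train = [20.5, 27.0, 36.0, 41.5, 52.0, 60.0]
sensor_test = [15.0, 28.0, 44.0]
station_test = [24.0, 38.5, 57.0]

intercept, slope = fit_ridge_1d(sensor_train, station_train, lam=1.0)
calibrated = [intercept + slope * s for s in sensor_test]
print("out-of-sample R^2:", round(r2_score(station_test, calibrated), 3))
```

Setting `lam=0` reduces the fit to ordinary least squares; with several correlated calibration variables the penalty matters more than it does in this one-variable sketch.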

Data Driven Natural Gas Spot Price Prediction Models Using Machine Learning Methods

Energies ◽

10.3390/en12091680 ◽

2019 ◽

Vol 12 (9) ◽

pp. 1680 ◽

Cited By ~ 7

Author(s):

Moting Su ◽

Zongyi Zhang ◽

Ye Zhu ◽

Donglan Zha ◽

Wenying Wen

Keyword(s):

Machine Learning ◽

Natural Gas ◽

Machine Learning Algorithms ◽

Data Driven ◽

Spot Price ◽

Support Vector ◽

Price Forecasting ◽

Learning Methods ◽

Machine Learning Methods ◽

Gas Price

Natural gas has been proposed as a solution to increase the security of energy supply and reduce environmental pollution around the world. Being able to forecast natural gas price benefits various stakeholders and has become a very valuable tool for all market participants in competitive natural gas markets. Machine learning algorithms have gradually become popular tools for natural gas price forecasting. In this paper, we investigate data-driven predictive models for natural gas price forecasting based on common machine learning tools, i.e., artificial neural networks (ANN), support vector machines (SVM), gradient boosting machines (GBM), and Gaussian process regression (GPR). We harness the method of cross-validation for model training and monthly Henry Hub natural gas spot price data from January 2001 to October 2018 for evaluation. Results show that these four machine learning methods have different performance in predicting natural gas prices. However, overall ANN reveals better prediction performance compared with SVM, GBM, and GPR.
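
The abstract mentions cross-validation without naming the scheme. As a rough sketch, plain k-fold splitting over contiguous blocks could look like the code below. The choice of k = 5 and of contiguous folds are assumptions, and the sample count of 214 is simply the number of months from January 2001 to October 2018 inclusive.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


n_months = 214  # January 2001 to October 2018 inclusive
for fold, (train_idx, val_idx) in enumerate(k_fold_indices(n_months, k=5)):
    print("fold", fold, "train size", len(train_idx), "validation size", len(val_idx))
```

For price series, a forward-chaining split (training only on months before the validation block) is often preferred because it avoids training on the future; whether the paper did so is not stated in the abstract.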

Are Machine Learning Methods the Future for Smoking Cessation Apps?

Sensors ◽

10.3390/s21134254 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4254

Author(s):

Maryam Abo-Tabik ◽

Yael Benn ◽

Nicholas Costen

Keyword(s):

Machine Learning ◽

Smoking Cessation ◽

Low Cost ◽

Smoking Behaviour ◽

Ease Of Use ◽

Machine Learning Algorithms ◽

Dramatic Improvement ◽

Learning Methods ◽

Mobile Phone Technology ◽

Machine Learning Methods

Smoking cessation apps provide efficient, low-cost and accessible support to smokers who are trying to quit smoking. This article focuses on how up-to-date machine learning algorithms, combined with the improvement of mobile phone technology, can enhance our understanding of smoking behaviour and support the development of advanced smoking cessation apps. In particular, we focus on the pros and cons of existing approaches that have been used in the design of smoking cessation apps to date, highlighting the need to improve the performance of these apps by minimizing reliance on self-reporting of environmental conditions (e.g., location), craving status and/or smoking events as a method of data collection. Lastly, we propose that making use of more advanced machine learning methods while enabling the processing of information about the user’s circumstances in real time is likely to result in dramatic improvement in our understanding of smoking behaviour, while also increasing the effectiveness and ease-of-use of smoking cessation apps, by enabling the provision of timely, targeted and personalised intervention.

Machine learning calibration of low-cost NO2 and PM10 sensors: non-linear algorithms and their impact on site transferability

Atmospheric Measurement Techniques ◽

10.5194/amt-14-5637-2021 ◽

2021 ◽

Vol 14 (8) ◽

pp. 5637-5655 ◽

Cited By ~ 1

Author(s):

Peer Nowack ◽

Lev Konstantinovskiy ◽

Hannah Gardiner ◽

John Cant

Keyword(s):

Machine Learning ◽

Air Pollution ◽

Ridge Regression ◽

Low Cost ◽

Machine Learning Algorithms ◽

Limiting Factors ◽

Learning Methods ◽

Machine Learning Methods ◽

Non Linear ◽

Linear Algorithms

Abstract. Low-cost air pollution sensors often fail to attain sufficient performance compared with state-of-the-art measurement stations, and they typically require expensive laboratory-based calibration procedures. A repeatedly proposed strategy to overcome these limitations is calibration through co-location with public measurement stations. Here we test the idea of using machine learning algorithms for such calibration tasks using hourly-averaged co-location data for nitrogen dioxide (NO2) and particulate matter of particle sizes smaller than 10 µm (PM10) at three different locations in the urban area of London, UK. We compare the performance of ridge regression, a linear statistical learning algorithm, to two non-linear algorithms in the form of random forest regression (RFR) and Gaussian process regression (GPR). We further benchmark the performance of all three machine learning methods relative to the more common multiple linear regression (MLR). We obtain very good out-of-sample R2 scores (coefficient of determination) >0.7, frequently exceeding 0.8, for the machine learning calibrated low-cost sensors. In contrast, the performance of MLR is more dependent on random variations in the sensor hardware and co-located signals, and it is also more sensitive to the length of the co-location period. We find that, subject to certain conditions, GPR is typically the best-performing method in our calibration setting, followed by ridge regression and RFR. We also highlight several key limitations of the machine learning methods, which will be crucial to consider in any co-location calibration. In particular, all methods are fundamentally limited in how well they can reproduce pollution levels that lie outside those encountered at training stage. We find, however, that the linear ridge regression outperforms the non-linear methods in extrapolation settings. GPR can allow for a small degree of extrapolation, whereas RFR can only predict values within the training range. 
This algorithm-dependent ability to extrapolate is one of the key limiting factors when the calibrated sensors are deployed away from the co-location site itself. Consequently, we find that ridge regression often performs as well as or even better than GPR after sensor relocation. Our results highlight the potential of co-location approaches paired with machine learning calibration techniques to reduce costs of air pollution measurements, subject to careful consideration of the co-location training conditions, the choice of calibration variables and the features of the calibration algorithm.
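
The extrapolation limit described in this abstract can be illustrated with a toy comparison. The sketch below is not the paper's setup: a k-nearest-neighbour averager stands in for random forest regression, since both predict averages of training targets and therefore cannot leave the range of values seen in training, whereas a linear fit continues its trend. All readings are made up.

```python
def fit_line(x, y):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def knn_predict(x_train, y_train, x_new, k=3):
    """Average the targets of the k training points closest to x_new."""
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))[:k]
    return sum(y_train[i] for i in nearest) / k


# Assumed training data: sensor signal vs. reference, following y = x + 2.
x_train = [10.0, 20.0, 30.0, 40.0, 50.0]
y_train = [12.0, 22.0, 32.0, 42.0, 52.0]

x_new = 100.0  # far outside the training range
intercept, slope = fit_line(x_train, y_train)
linear_pred = intercept + slope * x_new
averaging_pred = knn_predict(x_train, y_train, x_new)

print("linear prediction:   ", linear_pred)
print("averaging prediction:", averaging_pred)
print("max training target: ", max(y_train))
```

The averaging predictor stays at or below the largest training target no matter how large the new signal is, which mirrors the abstract's point that RFR can only predict values within the training range.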

Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01403-2 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Alan Brnabic ◽

Lisa M. Hess

Keyword(s):

Machine Learning ◽

Decision Making ◽

Literature Review ◽

Systematic Literature Review ◽

Real World ◽

Learning Algorithms ◽

External Validation ◽

Machine Learning Algorithms ◽

Learning Methods ◽

Machine Learning Methods

Abstract. Background: Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications to inform patient-provider decision making. Methods: This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. Results: A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. There were diverse methods, statistical packages and approaches used across identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist failed to be met by more than 50% of published studies. Conclusions: A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, that the model selection strategy is clearly defined, and that both internal and external validation are performed, so that decisions for patient care are made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.

Machine-learning based prediction of Cushing’s syndrome in dogs attending UK primary-care veterinary practice

Scientific Reports ◽

10.1038/s41598-021-88440-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Imogen Schofield ◽

David C. Brodbelt ◽

Noel Kennedy ◽

Stijn J. M. Niessen ◽

David B. Church ◽

...

Keyword(s):

Machine Learning ◽

Cushing's Syndrome ◽

Clinical Decision Making ◽

Predictive Performance ◽

Clinical Decision ◽

Machine Learning Algorithms ◽

Learning Methods ◽

Machine Learning Methods ◽

Clinical Records

Abstract. Cushing’s syndrome is an endocrine disease in dogs that negatively impacts upon the quality-of-life of affected animals. Cushing’s syndrome can be a challenging diagnosis to confirm, therefore new methods to aid diagnosis are warranted. Four machine-learning algorithms were applied to predict a future diagnosis of Cushing's syndrome, using structured clinical data from the VetCompass programme in the UK. Dogs suspected of having Cushing's syndrome were included in the analysis and classified based on their final reported diagnosis within their clinical records. Demographic and clinical features available at the point of first suspicion by the attending veterinarian were included within the models. The machine-learning methods were able to classify the recorded Cushing’s syndrome diagnoses, with good predictive performance. The LASSO penalised regression model indicated the best overall performance when applied to the test set with an AUROC = 0.85 (95% CI 0.80–0.89), sensitivity = 0.71, specificity = 0.82, PPV = 0.75 and NPV = 0.78. The findings of our study indicate that machine-learning methods could predict the future diagnosis of a practicing veterinarian. New approaches using these methods could support clinical decision-making and contribute to improved diagnosis of Cushing’s syndrome in dogs.
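
The sensitivity, specificity, PPV and NPV quoted above are all derived from the four cells of a confusion matrix. The sketch below shows the standard definitions. The counts are hypothetical and chosen only for easy arithmetic; they are not the study's test-set counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for a binary classifier."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-cases that were cleared
        "ppv": tp / (tp + fp),          # share of flagged dogs that are true cases
        "npv": tn / (tn + fn),          # share of cleared dogs that are non-cases
    }


# Hypothetical counts (assumed for illustration, not taken from the paper).
metrics = classification_metrics(tp=40, fp=20, tn=80, fn=10)
for name, value in metrics.items():
    print(name, "=", round(value, 2))
```

Unlike sensitivity and specificity, PPV and NPV depend on how common the disease is in the tested population, so they do not transfer directly to a population with a different prevalence.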

Estimation of data-driven streamflow predicting models using machine learning methods

Arabian Journal of Geosciences ◽

10.1007/s12517-021-07446-z ◽

2021 ◽

Vol 14 (11) ◽

Author(s):

Tanveer Ahmed Siddiqi ◽

Saima Ashraf ◽

Sadiq Ali Khan ◽

Muhammad Jawed Iqbal

Keyword(s):

Machine Learning ◽

Data Driven ◽

Learning Methods ◽

Machine Learning Methods

Thermal Load Prediction of Communal District Heating Systems by Applying Data-Driven Machine Learning Methods

SSRN Electronic Journal ◽

10.2139/ssrn.3870973 ◽

2021 ◽

Author(s):

Nikolaos Panagiotis Sakkas ◽

Roger Abang

Keyword(s):

Machine Learning ◽

Thermal Load ◽

District Heating ◽

Data Driven ◽

Load Prediction ◽

Learning Methods ◽

Heating Systems ◽

Machine Learning Methods ◽

District Heating Systems

Acoustic feature-based sentiment analysis of call center data

10.32469/10355/66751 ◽

2017 ◽

Author(s):

Zeshan Peng

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Emotion Recognition ◽

Sentiment Analysis ◽

Call Center ◽

Machine Learning Algorithms ◽

Language Recognition ◽

Acoustic Features ◽

Learning Methods ◽

Machine Learning Methods

With the advancement of machine learning methods, audio sentiment analysis has become an active research area in recent years. For example, business organizations are interested in persuasion tactics from vocal cues and acoustic measures in speech. A typical approach is to find a set of acoustic features from audio data that can indicate or predict a customer's attitude, opinion, or emotional state. For audio signals, acoustic features have been widely used in many machine learning applications, such as music classification, language recognition, emotion recognition, and so on. For emotion recognition, previous work shows that pitch and speech rate are important features. This thesis work focuses on determining sentiment from call center audio records, each containing a conversation between a sales representative and a customer. The sentiment of an audio record is considered positive if the conversation ended with an appointment being made, and is negative otherwise. In this project, a data processing and machine learning pipeline for this problem has been developed. It consists of three major steps: 1) an audio record is split into segments by speaker turns; 2) acoustic features are extracted from each segment; and 3) classification models are trained on the acoustic features to predict sentiment. Different sets of features have been used and different machine learning methods, including classical machine learning algorithms and deep neural networks, have been implemented in the pipeline. In our deep neural network method, the feature vectors of audio segments are stacked in temporal order into a feature matrix, which is fed into deep convolutional neural networks as input. Experimental results based on real data show that acoustic features, such as Mel frequency cepstral coefficients, timbre and Chroma features, are good indicators for sentiment. 
Temporal information in an audio record can be captured by deep convolutional neural networks for improved prediction accuracy.
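
The stacking step described above (per-segment feature vectors arranged in temporal order into one matrix) might look like the following sketch. The zero-padding and truncation to a fixed number of rows are assumptions made here so that every record yields a matrix of the same shape; the abstract does not say how variable-length records are handled. The function name and the feature values are hypothetical.

```python
def stack_segments(segments, max_segments, n_features):
    """Build a fixed-size feature matrix from (start_time, feature_vector) pairs.

    Rows follow temporal order. Records with more than max_segments segments
    are truncated, shorter ones are zero-padded (both are assumptions).
    """
    ordered = sorted(segments, key=lambda seg: seg[0])
    rows = []
    for _, vector in ordered[:max_segments]:
        if len(vector) != n_features:
            raise ValueError("every segment needs exactly n_features values")
        rows.append(list(vector))
    while len(rows) < max_segments:
        rows.append([0.0] * n_features)
    return rows


# Assumed per-segment features (e.g. two summary statistics per speaker turn).
segments = [
    (12.4, [0.7, 0.2]),  # later speaker turn
    (0.0, [0.1, 0.9]),   # first speaker turn
    (5.1, [0.4, 0.5]),
]
matrix = stack_segments(segments, max_segments=4, n_features=2)
for row in matrix:
    print(row)
```

Keeping the rows in temporal order is what lets a convolutional network pick up patterns across neighbouring speaker turns, which is the temporal information the abstract refers to.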
