Towards low-cost and high-performance air pollution measurements using machine learning calibration techniques

Abstract. Air pollution is a key public health issue in urban areas worldwide. The development of low-cost air pollution sensors is consequently a major research priority. However, low-cost sensors often fail to attain sufficient measurement performance compared to state-of-the-art measurement stations, and typically require calibration procedures in expensive laboratory settings. As a result, there has been much debate about calibration techniques that could make their performance more reliable and that can be carried out without access to advanced laboratories. One repeatedly proposed strategy is low-cost sensor calibration through co-location with public measurement stations. The idea is that, using a regression function, the low-cost sensor signals can be calibrated against the station reference signal, so that the sensors can then be deployed separately with performance similar to that of the original stations. Here we test the idea of using machine learning algorithms for such regression tasks using hourly-averaged co-location data for nitrogen dioxide (NO2) and particulate matter of particle sizes smaller than 10 μm (PM10) at three different locations in the urban area of London, UK. Specifically, we compare the performance of Ridge regression, a linear statistical learning algorithm, to two non-linear algorithms in the form of Random Forest (RF) regression and Gaussian Process regression (GPR). We further benchmark the performance of all three machine learning methods against the more common Multiple Linear Regression (MLR). We obtain very good out-of-sample R2-scores (coefficient of determination) > 0.7, frequently exceeding 0.8, for the machine learning calibrated low-cost sensors. In contrast, the performance of MLR is more dependent on random variations in the sensor hardware and co-located signals, and is also more sensitive to the length of the co-location period. We find that, subject to certain conditions, GPR is typically the best performing method in our calibration setting, followed by Ridge regression and RF regression. However, we also highlight several key limitations of the machine learning methods, which will be crucial to consider in any co-location calibration. In particular, none of the methods is able to extrapolate to pollution levels well outside those encountered at training stage. Ultimately, this is one of the key limiting factors when sensors are deployed away from the co-location site itself. Consequently, we find that the linear Ridge method, which best mitigates such extrapolation effects, typically performs as well as, or even better than, GPR after sensor re-location. Overall, our results highlight the potential of co-location methods paired with machine learning calibration techniques to reduce costs of air pollution measurements, subject to careful consideration of the co-location training conditions, the choice of calibration variables, and the features of the calibration algorithm.

Machine learning calibration of low-cost NO2 and PM10 sensors: non-linear algorithms and their impact on site transferability

Atmospheric Measurement Techniques ◽

10.5194/amt-14-5637-2021 ◽

2021 ◽

Vol 14 (8) ◽

pp. 5637-5655 ◽

Cited By ~ 1

Author(s):

Peer Nowack ◽

Lev Konstantinovskiy ◽

Hannah Gardiner ◽

John Cant

Keyword(s):

Machine Learning ◽

Air Pollution ◽

Ridge Regression ◽

Low Cost ◽

Machine Learning Algorithms ◽

Limiting Factors ◽

Learning Methods ◽

Machine Learning Methods ◽

Non Linear ◽

Linear Algorithms

Abstract. Low-cost air pollution sensors often fail to attain sufficient performance compared with state-of-the-art measurement stations, and they typically require expensive laboratory-based calibration procedures. A repeatedly proposed strategy to overcome these limitations is calibration through co-location with public measurement stations. Here we test the idea of using machine learning algorithms for such calibration tasks using hourly-averaged co-location data for nitrogen dioxide (NO2) and particulate matter of particle sizes smaller than 10 µm (PM10) at three different locations in the urban area of London, UK. We compare the performance of ridge regression, a linear statistical learning algorithm, to two non-linear algorithms in the form of random forest regression (RFR) and Gaussian process regression (GPR). We further benchmark the performance of all three machine learning methods relative to the more common multiple linear regression (MLR). We obtain very good out-of-sample R2 scores (coefficient of determination) >0.7, frequently exceeding 0.8, for the machine learning calibrated low-cost sensors. In contrast, the performance of MLR is more dependent on random variations in the sensor hardware and co-located signals, and it is also more sensitive to the length of the co-location period. We find that, subject to certain conditions, GPR is typically the best-performing method in our calibration setting, followed by ridge regression and RFR. We also highlight several key limitations of the machine learning methods, which will be crucial to consider in any co-location calibration. In particular, all methods are fundamentally limited in how well they can reproduce pollution levels that lie outside those encountered at training stage. We find, however, that the linear ridge regression outperforms the non-linear methods in extrapolation settings. GPR can allow for a small degree of extrapolation, whereas RFR can only predict values within the training range. This algorithm-dependent ability to extrapolate is one of the key limiting factors when the calibrated sensors are deployed away from the co-location site itself. Consequently, we find that ridge regression often performs as well as or even better than GPR after sensor relocation. Our results highlight the potential of co-location approaches paired with machine learning calibration techniques to reduce costs of air pollution measurements, subject to careful consideration of the co-location training conditions, the choice of calibration variables and the features of the calibration algorithm.
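
The extrapolation limit described in this abstract can be illustrated with a small sketch. It is not the authors' code or data: the training set is synthetic, `fit_ridge_1d` is a hypothetical one-feature ridge fit in closed form, and `fit_binned_mean` is a hypothetical piecewise-constant regressor used here only as a simple stand-in for a tree-based model such as a random forest. The point is that a piecewise-constant model cannot return a value above the largest training target, whereas a linear model continues its trend beyond the training range.

```python
import random


def fit_ridge_1d(x, y, alpha=1.0):
    """Closed-form ridge fit for one feature with an unpenalised intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope towards zero
    intercept = y_mean - slope * x_mean
    return slope, intercept


def fit_binned_mean(x, y, n_bins=10):
    """Piecewise-constant regressor: the mean target within each input bin.

    Assumption: used only as a stand-in for a tree-based model, which also
    predicts averages of training targets.
    """
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for xi, yi in zip(x, y):
        b = min(int((xi - lo) / width), n_bins - 1)
        sums[b] += yi
        counts[b] += 1
    overall = sum(y) / len(y)  # fallback for an empty bin
    means = [s / c if c else overall for s, c in zip(sums, counts)]

    def predict(x_query):
        # Queries outside the training range fall into the nearest edge bin.
        b = int((x_query - lo) / width)
        return means[max(0, min(b, n_bins - 1))]

    return predict


# Synthetic co-location data: raw sensor signal between 10 and 50 (arbitrary units).
rng = random.Random(0)
x_train = [10 + 40 * i / 199 for i in range(200)]
y_train = [2.0 * xi + 5.0 + rng.gauss(0.0, 2.0) for xi in x_train]

slope, intercept = fit_ridge_1d(x_train, y_train)
binned_predict = fit_binned_mean(x_train, y_train)

x_new = 80.0  # a signal level well outside the training range
ridge_pred = slope * x_new + intercept
binned_pred = binned_predict(x_new)
print(f"ridge: {ridge_pred:.1f}, binned: {binned_pred:.1f}, max seen: {max(y_train):.1f}")
```

In this toy setting the linear fit happens to extrapolate well because the synthetic relationship is itself linear; with real sensors that assumption would need to be checked.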

Evaluation of Machine Learning Methods for Prediction of Multiphase Production Rates

10.2118/208648-ms ◽

2021 ◽

Author(s):

Anton Gryzlov ◽

Liliya Mironova ◽

Sergey Safonov ◽

Muhammad Arsalan

Keyword(s):

Machine Learning ◽

Focal Point ◽

Low Cost ◽

Continuous Production ◽

Machine Learning Algorithms ◽

Data Driven ◽

Learning Methods ◽

Machine Learning Methods ◽

Flow Metering ◽

Production Monitoring

Abstract Multiphase flow metering is an important tool for production monitoring and optimization. Although there are many technologies available on the market, the existing multiphase meters are only accurate to a certain extent and generally are expensive to purchase and maintain. Virtual flow metering (VFM) is a low-cost alternative to conventional production monitoring tools, which relies on mathematical modelling rather than the use of hardware instrumentation. Supported by the availability of the data from different sensors and production history, the development of different virtual flow metering systems has become a focal point for many companies. This paper discusses the importance of flow modelling for virtual flow metering. In addition, the main data-driven algorithms are introduced for the analysis of several dynamic production data sets. Artificial Neural Networks (ANN) together with advanced machine learning methods such as GRU and XGBoost have been considered as possible candidates for virtual flow metering. The obtained results indicate that the machine learning algorithms estimate oil, gas and water rates with acceptable accuracy. The feasibility of the data-driven virtual metering approach for continuous production monitoring purposes has been demonstrated via a series of simulation-based cases. Among the algorithms used, the deep learning methods provided the most accurate results combined with reasonable time for model training.

Are Machine Learning Methods the Future for Smoking Cessation Apps?

Sensors ◽

10.3390/s21134254 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4254

Author(s):

Maryam Abo-Tabik ◽

Yael Benn ◽

Nicholas Costen

Keyword(s):

Machine Learning ◽

Smoking Cessation ◽

Low Cost ◽

Smoking Behaviour ◽

Ease Of Use ◽

Machine Learning Algorithms ◽

Dramatic Improvement ◽

Learning Methods ◽

Mobile Phone Technology ◽

Machine Learning Methods

Smoking cessation apps provide efficient, low-cost and accessible support to smokers who are trying to quit smoking. This article focuses on how up-to-date machine learning algorithms, combined with the improvement of mobile phone technology, can enhance our understanding of smoking behaviour and support the development of advanced smoking cessation apps. In particular, we focus on the pros and cons of existing approaches that have been used in the design of smoking cessation apps to date, highlighting the need to improve the performance of these apps by minimizing reliance on self-reporting of environmental conditions (e.g., location), craving status and/or smoking events as a method of data collection. Lastly, we propose that making use of more advanced machine learning methods while enabling the processing of information about the user’s circumstances in real time is likely to result in dramatic improvement in our understanding of smoking behaviour, while also increasing the effectiveness and ease-of-use of smoking cessation apps, by enabling the provision of timely, targeted and personalised intervention.

Systematic literature review of machine learning methods used in the analysis of real-world data for patient-provider decision making

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01403-2 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Alan Brnabic ◽

Lisa M. Hess

Keyword(s):

Machine Learning ◽

Decision Making ◽

Literature Review ◽

Systematic Literature Review ◽

Real World ◽

Learning Algorithms ◽

External Validation ◽

Machine Learning Algorithms ◽

Learning Methods ◽

Machine Learning Methods

Abstract. Background: Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications to inform patient-provider decision making. Methods: This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. Results: A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. There were diverse methods, statistical packages and approaches used across identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist failed to be met by more than 50% of published studies. Conclusions: A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, that the model selection strategy is clearly defined, and that both internal and external validation are performed, so that decisions for patient care are made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.

Machine-learning based prediction of Cushing’s syndrome in dogs attending UK primary-care veterinary practice

Scientific Reports ◽

10.1038/s41598-021-88440-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Imogen Schofield ◽

David C. Brodbelt ◽

Noel Kennedy ◽

Stijn J. M. Niessen ◽

David B. Church ◽

...

Keyword(s):

Machine Learning ◽

Clinical Decision Making ◽

Predictive Performance ◽

Clinical Decision ◽

Cushing's Syndrome ◽

Machine Learning Algorithms ◽

Learning Methods ◽

Machine Learning Methods ◽

Clinical Records

Abstract Cushing’s syndrome is an endocrine disease in dogs that negatively impacts the quality of life of affected animals. Cushing’s syndrome can be a challenging diagnosis to confirm; therefore, new methods to aid diagnosis are warranted. Four machine-learning algorithms were applied to predict a future diagnosis of Cushing’s syndrome, using structured clinical data from the VetCompass programme in the UK. Dogs suspected of having Cushing’s syndrome were included in the analysis and classified based on their final reported diagnosis within their clinical records. Demographic and clinical features available at the point of first suspicion by the attending veterinarian were included within the models. The machine-learning methods were able to classify the recorded Cushing’s syndrome diagnoses with good predictive performance. The LASSO penalised regression model indicated the best overall performance when applied to the test set with an AUROC = 0.85 (95% CI 0.80–0.89), sensitivity = 0.71, specificity = 0.82, PPV = 0.75 and NPV = 0.78. The findings of our study indicate that machine-learning methods could predict the future diagnosis of a practicing veterinarian. New approaches using these methods could support clinical decision-making and contribute to improved diagnosis of Cushing’s syndrome in dogs.
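
For readers unfamiliar with the four threshold-based metrics reported in this abstract, the sketch below shows how sensitivity, specificity, PPV and NPV follow from the cells of a binary confusion matrix. The function name `diagnostic_metrics` and the counts are hypothetical and chosen only for illustration; they are not the study's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-cases that were cleared
        "ppv": tp / (tp + fp),          # share of positive predictions that were right
        "npv": tn / (tn + fn),          # share of negative predictions that were right
    }


# Hypothetical test-set counts (not taken from the paper).
metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on how common the condition is in the evaluated population, so they do not transfer directly to a population with a different prevalence.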

Acoustic feature-based sentiment analysis of call center data

10.32469/10355/66751 ◽

2017 ◽

Author(s):

Zeshan Peng

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Emotion Recognition ◽

Sentiment Analysis ◽

Call Center ◽

Machine Learning Algorithms ◽

Language Recognition ◽

Acoustic Features ◽

Learning Methods ◽

Machine Learning Methods

With the advancement of machine learning methods, audio sentiment analysis has become an active research area in recent years. For example, business organizations are interested in persuasion tactics from vocal cues and acoustic measures in speech. A typical approach is to find a set of acoustic features from audio data that can indicate or predict a customer's attitude, opinion, or emotional state. For audio signals, acoustic features have been widely used in many machine learning applications, such as music classification, language recognition, emotion recognition, and so on. For emotion recognition, previous work shows that pitch and speech rate are important features. This thesis work focuses on determining sentiment from call center audio records, each containing a conversation between a sales representative and a customer. The sentiment of an audio record is considered positive if the conversation ended with an appointment being made, and negative otherwise. In this project, a data processing and machine learning pipeline for this problem has been developed. It consists of three major steps: 1) an audio record is split into segments by speaker turns; 2) acoustic features are extracted from each segment; and 3) classification models are trained on the acoustic features to predict sentiment. Different sets of features have been used and different machine learning methods, including classical machine learning algorithms and deep neural networks, have been implemented in the pipeline. In our deep neural network method, the feature vectors of audio segments are stacked in temporal order into a feature matrix, which is fed into deep convolutional neural networks as input. Experimental results based on real data show that acoustic features, such as Mel frequency cepstral coefficients, timbre and Chroma features, are good indicators for sentiment. Temporal information in an audio record can be captured by deep convolutional neural networks for improved prediction accuracy.
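
The stacking step mentioned in this abstract can be sketched as follows. This is an illustration under assumptions, not the thesis implementation: the function `stack_segments`, the `(start_time, feature_vector)` layout, and the zero-padding and truncation to a fixed number of rows are all hypothetical choices made so that every record yields a matrix of the same shape.

```python
def stack_segments(segments, max_segments):
    """Stack per-segment feature vectors into a matrix in temporal order.

    segments: list of (start_time_seconds, feature_vector) pairs, in any order.
    Returns a list of max_segments rows; short records are zero-padded and
    long records are truncated (both are assumptions of this sketch).
    """
    ordered = sorted(segments, key=lambda seg: seg[0])
    n_features = len(ordered[0][1])
    rows = [list(vec) for _, vec in ordered[:max_segments]]
    while len(rows) < max_segments:
        rows.append([0.0] * n_features)
    return rows


# Three hypothetical speaker-turn segments with three acoustic features each.
segments = [
    (12.5, [0.3, 0.1, 0.9]),
    (0.0, [0.2, 0.4, 0.6]),
    (5.1, [0.7, 0.5, 0.8]),
]
feature_matrix = stack_segments(segments, max_segments=4)
for row in feature_matrix:
    print(row)
```

Keeping the rows in time order is what lets a convolutional filter spanning several adjacent rows respond to how features change across consecutive speaker turns.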

MODIS-FIRMS and ground-truthing based wildfire likelihood mapping of Sikkim Himalaya using machine learning algorithms.

10.21203/rs.3.rs-750123/v1 ◽

2021 ◽

Author(s):

Polash Banerjee

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Tree Cover ◽

Anthropogenic Factors ◽

Gradient Boosting ◽

Support Vector ◽

Learning Methods ◽

Sikkim Himalaya ◽

Environmental Features ◽

Machine Learning Methods

Abstract Wildfires of limited extent and intensity can be a boon for the forest ecosystem. However, the 2019 wildfires in Australia and Brazil are sad reminders of their heavy ecological and economic costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating them. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors have been considered for preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study on the efficiency of machine learning methods like Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Model (GBM) has been performed to identify the best performing algorithm in wildfire prediction. The study indicates that all the machine learning methods are good at predicting wildfires. However, RF performed best, followed by GBM. Also, environmental features like average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can be considered as a decision support tool for preparedness, efficient resource allocation and sensitization of people towards mitigation of wildfires in Sikkim.

Landslide susceptibility mapping using machine learning for Wenchuan County, Sichuan province, China

E3S Web of Conferences ◽

10.1051/e3sconf/202019803023 ◽

2020 ◽

Vol 198 ◽

pp. 03023

Author(s):

Xin Yang ◽

Rui Liu ◽

Luyao Li ◽

Mei Yang ◽

Yuantao Yang

Keyword(s):

Machine Learning ◽

Landslide Susceptibility ◽

Susceptibility Mapping ◽

Machine Learning Algorithms ◽

Landslide Susceptibility Mapping ◽

Support Vector ◽

Roc Curve Analysis ◽

Learning Methods ◽

Machine Learning Methods ◽

Boosted Decision Tree

Landslide susceptibility mapping is a method used to assess the probability and spatial distribution of landslide occurrences. Machine learning methods have been widely used in landslide susceptibility mapping in recent years. In this paper, six popular machine learning algorithms, namely logistic regression, multi-layer perceptron, random forests, support vector machine, AdaBoost, and gradient boosted decision tree, were used to construct landslide susceptibility models with a total of 1365 landslide points and 14 predisposing factors. Subsequently, landslide susceptibility maps (LSMs) were generated by the trained models. The LSMs show that the main landslide zone is concentrated in the southeastern area of Wenchuan County. The result of ROC curve analysis shows that all models fitted the training datasets and achieved satisfactory results on validation datasets. The results of this paper reveal that machine learning methods are feasible for building robust landslide susceptibility models.

Advanced Machine Learning Methods for Prediction of Fracture Closure Pressure

10.2118/200782-ms ◽

2021 ◽

Author(s):

Mohamed Ibrahim Mohamed ◽

Dinesh Mehta ◽

Erdal Ozkan

Keyword(s):

Machine Learning ◽

Integrated Approach ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Learning Methods ◽

Pressure Derivative ◽

Fracture Geometry ◽

Machine Learning Methods ◽

Personal Bias ◽

Fracture Closure

Abstract Determining the closure pressure is crucial for optimal hydraulic fracturing design and successful execution of fracturing treatment. Historically, the use of diagnostic tests before the main fracturing treatment has significantly advanced to gain more information about the pattern of fracture propagation and fluid performance to optimize the designs. The goal is to inject a small volume of fracturing fluid to break down the formation and create a small fracture geometry; once pumping is stopped, the pressure decline is analyzed to observe the fracture closure. Many analytical methods such as G-Function, square root of time, etc. have been developed to determine the fracture closure pressure. In some cases the fracture closure pressure is difficult to determine, and personal bias and field experience make it challenging to interpret the changes in the pressure derivative slope and identify fracture closure. These conditions include: 1) high-permeability reservoirs, where fracture closure occurs very fast due to the quick fluid leak-off; 2) extremely low-permeability reservoirs, which require a long shut-in time for the fluid to leak off and determine the fracture closure pressure; and 3) non-ideal fluid leak-off behavior under complex conditions. The objective of this study is to apply machine learning methods to implement a predesigned algorithm to execute the required tasks and predict the fracture closure pressure while minimizing the shortcomings in determining the closure pressure for non-ideal or subjective conditions. This paper demonstrates training different supervised machine learning algorithms to help predict fracture closure pressure. The workflow involves using the datasets to train and optimize the models, which subsequently are used to predict the closure pressure of testing data. The output results are then compared with actual results from more than 120 DFIT data points. We further propose an integrated approach to feature selection and dataset processing and study the effects of data processing on the success of the model prediction. The results from this study limit the subjectivity and the need for experienced personnel to interpret the data. We speculate that linear regression and MLP neural network algorithms can yield high scores in the prediction of fracture closure pressure.

Machine Learning Approaches for Sentiment Analysis

Data Mining and Analysis in the Engineering Field - Advances in Data Mining and Database Management ◽

10.4018/978-1-4666-6086-1.ch011 ◽

2014 ◽

pp. 193-208 ◽

Cited By ~ 9

Author(s):

Basant Agarwal ◽

Namita Mittal

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Opinion Mining ◽

Machine Learning Algorithms ◽

Sentiment Classification ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Knowledge Based ◽

Semantic Orientation

Opinion Mining or Sentiment Analysis is the study that analyzes people's opinions or sentiments from the text towards entities such as products and services. It has always been important to know what other people think. With the rapid growth in the availability and popularity of online review sites, blogs, forums, and social networking sites, the need to analyse and understand these reviews has arisen. The main approaches for sentiment analysis can be categorized into semantic orientation-based, knowledge-based, and machine-learning approaches. This chapter surveys the machine learning approaches applied to sentiment analysis-based applications. The main emphasis of this chapter is to discuss the research involved in applying machine learning methods mostly for sentiment classification at document level. Machine learning-based approaches work in the following phases, which are discussed in detail in this chapter for sentiment classification: (1) feature extraction, (2) feature weighting schemes, (3) feature selection, and (4) machine-learning methods. This chapter also discusses the standard free benchmark datasets and evaluation methods for sentiment analysis. The authors conclude the chapter with a comparative study of some state-of-the-art methods for sentiment analysis and some possible future research directions in opinion mining and sentiment analysis.
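
The four phases listed in this abstract can be sketched end to end on a toy example. Everything below is an assumption made for illustration and is not taken from the chapter: the six training sentences, the function names, bag-of-words counts for extraction, TF-IDF for weighting, selection by the gap between class means, and a simple difference-of-class-means linear scorer standing in for the machine-learning method.

```python
import math
from collections import Counter

# Hypothetical labelled reviews: 1 = positive, 0 = negative.
TRAIN = [
    ("great battery and great screen", 1),
    ("love this phone", 1),
    ("excellent camera love it", 1),
    ("terrible battery", 0),
    ("awful screen and poor camera", 0),
    ("poor phone hate it", 0),
]


def extract(text):
    """Phase 1, feature extraction: bag-of-words term counts."""
    return Counter(text.lower().split())


def fit_idf(counted_docs):
    """Inverse document frequency for every term seen in training."""
    n_docs = len(counted_docs)
    doc_freq = Counter(term for counts in counted_docs for term in counts)
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}


def weight(counts, idf):
    """Phase 2, feature weighting: TF-IDF; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def class_mean(vectors):
    """Mean weighted vector of one class."""
    total = Counter()
    for vec in vectors:
        for term, value in vec.items():
            total[term] += value
    return {term: value / len(vectors) for term, value in total.items()}


def train(examples, k=8):
    counted = [extract(text) for text, _ in examples]
    idf = fit_idf(counted)
    vectors = [weight(c, idf) for c in counted]
    pos = class_mean([v for v, (_, y) in zip(vectors, examples) if y == 1])
    neg = class_mean([v for v, (_, y) in zip(vectors, examples) if y == 0])
    diff = {t: pos.get(t, 0.0) - neg.get(t, 0.0) for t in idf}
    # Phase 3, feature selection: keep the k terms whose class means differ most.
    selected = sorted(diff, key=lambda t: (-abs(diff[t]), t))[:k]
    # Phase 4, learning method: linear scorer with difference-of-means weights.
    return idf, {t: diff[t] for t in selected}


def predict(text, idf, weights):
    vec = weight(extract(text), idf)
    score = sum(value * weights.get(term, 0.0) for term, value in vec.items())
    return 1 if score > 0 else 0


idf, weights = train(TRAIN)
print(predict("love this great battery", idf, weights))
print(predict("awful screen and poor battery", idf, weights))
```

Each phase is a separate function, so any one of them could be swapped for another scheme without touching the others.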
