Reconstruction of high frequency methane peaks from measurements of metal oxide low-cost sensors using machine learning

Author(s):  
Rodrigo Rivera Martinez ◽  
Diego Santaren ◽  
Olivier Laurent ◽  
Ford Cropley ◽  
Cecile Mallet ◽  
...  

<p>Deploying a dense network of sensors around emitting industrial facilities makes it possible to detect and quantify possible CH<sub>4</sub> leaks and to monitor emissions continuously. Building such a monitoring network from highly precise instruments is limited by their elevated cost and by power-consumption and maintenance requirements. Low-cost, low-power metal oxide sensors could be a convenient alternative for deploying this kind of network at a fraction of the cost, with measurement quality satisfactory for such applications.</p><p>Recent studies have tested Metal Oxide Sensors (MO<sub>x</sub>) under natural and controlled conditions to measure atmospheric methane concentrations and showed fair agreement with high-precision instruments such as Cavity Ring-Down Spectrometers (CRDS). Such results open perspectives regarding the potential of MO<sub>x</sub> sensors as an alternative for measuring and quantifying CH<sub>4</sub> emissions at industrial facilities. However, these sensors are known to drift with time, to be highly sensitive to the water vapor mole fraction, to have poor selectivity with several known cross-sensitivities to other species, and to be significantly sensitive to environmental factors such as temperature and pressure. Different approaches to deriving CH<sub>4</sub> mole fractions from the MO<sub>x</sub> signal and ancillary parameter measurements have been employed to overcome these problems, from traditional approaches such as linear or multilinear regressions to machine learning (ANN, SVM, or random forest).</p><p>Most studies have focused on the derivation of ambient CH<sub>4</sub> concentrations under different conditions, but few tests have assessed the ability of these sensors to capture CH<sub>4</sub> variations at high frequency, with peaks of elevated concentrations, which corresponds to the signal observed from point sources at industrial sites with leaks and isolated methane emissions. 
We conducted a continuous controlled experiment over four months (November 2019 to February 2020) in which three types of Figaro® MO<sub>x</sub> sensors measured high-frequency CH<sub>4</sub> peaks, with concentrations varying from atmospheric background levels up to 24 ppm, at LSCE, Saclay, France. We developed a calibration strategy including a two-step baseline correction and compared different approaches to reconstructing CH<sub>4</sub> spikes, such as linear, multilinear, and polynomial regression, as well as ANN and random forest algorithms. We found that baseline correction in the pre-processing stage improved the reconstruction of CH<sub>4</sub> concentrations in the spikes. The random forest models performed better than the other methods, achieving a mean RMSE of 0.25 ppm when reconstructing peak amplitudes over windows of 4 days. In addition, we conducted tests to determine the minimum amount of data required to train successful models for predicting CH<sub>4</sub> spikes, and the frequency of re-calibration / re-training needed under these controlled circumstances. We concluded that for a target RMSE <= 0.3 ppm at a measurement frequency of 5 s, 4 days of training are required, and a re-calibration / re-training is recommended every 30 days.</p><p>Our study presents a new approach to processing and reconstructing observations from low-cost CH<sub>4</sub> sensors and highlights their potential to quantify high-concentration releases at industrial facilities.</p>
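The random-forest reconstruction described above can be sketched on synthetic data. Everything below (variable names, value ranges, the nonlinear response function) is an illustrative assumption, not the study's dataset or sensor model:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 2000  # stand-in for a window of 5 s samples

# Hypothetical predictors: normalized MOx signal, temperature, water vapour
mox = rng.uniform(0.2, 1.0, n)
temp = rng.uniform(10.0, 30.0, n)
h2o = rng.uniform(0.5, 2.0, n)

# Synthetic CH4 target (ppm): background plus a nonlinear sensor response
ch4 = 2.0 + 20.0 * (1.0 - mox) ** 2 + 0.1 * h2o + rng.normal(0.0, 0.1, n)

X = np.column_stack([mox, temp, h2o])
# First 80% as the "training window", remainder held out
X_train, X_test = X[:1600], X[1600:]
y_train, y_test = ch4[:1600], ch4[1600:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"held-out RMSE: {rmse:.2f} ppm")
```

In the study itself the predictors would be the baseline-corrected MO<sub>x</sub> signals plus ancillary measurements, trained over a four-day window as described above.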

2021 ◽  
Author(s):  
Yiqi Jack Gao ◽  
Yu Sun

The start of 2020 marked the beginning of the deadly COVID-19 pandemic caused by the novel SARS-CoV-2 from Wuhan, China. As of the time of writing, the virus had infected over 150 million people worldwide and resulted in more than 3.5 million global deaths. Accurate future predictions made through machine learning algorithms can be very useful as a guide for hospitals and policy makers to make adequate preparations and enact effective policies to combat the pandemic. This paper carries out a two-pronged approach to analyzing COVID-19. First, the model uses the feature importances of a random forest regressor to select eight of the most significant predictors (date, new tests, weekly hospital admissions, population density, total tests, total deaths, location, and total cases) for predicting daily increases in COVID-19 cases, highlighting potential target areas for efficient pandemic responses. It then applies machine learning algorithms such as linear regression, polynomial regression, and random forest regression to predict daily COVID-19 cases from this diverse range of predictors, and proved competent at generating predictions with reasonable accuracy.
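The predictor-selection step can be illustrated with a random forest's impurity-based feature importances. The feature matrix below is a made-up stand-in, not the paper's COVID-19 dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 500

# Hypothetical feature table: eight columns standing in for predictors
# such as total tests, hospital admissions, population density, etc.
X = rng.normal(size=(n, 8))
# Synthetic daily-increase target driven mainly by the first three columns
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(0.0, 0.5, n)

rf = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y)
# Rank features by importance and keep the strongest three
top = np.argsort(rf.feature_importances_)[::-1][:3]
print("most important feature indices:", top.tolist())
```

In the paper the same kind of ranking is applied to the real predictors before the regression models are fitted.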


2021 ◽  
Vol 931 (1) ◽  
pp. 012013
Author(s):  
Le Thi Nhut Suong ◽  
A V Bondarev ◽  
E V Kozlova

Abstract Geochemical studies of organic matter in source rocks play an important role in predicting the oil and gas accumulation of any territory, especially in oil and gas shale. For a deeper understanding, pyrolytic analyses are often carried out on samples before and after extraction of hydrocarbons with chloroform. However, extraction is a laborious and time-consuming process, and the workload of laboratory equipment and time doubles. In this work, machine learning regression algorithms are applied to forecast S2ex from the pyrolytic analysis results of non-extracted samples. The study uses more than 300 samples from 3 different wells in the Bazhenov formation, Western Siberia. For developing a prediction model, five machine learning regression algorithms were tested and compared: multiple linear regression, polynomial regression, support vector regression, decision tree, and random forest. Their performance is evaluated by the R-squared coefficient. Data from well X2 were used to build the model, divided into two parts: 80% for training and 20% for testing. The model was then used to predict wells X1 and X3, and these predictions were compared with the real results obtained from standard experiments. Despite the limited amount of data, the results exceeded expectations. The predictions also show that the relationship between pre- and post-extraction parameters is complex and non-linear: the R2 values of multiple linear regression and polynomial regression are negative, which means those models fail, whereas random forest and decision tree perform well. The same algorithms can be applied to predict other geochemical parameters by depth or to well-logging data.
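The abstract's observation that linear and polynomial fits can yield poor (even negative) R2 on a strongly non-linear relationship while tree ensembles do well can be reproduced on toy data. The sine response below is an illustrative stand-in, not Bazhenov pyrolysis data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
# One synthetic pyrolysis-like predictor and a strongly non-linear response
X = rng.uniform(-2.0, 2.0, size=(300, 1))
y = np.sin(3.0 * X[:, 0]) + rng.normal(0.0, 0.1, 300)

# 80/20 split, as in the abstract
X_train, X_test, y_train, y_test = X[:240], X[240:], y[:240], y[240:]

r2 = {}
for name, model in [("linear", LinearRegression()),
                    ("random forest",
                     RandomForestRegressor(n_estimators=100, random_state=2))]:
    model.fit(X_train, y_train)
    r2[name] = r2_score(y_test, model.predict(X_test))
    print(name, "R^2 =", round(r2[name], 3))
```

The linear model cannot track the oscillation, so its R2 stays near (or below) zero, while the forest fits it closely.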


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3079
Author(s):  
André Glória ◽  
João Cardoso ◽  
Pedro Sebastião

Presently, saving natural resources is increasingly a concern, and water scarcity is a fact in more and more areas of the globe. One of the main strategies used to counter this trend is the use of new technologies. On this topic, the Internet of Things has stood out, with such solutions offering robustness and simplicity while remaining low cost. This paper presents the study and development of an automatic irrigation control system for agricultural fields. The developed solution comprises a wireless sensor and actuator network and a mobile application that lets the user consult not only the data collected in real time but also its history, and that acts in accordance with the data it analyses. To adapt the water management, Machine Learning algorithms were studied to predict the best time of day for water administration. Of the studied algorithms (Decision Trees, Random Forest, Neural Networks, and Support Vector Machines), the one that obtained the best results was Random Forest, with an accuracy of 84.6%. Besides the ML solution, a method was also developed to calculate the amount of water needed to manage the fields under analysis. Implementation of the system showed the developed solution to be effective, achieving water savings of up to 60%.


2021 ◽  
Author(s):  
Nilesh AnanthaSubramanian ◽  
Ashok Palaniappan

Abstract Metal-oxide nanoparticles find widespread applications in everyday life today, and cost-effective evaluation of their cytotoxicity and ecotoxicity is essential for sustainable progress. Machine learning models use existing experimental data and learn the relationship of various features to nanoparticle cytotoxicity to generate predictive models. In this work, we adopted a principled approach to this problem by formulating a feature space based on intrinsic and extrinsic physico-chemical properties, exclusive of any in vitro characteristics such as cell line, cell type, and assay method. A minimal set of features was developed by applying variance inflation analysis to the correlation structure of the feature space. Using a balanced dataset, a mapping was then obtained from the normalized feature space to the toxicity class using various hyperparameter-tuned machine learning models. Evaluation on an unseen test set yielded > 96% balanced accuracy for both the random forest model and the one-hidden-layer neural network model. The obtained cytotoxicity models are parsimonious, with intelligible inputs, and include an applicability check. Interpretability investigations of the models yielded the key predictor variables of metal-oxide nanoparticle cytotoxicity. Our models can be applied to new, untested oxides using a majority-voting ensemble classifier, NanoTox, that incorporates the neural network, random forest, support vector machine, and logistic regression models. NanoTox is the very first predictive nanotoxicology pipeline made freely available under the GNU General Public License (https://github.com/NanoTox).
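The majority-voting ensemble described above can be sketched with scikit-learn's `VotingClassifier`. The data here are synthetic; NanoTox's actual features and trained models live in the linked repository:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Synthetic stand-in for a normalized physico-chemical feature table
X, y = make_classification(n_samples=400, n_features=10, random_state=3)
X_train, X_test, y_train, y_test = X[:320], X[320:], y[:320], y[320:]

ensemble = VotingClassifier(
    estimators=[
        ("nn", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                             random_state=3)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=3)),
        ("svm", SVC(random_state=3)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",  # each model casts one vote; the majority class wins
)
ensemble.fit(X_train, y_train)
acc = ensemble.score(X_test, y_test)
print(f"held-out accuracy: {acc:.2f}")
```

Hard voting keeps the ensemble's decision interpretable: a toxicity call requires agreement from most of the four constituent models.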


2015 ◽  
Vol 119 (4) ◽  
pp. 396-403 ◽  
Author(s):  
John Staudenmayer ◽  
Shai He ◽  
Amanda Hickey ◽  
Jeffer Sasaki ◽  
Patty Freedson

This investigation developed models to estimate aspects of physical activity and sedentary behavior from three-axis high-frequency wrist-worn accelerometer data. The models were developed and tested on 20 participants (n = 10 males, n = 10 females, mean age = 24.1, mean body mass index = 23.9), who wore an ActiGraph GT3X+ accelerometer on their dominant wrist and an ActiGraph GT3X on the hip while performing a variety of scripted activities. Energy expenditure was concurrently measured by a portable indirect calorimetry system. Those calibration data were then used to develop and assess both machine-learning and simpler models with fewer unknown parameters (linear regression and decision trees) to estimate metabolic equivalent scores (METs) and to classify activity intensity, sedentary time, and locomotion time. The wrist models, applied to 15-s windows, estimated METs [random forest: root mean squared error (rMSE) = 1.21 METs, hip: rMSE = 1.67 METs] and activity intensity (random forest: 75% correct, hip: 60% correct) better than a previously developed model that used counts per minute measured at the hip. In a separate set of comparisons, the simpler decision trees classified activity intensity (random forest: 75% correct, tree: 74% correct), sedentary time (random forest: 96% correct, decision tree: 97% correct), and locomotion time (random forest: 99% correct, decision tree: 96% correct) nearly as well as or better than the machine-learning approaches. Preliminary investigation of the models' performance on two free-living people suggests that they may work well outside of controlled conditions.


This study examines pricing factors in the short-term rental market. Airbnb, a worldwide platform for accommodation search and rental, was chosen as the object of study. At the beginning of 2021, the company offered 7 million homes in more than 220 countries. Data Science methods play a significant role in the company's success, and one of its key algorithms is the pricing algorithm. Using the "Price Recommendations" feature, a homeowner can analyze which dates are most likely to be booked at the current price and which are not, which helps form a favorable offer. The system calculates the recommended cost of housing based on hundreds of parameters, some of which are easy to recognize, but there are less obvious factors that can also affect demand. The paper proposes an algorithm for identifying implicit pricing factors in the short-term rental market using machine learning methods, which includes: 1) data mining and data preparation; 2) building and analysis of linear regression models; 3) building and analysis of nonlinear regression models. The study was based on ads from the Airbnb site in Washington and New York, collected using scripts developed in Python. The following models were built and analyzed: simple linear regression, multiple linear regression, polynomial regression, decision trees, random forest, and boosting. The results showed that the most important factors are accommodates, cleaning_fee, room_type, and bedrooms. However, based on the model evaluation criteria, the models cannot be used in production: the linear models are of low quality, while the random forest, boosting, and tree models are overfitted. Still, the results can be used in business analysis.


Author(s):  
Rebecca Tanzer ◽  
Carl Malings ◽  
Aliaksei Hauryliuk ◽  
R. Subramanian ◽  
Albert A. Presto

Air quality monitoring has traditionally been conducted using sparsely distributed, expensive reference monitors. To understand variations in PM2.5 on a finely resolved spatiotemporal scale, a dense network of over 40 low-cost monitors was deployed throughout and around Pittsburgh, Pennsylvania, USA. Monitor locations covered a wide range of site types with varying traffic and restaurant density, varying influences from local sources, and varying socioeconomic (environmental justice, EJ) characteristics. Variability between and within site groupings was observed. Concentrations were higher near the source-influenced sites than at the Urban or Suburban Residential sites. Gaseous pollutants (NO2 and SO2) were used to differentiate between traffic (higher NO2 concentrations) and industrial (higher SO2 concentrations) sources of PM2.5. Statistical analysis showed these differences to be significant (coefficient of divergence > 0.2). The highest mean PM2.5 concentrations were measured downwind (east) of the two industrial facilities, while background-level PM2.5 concentrations were measured at similar distances upwind (west) of the point sources. Socioeconomic factors, including the fraction of non-white population and the fraction of the population living under the poverty line, were not correlated with increases in PM2.5 or NO2 concentration. The analysis conducted here highlights differences in PM2.5 concentration within site groupings that have similar land use, demonstrating the utility of a dense sensor network. Our network captures temporospatial pollutant patterns that sparse regulatory networks cannot.


Author(s):  
A Lakshmanarao ◽  
M Raja Babu ◽  
T Srinivasa Ravi Kiran

<p>The whole world has been experiencing a novel infection, COVID-19, caused by a coronavirus, since 2019. The main concern about this disease is the absence of a proven, effective medicine. The World Health Organization (WHO) proposed several precautionary measures to manage the spread of the illness and to lessen contamination, thereby decreasing cases. In this paper, we analyzed the COVID-19 dataset available on Kaggle. Past contributions from researchers on comparable work covered a limited number of days; our paper uses COVID-19 data up to May 2021. The numbers of confirmed, recovered, and death cases are considered for analysis. The cases are analyzed on a daily and weekly basis to gain insight into the dataset. After extensive analysis, we propose machine learning regressors for COVID-19 predictions. We applied linear regression, polynomial regression, a Decision Tree Regressor, and a Random Forest Regressor. Decision Tree and Random Forest gave an R-squared value of 0.99. We also predicted future cases with these four algorithms; future cases are predicted best with the polynomial regression technique. This prediction can help in taking preventive measures to control COVID-19 in the near future. All experiments were conducted in Python.</p>
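The polynomial-regression forecast the authors favour can be sketched as follows. The case series is synthetic and the one-week horizon is an illustrative assumption, not the paper's data or results:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
# Hypothetical cumulative-case series: day index vs. confirmed cases
days = np.arange(100).reshape(-1, 1)
cases = (50.0 * days.ravel() ** 2 + 1000.0 * days.ravel()
         + rng.normal(0.0, 500.0, 100))

# Degree-2 polynomial regression fitted on the observed days
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(days, cases)

# Extrapolate one week ahead
future = model.predict(np.arange(100, 107).reshape(-1, 1))
print(future.round(0))
```

Unlike tree-based regressors, which predict a constant beyond the training range, the fitted polynomial extrapolates the growth trend, which is why it suits forecasting here.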


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Yan Zhang ◽  
Jinxiao Wen ◽  
Guanshu Yang ◽  
Zunwen He ◽  
Xinran Luo

Recently, unmanned aerial vehicles (UAVs) have come to play an important role in many applications because of their high flexibility and low cost. To realize reliable UAV communications, a fundamental task is to investigate the propagation characteristics of the channels. In this paper, we propose path loss models for the UAV air-to-air (AA) scenario based on machine learning. A ray-tracing software package is employed to generate samples for multiple routes in a typical urban environment, and different altitudes of the Tx and Rx UAVs are taken into consideration. Two machine-learning algorithms, Random Forest and KNN, are used to build prediction models from the training data. The prediction performance of the trained models is assessed on the test set according to metrics including the mean absolute error (MAE) and root mean square error (RMSE). Meanwhile, two empirical models are presented for comparison. It is shown that the machine-learning-based models provide high prediction accuracy and acceptable computational efficiency in the AA scenario. Moreover, Random Forest outperforms the other models and has the smallest prediction errors. Further investigation evaluates the impacts of five different parameters on the path loss, demonstrating that path visibility is crucial to the path loss.
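A minimal version of the Random Forest vs. KNN comparison can be run on synthetic air-to-air samples. The log-distance model and NLOS penalty below are generic textbook assumptions, not the paper's ray-tracing data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(5)
n = 1000

# Synthetic features: link distance (m), Tx/Rx altitudes (m), visibility flag
d = rng.uniform(50.0, 1000.0, n)
h_tx = rng.uniform(50.0, 300.0, n)
h_rx = rng.uniform(50.0, 300.0, n)
los = rng.integers(0, 2, n)

# Generic log-distance path loss (dB) with a penalty when the path is blocked
pl = 40.0 + 20.0 * np.log10(d) + 15.0 * (1 - los) + rng.normal(0.0, 2.0, n)

X = np.column_stack([d, h_tx, h_rx, los])
X_train, X_test, y_train, y_test = X[:800], X[800:], pl[:800], pl[800:]

mae = {}
for name, model in [("Random Forest",
                     RandomForestRegressor(n_estimators=100, random_state=5)),
                    ("KNN", KNeighborsRegressor(n_neighbors=5))]:
    model.fit(X_train, y_train)
    mae[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(name, "MAE:", round(mae[name], 2))
```

Because the visibility flag carries a large penalty in the synthetic target, the tree-based model, which can split on it directly, tends to beat unscaled KNN here, echoing the paper's finding that path visibility is crucial.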


IoT is becoming a more popular and effective tool for real-time applications, and it has been used in various water quality monitoring systems to maintain water hygiene. The main objective is to build a system that regularly monitors water quality and manages sustainability. Compared with other studies, this system targets specific standards such as low cost and high system efficiency. In this paper, IoT-based real-time monitoring of water quality is implemented along with machine learning techniques such as J48, Multilayer Perceptron (MLP), and Random Forest. These machine learning techniques are compared based on their hyper-parameters and the results obtained. Attributes such as pH, Dissolved Oxygen (DO), turbidity, and conductivity, obtained from the corresponding sensors, are used to create a prediction model that classifies the quality of water. Measurement of water quality and the reporting system are implemented using an Arduino controller and a GSM/GPRS module for gathering data in real time. The collected data are then analyzed using the WEKA interface, a visualization tool used for data analysis and prediction modeling. The Random Forest technique outperforms J48 and Multilayer Perceptron, giving 98.89% correctly classified instances.

