APPLICATION OF MACHINE LEARNING TO FILL IN THE MISSING MONITORING DATA OF AIR QUALITY

In this paper, three machine learning models, namely Autoregressive Moving Average (ARMA), Artificial Neural Network (ANN), and Support Vector Regression (SVR), have been applied to predict and fill in missing air quality monitoring data for the Gia Lam and Nha Trang stations in Hanoi and Khanh Hoa, respectively. Two air pollutants, NO2 and PM10, were selected for this study. The experimental results showed that all three models performed better than some traditional approaches, including Multiple Linear Regression (LR) and Spline interpolation. In addition, ARMA, ANN, and SVR captured the fluctuations in the concentrations of the selected pollutants. These results indicate that machine learning is a feasible approach to handling missing data, which is one of the biggest problems facing air quality monitoring stations in Viet Nam.
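The gap-filling idea can be illustrated with the autoregressive part of an ARMA model. The sketch below is a hypothetical, minimal version and not the authors' code: it fits an AR(p) model with an intercept by least squares on the gap-free windows of a concentration series, then fills each missing value with a one-step prediction, reusing earlier filled values inside a run of consecutive gaps. Dropping the moving-average term, the lag order `p=2`, and the function names are all assumptions made for illustration.

```python
import numpy as np

def fit_ar(x, p):
    """Least-squares AR(p) fit with intercept, using only windows without NaN."""
    rows, targets = [], []
    for t in range(p, len(x)):
        if not np.isnan(x[t - p:t + 1]).any():
            # Regressors for step t: [1, x[t-1], ..., x[t-p]]
            rows.append(np.r_[1.0, x[t - p:t][::-1]])
            targets.append(x[t])
    coef, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return coef  # coef[0] = intercept, coef[1:] = lag weights

def fill_gaps_ar(series, p=2):
    """Fill NaN entries with recursive one-step AR(p) predictions.

    Assumes the first p values are observed. A run of consecutive gaps is
    filled left to right, so later predictions reuse earlier fills.
    """
    x = np.asarray(series, dtype=float).copy()
    coef = fit_ar(x, p)
    for t in range(p, len(x)):
        if np.isnan(x[t]):
            x[t] = coef[0] + coef[1:] @ x[t - p:t][::-1]
    return x
```

A baseline such as linear or spline interpolation would fill the same gaps from the neighbouring observations alone, without learning the series' dynamics.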


Machine Learning-Based Prediction of Air Quality

Applied Sciences ◽

10.3390/app10249151 ◽

2020 ◽

Vol 10 (24) ◽

pp. 9151

Author(s):

Yun-Chia Liang ◽

Yona Maimury ◽

Angela Hsiang-Ling Chen ◽

Josue Rodolfo Cuevas Juarez

Keyword(s):

Machine Learning ◽

Air Quality ◽

Random Forest ◽

Prediction Models ◽

Superior Performance ◽

Support Vector ◽

Economic Activities ◽

Adaptive Boosting ◽

Series Of Experiments ◽

Artificial Neural Network (ANN)

The quality of air, an essential natural resource, has been compromised by economic activities. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, which makes it difficult to account for seasonal and other factors. Several prediction models have been developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments using datasets from three different regions showed that the stacking ensemble, AdaBoost, and random forest gave the best prediction performance; the stacking ensemble delivered consistently superior R2 and RMSE, while AdaBoost provided the best MAE.
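The model comparison above relies on three standard regression metrics. As a minimal sketch (NumPy assumed; the example values in the test are invented, not taken from the study), they can be computed as follows:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Coefficient of determination (R2), root mean square error, mean absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    mae = float(np.mean(np.abs(residual)))
    ss_res = float(np.sum(residual ** 2))                 # unexplained variation
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": r2, "RMSE": rmse, "MAE": mae}
```

RMSE penalises large errors more heavily than MAE, which is one reason different models can come out on top under the two measures.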


Prediction of Healing Performance of Autogenous Healing Concrete Using Machine Learning

Materials ◽

10.3390/ma14154068 ◽

2021 ◽

Vol 14 (15) ◽

pp. 4068

Author(s):

Xu Huang ◽

Mirna Wasouf ◽

Jessada Sresakoolchai ◽

Sakdirat Kaewunruen

Keyword(s):

Machine Learning ◽

Search Algorithm ◽

Weather Conditions ◽

Prediction Performance ◽

Machine Learning Algorithms ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Support Vector ◽

Self Healing ◽

Artificial Neural Network (ANN)

Cracks typically develop in concrete due to shrinkage, loading actions, and weather conditions, and they may occur at any time during its life span. Autogenous healing concrete is a type of self-healing concrete that can automatically heal cracks through physical or chemical reactions in the concrete matrix. It is imperative to investigate the healing performance of autogenous healing concrete, both to assess the extent of cracking and to predict the extent of healing. In self-healing concrete research, laboratory testing of healing performance is costly, and a large number of specimens may be needed to arrive at a reliable concrete design. This study is thus the world’s first to establish six types of machine learning algorithms capable of predicting the healing performance (HP) of self-healing concrete: an artificial neural network (ANN), k-nearest neighbours (kNN), gradient boosting regression (GBR), decision tree regression (DTR), support vector regression (SVR), and random forest (RF). The parameters of these algorithms are tuned using a grid search algorithm (GSA) and a genetic algorithm (GA). Prediction performance, measured by the coefficient of determination (R2) and the root mean square error (RMSE), is evaluated on 1417 data sets from the open literature. The results show that GSA-GBR achieves higher prediction performance (R2 = 0.958) and stronger robustness (RMSE = 0.202) than the other five types of algorithms. Therefore, reliable prediction of the healing performance and efficient assistance in the design of autogenous healing concrete can be achieved.
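Grid search, one of the two tuning strategies mentioned, evaluates every combination in a predefined parameter grid and keeps the best-scoring one. The sketch below is a hypothetical illustration, not the study's setup: it tunes the neighbour count `k` of a small NumPy k-nearest-neighbours regressor by validation RMSE on toy data. The function names, grid values, and data are assumptions.

```python
import itertools
import numpy as np

def knn_regress(X_train, y_train, X_query, k):
    """Predict each query as the mean target of its k nearest training points."""
    dist = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; return the one with the lowest score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical toy data: y = x, validation points halfway between training points
X_train = np.arange(20, dtype=float).reshape(-1, 1)
y_train = X_train.ravel()
X_val = (np.arange(19, dtype=float) + 0.5).reshape(-1, 1)
y_val = X_val.ravel()

def validation_rmse(k):
    pred = knn_regress(X_train, y_train, X_val, k)
    return float(np.sqrt(np.mean((pred - y_val) ** 2)))

best_params, best_score = grid_search({"k": [1, 2, 3, 5]}, validation_rmse)
```

With y = x and each query midway between two training points, averaging the two nearest neighbours reproduces the target exactly, so the search settles on k = 2. A genetic algorithm would instead explore the parameter space stochastically rather than exhaustively.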


Autoregressive moving average based anycast with support vector machine clustering in mobile ad‐hoc networks

Transactions on Emerging Telecommunications Technologies ◽

10.1002/ett.4432 ◽

2021 ◽

Author(s):

Subhankar Ghosh ◽

Anuradha Banerjee ◽

Abu Sufian ◽

Sachin Kumar Gupta

Keyword(s):

Support Vector Machine ◽

Ad Hoc Networks ◽

Mobile Ad Hoc Networks ◽

Ad Hoc ◽

Moving Average ◽

Support Vector ◽

Autoregressive Moving Average ◽

Mobile Ad Hoc ◽

Hoc Networks


Cardiovascular Disease Prediction from Electrocardiogram by using Machine Learning Method: A Snapshot from the Subjects of the Malaysian Cohort

10.21203/rs.2.22561/v1 ◽

2020 ◽

Author(s):

Nazrul Anuar Nayan ◽

Hafifah Ab Hamid ◽

Mohd Zubir Suboh ◽

Noraidatulakma Abdullah ◽

Rosmina Jaafar ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Nearest Neighbor ◽

Total Cholesterol Level ◽

Population Based ◽

Support Vector ◽

K Nearest Neighbor ◽

CVD Risk ◽

Linear Discriminant ◽

Artificial Neural Network (ANN)

Background: Cardiovascular disease (CVD) is the leading cause of death worldwide. In 2017, CVD contributed to 13,503 deaths in Malaysia. Current approaches to CVD prediction are usually invasive and costly. Machine learning (ML) techniques allow accurate prediction by exploiting the complex interactions among relevant risk factors. Results: This study presents a case–control study involving 60 participants from The Malaysian Cohort, a prospective population-based project. Five parameters, namely the R–R interval and the root mean square of successive differences (RMSSD) extracted from the electrocardiogram (ECG), systolic and diastolic blood pressures, and total cholesterol level, were statistically significant in predicting CVD. Six ML algorithms, namely linear discriminant analysis, linear and quadratic support vector machines, decision tree, k-nearest neighbor, and artificial neural network (ANN), were evaluated to determine the most accurate classifier for predicting CVD risk. ANN, which achieved 90% specificity, 90% sensitivity, and 90% accuracy, showed the highest prediction performance of the six algorithms. Conclusions: In summary, using ML techniques, ECG data can serve as a good parameter for CVD prediction in the Malaysian multiethnic population.
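Two of the five predictors are heart-rate-variability features derived from the ECG. The sketch below shows their usual definitions, assuming the R–R intervals (in milliseconds) have already been extracted from the ECG; peak detection is out of scope, and the function name and interval values are hypothetical.

```python
import numpy as np

def hrv_features(rr_ms):
    """Mean R-R interval and RMSSD from a sequence of R-R intervals (ms)."""
    rr = np.asarray(rr_ms, dtype=float)
    successive_diffs = np.diff(rr)  # RR[i+1] - RR[i]
    rmssd = float(np.sqrt(np.mean(successive_diffs ** 2)))
    return {"mean_rr_ms": float(rr.mean()), "rmssd_ms": rmssd}
```

For intervals of 800, 810, 790, and 800 ms, the successive differences are 10, -20, and 10 ms, so RMSSD = sqrt((100 + 400 + 100) / 3), roughly 14.1 ms.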


MACHINE LEARNING ALGORITHMS FOR IDENTIFICATION OF ABNORMAL GLOW CURVES AND ASSOCIATED ABNORMALITY IN CaSO4:DY-BASED PERSONNEL MONITORING DOSIMETERS

Radiation Protection Dosimetry ◽

10.1093/rpd/ncaa108 ◽

2020 ◽

Vol 190 (3) ◽

pp. 342-351

Author(s):

Munir S Pathan ◽

S M Pradhan ◽

T Palani Selvam

Keyword(s):

Machine Learning ◽

Glow Curve ◽

Good Accuracy ◽

Machine Learning Algorithms ◽

Support Vector ◽

Computationally Efficient ◽

Artificial Neural Network (ANN) ◽

First Time

In the present study, machine learning (ML) methods for identifying abnormal glow curves (GCs) of CaSO4:Dy-based thermoluminescence dosimeters (TLDs) used in individual monitoring are presented. The classifier algorithms random forest (RF), artificial neural network (ANN), and support vector machine (SVM) are employed to identify not only abnormal glow curves but also the type of abnormality. For the first time, a simple and computationally efficient RF-based algorithm is presented for GC classification. About 4000 GCs are used to train and validate the ML algorithms, and the performance of all algorithms is compared using various parameters. The RF algorithm classifies GCs with an accuracy of 99.05%, whereas ANN and SVM achieve 96.7% and 96.1%, respectively. The RF-based classifier is recommended for GC classification and for assisting fault determination in the TLD reader system.
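The core idea of a random forest classifier, many randomised trees trained on bootstrap samples and combined by majority vote, can be shown with a heavily simplified sketch. Here each "tree" is a single-split decision stump on a randomly chosen feature, whereas real random forests grow deeper trees and sample features at every split. The two-feature glow-curve descriptors and class labels used in the test are hypothetical.

```python
import numpy as np

def fit_stump(X, y, feature):
    """Best single threshold on one feature; each side predicts its majority class."""
    best = None
    for thr in np.unique(X[:, feature]):
        left = X[:, feature] <= thr
        if left.all():
            continue  # a split must leave some samples on the right
        l_cls = int(np.bincount(y[left]).argmax())
        r_cls = int(np.bincount(y[~left]).argmax())
        errors = int(np.sum(y[left] != l_cls) + np.sum(y[~left] != r_cls))
        if best is None or errors < best[0]:
            best = (errors, thr, l_cls, r_cls)
    if best is None:  # feature is constant in this sample: predict the majority class
        majority = int(np.bincount(y).argmax())
        return feature, np.inf, majority, majority
    _, thr, l_cls, r_cls = best
    return feature, thr, l_cls, r_cls

def fit_forest(X, y, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each on a randomly chosen feature."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)   # bootstrap: sample rows with replacement
        feature = int(rng.integers(0, d))  # random feature for this stump
        stumps.append(fit_stump(X[idx], y[idx], feature))
    return stumps

def predict_forest(stumps, X, n_classes):
    """Majority vote over all stumps."""
    votes = np.zeros((len(X), n_classes), dtype=int)
    for feature, thr, l_cls, r_cls in stumps:
        pred = np.where(X[:, feature] <= thr, l_cls, r_cls)
        votes[np.arange(len(X)), pred] += 1
    return votes.argmax(axis=1)
```

Averaging many weak, decorrelated learners is what keeps the method cheap to train while remaining robust, which fits the abstract's emphasis on computational efficiency.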


Forecasting hydrologic parameters using linear and nonlinear stochastic models

Journal of Water and Climate Change ◽

10.2166/wcc.2019.249 ◽

2019 ◽

Vol 11 (4) ◽

pp. 1284-1301

Author(s):

Hamed Nozari ◽

Fateme Tavakoli

Keyword(s):

Support Vector Machine ◽

Moving Average ◽

Sustainable Use ◽

Rain Gauge ◽

Support Vector ◽

Autoregressive Moving Average ◽

Monthly Precipitation ◽

Precipitation Prediction ◽

ARMAX Model ◽

Monthly Discharge

Predicting hydrological parameters is one of the most important foundations of catchment management and the sustainable use of water resources. In this study, support vector machine (SVM), support vector machine combined with wavelet transform (W-SVM), autoregressive moving average with exogenous variable (ARMAX), and autoregressive integrated moving average (ARIMA) models were used to predict monthly precipitation, discharge, and evaporation. For this purpose, monthly time series from rain-gauge, hydrometric, and evaporation-gauge stations in the Hamedan catchment over a 25-year period (1991–2015) were used. Of this period, 17 years (1991–2007), 4 years (2008–2011), and 4 years (2012–2015) were used for training, calibration, and validation of the models, respectively. ARIMA, SVM, ARMAX, and W-SVM ranked first to fourth in monthly precipitation prediction, while SVM, ARIMA, ARMAX, and W-SVM ranked first to fourth in monthly discharge and evaporation prediction. SVM has fewer adjustable parameters than the other models, so it can predict hydrological changes more easily and in less time; for this reason, it is preferred over the other methods.
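Because the data are a time series, the split is chronological rather than random. A minimal sketch of the 17/4/4-year partition of the monthly record described above, in plain Python (the function name is hypothetical):

```python
def chronological_split(start_year, train_years, calib_years, valid_years):
    """Split consecutive (year, month) pairs into train/calibration/validation blocks."""
    total_years = train_years + calib_years + valid_years
    months = [(year, month)
              for year in range(start_year, start_year + total_years)
              for month in range(1, 13)]
    a = train_years * 12       # end of the training block
    b = a + calib_years * 12   # end of the calibration block
    return months[:a], months[a:b], months[b:]

train, calib, valid = chronological_split(1991, 17, 4, 4)
```

This yields 204 training months (January 1991 to December 2007) followed by 48 calibration and 48 validation months, so no model ever sees observations from after its evaluation period.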


Air Pollutants Prediction in Shenzhen Based on ARIMA and Prophet Method

E3S Web of Conferences ◽

10.1051/e3sconf/201913605001 ◽

2019 ◽

Vol 136 ◽

pp. 05001 ◽

Cited By ~ 2

Author(s):

Ziyuan Ye

Keyword(s):

Air Quality ◽

Hybrid Model ◽

Air Pollutants ◽

Moving Average ◽

Mixing Time ◽

Quality Monitoring ◽

Air Quality Monitoring ◽

Autoregressive Integrated Moving Average ◽

Moving Average Model ◽

Air Quality Monitoring Stations

To improve the accuracy of air pollutant prediction in Shenzhen, a hybrid model based on ARIMA (Autoregressive Integrated Moving Average) and Prophet that combines temporal and spatial relationships was proposed. First, the ARIMA and Prophet methods were trained on data from 11 air quality monitoring stations and assigned different weights. Next, the weight of each monitoring station's impact on the final result was calculated. Finally, the hybrid model was built and its error was evaluated. The experimental results show that this hybrid method can improve air pollutant prediction in Shenzhen.
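The abstract does not say how the weights were chosen. One common option, assumed here purely for illustration, is to weight each model's forecast in inverse proportion to its validation error; the sketch below combines hypothetical ARIMA and Prophet forecasts that way, and the function names are invented.

```python
import numpy as np

def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalised to sum to 1 (an assumed scheme)."""
    inv = 1.0 / np.asarray(errors, dtype=float)
    return inv / inv.sum()

def hybrid_forecast(forecasts, errors):
    """Weighted combination of several models' forecasts (one row per model)."""
    weights = inverse_error_weights(errors)
    return weights @ np.asarray(forecasts, dtype=float)
```

For example, if ARIMA forecasts [40, 42] with validation RMSE 2.0 and Prophet forecasts [44, 46] with RMSE 6.0, the weights are 0.75 and 0.25 and the hybrid forecast is [41, 43]. The same pattern could weight stations instead of models.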


Interaction-Based Behavioral Analysis of Twitter Social Network Accounts

Applied Sciences ◽

10.3390/app9204448 ◽

2019 ◽

Vol 9 (20) ◽

pp. 4448 ◽

Cited By ~ 3

Author(s):

İş ◽

Tuncer

Keyword(s):

Nearest Neighbor ◽

Moving Average ◽

Support Vector ◽

K Nearest Neighbor ◽

Asset Values ◽

Methodological Approaches ◽

Twitter Users ◽

Basic Characteristics ◽

Artificial Neural Network (ANN) ◽

Media Manipulation

This article considers methodological approaches to detecting and preventing social media manipulation specific to Twitter. Behavioral analyses of Twitter users were performed using their profile structures and interaction types, and users were classified according to their effect size values by determining their asset values. User profiles were classified into three categories, namely popular-active, observer-passive, and spam-bot-malicious, using k-nearest neighbor (K-NN), support vector machine (SVM), and artificial neural network (ANN) algorithms. For classification, the study used basic characteristics of users, such as density, centralization, and diameter, as well as proposed time-series features such as the simple moving average and cumulative moving average. The K-NN algorithm achieved the highest accuracy, and its F1-scores were higher than those of the other algorithms for all classes. Classification accuracy ranged from a minimum of 92.33% to a maximum of 96.81%. These results show that the proposed method separates popular-active, observer-passive, and spam-bot-malicious accounts satisfactorily.
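The two moving-average features mentioned can be computed from a user's activity time series as below. This is a sketch assuming a per-day count of interactions as input; the paper's exact input series and window length are not stated here, so both are assumptions.

```python
import numpy as np

def simple_moving_average(x, window):
    """Mean of each run of `window` consecutive values (output is shorter by window - 1)."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(window) / window, mode="valid")

def cumulative_moving_average(x):
    """Running mean of all values seen so far."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x) / np.arange(1, len(x) + 1)
```

For daily counts [2, 4, 6, 8], a 2-day simple moving average gives [3, 5, 7], and the cumulative moving average gives [2, 3, 4, 5]. The simple average tracks recent behaviour, while the cumulative one summarises the account's whole history.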


Cardiovascular Disease Prediction from Electrocardiogram by Using Machine Learning

International Journal of Online and Biomedical Engineering (iJOE) ◽

10.3991/ijoe.v16i07.13569 ◽

2020 ◽

Vol 16 (07) ◽

pp. 34

Author(s):

Nayan Nazrul Anuar ◽

Ab Hamid Hafifah ◽

Suboh Mohd Zubir ◽

Abdullah Noraidatulakma ◽

Jaafar Rosmina ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Nearest Neighbor ◽

Total Cholesterol Level ◽

Population Based ◽

Support Vector ◽

K Nearest Neighbor ◽

CVD Risk ◽

Linear Discriminant ◽

Artificial Neural Network (ANN)

Cardiovascular disease (CVD) is the leading cause of death worldwide. In 2017, CVD contributed to 13,503 deaths in Malaysia. Current approaches to CVD prediction are usually invasive and costly. Machine learning (ML) techniques allow accurate prediction by exploiting the complex interactions among relevant risk factors. This study presents a case–control study involving 60 participants from The Malaysian Cohort, a prospective population-based project. Five parameters, namely the R–R interval and the root mean square of successive differences extracted from the electrocardiogram (ECG), systolic and diastolic blood pressures, and total cholesterol level, were statistically significant in predicting CVD. Six ML algorithms, namely linear discriminant analysis, linear and quadratic support vector machines, decision tree, k-nearest neighbor, and artificial neural network (ANN), were evaluated to determine the most accurate classifier for predicting CVD risk. ANN, which achieved 90% specificity, 90% sensitivity, and 90% accuracy, showed the highest prediction performance of the six algorithms. In summary, using ML techniques, ECG data can serve as a good parameter for CVD prediction in the Malaysian multiethnic population.
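Specificity, sensitivity, and accuracy all derive from the binary confusion matrix. The sketch below uses invented labels (1 = CVD case, 0 = control) that happen to give 90% on all three measures; they are not the study's actual predictions, and the function name is hypothetical.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity and accuracy from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of controls correctly cleared
        "accuracy": (tp + tn) / len(pairs),
    }

# Hypothetical: 10 cases and 10 controls, one error in each group
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 9 + [0] + [0] * 9 + [1]
metrics = binary_classification_metrics(y_true, y_pred)
```

Equal sensitivity and specificity, as reported for the ANN, mean the classifier misses cases and raises false alarms at the same rate.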


A Machine Learning Approach to Predict Air Quality in California

Complexity ◽

10.1155/2020/8049504 ◽

2020 ◽

Vol 2020 ◽

pp. 1-23 ◽

Cited By ~ 2

Author(s):

Mauro Castelli ◽

Fabiana Martins Clemente ◽

Aleš Popovič ◽

Sara Silva ◽

Leonardo Vanneschi

Keyword(s):

Machine Learning ◽

Air Quality ◽

Environmental Protection Agency ◽

Urban Areas ◽

Principal Component ◽

Ground Level ◽

Support Vector ◽

Validation Data ◽

Pollutant Concentrations ◽

RBF Kernel

Predicting air quality is a complex task due to the dynamic nature, volatility, and high spatial and temporal variability of pollutants and particulates. At the same time, the ability to model, predict, and monitor air quality is becoming increasingly relevant, especially in urban areas, given the observed critical impact of air pollution on citizens’ health and the environment. In this paper, we employ a popular machine learning method, support vector regression (SVR), to forecast pollutant and particulate levels and to predict the air quality index (AQI). Among the kernels tested, the radial basis function (RBF) kernel allowed SVR to obtain the most accurate predictions. Using the whole set of available variables proved a more successful strategy than selecting features with principal component analysis. The results demonstrate that SVR with an RBF kernel can accurately predict hourly concentrations of pollutants such as carbon monoxide, sulfur dioxide, nitrogen dioxide, ground-level ozone, and particulate matter 2.5 (PM2.5), as well as the hourly AQI for the state of California. Classification into the six AQI categories defined by the US Environmental Protection Agency achieved an accuracy of 94.1% on unseen validation data.
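Two pieces of this pipeline are easy to state precisely: the RBF kernel that SVR uses to compare feature vectors, K(x, x′) = exp(−γ‖x − x′‖²), and the mapping of a numeric AQI onto the six US EPA categories. A minimal NumPy sketch follows. `gamma` is a free hyperparameter here, the function names are hypothetical, and the category table uses the EPA's index breakpoints while treating everything above 300 as Hazardous for simplicity.

```python
import numpy as np

# US EPA AQI categories as (upper bound of the index range, label)
AQI_CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for Sensitive Groups"),
    (200, "Unhealthy"),
    (300, "Very Unhealthy"),
    (float("inf"), "Hazardous"),  # simplification: 301 and above
]

def aqi_category(aqi):
    """Map an integer AQI value to its EPA category label."""
    for upper, label in AQI_CATEGORIES:
        if aqi <= upper:
            return label

def rbf_kernel(A, B, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=2)
    return np.exp(-gamma * sq_dist)
```

A predicted AQI is expected to be rounded to an integer before categorisation, since the EPA reports the index as whole numbers. Larger `gamma` makes the kernel more local, so the SVR fit follows the data more closely.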
