APPLICATION OF MACHINE LEARNING TO FILL IN THE MISSING MONITORING DATA OF AIR QUALITY

2018 ◽  
Vol 56 (2C) ◽  
pp. 104-110
Author(s):  
Mac Duy Hung

In this paper, three machine learning models, Autoregressive Moving Average (ARMA), Artificial Neural Network (ANN), and Support Vector Regression (SVR), were applied to predict and fill in missing air quality monitoring data for the Gia Lam station in Hanoi and the Nha Trang station in Khanh Hoa. Two air pollutants, NO2 and PM10, were selected for this study. The experimental results showed that all three models performed better than some traditional approaches, including Multiple Linear Regression (LR) and spline interpolation. In addition, ARMA, ANN and SVR could capture the fluctuations in the concentrations of the selected pollutants. These results indicate that machine learning is a feasible approach to handling missing data, which is one of the biggest problems at air quality monitoring stations in Viet Nam.
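As a point of reference for the gap-filling task described above, here is a minimal sketch of an interpolation baseline. It uses plain linear interpolation, not the spline interpolation or the ARMA, ANN and SVR models evaluated in the paper. The function name `fill_gaps_linear` and the sample readings are hypothetical.

```python
def fill_gaps_linear(series):
    """Fill None entries by linear interpolation between known neighbours.

    Leading and trailing gaps are filled with the nearest known value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


# Hypothetical hourly readings with missing values (not data from the paper).
readings = [10.0, None, 14.0, 15.0, None, None, 21.0]
filled = fill_gaps_linear(readings)
```

A common way to compare such a baseline against a learned model is to hide some observed values, fill them in, and measure the error on the hidden points.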

2020 ◽  
Vol 10 (24) ◽  
pp. 9151
Author(s):  
Yun-Chia Liang ◽  
Yona Maimury ◽  
Angela Hsiang-Ling Chen ◽  
Josue Rodolfo Cuevas Juarez

Air, an essential natural resource, has been compromised in terms of quality by economic activities. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, making it difficult to account for seasonal and other factors. Several prediction models have been developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments on datasets for three different regions, aimed at obtaining the best prediction performance from the stacking ensemble, AdaBoost, and random forest, found that the stacking ensemble delivers consistently superior performance for R2 and RMSE, while AdaBoost provides the best results for MAE.
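The three metrics named above follow standard definitions, which can be sketched in a few lines. The helper names and the sample values are illustrative only and are not taken from the study.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical AQI observations and predictions.
observed = [50.0, 60.0, 70.0, 80.0]
predicted = [52.0, 58.0, 71.0, 79.0]
```

Because RMSE squares the errors before averaging, it penalises large misses more heavily than MAE, which is one reason different models can rank first on different metrics.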


Materials ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4068
Author(s):  
Xu Huang ◽  
Mirna Wasouf ◽  
Jessada Sresakoolchai ◽  
Sakdirat Kaewunruen

Cracks typically develop in concrete due to shrinkage, loading actions, and weather conditions, and may occur at any time in its life span. Autogenous healing concrete is a type of self-healing concrete that can automatically heal cracks through physical or chemical reactions in the concrete matrix. It is imperative to investigate the healing performance of autogenous healing concrete, to assess the extent of the cracking, and to predict the extent of healing. In research on self-healing concrete, testing the healing performance of concrete in a laboratory is costly, and a large number of instances may be needed to arrive at a reliable concrete design. This study is thus the world’s first to establish six types of machine learning algorithms capable of predicting the healing performance (HP) of self-healing concrete. These algorithms are an artificial neural network (ANN), k-nearest neighbours (kNN), gradient boosting regression (GBR), decision tree regression (DTR), support vector regression (SVR), and random forest (RF). The parameters of these algorithms are tuned using a grid search algorithm (GSA) and a genetic algorithm (GA). The prediction performance of these algorithms, measured by the coefficient of determination (R2) and root mean square error (RMSE), is evaluated on 1417 data sets from the open literature. The results show that GSA-GBR achieves higher prediction performance (R2 = 0.958 for GSA-GBR) and stronger robustness (RMSE = 0.202 for GSA-GBR) than the other five types of algorithms employed to predict the healing performance of autogenous healing concrete. Therefore, reliable prediction accuracy of the healing performance and efficient assistance in the design of autogenous healing concrete can be achieved.
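Grid search, as used above for parameter tuning, simply evaluates every combination of candidate parameter values and keeps the best one. The sketch below shows the idea on a toy k-nearest-neighbour regressor. This stands in for the gradient boosting model for brevity. The data, the grid, and all function names are assumptions, not the study's setup.

```python
import itertools
import math


def knn_predict(train, x, k, weighted):
    """Predict y at x from (x, y) training pairs using the k nearest neighbours."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    if not weighted:
        return sum(y for _, y in nearest) / k
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (abs(xi - x) + 1e-9) for xi, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


def validation_rmse(params, train, valid):
    """RMSE of the kNN regressor with the given parameters on the validation set."""
    errors = [(knn_predict(train, x, **params) - y) ** 2 for x, y in valid]
    return math.sqrt(sum(errors) / len(errors))


def grid_search(grid, train, valid):
    """Try every combination in the grid; return (best_params, best_rmse)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = validation_rmse(params, train, valid)
        if best is None or score < best[1]:
            best = (params, score)
    return best


# Hypothetical toy data: y = 2x sampled at integers, validated at midpoints.
train = [(float(x), 2.0 * x) for x in range(0, 10)]
valid = [(x + 0.5, 2.0 * (x + 0.5)) for x in range(0, 9)]
grid = {"k": [1, 2, 3], "weighted": [False, True]}
best_params, best_rmse = grid_search(grid, train, valid)
```

The cost grows with the product of the grid sizes, which is the usual motivation for trying a heuristic search such as a genetic algorithm alongside it.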


2020 ◽  
Author(s):  
Nazrul Anuar Nayan ◽  
Hafifah Ab Hamid ◽  
Mohd Zubir Suboh ◽  
Noraidatulakma Abdullah ◽  
Rosmina Jaafar ◽  
...  

Abstract Background: Cardiovascular disease (CVD) is the leading cause of death worldwide. In 2017, CVD contributed to 13,503 deaths in Malaysia. The current approaches for CVD prediction are usually invasive and costly. Machine learning (ML) techniques allow an accurate prediction by utilizing the complex interactions among relevant risk factors. Results: This study presents a case–control study involving 60 participants from The Malaysian Cohort, which is a prospective population-based project. Five parameters, namely, the R–R interval and root mean square of successive differences extracted from electrocardiogram (ECG), systolic and diastolic blood pressures, and total cholesterol level, were statistically significant in predicting CVD. Six ML algorithms, namely, linear discriminant analysis, linear and quadratic support vector machines, decision tree, k-nearest neighbor, and artificial neural network (ANN), were evaluated to determine the most accurate classifier in predicting CVD risk. ANN, which achieved 90% specificity, 90% sensitivity, and 90% accuracy, demonstrated the highest prediction performance among the six algorithms. Conclusions: In summary, by utilizing ML techniques, ECG data can serve as a good parameter for CVD prediction among the Malaysian multiethnic population.
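Sensitivity, specificity, and accuracy are all derived from the four cells of a binary confusion matrix. A minimal sketch of the standard definitions follows. The counts are hypothetical and are not the study's confusion matrix.

```python
def classification_rates(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true cases that were detected
    specificity = tn / (tn + fp)   # share of controls correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for a small test set of 10 cases and 10 controls.
sens, spec, acc = classification_rates(tp=8, fn=2, tn=9, fp=1)
```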


2020 ◽  
Vol 190 (3) ◽  
pp. 342-351
Author(s):  
Munir S Pathan ◽  
S M Pradhan ◽  
T Palani Selvam

Abstract In the present study, machine learning (ML) methods for the identification of abnormal glow curves (GC) of CaSO4:Dy-based thermoluminescence dosimeters in individual monitoring are presented. The classifier algorithms random forest (RF), artificial neural network (ANN) and support vector machine (SVM) are employed for identifying not only the abnormal glow curve but also the type of abnormality. For the first time, a simple and computationally efficient algorithm based on RF is presented for GC classification. About 4000 GCs are used for the training and validation of the ML algorithms. The performance of all algorithms is compared using various parameters. Results show an accuracy of 99.05% for the classification of GCs by the RF algorithm, whereas accuracies of 96.7% and 96.1% are achieved using ANN and SVM, respectively. The RF-based classifier is recommended for GC classification as well as for assisting in the fault determination of the TLD reader system.


2019 ◽  
Vol 11 (4) ◽  
pp. 1284-1301
Author(s):  
Hamed Nozari ◽  
Fateme Tavakoli

Abstract One of the most important bases in the management of catchments and sustainable use of water resources is the prediction of hydrological parameters. In this study, support vector machine (SVM), support vector machine combined with wavelet transform (W-SVM), autoregressive moving average with exogenous variable (ARMAX), and autoregressive integrated moving average (ARIMA) models were used to predict monthly values of precipitation, discharge, and evaporation. For this purpose, the monthly time series of rain-gauge, hydrometric, and evaporation-gauge stations located in the catchment area of Hamedan during a 25-year period (1991–2015) were used. Out of this statistical period, 17 years (1991–2007), 4 years (2008–2011), and 4 years (2012–2015) were used for training, calibration, and validation of the models, respectively. The results showed that ARIMA, SVM, ARMAX, and W-SVM ranked from first to fourth in monthly precipitation prediction, and SVM, ARIMA, ARMAX, and W-SVM ranked from first to fourth in monthly discharge and monthly evaporation prediction. The SVM has fewer adjustable parameters than the other models. Thus, it is able to predict hydrological changes with greater ease and in less time, and for this reason it is preferred to the other methods.
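The chronological split described above (training on 1991–2007, calibration on 2008–2011, validation on 2012–2015) can be sketched as follows. The record layout `(year, month, value)` and the function name are assumptions, and the values are placeholders, not station data.

```python
def split_by_year(records, train_end=2007, calib_end=2011):
    """Split (year, month, value) records chronologically into three sets."""
    train = [r for r in records if r[0] <= train_end]
    calib = [r for r in records if train_end < r[0] <= calib_end]
    valid = [r for r in records if r[0] > calib_end]
    return train, calib, valid


# Placeholder monthly series covering 1991-2015; the values are dummies.
records = [(year, month, 0.0)
           for year in range(1991, 2016)
           for month in range(1, 13)]
train, calib, valid = split_by_year(records)
```

Splitting by date, not at random, keeps every validation month strictly later than the training data, so the models are never evaluated on periods they have already seen.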


2019 ◽  
Vol 136 ◽  
pp. 05001 ◽  
Author(s):  
Ziyuan Ye

To improve the accuracy of air pollutant prediction in Shenzhen, a hybrid model based on ARIMA (Autoregressive Integrated Moving Average) and Prophet, combining temporal and spatial relationships, was proposed. First, the ARIMA and Prophet methods were trained on data from 11 air quality monitoring stations and assigned different weights. Then, the weight of each air quality monitoring station's impact on the final result was calculated. Finally, the hybrid model was built and its error was evaluated. The experimental results showed that this hybrid method can improve air pollutant prediction in Shenzhen.
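The abstract does not state how the model weights were chosen. One common scheme, shown here purely as an assumption, weights each model in inverse proportion to its validation error and then blends the forecasts. The function names and numbers are hypothetical, and the per-station weighting step is not covered.

```python
def inverse_error_weights(errors):
    """Weights proportional to 1/error, normalised to sum to 1."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def blend(forecasts, weights):
    """Weighted average of several forecast series of equal length."""
    return [sum(w * f[t] for w, f in zip(weights, forecasts))
            for t in range(len(forecasts[0]))]


# Hypothetical validation errors: the first model is three times as accurate.
weights = inverse_error_weights([2.0, 6.0])
# Hypothetical two-step forecasts from the two models.
hybrid = blend([[40.0, 44.0], [48.0, 52.0]], weights)
```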


2019 ◽  
Vol 9 (20) ◽  
pp. 4448 ◽  
Author(s):  
İş ◽  
Tuncer

This article considers methodological approaches to determine and prevent social media manipulation specific to Twitter. Behavioral analyses of Twitter users were performed by using their profile structures and interaction types, and Twitter users were classified according to their effect size values by determining their asset values. User profiles were classified into three different categories, namely popular-active, observer-passive, and spam-bot-malicious, by using k-nearest neighbor (K-NN), support vector machine (SVM), and artificial neural network (ANN) algorithms. For classification, the study used the basic characteristics of users, such as density, centralization, and diameter, as well as suggested time series such as the simple moving average and cumulative moving average. The highest accuracy was obtained by the K-NN algorithm. For all classes, the F1-score values obtained with K-NN were higher than those obtained with the other algorithms. Classification accuracy values were found to reach a maximum of 96.81% and a minimum of 92.33%. Our classification results showed that the proposed method was satisfactory for popular-active, observer-passive, and spam-bot-malicious account separation.
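The two time-series features mentioned above have simple standard definitions, sketched below. The input series is a hypothetical activity count per day, not data from the study.

```python
def simple_moving_average(values, window):
    """Mean of the last `window` values, for each position with a full window."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def cumulative_moving_average(values):
    """Running mean of all values seen so far."""
    averages, total = [], 0.0
    for n, v in enumerate(values, start=1):
        total += v
        averages.append(total / n)
    return averages


# Hypothetical daily activity counts for one account.
daily_counts = [2, 4, 6, 8]
sma = simple_moving_average(daily_counts, window=2)
cma = cumulative_moving_average(daily_counts)
```

The simple moving average reacts to recent changes in activity, while the cumulative moving average summarises an account's whole history.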


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-23 ◽  
Author(s):  
Mauro Castelli ◽  
Fabiana Martins Clemente ◽  
Aleš Popovič ◽  
Sara Silva ◽  
Leonardo Vanneschi

Predicting air quality is a complex task due to the dynamic nature, volatility, and high variability in time and space of pollutants and particulates. At the same time, being able to model, predict, and monitor air quality is becoming more and more relevant, especially in urban areas, due to the observed critical impact of air pollution on citizens’ health and the environment. In this paper, we employ a popular machine learning method, support vector regression (SVR), to forecast pollutant and particulate levels and to predict the air quality index (AQI). Among the various tested alternatives, radial basis function (RBF) was the type of kernel that allowed SVR to obtain the most accurate predictions. Using the whole set of available variables proved to be a more successful strategy than selecting features using principal component analysis. The presented results demonstrate that SVR with RBF kernel allows us to accurately predict hourly pollutant concentrations, such as carbon monoxide, sulfur dioxide, nitrogen dioxide, ground-level ozone, and particulate matter 2.5, as well as the hourly AQI for the state of California. Classification into six AQI categories defined by the US Environmental Protection Agency was performed with an accuracy of 94.1% on unseen validation data.
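For readers unfamiliar with the RBF kernel named above, it measures the similarity of two feature vectors as exp(-gamma * ||x - y||^2). A minimal sketch of that formula follows. The function name, the `gamma` value, and the sample vectors are illustrative, and the paper's SVR model itself is not reproduced here.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical two-feature inputs, e.g. scaled pollutant readings.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
```

Identical inputs give a similarity of 1, and the value decays toward 0 as the inputs move apart. Larger `gamma` makes the decay faster, so it is a parameter that typically needs tuning.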

