A Novel Random Forest and its Application on Classification of Air Quality

Author(s):  
Hualing Yi ◽  
Qingyu Xiong ◽  
Qinghong Zou ◽  
Rui Xu ◽  
Kai Wang ◽  
...  
2020 ◽  
Vol 10 (24) ◽  
pp. 9151
Author(s):  
Yun-Chia Liang ◽  
Yona Maimury ◽  
Angela Hsiang-Ling Chen ◽  
Josue Rodolfo Cuevas Juarez

Economic activity has degraded the quality of air, an essential natural resource. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, making it difficult to account for seasonal and other factors. Several prediction models have been developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments using datasets for three different regions, aimed at obtaining the best prediction performance from the stacking ensemble, AdaBoost, and random forest, found that the stacking ensemble delivers consistently superior performance for R² and RMSE, while AdaBoost provides the best results for MAE.
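The three scores named in this abstract (R², RMSE, MAE) can be computed as in the following minimal sketch. It is not the authors' code; the AQI values are invented purely for illustration, and the function names are hypothetical.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical AQI observations and model predictions (illustrative only).
observed = [50.0, 80.0, 120.0, 160.0]
predicted = [55.0, 75.0, 110.0, 170.0]

print("MAE :", mae(observed, predicted))
print("RMSE:", rmse(observed, predicted))
print("R2  :", r2(observed, predicted))
```

Because RMSE squares each error before averaging, a model can rank best on RMSE and R² while another ranks best on MAE, which is consistent with the split result reported above.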


Author(s):  
Ahmad R. Alsaber ◽  
Jiazhu Pan ◽  
Adeeba Al-Hurban 

In environmental research, missing data are often a challenge for statistical modeling. This paper addresses some advanced techniques for dealing with missing values in an air quality data set using a multiple imputation (MI) approach. Three missing data mechanisms, missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR), are applied to the data set. Five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is missForest, an iterative imputation method based on the random forest approach. Air quality data sets were gathered from five monitoring stations in Kuwait and aggregated to a daily basis. A logarithm transformation was applied to all pollutant data in order to normalize their distributions and minimize skewness. We found high levels of missing values for NO₂ (18.4%), CO (18.5%), PM₁₀ (57.4%), SO₂ (19.0%), and O₃ (18.2%) data. Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR mechanism had the lowest RMSE and MAE. We conclude that MI using the missForest approach has a high level of accuracy in estimating missing values. MissForest had the lowest imputation error (RMSE and MAE) of the imputation methods compared and, thus, can be considered appropriate for analyzing air quality data.
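The evaluation protocol described here (mask known values at a fixed rate, impute them, then score the imputed values against the truth) can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: it covers only the MCAR case, uses plain mean imputation as a stand-in for missForest, and runs on a synthetic log-transformed series. All function names are hypothetical.

```python
import math
import random

def mask_mcar(values, fraction, rng):
    """Copy `values`, setting a uniformly random `fraction` of entries to None (MCAR)."""
    n_missing = round(len(values) * fraction)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]

def impute_mean(masked):
    """Stand-in imputer: fill each gap with the mean of the observed entries."""
    observed = [v for v in masked if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in masked]

def imputation_error(truth, masked, imputed):
    """RMSE and MAE over the masked positions only."""
    errs = [imputed[i] - truth[i] for i, v in enumerate(masked) if v is None]
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    mae = sum(abs(e) for e in errs) / len(errs)
    return rmse, mae

rng = random.Random(0)
# Synthetic log-transformed "pollutant" series (invented, not the Kuwait data).
truth = [math.log(20 + 10 * math.sin(i / 5) + rng.random()) for i in range(200)]

# The five missing data levels considered in the abstract.
for level in (0.05, 0.10, 0.20, 0.30, 0.40):
    masked = mask_mcar(truth, level, rng)
    rmse, mae = imputation_error(truth, masked, impute_mean(masked))
    print(f"{level:.0%} missing: RMSE={rmse:.4f} MAE={mae:.4f}")
```

Swapping `impute_mean` for another imputer leaves the rest of the loop unchanged, which is how competing imputation methods can be compared on the same masks.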


IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 45993-45999
Author(s):  
Ung Yang ◽  
Seungwon Oh ◽  
Seung Gon Wi ◽  
Bok-Rye Lee ◽  
Sang-Hyun Lee ◽  
...  

Author(s):  
Balajee Alphonse ◽  
Venkatesan Rajagopal ◽  
Sudhakar Sengan ◽  
Kousalya Kittusamy ◽  
Amudha Kandasamy ◽  
...  

2016 ◽  
Vol 51 (20) ◽  
pp. 2853-2862 ◽  
Author(s):  
Serkan Ballı

The aim of this study is to diagnose and classify the failure modes of two serial fastened sandwich composite plates using data mining techniques. The composite material used in the study was manufactured from glass-fiber-reinforced layers and aluminum sheets. Results from a previous experimental study of sandwich composite plates mechanically fastened with two serial pins or bolts were used to classify the failure modes. The experimental data from that study cover different geometrical parameters for applied preload moments of 0 (pinned), 2, 3, 4, and 5 Nm (bolted). In this study, data mining methods were applied using these geometrical parameters and the pinned/bolted joint configurations. Three geometrical parameters and 100 test data were used for classification with support vector machine, Naive Bayes, K-Nearest Neighbors, Logistic Regression, and Random Forest methods. In the experiments, the Random Forest method achieved better results than the others and was suitable for diagnosing and classifying the failure modes. The performance of all data mining methods used is discussed in terms of accuracy and error ratios.
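As a rough illustration of classifying from three geometrical parameters and scoring by accuracy and error ratio, the sketch below implements K-Nearest Neighbors, one of the five methods listed. The feature values and the labels `mode_A` and `mode_B` are placeholders invented for this example; they are not the study's data or its failure-mode names.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Predict the majority label among the k training points closest to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Placeholder samples: three geometrical parameters each (invented values).
train_X = [(1.0, 2.0, 2.0), (1.5, 2.0, 2.5), (1.2, 2.5, 2.0),
           (4.0, 4.0, 5.0), (4.5, 5.0, 4.0), (5.0, 4.5, 4.5)]
train_y = ["mode_A", "mode_A", "mode_A", "mode_B", "mode_B", "mode_B"]

test_X = [(1.1, 2.2, 2.1), (4.8, 4.6, 4.4)]
test_y = ["mode_A", "mode_B"]

preds = [knn_predict(train_X, train_y, x) for x in test_X]
acc = accuracy(test_y, preds)
print("accuracy:", acc, "error ratio:", 1.0 - acc)
```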

