A Novel Random Forest and its Application on Classification of Air Quality

Air, an essential natural resource, has been degraded in quality by economic activities. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, which makes it difficult to account for seasonal and other factors. Several prediction models were developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments on datasets for three different regions compared the stacking ensemble, AdaBoost, and random forest. The stacking ensemble delivers consistently superior performance in terms of R² and root mean square error (RMSE), while AdaBoost provides the best results for mean absolute error (MAE).
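
The three scores named in the abstract above can be computed as in the following minimal sketch. It uses the standard definitions of R², RMSE, and MAE; the function name `regression_metrics` and the AQI values are made up for illustration and are not taken from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, MAE) for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    # Mean absolute error: average size of the errors.
    mae = sum(abs(e) for e in errors) / n
    # Root mean square error: penalizes large errors more than MAE does.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    # R^2: share of the variance in y_true explained by the predictions.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return r2, rmse, mae

# Hypothetical AQI values, for illustration only.
observed = [50.0, 80.0, 120.0, 150.0]
predicted = [55.0, 75.0, 110.0, 160.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.4f} RMSE={rmse:.4f} MAE={mae:.4f}")
```

Because RMSE squares the errors before averaging, a model can rank best on RMSE and R² while another ranks best on MAE, which is consistent with the split result reported above.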

Handling Complex Missing Data Using Random Forest Approach for an Air Quality Monitoring Dataset: A Case Study of Kuwait Environmental Data (2012 to 2018)

International Journal of Environmental Research and Public Health ◽ 10.3390/ijerph18031333 ◽ 2021 ◽ Vol 18 (3) ◽ pp. 1333

Author(s): Ahmad R. Alsaber ◽ Jiazhu Pan ◽ Adeeba Al-Hurban

Keyword(s): Air Quality ◽ Missing Data ◽ Random Forest ◽ Missing Values ◽ Imputation Method ◽ Environmental Data ◽ Environmental Research ◽ Quality Data ◽ Data Set ◽ Air Quality Data

In environmental research, missing data are often a challenge for statistical modeling. This paper addresses some advanced techniques for dealing with missing values in an air quality data set using a multiple imputation (MI) approach. Missing completely at random (MCAR), missing at random (MAR), and not missing at random (NMAR) missing data techniques are applied to the data set. Five missing data levels are considered: 5%, 10%, 20%, 30%, and 40%. The imputation method used in this paper is missForest, an iterative imputation method related to the random forest approach. Air quality data sets were gathered from five monitoring stations in Kuwait and aggregated to a daily basis. A logarithm transformation was applied to all pollutant data in order to normalize their distributions and minimize skewness. We found high levels of missing values for NO2 (18.4%), CO (18.5%), PM10 (57.4%), SO2 (19.0%), and O3 (18.2%) data. Climatological data (i.e., air temperature, relative humidity, wind direction, and wind speed) were used as control variables for better estimation. The results show that the MAR technique had the lowest RMSE and MAE. We conclude that MI using the missForest approach has a high level of accuracy in estimating missing values. MissForest had the lowest imputation error (RMSE and MAE) among the imputation methods compared and can therefore be considered appropriate for analyzing air quality data.
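
The evaluation design described above (log-transform the data, remove a known share of values, impute, then score the imputed values against the truth) can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's procedure: the data are synthetic, only the MCAR case is simulated, and simple mean imputation stands in for missForest. All function names are hypothetical.

```python
import math
import random

def mask_mcar(values, level, rng):
    """Set round(level * n) entries to None, chosen uniformly at random (MCAR)."""
    n = len(values)
    missing = set(rng.sample(range(n), round(level * n)))
    return [None if i in missing else v for i, v in enumerate(values)]

def mean_impute(values):
    """Stand-in imputer (NOT missForest): fill gaps with the observed mean."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]

def imputation_error(truth, masked, imputed):
    """RMSE and MAE computed over the masked positions only."""
    diffs = [imp - t for t, m, imp in zip(truth, masked, imputed) if m is None]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return rmse, mae

rng = random.Random(0)
# Synthetic, strictly positive "pollutant" series; the log transform reduces skew.
raw = [rng.lognormvariate(3.0, 0.5) for _ in range(200)]
logged = [math.log(v) for v in raw]

# The five missing-data levels listed in the abstract.
results = {}
for level in (0.05, 0.10, 0.20, 0.30, 0.40):
    masked = mask_mcar(logged, level, rng)
    results[level] = imputation_error(logged, masked, mean_impute(masked))

for level, (rmse, mae) in results.items():
    print(f"{level:.0%} missing: RMSE={rmse:.3f} MAE={mae:.3f}")
```

Swapping `mean_impute` for another imputer while keeping the masking and scoring fixed is what makes error figures comparable across methods.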

Classification of Germination Images of Pear Pollen Using Random Forest and Convolution Neural Network Models

IEEE Access ◽ 10.1109/access.2021.3067677 ◽ 2021 ◽ Vol 9 ◽ pp. 45993-45999

Author(s): Ung Yang ◽ Seungwon Oh ◽ Seung Gon Wi ◽ Bok-Rye Lee ◽ Sang-Hyun Lee ◽ ...

Keyword(s): Neural Network ◽ Random Forest ◽ Network Models ◽ Convolution Neural Network ◽ Neural Network Models

Image Classification of Rice Leaf Diseases Using Random Forest Algorithm

2021 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunication Engineering ◽ 10.1109/ectidamtncon51128.2021.9425696 ◽ 2021

Author(s): Panuwat Mekha ◽ Nutnicha Teeyasuksaet

Keyword(s): Random Forest ◽ Image Classification ◽ Random Forest Algorithm ◽ Rice Leaf

Modeling and multi-class classification of vibroarthographic signals via time domain curvilinear divergence random forest

Journal of Ambient Intelligence and Humanized Computing ◽ 10.1007/s12652-020-02869-0 ◽ 2021

Author(s): Balajee Alphonse ◽ Venkatesan Rajagopal ◽ Sudhakar Sengan ◽ Kousalya Kittusamy ◽ Amudha Kandasamy ◽ ...

Keyword(s): Random Forest ◽ Time Domain ◽ Multi Class Classification

Classification of Headache Disorder Using Random Forest Algorithm

2020 4th International Conference on Informatics and Computational Sciences (ICICoS) ◽ 10.1109/icicos51170.2020.9299105 ◽ 2020

Author(s): Dhiyaussalam ◽ Adi Wibowo ◽ Fajar Agung Nugroho ◽ Eko Adi Sarwoko ◽ I Made Agus Setiawan

Keyword(s): Random Forest ◽ Headache Disorder ◽ Random Forest Algorithm

Spatial sampling effect on data structure and Random Forest classification of tissue types in High Definition and Standard Definition FT-IR imaging

Chemometrics and Intelligent Laboratory Systems ◽ 10.1016/j.chemolab.2021.104407 ◽ 2021 ◽ pp. 104407

Author(s): Danuta Liberda ◽ Karolina Kosowska ◽ Paulina Koziol ◽ Tomasz P. Wrobel

Keyword(s): Data Structure ◽ Random Forest ◽ Spatial Sampling ◽ High Definition ◽ Standard Definition ◽ Sampling Effect ◽ Random Forest Classification ◽ Forest Classification ◽ FT-IR

A data mining approach to the diagnosis of failure modes for two serial fastened sandwich composite plates

Journal of Composite Materials ◽ 10.1177/0021998316679720 ◽ 2016 ◽ Vol 51 (20) ◽ pp. 2853-2862 ◽ Cited By ~ 2

Author(s): Serkan Ballı

Keyword(s): Data Mining ◽ Random Forest ◽ Failure Modes ◽ Composite Plates ◽ Study Data ◽ Sandwich Composite ◽ Support Vector ◽ Geometrical Parameters ◽ Mining Methods

The aim of this study is to diagnose and classify the failure modes of two serial fastened sandwich composite plates using data mining techniques. The composite material used in the study was manufactured from glass fiber reinforced layers and aluminum sheets. Results from a previous experimental study on sandwich composite plates mechanically fastened with two serial pins or bolts were used to classify the failure modes. The experimental data from that study cover different geometrical parameters and applied preload moments of 0 Nm (pinned) and 2, 3, 4, and 5 Nm (bolted). In this study, data mining methods were applied to these geometrical parameters and the pinned/bolted joint configurations. Three geometrical parameters and 100 test records were used for classification with support vector machine, Naive Bayes, K-Nearest Neighbors, Logistic Regression, and Random Forest methods. In the experiments, the Random Forest method achieved better results than the others and proved appropriate for diagnosing and classifying the failure modes. The performance of all the data mining methods used is discussed in terms of accuracy and error ratios.
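
A minimal sketch of how several classifiers can be compared by accuracy and error ratio, as the abstract above describes. The labels, the predictions, and the names `method_a` and `method_b` are hypothetical placeholders, not results from the study, and the error ratio is assumed here to be one minus accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical failure-mode labels for five test records.
actual = ["mode_A", "mode_B", "mode_C", "mode_A", "mode_C"]

# Hypothetical predictions from two unnamed classifiers.
predictions = {
    "method_a": ["mode_A", "mode_B", "mode_C", "mode_A", "mode_B"],
    "method_b": ["mode_A", "mode_C", "mode_C", "mode_B", "mode_B"],
}

scores = {name: accuracy(actual, pred) for name, pred in predictions.items()}
# Assumption: error ratio is the complement of accuracy.
error_ratios = {name: 1.0 - acc for name, acc in scores.items()}
best = max(scores, key=scores.get)

for name in predictions:
    print(f"{name}: accuracy={scores[name]:.2f} error={error_ratios[name]:.2f}")
print("best:", best)
```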
