Stacking Regression Algorithms to Predict PM2.5 in the Smart City Using Internet of Things
Background and Objective: As urban populations grow, pollution grows with them. Air pollution is one of the most challenging environmental issues in smart cities. Real-time monitoring of air quality can help administrators take appropriate decisions in time. Advances in Internet of Things based sensors have changed the way air quality is monitored. Methods: In this paper, we apply two-stage regression. In the first stage, ten regression algorithms (Decision Tree, Random Forest, Elastic Net, AdaBoost, Extra Tree, Linear Regression, Lasso, XGBoost, Light GBM, and Multi-Layer Perceptron) are applied; in the second stage, the best four algorithms are picked and a stacking ensemble is applied, using Python, to predict PM2.5 pollutant levels in air. Data sets from five Chinese cities (Beijing, Chengdu, Guangzhou, Shanghai, and Shenyang) are considered, and the models are compared based on MAE (Mean Absolute Error), RMSE (Root Mean Square Error), and R2. Results and Conclusion: We observed that, of the ten regression algorithms applied, the Extra Tree algorithm gives the highest performance on all five data sets, and stacking further improves the performance. Feature importance for Shenyang and Beijing is computed using three regression algorithms, and we found the four most important features to be humidity, wind speed, wind direction, and dew point.
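Illustrative sketch: The two-stage procedure described in the Methods can be outlined in Python with scikit-learn roughly as follows. This is a minimal sketch under stated assumptions, not the implementation used in the paper: the data are synthetic (`make_regression` stands in for the city data sets), only seven of the ten first-stage algorithms are included (XGBoost, Light GBM, and Multi-Layer Perceptron are omitted to keep the example dependent on scikit-learn alone), ranking the first-stage models by validation RMSE is an assumed selection rule, and the linear meta-learner and all hyperparameters are assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import (AdaBoostRegressor, ExtraTreesRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def evaluate(model, X, y):
    """Return the three comparison metrics named in the abstract."""
    pred = model.predict(X)
    return {
        "MAE": float(mean_absolute_error(y, pred)),
        "RMSE": float(np.sqrt(mean_squared_error(y, pred))),
        "R2": float(r2_score(y, pred)),
    }


# Synthetic stand-in for one city's data (assumption: not the real data set).
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
# Hold out part of the training data to rank the first-stage models.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0)

# Stage 1 candidates (a scikit-learn-only subset of the ten algorithms).
candidates = {
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "elastic_net": ElasticNet(),
    "adaboost": AdaBoostRegressor(n_estimators=50, random_state=0),
    "extra_trees": ExtraTreesRegressor(n_estimators=50, random_state=0),
    "linear": LinearRegression(),
    "lasso": Lasso(),
}

# Stage 1: fit each candidate and score it on the validation split.
scores = {}
for name, model in candidates.items():
    model.fit(X_fit, y_fit)
    scores[name] = evaluate(model, X_val, y_val)

# Pick the four candidates with the lowest validation RMSE (assumed criterion).
top_four = sorted(scores, key=lambda n: scores[n]["RMSE"])[:4]

# Stage 2: stack the four selected models under a linear meta-learner.
stack = StackingRegressor(
    estimators=[(name, clone(candidates[name])) for name in top_four],
    final_estimator=LinearRegression(),  # assumed meta-learner
    cv=5,
)
stack.fit(X_train, y_train)
stack_metrics = evaluate(stack, X_test, y_test)

print("Selected base models:", top_four)
print("Stacked model metrics:", stack_metrics)
```

On real data, the same loop would be repeated per city, and the stacked model's MAE, RMSE, and R2 would be compared against those of the best single first-stage model.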