Predicting Daily Urban Fine Particulate Matter Concentrations Using a Random Forest Model

Abstract. We use 2011–2019 aerosol optical depth (AOD) observations from the Geostationary Ocean Color Imager (GOCI) instrument over East Asia to infer daily (24-h) surface fine particulate matter (PM2.5) concentrations at continuous 6 × 6 km² resolution over eastern China, South Korea, and Japan. This is done with a random forest (RF) algorithm applied to the gap-filled GOCI AODs and other data and trained with PM2.5 observations from the three national networks. The predicted 24-h PM2.5 concentrations for sites entirely withheld from training in a ten-fold cross-validation procedure correlate highly with network observations (R² = 0.89), with single-value precision of 26–32 % depending on country. Prediction of annual mean values has R² = 0.96 and single-value precision of 12 %. The RF algorithm is only moderately successful at diagnosing local exceedances of the National Ambient Air Quality Standard (NAAQS) because these exceedances are typically within the single-value precision of the RF and because the RF smooths extreme PM2.5 concentrations. The area-weighted and population-weighted trends of RF PM2.5 concentrations for eastern China, South Korea, and Japan show steady 2015–2019 declines consistent with surface networks, but the surface networks in eastern China and South Korea underestimate population exposure. Further examination of RF PM2.5 fields for South Korea identifies hotspots where surface network sites were initially lacking and shows 2015–2019 PM2.5 decreases across the country except for flat concentrations in the Seoul metropolitan area. Inspection of monthly PM2.5 time series in Beijing, Seoul, and Tokyo shows that the RF algorithm successfully captures observed seasonal variations of PM2.5 even though AOD and PM2.5 often have opposite seasonalities. Application of the RF algorithm to urban pollution episodes in Seoul and Beijing demonstrates high skill in reproducing the observed day-to-day variations in air quality as well as spatial patterns on the 6 km scale. Comparison to a CMAQ simulation for the Korean peninsula demonstrates the value of the continuous RF PM2.5 fields for testing air quality models, including over North Korea where they offer a unique resource.
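Evaluating on "sites entirely withheld from training" implies a grouped cross-validation: folds are formed over monitoring sites rather than over individual daily records, so no site contributes to both the training and the test set of a fold. The following is a minimal sketch of such a split in plain Python. The function name `site_withheld_folds`, the one-site-identifier-per-record layout, and the toy data are illustrative assumptions, not the authors' code.

```python
import random
from collections import defaultdict


def site_withheld_folds(site_ids, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each site lands in exactly one test fold.

    site_ids: one site identifier per daily record (hypothetical layout).
    """
    by_site = defaultdict(list)
    for idx, site in enumerate(site_ids):
        by_site[site].append(idx)

    sites = sorted(by_site)
    random.Random(seed).shuffle(sites)  # fixed seed keeps the split reproducible

    for k in range(n_folds):
        fold_sites = sites[k::n_folds]  # every n_folds-th site goes to fold k
        held_out = set(fold_sites)
        test_idx = [i for s in fold_sites for i in by_site[s]]
        train_idx = [i for i, s in enumerate(site_ids) if s not in held_out]
        yield train_idx, test_idx


# Toy data: 25 hypothetical sites with 4 daily records each.
site_ids = [f"site{n:02d}" for n in range(25) for _ in range(4)]
folds = list(site_withheld_folds(site_ids))
```

Splitting by record instead of by site would let a model see other days from the same monitor during training, which tends to overstate skill at unmonitored locations.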

Evaluation of random forest regression and multiple linear regression for predicting indoor fine particulate matter concentrations in a highly polluted city

Environmental Pollution, 2019, Vol 245, pp. 746-753. DOI: 10.1016/j.envpol.2018.11.034. Cited by ~30.

Author(s): Weiran Yuchi, Enkhjargal Gombojav, Buyantushig Boldbaatar, Jargalsaikhan Galsuren, Sarangerel Enkhmaa, ...

Keyword(s): Particulate Matter, Random Forest, Linear Regression, Multiple Linear Regression, Fine Particulate Matter, Random Forest Regression, Fine Particulate

Developing an Advanced PM2.5 Exposure Model in Lima, Peru

Remote Sensing, 2019, Vol 11 (6), pp. 641. DOI: 10.3390/rs11060641. Cited by ~9.

Author(s): Bryan Vu, Odón Sánchez, Jianzhao Bi, Qingyang Xiao, Nadia Hansel, ...

Keyword(s): Random Forest, Spatial Resolution, Cross Validation, Fine Particulate Matter, Epidemiological Studies, Random Forest Model, Exposure Model, Validation Dataset, Good Precision, Forest Model

It is well recognized that exposure to fine particulate matter (PM2.5) adversely affects health, yet few studies from South America have documented such associations because PM2.5 measurements are sparse. Lima’s topography and aging vehicular fleet result in severe air pollution, and the city has too few monitors to quantify PM2.5 levels effectively for epidemiologic studies. We developed an advanced machine learning model to estimate daily PM2.5 concentrations at a 1 km² spatial resolution in Lima, Peru from 2010 to 2016. We combined aerosol optical depth (AOD), meteorological fields from the European Centre for Medium-Range Weather Forecasts (ECMWF), parameters from the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem), and land use variables to fit a random forest model against ground measurements from 16 monitoring stations. Overall cross-validation R² (and root mean square prediction error, RMSE) for the random forest model was 0.70 (5.97 μg/m³). Mean PM2.5 for ground measurements was 24.7 μg/m³ while mean estimated PM2.5 was 24.9 μg/m³ in the cross-validation dataset. The mean difference between ground and predicted measurements was −0.09 μg/m³ (Std.Dev. = 5.97 μg/m³), with 94.5% of observations falling within 2 standard deviations of the difference, indicating good agreement between ground measurements and predicted estimates. Surface downward solar radiation, temperature, relative humidity, and AOD were the most important predictors, while percent urbanization, albedo, and cloud fraction were the least important predictors. Comparison of monthly mean ground and predicted PM2.5 shows good precision and accuracy for our model. Furthermore, mean annual maps of PM2.5 show consistently lower concentrations along the coast and higher concentrations in the mountains, resulting from prevailing coastal winds blowing from the Pacific Ocean in the west. Our model allows for construction of long-term historical daily PM2.5 estimates at 1 km² spatial resolution to support future epidemiological studies.
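The abstract reports the same value (5.97 μg/m³) for both the RMSE and the standard deviation of the differences. This is expected when the mean difference is close to zero, because RMSE² = bias² + SD² when SD is the population standard deviation. The sketch below computes these agreement statistics in plain Python. The function name `agreement_stats` and the toy values are assumptions for illustration, and R² is computed here as the coefficient of determination, which may differ from the definition used in the paper.

```python
import math


def agreement_stats(observed, predicted):
    """Return R2, RMSE, mean difference, SD of differences, and share within 2 SD."""
    n = len(observed)
    diffs = [o - p for o, p in zip(observed, predicted)]
    bias = sum(diffs) / n
    ss_res = sum(d * d for d in diffs)
    rmse = math.sqrt(ss_res / n)
    # Population SD (divide by n) so that rmse**2 == bias**2 + sd**2 holds exactly.
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    within = sum(abs(d - bias) <= 2 * sd for d in diffs) / n
    return {"r2": r2, "rmse": rmse, "bias": bias, "sd": sd, "within_2sd": within}


# Toy values in ug/m3, made up for illustration; not data from the study.
observed = [20.0, 25.0, 30.0, 22.0, 28.0]
predicted = [21.0, 24.0, 29.0, 23.0, 27.0]
stats = agreement_stats(observed, predicted)

# Check against the reported figures: a bias of 0.09 barely changes the RMSE.
rmse_from_reported = math.hypot(0.09, 5.97)
```

With the reported bias and standard deviation, `rmse_from_reported` rounds to 5.97, which matches the RMSE given in the abstract.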

Design of Machine Learning Prediction System Based on the Internet of Things Framework for Monitoring Fine PM Concentrations

Environments, 2021, Vol 8 (10), pp. 99. DOI: 10.3390/environments8100099.

Author(s): Shun-Yuan Wang, Wen-Bin Lin, Yu-Chieh Shu

Keyword(s): Machine Learning, Air Pollution, Particulate Matter, Random Forest, Internet Of Things, Random Forest Model, The Internet, Learning Models, Forest Model, The Internet Of Things

In this study, a mobile air pollution sensing unit based on the Internet of Things framework was designed for monitoring the concentration of fine particulate matter in three urban areas. This unit was developed using the NodeMCU-32S microcontroller, PMS5003-G5 (particulate matter sensing module), and Ublox NEO-6M V2 (GPS positioning module). The sensing unit transmits the particulate matter concentration and the coordinates of the polluted location to the backend server through 3G and 4G telecommunication networks for data collection. This system will complement the government’s PM2.5 data acquisition system. Mobile monitoring stations meet the air pollution monitoring needs of areas that require special observation. For example, an AIoT development system installed at intersections with heavy traffic can serve as a reference for government transportation or environmental inspection departments in monitoring environmental quality or diverting traffic flow. Furthermore, the particulate matter distributions in three areas, namely Xinzhuang, Sanchong, and Luzhou Districts, which are all in New Taipei City of Taiwan, were estimated using machine learning models, the data of stationary monitoring stations, and the measurements of the mobile sensing system proposed in this study. Four types of learning models were trained, namely the decision tree, random forest, multilayer perceptron, and radial basis function neural network, and their prediction results were evaluated. The root mean square error was used as the performance indicator, and the learning results indicate that the random forest model outperforms the other models for both the training and testing sets. To examine the generalizability of the learning models, the models were verified against data measured on three days: 15 February, 28 February, and 1 March 2019. A comparison between the model-predicted and the measured data indicates that the random forest model provides the most stable and accurate prediction values and could clearly present the distribution of highly polluted areas. The results of these models are visualized in the form of maps by using a web application. The maps allow users to understand the distribution of polluted areas intuitively.
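Using the root mean square error as the performance indicator amounts to scoring each model's predictions against the same measurements and ranking the models from lowest to highest error. A minimal sketch of that comparison in plain Python follows. The helper names `rmse` and `rank_models` are hypothetical, and every number is made up for illustration; none of it is data or a result from the study.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between paired measurements and predictions."""
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rank_models(measured, predictions_by_model):
    """Return (model_name, rmse) pairs sorted from lowest to highest RMSE."""
    scores = {name: rmse(measured, preds) for name, preds in predictions_by_model.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Made-up PM2.5 values (ug/m3) for illustration only; not data from the study.
measured = [12.0, 18.0, 25.0, 31.0]
predictions_by_model = {
    "decision_tree": [15.0, 15.0, 28.0, 28.0],
    "random_forest": [13.0, 17.0, 26.0, 30.0],
    "mlp": [10.0, 20.0, 22.0, 34.0],
    "rbf_network": [14.0, 21.0, 21.0, 33.0],
}
ranking = rank_models(measured, predictions_by_model)
best_model = ranking[0][0]
```

Running the same ranking separately on the training and testing sets shows whether a model's low training error carries over to unseen data.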
