Evaluation of random forest regression and multiple linear regression for predicting indoor fine particulate matter concentrations in a highly polluted city

2019 ◽  
Vol 245 ◽  
pp. 746-753 ◽  
Author(s):  
Weiran Yuchi ◽  
Enkhjargal Gombojav ◽  
Buyantushig Boldbaatar ◽  
Jargalsaikhan Galsuren ◽  
Sarangerel Enkhmaa ◽  
...  
2021 ◽  
Author(s):  
Drew C. Pendergrass ◽  
Daniel J. Jacob ◽  
Shixian Zhai ◽  
Jhoon Kim ◽  
Ja-Ho Koo ◽  
...  

Abstract. We use 2011–2019 aerosol optical depth (AOD) observations from the Geostationary Ocean Color Imager (GOCI) instrument over East Asia to infer daily (24-h average) surface fine particulate matter (PM2.5) concentrations at continuous 6 × 6 km² resolution over eastern China, South Korea, and Japan. This is done with a random forest (RF) algorithm applied to the gap-filled GOCI AODs and other data and trained with PM2.5 observations from the three national networks. The predicted 24-h PM2.5 concentrations for sites entirely withheld from training in a ten-fold cross-validation procedure correlate highly with network observations (R² = 0.89), with a single-value precision of 26–32 % depending on country. Prediction of annual mean values has R² = 0.96 and a single-value precision of 12 %. The RF algorithm is only moderately successful at diagnosing local exceedances of the National Ambient Air Quality Standard (NAAQS), because these exceedances typically fall within the single-value precision of the RF and because the RF smooths extreme PM2.5 concentrations. The area-weighted and population-weighted trends of RF PM2.5 concentrations for eastern China, South Korea, and Japan show steady 2015–2019 declines consistent with the surface networks, but the surface networks in eastern China and South Korea underestimate population exposure. Further examination of RF PM2.5 fields for South Korea identifies hotspots where surface network sites were initially lacking and shows 2015–2019 PM2.5 decreases across the country, except for flat concentrations in the Seoul metropolitan area. Inspection of monthly PM2.5 time series in Beijing, Seoul, and Tokyo shows that the RF algorithm successfully captures the observed seasonal variations of PM2.5 even though AOD and PM2.5 often have opposite seasonalities.
Application of the RF algorithm to urban pollution episodes in Seoul and Beijing demonstrates high skill in reproducing the observed day-to-day variations in air quality as well as spatial patterns on the 6 km scale. Comparison to a CMAQ simulation for the Korean peninsula demonstrates the value of the continuous RF PM2.5 fields for testing air quality models, including over North Korea where they offer a unique resource.
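The site-withheld validation described above can be sketched in plain Python. This is a minimal illustration, assuming that "withheld sites" means every observation from a given monitoring site is placed in the same fold, and that the reported R² is the squared Pearson correlation between predicted and observed values. The function names `site_grouped_folds` and `squared_correlation` are hypothetical, and the RF model itself is not shown.

```python
import random


def site_grouped_folds(site_ids, k=10, seed=0):
    """Split sample indices into k folds so that each site lands in exactly one fold.

    site_ids: one site label per sample (hypothetical labels; any hashable value works).
    Returns a list of k lists of sample indices.
    """
    sites = sorted(set(site_ids))
    if k > len(sites):
        raise ValueError("need at least as many sites as folds")
    rng = random.Random(seed)
    rng.shuffle(sites)
    # Deal the shuffled sites round-robin over the folds.
    fold_of_site = {site: i % k for i, site in enumerate(sites)}
    folds = [[] for _ in range(k)]
    for idx, site in enumerate(site_ids):
        folds[fold_of_site[site]].append(idx)
    return folds


def squared_correlation(obs, pred):
    """Square of the Pearson correlation coefficient between observations and predictions."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


# Usage with hypothetical labels: six samples from three monitoring sites, three folds.
site_ids = ["A", "A", "B", "B", "C", "C"]
for fold, test_idx in enumerate(site_grouped_folds(site_ids, k=3)):
    train_idx = [i for i in range(len(site_ids)) if i not in test_idx]
    # A model would be trained on train_idx and evaluated on test_idx here.
    print(fold, test_idx, train_idx)
```

Grouping by site rather than by individual daily records prevents the model from being scored on a site it has already seen during training, which would otherwise inflate the apparent skill.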


2019 ◽  
Vol 46 (5) ◽  
pp. 353-363 ◽  
Author(s):  
Chaozhe Jiang ◽  
Ping Huang ◽  
Javad Lessan ◽  
Liping Fu ◽  
Chao Wen

Accurate prediction of recoverable train delay can support train dispatchers' decision-making on timetable rescheduling and improve service reliability. In this paper, we present the results of an effort aimed at developing a primary delay recovery (PDR) prediction model using train operation records from the Wuhan-Guangzhou (W-G) high-speed railway. To this end, we first identified the main variables that contribute to delay, including dwell buffer time, running buffer time, magnitude of primary delay time, and the influence of individual sections. Different models were then applied and calibrated to predict the PDR. The validation results on test datasets indicate that the random forest regression (RFR) model outperforms three alternative models, namely multiple linear regression (MLR), support vector machine (SVM), and artificial neural networks (ANN), in prediction accuracy. Specifically, with a prediction tolerance of less than 1 min, the RFR model achieves a prediction accuracy of up to 80.4%, compared with 44.4%, 78.5%, and 78.5% for the MLR, SVM, and ANN models, respectively.
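The tolerance-based accuracy quoted above can be made concrete with a short sketch. Because the abstract does not give the exact definition, it assumes that accuracy is the share of test cases whose absolute prediction error is strictly below the tolerance (here 1 min). The function name and the delay values are hypothetical.

```python
def accuracy_within_tolerance(observed, predicted, tolerance=1.0):
    """Fraction of predictions whose absolute error is strictly below `tolerance`.

    observed, predicted: equal-length sequences in the same unit (e.g. minutes).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for o, p in zip(observed, predicted) if abs(p - o) < tolerance)
    return hits / len(observed)


# Hypothetical recovered-delay values in minutes (not data from the study).
observed = [2.0, 3.5, 1.0, 4.0, 0.5]
predicted = [2.4, 5.0, 1.2, 3.1, 0.6]
print(f"{accuracy_within_tolerance(observed, predicted):.1%}")  # → 80.0%
```

Four of the five hypothetical errors are under 1 min, so the sketch reports 80.0%. Under this definition a larger tolerance can only raise the score, which is why such accuracies are always quoted together with the tolerance.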


2020 ◽  
Vol 82 (8) ◽  
pp. 1586-1602
Author(s):  
Bahareh Beigzadeh ◽  
Mehdi Bahrami ◽  
Mohammad Javad Amiri ◽  
Mohammad Reza Mahmoudi

Abstract The use of mathematical models for water quality prediction has recently attracted increasing interest. In this research, the potential of random forest regression (RFR), Bayesian multiple linear regression (BMLR), and multiple linear regression (MLR) to predict the removal of 2,4-dichlorophenoxyacetic acid (2,4-D) from synthetic wastewater by rice husk biochar was examined, using five input operating parameters: initial 2,4-D concentration, adsorbent dosage, pH, reaction time, and temperature. The equilibrium and kinetic adsorption data were best fitted by the Freundlich and pseudo-first-order models, respectively. The thermodynamic parameters also indicated that the adsorption is exothermic and spontaneous. For the relationship between the model-estimated and measured values of 2,4-D removal, the modeling results gave R² values of 0.994, 0.992, and 0.945 and RMSE values of 1.92, 6.17, and 2.10 for RFR, BMLR, and MLR, respectively. Overall, RFR performed better than the BMLR and MLR models owing to its ability to capture the non-linear relationships between the input data and the associated removal capacities. The sensitivity analysis showed that the 2,4-D adsorption process is most sensitive to initial 2,4-D concentration and adsorbent dosage. Thus, applying the suggested model makes it possible to monitor water permanently and more cost-effectively.
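For readers comparing the metrics above, the following sketch fits a plain multiple linear regression by ordinary least squares and scores it with RMSE and the coefficient of determination. It is not the authors' code: the RFR and BMLR models are not reproduced, the data are synthetic, and R² is computed as 1 - SS_res/SS_tot (the paper may instead report a squared correlation). In the study's setting, the five operating parameters would form the columns of `X`; the helper names are hypothetical.

```python
from math import sqrt


def fit_ols(X, y):
    """Ordinary least squares with an intercept, solving the normal equations.

    Uses Gauss-Jordan elimination with partial pivoting; assumes X^T X is non-singular.
    Returns [intercept, b1, ..., bk].
    """
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for k in range(p):
            if k != col:
                factor = A[k][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] for i in range(p)]


def predict(coef, X):
    """Linear prediction: intercept plus weighted sum of the inputs."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], x)) for x in X]


def rmse(obs, pred):
    """Root-mean-square error."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


# Synthetic data following y = 1 + 2*x1 - x2 exactly (hypothetical, not from the study).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 0, 2, 4]
coef = fit_ols(X, y)
pred = predict(coef, X)
print(coef, rmse(y, pred), r2(y, pred))  # coefficients close to [1, 2, -1], RMSE near 0, R² near 1
```

A linear model like this cannot represent curvature or interactions unless they are added as extra columns, which is consistent with the abstract's explanation for why RFR fit the removal data better than MLR.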


2018 ◽  
Vol 52 (7) ◽  
pp. 4173-4179 ◽  
Author(s):  
Cole Brokamp ◽  
Roman Jandarov ◽  
Monir Hossain ◽  
Patrick Ryan

Foods ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1326
Author(s):  
Tao Zhang ◽  
Shanshan Zhang ◽  
Lan Chen ◽  
Hao Ding ◽  
Pengfei Wu ◽  
...  

To identify metabolic biomarkers related to the freshness of chilled chicken, ultra-high-performance liquid chromatography–tandem mass spectrometry (UHPLC–MS/MS) was used to obtain profiles of the metabolites present in chilled chicken stored for different lengths of time. Random forest regression analysis and stepwise multiple linear regression were used to identify key metabolic biomarkers related to freshness. A total of 265 differential metabolites were identified during storage of chilled chicken, of which 37 were selected as potential biomarkers by random forest regression analysis. Receiver operating characteristic (ROC) curve analysis indicated that these biomarkers were strongly correlated with the freshness of chilled chicken. Subsequently, stepwise multiple linear regression analysis of the biomarkers identified by random forest regression analysis selected indole-3-carboxaldehyde, uridine monophosphate, S-phenylmercapturic acid, gluconic acid, tyramine, and serylphenylalanine as key metabolic biomarkers. In conclusion, our study characterized the metabolic profiles of chilled chicken stored for different lengths of time and identified six key metabolic biomarkers related to its freshness. These findings can contribute to a better understanding of the changes in the metabolic profiles of chilled chicken during storage and provide a basis for developing novel methods to detect the freshness of chilled chicken.
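The ROC analysis mentioned above summarizes how well a single biomarker separates two classes. The sketch below computes the area under the ROC curve through its equivalence with the Mann-Whitney statistic (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one). It assumes a hypothetical binary fresh/not-fresh label, because the abstract does not state how storage time was dichotomized; the intensities are invented for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise comparison.

    labels: 1 for the positive class, 0 for the negative class.
    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical metabolite intensities; label 1 = "not fresh" (stored longer), 0 = "fresh".
intensity = [0.8, 1.1, 2.3, 2.9, 1.0, 3.4]
not_fresh = [0, 0, 1, 1, 0, 1]
print(roc_auc(intensity, not_fresh))  # → 1.0 (every not-fresh sample scores higher)
```

The pairwise loop costs O(n·m) comparisons, which is fine for the sample sizes typical of metabolomics studies; a rank-based version scales better for large data.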


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Guanghua Yan ◽  
Xi Chen ◽  
Yun Zhang

Based on population change data for 2005–2009, 2010–2014, 2015–2019, and 2005–2019, shrinking cities in Northeast China are identified and their spatial distribution pattern is analyzed. The factors influencing these shrinking cities, and their effects, are then explored using multiple linear regression and random forest regression. The results show that: 1) spatially, the shrinking cities in Northeast China are mainly distributed in the “land edge” areas represented by Changbai Mountain, the Sanjiang Plain, Xiaoxing’an Mountain, and Daxing’an Mountain. Temporally, the center of contraction shows an obvious northward shift, while the center of expansion shows a southward shift, and the shrinking cities become further clustered; 2) for the influencing factors, both multiple linear regression and random forest regression show that socio-economic factors play a major role in the formation of shrinking cities; 3) random forest regression is more precise than multiple linear regression. Per capita GDP has the greatest impact on contraction intensity, followed by the unemployment rate, science and education expenditure, and the average wage of on-the-job workers. Of these four factors, only the unemployment rate promotes contraction; the other three inhibit the formation of shrinking cities to varying degrees.
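The abstract ranks factors by their impact but does not say which importance measure the random forest used. One common, model-agnostic choice is permutation importance, sketched below under that assumption: shuffle one predictor at a time and record how much the prediction error grows. The `toy_model` and its data are hypothetical stand-ins for a fitted random forest; in practice the importances should be computed on held-out data.

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in mean squared error when one feature column is shuffled.

    predict: a fitted model exposed as a callable mapping a list of feature rows
    to a list of predictions (e.g. a random forest's prediction step).
    """
    rng = random.Random(seed)

    def mse(rows):
        preds = predict(rows)
        return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        total_increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                shuffled.append(new_row)
            total_increase += mse(shuffled) - baseline
        importances.append(total_increase / n_repeats)
    return importances


def toy_model(rows):
    # Hypothetical fitted model: strong dependence on feature 0, weak on feature 1, none on feature 2.
    return [3.0 * r[0] + 0.5 * r[1] for r in rows]


data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = toy_model(X)
print(permutation_importance(toy_model, X, y))
```

Permutation importance measures magnitude only, so on its own it cannot show whether a factor promotes or inhibits contraction; that direction could come from another source, for example the signs of the multiple linear regression coefficients.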

