A National-Scale 1-km Resolution PM2.5 Estimation Model over Japan Using MAIAC AOD and a Two-Stage Random Forest Model

2021 ◽  
Vol 13 (18) ◽  
pp. 3657
Author(s):  
Chau-Ren Jung ◽  
Wei-Ting Chen ◽  
Shoji F. Nakayama

Satellite-based models for estimating concentrations of particulate matter with an aerodynamic diameter less than 2.5 μm (PM2.5) have seldom been developed for islands with complex topography in monsoon regions, where, compared with continental regions, PM2.5 transport is influenced by both synoptic-scale winds and local-scale circulations. We validated Multi-Angle Implementation of Atmospheric Correction (MAIAC) aerosol optical depth (AOD) against ground observations in Japan and developed a 1-km-resolution national-scale model to estimate daily PM2.5 concentrations from 2011 to 2016. A two-stage random forest model integrating MAIAC AOD with meteorological variables and land use data was applied to develop the model. The first-stage random forest model was used to impute missing AOD values. The second-stage random forest model was then used to estimate ground-level PM2.5 concentrations. Ten-fold cross-validation was performed to evaluate model performance. MAIAC AOD agreed well with ground observations in Japan (correlation coefficient = 0.82, with 74.62% of data falling within the expected error). In training, the model achieved a coefficient of determination (R2) of 0.98 and a root mean square error (RMSE) of 1.22 μg/m3. In 10-fold cross-validation, the model's R2 and RMSE were 0.86 and 3.02 μg/m3, respectively. A subsite validation was used to evaluate the model at the grid cells overlapping AERONET sites, where model performance was excellent, with a validation R2 of 0.94 and RMSE of 1.78 μg/m3. Additionally, model performance improved as AOD coverage increased. The ten most important predictors for estimating ground-level PM2.5 concentrations were day of the year, temperature, AOD, relative humidity, 10-m zonal wind, 10-m meridional wind, boundary layer height, precipitation, surface pressure, and population density.
MAIAC AOD showed high retrieval accuracy in Japan. The satellite-based model performed excellently, indicating that its PM2.5 estimates are reliable and accurate. These estimates can be used to assess both the short-term and long-term effects of PM2.5 on health outcomes in epidemiological studies.
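The 10-fold cross-validation behind the CV R2 and RMSE figures can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a single predictor, uses a hypothetical `fit_linear` least-squares model as a stand-in for the random forest, and defines R2 as 1 − SS_res/SS_tot, which is one common convention (some PM2.5 studies instead report a squared correlation or a regression R2).

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Predictor = Callable[[float], float]


def fit_linear(x: List[float], y: List[float]) -> Predictor:
    """Hypothetical stand-in model: 1-D ordinary least squares, NOT a random forest."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def r2_rmse(obs: Sequence[float], pred: Sequence[float]) -> Tuple[float, float]:
    """R2 as 1 - SS_res/SS_tot, and root mean square error."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(obs))


def kfold_cv(x: Sequence[float], y: Sequence[float],
             fit: Callable[[List[float], List[float]], Predictor],
             k: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Each sample is predicted once, by a model trained on the other k - 1 folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    pred = [0.0] * len(x)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in test:
            pred[i] = model(x[i])
    # Pool all out-of-fold predictions before scoring (one common convention).
    return r2_rmse(y, pred)


# Usage on synthetic data (illustrative only, not data from the study).
rng = random.Random(1)
aod = [rng.uniform(0.0, 1.0) for _ in range(200)]
pm25 = [10 + 40 * a + rng.gauss(0, 3) for a in aod]
cv_r2, cv_rmse = kfold_cv(aod, pm25, fit_linear, k=10)
print(f"CV R2 = {cv_r2:.2f}, CV RMSE = {cv_rmse:.2f} ug/m3")
```

Pooling the out-of-fold predictions and scoring once mirrors how a single "cross-validation R2" is usually reported; averaging per-fold scores is an alternative that gives slightly different numbers.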

Energies ◽  
2020 ◽  
Vol 13 (7) ◽  
pp. 1786
Author(s):  
Linh T. T. Ho ◽  
Laurent Dubus ◽  
Matteo De Felice ◽  
Alberto Troccoli

Hydro power can provide a source of dispatchable low-carbon electricity and a storage solution in a climate-dependent energy mix with high shares of wind and solar production. Therefore, understanding the effect of climate on hydro power generation is critical to ensuring a stable energy supply, particularly at a continental scale. Here, we introduce a framework that uses climate data to model country-level hydro power generation with a machine learning method, the random forest model, to produce a publicly accessible hydro power dataset from 1979 to the present for twelve European countries. In addition to producing a consistent European hydro power generation dataset covering the past 40 years, the specific novelty of this approach is its focus on the lagged effect of climate variability on hydro power. Specifically, multiple lagged values of temperature and precipitation are used. Overall, the model shows promising results, with correlation values ranging from 0.85 to 0.98 for run-of-river and from 0.73 to 0.90 for reservoir-based generation. Compared with the more standard optimal-lag approach, the normalised mean absolute error is reduced by an average of 10.23% and 5.99%, respectively. The model was also implemented over six Italian bidding zones to test its skill at the sub-country scale. Model performance is only slightly degraded at the bidding-zone level, but this also depends on the actual installed capacity, with higher capacities yielding higher performance. The framework and results presented could provide a useful reference for applications such as pan-European (continental) hydro power planning and for system adequacy and extreme-event assessments.
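The multiple-lag idea can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the authors' framework: the variable names, the time step, and the lag set are hypothetical, and `normalised_mae` divides by the mean observation, which may differ from the normalisation used in the study. The contrast with an optimal-lag approach is that the latter keeps only the single best lag per variable, whereas here every listed lag becomes a separate predictor for the random forest.

```python
from typing import Dict, List, Sequence


def lagged_features(series: Dict[str, Sequence[float]],
                    lags: Sequence[int]) -> List[Dict[str, float]]:
    """One feature row per time step t holding each variable at t - lag.

    Rows whose largest lag would reach before the start of the record are dropped.
    """
    n = min(len(values) for values in series.values())
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, n):
        row = {}
        for name, values in series.items():
            for lag in lags:
                row[f"{name}_lag{lag}"] = values[t - lag]
        rows.append(row)
    return rows


def normalised_mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """MAE divided by the mean observation (assumed normalisation; capacity is another option)."""
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)
    return mae / (sum(obs) / len(obs))


# Usage with made-up monthly values (illustrative only).
climate = {"temp": [2.0, 4.0, 9.0, 13.0, 17.0], "precip": [80.0, 60.0, 70.0, 90.0, 40.0]}
rows = lagged_features(climate, lags=[0, 1, 2])
print(len(rows), rows[0])
```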


EP Europace ◽  
2019 ◽  
Vol 21 (9) ◽  
pp. 1307-1312 ◽  
Author(s):  
Wei-Syun Hu ◽  
Meng-Hsuen Hsieh ◽  
Cheng-Li Lin

Aims: We aimed to construct a random forest model to predict atrial fibrillation (AF) in a Chinese population.
Methods and results: This study comprised 682 237 subjects with or without AF. Each subject had 19 features, including age, gender, underlying diseases, CHA2DS2-VASc score, and follow-up period. The data were split into training and test sets at an approximate 9:1 ratio: 614 013 data points were placed in the training set and 68 224 in the test set. Weighted-average F1, precision, and recall values were used to measure prediction model performance and were calculated across the training set, the test set, and all data. The area under the receiver operating characteristic (ROC) curve was also used to evaluate the prediction model. The model achieved a k-fold cross-validation accuracy of 0.979 (k = 10). In the test set, it achieved an F1 value of 0.968, a precision value of 0.958, and a recall value of 0.979. The area under the ROC curve was 0.948 (95% confidence interval 0.947–0.949). The model was validated with a separate dataset.
Conclusions: This study presents a novel AF risk prediction scheme for Chinese individuals based on a random forest methodology.
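Support-weighted precision, recall, and F1 can be computed as in the following Python sketch (illustrative only; the labels are toy values). Each class's score is weighted by its share of the true labels. One property worth keeping in mind when reading such results: support-weighted recall is algebraically identical to overall accuracy, because each class weight n_c/n cancels that class's recall denominator n_c.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def weighted_prf(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[str, float]:
    """Precision, recall, and F1 per class, averaged with weights = class support / n."""
    support = Counter(y_true)
    n = len(y_true)
    totals = {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    for cls, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / count
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = count / n
        totals["precision"] += weight * precision
        totals["recall"] += weight * recall
        totals["f1"] += weight * f1
    return totals


# Usage: 1 = AF, 0 = no AF (toy labels, not study data).
print(weighted_prf([1, 1, 1, 0], [1, 1, 0, 0]))
```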


2021 ◽  
Vol 248 ◽  
pp. 105146 ◽  
Author(s):  
Tingting Jiang ◽  
Bin Chen ◽  
Zhen Nie ◽  
Zhehao Ren ◽  
Bing Xu ◽  
...  

2021 ◽  
Vol 5 (7 (113)) ◽  
pp. 59-65
Author(s):  
Nadia Moneem Al-Abdaly ◽  
Salwa R. Al-Taai ◽  
Hamza Imran ◽  
Majed Ibrahim

Because of the incorporation of discontinuous fibers, steel fiber-reinforced concrete (SFRC) outperforms regular concrete. However, owing to its complexity and the limited data available, the development of SFRC strength prediction techniques is still in its infancy compared with that for standard concrete. In this paper, the compressive strength of SFRC was predicted from different input variables using a random forest (RF) model. A dataset of 133 samples was used for this purpose. Training and testing datasets were generated to design and validate the models. The proposed models were developed using ten important material parameters for SFRC characterization. To minimize bias from the training/testing split, the approach was validated using 10-fold cross-validation. Grid search cross-validation was used to determine the optimal hyperparameters for the RF algorithm. The root mean square error (RMSE), coefficient of determination (R2), and mean absolute error (MAE) between measured and estimated values were used to validate and compare the models. The RF model achieved RMSE = 5.66, R2 = 0.88, and MAE = 3.80. Compared with a traditional linear regression model, the RF model produced better predictions of the compressive strength of SFRC. The findings show that hyperparameter tuning with grid search and cross-validation is an efficient way to find optimal parameters for the RF method. RF also produces good results and offers an alternative way of predicting the compressive strength of SFRC.
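Grid search cross-validation exhaustively scores every hyperparameter combination by cross-validated error and keeps the best one. The Python sketch below shows that loop under clearly stated assumptions: a hypothetical 1-D k-nearest-neighbours regressor (`knn_fit`) stands in for the random forest, and its `n_neighbors` setting stands in for random forest hyperparameters such as the number of trees or maximum depth, which the abstract does not list.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, int]
Predictor = Callable[[float], float]


def knn_fit(x: List[float], y: List[float], params: Params) -> Predictor:
    """Hypothetical stand-in regressor (1-D k-nearest neighbours), NOT a random forest."""
    k = params["n_neighbors"]

    def predict(v: float) -> float:
        nearest = sorted(range(len(x)), key=lambda i: abs(x[i] - v))[:k]
        return sum(y[i] for i in nearest) / len(nearest)

    return predict


def cv_rmse(x: Sequence[float], y: Sequence[float],
            fit: Callable[[List[float], List[float], Params], Predictor],
            params: Params, k: int = 10) -> float:
    """Pooled RMSE over k interleaved folds (sample i goes to fold i % k; shuffle real data first)."""
    n = len(x)
    sq_err = 0.0
    for f in range(k):
        test = set(range(f, n, k))
        train = [i for i in range(n) if i not in test]
        model = fit([x[i] for i in train], [y[i] for i in train], params)
        sq_err += sum((y[i] - model(x[i])) ** 2 for i in test)
    return math.sqrt(sq_err / n)


def grid_search_cv(x, y, fit, grid: Dict[str, Sequence[int]], k: int = 10) -> Tuple[Params, float]:
    """Score every combination in `grid` and return the one with the lowest CV RMSE."""
    names = list(grid)
    best_params: Params = {}
    best_score = math.inf
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cv_rmse(x, y, fit, params, k)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Usage on a toy dataset (illustrative only).
xs = [float(i) for i in range(20)]
ys = list(xs)
print(grid_search_cv(xs, ys, knn_fit, {"n_neighbors": [1, 2, 8]}))
```

Adding more keys to the grid dictionary multiplies the number of combinations, which is why grid search cost grows quickly with the number of tuned hyperparameters.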


Author(s):  
Hyunje Yang ◽  
Honggeun Lim ◽  
Hyung Tae Choi

Soil water holding capacities (SWHCs) are important inputs to hydrological simulation models for sustainable water management. Forests, which cover 63% of South Korea, are the main source of clean water, and estimating SWHCs on a nationwide scale is essential for effective management of forest water resources. However, few studies have estimated SWHCs on a nationwide scale in temperate regions, especially in South Korea. Fortunately, forest spatial big data have been collected on a national scale, making nationwide prediction of SWHCs possible. In this study, spatial prediction of forest SWHCs (saturated water content and water content at pF 1.8 and 2.7) was conducted with 953 forest soil samples and a forest spatial big dataset. Four soil properties and 14 environmental covariates were used to predict SWHCs. Simple linear regression and random forest models were compared to select the optimal predictive model. Variable importance analysis showed that environmental covariates were as important as soil properties, and the model using environmental covariates as input data outperformed the model using soil properties. Of the two models, the random forest model predicted SWHCs more accurately and stably than the simple linear model. Spatial prediction of SWHCs at the national scale using the random forest model and the forest spatial big dataset confirmed that higher SWHCs are distributed along the Baekdudaegan, the watershed crest line of South Korea.


2020 ◽  
Vol 12 (12) ◽  
pp. 1986 ◽  
Author(s):  
Johanna Orellana-Alvear ◽  
Rolando Célleri ◽  
Rütger Rollenbeck ◽  
Paul Muñoz ◽  
Pablo Contreras ◽  
...  

Discharge forecasting is a key component of early warning systems and extremely useful for decision makers. Forecasting models require accurate, high-spatial-resolution rainfall estimates and other geomorphological characteristics of the catchment, which are rarely available in remote mountain regions such as the Andean highlands. While radar data are available in some mountain areas, the absence of a well-distributed rain gauge network makes it hard to obtain accurate rainfall maps. Thus, this study explored the ability of a Random Forest model to leverage native radar data (i.e., reflectivity), providing a simplified but efficient discharge forecasting model for a representative mountain catchment in the southern Andes of Ecuador. This model was compared with one that used derived radar rainfall (i.e., rainfall depth) as input, obtained by converting reflectivity to rainfall rate with a local Z-R relation and applying a rain gauge-based bias adjustment. In addition, the influence of a soil moisture proxy was evaluated. Radar and runoff data from April 2015 to June 2017 were used. Results showed that (i) model performance was similar using either native or derived radar data as inputs (0.66 < NSE < 0.75; 0.72 < KGE < 0.78), so exhaustive pre-processing to obtain radar rainfall estimates can be avoided for discharge forecasting; and (ii) including a soil moisture representation as model input did not significantly improve performance (NSE increased from 0.66 to 0.68). Finally, this model based on native radar data constitutes a promising alternative for discharge forecasting in remote mountain regions where ground monitoring data are scarce.
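The two skill scores quoted above can be written down directly. The Python sketch below uses the standard Nash-Sutcliffe efficiency and the original Kling-Gupta efficiency of Gupta et al. (2009); the abstract does not say which KGE variant was used (a 2012 modification replaces the variability ratio with a ratio of coefficients of variation), so treat that choice as an assumption. Both scores equal 1 for a perfect forecast, and NSE = 0 means the forecast is no better than the mean of the observations.

```python
import math
import statistics
from typing import Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs from their mean."""
    mean_obs = statistics.fmean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1 - sse / sum((o - mean_obs) ** 2 for o in obs)


def kge(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Kling-Gupta efficiency, original form: correlation, variability ratio, bias ratio."""
    mo, ms = statistics.fmean(obs), statistics.fmean(sim)
    so, ss = statistics.pstdev(obs), statistics.pstdev(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (so * ss)  # Pearson correlation
    alpha = ss / so      # variability ratio
    beta = ms / mo       # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Usage with made-up discharge values in m3/s (illustrative only).
observed = [1.2, 3.4, 2.8, 5.1, 4.0]
forecast = [1.5, 3.0, 3.1, 4.6, 4.2]
print(round(nse(observed, forecast), 3), round(kge(observed, forecast), 3))
```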


Author(s):  
Lijuan Yang ◽  
Hanqiu Xu ◽  
Shaode Yu

The coarse Moderate Resolution Imaging Spectroradiometer (MODIS) aerosol optical depth (AOD) product (spatial resolution: 3 km) retrieved by the dark-target algorithm inevitably contains missing values when used to estimate ground-level PM2.5 concentrations. In this study, we developed a two-stage random forest using MODIS 3 km AOD to obtain full-coverage PM2.5 concentrations in a contiguous, developed coastal region of China, the Yangtze River Delta-Fujian-Pearl River Delta region (YRD-FJ-PRD). A first-stage random forest integrating six meteorological fields was used to predict the missing values of the AOD product, and the combined AOD (i.e., random forest-derived AOD and MODIS 3 km AOD), together with other ancillary variables, was used to predict PM2.5 concentrations in a second-stage random forest model. The results showed that the first-stage random forest could explain 94% of the AOD variability over the YRD-FJ-PRD region, and the model achieved a site-based cross-validation (CV) R2 of 0.87 and a time-based CV R2 of 0.85. The full-coverage PM2.5 concentrations showed a spatial pattern with annual-mean PM2.5 of 46, 40, and 35 μg/m3 in the YRD, PRD, and FJ, respectively, consistent with the trends reported in previous studies. Our results indicate that the proposed two-stage random forest model could be effectively used for PM2.5 estimation in different areas.
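Two ideas in this abstract lend themselves to a short sketch: filling AOD gaps with the first-stage estimate to form the combined AOD, and the difference between site-based and time-based cross-validation. In the Python sketch below, the hypothetical `group_kfold` keeps every record from one group (a monitoring site, or a date) in the same fold, so site-based CV tests the model at unseen locations and time-based CV on unseen days. The grouping keys and the round-robin assignment of groups to folds are assumptions; the study's exact fold construction is not described here.

```python
import random
from typing import Hashable, List, Optional, Sequence


def combine_aod(observed: Sequence[Optional[float]], imputed: Sequence[float]) -> List[float]:
    """Keep the satellite retrieval where it exists; otherwise use the first-stage estimate."""
    return [o if o is not None else p for o, p in zip(observed, imputed)]


def group_kfold(groups: Sequence[Hashable], k: int, seed: int = 0) -> List[List[int]]:
    """Split record indices into k folds so that no group spans more than one fold."""
    unique = sorted(set(groups), key=str)  # deterministic order before shuffling
    random.Random(seed).shuffle(unique)
    fold_of = {g: i % k for i, g in enumerate(unique)}  # round-robin groups to folds
    folds: List[List[int]] = [[] for _ in range(k)]
    for idx, g in enumerate(groups):
        folds[fold_of[g]].append(idx)
    return folds


# Usage: records keyed by (site, date); hypothetical IDs, not study data.
records = [("S1", "d1"), ("S1", "d2"), ("S2", "d1"), ("S2", "d2"), ("S3", "d1")]
site_folds = group_kfold([site for site, _ in records], k=3)  # site-based CV
time_folds = group_kfold([day for _, day in records], k=2)    # time-based CV
print(site_folds, time_folds)
print(combine_aod([0.31, None, 0.12], [0.29, 0.40, 0.15]))
```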


2019 ◽  
Vol 11 (6) ◽  
pp. 722 ◽  
Author(s):  
Xiaofang Sun ◽  
Guicai Li ◽  
Meng Wang ◽  
Zemeng Fan

Accurate estimation of forest aboveground biomass (AGB) is important for carbon accounting. Forest AGB estimation has been conducted with a variety of data sources and prediction methods, but many uncertainties remain. In this study, six prediction methods (Gaussian processes, stepwise linear regression, nonlinear regression using a logistic model, partial least squares regression, random forest, and support vector machines) were used to estimate forest AGB in Jiangxi Province, China, by combining Geoscience Laser Altimeter System (GLAS) data, Moderate Resolution Imaging Spectroradiometer (MODIS) data, and field measurements. We compared the effects of three factors (prediction method, sample size of field measurements, and cross-validation settings) on predictive quality. The results showed that the choice of prediction method had the largest effect on prediction quality. In most cases, random forest produced more accurate estimates than the other methods. Sample size had a clear effect on accuracy, especially for the random forest model, with accuracy increasing as sample size increased. The random forest algorithm with a large number of field measurements was the most precise (coefficient of determination (R2) = 0.73, root mean square error (RMSE) = 23.58 Mg/ha). Increasing the number of folds in the cross-validation settings improved the R2 values, but no apparent change occurred in RMSE across different numbers of folds. Finally, a wall-to-wall forest AGB map of the study area was generated using the random forest model.


2019 ◽  
Vol 11 (6) ◽  
pp. 641 ◽  
Author(s):  
Bryan Vu ◽  
Odón Sánchez ◽  
Jianzhao Bi ◽  
Qingyang Xiao ◽  
Nadia Hansel ◽  
...  

It is well recognized that exposure to fine particulate matter (PM2.5) adversely affects health, yet few studies from South America have documented such associations because PM2.5 measurements are sparse. Lima's topography and aging vehicle fleet result in severe air pollution, but there are too few monitors to quantify PM2.5 levels effectively for epidemiologic studies. We developed an advanced machine learning model to estimate daily PM2.5 concentrations at a 1 km2 spatial resolution in Lima, Peru, from 2010 to 2016. We combined aerosol optical depth (AOD), meteorological fields from the European Centre for Medium-Range Weather Forecasts (ECMWF), parameters from the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem), and land use variables to fit a random forest model against ground measurements from 16 monitoring stations. The overall cross-validation R2 (and root mean square prediction error, RMSE) for the random forest model was 0.70 (5.97 μg/m3). In the cross-validation dataset, mean PM2.5 was 24.7 μg/m3 for ground measurements and 24.9 μg/m3 for estimates. The mean difference between ground measurements and predictions was −0.09 μg/m3 (SD = 5.97 μg/m3), with 94.5% of observations falling within 2 standard deviations of the difference, indicating good agreement between ground measurements and predicted estimates. Surface downwards solar radiation, temperature, relative humidity, and AOD were the most important predictors, while percent urbanization, albedo, and cloud fraction were the least important. Comparison of monthly mean ground and predicted PM2.5 shows good precision and accuracy. Furthermore, mean annual PM2.5 maps show consistently lower concentrations along the coast and higher concentrations in the mountains, resulting from prevailing coastal winds blowing from the Pacific Ocean to the west.
Our model allows the construction of long-term historical daily PM2.5 estimates at 1 km2 spatial resolution to support future epidemiological studies.
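The agreement statistics quoted above (mean difference, its standard deviation, and the share of observations within two standard deviations) can be computed for any paired dataset with the short Python sketch below. It assumes differences are taken as ground minus predicted and uses the sample standard deviation; the abstract does not state either convention.

```python
import statistics
from typing import Dict, Sequence


def agreement_stats(ground: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Mean and SD of paired differences, plus the fraction within mean +/- 2 SD."""
    diffs = [g - p for g, p in zip(ground, predicted)]
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    lower, upper = mean_d - 2 * sd_d, mean_d + 2 * sd_d
    within = sum(1 for d in diffs if lower <= d <= upper) / len(diffs)
    return {"mean_diff": mean_d, "sd_diff": sd_d, "frac_within_2sd": within}


# Usage with made-up daily PM2.5 values in ug/m3 (illustrative only).
print(agreement_stats([22.0, 30.5, 18.2, 27.9], [23.1, 28.7, 19.0, 29.4]))
```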


PLoS ONE ◽  
2012 ◽  
Vol 7 (8) ◽  
pp. e43847 ◽  
Author(s):  
Mingjun Wang ◽  
Xing-Ming Zhao ◽  
Kazuhiro Takemoto ◽  
Haisong Xu ◽  
Yuan Li ◽  
...  
