A data-driven method of traffic emissions mapping with land use random forest models

Over the years, rampant wildfires have plagued the state of California, causing economic and environmental loss. In 2018, wildfires in California caused nearly 800 million dollars in economic loss and claimed more than 100 lives. More than 1.6 million acres of land burned, causing extensive environmental damage. Although researchers have recently introduced machine learning models and algorithms for predicting wildfire risk, those studies focused on specific perspectives and were restricted to a limited number of data parameters. In this paper, we propose two data-driven machine learning approaches based on random forest models to predict wildfire risk in areas near Monticello and Winters, California. This study demonstrates how the models were developed and applied with comprehensive data parameters, such as powerlines, terrain, and vegetation, from different perspectives, improving the spatial and temporal accuracy of wildfire risk prediction, including fire ignition. The combined model uses the spatial and temporal parameters as a single combined dataset to train and predict fire risk, whereas the ensemble model is fed separate parameters, and the resulting models are later stacked to work as a single model. Our experiment shows that the combined model produced more accurate results than the ensemble of random forest models trained on separate spatial data. The models were validated with Receiver Operating Characteristic (ROC) curves, learning curves, and evaluation metrics such as accuracy, confusion matrices, and classification reports. The study achieved an accuracy of 92% in predicting wildfire risk, including ignition, by using regional spatial and temporal data along with standard data parameters in Northern California.
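
The contrast between the combined model and the stacked ensemble described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the feature counts, the logistic-regression meta-learner, and all variable names are assumptions made for illustration, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-ins (assumption): 4 "spatial" and 3 "temporal" features.
rng = np.random.default_rng(0)
n = 600
X_spatial = rng.normal(size=(n, 4))
X_temporal = rng.normal(size=(n, 3))
# The hypothetical fire label depends on one feature from each group.
signal = X_spatial[:, 0] + X_temporal[:, 0]
y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)

idx_train, idx_test = train_test_split(
    np.arange(n), test_size=0.25, random_state=0, stratify=y
)

# Combined model: one random forest trained on all features together.
X_all = np.hstack([X_spatial, X_temporal])
combined = RandomForestClassifier(n_estimators=100, random_state=0)
combined.fit(X_all[idx_train], y[idx_train])
acc_combined = combined.score(X_all[idx_test], y[idx_test])


def base_model_probabilities(X):
    """Return out-of-fold training probabilities and test probabilities."""
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    # Out-of-fold predictions keep the meta-learner from seeing leaked labels.
    oof = cross_val_predict(
        rf, X[idx_train], y[idx_train], cv=5, method="predict_proba"
    )[:, 1]
    rf.fit(X[idx_train], y[idx_train])
    return oof, rf.predict_proba(X[idx_test])[:, 1]


# Stacked ensemble: one forest per feature group, combined by a meta-learner.
oof_s, test_s = base_model_probabilities(X_spatial)
oof_t, test_t = base_model_probabilities(X_temporal)
meta = LogisticRegression()
meta.fit(np.column_stack([oof_s, oof_t]), y[idx_train])
acc_stacked = meta.score(np.column_stack([test_s, test_t]), y[idx_test])

print(f"combined: {acc_combined:.3f}  stacked: {acc_stacked:.3f}")
```

Which variant scores higher depends on the data. The abstract reports that the combined model was more accurate on the authors' dataset, and this sketch makes no claim about reproducing that result.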


Assessing soil organic carbon stocks under land-use change scenarios using random forest models

Carbon Management ◽

10.1080/17583004.2018.1553434 ◽

2019 ◽

Vol 10 (1) ◽

pp. 63-77 ◽

Cited By ~ 6

Author(s):

K. Nabiollahi ◽

Sh. Eskandari ◽

R. Taghizadeh-Mehrjardi ◽

R. Kerry ◽

J. Triantafilis

Keyword(s):

Land Use ◽

Random Forest ◽

Organic Carbon ◽

Land Use Change ◽

Soil Organic Carbon ◽

Carbon Stocks ◽

Soil Organic Carbon Stocks ◽

Forest Models ◽

Random Forest Models


A multi-step machine learning approach to assess the impact of COVID-19 lockdown on NO2 attributable deaths in Milan and Rome, Italy

Environmental Health ◽

10.1186/s12940-021-00825-9 ◽

2022 ◽

Vol 21 (1) ◽

Author(s):

Luca Boniardi ◽

Federica Nobile ◽

Massimo Stafoggia ◽

Paola Michelozzi ◽

Carla Ancona

Keyword(s):

Machine Learning ◽

Land Use ◽

Air Pollution ◽

Random Forest ◽

Citizen Science ◽

Learning Approach ◽

Machine Learning Approach ◽

Forest Models ◽

Random Forest Models ◽

The Impact

Background: Air pollution is one of the main concerns for the health of European citizens, and cities are currently striving to comply with EU air pollution regulation. The 2020 COVID-19 lockdown measures can be seen as an unintended but effective experiment to assess the impact of traffic restriction policies on air pollution. Our objective was to estimate the impact of the lockdown measures on NO2 concentrations and health in the two largest Italian cities. Methods: NO2 concentration datasets were built using data from a 1-month citizen science monitoring campaign that took place in Milan and Rome just before the Italian lockdown period. Annual mean NO2 concentrations were estimated for a lockdown scenario (Scenario 1) and a scenario without lockdown (Scenario 2) by applying city-specific annual adjustment factors to the 1-month data. These factors were estimated from Air Quality Network station data by applying a machine learning approach. The spatial distribution of NO2 was estimated at a neighbourhood scale by applying Land Use Random Forest models for the two scenarios. Finally, the impact of lockdown on health was estimated by subtracting attributable deaths for Scenario 1 from those for Scenario 2, both estimated by applying a literature-based dose–response function with a counterfactual concentration of 10 μg/m³. Results: The Land Use Random Forest models were able to capture 41–42% of the total NO2 variability. Passing from Scenario 2 (annual NO2 without lockdown) to Scenario 1 (annual NO2 with lockdown), the population-weighted exposure to NO2 for Milan and Rome decreased by 15.1% and 15.3%, respectively, on an annual basis. Considering the 10 μg/m³ counterfactual, prevented deaths were 213 and 604, respectively. Conclusions: Our results show that the lockdown had a beneficial impact on air quality and human health. However, compliance with the current EU legal limit is not enough to avoid a high number of NO2 attributable deaths. This contribution reaffirms the potential of the citizen science approach and calls for more ambitious traffic calming policies and a re-evaluation of the legal annual limit value for NO2 for the protection of human health.
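
The health impact step in the Methods can be sketched as follows. This is a generic log-linear attributable-fraction calculation, a common form in health impact assessment, and is only assumed to resemble the literature-based dose–response function the abstract mentions. The function name, the relative risk per 10 μg/m³, the baseline deaths, and the starting concentration are hypothetical; only the 10 μg/m³ counterfactual and the 15.1% exposure decrease come from the abstract.

```python
import math


def attributable_deaths(concentration, baseline_deaths, rr_per_10,
                        counterfactual=10.0):
    """Deaths attributable to exposure above the counterfactual level.

    Assumes a log-linear dose-response function (an assumption, not
    necessarily the function used in the study).
    """
    # Exposure at or below the counterfactual contributes no attributable risk.
    excess = max(concentration - counterfactual, 0.0)
    relative_risk = math.exp(math.log(rr_per_10) * excess / 10.0)
    attributable_fraction = (relative_risk - 1.0) / relative_risk
    return baseline_deaths * attributable_fraction


# Hypothetical inputs, chosen only to make the arithmetic concrete.
baseline_deaths = 10_000   # annual deaths in the exposed population
rr_per_10 = 1.02           # relative risk per 10 ug/m3 increase in NO2
no2_scenario_2 = 40.0      # annual mean without lockdown, ug/m3
no2_scenario_1 = no2_scenario_2 * (1 - 0.151)  # 15.1% lower with lockdown

deaths_scenario_2 = attributable_deaths(no2_scenario_2, baseline_deaths, rr_per_10)
deaths_scenario_1 = attributable_deaths(no2_scenario_1, baseline_deaths, rr_per_10)

# Prevented deaths: Scenario 2 (no lockdown) minus Scenario 1 (lockdown).
prevented = deaths_scenario_2 - deaths_scenario_1
print(f"prevented deaths (hypothetical inputs): {prevented:.1f}")
```

The study itself works with population-weighted exposure at neighbourhood scale, so a real calculation would apply this per spatial unit and sum the results.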


Incorporating space and time into random forest models for analyzing geospatial patterns of drug-related crime incidents in a major U.S. metropolitan area

Computers Environment and Urban Systems ◽

10.1016/j.compenvurbsys.2021.101599 ◽

2021 ◽

Vol 87 ◽

pp. 101599

Author(s):

Zhiyue Xia ◽

Kathleen Stewart ◽

Junchuan Fan

Keyword(s):

Random Forest ◽

Metropolitan Area ◽

Space And Time ◽

Forest Models ◽

Random Forest Models


Landslide susceptibility assessment for a transmission line in Gansu Province, China by using a hybrid approach of fractal theory, information value, and random forest models

Environmental Earth Sciences ◽

10.1007/s12665-021-09737-w ◽

2021 ◽

Vol 80 (12) ◽

Author(s):

Binbin Zhao ◽

Yunfeng Ge ◽

Hongzhi Chen

Keyword(s):

Random Forest ◽

Landslide Susceptibility ◽

Fractal Theory ◽

Hybrid Approach ◽

Gansu Province ◽

Information Value ◽

Susceptibility Assessment ◽

Landslide Susceptibility Assessment ◽

Forest Models ◽

Random Forest Models


Random forest models of 305-days milk yield for Holstein cows in Bulgaria

10.1063/5.0034778 ◽

2020 ◽

Author(s):

A. Yordanova ◽

H. Kulina

Keyword(s):

Random Forest ◽

Milk Yield ◽

Holstein Cows ◽

Forest Models ◽

Random Forest Models


Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2012040103 ◽

2012 ◽

Vol 8 (2) ◽

pp. 44-63 ◽

Cited By ~ 30

Author(s):

Baoxun Xu ◽

Joshua Zhexue Huang ◽

Graham Williams ◽

Qiang Wang ◽

Yunming Ye

Keyword(s):

Random Forest ◽

High Dimensional Data ◽

Real Life ◽

Classification Performance ◽

Feature Weighting ◽

Random Forest Model ◽

High Dimensional ◽

Forest Model ◽

Forest Models ◽

Random Forest Models

The selection of feature subspaces for growing decision trees is a key step in building random forest models. However, the common approach of randomly sampling a few features for the subspace is not suitable for high-dimensional data consisting of thousands of features, because such data often contain many features that are uninformative for classification, and random sampling often fails to include informative features in the selected subspaces. Consequently, the classification performance of the random forest model is significantly affected. In this paper, the authors propose an improved random forest method that uses a novel feature weighting method for subspace selection and thereby enhances classification performance on high-dimensional data. A series of experiments on 9 real-life high-dimensional datasets demonstrated that, using a subspace size of features where M is the total number of features in the dataset, our random forest model significantly outperforms existing random forest models.
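
The idea of weighted subspace selection can be sketched as below. This is not the authors' weighting method: the weights are hypothetical informativeness scores, and the sampler uses the Efraimidis-Spirakis weighted sampling-without-replacement scheme, chosen here only because it is short and standard. The function name `sample_weighted_subspace` is an assumption.

```python
import math
import random


def sample_weighted_subspace(weights, k, rng):
    """Draw up to k distinct feature indices, favouring high-weight features.

    Each feature with positive weight w gets the key log(u) / w, where u is
    uniform on (0, 1]; the k largest keys are kept (Efraimidis-Spirakis).
    """
    keyed = []
    for index, weight in enumerate(weights):
        if weight > 0:
            u = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log is defined
            keyed.append((math.log(u) / weight, index))
    keyed.sort(reverse=True)
    return [index for _, index in keyed[:k]]


# Hypothetical setting: 2 informative features among 100.
rng = random.Random(0)
weights = [5.0, 5.0] + [0.1] * 98
uniform = [1.0] * 100

trials = 2000
hits_weighted = sum(
    0 in sample_weighted_subspace(weights, 5, rng) for _ in range(trials)
)
hits_uniform = sum(
    0 in sample_weighted_subspace(uniform, 5, rng) for _ in range(trials)
)
print(f"feature 0 selected: weighted {hits_weighted}, uniform {hits_uniform}")
```

With uniform weights the sampler reduces to ordinary random subspace selection, which is the baseline the abstract argues against for data with many uninformative features.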


Gully erosion zonation mapping using integrated geographically weighted regression with certainty factor and random forest models in GIS

Journal of Environmental Management ◽

10.1016/j.jenvman.2018.11.110 ◽

2019 ◽

Vol 232 ◽

pp. 928-942 ◽

Cited By ~ 46

Author(s):

Alireza Arabameri ◽

Biswajeet Pradhan ◽

Khalil Rezaei

Keyword(s):

Random Forest ◽

Geographically Weighted Regression ◽

Gully Erosion ◽

Weighted Regression ◽

Certainty Factor ◽

Forest Models ◽

Random Forest Models


Forest floor temperature and greenness link significantly to canopy attributes in South Africa’s fragmented coastal forests

10.7287/peerj.preprints.27168 ◽

2018 ◽

Author(s):

Marion Pfeifer ◽

Michael JW Boyle ◽

Stuart Dunning ◽

Pieter Olivier

Keyword(s):

Land Use ◽

Food Security ◽

Surface Temperature ◽

Habitat Quality ◽

Ground Surface ◽

Quality Metrics ◽

Ground Vegetation ◽

Ground Surface Temperature ◽

Forest Models ◽

Random Forest Models

Tropical landscapes are changing rapidly due to changes in land use and land management. Being able to predict and monitor the impacts of land use change on species, for conservation or food security concerns, requires habitat quality metrics that are consistent, can be mapped using above-ground sensor data, and are relevant for species performance. Here, we focus on ground surface temperature (Thermalground) and ground vegetation greenness (NDVIdown) as potentially suitable metrics of habitat quality. We measure both across habitats differing in tree cover (natural grassland to forest edges to forests and tree plantations) in the human-modified coastal forested landscapes of KwaZulu-Natal, South Africa. We show that both habitat quality metrics decline linearly as a function of increasing canopy closure (FCover, %) and canopy leaf area index (LAI). Opening canopies by about 20% or reducing canopy leaf area by 1% would result in an increase of temperatures on the ground by more than 1°C, and an increase in ground vegetation greenness by 0.2 and 0.14 respectively. Upscaling LAI and FCover to develop maps from Landsat imagery using random forest models allowed us to map Thermalground and NDVIdown using the linear relationships. However, map accuracy was constrained by the predictive capacity of the random forest models predicting canopy attributes and of the linear models linking canopy attributes to the habitat quality metrics. Accounting for micro-scale variation in temperature is seen as essential to improve biodiversity impact predictions. Our upscaling approach suggests that mapping ground surface temperature based on radiation and vegetation properties might be possible, and that canopy cover maps could provide a useful tool for mapping habitat quality metrics that matter to species. However, we need to increase sampling of surface temperature spatially and temporally to improve and validate upscaled models. We also need to link surface temperature maps to demographic traits of species of different threat status or functions in landscapes with different disturbance and management histories, testing for generalities in relationships. The derived understanding could then be exploited for targeted landscape restoration that benefits biodiversity conservation and food security sustainably at the landscape scale.


Data Mining Crystallization Kinetics

10.26434/chemrxiv.11708286 ◽

2020 ◽

Author(s):

Cameron Brown ◽

Diego Maldonado ◽

Antony Vassileiou ◽

Blair Johnston ◽

Alastair Florence

Keyword(s):

Random Forest ◽

Kinetic Parameters ◽

Crystallization Kinetics ◽

Balance Model ◽

Forest Models ◽

Vast Literature ◽

Random Forest Models ◽

Kinetic Expression ◽

Population Balances ◽

Different Sources

The population balance model is a valuable modelling tool that facilitates the optimization and understanding of crystallization processes. However, using this tool requires prior knowledge of the crystallization kinetics, specifically crystal growth and nucleation. Most approaches to estimating kinetic parameters properly require experimental data. Over time, a vast literature on the estimation of kinetic parameters and population balances has been published. Considering the availability of data, this work built a database with information on solute, solvent, kinetic expression, parameters, crystallization method and seeding. Correlations were assessed and cluster structures identified by hierarchical clustering analysis. The final database contains 336 kinetic parameter entries from 185 different sources. The data were analysed using kinetic parameters of the most common expressions. Subsequently, clusters were identified for each kinetic model. With these clusters, classification random forest models were built using solute descriptors, seeding, solvent, and crystallization methods as classifiers. The random forest models had an overall classification accuracy higher than 70%, making them useful for providing rough estimates of kinetic parameters, although these methods have some limitations.
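
The two-stage workflow in this abstract (cluster the kinetic parameters, then classify cluster membership from descriptors) can be sketched as follows. The data are synthetic, and the feature layout, cluster count, and Ward linkage are assumptions for illustration rather than the authors' settings; scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic "kinetic parameters" (assumption): three groups in two dimensions.
centers = np.array([[0.0, 0.0], [4.0, 4.0], [0.0, 5.0]])
kinetics = np.vstack(
    [center + rng.normal(scale=0.5, size=(60, 2)) for center in centers]
)

# Synthetic "descriptors" (assumption): noisy views of the kinetics plus
# three pure-noise columns, standing in for solute and solvent descriptors.
descriptors = np.hstack(
    [
        kinetics + rng.normal(scale=1.0, size=kinetics.shape),
        rng.normal(size=(kinetics.shape[0], 3)),
    ]
)

# Stage 1: hierarchical clustering of the kinetic parameters.
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(kinetics)

# Stage 2: a random forest predicts the cluster label from descriptors alone.
X_train, X_test, y_train, y_test = train_test_split(
    descriptors, labels, test_size=0.3, random_state=1, stratify=labels
)
classifier = RandomForestClassifier(n_estimators=200, random_state=1)
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)
print(f"cluster classification accuracy: {accuracy:.2f}")
```

A predicted cluster then gives only a rough range for the kinetic parameters, which matches the abstract's framing of the models as a source of rough estimates.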
