Economic indicators and bioenergy supply in developed economies: QROF-DEMATEL and random forest models

Over the years, rampant wildfires have plagued the state of California, causing economic and environmental losses. In 2018, wildfires caused nearly 800 million dollars in economic losses and claimed more than 100 lives in California. Over 1.6 million acres of land have burned, causing substantial environmental damage. Although researchers have recently introduced machine learning models and algorithms for predicting wildfire risk, these studies focused on specific perspectives and were restricted to a limited number of data parameters. In this paper, we propose two data-driven machine learning approaches based on random forest models to predict wildfire risk in areas near Monticello and Winters, California. This study demonstrates how the models were developed and applied with comprehensive data parameters, such as powerlines, terrain, and vegetation, from different perspectives, which improved the spatial and temporal accuracy of predicting wildfire risk, including fire ignition. The combined model uses the spatial and temporal parameters as a single combined dataset to train and predict fire risk. The ensemble model, by contrast, trains random forests on separate parameter sets and later stacks them to work as a single model. Our experiment shows that the combined model achieved better accuracy than the ensemble of random forest models trained on separate spatial data. The models were validated with Receiver Operating Characteristic (ROC) curves, learning curves, and evaluation metrics such as accuracy, confusion matrices, and classification reports. By using regional spatial and temporal data along with standard data parameters in Northern California, the study achieved an accuracy of 92% in predicting wildfire risk, including ignition.
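The abstract validates its models with accuracy, confusion matrices, classification reports, and ROC curves. The following is a minimal, dependency-free sketch of how these metrics are computed for a binary risk / no-risk label. It is not the authors' code. The labels, scores, and 0.5 decision threshold in the usage lines are made-up illustrative values.

```python
from typing import Dict, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tn, fp, fn, tp) for binary labels: 0 = no fire risk, 1 = fire risk."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def classification_report(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus precision, recall and F1 for the positive (fire-risk) class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a random positive scores
    higher than a random negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute ROC AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 1, 0, 0]                # made-up ground truth
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]    # made-up predicted fire-risk probabilities
preds = [1 if s >= 0.5 else 0 for s in scores]
print(confusion_matrix(labels, preds))     # (2, 1, 1, 2)
print(roc_auc(labels, scores))             # 0.888...
```

The ROC curve itself is traced by sweeping the threshold. The pairwise form of `roc_auc` gives the same area without building the curve explicitly.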


Incorporating space and time into random forest models for analyzing geospatial patterns of drug-related crime incidents in a major U.S. metropolitan area

Computers, Environment and Urban Systems, Vol 87, pp. 101599, 2021. DOI: 10.1016/j.compenvurbsys.2021.101599

Author(s): Zhiyue Xia, Kathleen Stewart, Junchuan Fan

Keyword(s): Random Forest, Metropolitan Area, Space And Time, Forest Models, Random Forest Models


Landslide susceptibility assessment for a transmission line in Gansu Province, China by using a hybrid approach of fractal theory, information value, and random forest models

Environmental Earth Sciences, Vol 80 (12), 2021. DOI: 10.1007/s12665-021-09737-w

Author(s): Binbin Zhao, Yunfeng Ge, Hongzhi Chen

Keyword(s): Random Forest, Landslide Susceptibility, Fractal Theory, Hybrid Approach, Gansu Province, Information Value, Susceptibility Assessment, Landslide Susceptibility Assessment, Forest Models, Random Forest Models


Random forest models of 305-days milk yield for Holstein cows in Bulgaria

2020. DOI: 10.1063/5.0034778

Author(s): A. Yordanova, H. Kulina

Keyword(s): Random Forest, Milk Yield, Holstein Cows, Forest Models, Random Forest Models


Classifying Very High-Dimensional Data with Random Forests Built from Small Subspaces

International Journal of Data Warehousing and Mining, Vol 8 (2), pp. 44-63, 2012. DOI: 10.4018/jdwm.2012040103

Cited By ~ 30

Author(s): Baoxun Xu, Joshua Zhexue Huang, Graham Williams, Qiang Wang, Yunming Ye

Keyword(s): Random Forest, High Dimensional Data, Real Life, Classification Performance, Feature Weighting, Random Forest Model, High Dimensional, Forest Model, Forest Models, Random Forest Models

The selection of feature subspaces for growing decision trees is a key step in building random forest models. However, the common approach of randomly sampling a few features for the subspace is not suitable for high-dimensional data consisting of thousands of features. Such data often contain many features that are uninformative for classification, and random sampling often fails to include informative features in the selected subspaces. Consequently, the classification performance of the random forest model is significantly affected. In this paper, the authors propose an improved random forest method that uses a novel feature weighting method for subspace selection and therefore enhances classification performance on high-dimensional data. A series of experiments on 9 real-life high-dimensional datasets demonstrated that their random forest model significantly outperforms existing random forest models. In these experiments, the model used a small subspace of features whose size is set as a function of M, the total number of features in the dataset.
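The core idea above is to replace uniform feature sampling with sampling biased towards informative features. The sketch below illustrates that idea under several assumptions. The per-feature score (absolute difference of class means) is a hypothetical stand-in, because the abstract does not describe the paper's weighting method. The subspace size ⌊log2 M⌋ + 1 is likewise an assumed, commonly used default rather than a value taken from this abstract.

```python
import math
import random
from typing import List, Sequence


def feature_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Hypothetical informativeness score per feature: |mean in class 1 - mean in class 0|,
    normalised to sum to 1. Requires both classes to be present in y."""
    n_features = len(X[0])
    raw = []
    for j in range(n_features):
        ones = [row[j] for row, label in zip(X, y) if label == 1]
        zeros = [row[j] for row, label in zip(X, y) if label == 0]
        raw.append(abs(sum(ones) / len(ones) - sum(zeros) / len(zeros)))
    total = sum(raw)
    if total == 0:
        return [1.0 / n_features] * n_features  # no signal anywhere: fall back to uniform
    return [w / total for w in raw]


def subspace_size(m: int) -> int:
    """Assumed subspace size floor(log2(m)) + 1, e.g. 10 features out of 1,000."""
    return int(math.log2(m)) + 1


def sample_subspace(weights: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Draw k distinct feature indices; each draw is proportional to the weights still available."""
    remaining = list(range(len(weights)))
    chosen: List[int] = []
    for _ in range(min(k, len(remaining))):
        w = [weights[i] for i in remaining]
        if sum(w) == 0:
            w = [1.0] * len(remaining)  # only zero-weight features left: sample uniformly
        pick = rng.choices(range(len(remaining)), weights=w, k=1)[0]
        chosen.append(remaining.pop(pick))
    return chosen


# Toy data: only feature 0 separates the classes; features 1-3 are constant noise.
X = [[0.0, 5.0, 5.0, 5.0], [0.0, 5.0, 5.0, 5.0], [1.0, 5.0, 5.0, 5.0], [1.0, 5.0, 5.0, 5.0]]
y = [0, 0, 1, 1]
toy_weights = feature_weights(X, y)  # [1.0, 0.0, 0.0, 0.0]
features = sample_subspace(toy_weights, subspace_size(len(toy_weights)), random.Random(42))
```

With uniform sampling, a tree grown on 10 of 1,000 features would often miss the few informative ones. Weighted sampling makes the informative feature (index 0 here) the first one drawn.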


Gully erosion zonation mapping using integrated geographically weighted regression with certainty factor and random forest models in GIS

Journal of Environmental Management, Vol 232, pp. 928-942, 2019. DOI: 10.1016/j.jenvman.2018.11.110

Cited By ~ 46

Author(s): Alireza Arabameri, Biswajeet Pradhan, Khalil Rezaei

Keyword(s): Random Forest, Geographically Weighted Regression, Gully Erosion, Weighted Regression, Certainty Factor, Forest Models, Random Forest Models


Data Mining Crystallization Kinetics

2020. DOI: 10.26434/chemrxiv.11708286

Author(s): Cameron Brown, Diego Maldonado, Antony Vassileiou, Blair Johnston, Alastair Florence

Keyword(s): Random Forest, Kinetic Parameters, Crystallization Kinetics, Balance Model, Forest Models, Vast Literature, Random Forest Models, Kinetic Expression, Population Balances, Different Sources

The population balance model is a valuable modelling tool that facilitates the optimization and understanding of crystallization processes. However, using this tool requires prior knowledge of the crystallization kinetics, specifically crystal growth and nucleation. Most approaches to obtaining proper estimates of kinetic parameters require experimental data. Over time, a vast literature on the estimation of kinetic parameters and on population balances has been published. Considering the availability of these data, this work built a database with information on solute, solvent, kinetic expression, parameters, crystallization method, and seeding. Correlations were assessed, and cluster structures were identified by hierarchical clustering analysis. The final database contains 336 entries of kinetic parameters from 185 different sources. The data were analysed using the kinetic parameters of the most common expressions. Subsequently, clusters were identified for each kinetic model. With these clusters, random forest classification models were built using solute descriptors, seeding, solvent, and crystallization method as input features. The random forest models had an overall classification accuracy above 70%, which made them useful for providing rough estimates of kinetic parameters, although these methods have some limitations.
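The abstract identifies cluster structures by hierarchical clustering but does not state the linkage or distance used. The following minimal agglomerative sketch assumes average linkage with Euclidean distance. The 2-D points are made-up stand-ins for kinetic-parameter vectors. In practice, parameters on very different scales would need normalising first. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence


def average_linkage(points: Sequence[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering: start from singletons and repeatedly merge the two
    clusters with the smallest average pairwise Euclidean distance."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and the number of points")
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(math.dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge b into a; b > a, so deleting b keeps index a valid
        del clusters[b]
    return [sorted(c) for c in clusters]


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
print(average_linkage(points, 3))  # [[0, 1], [2, 3], [4]]
```

Running the merges down to a single cluster and recording each merge distance yields the full dendrogram. Stopping at `n_clusters` corresponds to cutting it at one level.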


For Honor, for Toxicity

Proceedings of the ACM on Human-Computer Interaction, Vol 5 (CHI PLAY), pp. 1-29, 2021. DOI: 10.1145/3474680

Author(s): Alessandro Canossa, Dmitry Salimov, Ahmad Azadvar, Casper Harteveld, Georgios Yannakakis

Keyword(s): Machine Learning, Random Forest, Random Forests, Initial Study, Unfair Advantage, Offensive Behavior, Forest Models, Random Forest Models, Action Type, Degree Of Severity

Is it possible to detect toxicity in games just by observing in-game behavior? If so, which behavioral factors will help machine learning uncover the unknown relationship between gameplay and toxic behavior? In this initial study, we examine whether it is possible to predict toxicity in the MOBA game For Honor by observing the in-game behavior of players who have been labeled as toxic (i.e., players who have been sanctioned by Ubisoft community managers). We test our hypothesis of detecting toxicity through gameplay with a dataset of almost 1,800 sanctioned players, comparing these sanctioned players with unsanctioned players. Sanctioned players are defined by their toxic action type (offensive behavior vs. unfair advantage) and degree of severity (warned vs. banned). Our findings are based on supervised learning with random forests. They suggest that it is possible not only to behaviorally distinguish sanctioned from unsanctioned players based on selected gameplay features but also to predict both the sanction severity (warned vs. banned) and the sanction type (offensive behavior vs. unfair advantage). In particular, all random forest models predict toxicity, its severity, and its type with an average accuracy of at least 82% on unseen players. This research shows that observing in-game behavior can support community managers in moderating and possibly containing the burden of toxic behavior.
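Reporting accuracy "on unseen players" implies that no player's gameplay records appear in both the training and the test data. Otherwise, a model could partly recognise individual players instead of learning behavior. The following is a minimal sketch of such a player-grouped split. The abstract does not describe the authors' split, so the row layout (one player ID per gameplay row), the test fraction, and the seeded shuffle are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def split_by_player(player_ids: Sequence[str], test_fraction: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_rows, test_rows) so that all rows of a given player land on the same side."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    players = sorted(set(player_ids))   # sort first so the shuffle is reproducible for a given seed
    random.Random(seed).shuffle(players)
    n_test = max(1, round(len(players) * test_fraction))
    test_players = set(players[:n_test])
    train_rows = [i for i, p in enumerate(player_ids) if p not in test_players]
    test_rows = [i for i, p in enumerate(player_ids) if p in test_players]
    return train_rows, test_rows


ids = ["a", "a", "b", "b", "c", "d", "d", "e"]  # hypothetical player ID for each gameplay row
train, test = split_by_player(ids, test_fraction=0.4, seed=1)
```

Splitting by player rather than by row is the design choice that makes the held-out accuracy an estimate for new players.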


Prediction of Gas Turbine Trip: a Novel Methodology Based on Random Forest Models

2021. DOI: 10.1115/gt2021-58916

Author(s): Enzo Losi, Mauro Venturini, Lucrezia Manservigi, Giuseppe Fabio Ceschini, Giovanni Bechini, ...

Keyword(s): Machine Learning, Random Forest, Gas Turbine, Gas Turbines, Remaining Useful Life, Training Data, The Novel, Novel Approach, Forest Models, Random Forest Models

A gas turbine trip is an unplanned shutdown whose most relevant consequences are business interruption and a reduction in the equipment's remaining useful life. Understanding the underlying causes of gas turbine trips would therefore make it possible to predict their occurrence, maximizing gas turbine profitability and improving availability. In the ever-competitive Oil & Gas sector, data mining and machine learning are increasingly employed to gain deeper insight into gas turbines and to improve their operation. Among the various machine learning tools, Random Forests are an ensemble learning method consisting of an aggregation of decision tree classifiers. This paper presents a novel methodology that exploits the information embedded in the data. It develops Random Forest models that predict gas turbine trips from historical multi-sensor data acquired over a given timeframe. The novel approach exploits time series segmentation to increase the amount of training data, thus reducing overfitting. First, the data are transformed according to a feature engineering methodology developed in a separate work by the same authors. Then, Random Forest models are trained and tested on unseen observations to demonstrate the benefits of the novel approach. The superiority of the novel approach is demonstrated on two real-world case studies involving field data taken during three years of operation of two fleets of Siemens gas turbines located in different regions. The novel methodology achieves Precision, Recall, and Accuracy values in the range of 75–85%, demonstrating the industrial feasibility of the predictive methodology.
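Time series segmentation, as described above, turns one long sensor record into many training samples. The following is a minimal sliding-window sketch of that idea. The window length, stride, prediction horizon, the labelling rule, and the choice to drop windows that already contain a trip are all illustrative assumptions, not the authors' published procedure.

```python
from typing import List, Sequence, Tuple


def segment_series(
    series: Sequence[Sequence[float]],
    trip_times: Sequence[int],
    window: int,
    stride: int,
    horizon: int,
) -> Tuple[List[List[Sequence[float]]], List[int]]:
    """Cut one multi-sensor record into overlapping windows.

    series[t] is the sensor vector at time step t; trip_times lists the steps at which a trip occurred.
    A window covering [start, start + window) is labelled 1 if a trip falls in the next
    `horizon` steps, i.e. in [start + window, start + window + horizon), otherwise 0.
    """
    if window <= 0 or stride <= 0 or horizon <= 0:
        raise ValueError("window, stride and horizon must be positive")
    windows, labels = [], []
    for start in range(0, len(series) - window + 1, stride):
        end = start + window
        if any(start <= t < end for t in trip_times):
            continue  # skip windows that already contain the trip itself
        windows.append(list(series[start:end]))
        labels.append(int(any(end <= t < end + horizon for t in trip_times)))
    return windows, labels


series = [[float(t)] for t in range(10)]  # toy single-sensor record, 10 time steps
windows, labels = segment_series(series, trip_times=[7], window=3, stride=1, horizon=2)
print(labels)  # [0, 0, 0, 1, 1]
```

Overlapping windows from the same record are strongly correlated. Any train/test split of such samples should therefore keep whole time spans (or whole machines) apart to avoid leakage.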


Abstract MP31: Blood DNA Methylation Signatures of Incident Coronary Heart Disease: An Epigenome-wide Analysis in the Strong Heart Study

Circulation, Vol 141 (Suppl_1), 2020. DOI: 10.1161/circ.141.suppl_1.mp31

Author(s): Ana Navas-Acien, Arce Domingo-Relloso, Maria Tellez-Plaza, Lizbeth Gomez, Miguel Herreros, ...

Keyword(s): Random Forest, Cross Validation, Single Model, Prediction Ability, Targeted Analysis, C Statistic, Heart Study, Forest Models, Random Forest Models, Traditional Risk Factors

Background: In the US, American Indians suffer a disproportionate burden of coronary heart disease (CHD) compared with other racial/ethnic groups. Additional strategies are needed to identify individuals at risk. Objectives: Investigate the association of blood DNA methylation (DNAm) with incident CHD in the Strong Heart Study and the prediction ability of DNAm beyond traditional risk factors. We maximized prediction ability using Bayesian Hierarchical Cox (BHCox) and Survival Random Forest models, which allow large numbers of CpGs in a single model instead of considering them individually. Methods: Among 2325 men and women aged 45–74 years in 1989–1991, 557 CHD events were identified over 20 years of follow-up. DNAm was measured at 790,026 CpGs, pre-processed, and corrected for batch effects. We ran adjusted BHCox models for subsets of CpGs selected from similarly adjusted Cox models for individual CpGs. The prediction ability of CpGs in BHCox was further evaluated with Survival Random Forest, which is robust to overfitting. We also conducted a targeted analysis using Cox regression. Results: 26 CpGs associated with CVD in previous studies were nominally associated with incident CHD in the targeted analysis. The cross-validated C index for the model with traditional risk factors was 0.703. In BHCox, further entering 30K CpGs in a single model resulted in 231 CpGs being significantly associated with incident CHD, with a cross-validated C statistic of 0.855. In Survival Random Forest, further entering the 231 CpGs from the BHCox model resulted in a cross-validated C statistic of 0.771 and 182 CpGs with variable importance (VIMP) greater than zero. The top CpGs in BHCox also had VIMP > 0 in the random forest models and were located in SLC24A1 (calcium exchanger), SHBG (sex hormone binding globulin), and LINC00346 (non-protein coding RNA 346). Conclusions: We found novel CpG sites prospectively associated with incident CHD and confirmed signals from previous studies. DNAm might help identify individuals at high risk of developing CHD.
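The results above are reported as C statistics (C index), where 0.5 means random ranking and 1.0 means perfect ranking of who has the event first. The following is a minimal sketch of Harrell's concordance index for right-censored data. The abstract does not say which C-index variant or tie-handling rule was used, so this formulation (pairs with tied follow-up times are ignored, tied risks count 0.5) is an assumption. The times, event flags, and risk scores are made up.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's C for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event (events[i] == 1) and a shorter
    follow-up time than subject j. It is concordant when i also has the higher predicted
    risk; tied risks count as 0.5. Pairs with tied times are ignored in this simple version.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the "earlier event" in a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [2.0, 4.0, 6.0, 8.0]   # made-up follow-up years
events = [1, 1, 0, 1]          # 1 = incident CHD observed, 0 = censored
print(harrell_c_index(times, events, [0.9, 0.7, 0.3, 0.1]))  # 1.0 (perfect ranking)
print(harrell_c_index(times, events, [0.9, 0.2, 0.3, 0.1]))  # 0.8
```

In the cross-validated setting described above, the risk scores would come from a model fitted on the other folds, and the index would be computed on each held-out fold.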
