Random Forest Regression for Optimizing Variable Planting Rates for Corn and Soybean Using High-Resolution Topographical and Soil Data

ABSTRACT In recent years, planting machinery that enables precise control of planting rates has become available for corn (Zea mays L.) and soybean (Glycine max L.). With increasingly available topographical and soil information, there is growing interest in developing variable rate planting strategies that exploit variation in the agri-landscape to maximize production. A random forest regression-based approach was developed to model the interactions between planting rate, topography, and soil characteristics and their effects on yield, based on on-farm variable rate planting trials for corn and soybean conducted at 27 sites in New York between 2014 and 2018 (57 site-years) in collaboration with the New York Corn and Soybean Growers Association. Planting rate ranked highly in terms of random forest regression variable importance while explaining relatively little yield variation in the linear context, indicating that yield response to planting rate likely depends on complex interactions with agri-landscape features. Models were moderately predictive of yield within site-years and across years at a particular site, while the ability to predict yield across sites was low. Relatedly, variable importance measures for the topographical and soil features varied considerably across sites. Together, these results suggest that local testing may provide the most accurate optimized planting rate designs because of the unique set of conditions at each site. The proposed method was extended to identify the optimal variable rate planting design for maximizing yield at each site given the topographical and soil data, and empirical validation of the resulting designs is currently underway.
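The approach summarized in the abstract (fit a random forest to yield as a function of planting rate and landscape features, then pick the rate with the highest predicted yield for each location) can be illustrated with a minimal sketch. It is not the authors' implementation. The data are synthetic, and the feature names, units, candidate rates, and the helper `optimal_rates` are assumptions made for illustration; scikit-learn's `RandomForestRegressor` is used as the model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
rate = rng.uniform(60, 100, n)   # hypothetical planting rate (thousand seeds/ha)
elev = rng.uniform(0, 1, n)      # hypothetical scaled elevation
som = rng.uniform(0, 1, n)       # hypothetical scaled soil organic matter

# Synthetic yield: the best rate shifts with soil organic matter (an interaction).
best_rate = 70 + 20 * som
yield_ = 10 - 0.01 * (rate - best_rate) ** 2 + elev + rng.normal(0, 0.3, n)

X = np.column_stack([rate, elev, som])
model = RandomForestRegressor(n_estimators=200, min_samples_leaf=5, random_state=0)
model.fit(X, yield_)

def optimal_rates(model, landscape, candidates):
    """For each row of landscape features, return the candidate rate
    with the highest predicted yield."""
    preds = np.column_stack([
        model.predict(np.column_stack([np.full(len(landscape), r), landscape]))
        for r in candidates
    ])
    return candidates[np.argmax(preds, axis=1)]

# Two hypothetical grid cells: same elevation, low vs. high organic matter.
cells = np.array([[0.5, 0.1], [0.5, 0.9]])
candidates = np.arange(60, 101, 5)
rx = optimal_rates(model, cells, candidates)
print(rx)
```

Because the prescription is only as good as the fitted model, the abstract's finding that cross-site prediction was weak suggests fitting such a model per site rather than pooling sites.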

On the behaviour of permutation‐based variable importance measures in random forest clustering

Journal of Chemometrics ◽

10.1002/cem.3135 ◽

2019 ◽

Vol 33 (8) ◽

Author(s):

Stefano Nembrini

Keyword(s):

Random Forest ◽

Variable Importance ◽

Variable Importance Measures

Letter to the Editor: On the stability and ranking of predictors from random forest variable importance measures

Briefings in Bioinformatics ◽

10.1093/bib/bbr016 ◽

2011 ◽

Vol 12 (4) ◽

pp. 369-373 ◽

Cited By ~ 66

Author(s):

K. K. Nicodemus

Keyword(s):

Random Forest ◽

Variable Importance ◽

Letter To The Editor ◽

Variable Importance Measures ◽

The Stability

An experimental study of the intrinsic stability of random forest variable importance measures

BMC Bioinformatics ◽

10.1186/s12859-016-0900-5 ◽

2016 ◽

Vol 17 (1) ◽

Cited By ~ 20

Author(s):

Huazhen Wang ◽

Fan Yang ◽

Zhiyuan Luo

Keyword(s):

Experimental Study ◽

Random Forest ◽

Variable Importance ◽

Intrinsic Stability ◽

Variable Importance Measures

Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival

Statistics in Medicine ◽

10.1002/sim.7803 ◽

2018 ◽

Vol 38 (4) ◽

pp. 558-582 ◽

Cited By ~ 34

Author(s):

Hemant Ishwaran ◽

Min Lu

Keyword(s):

Random Forest ◽

Confidence Intervals ◽

Variable Importance ◽

Standard Errors ◽

Random Forest Regression

Process Variable Importance Analysis by Use of Random Forests in a Shapley Regression Framework

Minerals ◽

10.3390/min10050420 ◽

2020 ◽

Vol 10 (5) ◽

pp. 420

Author(s):

Chris Aldrich

Keyword(s):

Random Forest ◽

Case Studies ◽

Random Forests ◽

Performance Indicator ◽

Variable Importance ◽

Predictor Variables ◽

Importance Measure ◽

Variable Importance Measure ◽

Operational Variables ◽

Variable Importance Measures

Linear regression is often used as a diagnostic tool to understand the relative contributions of operational variables to some key performance indicator or response variable. However, owing to the nature of plant operations, predictor variables tend to be correlated, often highly so, and this can lead to significant complications in assessing the importance of these variables. Shapley regression is seen as the only axiomatic approach to deal with this problem but has almost exclusively been used with linear models to date. In this paper, the approach is extended to random forests, and the results are compared with some of the empirical variable importance measures widely used with these models, i.e., permutation and Gini variable importance measures. Four case studies are considered, of which two are based on simulated data and two on real-world data from the mineral process industries. These case studies suggest that the random forest Shapley variable importance measure may be a more reliable indicator of the influence of predictor variables than the other measures that were considered. Moreover, the results obtained with the Gini variable importance measure were as reliable as or better than those obtained with the permutation measure of the random forest.
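For orientation, the linear form of Shapley regression mentioned in the abstract can be sketched as follows. This is an illustrative example, not the paper's code: each predictor's share of R^2 is its marginal contribution averaged over all subsets of the other predictors. The synthetic data and the function names `r_squared` and `shapley_r2` are assumptions. The random forest extension described in the abstract would replace the value function `r_squared` with a forest-based fit measure, which this sketch does not attempt.

```python
import itertools
import math
import numpy as np

def r_squared(X, y, cols):
    """R^2 of an ordinary least-squares fit using only the columns in `cols`."""
    if not cols:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def shapley_r2(X, y):
    """Shapley share of R^2 for each predictor, averaged over all coalitions."""
    p = X.shape[1]
    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            weight = (math.factorial(size) * math.factorial(p - size - 1)
                      / math.factorial(p))
            for S in itertools.combinations(others, size):
                phi[j] += weight * (r_squared(X, y, S + (j,)) - r_squared(X, y, S))
    return phi

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.81) * rng.normal(size=n)  # strongly correlated with x1
x3 = rng.normal(size=n)                                  # unrelated to the response
y = x1 + x2 + rng.normal(0, 0.5, n)

X = np.column_stack([x1, x2, x3])
phi = shapley_r2(X, y)
full_r2 = r_squared(X, y, (0, 1, 2))
print(phi, full_r2)
```

The shares sum to the full-model R^2, and the two correlated predictors split the credit between them instead of one absorbing it. The cost grows as 2^p model fits, so exhaustive enumeration is practical only for a small number of predictors.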

Empirical characterization of random forest variable importance measures

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2007.08.015 ◽

2008 ◽

Vol 52 (4) ◽

pp. 2249-2260 ◽

Cited By ~ 441

Author(s):

Kellie J. Archer ◽

Ryan V. Kimes

Keyword(s):

Random Forest ◽

Variable Importance ◽

Variable Importance Measures

On the behaviour of permutation‐based variable importance measures in random forest clustering

Journal of Chemometrics ◽

10.1002/cem.3178 ◽

2019 ◽

Vol 33 (8) ◽

Author(s):

Stefano Nembrini ◽

Tiziano Frigoli

Keyword(s):

Random Forest ◽

Variable Importance ◽

Variable Importance Measures

Predicting JNK1 Inhibitors Regulating Autophagy in Cancer using Random Forest Classifier

10.1101/459669 ◽

2018 ◽

Author(s):

Chetna Kumari ◽

Naidu Subbarao ◽

Muhammad Abulaish

Keyword(s):

Machine Learning ◽

Protein Kinase ◽

Random Forest ◽

Protein Kinases ◽

Variable Importance ◽

Random Forest Classifier ◽

Selective Inhibitors ◽

Molecular Features ◽

Variable Importance Measures

Abstract Autophagy (in Greek: self-eating) is the cellular process for delivery of heterogenic intracellular material to lysosomal digestion. Protein kinases are integral to the autophagy process, and when dysregulated or mutated cause several human diseases. Atg1, the first autophagy-related protein identified, is a serine/threonine protein kinase (STPK). mTOR (mammalian Target of Rapamycin), AMPK (AMP-activated protein kinase), Akt, MAPK (mitogen-activated protein kinase) and PKC (protein kinase C) are other STPKs which regulate various components and steps of autophagy, and are often deregulated in cancer. MAPKs have three subfamilies: ERKs, p38, and JNKs. JNKs (c-Jun N-terminal Kinases) have three isoforms in mammals (JNK1, JNK2, and JNK3), each with distinct cellular locations and functions. JNK1 plays a role in starvation-induced activation of autophagy, and the context-specific role of autophagy in tumorigenesis establishes JNK1 as a challenging anticancer drug target. Since JNKs are closely related to other members of the MAPK family (p38 MAP kinase and the ERKs), it is difficult to design JNK-selective inhibitors. Designing JNK isoform-selective inhibitors is even more challenging, as the ATP-binding sites among all JNKs are highly conserved. Although limited information is available on computational approaches to predicting JNK1 inhibitors, literature exploring machine learning techniques to predict JNK inhibitors appears difficult to find. This study aims to apply machine learning to predict JNK1 inhibitors regulating autophagy in cancer using Random Forest (RF). Here, the RF algorithm is used for two purposes: to select and rank the molecular descriptors calculated using the PaDEL descriptor software, and as a classifier. The descriptors are prioritized by calculating Variable Importance Measures (VIMs) using functions based on the mean square error (IncMSE) and node purity (IncNodePurity) of RF. The classification models based on a set of 22 prioritized descriptors show an accuracy of 86.36%, a precision of 88.27%, and an AUC (Area Under ROC curve) of 0.8914. We conclude that machine learning-based compound classification using Random Forest is one ligand-based approach that can be adopted for virtual screening of large compound libraries of JNK1 bioactives.

Author Summary Out of the three isoforms of JNKs (c-Jun N-terminal Kinases) in humans (each with distinct cellular locations and functions), JNK1 plays a role in starvation-induced activation of autophagy. The role of JNK1 in autophagy modulation and the dual role of autophagy in tumor cells make JNK1 a promising anticancer drug target. Since JNKs are closely related to other members of the MAPK (Mitogen-Activated Protein Kinase) family, it is difficult to design JNK-selective inhibitors. Designing JNK isoform-selective inhibitors is even more challenging, as the ATP-binding sites among all JNKs are highly conserved. The random forest classifier usually outperforms several other machine learning algorithms for classification and prediction tasks in diverse areas of research. In this work, we have used the Random Forest algorithm for two purposes: (i) calculating variable importance measures to rank and select molecular features, and (ii) predicting JNK1 inhibitors regulating autophagy in cancer. We have used PaDEL-calculated molecular features of the JNK1 bioactivity dataset from the ChEMBL database to build classification models using a random forest classifier. Our results show that, by optimally selecting features from the top 10% based on variable importance measures, the classification accuracy is high, and the classification model proposed in this study can be integrated with a drug design pipeline to virtually screen compound libraries for predicting JNK1 inhibitors.
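The two-step use of random forests described above (rank features by importance, then classify on the top-ranked subset) can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline. The data are synthetic from `make_classification` rather than PaDEL descriptors, and scikit-learn's impurity-based `feature_importances_` stands in for the IncMSE and IncNodePurity measures named in the abstract. Ranking uses the training split only, so that feature selection does not leak information from the held-out compounds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a descriptor matrix: 100 features, 8 informative.
X, y = make_classification(n_samples=600, n_features=100, n_informative=8,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

# Step 1: rank features by importance using the training split only.
ranker = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
order = np.argsort(ranker.feature_importances_)[::-1]
top = order[: X.shape[1] // 10]   # keep the top 10% of features

# Step 2: fit the classifier on the selected features and score held-out data.
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr[:, top], y_tr)
acc = clf.score(X_te[:, top], y_te)
print(len(top), acc)
```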

Addressing Measurement Error in Random Forests using Quantitative Bias Analysis

American Journal of Epidemiology ◽

10.1093/aje/kwab010 ◽

2021 ◽

Author(s):

Tammy Jiang ◽

Jaimie L Gradus ◽

Timothy L Lash ◽

Matthew P Fox

Keyword(s):

Machine Learning ◽

Measurement Error ◽

Random Forest ◽

Random Forests ◽

Model Performance ◽

Variable Importance ◽

Bias Analysis ◽

Variable Importance Measures ◽

Quantitative Bias Analysis ◽

The Impact

Abstract Although variables are often measured with error, the impact of measurement error on machine learning predictions is seldom quantified. The purpose of this study was to assess the impact of measurement error on random forest model performance and variable importance. First, we assessed the impact of misclassification (i.e., measurement error of categorical variables) of predictors on random forest model performance (e.g., accuracy, sensitivity) and variable importance (mean decrease in accuracy) using data from the United States National Comorbidity Survey Replication (2001-2003). Second, we simulated datasets in which the true model performance and variable importance measures were known, so that we could verify that quantitative bias analysis recovered the truth in misclassified versions of the datasets. Our findings show that measurement error in the data used to construct random forests can distort model performance and variable importance measures, and that bias analysis can recover the correct results. This study highlights the utility of applying quantitative bias analysis in machine learning to quantify the impact of measurement error on study results.
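The first part of the abstract's claim, that misclassified predictors distort random forest performance, can be illustrated with a small simulation. This is a sketch under assumptions, not the study's analysis. The data are synthetic, the sensitivity and specificity values are arbitrary, and the helpers `misclassify` and `holdout_accuracy` are hypothetical names. It shows the distortion only and does not implement quantitative bias analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 4000
exposure = rng.integers(0, 2, n)   # true binary predictor
noise = rng.integers(0, 2, n)      # binary predictor unrelated to the outcome
outcome = rng.binomial(1, np.where(exposure == 1, 0.8, 0.2))

def misclassify(x, sensitivity, specificity, rng):
    """Return a copy of binary x as observed with the given sensitivity and specificity."""
    u = rng.random(len(x))
    observed = x.copy()
    observed[(x == 1) & (u > sensitivity)] = 0   # false negatives
    observed[(x == 0) & (u > specificity)] = 1   # false positives
    return observed

def holdout_accuracy(predictor):
    """Fit a random forest on (predictor, noise) and return held-out accuracy."""
    X = np.column_stack([predictor, noise])
    X_tr, X_te, y_tr, y_te = train_test_split(X, outcome, test_size=0.3,
                                              random_state=0)
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    return rf.score(X_te, y_te)

acc_true = holdout_accuracy(exposure)
acc_noisy = holdout_accuracy(misclassify(exposure, 0.7, 0.7, rng))
print(acc_true, acc_noisy)
```

In this setup the forest trained on the misclassified predictor scores lower on held-out data than the one trained on the true predictor, which is the kind of distortion the study quantifies and then corrects with bias analysis.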

Bias in random forest variable importance measures: Illustrations, sources and a solution

BMC Bioinformatics ◽

10.1186/1471-2105-8-25 ◽

2007 ◽

Vol 8 (1) ◽

Cited By ~ 1197

Author(s):

Carolin Strobl ◽

Anne-Laure Boulesteix ◽

Achim Zeileis ◽

Torsten Hothorn

Keyword(s):

Random Forest ◽

Variable Importance ◽

Variable Importance Measures
