Generating Tree-Level Harvest Predictions from Forest Inventories with Random Forests

Wood supply predictions from forest inventories involve two steps. First, it is predicted whether harvests occur on a plot in a given time period. Second, for plots on which harvests are predicted to occur, the harvested volume is predicted. This research addresses the second step. For forests with more than one species and/or forests with trees of varying dimensions, overall harvested volume predictions are not satisfactory and more detailed predictions are required. The study focuses on southwest Germany, where diverse forest types are found. Predictions are conducted for plots on which harvests occurred in the 2002–2012 period. For each plot, harvest probabilities of sample trees are predicted and used to derive the harvested volume (m³ over bark in 10 years) per hectare. Random forests (RFs) have become popular prediction models as they define the interactions and relationships of variables in an automatized way. However, their suitability for predicting harvest probabilities for inventory sample trees is questionable and has not yet been examined. Generalized linear mixed models (GLMMs) are suitable in this context as they can account for the nested structure of tree-level data sets (trees nested in plots). It is unclear if RFs can cope with this data structure. This research aims to clarify this question by comparing two RFs, an RF based on conditional inference trees (CTree-RF) and an RF based on classification and regression trees (CART-RF), with a GLMM. For this purpose, the models were fitted on training data and evaluated on an independent test set. Both RFs achieved better prediction results than the GLMM. Regarding plot-level harvested volumes per ha, they achieved higher variances explained (VEs) and significantly (p < 0.05) lower mean absolute residuals when compared to the GLMM. VEs were 0.38 (CTree-RF), 0.37 (CART-RF), and 0.31 (GLMM). Root mean squared errors were 138.3, 139.9 and 145.5, respectively.
The research demonstrates the suitability and advantages of RFs for predicting harvest decisions on the level of inventory sample trees. RFs can become important components within the generation of business-as-usual wood supply scenarios worldwide as they are able to learn and predict harvest decisions from NFIs in an automatized and self-adapting way. The applied approach is not restricted to specific forests or harvest regimes and delivers detailed species and dimension information for the harvested volumes.
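The plot-level quantities named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the expected harvested volume of a plot is the sum over sample trees of harvest probability × tree volume × a stems-per-hectare expansion factor, and that variance explained is computed as 1 − SSE/SST. The function names and the example numbers are hypothetical.

```python
from math import sqrt


def expected_harvest_volume(trees):
    """Expected harvested volume per hectare for one plot.

    Each tree is a tuple (harvest_probability, volume_m3, stems_per_ha).
    The stems_per_ha expansion factor is an assumed inventory weight.
    """
    return sum(p * v * n for p, v, n in trees)


def variance_explained(observed, predicted):
    """1 - SSE/SST, an assumed definition of variance explained (VE)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(observed, predicted):
    """Root mean squared error of plot-level predictions."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_absolute_residual(observed, predicted):
    """Mean of the absolute differences between observed and predicted."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Hypothetical plot with two sample trees (not data from the study).
plot = [(0.5, 2.0, 10.0), (0.25, 4.0, 5.0)]
plot_volume = expected_harvest_volume(plot)
```

Given observed and predicted plot-level volumes from a test set, the three metric functions return the kinds of summary values the abstract reports for each model.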


Conditional Inference Trees and Random Forests

A Practical Handbook of Corpus Linguistics ◽

10.1007/978-3-030-46216-1_25 ◽

2020 ◽

pp. 611-643

Author(s):

Natalia Levshina

Keyword(s):

Random Forests ◽

Conditional Inference ◽

Conditional Inference Trees


USING CONDITIONAL INFERENCE TREES AND RANDOM FORESTS TO PREDICT THE BIOACCUMULATION POTENTIAL OF ORGANIC CHEMICALS

Environmental Toxicology and Chemistry ◽

10.1002/etc.2150 ◽

2013 ◽

Vol 32 (5) ◽

pp. 1187-1195 ◽

Cited By ~ 15

Author(s):

Sebastian Strempel ◽

Monika Nendza ◽

Martin Scheringer ◽

Konrad Hungerbühler

Keyword(s):

Random Forests ◽

Conditional Inference ◽

Organic Chemicals ◽

Conditional Inference Trees


The positioning of concessive adverbial clauses in English: assessing the importance of discourse-pragmatic and processing-based constraints

English Language and Linguistics ◽

10.1017/s1360674312000305 ◽

2013 ◽

Vol 17 (1) ◽

pp. 1-23 ◽

Cited By ~ 19

Author(s):

DANIEL WIECHMANN ◽

ELMA KERZ

Keyword(s):

Language Processing ◽

Random Forests ◽

General Type ◽

Main Clause ◽

Conditional Inference ◽

Relative Importance ◽

Multifactorial Analysis ◽

Subordinate Clauses ◽

Conditional Inference Trees ◽

Adverbial Clauses

English permits adverbial subordinate clauses to be placed either before or after their associated main clause. Previous research has shown that the positioning is conditioned by various factors from the domains of semantics, discourse pragmatics and language processing. With the exception of Diessel (2008), these factors have never been investigated in concert, which makes it difficult to understand their relative importance. Diessel's study, however, discusses only temporal constructions and identifies iconicity of sequence as the strongest predictor of clause position. Since this explanation is, in principle, unavailable for other types of subordinate clauses, the generalizability of Diessel's findings is somewhat limited. The present study offers a multifactorial analysis of 2,000 concessive constructions from the written part of the BNC and assesses the variable importance of six factors for the ordering choice, showing that semantic and discourse-pragmatic factors are much stronger predictors of clause position than processing-based, weight-related ones. On a methodological note, the study proposes that random forests using conditional inference trees constitute the preferred tool for the general type of problem investigated here.


INCREMENTAL DEVELOPMENT OF FAULT PREDICTION MODELS

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194013500447 ◽

2013 ◽

Vol 23 (10) ◽

pp. 1399-1425 ◽

Cited By ~ 4

Author(s):

YUE JIANG ◽

BOJAN CUKIC ◽

TIM MENZIES ◽

JIE LIN

Keyword(s):

Life Cycle ◽

Prediction Models ◽

Statistical Significance ◽

Fault Prediction ◽

Training Data ◽

Data Sets ◽

Data Repositories ◽

Software Fault Prediction ◽

Incremental Development ◽

Code Metrics

The identification of fault-prone modules has a significant impact on software quality assurance. In addition to prediction accuracy, one of the most important goals is to detect fault-prone modules as early as possible in the development life cycle. Requirements, design, and code metrics have been successfully used for predicting fault-prone modules. In this paper, we investigate the benefits of the incremental development of software fault prediction models. We compare the performance of these models as the volume of data and their life cycle origin (design, code, or their combination) evolve during project development. We analyze 14 data sets from publicly available software engineering data repositories. These data sets offer both design and code metrics. Using a number of modeling techniques and statistical significance tests, we confirm that increasing the volume of training data improves model performance. Further, models built from code metrics typically outperform those that are built using design metrics only. However, both types of models prove to be useful as they can be constructed in different phases of the life cycle. Code-based models can be used to increase the effectiveness of assigning verification and validation activities late in the development life cycle. We also conclude that models that utilize a combination of design and code level metrics outperform models that use either metric set exclusively.


Do Anglian dialects of Old English have a partial pro-drop property?

Referential Null Subjects in Early English ◽

10.1093/oso/9780198808237.003.0003 ◽

2019 ◽

pp. 53-92

Author(s):

Kristian A. Rusten

Keyword(s):

Old English ◽

Random Forests ◽

Fixed Effects ◽

Analytical Techniques ◽

Statistical Modelling ◽

Descriptive Statistics ◽

Conditional Inference ◽

The West ◽

Conditional Inference Trees ◽

Logistic Regression Modelling

Chapter 3 deals with the question of whether Anglian dialects of Old English, in contrast to the West Saxon literary standard, had a partial pro-drop property. The chapter investigates this ‘dialect-split hypothesis’ by means of descriptive statistics and inferential statistical modelling. It is also noted that what has been interpreted as diatopic variation could also be representative of other types of variation, and consequently the variables of translation status, period, and genre are also investigated, in addition to dialect. The primary analytical techniques used in this chapter are generalized fixed-effects logistic regression modelling and random forests of conditional inference trees. The chapter concludes that the dialect-split hypothesis must be considered falsified.


Place-name Mutation Variation in Wales and Patagonia

Journal of Celtic Linguistics ◽

10.16922/jcl.21.5 ◽

2020 ◽

Vol 21 (1) ◽

pp. 143-172

Author(s):

Morgan Sleeper

Keyword(s):

Random Forests ◽

Statistical Approach ◽

Conditional Inference ◽

Conversational Speech ◽

Place Names ◽

Mutation Strategy ◽

Significant Difference ◽

Conditional Inference Trees ◽

Corpus Data

This study uses corpus data of modern conversational speech to examine variation in the mutation of place-names in Welsh as spoken in both Wales and Patagonia. Specifically, it considers how speakers from both areas mutate (or do not mutate) place-names following the nasal mutation trigger yn 'in', through a two-step statistical approach of conditional inference trees and random forests. Results show no significant difference in how speakers from Wales and Patagonia mutate place-names in this environment, but that the radical initial consonant, speaker age, and place-name type – including the geographical, linguistic, and cultural 'Welshness' of the place-name – all significantly affect mutation behaviour. Furthermore, while nasal mutation is present in the data, the results also illustrate the growing use of soft mutation as an alternate mutation strategy following yn.


Variation in negation in Seto

Studies in Language ◽

10.1075/sl.19063.lin ◽

2021 ◽

Author(s):

Liina Lindström ◽

Maarja-Liisa Pilvik ◽

Helen Plado

Keyword(s):

Random Forests ◽

Quantitative Methods ◽

Conditional Inference ◽

Double Negation ◽

Regression Modelling ◽

Conditional Inference Trees ◽

Third Person ◽

Fixed Expressions ◽

Present Tense

Seto is an exceptional language in the Uralic family due to its systematic use of postverbal negation, although preverbal and double negation marking are also used. Postverbal negation is still the most frequent and unmarked pattern, occurring in about 74% of negative clauses in Seto. This paper analyzes variation between pre- and postverbal negation in East Seto (spoken in present-day Russia), based on data gathered during fieldwork trips in 2010–2013. By applying quantitative methods that are used in variationist studies (regression modelling, conditional inference trees, and random forests), we determine the variables affecting the choice between pre- and postverbal negation. Marked preverbal negation is more likely to occur with first and third person, cognition verbs, and present tense, all of which are often used in fixed expressions like I don’t know. We also found a strong structural persistence effect in the data and remarkable differences between individual speakers.


Models, forests, and trees of York English: Was/were variation as a case study for statistical practice

Language Variation and Change ◽

10.1017/s0954394512000129 ◽

2012 ◽

Vol 24 (2) ◽

pp. 135-178 ◽

Cited By ~ 196

Author(s):

Sali A. Tagliamonte ◽

R. Harald Baayen

Keyword(s):

Random Forests ◽

Linear Models ◽

Random Effect ◽

Mixed Effects ◽

Conditional Inference ◽

Mixed Effects Models ◽

Future Research ◽

Recent Developments ◽

Conditional Inference Trees ◽

Complementary Techniques

What is the explanation for vigorous variation between was and were in plural existential constructions, and what is the optimal tool for analyzing it? Previous studies of this phenomenon have used the variable rule program, a generalized linear model; however, recent developments in statistics have introduced new tools, including mixed-effects models, random forests, and conditional inference trees that may open additional possibilities for data exploration, analysis, and interpretation. In a step-by-step demonstration, we show how this well-known variable benefits from these complementary techniques. Mixed-effects models provide a principled way of assessing the importance of random-effect factors such as the individuals in the sample. Random forests provide information about the importance of predictors, whether factorial or continuous, and do so also for unbalanced designs with high multicollinearity, cases for which the family of linear models is less appropriate. Conditional inference trees straightforwardly visualize how multiple predictors operate in tandem. Taken together, the results confirm that polarity, distance from verb to plural element, and the nature of the DP are significant predictors. Ongoing linguistic change and social reallocation via morphologization are operational. Furthermore, the results make predictions that can be tested in future research. We conclude that variationist research can be substantially enriched by an expanded tool kit.


Phonological properties of word classes and directionality in conversion

WORD Structure ◽

10.3366/word.2017.0108 ◽

2017 ◽

Vol 10 (2) ◽

pp. 204-234 ◽

Cited By ~ 3

Author(s):

Arne Lohmann

Keyword(s):

Random Forests ◽

Large Scale ◽

Word Formation ◽

Conditional Inference ◽

Limited Attention ◽

Word Class ◽

Conditional Inference Trees ◽

Word Classes ◽

Accurate Indicator ◽

English Noun

In the study of the word-formation process of conversion, one particularly difficult task is to determine the directionality of the process, that is, to decide which word represents the base and which the derived word. One possibility to inform this decision that has received only limited attention is to capitalize on word-class-specific phonological properties. This paper empirically investigates this option for English noun-verb conversion by building on recent findings on phonological differences between these two word classes. A large-scale study of phonological properties is carried out on CELEX data, employing the quantitative techniques of conditional inference trees and random forests. An important result of this analysis is that the accuracy of phonological cues varies widely across different subsamples in the data. Essentially this means that phonological cues can be used as a criterion to determine the directionality of words that are at least two syllables in length. When restricted to this part of the lexicon, phonological properties represent a fairly accurate indicator of source word class and are therefore a useful addition to the linguist's toolkit for determining directionality in conversion. Based on this result, the paper also discusses the relations of phonological properties to other criteria commonly employed to determine directionality.


Predictors of remission from body dysmorphic disorder after internet-delivered cognitive behavior therapy: a machine learning approach

10.31234/osf.io/eqcdx ◽

2019 ◽

Author(s):

Oskar Flygare ◽

Jesper Enander ◽

Erik Andersson ◽

Brjánn Ljótsson ◽

Volen Z Ivanov ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forests ◽

Clinical Utility ◽

Body Dysmorphic Disorder ◽

Prediction Models ◽

Behavioral Therapy ◽

Learning Approach ◽

Learning Approaches ◽

Machine Learning Approach

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test if it is possible to reliably predict remission from BDD in a sample of 88 individuals that had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower in subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.
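The "correctly classified" percentages in this abstract correspond to classification accuracy. A minimal Python sketch of that calculation follows. It assumes binary remitter/non-remitter labels, and the example labels are hypothetical rather than data from the study.

```python
def accuracy(actual, predicted):
    """Share of cases whose predicted class matches the observed class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    matches = sum(a == p for a, p in zip(actual, predicted))
    return matches / len(actual)


# Hypothetical labels: True = remitter, False = non-remitter.
actual = [True, True, False, False, True]
predicted = [True, False, False, False, True]
share_correct = accuracy(actual, predicted)
```

Computing the same quantity separately at each follow-up would give a series comparable to the percentages reported above.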
