Combining a predicted diameter distribution with an estimate based on a small sample of diameters

2011 ◽  
Vol 41 (4) ◽  
pp. 750-762 ◽  
Author(s):  
Lauri Mehtätalo ◽  
Carlos Comas ◽  
Timo Pukkala ◽  
Marc Palahí

The diameter distribution of a forest stand is of great interest in many situations, including forest management planning and the related prediction of growth and yield. The estimation of the diameter distribution may be based on, for example, a measured sample of diameters or the application of previously estimated parameter prediction models (PPMs), which relate the parameters of an assumed distribution function to stand characteristics. We propose combining these two information sources. The approach is adopted from mixed-effects modelling theory. The PPMs are treated as mixed-effects models, the residuals being stand effects. These stand effects are predicted from a small sample of tree diameters with the best linear predictor. A study conducted with a Spanish pine data set showed that, in a situation where the predictors of the PPM include errors, the prediction can be improved even with a sample plot of as few as five sample trees. Conversely, a distribution based on a sample plot of 3–15 sample trees can be significantly improved by utilizing existing PPMs. An additional simulation study was conducted to investigate how violations of the method's underlying assumptions affect its performance.
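When the PPM is linear with a single random stand intercept, the best linear predictor (BLUP) of the stand effect from a small diameter sample reduces to shrinking the mean residual toward zero. A minimal numpy sketch of that special case, with illustrative variance components and data that are not from the paper:

```python
import numpy as np

def blup_stand_effect(y, X, beta, sigma2_b, sigma2_e):
    """Best linear predictor of a random stand intercept b from a
    small sample y measured in one stand.

    Model: y_i = x_i' beta + b + e_i,  b ~ N(0, sigma2_b),
    e_i ~ N(0, sigma2_e) independent. For this special case the BLUP
    is a shrinkage of the mean marginal residual toward zero.
    """
    n = len(y)
    resid = y - X @ beta                             # marginal residuals
    shrink = sigma2_b / (sigma2_b + sigma2_e / n)    # in [0, 1)
    return shrink * resid.mean()

# Illustrative use: five sample-tree measurements in one stand
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(5), rng.uniform(10, 20, 5)])  # intercept + covariate
beta = np.array([2.0, 0.1])                 # fixed effects of the fitted PPM
y = X @ beta + 0.8 + rng.normal(0, 0.5, 5)  # true stand effect is 0.8
print(blup_stand_effect(y, X, beta, sigma2_b=1.0, sigma2_e=0.25))
```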

2003 ◽  
Vol 33 (3) ◽  
pp. 430-434 ◽  
Author(s):  
Annika Kangas ◽  
Matti Maltamo

Diameter distribution of the growing stock is essential in many forest management planning problems. The diameter distribution is the basis for predicting, for example, the timber assortments of a stand. Usually the predicted diameter distribution is scaled so that the stem number (or basal area) corresponds to the measured value (or predicted future value), but it may be difficult to obtain a distribution that gives correct estimates for all known variables. Diameter distributions that are compatible with all available information can be obtained using an approach adopted from sampling theory: calibration estimation. In calibration estimation, the original predicted frequencies are modified so that they respect a set of constraints, the calibration equations. In this paper, an example of utilizing diameter distributions in growth and yield predictions is presented. The example is based on individual tree growth models of Scots pine (Pinus sylvestris L.). Calibration estimation was used to predict the diameter distribution at the beginning of the simulation period. Trees were then picked from the distribution and their development was predicted with the individual tree models. In predicting current stand characteristics, calibrated diameter distributions proved to be efficient. However, in predicting future yields, calibration estimation did not significantly improve the accuracy of the results.
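For the linear (chi-square distance) variant, calibration has a closed form: each predicted frequency is multiplied by 1 + x'λ, with λ chosen so that the calibration equations hold exactly. A minimal numpy sketch under that assumption (the paper's constraint set and distance measure may differ, and linear calibration can yield negative frequencies in extreme cases):

```python
import numpy as np

def calibrate(freq, X, targets):
    """Linear (chi-square distance) calibration of predicted frequencies.

    freq    : (K,) predicted frequency of each diameter class
    X       : (K, m) calibration variables per class, e.g. a column of
              ones (stem number) and basal area per stem
    targets : (m,) known stand totals to reproduce (X' w = targets)
    """
    T = X.T @ (freq[:, None] * X)                 # sum_k f_k x_k x_k'
    lam = np.linalg.solve(T, targets - X.T @ freq)
    return freq * (1.0 + X @ lam)                 # calibrated frequencies

# Illustrative use: four diameter classes calibrated to a measured
# stem number (450/ha) and basal area (11 m^2/ha)
d = np.array([10.0, 15.0, 20.0, 25.0])            # class midpoints, cm
freq = np.array([120.0, 180.0, 90.0, 30.0])       # predicted stems/ha
g = np.pi * (d / 200.0) ** 2                      # basal area per stem, m^2
X = np.column_stack([np.ones_like(d), g])
w = calibrate(freq, X, np.array([450.0, 11.0]))
print(w, w.sum(), (w * g).sum())                  # constraints now hold
```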


2021 ◽  
Author(s):  
Josh B Bankston ◽  
Charles O Sabatia ◽  
Krishna P Poudel

Abstract The distribution of tree diameters in a stand is characterized using models that predict diameter moments and/or percentiles in conjunction with a mathematical system to recover the parameters of an assumed statistical distribution. Studies have compared Weibull diameter distribution recovery systems but arrived at different conclusions regarding the best approach for recovering a stand's diameter distribution from predicted stand-level statistics. We assessed the effects of sample plot size and of the diameter moments/percentiles prediction models on the accuracy of three approaches used to recover Weibull distribution parameters—the method of moments, the percentile method, and a moments-percentile hybrid method. Data from five plot sizes in unthinned loblolly pine (Pinus taeda) plantations, four of which were created virtually from existing larger plots, were used to fit the moments/percentile prediction models and to evaluate the accuracy of the diameter distributions recovered with the three approaches. Both plot size and prediction model form affected the accuracy of the recovery approaches, as indicated by the changes in their ranking from one plot size to another for the same model form. The method of moments ranked best when the evaluation error index did not account for tree stumpage value, but the moments-percentile hybrid approach ranked best when stumpage value was considered. Study Implications Diameter distribution recovery techniques make it possible to disaggregate trees per unit area, predicted by whole-stand growth and yield models, into diameter and utilization product classes. The techniques thus provide insight into stand structure, which can guide management decisions such as thinning and selection harvesting. They are also used to generate yield tables by product class, which are important inputs to harvest scheduling optimization programs. An accurate diameter recovery technique is therefore critical to forest management and planning. Based on the findings of this study, the best approach to developing a diameter distribution recovery system for unthinned loblolly pine plantations would be the hybrid approach, with tree diameter data collected from plots of at least one-tenth hectare. The well-known (and, most likely, widely used) method of moments may not be the best choice. For predicting the stand diameter moments and order statistics used in a diameter distribution recovery system, it would be best to use a linear additive model that incorporates a measure of stand density, such as relative spacing and/or number of trees per unit area, and a measure of the stand's stage of development, such as dominant height and/or age.
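As an illustration of the simplest of the three systems, the method of moments recovers a two-parameter Weibull from a predicted mean and variance of dbh by solving the coefficient-of-variation equation for the shape parameter. A minimal scipy sketch (the study's systems also involve a location parameter and percentile-based variants, which are omitted here):

```python
from math import gamma
from scipy.optimize import brentq

def weibull_from_moments(mean_d, var_d):
    """Method-of-moments recovery of a two-parameter Weibull.

    Solves  CV^2 = Gamma(1 + 2/c) / Gamma(1 + 1/c)^2 - 1  for the
    shape c, then sets the scale b = mean / Gamma(1 + 1/c).
    """
    cv2 = var_d / mean_d ** 2
    f = lambda c: gamma(1 + 2 / c) / gamma(1 + 1 / c) ** 2 - 1 - cv2
    c = brentq(f, 0.1, 50.0)                # shape (root of f)
    b = mean_d / gamma(1 + 1 / c)           # scale
    return c, b

# Illustrative use with predicted stand-level moments
c, b = weibull_from_moments(mean_d=22.0, var_d=16.0)
print(f"shape = {c:.3f}, scale = {b:.3f} cm")
```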


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade and is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict the Ki values of thrombin inhibitors from a large data set using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional data sets, it can be used to build effective predictive models. A total of 6554 descriptors were collected for each compound, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT), and Support Vector Machine (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model was the best among these methods, with R2 = 0.84 and MSE = 0.55 for the training set and R2 = 0.83 and MSE = 0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is helpful for designing novel thrombin inhibitors.
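A minimal scikit-learn sketch of such a pipeline, i.e. crude univariate descriptor selection followed by an RBF-kernel support vector regressor; the descriptor matrix, selection method, and hyperparameters below are placeholders rather than the study's settings:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold, SelectKBest, f_regression
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_squared_error

# Placeholder data standing in for the descriptor matrix and Ki values
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 200))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(
    VarianceThreshold(1e-4),            # drop near-constant descriptors
    SelectKBest(f_regression, k=50),    # simple univariate selection
    StandardScaler(),                   # SVMs need scaled inputs
    SVR(kernel="rbf", C=10.0, epsilon=0.1),
)
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(f"R2 = {r2_score(y_te, pred):.2f}, MSE = {mean_squared_error(y_te, pred):.2f}")
```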


2019 ◽  
Vol 11 (1) ◽  
pp. 156-173
Author(s):  
Spenser Robinson ◽  
A.J. Singh

This paper shows that Leadership in Energy and Environmental Design (LEED) certified hospitality properties exhibit higher expenses and earn lower net operating income (NOI) than non-certified buildings. ENERGY STAR certified properties demonstrate lower overall expenses than non-certified buildings, with statistically neutral effects on NOI. Using a custom sample of all green buildings and their competitive data set as of 2013, provided by Smith Travel Research (STR), the paper documents potential reasons for this result, including increased operational expenses, potential confusion between certified and registered LEED projects in the data, and qualitative input. The qualitative input comes from a small-sample survey of five industry professionals. The paper provides one of the few analyses of operating efficiencies in LEED and ENERGY STAR hospitality properties.


2020 ◽  
Author(s):  
Adrian Norman Goodwin

Abstract Diameter distribution models based on probability density functions are integral to many forest growth and yield systems, where they are used to estimate product volumes within diameter classes. The three-parameter Weibull function with a constrained nonnegative lower bound is commonly used because of its flexibility and ease of fitting. This study compared Weibull and reverse Weibull functions with and without a lower bound constraint and left-hand truncation, across three large unthinned plantation cohorts in which 81% of plots had negatively skewed diameter distributions. Near-optimal lower bounds for the unconstrained Weibull function were negative for negatively skewed data, and the left-truncated Weibull using these bounds was 14.2% more accurate than the constrained Weibull, based on the Kolmogorov-Smirnov statistic. The truncated reverse Weibull fit dominant tree distributions 23.7% more accurately than the constrained Weibull, based on a mean absolute difference statistic. This work indicates that a blind spot may have developed in plantation growth modeling systems deploying constrained Weibull functions, and that left-truncation of unconstrained functions could substantially improve model accuracy for negatively skewed distributions.
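A minimal scipy sketch of the core estimation step: maximum likelihood for a three-parameter Weibull left-truncated at the smallest observable diameter, with the location (lower bound) left unconstrained and free to go negative. Starting values, optimizer, and data are illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

def fit_left_truncated_weibull(d, trunc):
    """MLE for a Weibull(shape c, loc, scale) left-truncated at `trunc`.

    Log-likelihood: sum(log f(d_i)) - n * log(1 - F(trunc)), i.e. the
    density is renormalized to the observable range d >= trunc.
    """
    def nll(theta):
        c, loc, scale = theta
        if c <= 0 or scale <= 0 or loc >= trunc:
            return np.inf                        # outside parameter space
        logf = weibull_min.logpdf(d, c, loc=loc, scale=scale)
        logS = weibull_min.logsf(trunc, c, loc=loc, scale=scale)
        return -(logf.sum() - len(d) * logS)

    x0 = np.array([2.0, trunc - 1.0, 2.0 * d.std()])
    return minimize(nll, x0, method="Nelder-Mead").x

# Illustrative use on a negatively skewed sample observed above 5 cm
d = weibull_min.rvs(6.0, loc=-10.0, scale=40.0, size=500, random_state=2)
d = d[d >= 5.0]
print(fit_left_truncated_weibull(d, trunc=5.0))  # shape, location, scale
```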


Forests ◽  
2020 ◽  
Vol 11 (5) ◽  
pp. 523 ◽  
Author(s):  
Félicien Meunier ◽  
Sruthi M. Krishna Moorthy ◽  
Hannes P. T. De Deurwaerder ◽  
Robin Kreus ◽  
Jan Van den Bulcke ◽  
...  

Research Highlights: We investigated the variability of vessel diameter distributions within the liana growth form among liana individuals originating from a single site in Laussat, French Guiana. Background and Objectives: Lianas (woody vines) are key components of tropical forests. Lianas are believed to be strong competitors for water, thanks to their presumably efficient vascular systems. However, unlike tropical trees, lianas are overlooked in field data collection. As a result, lianas are often treated as a homogeneous growth form, while little is known about the variation in hydraulic architecture among liana individuals. Materials and Methods: We measured several wood hydraulic and structural traits (e.g., basic specific gravity, vessel area, and vessel diameter distribution) of 22 liana individuals at a single sandy site in Laussat, French Guiana. We compared the liana variability of these wood traits, and the correlations among them, with an existing pantropical liana dataset and two published tree datasets originating from different, but species-rich, tropical sites. Results: Liana vessel diameter distribution and density were heterogeneous among individuals: there was a two-order-of-magnitude difference between the smallest (4 µm) and the largest (494 µm) vessel diameters, a 50-fold difference between extreme vessel densities (1.8 to 89.3 vessels mm−2), mean vessel diameters varying between 26 µm and 271 µm, and individual theoretical stem hydraulic conductivity estimates ranging between 28 and 1041 kg m−1 s−1 MPa−1. Basic specific gravity varied between 0.26 and 0.61. Consequently, liana wood trait variability, even within a small sample, was comparable in magnitude with tree surveys from other tropical sites and with the pantropical liana dataset. Conclusions: This study illustrates that, even when controlling for site and soil type, liana traits are heterogeneous, and lianas cannot be considered a homogeneous growth form. Our results show that the heterogeneity of liana hydraulic architecture across and within sites warrants further investigation in order to categorize lianas into functional groups in the same way as trees.
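Theoretical stem hydraulic conductivity is conventionally computed from measured vessel diameters with the Hagen-Poiseuille law; a minimal sketch, assuming that is the estimator behind the values above (water constants at 20 °C; the vessel sample and cross-section area are made-up values):

```python
import numpy as np

RHO = 998.2      # density of water, kg m^-3 (20 degC)
ETA = 1.002e-9   # dynamic viscosity of water, MPa s (20 degC)

def theoretical_ks(diams_um, area_mm2):
    """Theoretical specific conductivity, kg m^-1 s^-1 MPa^-1.

    Hagen-Poiseuille: ks = (pi * rho / (128 * eta)) * sum(d_i^4) / A,
    summing over all vessels in a cross-section of area A.
    """
    d = np.asarray(diams_um) * 1e-6       # vessel diameters -> m
    A = area_mm2 * 1e-6                   # cross-section area -> m^2
    return np.pi * RHO / (128.0 * ETA) * (d ** 4).sum() / A

# Illustrative: 200 vessels spanning the extremes reported above
rng = np.random.default_rng(3)
print(theoretical_ks(rng.uniform(4, 494, 200), area_mm2=80.0))
```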


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Philipp Rentzsch ◽  
Max Schubach ◽  
Jay Shendure ◽  
Martin Kircher

Abstract Background Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies. Methods It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants. Results We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance. Conclusions While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.
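As a toy illustration of the integration step only (not the CADD training setup, which uses vastly larger annotation sets and proxy-label training), adding a specialized splice score as one more feature of a logistic model and checking the effect on classification:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Simulated annotations; names are illustrative, not CADD features
rng = np.random.default_rng(4)
n = 5000
conservation = rng.normal(size=n)
missense = rng.normal(size=n)
splice_dnn = rng.normal(size=n)                   # specialized splice score
logit = 1.2 * conservation + 0.8 * missense + 1.5 * splice_dnn
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))  # simulated labels

X_base = np.column_stack([conservation, missense])
X_full = np.column_stack([conservation, missense, splice_dnn])
for name, X in [("without splice score", X_base), ("with splice score", X_full)]:
    clf = LogisticRegression(max_iter=1000).fit(X[:4000], y[:4000])
    auc = roc_auc_score(y[4000:], clf.predict_proba(X[4000:])[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```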


2021 ◽  
pp. 174077452110285
Author(s):  
Conner L Jackson ◽  
Kathryn Colborn ◽  
Dexiang Gao ◽  
Sangeeta Rao ◽  
Hannah C Slater ◽  
...  

Background: Cluster-randomized trials allow for the evaluation of a community-level or group-/cluster-level intervention. For studies that require a cluster-randomized trial design to evaluate cluster-level interventions aimed at controlling vector-borne diseases, it may be difficult to assess a large number of clusters while performing the additional work needed to monitor participants, vectors, and environmental factors associated with the disease. One such example of a cluster-randomized trial with few clusters was the “efficacy and risk of harms of repeated ivermectin mass drug administrations for control of malaria” trial. Although previous work has provided recommendations for analyzing trials like repeated ivermectin mass drug administrations for control of malaria, additional evaluation of the multiple approaches for analysis is needed for study designs with count outcomes. Methods: Using a simulation study, we applied three analysis frameworks to three cluster-randomized trial designs (single-year, 2-year parallel, and 2-year crossover) in the context of a 2-year parallel follow-up of repeated ivermectin mass drug administrations for control of malaria. Mixed-effects models, generalized estimating equations, and cluster-level analyses were evaluated. Additional 2-year parallel designs with different numbers of clusters and different cluster correlations were also explored. Results: Mixed-effects models with a small sample correction and unweighted cluster-level summaries yielded both high power and control of the Type I error rate. Generalized estimating equation approaches that utilized small sample corrections controlled the Type I error rate but did not confer greater power when compared to a mixed model approach with small sample correction. The crossover design generally yielded higher power relative to the parallel equivalent. Differences in power between analysis methods became less pronounced as the number of clusters increased. The strength of within-cluster correlation impacted the relative differences in power. Conclusion: Regardless of study design, cluster-level analyses as well as individual-level analyses like mixed-effects models or generalized estimating equations with small sample size corrections can both provide reliable results in small cluster settings. For 2-year parallel follow-up of repeated ivermectin mass drug administrations for control of malaria, we recommend a mixed-effects model with a pseudo-likelihood approximation method and Kenward–Roger correction. Similarly designed studies with small sample sizes and count outcomes should consider adjustments for small sample sizes when using a mixed-effects model or generalized estimating equation for analysis. Although the 2-year parallel follow-up of repeated ivermectin mass drug administrations for control of malaria is already underway as a parallel trial, applying the simulation parameters to a crossover design yielded improved power, suggesting that crossover designs may be valuable in settings where the number of available clusters is limited. Finally, the sensitivity of the analysis approach to the strength of within-cluster correlation should be carefully considered when selecting the primary analysis for a cluster-randomized trial.
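A minimal statsmodels sketch of one of the compared analyses: a Poisson GEE with an exchangeable working correlation and a bias-reduced (small-sample) covariance estimate. The simulated trial below is illustrative; Kenward-Roger corrections for mixed models are not available in statsmodels and are typically run in SAS or in R (pbkrtest):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Simulate a parallel cluster-randomized trial with few clusters
rng = np.random.default_rng(5)
n_clusters, n_per = 8, 50
cluster = np.repeat(np.arange(n_clusters), n_per)
treat = np.repeat(np.arange(n_clusters) % 2, n_per)  # 4 vs 4 arms
u = rng.normal(0.0, 0.3, n_clusters)[cluster]        # cluster effect
y = rng.poisson(np.exp(1.0 - 0.4 * treat + u))       # count outcome
df = pd.DataFrame({"y": y, "treat": treat, "cluster": cluster})

gee = smf.gee("y ~ treat", groups="cluster", data=df,
              family=sm.families.Poisson(),
              cov_struct=sm.cov_struct.Exchangeable())
res = gee.fit(cov_type="bias_reduced")  # small-sample covariance correction
print(res.summary())
```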


2008 ◽  
Vol 54 (1) ◽  
pp. 31-35
Author(s):  
Thomas G. Matney ◽  
Emily B. Schultz

Abstract Many growth and yield models have used statistical probability distributions to estimate the diameter distribution of a stand at any age. Equations approximating individual tree diameter growth and survival probabilities as functions of dbh can be derived from these models. A general procedure for determining the functions is discussed and illustrated using a loblolly pine spacing study. The results from the spacing study show that it is possible to define tree diameter growth and survival probability functions from diameter distributions with an accuracy sufficient to link diameter-distribution and individual-tree growth and yield models.
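One standard way to turn successive predicted distributions into an individual-tree growth function is percentile mapping: a tree at a given percentile of the age-t distribution is projected to the same percentile at age t+1. A minimal scipy sketch under that assumption (the paper derives growth and dbh-dependent survival jointly; survival is reduced to a constant ratio here, and the Weibull parameters are made up):

```python
from scipy.stats import weibull_min

# Predicted diameter distributions at two ages (illustrative parameters)
F1 = weibull_min(2.2, loc=2.0, scale=16.0)   # age t
F2 = weibull_min(2.5, loc=2.0, scale=19.0)   # age t + 1
N1, N2 = 1200.0, 1100.0                      # predicted stems/ha

def projected_dbh(d1):
    """Equal-percentile projection: d2 = F2^-1(F1(d1))."""
    return F2.ppf(F1.cdf(d1))

survival = N2 / N1                           # constant survival (sketch only)
for d in (8.0, 15.0, 25.0):
    print(f"dbh {d:4.1f} cm -> {projected_dbh(d):5.2f} cm, "
          f"survival ~ {survival:.2f}")
```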


2015 ◽  
Vol 17 (5) ◽  
pp. 719-732
Author(s):  
Dulakshi Santhusitha Kumari Karunasingha ◽  
Shie-Yui Liong

A simple clustering method is proposed for extracting representative subsets from lengthy data sets. The main purpose of the extracted subset is to build prediction models (in the form of approximating functional relationships) with it instead of using the entire large data set. Such smaller subsets of data are often required in the exploratory analysis stages of studies that involve resource-consuming investigations. A few recent studies have used a subtractive clustering method (SCM) for such data extraction, in the absence of clustering methods designed for function approximation. SCM, however, requires several parameters to be specified. This study proposes a clustering method that requires only a single parameter to be specified, yet is shown to be as effective as SCM. A method to find suitable values for the parameter is also proposed. Because it has only a single parameter, the proposed clustering method is shown to be orders of magnitude more efficient to use than SCM. The effectiveness of the proposed method is demonstrated on phase space prediction of three univariate time series and prediction of two multivariate data sets. Some drawbacks of SCM when applied to data extraction are identified, and the proposed method is shown to resolve them.
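A minimal sketch of the general idea, i.e. single-parameter, radius-based subset extraction via a greedy one-pass ("leader"-style) scan; the paper's actual algorithm and its parameter-selection procedure differ in detail:

```python
import numpy as np

def leader_subset(X, radius):
    """Keep a point as a representative unless it lies within
    `radius` of an already kept representative (greedy, one pass)."""
    reps = []                                    # indices of kept points
    for i, x in enumerate(X):
        if all(np.linalg.norm(x - X[j]) > radius for j in reps):
            reps.append(i)
    return np.array(reps)

# Illustrative use: thin a large input-output data set with one parameter
rng = np.random.default_rng(6)
X = rng.normal(size=(2000, 3))
idx = leader_subset(X, radius=0.8)
print(f"kept {len(idx)} of {len(X)} points")     # the representative subset
```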

