Reducing sample size through the use of a composite estimator: an application to timber volume estimation

1986 ◽  
Vol 16 (5) ◽  
pp. 1116-1118 ◽  
Author(s):  
Edwin J. Green ◽  
William E. Strawderman

A method is presented for determining the sample size required to produce an estimate with a stated allowable percent error when the sample data are to be combined with prior information. Applying the method to the estimation of volume per acre, with prior knowledge represented by a yield equation, demonstrates that it can reduce the amount of sample information required compared with ignoring the yield equation.
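As a rough illustration of how a yield-equation prior can reduce the required number of field plots, the sketch below treats the prior as worth an equivalent number of observations and subtracts it from the classical sample-size formula; the weighting scheme, function names and example numbers are illustrative assumptions, not the authors' formulation.

```python
# A minimal sketch, assuming inverse-variance pooling of a prior (yield-equation)
# estimate with new plot data; not the authors' exact composite estimator.
import math
from scipy import stats

def required_sample_size(cv_percent, allowable_error_percent,
                         prior_variance=None, unit_variance=None,
                         confidence=0.95):
    """Number of field plots needed for the stated allowable percent error.

    cv_percent              -- assumed coefficient of variation of plot volumes (%)
    allowable_error_percent -- target half-width of the confidence interval (%)
    prior_variance          -- variance of the prior (yield-equation) estimate of the mean
    unit_variance           -- variance of a single plot observation
    """
    z = stats.norm.ppf(0.5 + confidence / 2.0)
    n_classical = (z * cv_percent / allowable_error_percent) ** 2
    if prior_variance is None or unit_variance is None:
        return math.ceil(n_classical)
    # The prior carries roughly n0 "equivalent observations" of information.
    n0 = unit_variance / prior_variance
    return max(0, math.ceil(n_classical - n0))

# Example: 60% CV, 10% allowable error, prior worth 100/4 = 25 plots of information.
print(required_sample_size(60, 10, prior_variance=4.0, unit_variance=100.0))
```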

2019 ◽  
Author(s):  
Elisa Benedetti ◽  
Maja Pučić-Baković ◽  
Toma Keser ◽  
Nathalie Gerstner ◽  
Mustafa Büyüközkan ◽  
...  

Correlation networks are commonly used to statistically extract biological interactions between omics markers. Network edge selection is typically based on the significance of the underlying correlation coefficients. A statistical cutoff, however, is not guaranteed to capture biological reality and depends heavily on dataset properties such as sample size. Here we propose an alternative approach to the problem of network reconstruction. Specifically, we developed a cutoff selection algorithm that maximizes agreement with a given ground truth. We first evaluate the approach on IgG glycomics data, for which the biochemical pathway is known and well characterized. The optimal network outperforms networks obtained with statistical cutoffs and is robust with respect to sample size. Importantly, we show that even in the case of incomplete or incorrect prior knowledge, the optimal network is close to the true optimum. We then demonstrate the generalizability of the approach on an untargeted metabolomics dataset and a transcriptomics dataset from The Cancer Genome Atlas (TCGA). For the transcriptomics case, we demonstrate that the optimized network is superior to statistical networks in systematically retrieving interactions that were not included in the biological reference used for the optimization. Overall, this paper shows that using prior information for correlation network inference is superior to using standard statistical cutoffs, even if the prior information is incomplete or partially inaccurate.
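As a hedged illustration of the kind of cutoff optimization described above (not the authors' published implementation), the sketch below sweeps absolute-correlation cutoffs and keeps the one whose edge set agrees best, by F1 score, with a reference network derived from prior knowledge; the agreement measure and function names are assumptions.

```python
# Illustrative prior-knowledge-driven cutoff selection for a correlation network.
import numpy as np

def best_cutoff(data, reference_edges, cutoffs=np.linspace(0.05, 0.95, 19)):
    """data: samples x variables array; reference_edges: set of (i, j) pairs with i < j."""
    corr = np.corrcoef(data, rowvar=False)
    p = corr.shape[0]
    pairs = [(i, j) for i in range(p) for j in range(i + 1, p)]
    best = (None, -1.0)
    for c in cutoffs:
        predicted = {(i, j) for i, j in pairs if abs(corr[i, j]) >= c}
        tp = len(predicted & reference_edges)
        precision = tp / len(predicted) if predicted else 0.0
        recall = tp / len(reference_edges) if reference_edges else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        if f1 > best[1]:
            best = (c, f1)
    return best  # (optimal cutoff, achieved F1)

# Usage: best_cutoff(glycan_abundances, known_pathway_edges)
```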


2018 ◽  
Vol 28 (6) ◽  
pp. 1664-1675 ◽  
Author(s):  
TB Brakenhoff ◽  
KCB Roes ◽  
S Nikolakopoulos

The sample size of a randomized controlled trial is typically chosen so that frequentist operational characteristics are retained. For normally distributed outcomes, an assumption about the variance must be made, usually based on limited prior information. Especially in the case of small populations, the prior information might consist of only one small pilot study. A Bayesian approach formalizes the aggregation of prior information on the variance with newly collected data. The uncertainty surrounding prior estimates can be appropriately modelled by means of prior distributions. Furthermore, within the Bayesian paradigm, quantities such as the probability of a conclusive trial are calculated directly. However, if the postulated prior is not in accordance with the true variance, such calculations are not trustworthy. In this work we adapt previously suggested methodology to facilitate sample size re-estimation. In addition, we suggest employing power priors so that operational characteristics are controlled.
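A minimal sketch of the general idea, assuming a pilot-based inverse-gamma prior on the variance downweighted by a power parameter a0 and the usual normal-approximation sample-size formula; the parametrization and names are illustrative, not the authors' implementation.

```python
# Minimal sketch: power prior on the variance from a pilot study, then a
# per-arm sample size for a two-arm trial using the prior mean of sigma^2.
import math
from scipy import stats

def power_prior_sigma2(pilot_var, pilot_n, a0):
    """Inverse-gamma prior on sigma^2 from a pilot, downweighted by a0 in (0, 1]."""
    shape = a0 * (pilot_n - 1) / 2.0
    scale = a0 * (pilot_n - 1) * pilot_var / 2.0
    return shape, scale

def sample_size_per_arm(shape, scale, delta, alpha=0.05, power=0.8):
    """Plug the prior mean of sigma^2 into the normal-approximation formula."""
    sigma2 = scale / (shape - 1.0)          # prior mean of sigma^2 (requires shape > 1)
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma2 / delta ** 2)

shape, scale = power_prior_sigma2(pilot_var=4.0, pilot_n=20, a0=0.5)
print(sample_size_per_arm(shape, scale, delta=1.5))
```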


2021 ◽  
Vol 29 (6) ◽  
pp. 0-0

Digital technology has changed the uncertain nature of the process of new venture idea generation, and it has also brought unprecedented opportunities for the generation of new digital venture ideas. To explore how startups can deal with the major challenges brought by digital technology and create new digital venture ideas, this paper focuses on micro-level entrepreneurial actions and constructs a theoretical model of the relationships among networking capabilities, IT capabilities, prior knowledge and new digital venture ideas. Through hierarchical linear regression analysis of data from 278 samples, the paper finds that in the context of digitalization, both networking capabilities and IT capabilities have a positive impact on the generation of new digital venture ideas. In addition, prior knowledge plays a moderating role in the relationship between IT capabilities and new digital venture ideas. This paper explores how startups can build new digital venture ideas in the context of digitalization, which guides small enterprises in responding to new challenges.
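As an illustration of the moderated hierarchical regression described above, the sketch below enters controls, main effects and the IT capability × prior knowledge interaction in successive blocks; the variable names and the synthetic data frame are placeholders, not the study's survey items.

```python
# Illustrative blockwise (hierarchical) OLS with a moderation term; synthetic
# placeholder data stand in for the 278-firm survey sample.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 278
df = pd.DataFrame({
    "firm_age": rng.integers(1, 15, n),
    "networking_capability": rng.normal(size=n),
    "it_capability": rng.normal(size=n),
    "prior_knowledge": rng.normal(size=n),
})
df["new_digital_venture_ideas"] = (0.3 * df.networking_capability
                                   + 0.3 * df.it_capability
                                   + 0.2 * df.it_capability * df.prior_knowledge
                                   + rng.normal(size=n))

m1 = smf.ols("new_digital_venture_ideas ~ firm_age", data=df).fit()
m2 = smf.ols("new_digital_venture_ideas ~ firm_age + networking_capability"
             " + it_capability + prior_knowledge", data=df).fit()
m3 = smf.ols("new_digital_venture_ideas ~ firm_age + networking_capability"
             " + it_capability * prior_knowledge", data=df).fit()
print(m1.rsquared, m2.rsquared, m3.rsquared)        # R-squared gain per block
print(m3.params["it_capability:prior_knowledge"])   # moderation coefficient
```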


2018 ◽  
Author(s):  
Arghavan Bahadorinejad ◽  
Ivan Ivanov ◽  
Johanna W Lampe ◽  
Meredith AJ Hullar ◽  
Robert S Chapkin ◽  
...  

We propose a Bayesian method for the classification of 16S rRNA metagenomic profiles of bacterial abundance by introducing a Poisson-Dirichlet-Multinomial hierarchical model for the sequencing data, constructing a prior distribution from sample data, calculating the posterior distribution in closed form, and deriving an Optimal Bayesian Classifier (OBC). The proposed algorithm is compared to state-of-the-art classification methods for 16S rRNA metagenomic data, including Random Forests and the phylogeny-based Metaphyl algorithm, for varying sample size, classification difficulty, and dimensionality (number of OTUs), using both synthetic and real metagenomic data sets. The results demonstrate that the proposed OBC method, with either noninformative or constructed priors, is competitive with or superior to the other methods. In particular, when the ratio of sample size to dimensionality is small, the proposed method can vastly outperform the others.

Author summary: Recent studies have highlighted the interplay between host genetics, gut microbes, and colorectal tumor initiation/progression. The characterization of microbial communities using metagenomic profiling has therefore received renewed interest. In this paper, we propose a method for classification, i.e., prediction of different outcomes, based on 16S rRNA metagenomic data. The proposed method employs a Bayesian approach, which is suitable for data sets with a small ratio of the number of available instances to the dimensionality. Results using both synthetic and real metagenomic data show that the proposed method can outperform other state-of-the-art metagenomic classification algorithms.
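The following is a simplified sketch in the spirit of the classifier described above: a Dirichlet-multinomial model per class with the closed-form marginal likelihood used to score a new 16S count vector. The full method uses a Poisson-Dirichlet-Multinomial hierarchy and constructs priors from sample data; this illustration omits the Poisson layer and uses a symmetric Dirichlet prior.

```python
# Simplified sketch: per-class Dirichlet-multinomial scoring (the multinomial
# coefficient is dropped because it is constant across classes for a given x).
import numpy as np
from scipy.special import gammaln

def dirichlet_multinomial_logscore(x, alpha):
    """log P(x | alpha) up to a class-independent constant."""
    n = x.sum()
    return (gammaln(alpha.sum()) - gammaln(n + alpha.sum())
            + np.sum(gammaln(x + alpha) - gammaln(alpha)))

def fit_class_alphas(counts_by_class, prior_strength=1.0):
    """Posterior Dirichlet parameters per class: symmetric prior + pooled training counts."""
    return {c: prior_strength + counts.sum(axis=0)
            for c, counts in counts_by_class.items()}

def classify(x, alphas, log_class_priors):
    """Assign x (a vector of OTU counts) to the class with the highest posterior score."""
    scores = {c: dirichlet_multinomial_logscore(x, a) + log_class_priors[c]
              for c, a in alphas.items()}
    return max(scores, key=scores.get)
```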


1984 ◽  
Vol 14 (6) ◽  
pp. 803-810 ◽  
Author(s):  
Gordon A. Maclean ◽  
George L. Martin

A procedure is described for estimating timber volume from high-precision measurement of the cross-sectional area of a canopy profile on medium-scale vertical aerial photographs. Timber volume data were obtained from 75 data points in a study area containing several forest types, and canopy profile areas were measured with a stereoplotter at the corresponding points on the aerial photographs. Film density values were also measured along each profile using a scanning microdensitometer. Canopy profile area was found to be independent of the direction of the profile relative to the flight line of the photography. The relation between timber volume and profile area was found to be highly significant, semilogarithmic, and species dependent, with regression R2 values ranging from 0.67 to 0.79. The area under a curve obtained by plotting film density values is not sufficiently correlated with timber volume to be a significant independent variable, either alone or with profile area. However, film density information was found to be of significant value in correcting the profile areas for canopy microopenings too small to be measured with a stereoplotter. With the area of microopenings included as a separate independent variable, regression R2 values range from 0.82 to 0.88.
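For concreteness, a minimal sketch of the kind of species-specific regression reported above follows; the abstract does not specify which variable is log-transformed, so this illustration regresses log volume on profile area and micro-opening area, and all names are placeholders.

```python
# Illustrative semilog fit (log volume on canopy profile area plus the area of
# micro-openings); the functional form and variable names are assumptions.
import numpy as np
import statsmodels.api as sm

def fit_species_model(volume, profile_area, micro_opening_area):
    X = sm.add_constant(np.column_stack([profile_area, micro_opening_area]))
    model = sm.OLS(np.log(volume), X).fit()
    return model.params, model.rsquared

# Usage, per forest type / species group:
# params, r2 = fit_species_model(plot_volume, canopy_profile_area, micro_opening_area)
```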


Author(s):  
A. TETERUKOVSKIY

The problem of automatic detection of tracks in aerial photos is considered. We adopt a Bayesian approach and base our inference on a priori knowledge of the structure of tracks. The probability that a pixel belongs to a track depends on how the pixel's gray level differs from the gray levels of pixels in its neighborhood and on additional prior information. Several suggestions are made on how to formalize the prior knowledge about the shape of the tracks. The Gibbs sampler is used to construct the most probable configuration of tracks in the area. The method is applied to aerial photos with a cell size of 1 sq. m. Positive results can be achieved even for the detection of trails whose width is comparable with or smaller than the cell size.
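A schematic sketch of the pixel-labelling idea (not the author's implementation): each pixel's track/background label is resampled from a full conditional that combines how much darker the pixel is than its neighborhood with an Ising-style prior rewarding agreement with neighboring labels; the shape priors discussed in the paper are not reproduced here.

```python
# Schematic Gibbs sweep for track labelling; `image` is a 2D array of gray
# levels scaled to [0, 1], and tracks are assumed darker than their surroundings.
import numpy as np

def gibbs_tracks(image, n_sweeps=50, beta=1.5, contrast=4.0, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    labels = np.zeros((h, w), dtype=int)          # 1 = track, 0 = background
    for _ in range(n_sweeps):
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                nbhd = image[i - 1:i + 2, j - 1:j + 2]
                gray_term = contrast * (nbhd.mean() - image[i, j])
                lab_sum = (labels[i - 1, j] + labels[i + 1, j]
                           + labels[i, j - 1] + labels[i, j + 1])
                prior_term = beta * (2 * lab_sum - 4)  # Ising-style neighbor prior
                p_track = 1.0 / (1.0 + np.exp(-(gray_term + prior_term)))
                labels[i, j] = int(rng.random() < p_track)
    return labels
```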


1989 ◽  
Vol 12 (1) ◽  
pp. 187-187
Author(s):  
Robyn M. Dawes

In my comments in BBS (Random generators, ganzfields, analysis, and theory, 1987, 10:581-82) regarding psi, I mistakenly ascribed to Professor Honorton the position that "good experimenters knew in advance" how many observations to sample. He denies making that assertion in the paper I cited (1985), and in fact regards it as a rather foolish one (personal communication 6/25/88). This incorrect assertion was based on my inference - not his - that the most plausible alternative to optional stopping as an explanation for the negative correlation between sample size and effect size (and even z-scores) was prior knowledge leading to the necessity of sampling fewer observations when the expectation of the estimated effect size was larger.

Honorton, C. (1985) The Ganzfeld psi experiment: A critical appraisal. Journal of Parapsychology 49:51-91.


Jurnal Wasian ◽  
2016 ◽  
Vol 3 (2) ◽  
pp. 91
Author(s):  
Relawan Kuswandi

A precise forest inventory to estimate standing stock is needed in forest management planning. Therefore, proper and reliable tools are necessary for estimating merchantable timber volume. This research was intended to build an accurate model for estimating the timber volume of merchantable species in the logging concession of PT Wapoga Mutiara Timber, Sarmi Regency. The regression equation between diameter and length did not show a significant correlation (coefficient of determination R2 = 6.7%). Based on the validation test, the best equation for the tree volume table in the logging concession of PT Wapoga Mutiara Timber was Log V = -3.34 + 2.16 log d.
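Applying the reported equation is straightforward; the short sketch below assumes base-10 logarithms (not stated explicitly in the abstract) and follows the diameter and volume units of the original volume table.

```python
# Predict tree volume from diameter with the reported model
# Log V = -3.34 + 2.16 log d, assuming base-10 logarithms.
import math

def predict_volume(diameter):
    return 10 ** (-3.34 + 2.16 * math.log10(diameter))

print(predict_volume(50))   # e.g., a tree of diameter 50
```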


2018 ◽  
Vol 51 (4) ◽  
pp. 1151-1161 ◽  
Author(s):  
Andreas Haahr Larsen ◽  
Lise Arleth ◽  
Steen Hansen

The structure of macromolecules can be studied by small-angle scattering (SAS), but as this is an ill-posed problem, prior knowledge about the sample must be included in the analysis. Regularization methods are used for this purpose, as already implemented in indirect Fourier transformation and bead-modeling-based analysis of SAS data, but not yet in the analysis of SAS data with analytical form factors. To fill this gap, a Bayesian regularization method was implemented, where the prior information was quantified as probability distributions for the model parameters and included via a functional S. The quantity Q = χ2 + αS was then minimized and the value of the regularization parameter α determined by probability maximization. The method was tested on small-angle X-ray scattering data from a sample of nanodiscs and a sample of micelles. The parameters refined with the Bayesian regularization method were closer to the prior values as compared with conventional χ2 minimization. Moreover, the errors on the refined parameters were generally smaller, owing to the inclusion of prior information. The Bayesian method stabilized the refined values of the fitted model upon addition of noise and can thus be used to retrieve information from data with low signal-to-noise ratio without risk of overfitting. Finally, the method provides a measure for the information content in data, Ng, which represents the effective number of retrievable parameters, taking into account the imposed prior knowledge as well as the noise level in data.
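A schematic sketch of the regularized fit Q = χ2 + αS follows; the model function, prior means and prior widths are placeholders, S is written as a Gaussian-prior penalty, and the probability-maximization step that selects α in the paper is only indicated, not reproduced.

```python
# Schematic regularized least squares: Q(params) = chi^2 + alpha * S.
import numpy as np
from scipy.optimize import minimize

def make_Q(q, I_obs, sigma, model, prior_mean, prior_sd, alpha):
    def Q(params):
        chi2 = np.sum(((I_obs - model(q, params)) / sigma) ** 2)
        S = np.sum(((params - prior_mean) / prior_sd) ** 2)  # Gaussian-prior functional
        return chi2 + alpha * S
    return Q

def refine(q, I_obs, sigma, model, prior_mean, prior_sd, alpha):
    """Minimize Q for a fixed alpha; alpha itself would then be chosen by
    maximizing its (approximate) posterior probability, which is omitted here."""
    res = minimize(make_Q(q, I_obs, sigma, model, prior_mean, prior_sd, alpha),
                   x0=np.asarray(prior_mean, dtype=float), method="Nelder-Mead")
    return res.x
```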

