sampling variation
Recently Published Documents

TOTAL DOCUMENTS: 88 (FIVE YEARS: 7)
H-INDEX: 15 (FIVE YEARS: 1)

Life ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 496
Author(s):  
Adrián Ruiz-Villalba ◽  
Jan M. Ruijter ◽  
Maurice J. B. van den Hoff

In the analysis of quantitative PCR (qPCR) data, the quantification cycle (Cq) indicates the position of the amplification curve with respect to the cycle axis. Because Cq is directly related to the starting concentration of the target, and the difference in Cq values is related to the starting concentration ratio, often the only reported results of a qPCR analysis are Cq, ΔCq or ΔΔCq values. However, reporting Cq values ignores the fact that they may differ between runs and machines and therefore cannot be compared between laboratories. Moreover, Cq values are highly dependent on the PCR efficiency, which differs between assays and may differ between samples. Interpreting reported Cq values under the assumption of a 100% efficient PCR may lead to assumed gene expression ratios that are 100-fold off. This review describes how differences in quantification threshold setting, PCR efficiency, starting material, PCR artefacts, pipetting errors and sampling variation give rise to differences and variability in Cq values, and discusses the limits to the interpretation of observed Cq values. These issues can be avoided by calculating efficiency-corrected starting concentrations per reaction. The reporting of gene expression ratios and fold differences between treatments can then easily be based on these starting concentrations.
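As a minimal numerical sketch of the efficiency correction the review advocates (variable names and values are illustrative, not taken from the paper), compare the fold difference obtained by assuming a 100% efficient PCR (E = 2) with the efficiency-corrected value for an assay with E = 1.8:

```python
# Efficiency-corrected quantification: N0 = Nq / E^Cq, where Nq is the
# fluorescence threshold at which Cq is read. Values are illustrative.

def starting_concentration(cq, efficiency, threshold=1.0):
    """Efficiency-corrected starting concentration of the target."""
    return threshold / efficiency ** cq

# Target measured at Cq = 25 in treated vs Cq = 30 in control samples.
cq_treated, cq_control = 25.0, 30.0

# Ratio assuming a perfectly efficient PCR (E = 2):
ratio_assumed = 2.0 ** (cq_control - cq_treated)      # 2^5 = 32-fold

# Ratio with the actual assay efficiency, here E = 1.8:
n0_treated = starting_concentration(cq_treated, 1.8)
n0_control = starting_concentration(cq_control, 1.8)
ratio_corrected = n0_treated / n0_control             # 1.8^5 ~ 18.9-fold

print(f"assumed 100% efficiency: {ratio_assumed:.1f}-fold")
print(f"efficiency-corrected:    {ratio_corrected:.1f}-fold")
```

Even this modest efficiency shortfall (1.8 instead of 2.0) almost halves the apparent expression ratio; over larger Cq differences the discrepancy grows exponentially.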


Landslides ◽  
2021 ◽  
Author(s):  
David J. Peres ◽  
Antonino Cancelliere

Abstract Rainfall intensity-duration landslide-triggering thresholds have become widespread in the development of landslide early warning systems. Thresholds can in principle be determined using rainfall event datasets of three types: (a) only rainfall events associated with landslides (triggering rainfall), (b) only rainfall events not associated with landslides (non-triggering rainfall), (c) both triggering and non-triggering rainfall. In this paper, through Monte Carlo simulation, we compare these three possible approaches based on the following statistical properties: robustness, sampling variation, and performance. It is found that methods based only on triggering rainfall can be the worst with respect to the three investigated properties. Methods based on both triggering and non-triggering rainfall perform best, as they can be built to provide the best trade-off between correct and wrong predictions; they are also robust, but still require a fairly large sample to sufficiently limit the sampling variation of the threshold parameters. On the other hand, methods based only on non-triggering rainfall, which are mostly overlooked in the literature, offer good robustness and low sampling variation, and performance that is often acceptable and better than that of thresholds derived from triggering events only. Using solely triggering rainfall, which is the most common practice in the literature, yields thresholds with the worst statistical properties, except when there is a clear separation between triggering and non-triggering events. Based on these results, methods based only on non-triggering rainfall deserve wider attention. They may also have the practical advantage that they can in principle be used where only limited information on landslide occurrence is available (e.g., newly instrumented areas). The fact that relatively large samples (about 200 landslide events) are needed for a sufficiently precise estimation of threshold parameters when using triggering rainfall suggests that threshold determination in future applications may start from thresholds identified from non-triggering events only, and then move to methods that also consider triggering events as landslide information becomes available.
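As a hedged illustration of the non-triggering-only idea (a simplified stand-in, not the authors' simulation framework), a power-law threshold I = αD^β can be identified from non-triggering events alone by fitting the slope of the event cloud in log-log space and then raising the intercept until a chosen fraction of non-triggering events falls below the line:

```python
# Sketch: intensity-duration threshold from non-triggering rainfall only.
# Synthetic data and the 95% exceedance level are illustrative choices.
import numpy as np

rng = np.random.default_rng(42)

# Synthetic non-triggering events: duration D (h), mean intensity I (mm/h).
d = 10 ** rng.uniform(0, 2, 500)                   # durations 1-100 h
i = 10 ** (rng.normal(0.5, 0.3, 500) - 0.4 * np.log10(d))

log_d, log_i = np.log10(d), np.log10(i)

beta, intercept = np.polyfit(log_d, log_i, 1)      # slope of the cloud
residuals = log_i - (intercept + beta * log_d)
shift = np.quantile(residuals, 0.95)               # 95% of events below line

alpha = 10 ** (intercept + shift)
print(f"threshold: I = {alpha:.2f} * D^{beta:.2f}")
```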


Plants ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 319
Author(s):  
Zhenyu Dang ◽  
Jixuan Yang ◽  
Lin Wang ◽  
Qin Tao ◽  
Fengjun Zhang ◽  
...  

New sequencing technology enables the identification of genome-wide sequence-based variants at a population level and at a comparatively low cost. Sequence variant-based molecular markers have motivated enormous interest in population and quantitative genetic analyses. Generation of the sequence data involves a sophisticated experimental process embedded with rich non-biological variation. Statistically, the sequencing process involves sampling DNA fragments from an individual sequence. Adequate knowledge of the sampling variation involved in sequence data generation is a key statistical prerequisite for any downstream analysis of the data and for implementing statistically appropriate methods. This paper reports a thorough investigation of modeling the sampling variation of sequence data from optimized RAD-seq (restriction site-associated DNA sequencing) experiments with two parents and their offspring of diploid and autotetraploid potato (Solanum tuberosum L.). The analysis shows that the sampling variation of the sequence data is significantly overdispersed relative to the multinomial distribution widely assumed in the literature, and provides statistical methods for modeling the variation and calculating the model parameters, which may easily be implemented for real sequence datasets. The optimized design of the RAD-seq experiments enabled effective control of the presence of undesirable chloroplast DNA and RNA genes in the sequence data generated.
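To see what such overdispersion looks like in the simplest two-category case (parameters are illustrative, not the paper's fitted values), compare read counts at a heterozygous site under a plain binomial model, the two-category multinomial, against a beta-binomial model, the two-category Dirichlet-multinomial, with the same mean:

```python
# Overdispersion of read counts relative to the multinomial assumption.
import numpy as np

rng = np.random.default_rng(1)
depth, p, n_sites = 100, 0.5, 10_000

# Multinomial assumption: reference-allele count ~ Binomial(depth, p).
binom_counts = rng.binomial(depth, p, n_sites)

# Dirichlet-multinomial: p itself varies between reactions,
# here Beta(a, a) with a = 10 (smaller a -> stronger overdispersion).
a = 10.0
p_var = rng.beta(a, a, n_sites)
dirmult_counts = rng.binomial(depth, p_var)

print(f"binomial variance:      {binom_counts.var():.1f}")   # ~ depth*p*(1-p) = 25
print(f"beta-binomial variance: {dirmult_counts.var():.1f}") # several-fold larger
```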


2020 ◽  
Author(s):  
Benjamin J. Burgess ◽  
Drew Purves ◽  
Georgina Mace ◽  
David J. Murrell

Abstract Understanding and predicting how multiple co-occurring environmental stressors combine to affect biodiversity and ecosystem services is an ongoing grand challenge for ecology. So far, progress has been made by accumulating large numbers of smaller-scale individual studies that are then examined by meta-analyses for general patterns. In particular, there has been interest in checking for so-called ecological surprises, where stressors interact in a synergistic manner. Recent reviews suggest that such synergisms do not dominate, but few other generalities have emerged. This lack of general prediction and understanding may be due in part to a dearth of ecological theory that can generate clear hypotheses and predictions to be tested against empirical data. Here we close this gap by analysing food web models based upon classical ecological theory and comparing their predictions to a large (546 interactions) dataset for the effects of pairs of stressors on freshwater communities, using trophic- and population-level metrics of abundance, density, and biomass as responses. We find excellent overall agreement between the stochastic version of our models and the experimental data: both indicate that additive stressor interactions are the most frequent, even though meta-analyses report antagonism as the summary interaction class. Additionally, we show that the statistical tests used to classify the interactions are very sensitive to sampling variation. It is therefore likely that current weak sampling and low sample sizes are masking many non-additive stressor interactions, which our theory predicts to dominate when sampling variation is removed. This leads us to suspect that ecological surprises may be more common than currently reported. Our results highlight the value of developing theory in tandem with empirical tests, and the need to examine the robustness of statistical machinery, especially the widely used null models, before we can draw strong conclusions about how environmental drivers combine.
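The sensitivity described here can be reproduced with a toy version of the additive null-model test (effect sizes, noise level, and the crude z-test below are illustrative assumptions, not the paper's exact machinery): a truly synergistic stressor pair is frequently classified as additive once within-treatment sampling noise is added.

```python
# How often does sampling noise mask a true non-additive interaction?
import numpy as np

rng = np.random.default_rng(7)

def frac_detected(control, a, b, both, n=5, sd=1.0, n_rep=1000):
    """Fraction of resampled experiments classified as non-additive,
    given true treatment means and within-treatment noise sd with n
    replicates per treatment."""
    detected = 0
    for _ in range(n_rep):
        # Simulate the four observed treatment means.
        m = {k: rng.normal(v, sd / np.sqrt(n))
             for k, v in dict(c=control, a=a, b=b, ab=both).items()}
        interaction = m["ab"] - (m["a"] + m["b"] - m["c"])
        se = 2 * sd / np.sqrt(n)        # SE of the interaction estimate
        if abs(interaction) / se > 1.96:
            detected += 1
    return detected / n_rep

# True synergism (interaction = -2), yet detection is far from certain:
print(frac_detected(control=10, a=8, b=7, both=3))
```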


2020 ◽  
Vol 69 (6) ◽  
pp. 1200-1211
Author(s):  
Michael Grundler ◽  
Daniel L Rabosky

Abstract The evolutionary dynamics of complex ecological traits—including multistate representations of diet, habitat, and behavior—remain poorly understood. Reconstructing the tempo, mode, and historical sequence of transitions involving such traits poses many challenges for comparative biologists, owing to their multidimensional nature. Continuous-time Markov chains are commonly used to model ecological niche evolution on phylogenetic trees but are limited by the assumption that taxa are monomorphic and that states are univariate categorical variables. A necessary first step in the analysis of many complex traits is therefore to categorize species into a predetermined number of univariate ecological states, but this procedure can lead to distortion and loss of information. This approach also confounds interpretation of state assignments with effects of sampling variation because it does not directly incorporate empirical observations for individual species into the statistical inference model. In this study, we develop a Dirichlet-multinomial framework to model resource use evolution on phylogenetic trees. Our approach is expressly designed to model ecological traits that are multidimensional and to account for uncertainty in state assignments of terminal taxa arising from effects of sampling variation. The method uses multivariate count data across a set of discrete resource categories sampled for individual species to simultaneously infer the number of ecological states, the proportional utilization of different resources by different states, and the phylogenetic distribution of ecological states among living species and their ancestors. The method is general and may be applied to any data expressible as a set of observational counts from different categories. [Comparative methods; Dirichlet multinomial; ecological niche evolution; macroevolution; Markov model.]
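At the heart of such a framework is the Dirichlet-multinomial likelihood of a species' observed resource-use counts given the concentration parameters of a candidate ecological state. A minimal sketch, with notation and parameter values assumed here for illustration:

```python
# Dirichlet-multinomial log-likelihood of observed category counts.
from math import lgamma

def dirichlet_multinomial_loglik(x, alpha):
    """log P(x | alpha) for counts x over resource categories,
    where alpha are the state's Dirichlet concentration parameters."""
    n, a0 = sum(x), sum(alpha)
    ll = lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
    for xi, ai in zip(x, alpha):
        ll += lgamma(xi + ai) - lgamma(ai) - lgamma(xi + 1)
    return ll

# Diet counts of one species over four prey categories, scored against
# two hypothetical states (parameter values are illustrative only):
counts = [12, 3, 0, 1]
states = {"generalist": [2.0, 2.0, 2.0, 2.0],
          "specialist": [8.0, 1.0, 0.5, 0.5]}
for name, alpha in states.items():
    print(name, round(dirichlet_multinomial_loglik(counts, alpha), 2))
```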


Author(s):  
Daniel A Caroff ◽  
Rui Wang ◽  
Zilu Zhang ◽  
Robert Wolf ◽  
Ed Septimus ◽  
...  

Abstract Background The Centers for Medicare and Medicaid Services (CMS) use colon surgical site infection (SSI) rates to rank hospitals and apply financial penalties. The CMS’ risk-adjustment model omits potentially impactful variables that might disadvantage hospitals with complex surgical populations. Methods We analyzed adult patients who underwent colon surgery within facilities associated with HCA Healthcare from 2014 to 2016. SSIs were identified from National Health Safety Network (NHSN) reporting. We trained and validated 3 SSI prediction models, using (1) current CMS model variables, including hospital-specific random effects (HCA-adapted CMS model); (2) demographics and claims-based comorbidities (expanded-claims model); and (3) demographics, claims-based comorbidities, and NHSN variables (claims-plus–electronic health record [EHR] model). Discrimination, calibration, and resulting rankings were compared among all models and the current CMS model with published coefficient values. Results We identified 39 468 colon surgeries in 149 hospitals, resulting in 1216 (3.1%) SSIs. Compared to the HCA-adapted CMS model, the expanded-claims model had similar performance (c-statistic, 0.65 vs 0.67, respectively), while the claims-plus-EHR model was more accurate (c-statistic, 0.70; 95% confidence interval, .67–.73; P = .004). The sampling variation, due to the low surgical volume and small number of infections, contributed 74% of the total variation in observed SSI rates between hospitals. When CMS model rankings were compared to those from the expanded-claims and claims-plus-EHR models, 18 (15%) and 26 (22%) hospitals changed quartiles, respectively, and 10 (8.3%) and 12 (10%) hospitals changed into or out of the lowest-performing quartile, respectively. Conclusions An expanded set of variables improved colon SSI risk predictions and quartile assignments, but low procedure volumes and SSI events remain a barrier to effectively comparing hospitals.
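A back-of-envelope simulation (hospital volumes below are invented, not the study's) shows why sampling variation can dominate observed SSI rates at low procedure volumes: even if every hospital had an identical true risk, binomial noise alone would spread the observed rates widely.

```python
# Spread of observed SSI rates from binomial sampling alone.
import numpy as np

rng = np.random.default_rng(0)

true_rate = 0.031                              # overall SSI rate in the study
volumes = rng.integers(50, 600, 149)           # colon surgeries per hospital
observed = rng.binomial(volumes, true_rate) / volumes

print(f"true rate: {true_rate:.3f}")
print(f"observed rates span {observed.min():.3f} to {observed.max():.3f}")
print(f"between-hospital SD from sampling alone: {observed.std():.3f}")
```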


2019 ◽  
Author(s):  
Michael C. Grundler ◽  
Daniel L. Rabosky

Abstract The evolutionary dynamics of complex ecological traits – including multistate representations of diet, habitat, and behavior – remain poorly understood. Reconstructing the tempo, mode, and historical sequence of transitions involving such traits poses many challenges for comparative biologists, owing to their multidimensional nature and intraspecific variability. Continuous-time Markov chains (CTMC) are commonly used to model ecological niche evolution on phylogenetic trees but are limited by the assumption that taxa are monomorphic and that states are univariate categorical variables. Thus, a necessary first step when using standard CTMC models is to categorize species into a pre-determined number of ecological states. This approach potentially confounds interpretation of state assignments with effects of sampling variation because it does not directly incorporate empirical observations of resource use into the statistical inference model. The neglect of sampling variation, along with univariate representations of true multivariate phenotypes, potentially leads to the distortion and loss of information, with substantial implications for downstream macroevolutionary analyses. In this study, we develop a hidden Markov model using a Dirichlet-multinomial framework to model resource use evolution on phylogenetic trees. Unlike existing CTMC implementations, states are unobserved probability distributions from which observed data are sampled. Our approach is expressly designed to model ecological traits that are intra-specifically variable and to account for uncertainty in state assignments of terminal taxa arising from effects of sampling variation. The method uses multivariate count data for individual species to simultaneously infer the number of ecological states, the proportional utilization of different resources by different states, and the phylogenetic distribution of ecological states among living species and their ancestors. The method is general and may be applied to any data expressible as a set of observational counts from different categories.
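The generative idea that states are unobserved probability distributions from which observed data are sampled can be sketched directly (two hypothetical states and illustrative parameters): each species draws its own resource-use distribution from its latent state's Dirichlet, and the observed counts are a multinomial sample from that distribution, so two species in the same state can differ purely through sampling variation.

```python
# Generative sketch: latent ecological state -> species-level
# resource distribution -> observed counts.
import numpy as np

rng = np.random.default_rng(11)

# Two latent states, each a Dirichlet over three resource categories.
state_alphas = {0: np.array([6.0, 1.0, 1.0]),   # specialist on resource 1
                1: np.array([2.0, 2.0, 2.0])}   # generalist

def sample_species(state, n_obs):
    """Draw a species-specific resource distribution from its latent
    state, then draw the observed counts from that distribution."""
    p = rng.dirichlet(state_alphas[state])
    return rng.multinomial(n_obs, p)

# Same state, different observed counts, purely from sampling:
print(sample_species(0, 20), sample_species(0, 20))
```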


2018 ◽  
Vol 124 (1) ◽  
pp. 185-222 ◽  
Author(s):  
John R. Logan ◽  
Andrew Foster ◽  
Jun Ke ◽  
Fan Li


Author(s):  
Tamás Ferenci ◽  
Levente Kovács

Null hypothesis significance testing dominates current biostatistical practice. However, this routine has many flaws; in particular, p-values are very often misused and misinterpreted. Several solutions have been suggested to remedy this situation, the application of Bayes Factors being perhaps the best known. Nevertheless, even Bayes Factors are very seldom applied in medical research. This paper investigates the application of Bayes Factors in the analysis of a realistic medical problem using actual data from a representative US survey, and compares the results to those obtained with traditional means. Linear regression is used as an example, as it is one of the most basic tools in biostatistics. The effects of sample size and sampling variation are investigated (with resampling), as well as the impact of the choice of prior. Results show that there is a strong relationship between p-values and Bayes Factors, especially for large samples. In spite of this, the application of Bayes Factors should be encouraged, as the message they convey is much more instructive and scientifically correct than the current typical practice.
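One widely known way to obtain an approximate Bayes Factor for a regression coefficient, shown here on simulated data rather than the survey data used in the paper, is the BIC approximation BF10 ≈ exp((BIC0 - BIC1) / 2), comparing the models with and without the predictor:

```python
# Approximate Bayes Factor for a regression coefficient via BIC.
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y = 0.2 * x + rng.normal(size=n)              # weak true effect

def bic_linear(y, predictors):
    """BIC of an ordinary least-squares fit with Gaussian errors."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    k = X.shape[1] + 1                        # coefficients + error variance
    return len(y) * np.log(rss / len(y)) + k * np.log(len(y))

bic0 = bic_linear(y, [])                      # intercept-only model
bic1 = bic_linear(y, [x])                     # model with the predictor
bf10 = np.exp((bic0 - bic1) / 2)
print(f"approximate BF10 = {bf10:.2f}")
```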

