Protein Domain Hotspots Reveal Functional Mutations across Genes in Cancer

In cancer genomics, frequent recurrence of mutations in independent tumor samples is a strong indication of functional impact. However, rare functional mutations can escape detection by recurrence analysis for lack of statistical power. We address this problem by extending the notion of recurrence of mutations from single genes to gene families that share homologous protein domains. In addition to lowering the threshold of detection, this sharpens the functional interpretation of the impact of mutations, as protein domains more succinctly embody function than entire genes. Mapping mutations in 22 different tumor types to equivalent positions in multiple sequence alignments of protein domains, we confirm well-known functional mutation hotspots and make two types of discoveries: 1) identification and functional interpretation of uncharacterized rare variants in one gene that are equivalent to well-characterized mutations in canonical cancer genes, such as uncharacterizedERBB4(S303F) mutations that are analogous to canonicalERRB2(S310F) mutations in the furin-like domain, and 2) detection of previously unknown mutation hotspots with novel functional implications. With the rapid expansion of cancer genomics projects, protein domain hotspot analysis is likely to provide many more leads linking mutations in proteins to the cancer phenotype.

Download Full-text

Exploring power in response inhibition tasks using the bootstrap: The impact of number of participants, number of trials, effect magnitude, and study design

10.31234/osf.io/eb4jd ◽

2019 ◽

Cited By ~ 1

Author(s):

Curtis David Von Gunten ◽

Bruce D Bartholow

Keyword(s):

Study Design ◽

Effect Size ◽

Statistical Power ◽

Bootstrap Sampling ◽

Reliable Measure ◽

Internal Reliability ◽

The Impact ◽

Power Analyses ◽

Effect Magnitude

A primary psychometric concern with laboratory-based inhibition tasks has been their reliability. However, a reliable measure may not be necessary or sufficient for reliably detecting effects (statistical power). The current study used a bootstrap sampling approach to systematically examine how the number of participants, the number of trials, the magnitude of an effect, and study design (between- vs. within-subject) jointly contribute to power in five commonly used inhibition tasks. The results demonstrate the shortcomings of relying solely on measurement reliability when determining the number of trials to use in an inhibition task: high internal reliability can be accompanied with low power and low reliability can be accompanied with high power. For instance, adding additional trials once sufficient reliability has been reached can result in large gains in power. The dissociation between reliability and power was particularly apparent in between-subject designs where the number of participants contributed greatly to power but little to reliability, and where the number of trials contributed greatly to reliability but only modestly (depending on the task) to power. For between-subject designs, the probability of detecting small-to-medium-sized effects with 150 participants (total) was generally less than 55%. However, effect size was positively associated with number of trials. Thus, researchers have some control over effect size and this needs to be considered when conducting power analyses using analytic methods that take such effect sizes as an argument. Results are discussed in the context of recent claims regarding the role of inhibition tasks in experimental and individual difference designs.

Download Full-text

Ranking cancer drivers via betweenness-based outlier detection and random walks

BMC Bioinformatics ◽

10.1186/s12859-021-03989-w ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Cesim Erten ◽

Aissa Houdjedj ◽

Hilal Kazan

Keyword(s):

Cancer Genomics ◽

Interaction Network ◽

Molecular Data ◽

Alternative Methods ◽

Patient Specific ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Protein Protein Interaction ◽

Genomic Studies

Abstract Background Recent cancer genomic studies have generated detailed molecular data on a large number of cancer patients. A key remaining problem in cancer genomics is the identification of driver genes. Results We propose BetweenNet, a computational approach that integrates genomic data with a protein-protein interaction network to identify cancer driver genes. BetweenNet utilizes a measure based on betweenness centrality on patient specific networks to identify the so-called outlier genes that correspond to dysregulated genes for each patient. Setting up the relationship between the mutated genes and the outliers through a bipartite graph, it employs a random-walk process on the graph, which provides the final prioritization of the mutated genes. We compare BetweenNet against state-of-the art cancer gene prioritization methods on lung, breast, and pan-cancer datasets. Conclusions Our evaluations show that BetweenNet is better at recovering known cancer genes based on multiple reference databases. Additionally, we show that the GO terms and the reference pathways enriched in BetweenNet ranked genes and those that are enriched in known cancer genes overlap significantly when compared to the overlaps achieved by the rankings of the alternative methods.

Download Full-text

The Impact of Covariates on Statistical Power in Cluster Randomized Designs: Which Level Matters More?

Multivariate Behavioral Research ◽

10.1080/00273171.2012.673898 ◽

2012 ◽

Vol 47 (3) ◽

pp. 392-420 ◽

Cited By ~ 7

Author(s):

Spyros Konstantopoulos

Keyword(s):

Statistical Power ◽

Cluster Randomized ◽

The Impact

Download Full-text

Using a GTR+Γ substitution model for dating sequence divergence when stationarity and time-reversibility assumptions are violated

Bioinformatics ◽

10.1093/bioinformatics/btaa820 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i884-i894

Author(s):

Jose Barba-Montoya ◽

Qiqing Tao ◽

Sudhir Kumar

Keyword(s):

Divergence Time ◽

Sequence Divergence ◽

Molecular Dating ◽

Divergence Times ◽

Time Reversibility ◽

Sequence Alignments ◽

Divergence Time Estimates ◽

Time Estimates ◽

Substitution Process ◽

The Impact

Abstract Motivation As the number and diversity of species and genes grow in contemporary datasets, two common assumptions made in all molecular dating methods, namely the time-reversibility and stationarity of the substitution process, become untenable. No software tools for molecular dating allow researchers to relax these two assumptions in their data analyses. Frequently the same General Time Reversible (GTR) model across lineages along with a gamma (+Γ) distributed rates across sites is used in relaxed clock analyses, which assumes time-reversibility and stationarity of the substitution process. Many reports have quantified the impact of violations of these underlying assumptions on molecular phylogeny, but none have systematically analyzed their impact on divergence time estimates. Results We quantified the bias on time estimates that resulted from using the GTR + Γ model for the analysis of computer-simulated nucleotide sequence alignments that were evolved with non-stationary (NS) and non-reversible (NR) substitution models. We tested Bayesian and RelTime approaches that do not require a molecular clock for estimating divergence times. Divergence times obtained using a GTR + Γ model differed only slightly (∼3% on average) from the expected times for NR datasets, but the difference was larger for NS datasets (∼10% on average). The use of only a few calibrations reduced these biases considerably (∼5%). Confidence and credibility intervals from GTR + Γ analysis usually contained correct times. Therefore, the bias introduced by the use of the GTR + Γ model to analyze datasets, in which the time-reversibility and stationarity assumptions are violated, is likely not large and can be reduced by applying multiple calibrations. Availability and implementation All datasets are deposited in Figshare: https://doi.org/10.6084/m9.figshare.12594638.

Download Full-text

Assessing the evidence on case management

The British Journal of Psychiatry ◽

10.1192/bjp.181.1.17 ◽

2002 ◽

Vol 181 (1) ◽

pp. 17-21 ◽

Cited By ~ 22

Author(s):

S. J. Ziguras ◽

G. W. Stuart ◽

A. C. Jackson

Keyword(s):

Mental Health ◽

Case Management ◽

Statistical Power ◽

Cochrane Collaboration ◽

Randomised Trials ◽

Inclusion Criteria ◽

Management Effectiveness ◽

Trade Off ◽

The Cochrane Collaboration ◽

The Impact

BackgroundEvidence on the impact of case management is contradictory.AimsTo discuss two different systematic reviews (one conducted by the authors and one conducted through the Cochrane collaboration) that came to contradictory conclusions about the impact of case management in mental health services.MethodWe summarised the findings of the two reviews with respect to case management effectiveness, examined key methodological differences between the two approaches and discuss the impact of these on the validity of the results.ResultsThe differences in conclusions between the two reviews result from the differences in inclusion criteria, namely non-randomised trials, data from unpublished scales and data from variables with skewed distributions. The theoretical and empirical effects of these are discussed.ConclusionsSystematic reviewers may face a trade-off between the application of strict criteria for the inclusion of studies and the amount of data available for analysis and hence statistical power. The available research suggests that case management is generally effective.

Download Full-text

Nipple Confusion

PEDIATRICS ◽

10.1542/peds.92.2.300 ◽

1993 ◽

Vol 92 (2) ◽

pp. 300-301

Author(s):

DOREN FREDRICKSON

Keyword(s):

Breast Feeding ◽

Statistical Power ◽

White Women ◽

Small Sample Size ◽

False Negative ◽

Small Sample ◽

Bottle Feeding ◽

Feeding Duration ◽

Breast Feed ◽

The Impact

To the Editor.— I wish to comment on the study reported by Cronenwett et al,1 which was a fascinating prospective study among married white women who planned to breast-feed. Women were randomly selected to perform either exdusive breast-feeding or partial breast-feeding with bottled human milk supplements to determine the impact of infant temperament and limited bottle-feeding on breast-feeding duration. The authors admit that small sample size and lack of statistical power make a false-negative possible.

Download Full-text

The Impact of Variants at Branchpoint Splicing Elements in Cancer Genes

SSRN Electronic Journal ◽

10.2139/ssrn.3933049 ◽

2021 ◽

Author(s):

Daffodil Canson ◽

Troy Dumenil ◽

Michael Parsons ◽

Tracy O’Mara ◽

Aimee Davidson ◽

...

Keyword(s):

Cancer Genes ◽

The Impact

Download Full-text

Model Averaging with AIC Weights for Hypothesis Testing of Hormesis at Low Doses

Dose-Response ◽

10.1177/1559325817715314 ◽

2017 ◽

Vol 15 (2) ◽

pp. 155932581771531

Author(s):

Steven B. Kim ◽

Nathan Sanders

Keyword(s):

Hypothesis Testing ◽

Dose Response ◽

Statistical Power ◽

Parametric Model ◽

Semiparametric Models ◽

Parametric Models ◽

Large Sample Size ◽

Low Doses ◽

P Values ◽

The Impact

For many dose–response studies, large samples are not available. Particularly, when the outcome of interest is binary rather than continuous, a large sample size is required to provide evidence for hormesis at low doses. In a small or moderate sample, we can gain statistical power by the use of a parametric model. It is an efficient approach when it is correctly specified, but it can be misleading otherwise. This research is motivated by the fact that data points at high experimental doses have too much contribution in the hypothesis testing when a parametric model is misspecified. In dose–response analyses, to account for model uncertainty and to reduce the impact of model misspecification, averaging multiple models have been widely discussed in the literature. In this article, we propose to average semiparametric models when we test for hormesis at low doses. We show the different characteristics of averaging parametric models and averaging semiparametric models by simulation. We apply the proposed method to real data, and we show that P values from averaged semiparametric models are more credible than P values from averaged parametric methods. When the true dose–response relationship does not follow a parametric assumption, the proposed method can be an alternative robust approach.

Download Full-text

New approaches to increase statistical power in TBI trials: Insights from the IMPACT study

Reconstructive Neurosurgery - Acta Neurochirurgica Supplementum ◽

10.1007/978-3-211-78205-7_20 ◽

2009 ◽

pp. 119-124 ◽

Cited By ~ 9

Author(s):

Andrew I. R. Maas ◽

H. F. Lingsma ◽

Keyword(s):

Statistical Power ◽

Impact Study ◽

New Approaches ◽

The Impact

Download Full-text

A Statistical Framework for Evolutionary Analysis of Recurrent Somatic Mutations in Cancers

10.1101/2020.04.10.036095 ◽

2020 ◽

Author(s):

Xun Gu

Keyword(s):

Somatic Mutations ◽

Cancer Genomics ◽

Computational Procedure ◽

The Cancer Genome Atlas ◽

Component Model ◽

Evolutionary Analysis ◽

Cancer Genes ◽

Component Mixture ◽

Two Component ◽

Empirical Bayesian Method

AbstractCurrent cancer genomics databases have accumulated millions of somatic mutations that remain to be further explored, faciltating enormous high throuput analyses to explore the underlying mechanisms that may contribute to malignant initiation or progression. In the context of over-dominant passenger mutations (unrelated to cancers), the challenge is to identify somatic mutations that are cancer-driving. Under the notion that carcinogenesis is a form of somatic-cell evolution, we developed a two-component mixture model that enables to accomplish the following analyses. (i) We formulated a quasi-likelihood approach to test whether the two-component model is significantly better than a single-component model, which can be used for new cancer gene predicting. (ii) We implemented an empirical Bayesian method to calculate the posterior probabilities of a site to be cancer-driving for all sites of a gene, which can be used for new driving site predicting. (iii) We developed a computational procedure to calculate the somatic selection intensity at driver sites and passenger sites, respectively, as well as site-specific profiles for all sites. Using these newly-developed methods, we comprehensively analyzed 294 known cancer genes based on The Cancer Genome Atlas (TCGA) database.

Download Full-text