A flexible summary-based colocalization method with application to the mucin Cystic Fibrosis lung disease modifier locus

2021 ◽  
Author(s):  
Fan Wang ◽  
Naim Panjwani ◽  
Wang Cheng ◽  
Lei Sun ◽  
Lisa J Strug

Mucus obstruction is a central feature of the Cystic Fibrosis (CF) airways. A genome-wide association study (GWAS) of lung disease by the CF Gene Modifier Consortium (CFGMC) identified a significant locus containing two mucin genes, MUC20 and MUC4. Expression quantitative trait locus (eQTL) analysis of human nasal epithelial (HNE) cells from 94 Canadians with CF in the CFGMC revealed MUC4 eQTLs that mirrored the lung association pattern in the region, suggesting that MUC4 expression may mediate CF lung disease. Colocalization testing with existing methods was complicated, however: the locus is complex, and the associated SNPs span a 0.2 Mb region with high linkage disequilibrium (LD) and evidence of eQTLs for multiple genes and tissues (heterogeneity). We previously developed the Simple Sum (SS), a powerful colocalization test in regions with heterogeneity, but SS assumed eQTLs to be present in order to achieve type I error control. Here we propose a two-stage SS (SS2) colocalization test that avoids a priori eQTL assumptions, accounts for multiple hypothesis testing and the composite null hypothesis, and enables meta-analysis. We compare SS2 with published approaches through simulation and demonstrate type I error control in all settings, with the greatest power in the presence of high LD and heterogeneity. Applying SS2 to the MUC20/MUC4 CF lung disease locus with eQTLs from CF HNE revealed significant colocalization with MUC4 (p = 1.71×10⁻⁵) rather than MUC20. SS2 is a powerful method for identifying the responsible gene(s) at a locus and guiding future functional studies. SS2 has been implemented in the application LocusFocus (locusfocus.research.sickkids.ca).

Biometrika ◽  
2019 ◽  
Vol 106 (3) ◽  
pp. 651-651
Author(s):  
Yang Liu ◽  
Wei Sun ◽  
Alexander P Reiner ◽  
Charles Kooperberg ◽  
Qianchuan He

Genetic pathway analysis has become an important tool for investigating the association between a group of genetic variants and a trait. With dense genotyping and extensive imputation, the number of genetic variants in biological pathways has increased considerably and sometimes exceeds the sample size $n$. Conducting genetic pathway analysis and statistical inference in such settings is challenging. We introduce an approach that can handle pathways whose dimension $p$ may be greater than $n$. Our method can detect pathways with nonsparse weak signals as well as pathways with sparse but stronger signals. We establish the asymptotic distribution of the proposed statistic and analyze its power theoretically. Simulation studies show that our test has correct Type I error control and is more powerful than existing approaches. An application to a genome-wide association study of high-density lipoproteins illustrates the proposed approach.


Genes ◽  
2021 ◽  
Vol 12 (3) ◽  
pp. 441
Author(s):  
Fanny Pineau ◽  
Davide Caimmi ◽  
Sylvie Taviaux ◽  
Maurane Reveil ◽  
Laura Brosseau ◽  
...  

Cystic fibrosis (CF) is a chronic genetic disease that mainly affects the respiratory and gastrointestinal systems. No curative treatments are available, but follow-up in specialized centers has greatly improved patients' life expectancy. Robust biomarkers are required to monitor the disease, guide treatments, stratify patients, and provide outcome measures in clinical trials. In the present study, we outline a strategy to select putative DNA methylation biomarkers of lung disease severity in patients with cystic fibrosis. In the discovery step, we selected seven potential biomarkers using a genome-wide DNA methylation dataset that we generated from nasal epithelial samples of the MethylCF cohort. In the replication step, we assessed the same biomarkers in sputum cell samples from the MethylBiomark cohort. Notably, DNA methylation at the cg11702988 site (ATP11A gene) correlated positively with lung function and BMI, and negatively with lung disease severity, chronic P. aeruginosa infection, and the number of exacerbations. These results were replicated in prospective sputum samples collected at four time points over an 18-month period, and in a longitudinal analysis. To conclude, (i) we identified a DNA methylation biomarker that correlates with CF severity, (ii) we provided a method to easily assess this biomarker, and (iii) we carried out the first longitudinal analysis of DNA methylation in CF patients. This new epigenetic biomarker could be used to stratify CF patients in clinical trials.


1979 ◽  
Vol 4 (1) ◽  
pp. 14-23 ◽  
Author(s):  
Juliet Popper Shaffer

If used only when a preliminary F test yields significance, the usual multiple range procedures can be modified to increase the probability of detecting differences without changing the control of Type I error. The modification consists of a reduction in the critical value when comparing the largest and smallest means. Equivalence of modified and unmodified procedures in error control is demonstrated. The modified procedure is also compared with the alternative of using the unmodified range test without a preliminary F test, and it is shown that each has advantages over the other under some circumstances.


2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Guogen Shan ◽  
Amei Amei ◽  
Daniel Young

Sensitivity and specificity are often used to assess the performance of a diagnostic test with binary outcomes. Wald-type test statistics have been proposed for testing sensitivity and specificity individually. In the presence of a gold standard, the simultaneous comparison of two diagnostic tests for noninferiority of sensitivity and specificity using an asymptotic approach was studied by Chen et al. (2003). However, as many studies have observed, the asymptotic approach may have unsatisfactory type I error control, especially in small- to medium-sized samples. In this paper, we compare three unconditional approaches for simultaneously testing sensitivity and specificity, based respectively on estimation, maximization, and a combination of estimation and maximization. Although the estimation approach does not guarantee type I error control, it performs satisfactorily in that respect. The other two unconditional approaches are exact. The approach based on estimation and maximization is generally more powerful than the approach based on maximization alone.
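To make these quantities concrete, here is a minimal Python sketch, not the paper's method, that computes sensitivity and specificity from one test's 2x2 counts against a gold standard and forms a simple one-sample Wald-type z statistic for H0: sensitivity ≤ s0. The class and function names, the counts, and the threshold s0 are hypothetical. Chen et al.'s paired, simultaneous comparison of two tests and the exact unconditional approaches compared in the paper are not implemented here.

```python
import math
from dataclasses import dataclass


@dataclass
class DiagnosticCounts:
    """2x2 counts for one binary test scored against a gold standard (hypothetical layout)."""
    true_pos: int   # test positive, disease present
    false_neg: int  # test negative, disease present
    true_neg: int   # test negative, disease absent
    false_pos: int  # test positive, disease absent


def sensitivity(c):
    """Proportion of diseased subjects that the test calls positive."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c):
    """Proportion of non-diseased subjects that the test calls negative."""
    return c.true_neg / (c.true_neg + c.false_pos)


def wald_z_sensitivity(c, s0):
    """Wald-type z for H0: sensitivity <= s0, using the estimated standard error."""
    n_diseased = c.true_pos + c.false_neg
    s_hat = sensitivity(c)
    se = math.sqrt(s_hat * (1.0 - s_hat) / n_diseased)
    if se == 0.0:
        # An estimate of exactly 0 or 1 makes the normal approximation break down entirely.
        raise ValueError("Wald standard error is zero")
    return (s_hat - s0) / se


counts = DiagnosticCounts(true_pos=45, false_neg=5, true_neg=90, false_pos=10)
print(sensitivity(counts), specificity(counts))  # 0.9 0.9
print(round(wald_z_sensitivity(counts, s0=0.8), 3))  # 2.357
```

Because this statistic relies on a normal approximation with an estimated standard error, it is the kind of asymptotic test whose type I error control can degrade in small samples.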


Trials ◽  
2015 ◽  
Vol 16 (S2) ◽  
Author(s):  
Deepak Parashar ◽  
Jack Bowden ◽  
Colin Starr ◽  
Lorenz Wernisch ◽  
Adrian Mander

Author(s):  
Aaron T. L. Lun ◽  
Gordon K. Smyth

RNA sequencing (RNA-seq) is widely used to study gene expression changes associated with treatments or biological conditions. Many popular methods for detecting differential expression (DE) from RNA-seq data use generalized linear models (GLMs) fitted to the read counts across independent replicate samples for each gene. This article shows that the standard formula for the residual degrees of freedom (d.f.) of a linear model overstates the d.f. when the model contains fitted values that are exactly zero. Such fitted values occur whenever all the counts in a treatment group are zero, as well as in more complex models such as those involving paired comparisons. This overstatement results in underestimation of the genewise variances and loss of type I error control. This article proposes a formula for the reduced residual d.f. that restores error control in simulated RNA-seq data and improves the detection of DE genes in a real data analysis. The new approach is implemented in the quasi-likelihood framework of the edgeR software package. The results also apply to RNA-seq analyses that fit linear models to log-transformed counts, such as those in the limma software package, and more generally to any count-based GLM in which exactly zero fitted values are possible.
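A simplified illustration, assuming a one-way layout (a single grouping factor, no other covariates): with a log-link GLM, a group's fitted value is zero exactly when every count in that group is zero, so those observations are fitted perfectly and should not contribute residual d.f. The sketch compares the naive n minus the number of groups with the count obtained after dropping such groups. It is a hypothetical toy with made-up data, not edgeR's implementation, which handles general designs such as paired comparisons.

```python
from collections import defaultdict


def naive_residual_df(groups):
    """Standard residual d.f. for a one-way layout: n minus the number of groups."""
    return len(groups) - len(set(groups))


def adjusted_residual_df(counts, groups):
    """Residual d.f. after discarding groups whose fitted values are exactly zero.

    In a one-way log-link GLM, a group's fitted value is zero exactly when
    every count in the group is zero. Such observations are fitted perfectly
    and carry no information about the variance.
    """
    totals = defaultdict(int)
    sizes = defaultdict(int)
    for y, g in zip(counts, groups):
        totals[g] += y
        sizes[g] += 1
    # Each group with at least one non-zero count contributes n_g - 1 d.f.
    return sum(sizes[g] - 1 for g in sizes if totals[g] > 0)


counts = [0, 0, 0, 5, 7, 3, 2, 9, 4]
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
print(naive_residual_df(groups), adjusted_residual_df(counts, groups))  # 6 4
```

Here group A contributes nothing to variance estimation, so counting its two d.f. would make the genewise variance estimate too small.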


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. 4568-4568 ◽  
Author(s):  
Jean-Christophe Pignon ◽  
Opeyemi Jegede ◽  
Sachet A Shukla ◽  
David A. Braun ◽  
Christine Horak ◽  
...  

4568 Background: hERV levels correlate positively with tumor immune infiltrate and were recently shown to be associated with clinical benefit from PD-1/PD-L1 blockade in two small cohorts of patients (pts) with mccRCC (Smith C.C. et al. and Panda A. et al.; 2018). We tested whether hERV levels correlate with the efficacy of nivolumab in a prospective phase II study of pts with mccRCC (CheckMate 010). Methods: Reverse-transcribed RNA extracted from 99 FFPE pretreatment tumors was analyzed by RT-qPCR to assess levels of pan-ERVE4, pan-ERV3.2, hERV4700 GAG or ENV, and the reference genes 18S and HPRT1. Normalized hERV levels were dichotomized (high or low) using population quartiles as cutoffs. For each cutoff, samples with non-quantifiable hERV levels whose limit of quantification was above the tested cutoff could not be categorized and were excluded from analysis. The log-rank test was used to test the association of hERV levels with PFS/irPFS (RECIST v1.1/irRECIST) at each cutoff, with Holm-Bonferroni correction for Type I error control; adjusted P-values are reported. Fisher's exact test was then used to explore the association with ORR/irORR (RECIST v1.1/irRECIST). Results: Among the hERVs studied, only hERV4700 ENV was significantly associated with PFS/irPFS. At the 25th percentile cutoff, 45 pts had high levels and 24 pts had low levels of hERV4700 ENV. Median PFS and irPFS were significantly longer in the high-hERV4700 ENV group [7.0 (95% CI: 2.2-10.2) and 8.5 (95% CI: 4.2-14.1) months, respectively] than in the low-hERV4700 ENV group [2.6 (95% CI: 1.4-5.4) and 2.9 (95% CI: 1.4-5.7) months, respectively] (P = 0.010 for PFS and P = 0.028 for irPFS).
At the same cutoff, ORR and irORR were significantly higher in the high-hERV4700 ENV group [35.6% (95% CI: 21.9-51.2%) for both ORR and irORR] than in the low-hERV4700 ENV group [12.5% (95% CI: 2.7-32.4%) and 8.3% (95% CI: 1.0-27.0%), respectively] (P = 0.036 for ORR and P = 0.012 for irORR). Conclusions: hERV4700 ENV levels may predict outcome on nivolumab in mccRCC. Validation of these results and correlation of hERV levels with immune markers in a controlled phase III trial (CheckMate 025) are ongoing.
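The Holm-Bonferroni step-down adjustment applied across cutoffs in the Methods can be sketched as follows. The p-values in the example are made up for illustration and are not from the study; in practice a library routine (for instance, statsmodels' multipletests with method="holm") would typically be used.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is multiplied by (m - rank), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: an adjusted p-value may not be smaller than
        # that of any hypothesis with a smaller raw p-value.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values, one per cutoff
print([round(p, 4) for p in holm_adjust(raw)])  # [0.03, 0.06, 0.06, 0.02]
```

Holm controls the familywise error rate under the same conditions as plain Bonferroni but is uniformly at least as powerful, which is why it is a common default for a small family of tests such as several cutoffs.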


2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Alyssa Counsell ◽  
Robert Philip Chalmers ◽  
Robert A. Cribbie

Comparing the means of independent groups is problematic when the assumptions of normality and variance homogeneity are violated. Robust means modeling (RMM) was proposed as an alternative to ANOVA-type procedures under such violations. The purpose of this study is to compare the Type I error and power rates of RMM with those of the trimmed Welch procedure. A Monte Carlo study was used to investigate both methods under several conditions of nonnormality and variance heterogeneity. The results suggest that the trimmed Welch procedure provides a better balance of Type I error control and power than RMM.
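The trimmed Welch procedure is commonly identified with Yuen's test, which compares trimmed means using winsorized variances and Welch-style degrees of freedom. The sketch below assumes that formulation with 20% trimming per tail; the function names and data are hypothetical, and only the statistic and d.f. are computed (a p-value would come from a t distribution with that d.f.). Recent SciPy versions expose the same test through scipy.stats.ttest_ind(x, y, equal_var=False, trim=0.2), which could serve as a cross-check.

```python
import math


def trimmed_stats(x, trim=0.2):
    """Return (trimmed mean, Yuen's squared-SE term d, effective size h) for one group."""
    xs = sorted(x)
    n = len(xs)
    g = math.floor(trim * n)  # number of observations trimmed from each tail
    h = n - 2 * g             # observations left after trimming
    if h < 2:
        raise ValueError("too few observations remain after trimming")
    kept = xs[g:n - g]
    trimmed_mean = sum(kept) / h
    # Winsorize: pull the g smallest and g largest values in to the nearest kept value.
    winsorized = [kept[0]] * g + kept + [kept[-1]] * g
    w_mean = sum(winsorized) / n
    s2_w = sum((v - w_mean) ** 2 for v in winsorized) / (n - 1)  # winsorized variance
    d = (n - 1) * s2_w / (h * (h - 1))
    return trimmed_mean, d, h


def yuen_welch(x, y, trim=0.2):
    """Yuen's trimmed-means t statistic with Welch-type approximate d.f."""
    tx, dx, hx = trimmed_stats(x, trim)
    ty, dy, hy = trimmed_stats(y, trim)
    t = (tx - ty) / math.sqrt(dx + dy)
    df = (dx + dy) ** 2 / (dx ** 2 / (hx - 1) + dy ** 2 / (hy - 1))
    return t, df


a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [v + 1 for v in a]
t_stat, df = yuen_welch(a, b)
print(round(t_stat, 3), round(df, 3))  # -0.594 10.0
```

Separate d terms per group are what make the procedure Welch-like: no common variance is pooled, so unequal variances do not distort the statistic the way they do for the pooled t-test.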


2021 ◽  
pp. 096228022110336
Author(s):  
Chi Chang ◽  
Thomas Jaki ◽  
Muhammad Saad Sadiq ◽  
Alena Kuhlemeier ◽  
Daniel Feaster ◽  
...  

An important goal of personalized medicine is to identify heterogeneity in treatment effects and then use that heterogeneity to target the intervention to those most likely to benefit. Heterogeneity is assessed using the predicted individual treatment effects framework, and a permutation test is proposed to establish whether significant heterogeneity is present given the covariates and the predictive model or algorithm used for predicted individual treatment effects. We first show evidence for heterogeneity in the effects of treatment in an illustrative example data set. We then use simulations with two different predictive methods (a linear regression model and random forests) to show that the permutation test has adequate type I error control. Next, we use an example data set as the basis for simulations demonstrating the ability of the permutation test to find heterogeneity in treatment effects for a predicted individual treatment effects estimate as a function of both effect size and sample size. We find that the proposed test has good power for detecting heterogeneity in treatment effects when the heterogeneity is due primarily to a single predictor or is spread across the predictors. Power was greater for predictions from a linear model than from random forests. This nonparametric permutation test can be used to test for significant differences across individuals in predicted individual treatment effects obtained with a given set of covariates, using any predictive method and no additional assumptions.


2006 ◽  
Vol 3 (1) ◽  
Author(s):  
Sharipah Syed Yahaya ◽  
Abdul Othman ◽  
Harvey Keselman

Nonnormality and variance heterogeneity affect the validity of the traditional tests for treatment group equality (e.g. the ANOVA F-test and t-test), particularly when group sizes are unequal. Using trimmed means instead of the usual least squares estimator has been shown to be largely effective in combating the deleterious effects of nonnormality. There are, however, practical concerns regarding trimmed means, such as the predetermined amount of symmetric trimming that is typically used. Wilcox and Keselman proposed the Modified One-Step M-estimator (MOM), which determines the amount of trimming empirically. Othman et al. found that when this estimator is used with Schrader and Hettmansperger's H statistic, rates of Type I error are well controlled even when data are nonnormal. In this paper, we modified the criterion for choosing the sample values for MOM by replacing the default scale estimator, MADn, with two robust scale estimators, Sn and Tn, suggested by Rousseeuw and Croux (1993). To study the robustness of the modified methods, we manipulated conditions known to negatively affect rates of Type I error. In addition, a bootstrap method was used to approximate the sampling distribution, since the null distribution of MOM-H is intractable. These modified methods resulted in better Type I error control, especially when data were extremely skewed.
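The MOM idea can be sketched in a few lines. This assumes the formulation usually attributed to Wilcox and Keselman (discard observations more than 2.24 rescaled MADs from the median, then average the rest) and makes the scale estimator pluggable, so MADn can be swapped for Rousseeuw and Croux's Sn, computed here naively in O(n²) without their small-sample correction factors. Tn, the H statistic, and the bootstrap are not implemented; all function names and the example data are illustrative.

```python
import statistics


def madn(x):
    """MAD rescaled by 0.6745 so that it estimates sigma for normal data."""
    m = statistics.median(x)
    return statistics.median([abs(v - m) for v in x]) / 0.6745


def sn_naive(x):
    """Rousseeuw-Croux Sn = 1.1926 * lomed_i himed_j |x_i - x_j| (O(n^2), no small-sample factor)."""
    inner = [statistics.median_high([abs(xi - xj) for xj in x]) for xi in x]
    return 1.1926 * statistics.median_low(inner)


def mom(x, scale=madn, k=2.24):
    """Modified one-step M-estimator: drop points far from the median, average the rest."""
    m = statistics.median(x)
    cutoff = k * scale(x)
    kept = [v for v in x if abs(v - m) <= cutoff]
    # Fallback (our own choice, not from the source): if a zero scale leaves nothing, report the median.
    if not kept:
        return m
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 5, 100]
print(mom(data), mom(data, scale=sn_naive))  # 3.0 3.0
```

Because the trimming is decided by the data, the outlier at 100 is removed while the five central values are all kept, unlike a fixed 20% symmetric trim, which would also discard the value 1.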

