A selective inference approach for false discovery rate control using multiomics covariates yields insights into disease risk

Ronald Yurko; Max G’Sell; Kathryn Roeder; Bernie Devlin

doi:10.1073/pnas.1918862117


Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1918862117 ◽

2020 ◽

Vol 117 (26) ◽

pp. 15028-15035 ◽

Cited By ~ 1

Author(s):

Ronald Yurko ◽

Max G’Sell ◽

Kathryn Roeder ◽

Bernie Devlin

Keyword(s):

Gene Expression ◽

Multiple Testing ◽

Disease Risk ◽

Association Studies ◽

Auxiliary Information ◽

Primary Data ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Hypothesis Tests ◽

Selective Inference

To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new methodologies of selective inference could potentially improve power while retaining statistical guarantees, especially those that enable exploration of test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, adaptive P-value thresholding (AdaPT), in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association P values play the role of the primary data for AdaPT; single-nucleotide polymorphisms (SNPs) are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allows us to map the following covariate values to these pairs: GWAS statistics from genetically correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene–gene coexpression, captured by subnetwork (module) membership. In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations using gene expression information from the developing human prefrontal cortex. We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.
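
For context, the "simple multiple testing corrections" that this abstract contrasts with AdaPT are often procedures such as Benjamini-Hochberg (BH), which controls the false discovery rate without using covariates. The following is a minimal sketch of BH only, not of AdaPT and not of the authors' code; the function name `benjamini_hochberg` and the default level `alpha=0.05` are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Illustrative baseline only: this is the classic BH step-up procedure,
    which ignores covariates (unlike AdaPT).
    """
    m = len(p_values)
    if m == 0:
        return []

    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k with p_(k) <= k * alpha / m.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            max_k = rank

    # Reject the hypotheses holding the max_k smallest p-values.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values, purely for demonstration.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example))
```

Because BH applies one threshold rule to every test, it cannot favor hypotheses that auxiliary data mark as more promising; that limitation is what covariate-aware methods such as AdaPT aim to address.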


A selective inference approach for FDR control using multi-omics covariates yields insights into disease risk

10.1101/806471 ◽

2019 ◽

Author(s):

Ronald Yurko ◽

Max G’Sell ◽

Kathryn Roeder ◽

Bernie Devlin

Keyword(s):

Gene Expression ◽

Multiple Testing ◽

Disease Risk ◽

Association Studies ◽

Auxiliary Information ◽

Primary Data ◽

P Value ◽

Genome Wide Association Studies ◽

Hypothesis Tests ◽

Selective Inference

Abstract: To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new methodologies of selective inference could potentially improve power while retaining statistical guarantees, especially those that enable exploration of test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, adaptive p-value thresholding (Lei & Fithian 2018, AdaPT), in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association p-values play the role of the primary data for AdaPT; SNPs are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allows us to map the following covariate values to these pairs: GWAS statistics from genetically correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene-gene coexpression, captured by subnetwork (module) membership. In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations using gene expression information from the developing human prefrontal cortex (Werling et al. 2019). We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.


Abstract 288: Novel Mechanisms of Non-coding Genomic Regulation Identified in Cardiac Disease-in-a-dish Models

Circulation Research ◽

10.1161/res.119.suppl_1.288 ◽

2016 ◽

Vol 119 (suppl_1) ◽

Author(s):

Aditya Kumar ◽

Stephanie Thomas ◽

Kirsten Wong ◽

Kevin Tenerelli ◽

Valentina Lo Sardo ◽

...

Keyword(s):

Heart Attack ◽

Connexin 43 ◽

Disease Risk ◽

Association Studies ◽

Correlation Coefficients ◽

Induced Pluripotent Stem Cell ◽

Genome Wide Association Studies ◽

Isogenic Line ◽

Nucleotide Polymorphisms ◽

Increased Risk

Genome-wide association studies have identified single nucleotide polymorphisms (SNPs) at gene loci that affect cardiovascular function, and while mechanisms in protein-coding loci are obvious, those in non-coding loci are difficult to determine. 9p21 is a recently identified locus associated with increased risk of coronary artery disease (CAD) and myocardial infarction. Associations have implicated SNPs in altering smooth muscle and endothelial cell properties but have not identified adverse effects in cardiomyocytes (CMs) despite enhanced disease risk. Using induced pluripotent stem cell-derived CMs from patients that are homozygous risk/risk (R/R) and non-risk/non-risk (N/N) for 9p21 SNPs and either CAD positive or negative, we assessed CM function when cultured on hydrogels capable of mimicking the fibrotic stiffening associated with disease post-heart attack, i.e. “heart attack-in-a-dish” stiffening from 11 kiloPascals (kPa) to 50 kPa. While all CMs independent of genotype and disease beat synchronously on soft matrices, R/R CMs cultured on dynamically stiffened hydrogels exhibited asynchronous contractions and had significantly lower correlation coefficients versus N/N CMs in the same conditions. Dynamic stiffening reduced connexin 43 expression and gap junction assembly in R/R CMs but not N/N CMs. To eliminate patient-to-patient variability, we created an isogenic line by deleting the 9p21 gene locus from a R/R patient using TALEN-mediated gene editing, i.e. R/R KO. Deletion of the 9p21 locus restored synchronous contractility and organized connexin 43 junctions. As a non-coding locus, 9p21 appears to repress connexin transcription, leading to the phenotypes we observe, but only when the niche is stiffened as in disease. These data are the first to demonstrate that disease-specific niche remodeling, e.g. a “heart attack-in-a-dish” model, can differentially affect CM function depending on SNPs within a non-coding locus.


Analysis of chromatin organization and gene expression in T cells identifies functional genes for rheumatoid arthritis

10.1101/827923 ◽

2019 ◽

Author(s):

Jing Yang ◽

Amanda McGovern ◽

Paul Martin ◽

Kate Duffus ◽

Xiangyu Ge ◽

...

Keyword(s):

Rheumatoid Arthritis ◽

Gene Expression ◽

T Cells ◽

Complex Disease ◽

Target Genes ◽

Disease Risk ◽

Association Studies ◽

Dna Interaction ◽

Genome Wide Association Studies ◽

Causal Genes

Abstract: Genome-wide association studies have identified genetic variation contributing to complex disease risk. However, assigning causal genes and mechanisms has been more challenging because disease-associated variants are often found in distal regulatory regions with cell-type specific behaviours. Here, we collect ATAC-seq, Hi-C, Capture Hi-C and nuclear RNA-seq data in stimulated CD4+ T-cells over 24 hours, to identify functional enhancers regulating gene expression. We characterise changes in DNA interaction and activity dynamics that correlate with changes in gene expression, and find that the strongest correlations are observed within 200 kb of promoters. Using rheumatoid arthritis as an example of T-cell mediated disease, we demonstrate interactions of expression quantitative trait loci with target genes, and confirm assigned genes or show complex interactions for 20% of disease associated loci, including FOXO1, which we confirm using CRISPR/Cas9.


Associations of Two Obesity-Related Single-Nucleotide Polymorphisms with Adiponectin in Chinese Children

International Journal of Endocrinology ◽

10.1155/2017/6437542 ◽

2017 ◽

Vol 2017 ◽

pp. 1-5 ◽

Cited By ~ 1

Author(s):

Lijun Wu ◽

Liwang Gao ◽

Xiaoyuan Zhao ◽

Meixian Zhang ◽

Jianxin Wu ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Multiple Testing ◽

Association Studies ◽

Statistical Significance ◽

Additive Model ◽

Recessive Model ◽

Chinese Children ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide

Purpose. Genome-wide association studies have found two obesity-related single-nucleotide polymorphisms (SNPs), rs17782313 near the melanocortin-4 receptor (MC4R) gene and rs6265 near the brain-derived neurotrophic factor (BDNF) gene, but the associations of both SNPs with other obesity-related traits are not fully described, especially in children. The aim of the present study is to investigate the associations between the SNPs and adiponectin, which has a regulatory role in glucose and lipid metabolism. Methods. We examined the associations of the SNPs with adiponectin in the Beijing Child and Adolescent Metabolic Syndrome (BCAMS) study. A total of 3503 children participated in the study. Results. The SNP rs6265 was significantly associated with adiponectin under an additive model (P=0.02 and 0.024, resp.) after adjustment for age, gender, and BMI or obesity statuses. The SNP rs17782313 was significantly associated with low adiponectin under a recessive model. No statistical significance was found between the two SNPs and low adiponectin after correction for multiple testing. Conclusion. We demonstrate for the first time that the SNP rs17782313 near MC4R and the SNP rs6265 near BDNF are associated with adiponectin in Chinese children. These novel findings provide important evidence that adiponectin possibly mediates MC4R and BDNF involved in obesity.


Combinatorial and statistical prediction of gene expression from haplotype sequence

Bioinformatics ◽

10.1093/bioinformatics/btaa318 ◽

2020 ◽

Vol 36 (Supplement_1) ◽

pp. i194-i202

Author(s):

Berk A Alpay ◽

Pinar Demetci ◽

Sorin Istrail ◽

Derek Aguiar

Keyword(s):

Gene Expression ◽

Multiple Testing ◽

Association Studies ◽

Classification Problem ◽

Statistical Prediction ◽

Model Complexity ◽

Supplementary Information ◽

Prediction Methods ◽

Genome Wide Association Studies ◽

Regulatory Effects

Abstract. Motivation: Genome-wide association studies (GWAS) have discovered thousands of significant genetic effects on disease phenotypes. By considering gene expression as the intermediary between genotype and disease phenotype, expression quantitative trait loci studies have interpreted many of these variants by their regulatory effects on gene expression. However, there remains a considerable gap between genotype-to-gene expression association and genotype-to-gene expression prediction. Accurate prediction of gene expression enables gene-based association studies to be performed post hoc for existing GWAS, reduces multiple testing burden, and can prioritize genes for subsequent experimental investigation. Results: In this work, we develop gene expression prediction methods that relax the independence and additivity assumptions between genetic markers. First, we consider gene expression prediction from a regression perspective and develop the HAPLEXR algorithm which combines haplotype clusterings with allelic dosages. Second, we introduce the new gene expression classification problem, which focuses on identifying expression groups rather than continuous measurements; we formalize the selection of an appropriate number of expression groups using the principle of maximum entropy. Third, we develop the HAPLEXD algorithm that models haplotype sharing with a modified suffix tree data structure and computes expression groups by spectral clustering. In both models, we penalize model complexity by prioritizing genetic clusters that indicate significant effects on expression. We compare HAPLEXR and HAPLEXD with three state-of-the-art expression prediction methods and two novel logistic regression approaches across five GTEx v8 tissues. HAPLEXD exhibits significantly higher classification accuracy overall; HAPLEXR shows higher prediction accuracy on approximately half of the genes tested and the largest number of best predicted genes (r² > 0.1) among all methods. We show that variant and haplotype features selected by HAPLEXR are smaller in size than competing methods (and thus more interpretable) and are significantly enriched in functional annotations related to gene regulation. These results demonstrate the importance of explicitly modeling non-dosage dependent and intragenic epistatic effects when predicting expression. Availability and implementation: Source code and binaries are freely available at https://github.com/rapturous/HAPLEX. Supplementary information: Supplementary data are available at Bioinformatics online.


Diagnosis of Human Axillary Osmidrosis by Genotyping of the Human ABCC11 Gene: Clinical Practice and Basic Scientific Evidence

BioMed Research International ◽

10.1155/2016/7670483 ◽

2016 ◽

Vol 2016 ◽

pp. 1-9 ◽

Cited By ~ 12

Author(s):

Yu Toyoda ◽

Tsuneaki Gomi ◽

Hiroshi Nakagawa ◽

Makoto Nagakura ◽

Toshihisa Ishikawa

Keyword(s):

Genetic Polymorphisms ◽

Molecular Mechanisms ◽

New Technologies ◽

Disease Risk ◽

Association Studies ◽

Diagnostic Strategy ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Drug Induced ◽

Axillary Osmidrosis

The importance of personalized medicine and healthcare is becoming increasingly recognized. Genetic polymorphisms associated with potential risks of various human genetic diseases as well as drug-induced adverse reactions have recently been well studied, and their underlying molecular mechanisms are being uncovered by functional genomics as well as genome-wide association studies. Knowledge of certain genetic polymorphisms is clinically important for our understanding of interindividual differences in drug response and/or disease risk. As such evidence accumulates, new clinical applications and practices are needed. In this context, the development of new technologies for simple, fast, accurate, and cost-effective genotyping is imperative. Here, we describe a simple isothermal genotyping method capable of detecting single nucleotide polymorphisms (SNPs) in the human ATP-binding cassette (ABC) transporter ABCC11 gene and its application to the clinical diagnosis of axillary osmidrosis. We have recently reported that axillary osmidrosis is linked with one SNP 538G>A in the ABCC11 gene. Our molecular biological and biochemical studies have revealed that this SNP greatly affects the protein expression level and the function of ABCC11. In this review, we highlight the clinical relevance and importance of this diagnostic strategy in axillary osmidrosis therapy.


Capturing SNP Association across the NK Receptor and HLA Gene Regions in Multiple Sclerosis by Targeted Penalised Regression Models

Genes ◽

10.3390/genes13010087 ◽

2021 ◽

Vol 13 (1) ◽

pp. 87

Author(s):

Sean M. Burnard ◽

Rodney A. Lea ◽

Miles Benton ◽

David Eccles ◽

Daniel W. Kennedy ◽

...

Keyword(s):

Multiple Sclerosis ◽

Complex Traits ◽

Multiple Testing ◽

Large Scale ◽

Disease Risk ◽

Association Studies ◽

Meta Analysis ◽

Elastic Net ◽

Genome Wide Association Studies ◽

Multiple Testing Correction

Conventional genome-wide association studies (GWASs) of complex traits, such as Multiple Sclerosis (MS), are reliant on per-SNP p-values and are therefore heavily burdened by multiple testing correction. Thus, in order to detect more subtle alterations, ever increasing sample sizes are required, while ignoring potentially valuable information that is readily available in existing datasets. To overcome this, we used penalised regression incorporating elastic net with a stability selection method by iterative subsampling to detect the potential interaction of loci with MS risk. Through re-analysis of the ANZgene dataset (1617 cases and 1988 controls) and an IMSGC dataset as a replication cohort (1313 cases and 1458 controls), we identified new association signals for MS predisposition, including SNPs above and below conventional significance thresholds while targeting two natural killer receptor loci and the well-established HLA loci. For example, rs2844482 (98.1% iterations), otherwise ignored by conventional statistics (p = 0.673) in the same dataset, was independently strongly associated with MS in another GWAS that required more than 40 times the number of cases (~45 K). Further comparison of our hits to those present in a large-scale meta-analysis confirmed that the majority of SNPs identified by the elastic net model reached conventional statistical GWAS thresholds (p < 5 × 10⁻⁸) in this much larger dataset. Moreover, we found that gene variants involved in oxidative stress, in addition to innate immunity, were associated with MS. Overall, this study highlights the benefit of using more advanced statistical methods to (re-)analyse subtle genetic variation among loci that have a biological basis for their contribution to disease risk.
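
As a worked illustration of the multiple testing burden this abstract describes, the conventional genome-wide threshold is commonly motivated as a Bonferroni correction. Assuming a family-wise error level of 0.05 and roughly one million independent common-variant tests (both figures are the usual convention, not values stated in this abstract):

```latex
\alpha_{\text{per-SNP}} \;=\; \frac{\alpha_{\text{family}}}{m} \;=\; \frac{0.05}{10^{6}} \;=\; 5 \times 10^{-8}
```

Under that assumption, a SNP with p = 0.673, such as rs2844482 in the original dataset, falls far short of the per-SNP threshold regardless of how often a penalised model selects it.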


Identification of Novel Susceptible Genes of Gastric Cancer Based on Integrated Omics Data

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.712020 ◽

2021 ◽

Vol 9 ◽

Author(s):

Huang Yaoxing ◽

Yu Danchun ◽

Sun Xiaojuan ◽

Jiang Shuman ◽

Yan Qingqing ◽

...

Keyword(s):

Gene Expression ◽

Gastric Cancer ◽

Association Studies ◽

Gene Expression Level ◽

Expression Level ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Coding Region ◽

Level Data ◽

Integrated Omics

Gastric cancer (GC) is one of the most common causes of cancer-related deaths in the world. This cancer has been regarded as a biologically and genetically heterogeneous disease with a poorly understood carcinogenesis at the molecular level. Thousands of biomarkers and susceptible loci have been explored via experimental and computational methods, but their effects on disease outcome are still unknown. Genome-wide association studies (GWAS) have identified multiple susceptible loci for GC, but due to linkage disequilibrium (LD), single-nucleotide polymorphisms (SNPs) may fall within the non-coding region and exert their biological function by modulating the gene expression level. In this study, we collected 1,091 cases and 410,350 controls from the GWAS catalog database. Integrating these with gene expression level data obtained from stomach tissue, we applied a machine learning-based method to predict GC-susceptible genes. As a result, we identified 787 novel susceptible genes related to GC, which will provide new insight into the genetic and biological basis for the mechanism and pathology of GC development.


DeepWAS: Multivariate genotype-phenotype associations by directly integrating regulatory information using deep learning

10.1101/069096 ◽

2016 ◽

Cited By ~ 1

Author(s):

Janine Arloth ◽

Gökcen Eraslan ◽

Till F.M. Andlauer ◽

Jade Martins ◽

Stella Iurato ◽

...

Keyword(s):

Deep Learning ◽

Disease Risk ◽

Association Studies ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Cell Type ◽

Individual Snps ◽

Multivariate Gwas ◽

Regulatory Effects ◽

Joint Contribution

Abstract: Genome-wide association studies (GWAS) identify genetic variants associated with quantitative traits or disease. Thus, GWAS never directly link variants to regulatory mechanisms, which, in turn, are typically inferred during post-hoc analyses. In parallel, a recent deep learning-based method allows for prediction of regulatory effects per variant on currently up to 1,000 cell type-specific chromatin features. We here describe “DeepWAS”, a new approach that directly integrates predictions of these regulatory effects of single variants into a multivariate GWAS setting. As a result, single variants associated with a trait or disease are, by design, coupled to their impact on a chromatin feature in a cell type. Up to 40,000 regulatory single-nucleotide polymorphisms (SNPs) were associated with multiple sclerosis (MS, 4,888 cases and 10,395 controls), major depressive disorder (MDD, 1,475 cases and 2,144 controls), and height (5,974 individuals) to each identify 43-61 regulatory SNPs, called deepSNPs, which are shown to reach at least nominal significance in large GWAS. MS- and height-specific deepSNPs resided in active chromatin and introns, whereas MDD-specific deepSNPs located mostly to intragenic regions and repressive chromatin states. We found deepSNPs to be enriched in public or cohort-matched expression and methylation quantitative trait loci and demonstrate the potential of the DeepWAS method to directly generate testable functional hypotheses based on genotype data alone. DeepWAS is an innovative GWAS approach with the power to identify individual SNPs in non-coding regions with gene regulatory capacity with a joint contribution to disease risk. DeepWAS is available at https://github.com/cellmapslab/DeepWAS.


Unravelling the Roles of Susceptibility Loci for Autoimmune Diseases in the Post-GWAS Era

10.20944/preprints201805.0160.v1 ◽

2018 ◽

Author(s):

Jody Ye ◽

Kathleen Gillespie ◽

Santiago Rodriguez

Keyword(s):

Autoimmune Diseases ◽

Disease Risk ◽

Association Studies ◽

Copy Number Variations ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Susceptibility Loci ◽

Single Nucleotide ◽

Environmental Interactions ◽

Genome Wide

Although genome-wide association studies (GWAS) have identified several hundred loci associated with autoimmune diseases, the mechanisms behind these associations are still poorly understood. The human genome is more complex than the common single nucleotide polymorphisms (SNPs) that are interrogated by GWAS arrays. Some structural variants such as insertions-deletions, copy number variations, and minisatellites that are not very well tagged by SNPs cannot be fully explored by GWAS. Therefore, it is possible that some of these loci may have large effects on autoimmune disease risk. In addition, other layers of regulation such as gene-gene interactions, epigenetic determinants, and gene-environment interactions also contribute to the heritability of autoimmune diseases. This review focuses on discussing why studying these elements may allow us to gain a more comprehensive understanding of the aetiology of complex autoimmune traits.
