Large-scale trans-ethnic replication and discovery of genetic associations for rare diseases with self-reported medical data

Author(s):  
Suyash S Shringarpure ◽  
Wei Wang ◽  
Yunxuan Jiang ◽  
Alison Acevedo ◽  
Devika Dhamija ◽  
...  

A key challenge in the study of rare disease genetics is assembling large case cohorts for well-powered studies. We demonstrate the use of self-reported diagnosis data to study rare diseases at scale. We performed genome-wide association studies (GWAS) for 33 rare diseases using self-reported diagnosis phenotypes and re-discovered 29 known associations to validate our approach. In addition, we performed the first GWAS for Duane retraction syndrome, vestibular schwannoma and spontaneous pneumothorax, and report novel genome-wide significant associations for these diseases. We replicated these novel associations in non-European populations within the 23andMe, Inc. cohort as well as in the UK Biobank cohort. We also show that mixed model analyses including all ethnicities and related samples increase the power for finding associations in rare diseases. Our results, based on analysis of 19,084 rare disease cases for 33 diseases from 7 populations, show that large-scale online collection of self-reported data is a viable method for discovery and replication of genetic associations for rare diseases. This approach, which is complementary to sequencing-based approaches, will enable the discovery of more novel genetic associations for increasingly rare diseases across multiple ancestries and shed more light on the genetic architecture of rare diseases.

2021 ◽  
Author(s):  
Runqing Yang ◽  
Yuxin Song ◽  
Li Jiang ◽  
Zhiyu Hao ◽  
Runqing Yang

Complex computation and approximate solutions hinder the application of generalized linear mixed models (GLMM) to genome-wide association studies. We extended GRAMMAR to handle binary diseases by treating genomic breeding values (GBVs), estimated in advance, as a known predictor in genomic logit regression, and then controlled polygenic effects by regulating genomic heritability downward. Using simulations and case analyses, we showed that, in optimizing GRAMMAR, polygenic effects and genomic controls could be evaluated using fewer sampled markers, which greatly simplified GLMM-based association analysis in large-scale data. In addition, joint analysis of quantitative trait nucleotide (QTN) candidates chosen by multiple testing offered significantly improved statistical power to detect QTNs over existing methods.


2021 ◽  
Author(s):  
Tomas W Fitzgerald ◽  
Ewan Birney

Copy number variation (CNV) has long been known to influence human traits and has a rich history of research in common and rare genetic disease. Although CNV is accepted as an important class of genomic variation, progress on copy number (CN) phenotype associations from next-generation sequencing (NGS) data has been limited, in part due to the relative difficulty of CNV detection and an enrichment for large numbers of false positives. To date, most successful CN genome-wide association studies (CN-GWAS) have focused on using predictive measures of dosage intolerance or gene burden tests to gain sufficient power for detecting CN effects. Here we present a novel method for large-scale CN analysis from NGS data that generates robust CN estimates and allows CN-GWAS to be performed genome-wide in discovery mode. We provide a detailed analysis in the large-scale UK Biobank resource and a specifically designed software package for deriving CN estimates from NGS data that are robust enough to be used for CN-GWAS. We use these methods to perform genome-wide CN-GWAS analysis across 78 human traits, discovering 862 genetic associations that are likely to contribute strongly to trait distributions, either based solely on their CN or by acting in concert with other genetic variation. Finally, we undertake an analysis comparing CNV and SNP association signals across the same traits and samples, defining specific CNV association classes based on whether they could be detected using standard SNP-GWAS in the UK Biobank.


2012 ◽  
Vol 15 (3) ◽  
pp. 414-418 ◽  
Author(s):  
Nic M. Novak ◽  
Jason L. Stein ◽  
Sarah E. Medland ◽  
Derrek P. Hibar ◽  
Paul M. Thompson ◽  
...  

In an attempt to increase power to detect genetic associations with brain phenotypes derived from human neuroimaging data, we recently conducted a large-scale, genome-wide association meta-analysis of hippocampal, brain, and intracranial volume through the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) consortium. Here, we present a freely available online interactive tool, EnigmaVis, which makes it easy to visualize the association results generated by the consortium alongside allele frequency, genes, and functional annotations. EnigmaVis runs natively within the web browser, and generates plots that show the level of association between brain phenotypes at user-specified genomic positions. Uniquely, EnigmaVis is dynamic; users can interact with elements on the plot in real time. This software will be useful when exploring the effect on brain structure of particular genetic variants influencing neuropsychiatric illness and cognitive function. Future projects of the consortium and updates to EnigmaVis will also be displayed on the site. EnigmaVis is freely available online at http://enigma.loni.ucla.edu/enigma-vis/


2020 ◽  
Vol 26 (5) ◽  
pp. 576-581 ◽  
Author(s):  
Nikolaos A Patsopoulos ◽  
Philip L De Jager

Multiple sclerosis (MS) exhibits a well-documented increased incidence in individuals with a family history of the disease; that is, it is a heritable disease. In the last decade, genome-wide association studies have enabled the agnostic interrogation of the whole genome at a large scale. To date, over 200 genetic associations have been described at the strict level of genome-wide significance. Our current understanding of MS genetics can explain up to half of the disease’s heritability, raising the important question of whether this is enough information to leverage toward improving diagnosis in MS. Parallel advancements in technologies that allow the characterization of the full transcriptome down to the single-cell level have enabled the generation of an unprecedented wealth of information. Transcriptional changes of putative causal cells could be utilized to identify early signs of disease onset. These recent findings in genetics and genomics, coupled with new technologies and deeply phenotyped cohorts, have the potential to improve the diagnosis of MS.


2014 ◽  
Author(s):  
Minsun Song ◽  
Wei Hao ◽  
John D. Storey

We present a new statistical test of association between a trait and genetic markers, which we theoretically and practically prove to be robust to arbitrarily complex population structure. The statistical test involves a set of parameters that can be directly estimated from large-scale genotyping data, such as that measured in genome-wide association studies (GWAS). We also derive a new set of methodologies, called a genotype-conditional association test (GCAT), shown to provide accurate association tests in populations with complex structures, manifested in both the genetic and environmental contributions to the trait. We demonstrate the proposed method on a large simulation study and on the Northern Finland Birth Cohort study. In the Finland study, we identify several new significant loci that other methods do not detect. Our proposed framework provides a substantially different approach to the problem from existing methods, such as the linear mixed model and principal component approaches.


2020 ◽  
Author(s):  
Phyllis M. Thangaraj ◽  
Undina Gisladottir ◽  
Nicholas P. Tatonetti

Genome-wide association studies (GWAS) may require enrollment of up to millions of participants to power variant discovery. This requires manual curation of cases and controls with large-scale collaborations. Biobanks connected to electronic health records (EHR) can facilitate these studies by using data from clinical care systems, like billing diagnosis codes, as phenotypes. These systems, however, do not define adjudicated cases and controls. We developed QTPhenProxy, a machine learning model that adds nuance to cohort classification by assigning everyone in a cohort a probability of having the study disease. We then ran a GWAS using the probabilities as a quantitative trait. With an order of magnitude fewer cases than the largest stroke GWAS, our method outperformed previous methods at replicating known variants in stroke and discovered a novel variant in ABCG8 associated with intracerebral hemorrhage in the UK Biobank that replicated in the MEGASTROKE GWA meta-analysis. QTPhenProxy expands traditional phenotyping to improve the power of GWAS.
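The idea of treating a per-person disease probability as a quantitative trait can be illustrated with a single-variant test. The sketch below is not the QTPhenProxy pipeline: it assumes a plain ordinary least squares regression of the probability on genotype dosage (0, 1, or 2 copies of the effect allele), with no covariates, ancestry adjustment, or trait transformation. The function name `slope_test` and the toy data are hypothetical.

```python
from statistics import mean


def slope_test(dosages, trait):
    """Regress a quantitative trait on genotype dosage with OLS.

    Returns (beta, standard error, t statistic) for the dosage effect.
    Assumes more than two samples and a non-constant dosage vector.
    """
    n = len(dosages)
    x_bar, y_bar = mean(dosages), mean(trait)
    # Sum of squares of dosage and cross-product with the trait
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    # Residual variance with n - 2 degrees of freedom
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = (rss / (n - 2) / sxx) ** 0.5
    return beta, se, beta / se


# Toy data (invented for illustration): allele dosage per person and a
# model-assigned probability of having the disease.
dosages = [0, 0, 1, 1, 2, 2, 0, 1]
probs = [0.10, 0.15, 0.30, 0.35, 0.55, 0.60, 0.05, 0.40]
beta, se, t = slope_test(dosages, probs)
print(f"beta={beta:.4f} se={se:.4f} t={t:.2f}")
```

In a genome-wide scan this test would be repeated for every variant and the resulting statistics compared against a genome-wide significance threshold; established GWAS software also adjusts for covariates such as age, sex, and ancestry components.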


2019 ◽  
Author(s):  
Alexander S. Hatoum ◽  
Claire L. Morrison ◽  
Evann C. Mitchell ◽  
Max Lam ◽  
Chelsie E. Benca-Bachman ◽  
...  

Deficits in executive functions (EFs), cognitive processes that control goal-directed behaviors, are associated with psychopathology and neurological disorders. Little is known about the molecular bases of EF individual differences; existing EF genome-wide association studies (GWAS) used small sample sizes and/or focused on individual tasks that are imprecise measures of EF. We conducted a GWAS of a Common EF (cEF) factor based on multiple tasks in the UK Biobank (N=427,037 European-descent individuals), finding 129 independent genome-wide significant lead variants in 112 distinct loci. cEF was associated with fast synaptic transmission processes (synaptic, potassium channel, and GABA pathways) in gene-based analyses. cEF was genetically correlated with measures of intelligence (IQ) and cognitive processing speed, but cEF and IQ showed differential genetic associations with psychiatric disorders and educational attainment. Results suggest that cEF is a genetically distinct cognitive construct that is particularly relevant to understanding the genetic variance in psychiatric disorders.

