Large-scale trans-ethnic replication and discovery of genetic associations for rare diseases with self-reported medical data

A key challenge in the study of rare disease genetics is assembling large case cohorts for well-powered studies. We demonstrate the use of self-reported diagnosis data to study rare diseases at scale. We performed genome-wide association studies (GWAS) for 33 rare diseases using self-reported diagnosis phenotypes and re-discovered 29 known associations to validate our approach. In addition, we performed the first GWAS for Duane retraction syndrome, vestibular schwannoma and spontaneous pneumothorax, and report novel genome-wide significant associations for these diseases. We replicated these novel associations in non-European populations within the 23andMe, Inc. cohort as well as in the UK Biobank cohort. We also show that mixed model analyses including all ethnicities and related samples increase the power for finding associations in rare diseases. Our results, based on analysis of 19,084 rare disease cases for 33 diseases from 7 populations, show that large-scale online collection of self-reported data is a viable method for discovery and replication of genetic associations for rare diseases. This approach, which is complementary to sequencing-based approaches, will enable the discovery of more novel genetic associations for increasingly rare diseases across multiple ancestries and shed more light on the genetic architecture of rare diseases.


A mixed model reduces spurious genetic associations produced by population stratification in genome-wide association studies

Genomics ◽

10.1016/j.ygeno.2015.01.006 ◽

2015 ◽

Vol 105 (4) ◽

pp. 191-196 ◽

Cited By ~ 18

Author(s):

Jimin Shin ◽

Chaeyoung Lee

Keyword(s):

Population Stratification ◽

Mixed Model ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Genome Wide


Optimal Genomic Control in Large-scale Genetic Associations for Binary Diseases

10.21203/rs.3.rs-318017/v2 ◽

2021 ◽

Author(s):

Runqing Yang ◽

Yuxin Song ◽

Li Jiang ◽

Zhiyu Hao ◽

Runqing Yang

Keyword(s):

Multiple Testing ◽

Statistical Power ◽

Large Scale ◽

Association Studies ◽

Joint Analysis ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Genomic Heritability ◽

Large Scale Data ◽

Genome Wide

Complex computation and approximate solutions hinder the application of generalized linear mixed models (GLMM) to genome-wide association studies. We extended GRAMMAR to handle binary diseases by treating genomic breeding values (GBVs), estimated in advance, as a known predictor in genomic logit regression, and then controlled polygenic effects by regulating genomic heritability downward. Using simulations and case analyses, we showed that, in optimizing GRAMMAR, polygenic effects and genomic controls could be evaluated using fewer sampled markers, which greatly simplified GLMM-based association analysis in large-scale data. In addition, joint analysis of quantitative trait nucleotide (QTN) candidates chosen by multiple testing offered significantly improved statistical power to detect QTNs over existing methods.
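The abstract refers to genomic control without spelling it out. As a rough illustration only, the classical form of genomic control (not the optimized procedure the authors describe) estimates an inflation factor as the median of the observed 1-df chi-square test statistics divided by the median expected under the null, about 0.4549, and then rescales every statistic by that factor. The function names below are hypothetical and the input statistics are toy values.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Classical inflation factor: observed median / expected null median."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def apply_genomic_control(chi2_stats):
    """Divide each statistic by the inflation factor.

    The factor is clamped at 1.0 so that deflated statistics are never
    scaled upward, which is the usual convention.
    """
    lam = max(genomic_inflation_factor(chi2_stats), 1.0)
    return [s / lam for s in chi2_stats]


if __name__ == "__main__":
    toy_stats = [0.2, 0.9098728, 3.0]  # toy values, not real GWAS output
    print(genomic_inflation_factor(toy_stats))
    print(apply_genomic_control(toy_stats))
```

In practice the factor would be computed over many thousands of markers; the three-value list here only keeps the arithmetic easy to follow.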


CNest: A Novel Copy Number Association Discovery Method Uncovers 862 New Associations from 200,629 Whole Exome Sequence Datasets in the UK Biobank

10.1101/2021.08.19.456963 ◽

2021 ◽

Author(s):

Tomas W Fitzgerald ◽

Ewan Birney

Keyword(s):

Copy Number ◽

Large Scale ◽

Association Studies ◽

Genomic Variation ◽

Next Generation Sequencing Data ◽

Genome Wide Association Studies ◽

UK Biobank ◽

Genome Wide ◽

The UK ◽

NGS Data

Copy number variation (CNV) has long been known to influence human traits, with a rich history of research into common and rare genetic disease. Although CNV is accepted as an important class of genomic variation, progress on copy number (CN) phenotype associations from next-generation sequencing (NGS) data has been limited, in part due to the relative difficulty of CNV detection and an enrichment for large numbers of false positives. To date, most successful CN genome-wide association studies (CN-GWAS) have focused on using predictive measures of dosage intolerance or gene burden tests to gain sufficient power for detecting CN effects. Here we present a novel method for large-scale CN analysis from NGS data that generates robust CN estimates and allows CN-GWAS to be performed genome-wide in discovery mode. We provide a detailed analysis in the large-scale UK Biobank resource and a specifically designed software package for deriving CN estimates from NGS data that are robust enough to be used for CN-GWAS. We use these methods to perform genome-wide CN-GWAS analysis across 78 human traits, discovering 862 genetic associations that are likely to contribute strongly to trait distributions based solely on their CN or by acting in concert with other genetic variation. Finally, we undertake an analysis comparing CNV and SNP association signals across the same traits and samples, defining specific CNV association classes based on whether they could be detected using standard SNP-GWAS in the UK Biobank.


EnigmaVis: Online Interactive Visualization of Genome-Wide Association Studies of the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) Consortium

Twin Research and Human Genetics ◽

10.1017/thg.2012.17 ◽

2012 ◽

Vol 15 (3) ◽

pp. 414-418 ◽

Cited By ~ 24

Author(s):

Nic M. Novak ◽

Jason L. Stein ◽

Sarah E. Medland ◽

Derrek P. Hibar ◽

Paul M. Thompson ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association ◽

Intracranial Volume ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Genome Wide ◽

Neuroimaging Data ◽

Neuroimaging Genetics

In an attempt to increase power to detect genetic associations with brain phenotypes derived from human neuroimaging data, we recently conducted a large-scale, genome-wide association meta-analysis of hippocampal, brain, and intracranial volume through the Enhancing NeuroImaging Genetics through Meta-Analysis (ENIGMA) consortium. Here, we present a freely available online interactive tool, EnigmaVis, which makes it easy to visualize the association results generated by the consortium alongside allele frequency, genes, and functional annotations. EnigmaVis runs natively within the web browser, and generates plots that show the level of association between brain phenotypes at user-specified genomic positions. Uniquely, EnigmaVis is dynamic; users can interact with elements on the plot in real time. This software will be useful when exploring the effect on brain structure of particular genetic variants influencing neuropsychiatric illness and cognitive function. Future projects of the consortium and updates to EnigmaVis will also be displayed on the site. EnigmaVis is freely available online at http://enigma.loni.ucla.edu/enigma-vis/


Genetic and gene expression signatures in multiple sclerosis

Multiple Sclerosis Journal ◽

10.1177/1352458519898332 ◽

2020 ◽

Vol 26 (5) ◽

pp. 576-581 ◽

Cited By ~ 2

Author(s):

Nikolaos A Patsopoulos ◽

Philip L De Jager

Keyword(s):

Multiple Sclerosis ◽

Large Scale ◽

New Technologies ◽

Association Studies ◽

Disease Onset ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Genome Wide ◽

Transcriptional Changes

Multiple sclerosis (MS) shows a well-documented increased incidence in individuals with a family history of the disease; that is, it is heritable. In the last decade, genome-wide association studies have enabled the agnostic interrogation of the whole genome at a large scale. To date, over 200 genetic associations have been described at the strict level of genome-wide significance. Our current understanding of MS genetics can explain up to half of the disease’s heritability, raising the important question of whether this is enough information to leverage toward improving diagnosis in MS. Parallel advancements in technologies that allow the characterization of the full transcriptome down to the single-cell level have enabled the generation of an unprecedented wealth of information. Transcriptional changes of putative causal cells could be utilized to identify early signs of disease onset. These recent findings in genetics and genomics, coupled with new technologies and deeply phenotyped cohorts, have the potential to improve the diagnosis of MS.


Efficient mixed model approach for large-scale genome-wide association studies of ordinal categorical phenotypes

The American Journal of Human Genetics ◽

10.1016/j.ajhg.2021.03.019 ◽

2021 ◽

Author(s):

Wenjian Bi ◽

Wei Zhou ◽

Rounak Dey ◽

Bhramar Mukherjee ◽

Joshua N. Sampson ◽

...

Keyword(s):

Large Scale ◽

Mixed Model ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Mixed Model Approach ◽

Genome Wide ◽

Model Approach


Testing for genetic associations in arbitrarily structured populations

10.1101/012682 ◽

2014 ◽

Author(s):

Minsun Song ◽

Wei Hao ◽

John D. Storey

Keyword(s):

Large Scale ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Principal Component ◽

Statistical Test ◽

Structured Populations ◽

Birth Cohort Study ◽

Genome Wide Association Studies ◽

Genetic Associations

We present a new statistical test of association between a trait and genetic markers, which we theoretically and practically prove to be robust to arbitrarily complex population structure. The statistical test involves a set of parameters that can be directly estimated from large-scale genotyping data, such as that measured in genome-wide association studies (GWAS). We also derive a new set of methodologies, called a genotype-conditional association test (GCAT), shown to provide accurate association tests in populations with complex structures, manifested in both the genetic and environmental contributions to the trait. We demonstrate the proposed method on a large simulation study and on the Northern Finland Birth Cohort study. In the Finland study, we identify several new significant loci that other methods do not detect. Our proposed framework provides a substantially different approach to the problem from existing methods, such as the linear mixed model and principal component approaches.


Medical data and machine learning improve power of stroke genome-wide association studies

10.1101/2020.01.22.915397 ◽

2020 ◽

Author(s):

Phyllis M. Thangaraj ◽

Undina Gisladottir ◽

Nicholas P. Tatonetti

Keyword(s):

Machine Learning ◽

Large Scale ◽

Association Studies ◽

Meta Analysis ◽

Clinical Care ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Care Systems ◽

The UK

Genome-wide association studies (GWAS) may require enrollment of up to millions of participants to power variant discovery. This requires manual curation of cases and controls with large-scale collaborations. Biobanks connected to electronic health records (EHR) can facilitate these studies by using data from clinical care systems, like billing diagnosis codes, as phenotypes. These systems, however, do not define adjudicated cases and controls. We developed QTPhenProxy, a machine learning model that adds nuance to cohort classification by assigning everyone in a cohort a probability of having the study disease. We then ran a GWAS using the probabilities as a quantitative trait. With an order of magnitude fewer cases than the largest stroke GWAS, our method outperformed previous methods at replicating known variants in stroke and discovered a novel variant in ABCG8 associated with intracerebral hemorrhage in the UK Biobank that replicated in the MEGASTROKE GWA meta-analysis. QTPhenProxy expands traditional phenotyping to improve the power of GWAS.
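The core idea, using a per-person disease probability as a quantitative trait, can be sketched for a single variant as an ordinary least-squares regression of the probability on allele dosage (0, 1 or 2). This is a minimal sketch under stated assumptions, not the QTPhenProxy pipeline: the abstract does not name the classifier or the GWAS software, the function name `linear_association` is hypothetical, the probabilities are invented toy values, and a real analysis would also adjust for covariates such as age, sex and ancestry components.

```python
import math


def linear_association(dosages, trait):
    """Simple linear regression of a quantitative trait on allele dosage.

    Returns (beta, standard_error, t_statistic) for one variant.
    No covariates are included; this is an illustrative simplification.
    """
    n = len(dosages)
    if n != len(trait) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 samples")
    mean_g = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    # Residual sum of squares, with n - 2 degrees of freedom.
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    t_stat = beta / se if se > 0 else math.inf
    return beta, se, t_stat


if __name__ == "__main__":
    # Toy data: dosages for six people and hypothetical model probabilities.
    dosages = [0, 0, 1, 1, 2, 2]
    probabilities = [0.1, 0.2, 0.3, 0.4, 0.6, 0.5]
    print(linear_association(dosages, probabilities))
```

Repeating this test across every genotyped variant, with an appropriate multiple-testing threshold, gives the quantitative-trait scan the abstract describes in outline.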


Optimal Genomic Control in Large-scale Genetic Associations for Binary Diseases

10.21203/rs.3.rs-318017/v1 ◽

2021 ◽

Author(s):

Runqing Yang ◽

Yuxin Song ◽

Li Jiang ◽

Zhiyu Hao ◽

Runqing Yang

Keyword(s):

Multiple Testing ◽

Statistical Power ◽

Large Scale ◽

Association Studies ◽

Joint Analysis ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Genomic Heritability ◽

Large Scale Data ◽

Genome Wide


Genome-Wide Association Study of Over 427,000 Individuals Establishes Executive Functioning as a Neurocognitive Basis of Psychiatric Disorders Influenced by GABAergic Processes

10.1101/674515 ◽

2019 ◽

Cited By ~ 2

Author(s):

Alexander S. Hatoum ◽

Claire L. Morrison ◽

Evann C. Mitchell ◽

Max Lam ◽

Chelsie E. Benca-Bachman ◽

...

Keyword(s):

Psychiatric Disorders ◽

Cognitive Processing ◽

Genome Wide Association Study ◽

Association Studies ◽

Small Sample ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Genome Wide ◽

The UK

Deficits in executive functions (EFs), cognitive processes that control goal-directed behaviors, are associated with psychopathology and neurological disorders. Little is known about the molecular bases of EF individual differences; existing EF genome-wide association studies (GWAS) used small sample sizes and/or focused on individual tasks that are imprecise measures of EF. We conducted a GWAS of a Common EF (cEF) factor based on multiple tasks in the UK Biobank (N=427,037 European-descent individuals), finding 129 independent genome-wide significant lead variants in 112 distinct loci. cEF was associated with fast synaptic transmission processes (synaptic, potassium channel, and GABA pathways) in gene-based analyses. cEF was genetically correlated with measures of intelligence (IQ) and cognitive processing speed, but cEF and IQ showed differential genetic associations with psychiatric disorders and educational attainment. Results suggest that cEF is a genetically distinct cognitive construct that is particularly relevant to understanding the genetic variance in psychiatric disorders.
