Finding the Sources of Missing Heritability within Rare Variants Through Simulation

Thousands of genome-wide association studies (GWAS) have been conducted to identify genetic variants associated with complex disorders. However, the reported variants explain only a small proportion of phenotypic variance. Moreover, many GWAS have failed to identify genetic variants associated with disorders that display hereditary features. This “missing heritability” problem can be partly explained by rare variants. We simulated a causal scenario in which gestational age, a quantitative trait that distinguishes preterm (<37 weeks) from term births, was significantly correlated with aggregated rare variants at 1000 single-nucleotide polymorphism loci. These 1000 simulated causal rare variants were embedded into randomly selected subsets of 9642 promoter regions from the 1000 Genomes Project genotype data, using different proportions of causal rare variants within the embedded promoters. By analyzing the correlations between rare variant aggregations and gestational age, we found that the embedded promoters as a whole showed weaker genetic association as the proportion of causal rare variants decreased, and that no individual embedded promoter showed genetic association when the proportion of causal rare variants was smaller than 0.4. Our analyses indicate that association signals can be greatly diluted when causal rare variants are sparsely dispersed across the genome, accounting for an important source of missing heritability.
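
The dilution effect described above can be illustrated with a small simulation. The sketch below is not the authors' pipeline: it assumes a single region of 50 rare sites (MAF 0.01), a simple burden score (the total count of minor alleles across all sites), a hypothetical 39-week baseline gestational age shortened by 1 week per causal allele, and Gaussian noise. Only a chosen fraction of the sites is causal, so the neutral sites contribute allele counts that carry no signal.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_region(causal_fraction, n_people=2000, n_sites=50, maf=0.01,
                    effect=-1.0, noise_sd=1.0, seed=0):
    """Correlation between a region's rare-allele burden and a simulated trait.

    All n_sites are rare, but only the first round(causal_fraction * n_sites)
    affect the trait. The burden sums minor alleles over *every* site, so
    neutral sites dilute the aggregated signal.
    """
    rng = random.Random(seed)
    n_causal = round(causal_fraction * n_sites)
    burdens, traits = [], []
    for _ in range(n_people):
        # Diploid genotype per site: number of minor alleles (0, 1 or 2).
        genotypes = [(rng.random() < maf) + (rng.random() < maf)
                     for _ in range(n_sites)]
        burdens.append(sum(genotypes))
        # Hypothetical trait: 39-week baseline, shortened by causal alleles.
        causal_alleles = sum(genotypes[:n_causal])
        traits.append(39.0 + effect * causal_alleles
                      + rng.gauss(0.0, noise_sd))
    return pearson(burdens, traits)


if __name__ == "__main__":
    for fraction in (1.0, 0.4, 0.1):
        print(fraction, round(simulate_region(fraction), 3))
```

Under these assumptions the absolute correlation falls as the causal fraction drops, because neutral sites inflate the variance of the burden score without adding covariance with the trait. The magnitudes depend entirely on the hypothetical parameters.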

The contribution of rare whole genome sequencing variants to plasma protein levels and to the missing heritability

10.21203/rs.3.rs-625433/v1

2021

Author(s): Marcin Kierczak, Nima Rafati, Julia Höglund, Hadrien Gourle, Daniel Schmitz, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Genetic Variants, Complex Traits, Rare Variants, Association Studies, Genome Wide Association Studies, Whole Genome, Missing Heritability, Common Genetic Variants

Abstract Despite the success of genome-wide association studies (GWAS) in identifying effects of common genetic variants, much of the genetic contribution to complex traits remains unexplained. Here, we analysed high-coverage whole-genome sequencing (WGS) data to evaluate the contribution of rare genetic variants to 414 plasma proteins. The frequency distribution of genetic variants was skewed towards the rare spectrum, and damaging variants were more often rare. However, rare variants were estimated to explain only 2.24% of the heritability. A gene-based approach, developed to also capture the effect of rare variants, identified associations for 249 of the proteins, 25% more than a GWAS. Of those, 24 associations were driven by rare variants, clearly highlighting the capacity of aggregated tests and WGS data. We conclude that, while many rare variants have considerable phenotypic effects, their contribution to the missing heritability is limited by their low frequencies.
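
Why rare variants explain little heritability even when their effects are large follows from a standard additive-model result (not taken from this paper): under Hardy-Weinberg equilibrium, a biallelic variant with minor allele frequency p and per-allele effect β contributes 2p(1-p)β² to the trait variance. The sketch below assumes a trait standardized to unit variance; the frequencies and effect sizes are illustrative only.

```python
def variance_explained(maf, beta, trait_variance=1.0):
    """Share of trait variance explained by one additive biallelic variant.

    Assumes Hardy-Weinberg equilibrium, so the allele count (0, 1 or 2) has
    variance 2 * maf * (1 - maf) and the variant contributes
    2 * maf * (1 - maf) * beta**2 to the trait variance.
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must lie in [0, 1]")
    return 2.0 * maf * (1.0 - maf) * beta ** 2 / trait_variance


common = variance_explained(maf=0.30, beta=0.5)   # modest effect, common allele
rare = variance_explained(maf=0.001, beta=2.0)    # 4x larger effect, rare allele
print(f"common: {common:.4f}, rare: {rare:.4f}")
```

This prints `common: 0.1050, rare: 0.0080`: the rare variant, despite a four-fold larger effect, explains roughly 13 times less variance, because 2p(1-p) shrinks almost linearly with p.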

Exome-Wide Pan-Cancer Analysis of Germline Variants in 8,719 Individuals Finds Little Evidence of Rare Variant Associations

Human Heredity

10.1159/000519355

2021

pp. 1-10

Author(s): Zoe Guan, Ronglai Shen, Colin B. Begg

Keyword(s): Rare Variant, Rare Variants, Association Studies, The Cancer Genome Atlas, Considerable Proportion, Genome Wide Association Studies, Sequencing Data, Risk Variants, Cancer Types, Pan Cancer

Background: Many cancer types show considerable heritability, and extensive research has been done to identify germline susceptibility variants. Linkage studies have discovered many rare high-risk variants, and genome-wide association studies (GWAS) have discovered many common low-risk variants. However, it is believed that a considerable proportion of the heritability of cancer remains unexplained by known susceptibility variants. The “rare variant hypothesis” proposes that much of the missing heritability lies in rare variants that cannot reliably be detected by linkage analysis or GWAS. Until recently, high sequencing costs have precluded extensive surveys of rare variants, but technological advances have now made it possible to analyze rare variants on a much greater scale. Objectives: In this study, we investigated associations between rare variants and 14 cancer types. Methods: We ran association tests using whole-exome sequencing data from The Cancer Genome Atlas (TCGA) and validated the findings using data from the Pan-Cancer Analysis of Whole Genomes Consortium (PCAWG). Results: We identified four significant associations in TCGA, only one of which was replicated in PCAWG (BRCA1 and ovarian cancer). Conclusions: Our results provide little evidence in favor of the rare variant hypothesis. Much larger sample sizes may be needed to detect undiscovered rare cancer variants.

Molecular Variants in Human Trace Amine-Associated Receptors and Their Implications in Mental and Metabolic Disorders

Cellular and Molecular Neurobiology

10.1007/s10571-019-00743-y

2019

Vol 40 (2)

pp. 239-255

Cited By ~ 3

Author(s): Grazia Rutigliano, Riccardo Zucchi

Keyword(s): Genetic Variants, Affective Disorders, Metabolic Disorders, Association Studies, Single Gene, Genome Wide Association Studies, Complex Disorders, Linkage Analyses, Genome Wide, Trace Amine

Abstract We provide a comprehensive review of the available evidence on the pathophysiological implications of genetic variants in the human trace amine-associated receptor (TAAR) superfamily. Genes coding for trace amine-associated receptors (taars) represent a multigene family of G-protein-coupled receptors, clustered in a small genomic region of 108 kb on chromosome 6q23, which linkage analyses have consistently identified as a susceptibility locus for schizophrenia and affective disorders. Most TAARs are expressed in brain areas involved in emotion, reward and cognition. TAARs are activated by endogenous trace amines and thyronamines, and evidence for a modulatory action on other monoaminergic systems has been reported. Therefore, the linkage analyses were followed by fine-mapping association studies in schizophrenia and affective disorders. However, none of these findings has been consistently replicated, so their status remains uncertain. Single nucleotide polymorphisms in taars have emerged as susceptibility loci in genome-wide association studies of migraine and brain development, but none of the detected variants reached the threshold for genome-wide significance. In the last decade, technological advances have enabled single-gene and whole-exome sequencing, allowing the detection of rare genetic variants, which may have a greater impact on the risk of complex disorders. Using these approaches, several taar variants (especially in taar1) have been detected in patients with mental and metabolic disorders, and in some cases, defective receptor function has been demonstrated in vitro. Finally, using transcriptomic and peptidomic techniques, dysregulation of TAARs (especially TAAR6) has been identified in brain disorders characterized by cognitive impairment.

A Comparative Study on Multifactor Dimensionality Reduction Methods for Detecting Gene-Gene Interactions with the Survival Phenotype

BioMed Research International

10.1155/2015/671859

2015

Vol 2015

pp. 1-7

Cited By ~ 5

Author(s): Seungyeoun Lee, Yongkang Kim, Min-Seok Kwon, Taesung Park

Keyword(s): Dimensionality Reduction, Genetic Variants, Multifactor Dimensionality Reduction, Association Studies, Gene Interactions, Genome Wide Association Studies, Missing Heritability, Analytical Strategy, Reduction Methods, Missing Heritability Problem

Genome-wide association studies (GWAS) have extensively analyzed single-SNP effects on a wide variety of common and complex diseases and found many genetic variants associated with diseases. However, a large portion of the heritability still remains unexplained. This missing heritability problem might be due to an analytical strategy that limits analyses to single SNPs. One possible approach to the missing heritability problem is to identify multi-SNP effects or gene-gene interactions. The multifactor dimensionality reduction (MDR) method has been widely used to detect gene-gene interactions; it performs constructive induction by classifying high-dimensional genotype combinations into a one-dimensional variable with two attributes, high risk and low risk, for case-control studies. Many modifications of MDR have been proposed, and MDR has also been extended to the survival phenotype. In this study, we propose several extensions of MDR for the survival phenotype and compare them with earlier MDR methods through comprehensive simulation studies.
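
The case-control MDR step described above, pooling two-locus genotype cells into high- and low-risk groups, can be sketched as follows. This is a minimal illustration of the original binary-trait MDR, assuming genotypes coded 0/1/2 and the overall case:control ratio as the threshold; cross-validation, the search over SNP pairs, and the survival-phenotype extensions proposed in the paper are not shown. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def mdr_risk_labels(snp_a, snp_b, status):
    """Label each two-locus genotype cell as high (1) or low (0) risk.

    snp_a, snp_b: genotypes coded 0/1/2; status: 1 = case, 0 = control.
    A cell is high risk when its case:control ratio exceeds the overall
    case:control ratio of the sample (a common choice of threshold).
    """
    cases, controls = Counter(), Counter()
    for a, b, s in zip(snp_a, snp_b, status):
        (cases if s == 1 else controls)[(a, b)] += 1
    threshold = sum(cases.values()) / sum(controls.values())
    labels = {}
    for cell in product(range(3), repeat=2):
        n_case, n_ctrl = cases[cell], controls[cell]
        if n_case + n_ctrl == 0:
            continue  # no data in this cell; leave it unlabeled
        # A cell with cases but no controls has an infinite ratio: high risk.
        labels[cell] = int(n_ctrl == 0 or n_case / n_ctrl > threshold)
    return labels


def mdr_attribute(snp_a, snp_b, labels):
    """Collapse two SNPs into one binary variable (unlabeled cells -> low)."""
    return [labels.get((a, b), 0) for a, b in zip(snp_a, snp_b)]
```

On a toy sample where cases carry genotype pairs (0, 1) or (1, 0) and controls carry (0, 0) or (1, 1), neither SNP alone separates cases from controls, yet the collapsed attribute reproduces the case-control status exactly.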

PL01-01 - Genetic Research: Promises and Pitfalls for Clinical Treatment of Depression

European Psychiatry

10.1016/s0924-9338(11)73710-5

2011

Vol 26 (S2)

pp. 2007-2007

Author(s): J. Mendlewicz

Keyword(s): Mood Disorders, Genetic Variants, Rare Variants, Association Studies, Copy Number Variations, Depressive Illness, Public Health Issue, Genome Wide Association Studies, Structural Variations, Epigenetic Modulation

The lifetime prevalence of mood disorders is estimated at around 20% in the general population, making them a main cause of disability worldwide and a major public health issue.1 The etiology of mood disorders is still unknown, but their various phenotypes are believed to be caused by multiple genetic variants interacting in a complex way with environmental vulnerability factors. Therefore, the identification of biomarkers and environmental markers is crucial to improve our understanding and diagnosis as well as our treatments. Despite more than two decades of intensive and costly research to unravel susceptibility genes, and although pathophysiological pathways of interest have been recognized, results have so far been inconsistent, and not a single genetic biomarker of depression has been identified and replicated. More recent systematic genome-wide association studies (GWAS) have reported weak associations of some genetic variants in large samples, while multiple rare variants may together confer only part of the susceptibility to depression. Structural variations, such as copy-number variations (CNVs), may also be promising. Methodological issues and limitations will also be critically discussed in light of the complexity of gene-environment interactions (epigenetic modulation of gene expression)2 and in relation to future prospects for individualized pharmacotherapy of depressive illness.

Exome sequencing in families with severe mental illness identifies novel and rare variants in genes implicated in Mendelian neuropsychiatric syndromes

10.1101/310821

2018

Cited By ~ 1

Author(s): Suhas Ganesh, Ahmed P Husayn, Ravi Kumar Nadella, Ravi Prabhakar More, Manasa Sheshadri, ...

Keyword(s): Bipolar Disorder, Exome Sequencing, Rare Variants, Association Studies, Mental Illnesses, Disease Genes, Mendelian Disease, Genome Wide Association Studies, Complex Disorders, Family Based

Abstract Introduction: Severe mental illnesses (SMI), such as bipolar disorder and schizophrenia, are highly heritable and have a complex pattern of inheritance. Genome-wide association studies detect part of the heritability, which can be attributed to common genetic variation. Examination of rare variants with next-generation sequencing (NGS) may add to the understanding of the genetic architecture of SMIs. Methods: We analyzed 32 ill subjects (bipolar disorder, n=26; schizophrenia, n=4; schizoaffective disorder, n=1; schizophrenia-like psychosis, n=1) from 8 multiplex families, and 33 healthy individuals, by whole-exome sequencing. Prioritized variants were selected by a 4-step filtering process: deleteriousness according to 5 in silico algorithms, sharing within families, absence in the controls, and rarity in the South Asian sample of the Exome Aggregation Consortium. Results: We identified a total of 42 unique rare, non-synonymous deleterious variants, with an average of 5 variants per family. None of the variants were shared across families, indicating a ‘private’ mutational profile. Twenty (47.6%) of the variant-harboring genes identified in this sample have been previously reported to contribute to the risk of neuropsychiatric syndromes. These include genes that are related to neurodevelopmental processes or have been implicated in different monogenic syndromes with a severe neurodevelopmental phenotype. Conclusion: NGS approaches in family-based studies are useful for identifying novel and rare variants in genes for complex disorders like SMI. The study further validates the phenotypic burden of rare variants in Mendelian disease genes, indicating pleiotropic effects in the etiology of severe mental illnesses.
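
The 4-step filter in the Methods can be expressed as a sequence of predicates. The sketch assumes hypothetical per-variant fields and thresholds that the abstract does not state: that a variant must be called deleterious by all 5 in silico tools (`min_calls=5`) and that "rarity" means a South Asian ExAC allele frequency below 1% (`max_af=0.01`).

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str
    deleterious_calls: int    # how many of the 5 in silico tools call it deleterious
    shared_by_affected: bool  # carried by all affected members of its family
    seen_in_controls: bool    # observed in any healthy control
    exac_sas_af: float        # allele frequency in ExAC South Asian samples


def prioritize(variants, min_calls=5, max_af=0.01):
    """Apply the four filters in order; both thresholds are illustrative."""
    kept = [v for v in variants if v.deleterious_calls >= min_calls]  # step 1
    kept = [v for v in kept if v.shared_by_affected]                   # step 2
    kept = [v for v in kept if not v.seen_in_controls]                 # step 3
    return [v for v in kept if v.exac_sas_af < max_af]                 # step 4
```

Writing each step separately keeps the logic easy to audit; since every step only removes variants, the order does not change the final set.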

Bayesian model comparison for rare variant association studies of multiple phenotypes

10.1101/257162

2018

Cited By ~ 3

Author(s): Christopher DeBoever, Matthew Aguirre, Yosuke Tanigawa, Chris C. A. Spencer, Timothy Poterba, ...

Keyword(s): Genetic Variation, Rare Variant, Genetic Variants, Model Comparison, Rare Variants, Association Studies, Meta Analysis, Rare Variant Association, Physical Measurements, Comparison Approach

Abstract Whole genome sequencing studies applied to large populations or biobanks with extensive phenotyping raise new analytic challenges. The need to consider many variants at a locus or group of genes simultaneously, and the potential to study many correlated phenotypes with shared genetic architecture, provide opportunities for discovery and inference that are not addressed by the traditional one-variant, one-phenotype association study. Here we introduce a model comparison approach, which we refer to as MRP, for rare variant association studies that considers correlation, scale, and location of genetic effects across a group of genetic variants, phenotypes, and studies. We consider the use of summary statistic data to apply univariate and multivariate gene-based meta-analysis models for identifying rare variant associations, with an emphasis on protective protein-truncating variants that can expedite drug discovery. Through simulation studies, we demonstrate that the proposed model comparison approach can improve the ability to detect rare variant association signals. We also apply the model to two groups of phenotypes from the UK Biobank: 1) asthma diagnosis, eosinophil counts, forced expiratory volume, and forced vital capacity; and 2) glaucoma diagnosis, intra-ocular pressure, and corneal resistance factor. We are able to recover known associations such as the protective association between rs146597587 in IL33 and asthma. We also find evidence for novel protective associations between rare variants in ANGPTL7 and glaucoma. Overall, we show that the MRP model comparison approach is able to retain and improve upon useful features from widely used meta-analysis approaches for rare variant association analyses and to prioritize protective modifiers of disease risk.

Author summary: Due to the continually decreasing cost of acquiring genetic data, we are now beginning to see large collections of individuals for whom we have both genetic information and trait data such as disease status, physical measurements, biomarker levels, and more. These datasets offer new opportunities to find relationships between inherited genetic variation and disease. While it is known that there are relationships between different traits, typical genetic analyses focus on only one genetic variant and one phenotype at a time. Additionally, it is difficult to identify rare genetic variants that are associated with disease due to their scarcity, even in large samples. In this work, we present a method for identifying associations between genetic variation and disease that considers multiple rare variants and phenotypes at the same time. By sharing information across rare variants and phenotypes, we improve our ability to identify rare variants associated with disease compared to considering a single rare variant and a single phenotype. The method can be used to identify candidate disease genes as well as genes that might represent attractive drug targets.
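
MRP itself models correlated effects across variants, phenotypes, and studies, which is beyond a short example. As a simplified, univariate illustration of model comparison from summary statistics, the sketch below computes a Wakefield-style approximate Bayes factor: it compares a null model (true effect 0) with an alternative in which the effect is drawn from a normal prior, using only an effect estimate and its standard error. The prior standard deviation is an assumed, user-chosen parameter; this is not the MRP implementation.

```python
import math


def log_bf10(beta_hat, se, prior_sd):
    """Log Bayes factor for H1 (beta ~ N(0, prior_sd^2)) versus H0 (beta = 0).

    Uses only summary statistics: the likelihood treats beta_hat as
    N(beta, se^2), so under H1 beta_hat ~ N(0, se^2 + prior_sd^2) and
    under H0 beta_hat ~ N(0, se^2).
    """
    v = se ** 2         # sampling variance of the estimate
    w = prior_sd ** 2   # prior variance of the true effect under H1
    z2 = beta_hat ** 2 / v
    return 0.5 * math.log(v / (v + w)) + z2 * w / (2.0 * (v + w))
```

For example, `log_bf10(0.3, 0.1, 0.2)` equals 0.5 ln 0.2 + 3.6, about 2.80, favoring the alternative; with an estimate of 0 the log Bayes factor is negative, so the null is favored.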

Disruption of the SUMO Pathway in a High-Risk B-Cell Non-Hodgkin Lymphoma Pedigree

Blood

10.1182/blood.v126.23.2682.2682

2015

Vol 126 (23)

pp. 2682-2682

Author(s): Cassandra Garner, Martha Glenn, Rosalie G Waller, Venkatesh Rajamanickam, Todd Darlington, ...

Keyword(s): High Risk, B Cell, Hodgkin Lymphoma, Genetic Variants, Molecular Mechanisms, Rare Variants, Association Studies, Genome Wide Association Studies, Non Hodgkin Lymphoma, Sumo Pathway

Abstract Taken together, B-cell lymphoproliferative disorders (lymphoma, multiple myeloma and leukemia) are the fourth most common form of cancer. Chronic lymphocytic leukemia (CLL), a specific subtype of B-cell non-Hodgkin lymphoma (NHL), is the most common adult leukemia in Western countries. Despite its prevalence, the underlying genetic mechanisms responsible for CLL remain largely unknown. The strong familial clustering seen in CLL suggests that genetic variants contributing to its pathogenesis may be inherited. Our lab is dedicated to identifying inherited genetic risk variants associated with hematological malignancies, with a specific focus on CLL. We hypothesize that multiple germline genetic variants (both common and rare) are involved in the risk of CLL. To identify rare variants, our lab uses the Utah Population Database (UPDB) to identify extended high-risk pedigrees with a statistical excess of familial CLL. We then use next-generation sequencing to identify genetic variants segregating in these high-risk pedigrees. This study design is powerful for identifying rare variants and has previously proven successful in identifying variants associated with other cancers. We have performed whole-exome sequencing in one particularly high-risk pedigree containing 3 CLL cases and a mantle cell lymphoma case within 2 generations. We have identified variants in multiple genes associated with the SUMO pathway. We find rare variants in RANBP2, SP100, PML, IL1RL2 (paralog of IL1R) and CREBRF (transcriptional regulation through RNA pol II) that are shared by all NHL cases in this pedigree. Other genes in this pathway have been identified by previous CLL genome-wide association studies, specifically FAS, TNF and SUMO1. Furthermore, a strong role has previously been suggested for the SUMO pathway in cancer cell survival and tumor progression, lending support to the potential role of this pathway in NHL risk.
We are currently validating our sequence findings and determining the prevalence of these variants in other NHL cases included in the UPDB. Our high-risk pedigree findings support the hypothesis that disruption of the SUMO pathway contributes to the pathogenesis of NHL. Identification of the specific variants involved will increase our understanding of the molecular mechanisms contributing to NHL and has the potential to provide genetic markers that can be used for diagnosis and new avenues for treatment of these diseases. Disclosures: No relevant conflicts of interest to declare.
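
The pedigree filter described above, keeping only variants shared by all affected relatives, amounts to a set intersection followed by a rarity check. A minimal sketch, assuming hypothetical individual and variant IDs and an illustrative reference allele-frequency cutoff of 1%; it does not model segregation likelihoods or the UPDB-based pedigree selection.

```python
def shared_rare_variants(carriers, population_af, max_af=0.01):
    """Variants carried by every affected relative and rare in a reference set.

    carriers: mapping of individual ID -> iterable of variant IDs they carry.
    population_af: mapping of variant ID -> reference allele frequency;
    variants missing from it are treated as novel (frequency 0).
    """
    variant_sets = [set(v) for v in carriers.values()]
    if not variant_sets:
        return set()
    shared = set.intersection(*variant_sets)
    return {v for v in shared if population_af.get(v, 0.0) < max_af}
```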

Advantages of genotype imputation with ethnically matched reference panel for rare variant association analyses

10.1101/579201

2019

Cited By ~ 4

Author(s): Mart Kals, Tiit Nikopensius, Kristi Läll, Kalle Pärn, Timo Tõnis Sikka, ...

Keyword(s): Rare Variant, Rare Variants, Association Studies, Low Frequency, Genotype Imputation, Reference Panel, Genome Wide Association Studies, Genome Wide, Variant Analysis, Coding Variants

Abstract Genotype imputation has become a standard procedure prior to genome-wide association studies (GWASs). For common and low-frequency variants, genotype imputation can be performed sufficiently accurately with publicly available and ethnically heterogeneous reference datasets such as the 1000 Genomes Project (1000G) and Haplotype Reference Consortium panels. However, the imputation of rare variants has been shown to be significantly more accurate when an ethnically matched reference panel is used. Moreover, greater genetic similarity between the reference panel and target samples facilitates the detection of rare (or even population-specific) causal variants. Nevertheless, the genome-wide downstream consequences of, and differences between, using ethnically mixed and matched reference panels have not yet been comprehensively explored. We determined and quantified these differences by performing several comparative evaluations of discovery-driven analysis scenarios. A variant-wise GWAS was performed on seven complex diseases and body mass index using genome-wide genotype data of ∼37,000 Estonians imputed with the ethnically mixed 1000G panel and with an ethnically matched imputation reference panel. Several previously reported common (minor allele frequency, MAF > 5%) variant associations were replicated in both imputed datasets, and no major differences were observed among the genome-wide significant findings or in the fine-mapping effort. In the analysis of rare (MAF < 1%) coding variants, 46 significantly associated genes were identified in the ethnically matched imputed data, compared with four genes in the data imputed with the 1000G panel. All resulting genes were subsequently studied in the UK Biobank data. These associations provide a solid example of how rare variants can be efficiently analysed to discover novel, potentially functional genetic variants in relevant phenotypes. Furthermore, our work serves as proof of a cost-efficient study design, demonstrating that the use of ethnically matched imputation reference panels can enable substantially improved imputation of rare variants, facilitating novel high-confidence findings in rare variant GWAS scans.

Author summary: Over the last decade, genome-wide association studies (GWASs) have been widely used for detecting genetic biomarkers in a wide range of traits. Typically, GWASs are carried out using chip-based genotyping data, which are then combined with a more densely genotyped reference panel to infer untyped genetic variants in chip-typed individuals. This method is called genotype imputation, and its accuracy depends on multiple factors. Publicly available and ethnically heterogeneous imputation reference panels (IRPs) such as the 1000 Genomes Project (1000G) are sufficiently accurate for imputation of common and low-frequency variants, but custom ethnically matched IRPs outperform them for rare variants. In this work, we systematically compare the downstream effects on association analysis of eight complex traits in ∼37,000 Estonians imputed with ethnically mixed and ethnically matched IRPs. We do not observe major differences in the single-variant analysis, where both imputed datasets replicate previously reported significant loci. But in the gene-based analysis of rare protein-coding variants, we show that the ethnically matched panel clearly outperforms 1000G-panel-based imputation, providing a 10-fold increase in significant gene-trait associations. Our study demonstrates empirically that imputed data based on an ethnically matched panel are very promising for rare variant analysis: they capture more population-specific variants and make it possible to efficiently identify novel findings.
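
The frequency bins used above (common: MAF > 5%; rare: MAF < 1%) and the gene-level grouping that precedes a gene-based test can be sketched as follows. Treating 1-5% as "low-frequency" follows common convention rather than the abstract, and the (gene, variant, frequency) input format is a hypothetical choice; the gene-based test statistic itself is not shown.

```python
from collections import defaultdict


def minor_allele_frequency(alt_af):
    """Fold an alternate-allele frequency into the minor allele frequency."""
    return min(alt_af, 1.0 - alt_af)


def frequency_class(maf):
    """Bins from the abstract: common (MAF > 5%) and rare (MAF < 1%)."""
    if maf > 0.05:
        return "common"
    if maf < 0.01:
        return "rare"
    return "low-frequency"  # 1%-5%, the conventional intermediate bin


def rare_variants_by_gene(variants, threshold=0.01):
    """Collapse rare variants into per-gene lists for a gene-based test.

    variants: iterable of (gene, variant_id, alt_af) tuples.
    """
    groups = defaultdict(list)
    for gene, variant_id, alt_af in variants:
        if minor_allele_frequency(alt_af) < threshold:
            groups[gene].append(variant_id)
    return dict(groups)
```

Folding the alternate-allele frequency first matters: a variant whose alternate allele is at 99.8% is rare on the minor-allele scale and belongs in the gene-level set.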

Germline burden of rare damaging variants negatively affects human healthspan and lifespan

eLife

10.7554/elife.53449

2020

Vol 9

Cited By ~ 3

Author(s): Anastasia V Shindyapina, Aleksandr A Zenin, Andrei E Tarkhov, Didac Santesmasses, Peter O Fedichev, ...

Keyword(s): Genetic Variants, Aging Process, Rare Variants, Association Studies, Twin Studies, Genome Wide Association, Genome Wide Association Studies, Mortality And Morbidity, Human Lifespan, Genome Wide

The heritability of human lifespan is 23–33%, as evident from twin studies. Genome-wide association studies have explored this question by linking particular alleles to lifespan traits. However, the genetic variants identified so far explain only a small fraction of lifespan heritability in humans. Here, we report that the burden of the rarest protein-truncating variants (PTVs) in two large cohorts is negatively associated with human healthspan and lifespan, accounting for 0.4 and 1.3 years of their variability, respectively. In addition, longer-living individuals possess both fewer of the rarest PTVs and less damaging PTVs. We further estimated that somatic accumulation of PTVs accounts for only a small fraction of mortality and morbidity acceleration and hence is unlikely to be causal in aging. We conclude that rare damaging mutations, both inherited and accumulated throughout life, contribute to the aging process, and that the burden of ultra-rare variants in combination with common alleles better explains the apparent heritability of human lifespan.
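
A per-individual burden of the rarest PTVs, and its association with lifespan, can be sketched as below. The frequency cutoff defining "rarest" (`max_af=1e-4`) is a hypothetical value, and the unadjusted least-squares slope is a simplification; the paper's actual definitions and models are not reproduced here.

```python
def ptv_burden(carried_ptvs, allele_freq, max_af=1e-4):
    """Count the 'rarest' PTV alleles carried by one individual.

    carried_ptvs: mapping of PTV ID -> allele count in this person (1 or 2).
    allele_freq: mapping of PTV ID -> cohort allele frequency.
    """
    return sum(count for ptv, count in carried_ptvs.items()
               if allele_freq[ptv] < max_af)


def ols_slope(x, y):
    """Least-squares slope of y on x (e.g. lifespan on PTV burden)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx
```

A negative slope corresponds to the direction reported in the abstract (higher burden, shorter lifespan); a real analysis would also adjust for covariates such as sex and ancestry.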
