The population frequency of human mitochondrial DNA variants is highly dependent upon mutational bias

Human Mitochondrial DNA ◽

Genomic Changes ◽

Common Genetic Variants ◽

Variant Frequency

Genome-wide association studies (GWASs) typically seek common genetic variants that can influence disease likelihood. However, these analyses often fail to convincingly link specific genes and their variants with highly penetrant phenotypic effects. To solve the 'missing heritability problem' that characterizes GWASs, researchers have turned to rare variants revealed by next-generation sequencing when seeking genomic changes that may be pathogenic, as a reduction in variant frequency is an expected outcome of selection. While triage of rare variants has led to some success in illuminating genes linked to heritable disease, the interpretation and utilization of rare genomic changes remains very challenging. Human mitochondrial DNA (mtDNA) encodes proteins and RNAs required for the essential process of oxidative phosphorylation, and a number of metabolic diseases are linked to mitochondrial mutations. Recently, the mtDNAs of nearly 200,000 individuals were sequenced in order to produce the HelixMT database (HelixMTdb), a large catalog of human mtDNA variation. Here, we were surprised to find that many synonymous nucleotide substitutions were never detected within this quite substantial survey of human mtDNA. Subsequent study of more than 1000 mammalian mtDNAs suggested that selection on synonymous sites within mitochondrial protein-coding genes is minimal and unlikely to explain the rarity of most synonymous changes among humans. Rather, the mutational propensities of mtDNA are more likely to determine variant frequency. Our findings have general implications for the interpretation of variant frequencies when studying heritable disease.

The population frequency of human mitochondrial DNA variants is highly dependent upon mutational bias

Biology Open ◽

10.1242/bio.059072 ◽

2021 ◽

Author(s):

Cory D. Dunn

Keyword(s):

Mitochondrial DNA ◽

Rare Variants ◽

Mutational Bias ◽

Low Frequencies ◽

Synonymous Substitutions ◽

Genomic Changes ◽

Population Frequency ◽

Variant Frequency ◽

Mitochondrial DNA Variants

Next-generation sequencing can quickly reveal genetic variation potentially linked to heritable disease. As databases encompassing human variation continue to expand, rare variants have been of high interest, since the frequency of a variant is expected to be low if the genetic change leads to a loss of fitness or fecundity. However, the use of variant frequency when seeking genomic changes linked to disease remains very challenging. Here, we explore the role of selection in controlling human variant frequency using the HelixMT database, which encompasses hundreds of thousands of mitochondrial DNA (mtDNA) samples. We find that a substantial number of synonymous substitutions, which have no effect on protein sequence, were never encountered in this large study, while many other synonymous changes are found at very low frequencies. Further analyses of human and mammalian mtDNA datasets indicate that the population frequency of synonymous variants is predominantly determined by mutational biases rather than by strong selection acting upon nucleotide choice. Our work has important implications that extend to the interpretation of variant frequency for non-synonymous substitutions.
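To make the notion of a synonymous substitution concrete, the following minimal sketch enumerates every single-nucleotide change in a coding sequence that leaves the encoded amino acid unchanged under the vertebrate mitochondrial genetic code (NCBI translation table 2). The function names, the 0-based coordinates, and the transition/transversion flag are illustrative assumptions; they are not taken from the study, whose actual analysis pipeline is not described in the abstract.

```python
# Vertebrate mitochondrial genetic code (NCBI translation table 2),
# listed with bases in T, C, A, G order at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

PURINES = {"A", "G"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return (ref in PURINES) == (alt in PURINES)


def synonymous_substitutions(cds):
    """List (position, ref, alt, is_transition) for each synonymous change.

    `cds` is assumed to be an in-frame coding sequence (hypothetical input).
    Positions are 0-based; an incomplete trailing codon and codons holding
    ambiguous bases are skipped.
    """
    cds = cds.upper()
    results = []
    for start in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[start:start + 3]
        if any(base not in BASES for base in codon):
            continue
        for offset in range(3):
            ref = codon[offset]
            for alt in BASES:
                if alt == ref:
                    continue
                mutant = codon[:offset] + alt + codon[offset + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    results.append(
                        (start + offset, ref, alt, is_transition(ref, alt))
                    )
    return results
```

Comparing such an enumerated list against the changes actually observed in a variant catalog is one way to ask which synonymous substitutions are absent, and grouping them by mutation type is one way to look for mutational bias.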

Abstract 367: Extreme High-Density Lipoprotein Cholesterol Genetics: An Assortment of Large and Small Polygenic Effects

Arteriosclerosis Thrombosis and Vascular Biology ◽

10.1161/atvb.37.suppl_1.367 ◽

2017 ◽

Vol 37 (suppl_1) ◽

Author(s):

Jacqueline S Dron ◽

Jian Wang ◽

Cécile Low-Kam ◽

Sumeet A Khetarpal ◽

John F Robinson ◽

...

Keyword(s):

Large Scale ◽

Genetic Basis ◽

Rare Variants ◽

Association Studies ◽

Density Lipoprotein ◽

Copy Number Variations ◽

Common Variants ◽

Targeted Next Generation Sequencing ◽

Rationale: Although HDL-C levels are known to have a complex genetic basis, most studies have focused solely on identifying rare variants with large phenotypic effects to explain extreme HDL-C phenotypes. Objective: Here we concurrently evaluate the contribution of both rare and common genetic variants, as well as large-scale copy number variations (CNVs), towards extreme HDL-C concentrations. Methods: In clinically ascertained patients with low (N=136) and high (N=119) HDL-C profiles, we applied our targeted next-generation sequencing panel (LipidSeq™) to sequence genes involved in HDL metabolism, which were subsequently screened for rare variants and CNVs. We also developed a novel polygenic trait score (PTS) to assess patients’ genetic accumulations of common variants that have been shown by genome-wide association studies to associate primarily with HDL-C levels. Two additional cohorts of patients with extremely low and high HDL-C (total N=1,746 and N=1,139, respectively) were used for PTS validation. Results: In the discovery cohort, 32.4% of low HDL-C patients carried rare variants or CNVs in primary (ABCA1, APOA1, LCAT) and secondary (LPL, LMF1, GPD1, APOE) HDL-C–altering genes. Additionally, 13.4% of high HDL-C patients carried rare variants or CNVs in primary (SCARB1, CETP, LIPC, LIPG) and secondary (APOC3, ANGPTL4) HDL-C–altering genes. For polygenic effects, patients with abnormal HDL-C profiles but without rare variants or CNVs were ~2-fold more likely to have an extreme PTS compared to normolipidemic individuals, indicating an increased frequency of common HDL-C–associated variants in these patients. Similar results in the two validation cohorts demonstrate that this novel PTS successfully quantifies common variant accumulation, further characterizing the polygenic basis for extreme HDL-C phenotypes.
Conclusions: Patients with extreme HDL-C levels have various combinations of rare variants, common variants, or CNVs driving their phenotypes. Fully characterizing the genetic basis of HDL-C levels must extend to encompass multiple types of genetic determinants—not just rare variants—to further our understanding of this complex, controversial quantitative trait.
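The abstract does not give the formula behind the polygenic trait score. As a rough illustration only, a score of this general kind is often computed as a weighted sum of allele dosages; the sketch below assumes that form, and the function name, dosage coding (0, 1, or 2 copies), and weights are hypothetical rather than the authors' definition.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages across a set of common variants.

    dosages: copies (0, 1, or 2) of the trait-associated allele per variant.
    weights: per-variant effect sizes, e.g. taken from published GWAS results.
    Both inputs are illustrative assumptions, not the study's actual data.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))
```

Under this assumed form, an "extreme" score would be one falling in the tail of the score distribution seen in a reference group, such as the normolipidemic individuals mentioned in the abstract.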

A Novel Approach for the Simultaneous Analysis of Common and Rare Variants in Complex Traits

Bioinformatics and Biology Insights ◽

10.4137/bbi.s8852 ◽

2012 ◽

Vol 6 ◽

pp. BBI.S8852 ◽

Cited By ~ 4

Author(s):

Ao Yuan ◽

Guanjie Chen ◽

Yanxun Zhou ◽

Amy Bentley ◽

Charles Rotimi

Keyword(s):

Complex Traits ◽

Rare Variants ◽

Sequence Data ◽

Association Studies ◽

Simultaneous Analysis ◽

Common Variants ◽

Disease Etiology ◽

Novel Approach ◽

Genome-wide association studies (GWAS) have been successful in detecting common genetic variants underlying common traits and diseases. Despite these success stories, the percentage of trait variance explained by GWAS signals has been modest at best, leaving the so-called “missing heritability” unexplained. The predictive power of common variants identified by GWAS has also not been encouraging. Given these observations, the fact that the effects of rare variants are often, by design, unaccounted for by GWAS, and the growing availability of sequence data, there is an increasing need for robust analytic approaches to evaluate the contribution of rare variants to common complex diseases. Here we propose a new method that enables the simultaneous analysis of the association of rare and common variants with disease etiology. We refer to this method as SCARVA (simultaneous common and rare variants analysis). SCARVA is simple to use and efficient. We used SCARVA to analyze two independent real datasets to identify rare and common variants underlying variation in obesity among participants in the Africa America Diabetes Mellitus (AADM) study and plasma triglyceride levels in the Dallas Heart Study (DHS). We found common and rare variants associated with both traits, consistent with published results.

Targeted sequencing of Parkinson’s disease loci genes highlights SYT11, FGF20 and other associations

Brain ◽

10.1093/brain/awaa401 ◽

2020 ◽

Author(s):

Uladzislau Rudakou ◽

Eric Yu ◽

Lynne Krohn ◽

Jennifer A Ruskey ◽

Farnaz Asayesh ◽

...

Keyword(s):

Parkinson's Disease ◽

Linkage Disequilibrium ◽

Rare Variants ◽

Association Studies ◽

Strong Linkage Disequilibrium ◽

Common Variants ◽

Genome-wide association studies (GWAS) have identified numerous loci associated with Parkinson’s disease. The specific genes and variants that drive the associations within the vast majority of these loci are unknown. We aimed to perform a comprehensive analysis of selected genes to determine the potential role of rare and common genetic variants within these loci. We fully sequenced 32 genes from 25 loci previously associated with Parkinson’s disease in 2657 patients and 3647 controls from three cohorts. Capture was done using molecular inversion probes targeting the exons, exon-intron boundaries and untranslated regions (UTRs) of the genes of interest, followed by sequencing. Quality control was performed to include only high-quality variants. We examined the role of rare variants (minor allele frequency < 0.01) using optimized sequence Kernel association tests. The association of common variants was estimated using regression models adjusted for age, sex and ethnicity as required in each cohort, followed by a meta-analysis. After Bonferroni correction, we identified a burden of rare variants in SYT11, FGF20 and GCH1 associated with Parkinson’s disease. Nominal associations were identified in 21 additional genes. Previous reports suggested that the SYT11 GWAS association is driven by variants in the nearby GBA gene. However, the association of SYT11 was mainly driven by a rare 3′ UTR variant (rs945006601) and was independent of GBA variants (P = 5.23 × 10⁻⁵ after exclusion of all GBA variant carriers). The association of FGF20 was driven by a rare 5′ UTR variant (rs1034608171) located in the promoter region. The previously reported association of GCH1 with Parkinson’s disease is driven by rare non-synonymous variants, some of which are known to cause dopamine-responsive dystonia. We also identified two LRRK2 variants, p.Arg793Met and p.Gln1353Lys, in 10 and eight controls, respectively, but not in patients.
We identified common variants associated with Parkinson’s disease in MAPT, TMEM175, BST1, SNCA and GPNMB, which are all in strong linkage disequilibrium with known GWAS hits in their respective loci. A common coding PM20D1 variant, p.Ile149Val, was nominally associated with reduced risk of Parkinson’s disease (odds ratio 0.73, 95% confidence interval 0.60–0.89, P = 1.161 × 10⁻³). This variant is not in linkage disequilibrium with the top GWAS hits within this locus and may represent a novel association. These results further demonstrate the importance of fine mapping of GWAS loci, and suggest that SYT11, FGF20, and potentially PM20D1, BST1 and GPNMB should be considered for future studies as possible Parkinson’s disease-related genes.
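The rare/common split used above rests on a minor allele frequency (MAF) cutoff of 0.01. The sketch below shows how such a split could be computed; it assumes biallelic sites in diploid individuals, and the function names and the dictionary input format are hypothetical, not part of the study's pipeline.

```python
RARE_MAF_THRESHOLD = 0.01  # cutoff quoted in the abstract


def minor_allele_frequency(alt_allele_count, n_individuals):
    """MAF at a biallelic site, assuming diploid individuals."""
    total_alleles = 2 * n_individuals
    if total_alleles <= 0 or not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("allele count is inconsistent with the sample size")
    frequency = alt_allele_count / total_alleles
    return min(frequency, 1 - frequency)


def split_by_frequency(alt_counts, n_individuals):
    """Partition variants into (rare, common) lists by the MAF threshold.

    alt_counts: mapping of variant name -> alternate allele count
    (an illustrative input format). Monomorphic sites (MAF of 0) fall
    into the rare list here and would normally be filtered out first.
    """
    rare, common = [], []
    for name, count in alt_counts.items():
        maf = minor_allele_frequency(count, n_individuals)
        (rare if maf < RARE_MAF_THRESHOLD else common).append(name)
    return rare, common
```

In a design like the one described, the rare list would then be aggregated per gene for a burden or kernel test, while each common variant would be tested individually by regression.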

The contribution of rare whole genome sequencing variants to plasma protein levels and to the missing heritability

10.21203/rs.3.rs-625433/v1 ◽

2021 ◽

Author(s):

Marcin Kierczak ◽

Nima Rafati ◽

Julia Höglund ◽

Hadrien Gourle ◽

Daniel Schmitz ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Genetic Variants ◽

Complex Traits ◽

Rare Variants ◽

Association Studies ◽

Whole Genome ◽

Missing Heritability ◽

Despite the success of genome-wide association studies (GWAS) in identifying effects of common genetic variants, much of the genetic contribution to complex traits remains unexplained. Here, we analysed high-coverage whole-genome sequencing (WGS) data to evaluate the contribution of rare genetic variants to 414 plasma proteins. The frequency distribution of genetic variants was skewed towards the rare spectrum, and damaging variants were more often rare. However, only 2.24% of the heritability was estimated to be explained by rare variants. A gene-based approach, developed to also capture the effect of rare variants, identified associations for 249 of the proteins, 25% more than a GWAS. Of those, 24 associations were driven by rare variants, clearly highlighting the capacity of aggregated tests and WGS data. We conclude that, while many rare variants have considerable phenotypic effects, their contribution to the missing heritability is limited by their low frequencies.

Targeted sequencing of Parkinson's disease loci genes highlights SYT11, FGF20 and other associations

10.1101/2020.05.29.20116111 ◽

2020 ◽

Author(s):

Uladzislau Rudakou ◽

Eric Yu ◽

Lynne M Krohn ◽

Jennifer A Ruskey ◽

Farnaz Asayesh ◽

...

Keyword(s):

Parkinson's Disease ◽

Rare Variants ◽

Association Studies ◽

Meta Analysis ◽

Strong Linkage Disequilibrium ◽

Common Variants ◽

Genome-wide association studies (GWAS) have identified numerous loci associated with Parkinson's disease. The specific genes and variants that drive the associations within the vast majority of these loci are unknown. We aimed to perform a comprehensive analysis of selected genes to determine the potential role of rare and common genetic variants within these loci. We fully sequenced 32 genes from 25 loci previously associated with Parkinson's disease in 2,657 patients and 3,647 controls from three cohorts. Capture was done using molecular inversion probes targeting the exons, exon-intron boundaries and untranslated regions (UTRs) of the genes of interest, followed by sequencing. Quality control was performed to include only high-quality variants. We examined the role of rare variants (minor allele frequency < 0.01) using optimized sequence Kernel association tests (SKAT-O). The association of common variants was estimated using regression models adjusted for age, sex and ethnicity as required in each cohort, followed by a meta-analysis. After Bonferroni correction, we identified a burden of rare variants in SYT11, FGF20 and GCH1 associated with Parkinson's disease. Nominal associations were identified in 21 additional genes. Previous reports suggested that the SYT11 GWAS association is driven by variants in the nearby GBA gene. However, the association of SYT11 was mainly driven by a rare 3' UTR variant (rs945006601) and was independent of GBA variants (p=5.23E-05 after exclusion of all GBA variant carriers). The association of FGF20 was driven by a rare 5' UTR variant (rs1034608171) located in the promoter region. The previously reported association of GCH1 with Parkinson's Disease is driven by rare nonsynonymous variants, some of which are known to cause dopamine-responsive dystonia. We also identified two LRRK2 variants, p.Arg793Met and p.Gln1353Lys, in ten and eight controls, respectively, but not in patients. 
We identified common variants associated with Parkinson's disease in MAPT, TMEM175, BST1, SNCA and GPNMB which are all in strong linkage disequilibrium (LD) with known GWAS hits in their respective loci. A common coding PM20D1 variant, p.Ile149Val, was nominally associated with reduced risk of Parkinson's disease (OR 0.73, 95% CI 0.60-0.89, p=1.161E-03). This variant is not in LD with the top GWAS hits within this locus and may represent a novel association. These results further demonstrate the importance of fine mapping of GWAS loci, and suggest that SYT11, FGF20, and potentially PM20D1, BST1 and GPNMB should be considered for future studies as possible Parkinson's disease-related genes.

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program

Nature ◽

10.1038/s41586-021-03205-y ◽

2021 ◽

Vol 590 (7845) ◽

pp. 290-299 ◽

Cited By ~ 22

Author(s):

Daniel Taliun ◽

Daniel N. Harris ◽

Michael D. Kessler ◽

Jedidiah Carlson ◽

...

Keyword(s):

Rare Variants ◽

Sequence Data ◽

Association Studies ◽

Genotype Imputation ◽

Phenotypic Data ◽

Treatment And Prevention ◽

Genome Wide ◽

Diverse Backgrounds ◽

Unmapped Reads

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
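The figures quoted above can be turned into rough absolute numbers. The short calculation below is a back-of-the-envelope sketch: it treats "more than 400 million" as a lower bound of exactly 400 million, takes the quoted percentages at face value, and assumes diploid sites with one copy of the allele per singleton carrier. The variable names are illustrative.

```python
N_SAMPLES = 53_831                      # samples in the first TOPMed release
N_VARIANTS_LOWER_BOUND = 400_000_000    # "more than 400 million" variants

# A singleton carried once among 2 * N_SAMPLES chromosomes
# has an allele frequency of roughly 9.3e-6.
singleton_frequency = 1 / (2 * N_SAMPLES)

# Lower-bound counts implied by the quoted percentages.
rare_variants = round(0.97 * N_VARIANTS_LOWER_BOUND)   # frequency below 1%
singletons = round(0.46 * N_VARIANTS_LOWER_BOUND)      # seen in one individual
```

On these assumptions, at least about 388 million of the detected variants are rare and at least about 184 million are singletons, each singleton sitting roughly three orders of magnitude below the 1% frequency boundary.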

Common genetic variants with fetal effects on birth weight are enriched for proximity to genes implicated in rare developmental disorders

Human Molecular Genetics ◽

10.1093/hmg/ddab060 ◽

2021 ◽

Author(s):

Robin N Beaumont ◽

Isabelle K Mayne ◽

Rachel M Freathy ◽

Caroline F Wright

Keyword(s):

Birth Weight ◽

Statistical Power ◽

Developmental Disorders ◽

Association Studies ◽

Later Life ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

Common Genetic Variants ◽

Causal Genes

Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understand how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci, using simulations to test whether they fall disproportionately close to the GWAS loci. We found that birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected, both when the DD gene is the nearest gene to the birth weight SNP and when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting that the closest gene to the SNP is not necessarily the functionally relevant gene. This is the first application of this approach to birth weight, and it has helped identify GWAS loci likely to have direct fetal effects on birth weight that could not previously be classified as fetal or maternal owing to insufficient statistical power.
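The abstract describes a simulation-based test of whether GWAS SNPs fall unusually close to developmental disorder genes, but not how the simulations were constructed. The sketch below shows one simple form such a test could take: it counts SNPs with a gene inside a fixed window and compares that count with counts from randomly placed SNPs. The single-chromosome coordinates, uniform random placement, and function names are all simplifying assumptions, and the authors' null model may well differ (for example, by matching on gene density).

```python
import random

WINDOW_BP = 258_000  # window size quoted in the abstract


def count_near(snp_positions, gene_positions, window=WINDOW_BP):
    """Number of SNPs with at least one gene within `window` base pairs."""
    return sum(
        any(abs(snp - gene) <= window for gene in gene_positions)
        for snp in snp_positions
    )


def enrichment_p_value(snp_positions, gene_positions, chrom_length,
                       n_simulations=1000, window=WINDOW_BP, seed=0):
    """Empirical one-sided p-value for proximity enrichment.

    Each simulation places the same number of SNPs uniformly at random on
    a single chromosome of length `chrom_length` (a deliberately crude
    null model used here only for illustration).
    """
    rng = random.Random(seed)
    observed = count_near(snp_positions, gene_positions, window)
    at_least_as_extreme = 0
    for _ in range(n_simulations):
        simulated = [rng.randrange(chrom_length) for _ in snp_positions]
        if count_near(simulated, gene_positions, window) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_simulations + 1)
```

A small p-value under this sketch would indicate that the observed SNPs sit near the gene set more often than randomly placed SNPs do, which is the kind of enrichment the abstract reports.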

Family-Based Quantitative Trait Meta-Analysis Implicates Rare Noncoding Variants in DENND1A in Polycystic Ovary Syndrome

The Journal of Clinical Endocrinology & Metabolism ◽

10.1210/jc.2018-02496 ◽

2019 ◽

Vol 104 (9) ◽

pp. 3835-3850 ◽

Cited By ~ 13

Author(s):

Matthew Dapas ◽

Ryan Sisk ◽

Richard S Legro ◽

Margrit Urbanek ◽

Andrea Dunaif ◽

...

Keyword(s):

Polycystic Ovary Syndrome ◽

Quantitative Trait ◽

Rare Variants ◽

Polycystic Ovary ◽

Association Studies ◽

Meta Analysis ◽

Premenopausal Women ◽

Endocrine Disorders ◽

Ovary Syndrome

Context: Polycystic ovary syndrome (PCOS) is among the most common endocrine disorders of premenopausal women, affecting 5% to 15% of this population depending on the diagnostic criteria applied. It is characterized by hyperandrogenism, ovulatory dysfunction, and polycystic ovarian morphology. PCOS is highly heritable, but only a small proportion of this heritability can be accounted for by the common genetic susceptibility variants identified to date. Objective: The objective of this study was to test whether rare genetic variants contribute to PCOS pathogenesis. Design, Patients, and Methods: We performed whole-genome sequencing on DNA from 261 individuals from 62 families with one or more daughters with PCOS. We tested for associations of rare variants with PCOS and its concomitant hormonal traits using a quantitative trait meta-analysis. Results: We found rare variants in DENND1A (P = 5.31 × 10⁻⁵, adjusted P = 0.039) that were significantly associated with reproductive and metabolic traits in PCOS families. Conclusions: Common variants in DENND1A have previously been associated with PCOS diagnosis in genome-wide association studies. Subsequent studies indicated that DENND1A is an important regulator of human ovarian androgen biosynthesis. Our findings provide additional evidence that DENND1A plays a central role in PCOS and suggest that rare noncoding variants contribute to disease pathogenesis.

A nonparametric test for association with multiple loci in the retrospective case-control study

Statistical Methods in Medical Research ◽

10.1177/0962280219842892 ◽

2019 ◽

Vol 29 (2) ◽

pp. 589-602

Author(s):

Chan Wang ◽

Shufang Deng ◽

Leiming Sun ◽

Liming Li ◽

Yue-Qing Hu

Keyword(s):

Rare Variants ◽

Association Studies ◽

Nonparametric Test ◽

Case Control ◽

Nucleotide Polymorphisms ◽

Retrospective Case ◽

Multiple Loci ◽

Common Diseases ◽

The Difference

Genome-wide association studies aim to identify common or rare variants associated with common diseases and to explain more of their heritability. It is well known that common diseases are influenced by multiple single nucleotide polymorphisms (SNPs) that are usually correlated in location or function. To detect association signals with high power, it is highly desirable to take account of correlations, or linkage disequilibrium (LD) information, among multiple SNPs when testing for association. In this article, we propose a test, SLIDE, that captures the difference in the average multi-locus genotypes between cases and controls, and we derive its variance–covariance matrix in the retrospective design. This matrix is composed of the pairwise LD between SNPs. SLIDE can therefore borrow strength from an external database in the population of interest, with a few thousand to hundreds of thousands of individuals, to improve the power for detecting association. Extensive simulations show that SLIDE is clearly superior to existing methods, especially in situations involving both common and rare variants and both protective and deleterious variants. Furthermore, the efficiency of the proposed method is demonstrated in an application to data from the Wellcome Trust Case Control Consortium.
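The quantity at the heart of the test described above is the difference in average multi-locus genotypes between cases and controls. The sketch below computes only that difference vector; it assumes genotypes coded as allele counts (0, 1, or 2) per locus, uses hypothetical function names, and does not reproduce the variance–covariance matrix or the final test statistic, which the abstract does not specify.

```python
def mean_genotype_vector(genotypes):
    """Per-locus mean genotype for a group of individuals.

    genotypes: list of per-individual lists of allele counts (0, 1, or 2),
    one entry per locus; this coding is an illustrative assumption.
    """
    if not genotypes:
        raise ValueError("at least one individual is required")
    n_loci = len(genotypes[0])
    if any(len(person) != n_loci for person in genotypes):
        raise ValueError("all individuals must have the same number of loci")
    n = len(genotypes)
    return [sum(person[j] for person in genotypes) / n for j in range(n_loci)]


def genotype_mean_difference(cases, controls):
    """Per-locus difference of mean genotypes, cases minus controls."""
    case_means = mean_genotype_vector(cases)
    control_means = mean_genotype_vector(controls)
    if len(case_means) != len(control_means):
        raise ValueError("cases and controls must cover the same loci")
    return [a - b for a, b in zip(case_means, control_means)]
```

According to the abstract, a vector of this kind would then be standardized using a matrix built from pairwise LD between the SNPs, which is where an external reference panel could contribute.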