Landscape of multi-nucleotide variants in 125,748 human exomes and 15,708 genomes

2019
Author(s):
Qingbo Wang
Emma Pierce-Hoffman
Beryl B. Cummings
Konrad J. Karczewski
Jessica Alföldi
...  

Abstract. Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools for variant interpretation typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,996,125 MNVs across the genome with constituent variants falling within 2 bp distance of one another, of which 31,510 exist within the same codon, including 405 predicted to result in gain of a nonsense mutation, 1,818 predicted to rescue a nonsense mutation event that would otherwise be caused by one of the constituent variants, and 16,481 additional variants predicted to alter protein sequences. We show that the distribution of MNVs is highly non-uniform across the genome, and that this non-uniformity can be largely explained by a variety of known mutational mechanisms, such as CpG deamination, replication error by polymerase zeta, or polymerase slippage at repeat junctions. We also provide an estimate of the dinucleotide mutation rate caused by polymerase zeta. Finally, we show that differential CpG methylation drives MNV differences across functional categories. Our results demonstrate the importance of incorporating haplotype-aware annotation for accurate functional interpretation of genetic variation, and refine our understanding of genome-wide mutational mechanisms of MNVs.
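The "gained nonsense" and "rescued nonsense" categories in this abstract can be illustrated with a minimal sketch. It annotates two single-nucleotide variants in one codon first separately and then together on the same haplotype, using the standard genetic code. The function names, the category labels, and the example codons are illustrative assumptions, not the annotation pipeline used by the authors.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def apply_snvs(codon, snvs):
    """Return the codon after applying (position, alt_base) substitutions."""
    bases = list(codon)
    for position, alt_base in snvs:
        bases[position] = alt_base
    return "".join(bases)


def consequence(ref_codon, alt_codon):
    """Classify the protein-level effect of changing ref_codon into alt_codon."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


def annotate_mnv(ref_codon, snv_a, snv_b):
    """Compare per-SNV annotation with haplotype-aware (combined) annotation.

    The labels are hypothetical names chosen for this sketch.
    """
    separate = [
        consequence(ref_codon, apply_snvs(ref_codon, [snv])) for snv in (snv_a, snv_b)
    ]
    combined = consequence(ref_codon, apply_snvs(ref_codon, [snv_a, snv_b]))
    if combined == "nonsense" and "nonsense" not in separate:
        label = "gained nonsense"
    elif combined != "nonsense" and "nonsense" in separate:
        label = "rescued nonsense"
    else:
        label = "other"
    return separate, combined, label


# CAC (His): each SNV alone is missense, but together they create the stop codon TAA.
print(annotate_mnv("CAC", (0, "T"), (2, "A")))
# CAG (Gln): the first SNV alone creates the stop codon TAG; the second SNV rescues it.
print(annotate_mnv("CAG", (0, "T"), (1, "C")))
```

Annotating each variant in isolation would miss the stop codon in the first example and report a false one in the second, which is the misclassification that haplotype-aware annotation is meant to avoid.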

2021
Author(s):
Jordi Valls-Margarit
Iván Galván-Femenía
Daniel Matias
Natalia Blay
Montserrat Puiggròs
...  

The combined analysis of haplotype panels with phenotyped clinical cohorts is a common approach to exploring the genetic architecture of human diseases. However, genetic studies are mainly based on single nucleotide variants (SNVs) and small insertions and deletions (indels). Here, we help fill this gap by generating a dense haplotype map focused on the identification, characterization and phasing of structural variants (SVs). By integrating multiple variant identification methods and Logistic Regression models, we present a catalogue of 35,431,441 variants, including 89,178 SVs (≥50 bp), 30,325,064 SNVs and 5,017,199 indels, across 785 Illumina high-coverage (30X) whole genomes from the Iberian GCAT Cohort, with a median of 3.52M SNVs, 606,336 indels and 6,393 SVs per individual. The haplotype panel is able to impute up to 14,360,728 SNVs/indels and 23,179 SVs, showing a 2.7-fold increase for SVs compared with available genetic variation panels. The value of this panel for SV analysis is shown through an imputed rare Alu element located in a new locus associated with mononeuritis of lower limb, a rare neuromuscular disease. This study represents the first deep characterization of genetic variation within the Iberian population and the first operational haplotype panel to systematically include SVs in genome-wide genetic studies.


Pathogens
2021
Vol 10 (3)
pp. 363
Author(s):  
Sulochana K. Wasala ◽  
Dana K. Howe ◽  
Louise-Marie Dandurand ◽  
Inga A. Zasada ◽  
Dee R. Denver

Globodera pallida is among the most significant plant-parasitic nematodes worldwide, causing major damage to potato production. Since it was discovered in Idaho in 2006, eradication efforts have aimed to contain and eradicate G. pallida through phytosanitary action and soil fumigation. In this study, we investigated genome-wide patterns of G. pallida genetic variation across Idaho fields to evaluate whether the infestation resulted from a single or multiple introduction(s) and to investigate potential evolutionary responses since the time of infestation. A total of 53 G. pallida samples (~1,042,000 individuals) were collected and analyzed, representing five different fields in Idaho, a greenhouse population, and a field in Scotland that was used for external comparison. According to genome-wide allele frequency and fixation index (Fst) analyses, most of the genetic variation was shared among the G. pallida populations in Idaho fields pre-fumigation, indicating that the infestation likely resulted from a single introduction. Temporal patterns of genome-wide polymorphisms involving (1) pre-fumigation field samples collected in 2007 and 2014 and (2) pre- and post-fumigation samples revealed nucleotide variants (SNPs, single-nucleotide polymorphisms) with significantly differentiated allele frequencies indicating genetic differentiation. This study provides insights into the genetic origins and adaptive potential of G. pallida invading new environments.


2021
Vol 21 (1)
Author(s):  
Kelly B. Klingler ◽  
Joshua P. Jahner ◽  
Thomas L. Parchman ◽  
Chris Ray ◽  
Mary M. Peacock

Abstract. Background: Distributional responses by alpine taxa to repeated, glacial-interglacial cycles throughout the last two million years have significantly influenced the spatial genetic structure of populations. These effects have been exacerbated for the American pika (Ochotona princeps), a small alpine lagomorph constrained by thermal sensitivity and a limited dispersal capacity. As a species of conservation concern, long-term lack of gene flow has important consequences for landscape genetic structure and levels of diversity within populations. Here, we use reduced representation sequencing (ddRADseq) to provide a genome-wide perspective on patterns of genetic variation across pika populations representing distinct subspecies. To investigate how landscape and environmental features shape genetic variation, we collected genetic samples from distinct geographic regions as well as across finer spatial scales in two geographically proximate mountain ranges of eastern Nevada. Results: Our genome-wide analyses corroborate range-wide, mitochondrial subspecific designations and reveal pronounced fine-scale population structure between the Ruby Mountains and East Humboldt Range of eastern Nevada. Populations in Nevada were characterized by low genetic diversity (π = 0.0006–0.0009; θW = 0.0005–0.0007) relative to populations in California (π = 0.0014–0.0019; θW = 0.0011–0.0017) and the Rocky Mountains (π = 0.0025–0.0027; θW = 0.0021–0.0024), indicating substantial genetic drift in these isolated populations. Tajima’s D was positive for all sites (D = 0.240–0.811), consistent with recent contraction in population sizes range-wide. Conclusions: Substantial influences of geography, elevation and climate variables on genetic differentiation were also detected and may interact with the regional effects of anthropogenic climate change to force the loss of unique genetic lineages through continued population extirpations in the Great Basin and Sierra Nevada.
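The three summary statistics reported in this abstract (π, θW and Tajima's D) are related by a standard formula, sketched below under stated assumptions. The function name and the input format (one derived-allele count per site, all sites scored in the same n chromosomes) are illustrative choices and not the authors' ddRADseq pipeline. The values returned are totals over the region; per-site values such as those quoted in the abstract would additionally be divided by the number of sequenced sites.

```python
from math import sqrt


def diversity_stats(derived_counts, n):
    """Return (pi, theta_w, tajimas_d) for a sample of n chromosomes.

    derived_counts holds the derived-allele count at each site; sites that
    are not segregating (count 0 or n) are ignored. pi and theta_w are
    region totals, not per-site values.
    """
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sampled chromosomes")
    segregating = [k for k in derived_counts if 0 < k < n]
    s = len(segregating)
    if s == 0:
        raise ValueError("no segregating sites")

    # Nucleotide diversity: mean number of pairwise differences.
    pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in segregating) / pairs

    # Watterson's estimator from the number of segregating sites.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    theta_w = s / a1

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    tajimas_d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, tajimas_d


# Intermediate-frequency variants push pi above theta_w, giving a positive D.
print(diversity_stats([5] * 8, 10))
# An excess of singletons does the opposite and gives a negative D.
print(diversity_stats([1] * 8, 10))
```

A positive D, as reported for all pika sites, arises when π exceeds θW. That happens when rare variants are underrepresented relative to intermediate-frequency ones, a pattern the abstract interprets as a signature of recent population contraction.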


Genetics
2001
Vol 157 (1)
pp. 283-294
Author(s):  
Kristie Ashton ◽  
Ana Patricia Wagoner ◽  
Roland Carrillo ◽  
Greg Gibson

Abstract. Drosophila melanogaster appears to be well suited as a model organism for quantitative pharmacogenetic analysis. A genome-wide deficiency screen for haploinsufficient effects on prepupal heart rate identified nine regions of the genome that significantly reduce (five deficiencies) or increase (four deficiencies) heart rate across a range of genetic backgrounds. Candidate genes include several neurotransmitter receptor loci, particularly monoamine receptors, consistent with results of prior pharmacological manipulations of heart rate, as well as genes associated with paralytic phenotypes. Significant genetic variation is also shown to exist for a suite of four autonomic behaviors that are exhibited spontaneously upon decapitation, namely, grooming, grasping, righting, and quivering. Overall activity levels are increased by application of particular concentrations of the drugs octopamine and nicotine, but due to high environmental variance both within and among replicate vials, the significance of genetic variation among wild-type lines for response to the drugs is difficult to establish. An interval mapping design was also used to map two or three QTL for each behavioral trait in a set of recombinant inbred lines derived from the laboratory stocks Oregon-R and 2b.


2021
Vol 22 (1)
Author(s):
Alvaro N. Barbeira
Rodrigo Bonazzola
Eric R. Gamazon
Yanyu Liang
...  

Abstract. The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.


2018
Vol 60 (1)
pp. 17-28
Author(s):
Yasmeen Niazi
Hauke Thomsen
Bozena Smolkova
Ludmila Vodickova
Sona Vodenkova
...  

2022
Author(s):  
Tiago da Silva Ribeiro ◽  
José A Galván ◽  
John E Pool

Local adaptation can lead to elevated genetic differentiation at the targeted genetic variant and nearby sites. Selective sweeps come in different forms, and depending on the initial and final frequencies of a favored variant, very different patterns of genetic variation may be produced. If local selection favors an existing variant that had already recombined onto multiple genetic backgrounds, then the width of elevated genetic differentiation (high FST) may be too narrow to detect using a typical windowed genome scan, even if the targeted variant becomes highly differentiated. We therefore used a simulation approach to investigate the power of SNP-level FST (specifically, the maximum SNP FST value within a window) to detect diverse scenarios of local adaptation, and compared it against whole-window FST and the Comparative Haplotype Identity statistic. We found that SNP FST had superior power to detect complete or mostly complete soft sweeps, but lesser power than window-wide statistics to detect partial hard sweeps. To investigate the relative enrichment and nature of SNP FST outliers from real data, we applied the two FST statistics to a panel of Drosophila melanogaster populations. We found that SNP FST had a genome-wide enrichment of outliers compared to demographic expectations, and though it yielded a lesser enrichment than window FST, it detected mostly unique outlier genes and functional categories. Our results suggest that SNP FST is highly complementary to typical window-based approaches for detecting local adaptation, and merits inclusion in future genome scans and methodologies.
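The contrast between the maximum SNP FST within a window and whole-window FST can be sketched as follows. This example uses Hudson's FST estimator with the window value computed as a ratio of sums; that estimator is an assumption made for illustration, since the abstract does not state which one the authors used. The function names, allele frequencies and sample sizes are likewise hypothetical.

```python
def hudson_components(p1, p2, n1, n2):
    """Numerator and denominator of Hudson's FST for one biallelic SNP.

    p1, p2 are sample allele frequencies; n1, n2 are the numbers of
    sampled chromosomes in each population.
    """
    numerator = (
        (p1 - p2) ** 2
        - p1 * (1 - p1) / (n1 - 1)
        - p2 * (1 - p2) / (n2 - 1)
    )
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator, denominator


def window_fst_stats(freqs1, freqs2, n1, n2):
    """Return (max_snp_fst, window_fst) for the SNPs in one window."""
    components = [
        hudson_components(p1, p2, n1, n2) for p1, p2 in zip(freqs1, freqs2)
    ]
    # Skip sites fixed for the same allele in both samples (zero denominator).
    informative = [(num, den) for num, den in components if den > 0]
    if not informative:
        raise ValueError("no informative SNPs in window")
    max_snp_fst = max(num / den for num, den in informative)
    window_fst = sum(num for num, _ in informative) / sum(
        den for _, den in informative
    )
    return max_snp_fst, window_fst


# One strongly differentiated SNP among four undifferentiated neighbours.
pop1 = [0.5, 0.5, 0.9, 0.5, 0.5]
pop2 = [0.5, 0.5, 0.1, 0.5, 0.5]
print(window_fst_stats(pop1, pop2, n1=50, n2=50))
```

In this toy window the single differentiated SNP has a high FST of its own, while the window-wide value is diluted by its undifferentiated neighbours. This mirrors the narrow-signal scenario that the abstract describes for soft sweeps on variants already present on multiple genetic backgrounds.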

