Genome-wide analysis of mobile element insertions in human genomes

Author(s):  
Yiwei Niu ◽  
Xueyi Teng ◽  
Yirong Shi ◽  
Yanyan Li ◽  
Yiheng Tang ◽  
...  

Abstract Mobile element insertions (MEIs) are a major class of structural variants (SVs) and have been linked to many human genetic disorders, including hemophilia, neurofibromatosis, and various cancers. However, human MEI resources from large-scale genome sequencing are still lacking compared to those for SNPs and SVs. Here, we report a comprehensive map of 36,699 non-reference MEIs constructed from 5,675 genomes, comprising 2,998 Chinese samples (∼26.2X, NyuWa) and 2,677 samples from the 1000 Genomes Project (∼7.4X, 1KGP). We discovered that LINE-1 insertions were highly enriched at centromere regions, implying the role of chromosome context in retroelement insertion. After functional annotation, we estimated that MEIs are responsible for about 9.3% of all protein-truncating events per genome. Finally, we built a companion database named HMEID for public use. This resource represents the latest and largest genome-wide study on MEIs and will have broad utility for exploration of human MEI findings.

PLoS ONE ◽  
2017 ◽  
Vol 12 (1) ◽  
pp. e0167742 ◽  
Author(s):  
Paul S. de Vries ◽  
Maria Sabater-Lleal ◽  
Daniel I. Chasman ◽  
Stella Trompet ◽  
Tarunveer S. Ahluwalia ◽  
...  

2020 ◽  
Vol 287 (1930) ◽  
pp. 20200712 ◽  
Author(s):  
Elahe Parvizi ◽  
Ceridwen I. Fraser ◽  
Ludovic Dutoit ◽  
Dave Craw ◽  
Jonathan M. Waters

Theory suggests that catastrophic earth-history events can drive rapid biological evolution, but empirical evidence for such processes is scarce. Destructive geological events such as earthquakes can represent large-scale natural experiments for inferring such evolutionary processes. We capitalized on a major prehistoric (800 yr BP) geological uplift event affecting a southern New Zealand coastline to test for the lasting genomic impacts of disturbance. Genome-wide analyses of three co-distributed keystone kelp taxa revealed that post-earthquake recolonization drove the evolution of novel, large-scale intertidal spatial genetic ‘sectors’ which are tightly linked to geological fault boundaries. Demographic simulations confirmed that, following widespread extirpation, parallel expansions into newly vacant habitats rapidly restructured genome-wide diversity. Interspecific differences in recolonization mode and tempo reflect differing ecological constraints relating to habitat choice and dispersal capacity among taxa. This study highlights the rapid and enduring evolutionary effects of catastrophic ecosystem disturbance and reveals the key role of range expansion in reshaping spatial genetic patterns.


2019 ◽  
Vol 35 (22) ◽  
pp. 4782-4787 ◽  
Author(s):  
David E Larson ◽  
Haley J Abel ◽  
Colby Chiang ◽  
Abhijit Badve ◽  
Indraniel Das ◽  
...  

Abstract Summary: Large-scale human genetics studies are now employing whole genome sequencing with the goal of conducting comprehensive trait mapping analyses of all forms of genome variation. However, methods for structural variation (SV) analysis have lagged far behind those for smaller scale variants, and there is an urgent need to develop more efficient tools that scale to the size of human populations. Here, we present a fast and highly scalable software toolkit (svtools) and cloud-based pipeline for assembling high quality SV maps (including deletions, duplications, mobile element insertions, inversions and other rearrangements) in many thousands of human genomes. We show that this pipeline achieves similar variant detection performance to established per-sample methods (e.g. LUMPY), while providing fast and affordable joint analysis at the scale of ≥100 000 genomes. These tools will help enable the next generation of human genetics studies. Availability and implementation: svtools is implemented in Python and freely available (MIT) from https://github.com/hall-lab/svtools. Supplementary information: Supplementary data are available at Bioinformatics online.


2022 ◽  
Author(s):  
Linyi Zhang ◽  
Samridhi Chaturvedi ◽  
Chris Nice ◽  
Lauren Lucas ◽  
Zachariah Gompert

Structural variants (SVs) can promote speciation by directly causing reproductive isolation or by suppressing recombination across large genomic regions. Whereas examples of each mechanism have been documented, systematic tests of the role of SVs in speciation are lacking. Here, we take advantage of long-read (Oxford nanopore) whole-genome sequencing and a hybrid zone between two Lycaeides butterfly taxa (L. melissa and Jackson Hole Lycaeides) to comprehensively evaluate genome-wide patterns of introgression for SVs and relate these patterns to hypotheses about speciation. We found >100,000 SVs segregating within or between the two hybridizing species. SVs and SNPs exhibited similar levels of genetic differentiation between species, with the exception of inversions, which were more differentiated. We detected credible variation in patterns of introgression among SV loci in the hybrid zone, with 562 of 1419 ancestry-informative SVs exhibiting genomic clines that deviated from null expectations based on genome-average ancestry. Overall, hybrids exhibited a directional shift towards Jackson Hole Lycaeides ancestry at SV loci, consistent with the hypothesis that these loci experienced more selection on average than SNP loci. Surprisingly, we found that deletions, rather than inversions, showed the highest skew towards excess introgression from Jackson Hole Lycaeides. Excess Jackson Hole Lycaeides ancestry in hybrids was also especially pronounced for Z-linked SVs and inversions containing many genes. In conclusion, our results show that SVs are ubiquitous and suggest that SVs in general, but especially deletions, might contribute disproportionately to hybrid fitness and thus (partial) reproductive isolation.


2021 ◽  
Vol 49 (3) ◽  
pp. 1497-1516
Author(s):  
Wilfried M Guiblet ◽  
Marzia A Cremona ◽  
Robert S Harris ◽  
Di Chen ◽  
Kristin A Eckert ◽  
...  

Abstract Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis with implications to evolution and genetic diseases.


Author(s):  
Arunabha Majumdar ◽  
Preksha Patel ◽  
Bogdan Pasaniuc ◽  
Roel A. Ophoff

Abstract In genetic studies of psychiatric disorders in the pre-genome-wide association study (GWAS) era, one of the most commonly studied loci is the serotonin transporter (SLC6A4) promoter polymorphism, a 43-base-pair insertion/deletion polymorphism in the promoter region (5-HTTLPR). The genetic association signals between 5-HTTLPR and psychiatric phenotypes, however, have been inconsistent across many studies. Since the polymorphism cannot be tested via available SNP arrays, we had previously proposed an efficient machine learning algorithm to predict the genotypes of 5-HTTLPR based on the genotypes of eight nearby SNPs, which requires access to individual-level genotype and phenotype data. To utilize the advantage of publicly available GWAS summary statistics obtained from studies with very large sample sizes, we develop a GWAS summary-statistics-based approach for testing the variable number of tandem repeat (VNTR) associations with various phenotypes. We first cross-verify the accuracy of the summary-statistics-based approach for 61 phenotypes in the UK Biobank. Since we observed a strong similarity between the predicted individual-level 5-HTTLPR genotype-based approach and the summary-statistics-based approach, we applied our method to the available neurobehavioral GWAS summary statistics data obtained from large-scale GWAS. We found no genome-wide significant evidence for association between 5-HTTLPR and any of the neurobehavioral traits. We did observe, however, genome-wide significant evidence for association between this locus and human adult height, BMI, and total cholesterol. Our summary-statistics-based approach provides a systematic way to examine the role of VNTRs and related types of genetic polymorphisms in disease risk and trait susceptibility of phenotypes for which large-scale GWAS summary statistics data are available.


2020 ◽  
Vol 23 (2) ◽  
pp. 135-136
Author(s):  
Cynthia Bulik ◽  
Martin Kennedy ◽  
Tracey Wade

Abstract Identification of genetic variants associated with eating disorders is underway. The Anorexia Nervosa Genetics Initiative, an initiative of the Klarman Family Foundation, has contributed to advancing the field, yielding a large-scale genome-wide association study published in Nature Genetics. Eight genetic variants significantly associated with anorexia nervosa were identified, along with patterns of genetic correlations that suggest both psychiatric and metabolic origins of this serious and life-threatening illness. This article details the role of Professor Nick Martin in contributing to this important collaboration.


2019 ◽  
Author(s):  
Clement Goubert ◽  
Jainy Thomas ◽  
Lindsay M. Payer ◽  
Jeffrey M. Kidd ◽  
Julie Feusier ◽  
...  

Abstract Alu retrotransposons account for more than 10% of the human genome, and insertions of these elements create structural variants segregating in human populations. Such polymorphic Alu are powerful markers to understand population structure, and they represent variants that can greatly impact genome function, including gene expression. Accurate genotyping of Alu and other mobile elements has been challenging. Indeed, we found that Alu genotypes previously called for the 1000 Genomes Project are sometimes erroneous, which poses significant problems for phasing these insertions with other variants that comprise the haplotype. To ameliorate this issue, we introduce a new pipeline, TypeTE, which genotypes Alu insertions from whole-genome sequencing data. Starting from a list of polymorphic Alus, TypeTE identifies the hallmarks (poly-A tail and target site duplication) and orientation of Alu insertions using local re-assembly to reconstruct presence and absence alleles. Genotype likelihoods are then computed after re-mapping sequencing reads to the reconstructed alleles. Using a ‘gold standard’ set of PCR-based genotyping of >200 loci, we show that TypeTE improves genotype accuracy from 83% to 92% in the 1000 Genomes dataset. TypeTE can be readily adapted to other retrotransposon families and brings a valuable toolbox addition for population genomics.
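To illustrate the general idea of computing genotype likelihoods from reads mapped to presence and absence alleles, here is a minimal sketch. It is not TypeTE's actual implementation: the simple binomial read model, the function names `genotype_likelihoods` and `call_genotype`, and the `error_rate` default are all assumptions made for this example.

```python
import math


def genotype_likelihoods(absence_reads, presence_reads, error_rate=0.01):
    """Normalized likelihoods for genotypes 0/0, 0/1 and 1/1.

    Assumes (for illustration only) a binomial model in which each read
    independently supports the insertion (presence) allele with a
    probability that depends on the genotype.
    """
    # Assumed per-genotype probability that a read supports the presence allele.
    p_presence = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    n = absence_reads + presence_reads

    log_l = {}
    for genotype, p in p_presence.items():
        log_l[genotype] = (
            math.log(math.comb(n, presence_reads))
            + presence_reads * math.log(p)
            + absence_reads * math.log(1.0 - p)
        )

    # Subtract the maximum before exponentiating to avoid underflow.
    max_log = max(log_l.values())
    unnormalized = {g: math.exp(v - max_log) for g, v in log_l.items()}
    total = sum(unnormalized.values())
    return {g: v / total for g, v in unnormalized.items()}


def call_genotype(absence_reads, presence_reads, error_rate=0.01):
    """Return the genotype with the highest likelihood."""
    likelihoods = genotype_likelihoods(absence_reads, presence_reads, error_rate)
    return max(likelihoods, key=likelihoods.get)
```

Under these assumptions, a locus with reads split evenly between the two reconstructed alleles is called heterozygous, while reads supporting only one allele yield the corresponding homozygous call.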


2019 ◽  
Author(s):  
Ankit Kumar Pathak ◽  
Ashwin Kumar Jainarayanan ◽  
Samir Kumar Brahmachari

Abstract With large-scale human genome and exome sequencing, much attention has gone into studying the variations present in genomes and their associations with various diseases. Because the emphasis has been on variation, less attention has been given to invariant genes in the population. Here we use 60,706 genomes from the ExAC database to identify population-specific invariant genes. Out of 1,336 total genes drawn from various population-specific invariant genes, 423 were identified to be mostly (allele frequency less than 0.001) invariant across different populations. 46 of these invariant genes showed absolute invariance in all populations. Most of these common invariant genes have homologs in primates, rodents and placental mammals, while 8 of them were unique to the human genome and 3 genes still had unknown functions. Surprisingly, a majority were found to be X-linked and around 50% of these genes were not expressed in any tissues. The functional analysis showed that the invariant genes are involved not only in fundamental functions like transcription and translation but also in various developmental processes. The variations in many of these invariant genes were found to be associated with cancer, developmental diseases and dominant genetic disorders.


2015 ◽  
Vol 2015 ◽  
pp. 1-7 ◽  
Author(s):  
Haihua Bai ◽  
Haiping Liu ◽  
Suyalatu Suyalatu ◽  
Xiaosen Guo ◽  
Shandan Chu ◽  
...  

Large-scale genome-wide association studies (GWAS) have identified approximately 80 single nucleotide polymorphisms (SNPs) conferring susceptibility to type 2 diabetes (T2D). However, most of these loci have not been replicated in diverse populations and much genetic heterogeneity has been observed across ethnic groups. We tested 28 SNPs previously found to be associated with T2D by GWAS in a Mongolian sample of Northern China (497 diagnosed with T2D and 469 controls) for association with T2D and diabetes-related quantitative traits. We replicated T2D association of 11 SNPs, namely, rs7578326 (IRS1), rs1531343 (HMGA2), rs8042680 (PRC1), rs7578597 (THADA), rs1333051 (CDKN2), rs6723108 (TMEM163), rs163182 and rs2237897 (KCNQ1), rs1387153 (MTNR1B), rs243021 (BCL11A), and rs10229583 (PAX4) in our sample. Further, we showed that the risk allele of the strongest T2D-associated SNP in our sample, rs757832 (IRS1), is associated with increased levels of TG. We observed substantial differences in T2D risk allele frequency between the Mongolian sample and the 1000G Caucasian sample for a few SNPs, including rs6723108 (TMEM163), whose risk allele reaches near fixation in the Mongolian sample. Further study of the genetic architecture of these variants in susceptibility to T2D is needed to understand their role in heterogeneous populations.

