Mutagenesis of human genomes by endogenous mobile elements on a population scale

2021 ◽  
Author(s):  
Nelson T. Chuang ◽  
Eugene J. Gardner ◽  
Diane M. Terry ◽  
Jonathan Crabtree ◽  
Anup A. Mahurkar ◽  
...  

Several large-scale Illumina whole-genome sequencing (WGS) and whole-exome sequencing (WES) projects have emerged recently that have provided exceptional opportunities to discover mobile element insertions (MEIs) and study the impact of these MEIs on human genomes. However, these projects also have presented major challenges with respect to the scalability and computational costs associated with performing MEI discovery on tens or even hundreds of thousands of samples. To meet these challenges, we have developed a more efficient and scalable version of our mobile element locator tool (MELT) called CloudMELT. We then used MELT and CloudMELT to perform MEI discovery in 57,919 human genomes and exomes, leading to the discovery of 104,350 nonredundant MEIs. We leveraged this collection (1) to examine potentially active L1 source elements that drive the mobilization of new Alu, L1, and SVA MEIs in humans; (2) to examine the population distributions and subfamilies of these MEIs; and (3) to examine the mutagenesis of GENCODE genes, ENCODE-annotated features, and disease genes by these MEIs. Our study provides new insights on the L1 source elements that drive MEI mutagenesis and brings forth a better understanding of how this mutagenesis impacts human genomes.

2019 ◽  
Vol 35 (22) ◽  
pp. 4782-4787 ◽  
Author(s):  
David E Larson ◽  
Haley J Abel ◽  
Colby Chiang ◽  
Abhijit Badve ◽  
Indraniel Das ◽  
...  

Summary: Large-scale human genetics studies are now employing whole genome sequencing with the goal of conducting comprehensive trait mapping analyses of all forms of genome variation. However, methods for structural variation (SV) analysis have lagged far behind those for smaller scale variants, and there is an urgent need to develop more efficient tools that scale to the size of human populations. Here, we present a fast and highly scalable software toolkit (svtools) and cloud-based pipeline for assembling high quality SV maps (including deletions, duplications, mobile element insertions, inversions and other rearrangements) in many thousands of human genomes. We show that this pipeline achieves similar variant detection performance to established per-sample methods (e.g. LUMPY), while providing fast and affordable joint analysis at the scale of ≥100 000 genomes. These tools will help enable the next generation of human genetics studies. Availability and implementation: svtools is implemented in Python and freely available (MIT) from https://github.com/hall-lab/svtools. Supplementary information: Supplementary data are available at Bioinformatics online.


2016 ◽  
Author(s):  
Paolo Devanna ◽  
Xiaowei Sylvia Chen ◽  
Joses Ho ◽  
Dario Gajewski ◽  
Alessandro Gialluisi ◽  
...  

Next generation sequencing has opened the way for the large scale interrogation of cohorts at the whole exome, or whole genome level. Currently, the field largely focuses on potential disease causing variants that fall within coding sequences and that are predicted to cause protein sequence changes, generally discarding non-coding variants. However, non-coding DNA makes up ~98% of the genome and contains a range of sequences essential for controlling the expression of protein coding genes. Thus, potentially causative non-coding variation is currently being overlooked. To address this, we have designed an approach to assess variation in one class of non-coding regulatory DNA: the 3′UTRome. Variants in the 3′UTR region of genes are of particular interest because 3′UTRs are responsible for modulating protein expression levels via their interactions with microRNAs. Furthermore, they are amenable to large scale analysis as 3′UTR-microRNA interactions are based on complementary base pairing and as such can be predicted in silico at the genome-wide level. We report a strategy for identifying and functionally testing variants in microRNA binding sites within the 3′UTRome and demonstrate the efficacy of this pipeline in a cohort of language impaired children. Using whole exome sequence data from 43 probands, we extracted variants that lay within 3′UTR microRNA binding sites. We identified a common variant (SNP) in a microRNA binding site and found this SNP to be associated with an endophenotype of language impairment (non-word repetition). We showed that this variant disrupted microRNA regulation in cells and was linked to altered gene expression in the brain, suggesting it may represent a risk factor contributing to SLI. This work demonstrates that biologically relevant variants are currently being under-investigated despite the wealth of next-generation sequencing data available and presents a simple strategy for interrogating non-coding regions of the genome. We propose that this strategy should be routinely applied to whole exome and whole genome sequence data in order to broaden our understanding of how non-coding genetic variation underlies complex phenotypes such as neurodevelopmental disorders.


2020 ◽  
Vol 13 (5) ◽  
pp. 504-514
Author(s):  
Zuhair N. Al-Hassnan ◽  
Abdulrahman Almesned ◽  
Sahar Tulbah ◽  
Ali Alakhfash ◽  
Faten Alhadeq ◽  
...  

Background: Childhood-onset cardiomyopathy is a heterogeneous group of conditions the cause of which is largely unknown. The influence of consanguinity on the genetics of cardiomyopathy has not been addressed at a large scale. Methods: To unravel the genetic cause of childhood-onset cardiomyopathy in a consanguineous population, a categorized approach was adopted. Cases with childhood-onset cardiomyopathy were consecutively recruited. Based on the likelihood of founder mutation and on the clinical diagnosis, genetic test was categorized to either (1) targeted genetic test with targeted mutation test, single-gene test, or multigene panel for Noonan syndrome, or (2) untargeted genetic test with whole-exome sequencing or whole-genome sequencing. Several bioinformatics tools were used to filter the variants. Results: Two-hundred five unrelated probands with various forms of cardiomyopathy were evaluated. The median age of presentation was 10 months. In 30.2% (n=62), targeted genetic test had a yield of 82.7% compared with 33.6% for whole-exome sequencing/whole-genome sequencing (n=143) giving an overall yield of 53.7%. Strikingly, 96.4% of the variants were homozygous, 9% of which were found in 4 dominant genes. Homozygous variants were also detected in 7 novel candidates ( ACACB, AASDH, CASZ1, FLII, RHBDF1, RPL3L, ULK1 ). Conclusions: Our work demonstrates the impact of consanguinity on the genetics of childhood-onset cardiomyopathy, the value of adopting a categorized population-sensitive genetic approach, and the opportunity of uncovering novel genes. Our data suggest that if a founder mutation is not suspected, adopting whole-exome sequencing/whole-genome sequencing as a first-line test should be considered.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Ismael A. Vergara ◽  
Christopher P. Mintoff ◽  
Shahneen Sandhu ◽  
Lachlan McIntosh ◽  
Richard J. Young ◽  
...  

Although melanoma is initiated by acquisition of point mutations and limited focal copy number alterations in melanocytes-of-origin, the nature of genetic changes that characterise lethal metastatic disease is poorly understood. Here, we analyze the evolution of human melanoma progressing from early to late disease in 13 patients by sampling their tumours at multiple sites and times. Whole exome and genome sequencing data from 88 tumour samples reveals only limited gain of point mutations generally, with net mutational loss in some metastases. In contrast, melanoma evolution is dominated by whole genome doubling and large-scale aneuploidy, in which widespread loss of heterozygosity sculpts the burden of point mutations, neoantigens and structural variants even in treatment-naïve and primary cutaneous melanomas in some patients. These results imply that dysregulation of genomic integrity is a key driver of selective clonal advantage during melanoma progression.


2021 ◽  
Vol 3 ◽  
Author(s):  
Raphael Payet-Burin ◽  
Mikkel Kromman ◽  
Silvio J. Pereira-Cardenal ◽  
Kenneth M. Strzepek ◽  
Peter Bauer-Gottwein

Perfect foresight hydroeconomic optimization models are tools to evaluate impacts of water infrastructure investments and policies considering complex system interlinkages. However, when assuming perfect foresight, optimal management decisions are found assuming perfect knowledge of climate and runoff, which might bias the economic evaluation of investments and policies. We investigate the impacts of assuming perfect foresight by using Model Predictive Control (MPC) as an alternative. We apply MPC in WHAT-IF, a hydroeconomic optimization model, for two study cases: a synthetic setup inspired by the Nile River, and a large-scale investment problem on the Zambezi River Basin considering the water–energy–food nexus. We validate the MPC framework against Stochastic Dynamic Programming and observe more realistic modeled reservoir operation compared to perfect foresight, especially regarding anticipation of spills and droughts. We find that the impact of perfect foresight on total system benefits remains small (<2%). However, when evaluating investments and policies using with-without analysis, perfect foresight is found to overestimate or underestimate values of investments by more than 20% in some scenarios. As the importance of different effects varies between scenarios, it is difficult to find general, case-independent guidelines predicting whether perfect foresight is a reasonable assumption. However, we find that the uncertainty linked to climate change in our study cases has more significant impacts than the assumption of perfect foresight. Hence, we recommend MPC for the economic evaluation of investments and policies; however, under high uncertainty of future climate, the increased computational costs of MPC must be traded off against the computational costs of exhaustive scenario exploration.


2014 ◽  
Vol 42 (3) ◽  
pp. 344-355 ◽  
Author(s):  
Gail E. Henderson ◽  
Susan M. Wolf ◽  
Kristine J. Kuczynski ◽  
Steven Joffe ◽  
Richard R. Sharp ◽  
...  

Large-scale sequencing tests, including whole-exome and whole-genome sequencing (WES/WGS), are rapidly moving into clinical use. Sequencing is already being used clinically to identify therapeutic opportunities for cancer patients who have run out of conventional treatment options, to help diagnose children with puzzling neurodevelopmental conditions, and to clarify appropriate drug choices and dosing in individuals. To evaluate and support clinical applications of these technologies, the National Human Genome Research Institute (NHGRI) and National Cancer Institute (NCI) have funded studies on clinical and research sequencing under the Clinical Sequencing Exploratory Research (CSER) program as well as studies on return of results (RoR). Most of these studies use sequencing in real-world clinical settings and collect data on both the application of sequencing and the impact of receiving genomic findings on study participants. They are occurring in the context of controversy over how to obtain consent for exome and genome sequencing.


2021 ◽  
pp. practneurol-2020-002561
Author(s):  
Huw R Morris ◽  
Henry Houlden ◽  
James Polke

The costs of whole-genome sequencing have rapidly decreased, and it is being increasingly deployed in large-scale clinical research projects and introduced into routine clinical care. This will lead to rapid diagnoses for patients with genetic disease but also introduces uncertainty because of the diversity of human genomes and the potential difficulties in annotating new genetic variants for individual patients and families. Here we outline the steps in organising whole-genome sequencing for patients in the neurology clinic and emphasise that close liaison between the clinician and the laboratory is essential.


2018 ◽  
Author(s):  
Sulev Reisberg ◽  
Kristi Krebs ◽  
Mart Kals ◽  
Reedik Mägi ◽  
Kristjan Metsalu ◽  
...  

Purpose: Biomedical databases combining electronic medical records, phenotypic and genomic data constitute a powerful resource for the personalization of treatment. To leverage the wealth of information provided, algorithms are required that systematically translate the contained information into treatment recommendations based on existing genotype-phenotype associations. Methods: We developed and tested algorithms for translation of pre-existing genotype data of over 44,000 participants of the Estonian biobank into pharmacogenetic recommendations. We compared the results obtained by whole genome sequencing, whole exome sequencing and genotyping using microarrays, and evaluated the impact of pharmacogenetic reporting based on drug prescription statistics in the Nordic countries and Estonia. Results: Our most striking result was that the performance of genotyping arrays is similar to that of whole genome sequencing, whereas exome sequencing is not suitable for pharmacogenetic predictions. Interestingly, 99.8% of all assessed individuals had a genotype associated with increased risks to at least one medication, and thereby the implementation of pharmacogenetic recommendations based on genotyping affects at least 50 daily drug doses per 1000 inhabitants. Conclusion: We find that microarrays are a cost-effective solution for creating pre-emptive pharmacogenetic reports, and with slight modifications, existing databases can be applied for automated pharmacogenetic decision support for clinicians.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Torrin L. McDonald ◽  
Weichen Zhou ◽  
Christopher P. Castro ◽  
Camille Mumm ◽  
Jessica A. Switzenberg ◽  
...  

Mobile element insertions (MEIs) are repetitive genomic sequences that contribute to genetic variation and can lead to genetic disorders. Targeted and whole-genome approaches using short-read sequencing have been developed to identify reference and non-reference MEIs; however, the read length hampers detection of these elements in complex genomic regions. Here, we pair Cas9-targeted nanopore sequencing with computational methodologies to capture active MEIs in human genomes. We demonstrate parallel enrichment for distinct classes of MEIs, averaging 44% of reads on-targeted signals and exhibiting a 13.4-54x enrichment over whole-genome approaches. We show an individual flow cell can recover most MEIs (97% L1Hs, 93% AluYb, 51% AluYa, 99% SVA_F, and 65% SVA_E). We identify seventeen non-reference MEIs in GM12878 overlooked by modern, long-read analysis pipelines, primarily in repetitive genomic regions. This work introduces the utility of nanopore sequencing for MEI enrichment and lays the foundation for rapid discovery of elusive, repetitive genetic elements.

