An open resource of structural variation for medical and population genetics

2019 ◽  
Author(s):  
Ryan L. Collins ◽  
Harrison Brand ◽  
Konrad J. Karczewski ◽  
Xuefang Zhao ◽  
Jessica Alföldi ◽  
...  

SUMMARY: Structural variants (SVs) rearrange large segments of the genome and can have profound consequences for evolution and human diseases. As national biobanks, disease association studies, and clinical genetic testing grow increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) have become integral for interpreting genetic variation. To date, no large-scale reference maps of SVs exist from high-coverage sequencing comparable to those available for point mutations in protein-coding genes. Here, we constructed a reference atlas of SVs across 14,891 genomes from diverse global populations (54% non-European) as a component of gnomAD. We discovered a rich landscape of 433,371 distinct SVs, including 5,295 multi-breakpoint complex SVs across 11 mutational subclasses, and examples of localized chromosome shattering, as in chromothripsis. The average individual harbored 7,439 SVs, which accounted for 25-29% of all rare protein-truncating events per genome. We found strong correlations between constraint against damaging point mutations and rare SVs that both disrupt and duplicate protein-coding sequence, suggesting intolerance to reciprocal dosage alterations for a subset of tightly regulated genes. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than any effect on noncoding SVs. Finally, we benchmarked carrier rates for medically relevant SVs, finding very large (≥1Mb) rare SVs in 3.8% of genomes (~1:26 individuals) and clinically reportable incidental SVs in 0.18% of genomes (~1:556 individuals). These data have been integrated directly into the gnomAD browser (https://gnomad.broadinstitute.org) and will have broad utility for population genetics, disease association, and diagnostic screening.

This chapter focuses on the Human Genome Project (HGP), which determined that humans have between 20,000 and 25,000 protein-coding genes and that only about 1.5% of the genome codes for proteins, rRNA, and tRNA. The remainder, once referred to as “junk DNA,” is today known to be crucial to the survival of the species. Research indicates that genes are not contiguous: some genes occur within the introns of other genes, and some overlap with each other on the same or on different DNA strands, sharing coding and/or regulatory elements. In addition, the vast majority of human genes undergo alternative splicing, so that a single gene can encode different proteins. Advances in genomics and gene sequencing technologies have created exceptional opportunities for the delivery of personalized medical care. Clinical genetic testing has been helpful in identifying gene variants associated with risks for a number of diseases and health conditions.


2021 ◽  
Author(s):  
Jamie M Ellingford ◽  
Joo Wook Ahn ◽  
Richard D Bagnall ◽  
Diana Baralle ◽  
Stephanie Barton ◽  
...  

Purpose: The majority of clinical genetic testing focuses almost exclusively on regions of the genome that directly encode proteins. The important role of variants in non-coding regions in penetrant disease is, however, increasingly being demonstrated, and the use of whole genome sequencing in clinical diagnostic settings is rising across a large range of genetic disorders. Despite this, there is no existing guidance on how current guidelines designed primarily for variants in protein-coding regions should be adapted for variants identified in other genomic contexts. Methods: We convened a panel of clinical and research scientists with wide-ranging expertise in clinical variant interpretation, with specific experience in variants within non-coding regions. This panel discussed and refined an initial draft of the guidelines which were then extensively tested and reviewed by external groups. Results: We discuss considerations specifically for variants in non-coding regions of the genome. We outline how to define candidate regulatory elements, highlight examples of mechanisms through which non-coding region variants can lead to penetrant monogenic disease, and outline how existing guidelines can be adapted for these variants. Conclusion: These recommendations aim to increase the number and range of non-coding region variants that can be clinically interpreted, which, together with a compatible phenotype, can lead to new diagnoses and catalyse the discovery of novel disease mechanisms.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Shumaila Sayyab ◽  
Anders Lundmark ◽  
Malin Larsson ◽  
Markus Ringnér ◽  
Sara Nystedt ◽  
...  

Abstract: The mechanisms driving clonal heterogeneity and evolution in relapsed pediatric acute lymphoblastic leukemia (ALL) are not fully understood. We performed whole genome sequencing of samples collected at diagnosis, relapse(s) and remission from 29 Nordic patients. Somatic point mutations and large-scale structural variants were called using individually matched remission samples as controls, and allelic expression of the mutations was assessed in ALL cells using RNA-sequencing. We observed an increased burden of somatic mutations at relapse, compared to diagnosis, and at second relapse compared to first relapse. In addition to 29 known ALL driver genes, of which nine genes carried recurrent protein-coding mutations in our sample set, we identified putative non-protein coding mutations in regulatory regions of seven additional genes that have not previously been described in ALL. Cluster analysis of hundreds of somatic mutations per sample revealed three distinct evolutionary trajectories during ALL progression from diagnosis to relapse. The evolutionary trajectories provide insight into the mutational mechanisms leading to relapse in ALL and could offer biomarkers for improved risk prediction in individual patients.


eLife ◽  
2014 ◽  
Vol 3 ◽  
Author(s):  
Vladislava Chalei ◽  
Stephen N Sansom ◽  
Lesheng Kong ◽  
Sheena Lee ◽  
Juan F Montiel ◽  
...  

Many intergenic long noncoding RNA (lncRNA) loci regulate the expression of adjacent protein coding genes. Less clear is whether intergenic lncRNAs commonly regulate transcription by modulating chromatin at genomically distant loci. Here, we report both genomically local and distal RNA-dependent roles of Dali, a conserved central nervous system expressed intergenic lncRNA. Dali is transcribed downstream of the Pou3f3 transcription factor gene and its depletion disrupts the differentiation of neuroblastoma cells. Locally, Dali transcript regulates transcription of the Pou3f3 locus. Distally, it preferentially targets active promoters and regulates expression of neural differentiation genes, in part through physical association with the POU3F3 protein. Dali interacts with the DNMT1 DNA methyltransferase in mouse and human and regulates DNA methylation status of CpG island-associated promoters in trans. These results demonstrate, for the first time, that a single intergenic lncRNA controls the activity and methylation of genomically distal regulatory elements to modulate large-scale transcriptional programmes.


2019 ◽  
Author(s):  
Dimitrios Vitsios ◽  
Slavé Petrovski

Abstract: Access to large-scale genomics datasets has increased the utility of hypothesis-free genome-wide analyses that result in candidate lists of genes. Often these analyses highlight several gene signals that might contribute to pathogenesis but are insufficiently powered to reach experiment-wide significance. This often triggers a process of laborious evaluation of highly-ranked genes through manual inspection of various public knowledge resources to triage those considered sufficiently interesting for deeper investigation. Here, we introduce a novel multi-dimensional, multi-step machine learning framework to objectively and more holistically assess biological relevance of genes to disease studies, by relying on a plethora of gene-associated annotations. We developed mantis-ml to serve as an automated machine learning (AutoML) framework, following a stochastic semi-supervised learning approach to rank known and novel disease-associated genes through iterative training and prediction sessions of random balanced datasets across the protein-coding exome (n=18,626 genes). We applied this framework on a range of disease-specific areas and as a generic disease likelihood estimator, achieving an average Area Under Curve (AUC) prediction performance of 0.85. Critically, to demonstrate applied utility on exome-wide association studies, we overlapped mantis-ml disease-specific predictions with data from published cohort-level association studies. We retrieved statistically significant enrichment of high mantis-ml predictions among the top-ranked genes from hypothesis-free cohort-level statistics (p<0.05), suggesting the capture of true prioritisation signals. We believe that mantis-ml is a novel easy-to-use tool to support objectively triaging gene discovery and overall enhancing our understanding of complex genotype-phenotype associations.


2019 ◽  
Vol 35 (19) ◽  
pp. 3576-3583 ◽  
Author(s):  
Chong Wu ◽  
Wei Pan

Motivation: Most trait-associated genetic variants identified in genome-wide association studies (GWASs) are located in non-coding regions of the genome and thought to act through their regulatory roles. Results: To account for enriched association signals in DNA regulatory elements, we propose a novel and general gene-based association testing strategy that integrates enhancer-target gene pairs and methylation quantitative trait locus data with GWAS summary results; it aims to both boost statistical power for new discoveries and enhance mechanistic interpretability of any new discovery. By reanalyzing two large-scale schizophrenia GWAS summary datasets, we demonstrate that the proposed method could identify some significant and novel genes (containing no genome-wide significant SNPs nearby) that would have been missed by other competing approaches, including the standard and some integrative gene-based association methods, such as one incorporating enhancer-target gene pairs and one integrating expression quantitative trait loci. Availability and implementation: Software: wuchong.org/egmethyl.html. Supplementary information: Supplementary data are available at Bioinformatics online.


2014 ◽  
Vol 4 (1) ◽  
Author(s):  
Bart J. G. Broeckx ◽  
Frank Coopman ◽  
Geert E. C. Verhoeven ◽  
Valérie Bavegems ◽  
Sarah De Keulenaer ◽  
...  

Abstract: Whole exome sequencing is a technique that aims to selectively sequence all exons of protein-coding genes. A canine whole exome sequencing enrichment kit was designed based on the latest canine reference genome (build 3.1.72). Its performance was tested by sequencing 2 exome captures, each consisting of 4 pre-capture pooled, barcoded Illumina libraries on an Illumina HiSeq 2500. At an average sequencing depth of 102x, 83 to 86% of the target regions were completely sequenced with a minimum coverage of five and 90% of the reads mapped on the target regions. Additionally, it is shown that the reproducibility within and between captures is high and that pooling four samples per capture is a valid option. Overall, we have demonstrated the strong performance of this WES enrichment kit and are confident it will be a valuable tool in future disease association studies.


2019 ◽  
Vol 5 (5) ◽  
pp. eaaw0946 ◽  
Author(s):  
Enrique Lin-Shiao ◽  
Yemin Lan ◽  
Julia Welzenbach ◽  
Katherine A. Alexander ◽  
Zhen Zhang ◽  
...  

The transcription factor p63 is a key mediator of epidermal development. Point mutations in p63 in patients lead to developmental defects, including orofacial clefting. To date, knowledge on how pivotal the role of p63 is in human craniofacial development is limited. Using an inducible transdifferentiation model, combined with epigenomic sequencing and multicohort meta-analysis of genome-wide association studies data, we show that p63 establishes enhancers at craniofacial development genes to modulate their transcription. Disease-specific substitution mutation in the DNA binding domain or sterile alpha motif protein interaction domain of p63, respectively, eliminates or reduces establishment of these enhancers. We show that enhancers established by p63 are highly enriched for single-nucleotide polymorphisms associated with nonsyndromic cleft lip ± cleft palate (CL/P). These orthogonal approaches indicate a strong molecular link between p63 enhancer function and CL/P, illuminating molecular mechanisms underlying this developmental defect and revealing vital regulatory elements and new candidate causative genes.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 121 ◽  
Author(s):  
Enrico Ferrero

The identification of therapeutic targets is a critical step in the research and development of new drugs, with several drug discovery programmes failing because of a weak linkage between target and disease. Genome-wide association studies and large-scale gene expression experiments are providing insights into the biology of several common and complex diseases, but the complexity of transcriptional regulation mechanisms often limits our understanding of how genetic variation can influence changes in gene expression. Several initiatives in the field of regulatory genomics are aiming to close this gap by systematically identifying and cataloguing regulatory elements such as promoters and enhancers across different tissues and cell types. In this Bioconductor workflow, we will explore how different types of regulatory genomic data can be used for the functional interpretation of disease-associated variants and for the prioritisation of gene lists from gene expression experiments.


Gut ◽  
2019 ◽  
Vol 68 (5) ◽  
pp. 928-941 ◽  
Author(s):  
Claartje Aleid Meddens ◽  
Amy Catharina Johanna van der List ◽  
Edward Eelco Salomon Nieuwenhuis ◽  
Michal Mokry

Genome-wide association studies have identified over 200 loci associated with IBD. We and others have recently shown that, in addition to variants in protein-coding genes, the majority of the associated loci are related to DNA regulatory elements (DREs). These findings add a dimension to the already complex genetic background of IBD. In this review we summarise the existing evidence on the role of DREs in IBD. We discuss how epigenetic research can be used in candidate gene approaches that take non-coding variants into account and can help to pinpoint the essential pathways and cell types in the pathogenesis of IBD. Despite the increased level of genetic complexity, these findings can contribute to novel therapeutic options that target transcription factor binding and enhancer activity. Finally, we summarise the future directions and challenges of this emerging field.

