Structural variants are a major source of gene expression differences in humans and often affect multiple nearby genes

2021 ◽  
pp. gr.275488.121
Author(s):  
Alexandra J Scott ◽  
Colby Chiang ◽  
Ira M Hall

Structural variants (SVs) are an important source of human genome diversity but their functional effects are not well understood. We mapped 61,668 SVs in 613 individuals with deep genome sequencing data from the GTEx project and measured their effects on gene expression. We estimate that common SVs are causal at 2.66% of eQTLs, which is a 10.5-fold enrichment relative to their abundance in the genome and consistent with prior work using smaller sample sizes. Duplications and deletions were the most impactful variant types, whereas the contribution of mobile element insertions was small (0.12% of eQTLs, 1.9-fold enriched). Multi-tissue analysis of expression effects revealed that gene-altering SVs show significantly more constitutive effects than other variant types, with 62.09% of coding SV-eQTLs active in all tissues with known eQTL activity compared to 23.08% of coding SNV- and indel-eQTLs, while noncoding SVs, SNVs and indels show broadly similar patterns. We also identified 539 rare SVs associated with nearby gene expression outliers. Of these, 62.34% are noncoding SVs that show strong effects on gene expression yet modest enrichment at known regulatory elements, demonstrating that rare noncoding SVs are a major source of gene expression differences but remain difficult to predict from current annotations. Both common and rare noncoding SVs often show strong regional effects on the expression of multiple genes: SV-eQTLs affect an average of 1.82 nearby genes compared to 1.09 genes affected by SNV- and indel-eQTLs, and 21.34% of rare expression-altering SVs show strong effects on 2-9 different genes. We also observe significant effects on rare gene expression changes extending 1 Mb from the SV. This provides a mechanism by which individual noncoding SVs may have strong or pleiotropic effects on phenotypic variation and disease.
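
The enrichment figures above can be unpacked with a little arithmetic. The sketch below assumes that "fold enrichment" is the plain ratio between a variant class's share of causal eQTL variants and its share of all variants; the abstract does not spell out the exact definition, so the function name and this interpretation are illustrative assumptions rather than the authors' procedure.

```python
def implied_variant_share(causal_share_pct: float, fold_enrichment: float) -> float:
    """Back out a variant class's share of all variants (in percent).

    Assumes fold enrichment = (share of causal eQTL variants) / (share of all
    variants). This simple-ratio definition is an assumption for illustration.
    """
    if fold_enrichment <= 0:
        raise ValueError("fold_enrichment must be positive")
    return causal_share_pct / fold_enrichment


# Figures quoted in the abstract: share of eQTLs (percent) and fold enrichment.
all_sv_share = implied_variant_share(2.66, 10.5)  # all common SVs
mei_share = implied_variant_share(0.12, 1.9)      # mobile element insertions

print(f"Implied share of variants that are SVs:  {all_sv_share:.3f}%")
print(f"Implied share of variants that are MEIs: {mei_share:.3f}%")
```

Under this reading, common SVs would make up roughly a quarter of one percent of the variants considered while accounting for 2.66% of causal eQTL variants.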

2021 ◽  
Author(s):  
Alexandra J. Scott ◽  
Colby Chiang ◽  
Ira M. Hall

Abstract Structural variants (SVs) are an important source of human genome diversity but their functional effects are not well understood. We mapped 61,668 SVs in 613 individuals with deep genome sequencing data from the GTEx project and measured their effects on gene expression. We estimate that common SVs are causal at 2.66% of eQTLs, which is a 10.5-fold enrichment relative to their abundance in the genome and consistent with prior work using smaller sample sizes. Duplications and deletions were the most impactful variant types, whereas the contribution of mobile element insertions was surprisingly small (0.12% of eQTLs, 1.9-fold enriched). Multi-tissue analysis of expression effects revealed that gene-altering SVs show significantly more constitutive effects than other variant types, with 62.09% of coding SV-eQTLs active in all tissues with known eQTL activity compared to 23.08% of coding SNV- and indel-eQTLs, whereas noncoding SVs, SNVs and indels show broadly similar patterns. We also identified 539 rare SVs associated with nearby gene expression outliers. Of these, 62.34% are noncoding SVs that show strong effects on gene expression yet modest enrichment at known regulatory elements, demonstrating that rare noncoding SVs are a major source of gene expression differences but remain difficult to predict from current annotations. Remarkably, both common and rare noncoding SVs often show strong regional effects on the expression of multiple genes: SV-eQTLs affect an average of 1.82 nearby genes compared to 1.09 genes affected by SNV- and indel-eQTLs, and 21.34% of rare expression-altering SVs show strong effects on 2-9 different genes. We also observe significant effects on gene expression extending 1 Mb from the SV. This provides a mechanism by which individual noncoding SVs may have strong and/or pleiotropic effects on phenotypic variation and disease.


2020 ◽  
Vol 22 (Supplement_3) ◽  
pp. iii300-iii300
Author(s):  
Michael Koldobskiy ◽  
Ashley Tetens ◽  
Allison Martin ◽  
Charles Eberhart ◽  
Eric Raabe ◽  
...  

Abstract Diffuse intrinsic pontine glioma (DIPG) is a childhood brainstem tumor with a dismal prognosis and no effective treatment. Recent studies point to a critical role for epigenetic dysregulation in this disease. Nearly 80% of DIPGs harbor mutations in histone H3 encoding replacement of lysine 27 with methionine (K27M), leading to global loss of the repressive histone H3K27 trimethylation mark, global DNA hypomethylation, and a distinct gene expression profile. However, a static view of the epigenome fails to capture the plasticity of cancer cells and their gene expression states. Recent studies across diverse cancers have highlighted the role of epigenetic variability as a driving force in tumor evolution. Epigenetic variability may underlie the heterogeneity and phenotypic plasticity of DIPG cells and allow for the selection of cellular traits that promote survival and resistance to therapy. We have recently formalized a novel framework for analyzing variability of DNA methylation directly from whole-genome bisulfite sequencing data, allowing computation of DNA methylation entropy at precise genomic locations. Using these methods, we have shown that DIPG exhibits a markedly disordered epigenome, with increased stochasticity of DNA methylation localizing to specific regulatory elements and genes. We evaluate the responsiveness of the DIPG epigenetic landscape to pharmacologic modulation in order to modify proliferation, differentiation state, and immune signaling in DIPG cells.
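
To make the idea of "DNA methylation entropy" at a locus concrete, here is a minimal sketch that computes the Shannon entropy of the methylation patterns observed across sequencing reads. It is a simple empirical estimate chosen for illustration, not the framework the authors describe; the function names and the string encoding of reads ("1" for a methylated CpG, "0" for an unmethylated one) are assumptions.

```python
from collections import Counter
from math import log2


def pattern_entropy(read_patterns):
    """Shannon entropy (in bits) of the observed methylation patterns.

    Each element of `read_patterns` is a string such as "1010" giving the
    methylation state of the same CpG sites on one read (hypothetical encoding).
    """
    if not read_patterns:
        raise ValueError("need at least one read")
    total = len(read_patterns)
    counts = Counter(read_patterns)
    # sum of p * log2(1/p) over the distinct patterns
    return sum((c / total) * log2(total / c) for c in counts.values())


def normalized_entropy(read_patterns, n_cpgs):
    """Entropy divided by its maximum possible value (n_cpgs bits)."""
    return pattern_entropy(read_patterns) / n_cpgs


# Toy data: an ordered locus (every read agrees) and a disordered one.
ordered_reads = ["1111"] * 8
disordered_reads = ["1111", "0000", "1010", "0101"] * 2

ordered_h = normalized_entropy(ordered_reads, n_cpgs=4)
disordered_h = normalized_entropy(disordered_reads, n_cpgs=4)
print(ordered_h, disordered_h)
```

A locus where every read carries the same pattern has zero entropy, while a locus with many equally frequent patterns approaches the maximum, which is the sense in which a "disordered" epigenome shows increased stochasticity.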


2019 ◽  
Author(s):  
Clement Goubert ◽  
Jainy Thomas ◽  
Lindsay M. Payer ◽  
Jeffrey M. Kidd ◽  
Julie Feusier ◽  
...  

Abstract Alu retrotransposons account for more than 10% of the human genome, and insertions of these elements create structural variants segregating in human populations. Such polymorphic Alu are powerful markers to understand population structure, and they represent variants that can greatly impact genome function, including gene expression. Accurate genotyping of Alu and other mobile elements has been challenging. Indeed, we found that Alu genotypes previously called for the 1000 Genomes Project are sometimes erroneous, which poses significant problems for phasing these insertions with other variants that comprise the haplotype. To ameliorate this issue, we introduce a new pipeline, TypeTE, which genotypes Alu insertions from whole-genome sequencing data. Starting from a list of polymorphic Alus, TypeTE identifies the hallmarks (poly-A tail and target site duplication) and orientation of Alu insertions using local re-assembly to reconstruct presence and absence alleles. Genotype likelihoods are then computed after re-mapping sequencing reads to the reconstructed alleles. Using a ‘gold standard’ set of PCR-based genotyping of >200 loci, we show that TypeTE improves genotype accuracy from 83% to 92% in the 1000 Genomes dataset. TypeTE can be readily adapted to other retrotransposon families and brings a valuable toolbox addition for population genomics.
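
The step of computing genotype likelihoods from reads re-mapped to presence and absence alleles can be sketched with a textbook binomial model. This is not TypeTE's actual implementation, which the abstract does not detail; the function names, the fixed per-read error rate, and the read counts below are all illustrative assumptions.

```python
from math import comb


def genotype_likelihoods(presence_reads, absence_reads, error_rate=0.01):
    """Likelihood of the read counts under each insertion dosage (0, 1, 2).

    Assumed model: each read independently supports the presence allele with
    probability dosage/2, flipped with a fixed mis-assignment rate `error_rate`.
    """
    n = presence_reads + absence_reads
    if n == 0:
        raise ValueError("no informative reads at this locus")
    likelihoods = {}
    for dosage in (0, 1, 2):
        frac = dosage / 2  # expected fraction of presence-allele reads
        p = frac * (1 - error_rate) + (1 - frac) * error_rate
        likelihoods[dosage] = (
            comb(n, presence_reads) * p**presence_reads * (1 - p) ** absence_reads
        )
    return likelihoods


def call_genotype(presence_reads, absence_reads, error_rate=0.01):
    """Return the dosage (0, 1 or 2 insertion copies) with the highest likelihood."""
    lk = genotype_likelihoods(presence_reads, absence_reads, error_rate)
    return max(lk, key=lk.get)


# Hypothetical read counts at three loci: (presence reads, absence reads).
for counts in [(12, 0), (6, 7), (0, 9)]:
    print(counts, "->", call_genotype(*counts))
```

Reconstructing both alleles before re-mapping matters for a model like this because reads can then be assigned to presence or absence on equal footing, instead of absence reads mapping more easily to the reference.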


2021 ◽  
Author(s):  
Saeideh Ashouri ◽  
Jing Hao Wong ◽  
Hidewaki Nakagawa ◽  
Mihoko Shimada ◽  
Katsushi Tokunaga ◽  
...  

Abstract Intermediate-sized insertions are one class of structural variant contributing to genome diversity. However, due to technical difficulties in identifying them, their importance in disease pathogenicity and gene expression regulation remains unclear. We used whole-genome sequencing data of 174 Japanese samples to characterize intermediate-sized insertions using a highly accurate insertion calling method (IMSindel software and joint-call recovery) and obtained a catalogue of 4,254 insertions. We constructed an imputation panel comprising insertions and SNVs from all samples, and conducted imputation of intermediate-sized insertions for 82 publicly available Japanese samples. Imputation accuracy, evaluated using Nanopore long-read sequencing data, was 97%. Subsequent eQTL analysis predicted 128 (~3.0%) insertions as causative for gene expression level changes. Enrichment analysis of causal insertions for genome regulatory elements showed significant associations with CTCF-binding sites, super-enhancers, and promoters. Among 17 causal insertions found in the same causal set with GWAS hits, there were insertions associated with changes in expression of cancer-related genes such as BRCA1, ZNF222, and ABCB10. Analysis of insertion sequences revealed that 461 insertions were short tandem duplications frequently found in early replicating regions of the genome. Furthermore, comparison of the functional importance of intermediate-sized insertions with that of intermediate-sized deletions detected in the same sample set in our previous study showed that insertions were more frequent in genic regions, and that the proportion of functional candidates was smaller among insertions. Here, we characterize a high-confidence set of intermediate-sized insertions and indicate their importance in gene expression regulation. Our results emphasize the importance of considering intermediate-sized insertions in trait association studies.


2020 ◽  
Vol 48 (6) ◽  
pp. e36-e36 ◽  
Author(s):  
Clément Goubert ◽  
Jainy Thomas ◽  
Lindsay M Payer ◽  
Jeffrey M Kidd ◽  
Julie Feusier ◽  
...  

Abstract Alu retrotransposons account for more than 10% of the human genome, and insertions of these elements create structural variants segregating in human populations. Such polymorphic Alus are powerful markers to understand population structure, and they represent variants that can greatly impact genome function, including gene expression. Accurate genotyping of Alus and other mobile elements has been challenging. Indeed, we found that Alu genotypes previously called for the 1000 Genomes Project are sometimes erroneous, which poses significant problems for phasing these insertions with other variants that comprise the haplotype. To ameliorate this issue, we introduce a new pipeline – TypeTE – which genotypes Alu insertions from whole-genome sequencing data. Starting from a list of polymorphic Alus, TypeTE identifies the hallmarks (poly-A tail and target site duplication) and orientation of Alu insertions using local re-assembly to reconstruct presence and absence alleles. Genotype likelihoods are then computed after re-mapping sequencing reads to the reconstructed alleles. Using a high-quality set of PCR-based genotyping of >200 loci, we show that TypeTE improves genotype accuracy from 83% to 92% in the 1000 Genomes dataset. TypeTE can be readily adapted to other retrotransposon families and brings a valuable toolbox addition for population genomics.


2019 ◽  
Vol 35 (20) ◽  
pp. 3931-3936 ◽  
Author(s):  
Xin Huang ◽  
Xudong Gao ◽  
Wanying Li ◽  
Shuai Jiang ◽  
Ruijiang Li ◽  
...  

Abstract Motivation: During development of the mammalian embryo, histone modification H3K4me3 plays an important role in regulating gene expression and exhibits extensive reprograming on the parental genomes. In addition to these dramatic epigenetic changes, certain unchanging regulatory elements are also essential for embryonic development. Results: Using large-scale H3K4me3 chromatin immunoprecipitation sequencing data, we identified a form of H3K4me3 that was present during all eight stages of the mouse embryo before implantation. This ‘stable H3K4me3’ was highly accessible and much longer than normal H3K4me3. Moreover, most of the stable H3K4me3 was in the promoter region and was enriched in higher chromatin architecture. Using in-depth analysis, we demonstrated that stable H3K4me3 was related to higher gene expression levels and transcriptional initiation during embryonic development. Furthermore, stable H3K4me3 was much more active in blood tumor cells than in normal blood cells, suggesting a potential mechanism of cancer progression. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Abdallah M. Eteleeb ◽  
David A. Quigley ◽  
Shuang G. Zhao ◽  
Duy Pham ◽  
Rendong Yang ◽  
...  

Abstract Whole genome sequencing (WGS) has enabled the discovery of genomic structural variants (SVs), including those targeting intergenic and intronic non-coding regions that eluded previous exome focused strategies. However, the field currently lacks an automated tool that analyzes SV candidates to identify recurrent SVs and their targeted sites (hotspot regions), visualizes these genomic events within the context of various functional elements, and evaluates their potential effect on gene expression. To address this, we developed SV-HotSpot, an automated tool that integrates SV candidates, copy number alterations, gene expression, and genome annotations (e.g. gene and regulatory elements) to discover, annotate, and visualize recurrent SVs and their targeted hotspot regions that may affect gene expression. We applied SV-HotSpot to WGS and matched transcriptome data from metastatic castration resistant prostate cancer patients and rediscovered recurrent SVs targeting coding and non-coding functional elements known to promote prostate cancer progression and metastasis. SV-HotSpot provides a valuable resource to integrate SVs, gene expression, and genome annotations for discovering biologically relevant SVs altering coding and non-coding genome. SV-HotSpot is available at https://github.com/ChrisMaherLab/SV-HotSpot.


2020 ◽  
Author(s):  
Cody J. Steely ◽  
Kristi L. Russell ◽  
Julie E. Feusier ◽  
Yi Qiao ◽  
Gabor Marth ◽  
...  

Abstract While mobile elements are largely inactive in healthy somatic tissues, increased activity has been found in cancer tissues, with significant variation among different cancer types. In addition to insertion events, mobile elements have also been found to mediate many structural variation events in the genome. Here, to better understand the timing and impact of mobile element insertions and mobile element-mediated structural variants in cancer, we examined their activity in longitudinal samples of four metastatic breast cancer patients. With whole-genome sequencing data from multiple timepoints through tumor treatment and progression, we used mobile element detection software followed by visual confirmation of the insertions. From this analysis we identified 11 mobile element insertions or mobile element-mediated structural variants, and found that the majority (nine of the eleven) of these occurred early in tumor progression. Two of the identified insertions were SVA elements, which have not been examined in previous cancer studies. Most of the variants appear to impact intergenic regions; however, we identified a mobile element-mediated translocation in MAP2K4 and a mobile element-mediated deletion in YTHDF2 that likely inactivate reported tumor suppressor genes. MAP2K4 is part of the JNK signaling pathway, influencing cell growth and proliferation. The high variant allele frequency of this translocation and the important function of MAP2K4 indicate that this mobile element-mediated translocation is likely a driver mutation. Overall, using a unique longitudinal dataset, we find that most variants are likely passenger mutations in the four patients we examined, but some variants impact tumor progression.


2016 ◽  
Author(s):  
Caleb A. Lareau ◽  
Martin J. Aryee

Abstract The three-dimensional architecture of DNA within the nucleus is a key determinant of interactions between genes, regulatory elements, and transcriptional machinery. As a result, differences in loop structure are associated with differences in gene expression and cell state. Here, we introduce diffloop, an R/Bioconductor package for identifying differential DNA looping between samples. The package additionally provides a suite of functions for the quality control, statistical testing, annotation and visualization of DNA loops. We demonstrate this functionality by detecting differences in DNA loops between ENCODE ChIA-PET datasets and relate looping to differences in epigenetic state and gene expression.


2017 ◽  
Author(s):  
Yiqun Zhang ◽  
Fengju Chen ◽  
Nuno A. Fonseca ◽  
Yao He ◽  
Masashi Fujita ◽  
...  

Abstract Using a dataset of somatic Structural Variants (SVs) in cancers from 2658 patients (1220 with corresponding gene expression data), we identified hundreds of genes for which the nearby presence (within 100 kb) of an SV breakpoint was associated with altered expression. For the vast majority of these genes, expression was increased rather than decreased with the corresponding SV event. Well-known up-regulated cancer-associated genes impacted by this phenomenon included TERT, MDM2, CDK4, ERBB2, CD274, PDCD1LG2, and IGF2. SVs upstream of TERT involved ~3% of cancer cases and were most frequent in liver-biliary, melanoma, sarcoma, stomach, and kidney cancers. SVs associated with up-regulation of PD1 and PDL1 genes involved ~1% of non-amplified cases. For many genes, SVs were significantly associated with either increased numbers or greater proximity of enhancer regulatory elements near the gene. DNA methylation near the gene promoter was often increased with a nearby SV breakpoint, which may involve inactivation of repressor elements. Abbreviations: PCAWG, the Pan-Cancer Analysis of Whole Genomes project; SV, Structural Variant.
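
The "within 100 kb" criterion above amounts to an interval-distance check between a breakpoint and a gene. The sketch below shows one way to flag genes near a breakpoint; the `Gene` record, the gene names, and all coordinates are hypothetical placeholders, and the study's own pipeline may define proximity differently (for example, relative to the transcription start site).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # inclusive gene start coordinate (hypothetical values below)
    end: int    # inclusive gene end coordinate


def distance_to_gene(gene, chrom, pos):
    """Distance in bp from a breakpoint to the gene body; 0 if inside, None if
    the breakpoint lies on a different chromosome."""
    if chrom != gene.chrom:
        return None
    if pos < gene.start:
        return gene.start - pos
    if pos > gene.end:
        return pos - gene.end
    return 0


def genes_near_breakpoint(genes, chrom, pos, window=100_000):
    """Names of genes whose body lies within `window` bp of the breakpoint."""
    hits = []
    for gene in genes:
        d = distance_to_gene(gene, chrom, pos)
        if d is not None and d <= window:
            hits.append(gene.name)
    return hits


# Made-up genes and a made-up breakpoint, purely for illustration.
genes = [
    Gene("GENE_A", "chr1", 1_000_000, 1_050_000),
    Gene("GENE_B", "chr1", 1_200_000, 1_260_000),
    Gene("GENE_C", "chr2", 1_000_000, 1_020_000),
]
nearby = genes_near_breakpoint(genes, "chr1", 1_120_000)
print(nearby)
```

Once genes are flagged this way per sample, expression in samples with a nearby breakpoint can be compared against samples without one, which is the kind of association the abstract reports.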

