The landscape and driver potential of site-specific hotspots across cancer genomes

2021 ◽  
Vol 6 (1) ◽  
Author(s):  
Randi Istrup Juul ◽  
Morten Muhlig Nielsen ◽  
Malene Juul ◽  
Lars Feuerbach ◽  
Jakob Skou Pedersen

Abstract Large sets of whole cancer genomes make it possible to study mutation hotspots genome-wide. Here we detect, categorize, and characterize site-specific hotspots using 2279 whole cancer genomes from the Pan-Cancer Analysis of Whole Genomes project and provide a resource of annotated hotspots genome-wide. We investigate the excess of hotspots in both protein-coding and gene regulatory regions and develop measures of positive selection and functional impact for individual hotspots. Using cancer allele fractions, expression aberrations, mutational signatures, and a variety of genomic features, such as potential gain or loss of transcription factor binding sites, we annotate and prioritize all highly mutated hotspots. Genome-wide we find more high-frequency SNV and indel hotspots than expected given mutational background models. Protein-coding regions are generally enriched for SNV hotspots compared to other regions. Gene regulatory hotspots show enrichment of potential same-patient second-hit missense mutations, consistent with enrichment of hotspot driver mutations compared to singletons. For protein-coding regions, splice-sites, promoters, and enhancers, we see an excess of hotspots associated with cancer genes. Interestingly, missense hotspot mutations in tumor suppressors are associated with elevated expression, suggesting localized amino-acid changes with functional impact. For individual non-coding hotspots, only a small number show clear signs of positive selection, including known sites in the TERT promoter and the 5’ UTR of TP53. Most of the new candidates have few mutations and limited driver evidence. However, a hotspot in an enhancer of the oncogene POU2AF1, which may create a transcription factor binding site, presents multiple lines of driver-consistent evidence.
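To make the phrase "more hotspots than expected given mutational background models" concrete, here is a minimal sketch of how one site's recurrence could be compared with a background expectation. It assumes a simple binomial model with a single per-patient mutation probability `p` at the site; the abstract does not state which background models the authors use, so the model, the function name `binomial_tail`, and the value of `p` are illustrative assumptions. Only the cohort size of 2279 genomes comes from the abstract.

```python
import math

def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with terms computed in log space.

    Hypothetical helper: a uniform per-patient probability p at one site
    is an assumed stand-in for a real mutational background model.
    """
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i)
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        total += math.exp(log_term)
    return min(total, 1.0)

# Cohort size from the abstract; the per-site probability is an assumption.
n_patients = 2279
assumed_site_probability = 1e-6
p_value = binomial_tail(5, n_patients, assumed_site_probability)
print(f"P(at least 5 mutated patients at this site) = {p_value:.3e}")
```

Working in log space avoids overflow in the binomial coefficient for a cohort of this size. A real analysis would replace the single assumed probability with site-specific rates and correct for the number of sites tested.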

2016 ◽  
Author(s):  
Esben Eickhardt ◽  
Thomas Damm Als ◽  
Jakob Grove ◽  
Anders Dupont Boerglum ◽  
Francesco Lescai

Abstract Background Variants in transcription factor binding sites (TFBSs) may have important regulatory effects, as they have the potential to alter transcription factor (TF) binding affinities and thereby affect gene expression. With recent advances in sequencing technologies the number of variants identified in TFBSs has increased, hence understanding their role is of significant interest when interpreting next generation sequencing data. Current methods have two major limitations: they are limited to predicting the functional impact of single nucleotide variants (SNVs) and often rely on additional experimental data, which are laborious and expensive to acquire. We propose a purely bioinformatic method that addresses these two limitations while providing comparable results. Results Our method uses position weight matrices and a sliding window approach, in order to account for the sequence context of variants, and scores the consequences of both SNVs and INDELs in TFBSs. We tested the accuracy of our method in two different ways. Firstly, we compared it to a recent method based on DNase I hypersensitive sites sequencing (DHS-seq) data designed to predict the effects of SNVs: we found a significant correlation of our score both with their DHS-seq data and their prediction model. Secondly, we called INDELs on publicly available DHS-seq data from ENCODE, and found that our score represents the experimental data well. We concluded that our method is reliable and we used it to describe the landscape of variation in TFBSs in the human genome, by scoring all variants in the 1000 Genomes Project Phase 3. Surprisingly, we found that most insertions have neutral effects on binding sites, while deletions, as expected, were found to have the most severe TFBS-scores. We identified four categories of variants based on their TFBS-scores and tested them for enrichment of variants classified as pathogenic, benign and protective in ClinVar: we found that the variants with the most negative TFBS-scores have the most significant enrichment for pathogenic variants. Conclusions Our method addresses key shortcomings of currently available bioinformatic tools in predicting the effects of INDELs in TFBSs, and provides an unprecedented window into the genome-wide landscape of INDELs, their predicted influences on TF binding, and potential relevance for human diseases. We thus offer an additional tool to help prioritising non-coding variants in sequencing studies.
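The position weight matrix and sliding window idea described above can be sketched roughly as follows. The example slides a toy matrix across a reference and an alternate sequence and reports the change in the best window score, which works for SNVs and INDELs alike because the alternate sequence is rescanned as a whole. The matrix values, the function names (`window_score`, `best_window_score`, `tfbs_delta`), and the choice of the maximum window score are assumptions for illustration; the abstract does not specify how the authors compute their TFBS-score.

```python
# Hypothetical 4-position PWM holding log-odds scores per base.
# The values are illustrative only and favour the motif "ACGT".
PWM = [
    {"A": 1.2, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.5, "G": -0.8, "T": -1.0},
    {"A": -0.7, "C": -1.0, "G": 1.4, "T": -1.0},
    {"A": -1.0, "C": -0.9, "G": -1.0, "T": 1.3},
]

def window_score(window, pwm):
    """Sum the per-position scores of one window the same width as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))

def best_window_score(seq, pwm):
    """Slide the PWM along seq and return the highest window score."""
    width = len(pwm)
    if len(seq) < width:
        return float("-inf")
    return max(window_score(seq[i:i + width], pwm)
               for i in range(len(seq) - width + 1))

def tfbs_delta(ref_seq, alt_seq, pwm):
    """Alternate minus reference best score; negative suggests weaker binding."""
    return best_window_score(alt_seq, pwm) - best_window_score(ref_seq, pwm)

ref = "TTACGTAA"       # contains the motif ACGT
snv = "TTACTTAA"       # G>T substitution inside the motif
deletion = "TTAGTAA"   # deletion of the C inside the motif

print("SNV delta:     ", round(tfbs_delta(ref, snv, PWM), 2))
print("Deletion delta:", round(tfbs_delta(ref, deletion, PWM), 2))
```

Rescanning the full flanking sequence, instead of scoring only the changed position, is what lets a single routine handle length-changing variants: an INDEL shifts every downstream base, so the best-matching window may move.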


2020 ◽  
Vol 36 (9) ◽  
pp. 2936-2937 ◽  
Author(s):  
Gareth Peat ◽  
William Jones ◽  
Michael Nuhn ◽  
José Carlos Marugán ◽  
William Newell ◽  
...  

Abstract Motivation Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data. Results We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Sarah E. Pierce ◽  
Jeffrey M. Granja ◽  
William J. Greenleaf

Abstract Chromatin accessibility profiling can identify putative regulatory regions genome-wide; however, pooled single-cell methods for assessing the effects of regulatory perturbations on accessibility are limited. Here, we report a modified droplet-based single-cell ATAC-seq protocol for perturbing and evaluating dynamic single-cell epigenetic states. This method (Spear-ATAC) enables simultaneous read-out of chromatin accessibility profiles and integrated sgRNA spacer sequences from thousands of individual cells at once. Spear-ATAC profiling of 104,592 cells representing 414 sgRNA knock-down populations reveals the temporal dynamics of epigenetic responses to regulatory perturbations in cancer cells and the associations between transcription factor binding profiles.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tejaswi Iyyanki ◽  
Baozhen Zhang ◽  
Qixuan Wang ◽  
Ye Hou ◽  
Qiushi Jin ◽  
...  

Abstract Muscle-invasive bladder cancers are characterized by their distinct expression of luminal and basal genes, which could be used to predict key clinical features such as disease progression and overall survival. Transcriptionally, FOXA1, GATA3, and PPARG are shown to be essential for luminal subtype-specific gene regulation and subtype switching, while TP63, STAT3, and TFAP2 family members are critical for regulation of basal subtype-specific genes. Despite these advances, the underlying epigenetic mechanisms and 3D chromatin architecture responsible for subtype-specific regulation in bladder cancer remain unknown. Result We determine the genome-wide transcriptome, enhancer landscape, and transcription factor binding profiles of FOXA1 and GATA3 in luminal and basal subtypes of bladder cancer. Furthermore, we report the first-ever mapping of genome-wide chromatin interactions by Hi-C in both bladder cancer cell lines and primary patient tumors. We show that subtype-specific transcription is accompanied by specific open chromatin and epigenomic marks, at least partially driven by distinct transcription factor binding at distal enhancers of luminal and basal bladder cancers. Finally, we identify a novel clinically relevant transcription factor, Neuronal PAS Domain Protein 2 (NPAS2), in luminal bladder cancers that regulates other subtype-specific genes and influences cancer cell proliferation and migration. Conclusion In summary, our work identifies unique epigenomic signatures and 3D genome structures in luminal and basal urinary bladder cancers and suggests a novel link between the circadian transcription factor NPAS2 and a clinical bladder cancer subtype.


PLoS ONE ◽  
2009 ◽  
Vol 4 (10) ◽  
pp. e7526 ◽  
Author(s):  
Alfredo Mendoza-Vargas ◽  
Leticia Olvera ◽  
Maricela Olvera ◽  
Ricardo Grande ◽  
Leticia Vega-Alvarado ◽  
...  

Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Anthony M Gacita ◽  
Dominic Fullenkamp ◽  
Joyce C Ohiri ◽  
Tess Pottinger ◽  
Megan Puckelwartz ◽  
...  

Introduction: Inherited cardiomyopathy is caused by mutations in more than 100 genes. A well-recognized clinical feature of genetic cardiomyopathy is varying phenotypic expression. Even with identical primary mutations, there is a range of clinical outcomes. Genetic variants in protein coding regions have been shown to alter the phenotypic expression of primary cardiomyopathy-causing mutations. However, the contribution of noncoding variation has been less well studied. Methods and Results: We used an integrative analysis of >20 publicly available heart enhancer function and enhancer target datasets to identify genomic regions predicted to regulate the cardiomyopathy genes MYH7 and LMNA. We identified two candidate enhancer clusters around the MYH7 gene and three clusters around the LMNA gene. We tested enhancers in these clusters using reporter assays and CRISPR-mediated deletion in human cardiomyocytes derived from induced pluripotent stem cells (iCMs). We identified a super enhancer upstream of MYH7 that is necessary for high MYH7 expression in iCMs. These regulatory regions contained sequence variants within transcription factor binding sites that altered enhancer function. We created an informatic pipeline that extended this strategy genome-wide to identify an additional enhancer-modifying variant upstream of MYH7. This variant disrupts a transcription factor binding site upstream of MYH7 and limits MYH7 upregulation. We extended these analyses by examining clinical correlates, finding that this variant correlated with a more dilated left ventricle over time in patients with cardiomyopathy. Conclusions: We identified two enhancer regions important for MYH7 expression in iCMs. These enhancer regions may be utilized to induce MYH7 during human development and heart failure. MYH7 changes in heart failure have been linked to cardiomyopathy phenotypes. The variant upstream of MYH7 likely alters these changes and results in a more severe phenotype. These findings demonstrate that noncoding variants have clinical utility and targeted assessment of noncoding modifiers may become integrated into clinical care.


2019 ◽  
Author(s):  
Olivera Grujic ◽  
Tanya N. Phung ◽  
Soo Bin Kwon ◽  
Adriana Arneson ◽  
Yuju Lee ◽  
...  

Abstract Annotations of evolutionary constraint provide important information for variant prioritization. Genome-wide maps of epigenomic marks and transcription factor binding provide complementary information for interpreting a subset of such prioritized variants. Here we developed the Constrained Non-Exonic Predictor (CNEP) to quantify the evidence of each base in the human genome being in a constrained non-exonic element from over 60,000 epigenomic and transcription factor binding features. We find that the CNEP score outperforms baseline and related existing scores at predicting constrained non-exonic bases from such data. However, a subset of such bases are still not well predicted by CNEP. We developed a complementary Conservation Signature Score by CNEP (CSS-CNEP) using conservation state and constrained element annotations that is predictive of those bases. Using human genetic variation, regulatory sequence motifs, mouse epigenomic data, and retrospectively considered additional human data, we further characterize the nature of constrained non-exonic bases with low CNEP scores.


2020 ◽  
Author(s):  
Jinrong Huang ◽  
Lin Lin ◽  
Zhanying Dong ◽  
Ling Yang ◽  
Tianyu Zheng ◽  
...  

Abstract Adenosine-to-inosine (A-to-I) RNA editing, catalyzed by ADAR enzymes, is an essential post-transcriptional modification. Although hundreds of thousands of RNA editing sites have been reported in mammals, brain-wide analysis of RNA editing in the mammalian brain remains rare. Here, a genome-wide RNA editing investigation is performed in 119 samples, representing 30 anatomically defined subregions in the pig brain. We identify a total of 682,037 A-to-I RNA editing sites, of which 97% have not been identified before. Within the pig brain, the cerebellum and olfactory bulb are the regions with the most edited transcripts. The editing level of sites residing in protein-coding regions is similar across brain regions, whereas region-distinct editing is observed in repetitive sequences. Highly edited conserved recoding events in pig and human brain are found in neurotransmitter receptors, demonstrating the evolutionary importance of RNA editing in neurotransmission functions. The porcine brain-wide RNA landscape provides a rich resource to better understand the evolutionary importance of post-transcriptional RNA editing.


2018 ◽  
Vol 50 (10) ◽  
pp. 1483-1493 ◽  
Author(s):  
Yakir A. Reshef ◽  
Hilary K. Finucane ◽  
David R. Kelley ◽  
Alexander Gusev ◽  
Dylan Kotliar ◽  
...  
