Probabilities of Fitness Consequences for Point Mutations Across the Human Genome

2014 ◽  
Author(s):  
Brad Gulko ◽  
Ilan Gronau ◽  
Melissa J Hubisz ◽  
Adam Siepel

We describe a novel computational method for estimating the probability that a point mutation at each position in a genome will influence fitness. These fitness consequence (fitCons) scores serve as evolution-based measures of potential genomic function. Our approach is to cluster genomic positions into groups exhibiting distinct "fingerprints" based on high-throughput functional genomic data, then to estimate a probability of fitness consequences for each group from associated patterns of genetic polymorphism and divergence. We have generated fitCons scores for three human cell types based on public data from ENCODE. Compared with conventional conservation scores, fitCons scores show considerably improved prediction power for cis-regulatory elements. In addition, fitCons scores indicate that 4.2-7.5% of nucleotides in the human genome have influenced fitness since the human-chimpanzee divergence, and, in contrast to several recent studies, they suggest that recent evolutionary turnover has had a limited impact on the functional content of the genome.


2021 ◽  
Author(s):  
Dominik Burri ◽  
Mihaela Zavolan

During pre-mRNA maturation, 3' end processing can occur at different polyadenylation sites in the 3' untranslated region (3' UTR) to give rise to transcript isoforms that differ in the length of their 3' UTRs. Longer 3' UTRs contain additional cis-regulatory elements that impact the fate of the transcript and/or of the resulting protein. Extensive alternative polyadenylation (APA) has been observed in cancers, but the mechanisms and roles remain elusive. In particular, it is unclear whether the APA occurs in the malignant cells or in other cell types that infiltrate the tumor. To resolve this, we developed a computational method, called SCUREL, that quantifies changes in 3' UTR length between groups of cells, including cells of the same type originating from tumor and control tissue. We used this method to study APA in human lung adenocarcinoma (LUAD). SCUREL relies solely on annotated 3' UTRs and, on control systems such as T cell activation and spermatogenesis, gives qualitatively similar results at much greater sensitivity than the previously published scAPA method. In the LUAD samples, we find a general trend towards 3' UTR shortening not only in cancer cells compared to the cell type of origin, but also when comparing other cell types from the tumor vs. the control tissue environment. However, we also find high variability in the individual targets between patients. The findings help to understand the extent and impact of APA in LUAD, which may support improvements in diagnosis and treatment.



2020 ◽  
Author(s):  
Yupeng Wang ◽  
Rosario Jaime-Lara ◽  
Abhrarup Roy ◽  
Ying Sun ◽  
Xinyue Liu ◽  
...  

Objective: Computational identification of cell type-specific regulatory elements on a genome-wide scale is very challenging. Results: We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented: a multi-layer perceptron (MLP), a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers, including gkm-SVM and DanQ, with regard to distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified according to their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
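
The k-mer fold-change features mentioned in this abstract can be illustrated with a minimal sketch. It reflects one plausible reading of the abstract and is not taken from the SeqEnhDL code: a fold change is assumed to be the ratio of a k-mer's smoothed frequency in a foreground set (for example enhancer sequences) to its frequency in a background set, and a sequence is then encoded as the list of fold changes of its k-mers in order. The function names, the pseudocount smoothing and the default value for unseen k-mers are all hypothetical.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every A/C/G/T-only k-mer across a list of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def kmer_fold_changes(foreground, background, k, pseudocount=1.0):
    """Fold change of each observed k-mer: foreground rate / background rate.

    Assumption: additive (pseudocount) smoothing over all 4**k possible
    k-mers, so that k-mers absent from one set do not divide by zero.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    n_kmers = 4 ** k
    fg_total = sum(fg.values()) + pseudocount * n_kmers
    bg_total = sum(bg.values()) + pseudocount * n_kmers
    fold_changes = {}
    for kmer in set(fg) | set(bg):
        fg_rate = (fg[kmer] + pseudocount) / fg_total
        bg_rate = (bg[kmer] + pseudocount) / bg_total
        fold_changes[kmer] = fg_rate / bg_rate
    return fold_changes


def sequential_features(sequence, fold_changes, k, default=1.0):
    """Encode a sequence as the fold changes of its k-mers, in order.

    k-mers never seen in either set get `default` (no enrichment).
    """
    sequence = sequence.upper()
    return [
        fold_changes.get(sequence[i:i + k], default)
        for i in range(len(sequence) - k + 1)
    ]


if __name__ == "__main__":
    fc = kmer_fold_changes(["ACGTACGT"], ["AAAAAAAA"], k=2)
    print(sequential_features("ACGA", fc, k=2))
```

A feature vector of this kind, computed for each k, could be fed to any of the three model families named in the abstract; the actual feature construction in SeqEnhDL may differ.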



2019 ◽  
Author(s):  
Guoliang Li ◽  
Tongkai Sun ◽  
Huidan Chang ◽  
Liuyang Cai ◽  
Ping Hong ◽  
...  

Understanding chromatin interactions is important since they create chromosome conformation and link the cis- and trans-regulatory elements to their target genes for transcriptional regulation. Chromatin Interaction Analysis with Paired-End Tag (ChIA-PET) sequencing is a genome-wide high-throughput technology that detects chromatin interactions associated with a specific protein of interest. We previously developed ChIA-PET Tool in 2010 for ChIA-PET data analysis. Here we present the updated version, ChIA-PET Tool V3, a computational package for processing the next-generation sequence data generated from ChIA-PET experiments. It processes both short-read and long-read ChIA-PET data with multithreading and reports summary statistics of the results in an HTML file. In this paper, we provide a detailed description of the design of ChIA-PET Tool V3 and show how to install it and use it to analyze a specific ChIA-PET data set. Other ChIA-PET data analysis tools have since been developed, including ChiaSig, MICC, Mango and ChIA-PET2. We compared our tool with these tools using the same public data set on the same machine. Most of the peaks detected by ChIA-PET Tool V3 overlap with those from other tools. The significant chromatin interactions called by ChIA-PET Tool V3 show higher enrichment in the APA plot. ChIA-PET Tool V3 is open source and is available at GitHub (https://github.com/GuoliangLi-HZAU/ChIA-PET_Tool_V3/).



2019 ◽  
Author(s):  
Florian Schmidt ◽  
Alexander Marx ◽  
Marie Hebel ◽  
Martin Wegner ◽  
Nina Baumgarten ◽  
...  

Understanding the complexity of transcriptional regulation is a major goal of computational biology. Because experimental linkage of regulatory sites to genes is challenging, computational methods considering epigenomics data have been proposed to create tissue-specific regulatory maps. However, we showed that these approaches are not well suited to account for the variations of the regulatory landscape between cell types. To overcome these drawbacks, we developed a new method, called STITCHIT, that identifies and links putative regulatory sites to genes. Within STITCHIT, we consider the chromatin accessibility signal of all samples jointly to identify regions exhibiting a signal variation related to the expression of a distinct gene. STITCHIT outperforms previous approaches in various validation experiments and was used with a genome-wide CRISPR-Cas9 screen to prioritize novel doxorubicin-resistance genes and their associated non-coding regulatory regions. We believe that our work paves the way for a more refined understanding of transcriptional regulation at the gene level.



2015 ◽  
Vol 47 (3) ◽  
pp. 276-283 ◽  
Author(s):  
Brad Gulko ◽  
Melissa J Hubisz ◽  
Ilan Gronau ◽  
Adam Siepel


2014 ◽  
Vol 43 (4) ◽  
pp. e27-e27 ◽  
Author(s):  
Aurélien Griffon ◽  
Quentin Barbier ◽  
Jordi Dalino ◽  
Jacques van Helden ◽  
Salvatore Spicuglia ◽  
...  

The large collections of ChIP-seq data rapidly accumulating in public data warehouses provide genome-wide binding site maps for hundreds of transcription factors (TFs). However, the extent of the regulatory occupancy space in the human genome has not yet been fully apprehended by integrating public ChIP-seq data sets and combining them with the ENCODE TF map. To enable genome-wide identification of regulatory elements, we have collected, analysed and retained 395 available ChIP-seq data sets merged with ENCODE peaks, covering a total of 237 TFs. This enhanced repertoire complements and refines current genome-wide occupancy maps by increasing the human genome regulatory search space by 14% compared to ENCODE alone, and also increases the complexity of the regulatory dictionary. As a direct application, we used this unified binding repertoire to annotate variant enhancer loci (VELs) from the H3K4me1 mark in two cancer cell lines (MCF-7, CRC) and observed enrichments of specific TFs involved in key biological functions related to cancer development and proliferation. Those enrichments of TFs within VELs provide a direct annotation of non-coding regions detected in cancer genomes. Finally, full access to this catalogue is available online together with the TF enrichment analysis tool (http://tagc.univ-mrs.fr/remap/).



2014 ◽  
Author(s):  
Sofie Demeyer ◽  
Tom Michoel

Transcriptional regulation of gene expression is one of the main processes that affect cell diversification from a single set of genes. Regulatory proteins often interact with DNA regions located distally from the transcription start sites (TSS) of the genes. We developed a computational method that combines open chromatin and gene expression information for a large number of cell types to identify these distal regulatory elements. Our method builds correlation graphs for publicly available DNase-seq and exon array datasets with matching samples and uses graph-based methods to filter findings supported by multiple datasets and remove indirect interactions. The resulting set of interactions was validated with both anecdotal information of known long-range interactions and unbiased experimental data deduced from Hi-C and CAGE experiments. Our results provide a novel set of high-confidence candidate open chromatin regions involved in gene regulation, often located several Mb away from the TSS of their target gene.
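
The correlation-graph idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each dataset is assumed to provide, for the same ordered set of samples, one accessibility vector per open chromatin region and one expression vector per gene; an edge is drawn when the Pearson correlation reaches a hypothetical threshold; and only region-gene pairs found in at least `min_datasets` datasets are kept. The step that removes indirect interactions is not shown, and all names and default values are illustrative.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def correlation_edges(accessibility, expression, threshold=0.9):
    """Return (region, gene, r) for every pair with r >= threshold.

    `accessibility` maps region -> signal per sample,
    `expression` maps gene -> expression per sample (same sample order).
    """
    edges = []
    for region, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if r >= threshold:
                edges.append((region, gene, r))
    return edges


def supported_edges(edge_lists, min_datasets=2):
    """Keep region-gene pairs that appear in at least `min_datasets` datasets."""
    support = Counter(
        (region, gene) for edges in edge_lists for region, gene, _ in edges
    )
    return {pair for pair, n in support.items() if n >= min_datasets}


if __name__ == "__main__":
    dataset_a = correlation_edges(
        {"region1": [1, 2, 3, 4], "region2": [4, 3, 2, 1]},
        {"geneA": [2, 4, 6, 8]},
    )
    dataset_b = correlation_edges(
        {"region1": [0, 1, 2, 5]},
        {"geneA": [1, 2, 3, 6]},
    )
    print(supported_edges([dataset_a, dataset_b]))
```

Comparing every region with every gene is quadratic in the number of features, so a genome-wide analysis would need a vectorized or distance-restricted variant of this loop.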



2018 ◽  
Author(s):  
Naresh Doni Jayavelu ◽  
Ajay Jajodia ◽  
Arpit Mishra ◽  
R. David Hawkins

The study of gene regulation is dominated by a focus on the control of gene activation, that is, on increases in the level of expression. Just as critical is the process of gene repression or silencing. Chromatin signatures have allowed for the global mapping of enhancer cis-regulatory elements; however, genome-wide identification of silencer elements by computational or experimental approaches is lacking. We present a simple but powerful computational approach to identify putative silencers genome-wide. We used a series of consortia data to predict silencers in over 100 human and mouse cell or tissue types. We performed several analyses to determine whether these elements exhibit the characteristics expected of silencers. Motif enrichment analyses showed that putative silencers are enriched for motifs belonging to known transcriptional repressors and that they overlap known transcriptional repressor binding sites. Leveraging promoter capture Hi-C data from several human and mouse cell types, we found that over 50% of putative silencer elements interact with gene promoters having very low to no expression. Next, to validate our silencer predictions, we quantified silencer activity using massively parallel reporter assays (MPRAs) on 7500 selected elements in K562 cells. We trained a support vector machine classifier on the MPRA data and used it to refine potential silencers in other cell types. We also show that, similar to enhancer elements, silencer elements are enriched in disease-associated variants. Our results suggest a general strategy for genome-wide identification and characterization of silencer elements.



2014 ◽  
Author(s):  
Maxwell W Libbrecht ◽  
Ferhat Ay ◽  
Michael M Hoffman ◽  
David M Gilbert ◽  
Jeffrey A Bilmes ◽  
...  

The genomic neighborhood of a gene influences its activity, a behavior that is attributable in part to domain-scale regulation, in which regions of hundreds or thousands of kilobases known as domains are regulated as a unit. Previous studies using genomics assays such as chromatin immunoprecipitation (ChIP)-seq and chromatin conformation capture (3C)-based assays have identified many types of regulatory domains. However, due to the difficulty of integrating genomics data sets, the relationships among these domain types are poorly understood. Semi-automated genome annotation (SAGA) algorithms facilitate human interpretation of heterogeneous collections of genomics data by simultaneously partitioning the human genome and assigning labels to the resulting genomic segments. However, existing SAGA methods can incorporate only data sets that can be expressed as a one-dimensional vector over the genome and therefore cannot integrate inherently pairwise chromatin conformation data. We developed a new computational method, called graph-based regularization (GBR), for expressing a pairwise prior that encourages certain pairs of genomic loci to receive the same label in a genome annotation. We used GBR to exploit chromatin conformation information during genome annotation by encouraging positions that are close in 3D to occupy the same type of domain. Using this approach, we produced a comprehensive model of chromatin domains in eight human cell types, thereby revealing the relationships among known domain types. Through this model, we identified clusters of tightly-regulated genes expressed in only a small number of cell types, which we term "specific expression domains." We additionally found that a subset of domain boundaries marked by promoters and CTCF motifs are consistent between cell types even when domain activity changes. 
Finally, we showed that GBR can be used for the seemingly unrelated task of transferring information from well-studied cell types to less well characterized cell types during genome annotation, making it possible to produce high-quality annotations of the hundreds of cell types with limited available data.



2016 ◽  
Author(s):  
Hui Zhang ◽  
Feifei Li ◽  
Yan Jia ◽  
Bingxiang Xu ◽  
Yiqun Zhang ◽  
...  

High-throughput chromosome conformation capture technologies, such as Hi-C, have made it possible to survey 3D genome structure. However, obtaining 3D profiles at kilobase resolution at low cost remains a major challenge. We therefore report a computational method, termed chromatin interaction site detector (CISD), that precisely identifies chromatin interaction sites at kilobase resolution from MNase-seq data, and a CISD-based chromatin loop predictor (CISD_loop) that predicts chromatin-chromatin interactions (CCIs) from low-resolution Hi-C data. The methods are built on the hypothesis that CCIs result in a characteristic nucleosome arrangement pattern flanking the interaction sites. Accordingly, we show that the predictions of CISD and CISD_loop overlap closely with chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) anchors and loops, respectively. Moreover, the methods trained in one cell type can be applied to other cell types with high accuracy. The validity of the methods was further supported by chromosome conformation capture (3C) experiments at 5 kb resolution. Finally, we demonstrate that only modest amounts of MNase-seq and Hi-C data are sufficient to achieve an ultrahigh-resolution CCI map. The predictive power of CISD/CISD_loop supports the hypothesis that CCIs induce local nucleosome rearrangement and that the pattern may serve as a probe for the 3D dynamics of the genome. Thus, our method will facilitate precise and systematic investigations of the interactions between distal regulatory elements on a larger scale than has hitherto been possible.


