EpiSAFARI: Sensitive detection of valleys in epigenetic signals for enhancing annotations of functional elements

2019
Author(s):
Arif Harmanci
Akdes Serin Harmanci
Jyothishmathi Swaminathan
Vidya Gopalakrishnan

Abstract The genomewide signal profiles from functional genomics experiments are dense information sources for annotating the regulatory elements. These profiles measure epigenetic activity at nucleotide resolution, and they exhibit distinct patterns along the genome. Most notable of these patterns are the valley patterns that are prevalently observed in many epigenetic assays such as ChIP-Seq and bisulfite sequencing. Valleys mark locations of cis-regulatory elements such as enhancers. Systematic identification of the valleys provides novel information for delineating the annotation of regulatory elements using epigenetic data. Nevertheless, the valleys are generally not reported by analysis pipelines. Here, we describe EpiSAFARI, a computational method for sensitive detection of valleys from diverse types of epigenetic profiles. EpiSAFARI employs a novel smoothing method for decreasing noise in signal profiles and accounts for technical factors such as sparse signals, mappability, and nucleotide content. In performance comparisons, EpiSAFARI performs favorably in terms of accuracy. The histone modification and DNA methylation valleys detected by EpiSAFARI exhibit high conservation and transcription factor binding, and they are enriched in nascent transcription. In addition, the large clusters of histone valleys are found to be enriched at the promoters of developmentally associated genes.

2019
Author(s):
Arif Harmanci
Akdes Serin Harmanci
Jyothishmathi Swaminathan
Vidya Gopalakrishnan

Abstract
Motivation: Functional genomics experiments generate genomewide signal profiles that are dense information sources for annotating the regulatory elements. These profiles measure epigenetic activity at nucleotide resolution, and they exhibit distinctive patterns as they fluctuate along the genome. Most notable of these patterns are the valley patterns that are prevalently observed in assays such as ChIP sequencing and bisulfite sequencing. The genomic positions of valleys pinpoint locations of cis-regulatory elements such as enhancers and insulators. Systematic identification of the valleys provides novel information for delineating the annotation of regulatory elements. Nevertheless, the valleys are not reported by the majority of analysis pipelines.
Results: We describe EpiSAFARI, a computational method for sensitive detection of valleys from diverse types of epigenetic profiles. EpiSAFARI employs a novel smoothing method for decreasing noise in signal profiles and accounts for technical factors such as sparse signals, mappability, and nucleotide content. In performance comparisons, EpiSAFARI performs favorably in terms of accuracy. The histone modification valleys detected by EpiSAFARI exhibit high conservation and transcription factor binding, and they are enriched in nascent transcription. In addition, the large clusters of histone valleys are found to be enriched at the promoters of developmentally associated genes. Differential histone valleys exhibit concordance with differential DNase signal at cell-line-specific valleys. DNA methylation valleys exhibit elevated conservation and high transcription factor binding. Specifically, we observed enriched binding of transcription factors associated with chromatin structure around methyl-valleys.
Availability: EpiSAFARI is publicly available at https://github.com/harmancilab/EpiSAFARI
Supplementary information: Supplementary data are available at Bioinformatics online.
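
To make the notion of a "valley" in a signal profile concrete, the following is a minimal Python sketch, not EpiSAFARI's implementation. It assumes a plain centered moving average in place of the smoothing method the abstract mentions, ignores mappability and nucleotide content entirely, and uses hypothetical names and parameters (`moving_average`, `find_valleys`, `half_window`, `min_ratio`) chosen only for illustration. A valley is taken here to be a local minimum of the smoothed signal flanked on both sides by local maxima that are sufficiently higher than the dip.

```python
from typing import List, Tuple


def moving_average(signal: List[float], half_window: int) -> List[float]:
    """Centered moving average; windows are truncated at the profile edges."""
    n = len(signal)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def find_valleys(
    signal: List[float],
    half_window: int = 1,   # hypothetical smoothing parameter
    min_ratio: float = 2.0,  # hypothetical summit-to-dip height threshold
) -> List[Tuple[int, int, int]]:
    """Return (left_summit, dip, right_summit) index triples.

    A dip is reported only if the nearest local maximum on each side is
    at least `min_ratio` times the smoothed signal at the dip.
    """
    s = moving_average(signal, half_window)
    inner = range(1, len(s) - 1)
    maxima = [i for i in inner if s[i - 1] < s[i] >= s[i + 1]]
    minima = [i for i in inner if s[i - 1] > s[i] <= s[i + 1]]

    valleys = []
    for dip in minima:
        left = [m for m in maxima if m < dip]
        right = [m for m in maxima if m > dip]
        if not left or not right:
            continue  # a valley needs a summit on both sides
        l, r = left[-1], right[0]
        floor = max(s[dip], 1e-9)  # avoid division by zero in empty regions
        if s[l] / floor >= min_ratio and s[r] / floor >= min_ratio:
            valleys.append((l, dip, r))
    return valleys


if __name__ == "__main__":
    # Toy profile: two peaks with a dip between them.
    profile = [0, 1, 5, 9, 5, 1, 0, 1, 5, 9, 5, 1, 0]
    print(find_valleys(profile))
```

On real data the threshold and window would need tuning against noise and coverage; the sketch only illustrates the summit-dip-summit shape that the abstract refers to.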


2021
Author(s):
Hyobin Jeong
Karen Grimes
Peter-Martin Bruch
Tobias Rausch
Patrick Hasenfeld
...  

Somatic structural variants (SVs) are widespread in cancer genomes; however, their impact on tumorigenesis and intra-tumour heterogeneity is incompletely understood, since methods to functionally characterize the broad spectrum of SVs arising in cancerous single cells are lacking. We present a computational method, scNOVA, that couples SV discovery with nucleosome occupancy analysis by haplotype-resolved single-cell sequencing, to systematically uncover SV effects on cis-regulatory elements and gene activity. Application to leukemias and cell lines uncovered SV outcomes at several loci, including dysregulated cancer-related pathways and mono-allelic oncogene expression near SV breakpoints. At the intra-patient level, we identified different yet overlapping subclonal SVs that converge on aberrant Wnt signaling. We also deconvoluted the effects of catastrophic chromosomal rearrangements resulting in oncogenic transcription factor dysregulation. scNOVA directly links SVs to their functional consequences, opening the door for single-cell multiomics of SVs in heterogeneous cell populations.


2012
Vol 2012
pp. 1-8
Author(s):
Aniruddha Chatterjee
Euan J. Rodger
Peter A. Stockwell
Robert J. Weeks
Ian M. Morison

Reduced representation bisulfite sequencing (RRBS), which couples bisulfite conversion and next-generation sequencing, is an innovative method that specifically enriches genomic regions with a high density of potential methylation sites and enables investigation of DNA methylation at single-nucleotide resolution. Recent advances in the Illumina DNA sample preparation protocol and sequencing technology have vastly improved sequencing throughput capacity. Although the new Illumina technology is now widely used, the unique challenges associated with multiplexed RRBS libraries on this platform have not been previously described. We have made modifications to the RRBS library preparation protocol to sequence multiplexed libraries on a single flow cell lane of the Illumina HiSeq 2000. Furthermore, our analysis incorporates a bioinformatics pipeline specifically designed to process bisulfite-converted sequencing reads and evaluate the output and quality of the sequencing data generated from the multiplexed libraries. We obtained an average of 42 million paired-end reads per sample for each flow cell lane, with a high unique mapping efficiency to the reference human genome. Here we provide a roadmap of modifications, strategies, and troubleshooting approaches we implemented to optimize sequencing of multiplexed libraries on an RRBS background.


2013
Vol 368 (1632)
pp. 20130029
Author(s):
Harendra Guturu
Andrew C. Doxey
Aaron M. Wenger
Gill Bejerano

Mapping the DNA-binding preferences of transcription factor (TF) complexes is critical for deciphering the functions of cis-regulatory elements. Here, we developed a computational method that compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were used to estimate TF complex physical plausibility, explore overlapping motif arrangements seldom tackled by non-structure-aware methods, and generate and analyse three-dimensional models of the predicted complexes bound to DNA. Using this approach, we predicted 422 physically realistic TF complex motifs at 18% false discovery rate, the majority of which (326, 77%) contain some sequence overlap between binding sites. The set of mostly novel complexes is enriched in known composite motifs, predictive of binding site configurations in TF–TF–DNA crystal structures, and supported by ChIP-seq datasets. Structural modelling revealed three cooperativity mechanisms: direct protein–protein interactions, potentially indirect interactions and ‘through-DNA’ interactions. Indeed, 38% of the predicted complexes were found to contain four or more bases in which TF pairs appear to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. Our TF complex and associated binding site predictions are available as a web resource at http://bejerano.stanford.edu/complex.


2021
Author(s):
Dominik Burri
Mihaela Zavolan

During pre-mRNA maturation, 3' end processing can occur at different polyadenylation sites in the 3' untranslated region (3' UTR) to give rise to transcript isoforms that differ in the length of their 3' UTRs. Longer 3' UTRs contain additional cis-regulatory elements that impact the fate of the transcript and/or of the resulting protein. Extensive alternative polyadenylation (APA) has been observed in cancers, but the mechanisms and roles remain elusive. In particular, it is unclear whether the APA occurs in the malignant cells or in other cell types that infiltrate the tumor. To resolve this, we developed a computational method, called SCUREL, that quantifies changes in 3' UTR length between groups of cells, including cells of the same type originating from tumor and control tissue. We used this method to study APA in human lung adenocarcinoma (LUAD). SCUREL relies solely on annotated 3' UTRs and, on control systems such as T cell activation and spermatogenesis, gives qualitatively similar results at much greater sensitivity compared to the previously published scAPA method. In the LUAD samples, we find a general trend towards 3' UTR shortening not only in cancer cells compared to the cell type of origin, but also when comparing other cell types from the tumor vs. the control tissue environment. However, we also find high variability in the individual targets between patients. The findings help to understand the extent and impact of APA in LUAD, which may support improvements in diagnosis and treatment.


2019
Vol 10 (1)
Author(s):
Sourya Bhattacharyya
Vivek Chandra
Pandurangan Vijayanand
Ferhat Ay

Abstract HiChIP/PLAC-seq is becoming increasingly popular for profiling 3D chromatin contacts among regulatory elements and for annotating functions of genetic variants. Here we describe FitHiChIP, a computational method for loop calling from HiChIP/PLAC-seq data, which jointly models the non-uniform coverage and genomic distance scaling of contact counts to compute statistical significance estimates. We also develop a technique to filter putative bystander loops that can be explained by stronger adjacent loops. Compared to existing methods, FitHiChIP performs better in recovering contacts reported by Hi-C, promoter capture Hi-C and ChIA-PET experiments and in capturing previously validated promoter-enhancer interactions. FitHiChIP loop calls are reproducible among replicates and are consistent across different experimental settings. Our work also provides a framework for differential HiChIP analysis with an option to utilize ChIP-seq data for further characterizing differential loops. Even though designed for HiChIP, FitHiChIP is also applicable to other conformation capture assays.

