F-Seq2: improving the feature density based peak caller with dynamic statistics

2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Nanxiang Zhao ◽  
Alan P Boyle

Abstract Genomic and epigenomic features are captured at a genome-wide level by using high-throughput sequencing (HTS) technologies. Peak calling delineates features identified in HTS experiments, such as open chromatin regions and transcription factor binding sites, by comparing the observed read distributions to a random expectation. Since its introduction, F-Seq has been widely used and shown to be the most sensitive and accurate peak caller for DNase I hypersensitive site (DNase-seq) data. However, the first release (F-Seq1) has two key limitations: lack of support for user-input control datasets, and poor test statistic reporting. These constrain its ability to capture systematic and experimental biases inherent to the background distributions in peak prediction, and to subsequently rank predicted peaks by confidence. To address these limitations, we present F-Seq2, which combines kernel density estimation and a dynamic ‘continuous’ Poisson test to account for local biases and accurately rank candidate peaks. The output of F-Seq2 is suitable for irreproducible discovery rate analysis as test statistics are calculated for individual candidate summits, allowing direct comparison of predictions across replicates. These improvements significantly boost the performance of F-Seq2 for ATAC-seq and ChIP-seq datasets, outperforming competing peak callers used by the ENCODE Consortium in terms of precision and recall.
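The general idea of pairing a density estimate with a Poisson test can be illustrated with a minimal sketch. This is not the F-Seq2 implementation: the function names, the fixed Gaussian bandwidth, the toy read positions, the window size, and the region-wide background rate are all assumptions made for illustration, and the test shown is the ordinary integer Poisson upper tail rather than the dynamic "continuous" variant described in the abstract.

```python
import math

def gaussian_kde_signal(read_positions, grid, bandwidth):
    """Sum of Gaussian kernels centred on each read, evaluated at each grid point."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return [
        sum(norm * math.exp(-0.5 * ((x - r) / bandwidth) ** 2) for r in read_positions)
        for x in grid
    ]

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with integer k >= 0."""
    cdf_below = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf_below)

# Toy read positions on a 1000 bp region (hypothetical data).
reads = [100, 102, 103, 105, 107, 300, 650, 900]
grid = list(range(0, 1001, 10))

# Step 1: smooth the reads into a density signal and take its maximum as the summit.
signal = gaussian_kde_signal(reads, grid, bandwidth=20.0)
summit = grid[max(range(len(grid)), key=signal.__getitem__)]

# Step 2: compare the read count near the summit with a background expectation.
window = 50                              # half-width around the summit (assumed)
observed = sum(1 for r in reads if abs(r - summit) <= window)
background_rate = len(reads) / 1000.0    # reads per bp over the whole toy region
expected = background_rate * (2 * window + 1)
p_value = poisson_upper_tail(observed, expected)

print(summit, observed, round(expected, 3), p_value)
```

In a real peak caller the expected rate would be estimated locally, for example from a control dataset, rather than from one region-wide average as in this sketch.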

2020 ◽  
Author(s):  
Nanxiang Zhao ◽  
Alan P. Boyle

ABSTRACT Genomic and epigenomic features are captured at a genome-wide level by using high-throughput sequencing technologies. Peak calling is one of the first essential steps in analyzing these features, delineating regions such as open chromatin regions and transcription factor binding sites. Our original peak calling software, F-Seq, has been widely used and shown to be the most sensitive and accurate peak caller for DNase I hypersensitive sites sequencing (DNase-seq) data. However, F-Seq neither supports user-input control datasets nor reports test statistics, limiting its ability to capture systematic and experimental biases and to accurately estimate background distributions. Here we present an improved version, F-Seq2, which combines the power of kernel density estimation and a dynamic “continuous” Poisson distribution to robustly account for local biases and resolve ties when ranking candidate peaks. In F-score and motif distance analyses, we demonstrate the superior performance of F-Seq2 over other competing peak callers used by the ENCODE Consortium on simulated and real ATAC-seq and ChIP-seq datasets. The output of F-Seq2 is suitable for irreproducible discovery rate (IDR) analysis, as test statistics are calculated for individual candidate summits and ties are robustly resolved.


BMC Genetics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Liping Guan ◽  
Ke Cao ◽  
Yong Li ◽  
Jian Guo ◽  
Qiang Xu ◽  
...  

Abstract Background Peach (Prunus persica L.) is a diploid species and model plant of the Rosaceae family. In the past decade, significant progress has been made in peach genetic research via DNA markers, but the number of these markers remains limited. Results In this study, we performed genome-wide DNA marker detection based on sequencing data of six distantly related peach accessions. A total of 650,693~1,053,547 single nucleotide polymorphisms (SNPs), 114,227~178,968 small insertions/deletions (InDels), 8386~12,298 structural variants (SVs), 2111~2581 copy number variants (CNVs) and 229,357~346,940 simple sequence repeats (SSRs) were detected and annotated. To demonstrate the application of DNA markers, 944 SNPs were filtered for an association study of fruit ripening time and 15 highly polymorphic SSRs were selected to analyze the genetic relationship among 221 accessions. Conclusions The results showed that the use of high-throughput sequencing to develop DNA markers is fast and effective. Comprehensive identification of DNA markers, including SVs and SSRs, would be of benefit to genetic diversity evaluation, genetic mapping, and molecular breeding of peach.


2014 ◽  
Vol 13s4 ◽  
pp. CIN.S13978
Author(s):  
Yen-Tsung Huang ◽  
Thomas Hsu ◽  
David C. Christiani

The effects of copy number alterations make up a significant part of the tumor genome profile, but pathway analyses of these alterations are still not well established. We propose a novel method to analyze multiple copy numbers of genes within a pathway, termed Test for the Effect of a Gene Set with Copy Number data (TEGS-CN). TEGS-CN was adapted from TEGS, a method that we previously developed for gene expression data using a variance component score test. With additional development, we extend the method to analyze DNA copy number data, accounting for different sizes and thus various numbers of copy number probes in genes. The test statistic follows a mixture of χ² distributions that can be obtained using permutation with scaled χ² approximation. We conducted simulation studies to evaluate the size and the power of TEGS-CN and to compare its performance with TEGS. We analyzed genome-wide copy number data from 264 patients with non-small-cell lung cancer. With the Molecular Signatures Database (MSigDB) pathway database, the genome-wide copy number data can be classified into 1814 biological pathways or gene sets. We investigated associations of the copy number profile of the 1814 gene sets with pack-years of cigarette smoking. Our analysis revealed five pathways with significant P values after Bonferroni adjustment (<2.8 × 10⁻⁵), including the PTEN pathway (7.8 × 10⁻⁷), the gene set up-regulated under heat shock (3.6 × 10⁻⁶), the gene sets involved in the immune profile for rejection of kidney transplantation (9.2 × 10⁻⁶) and for transcriptional control of leukocytes (2.2 × 10⁻⁵), and the ganglioside biosynthesis pathway (2.7 × 10⁻⁵). In conclusion, we present a new method for pathway analyses of copy number data, and causal mechanisms of the five pathways require further study.
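The Bonferroni cutoff quoted in the abstract can be checked with a short calculation. The sketch below assumes a family-wise significance level of 0.05, which the abstract does not state; the dictionary keys are shortened labels chosen here for illustration, while the P values are those reported above.

```python
# Bonferroni adjustment: divide the family-wise level by the number of tests.
n_gene_sets = 1814
alpha = 0.05  # assumed family-wise significance level (not stated in the abstract)
threshold = alpha / n_gene_sets

# P values as reported in the abstract (labels abbreviated for illustration).
reported = {
    "PTEN pathway": 7.8e-7,
    "heat shock up-regulated": 3.6e-6,
    "kidney transplant rejection": 9.2e-6,
    "leukocyte transcriptional control": 2.2e-5,
    "ganglioside biosynthesis": 2.7e-5,
}

significant = {name: p for name, p in reported.items() if p < threshold}

print(f"threshold = {threshold:.3e}")
for name, p in significant.items():
    print(f"{name}: {p:.1e}")
```

Under that assumption the threshold is about 2.76 × 10⁻⁵, consistent with the "<2.8 × 10⁻⁵" cutoff in the abstract, and all five reported P values fall below it.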


2019 ◽  
Vol 70 (15) ◽  
pp. 3867-3879 ◽  
Author(s):  
Anneke Frerichs ◽  
Julia Engelhorn ◽  
Janine Altmüller ◽  
Jose Gutierrez-Marcos ◽  
Wolfgang Werr

Abstract Fluorescence-activated cell sorting (FACS) and assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) were combined to analyse the chromatin state of lateral organ founder cells (LOFCs) in the peripheral zone of the Arabidopsis apetala1-1 cauliflower-1 double mutant inflorescence meristem. On a genome-wide level, we observed a striking correlation between transposase hypersensitive sites (THSs) detected by ATAC-seq and DNase I hypersensitive sites (DHSs). The mostly expanded DHSs were often substructured into several individual THSs, which correlated with phylogenetically conserved DNA sequences or enhancer elements. Comparing chromatin accessibility with available RNA-seq data, THS change configuration was reflected by gene activation or repression and chromatin regions acquired or lost transposase accessibility in direct correlation with gene expression levels in LOFCs. This was most pronounced immediately upstream of the transcription start, where genome-wide THSs were abundant in a complementary pattern to established H3K4me3 activation or H3K27me3 repression marks. At this resolution, the combined application of FACS/ATAC-seq is widely applicable to detect chromatin changes during cell-type specification and facilitates the detection of regulatory elements in plant promoters.


2014 ◽  
Vol 32 (4_suppl) ◽  
pp. 464-464
Author(s):  
Thai Huu Ho ◽  
Jeong-Heon Lee ◽  
Rafael Nunez Nateras ◽  
Erik P. Castle ◽  
Melissa L. Stanton ◽  
...  

464 Background: Although the von Hippel-Lindau (VHL) tumor suppressor gene is mutated in 60% of ccRCC, deletion of VHL in mice is insufficient for tumorigenesis. Sequencing of ccRCC tumors identified mutations in SETD2, a histone H3 lysine 36 (H3K36) trimethyltransferase. We hypothesize that loss of SETD2 methyltransferase activity alters the genome-wide pattern of H3K36 trimethylation (H3K36me3) in ccRCC, and contributes to the cancer phenotype. Methods: To generate a genome-wide profile of H3K36me3 in frozen nephrectomy samples and RCC cell lines, we optimized a chromatin immunoprecipitation (ChIP) protocol for the isolation of DNA associated with H3K36me3. H3K36me3 is associated with open chromatin and an H3K36me3-specific antibody was used for immunoprecipitation of endogenous H3K36me3-bound DNA. ChIP PCR primers were optimized for active genes, such as actin, glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and a “gene desert” on chromosome 12 (negative control). ChIP libraries were then generated from 3 paired uninvolved kidney and RCC samples and 2 RCC cell lines. In order to identify H3K36me3 upregulated regions in uninvolved kidney and RCC, reads from the ChIP sequencing were mapped to the human genome using Burrows-Wheeler Aligner and SICER algorithms. Results: Using ChIP PCR, we found that active genomic regions were enriched 15-30 fold over the negative controls, indicating that the quality and yield of immunoprecipitated DNA/chromatin complexes from frozen tissue were sufficient for ChIP sequencing. A preliminary ChIP sequencing analysis of RCC cell lines and frozen ccRCC tissue indicates that H3K36me3 enriched DNA sequences were mapped to exons (31.3%) compared to introns (13.5%, p<0.001), consistent with the role of H3K36me3 in transcription. Conclusions: Genomic regions enriched for H3K36me3 binding were identified from patient-derived tissue and RCC cell lines. Current efforts are focused on comparing the H3K36me3 profiles between matched tumor and uninvolved kidney ChIP libraries to generate a genome-wide map of dysregulated H3K36me3 modifications.


Thorax ◽  
2011 ◽  
Vol 67 (5) ◽  
pp. 385-391 ◽  
Author(s):  
Jared M Bischof ◽  
Christopher J Ott ◽  
Shih-Hsing Leir ◽  
Nehal Gosalia ◽  
Lingyun Song ◽  
...  

2017 ◽  
Vol 114 (40) ◽  
pp. E8362-E8371 ◽  
Author(s):  
Anna Vilborg ◽  
Niv Sabath ◽  
Yuval Wiesel ◽  
Jenny Nathans ◽  
Flonia Levy-Adam ◽  
...  

Transcription is a highly regulated process, and stress-induced changes in gene transcription have been shown to play a major role in stress responses and adaptation. Genome-wide studies reveal prevalent transcription beyond known protein-coding gene loci, generating a variety of RNA classes, most of unknown function. One such class, termed downstream of gene-containing transcripts (DoGs), was reported to result from transcriptional readthrough upon osmotic stress in human cells. However, how widespread the readthrough phenomenon is, and what its causes and consequences are, remain elusive. Here we present a genome-wide mapping of transcriptional readthrough, using nuclear RNA-Seq, comparing heat shock, osmotic stress, and oxidative stress in NIH 3T3 mouse fibroblast cells. We observe massive induction of transcriptional readthrough, both in levels and length, under all stress conditions, with significant, yet not complete, overlap of readthrough-induced loci between different conditions. Importantly, our analyses suggest that stress-induced transcriptional readthrough is not a random failure process, but is rather differentially induced across different conditions. We explore potential regulators and find a role for HSF1 in the induction of a subset of heat shock-induced readthrough transcripts. Analysis of public datasets detected increases in polymerase II occupancy in DoG regions after heat shock, supporting our findings. Interestingly, DoGs tend to be produced in the vicinity of neighboring genes, leading to a marked increase in their antisense-generating potential. Finally, we examine genomic features of readthrough transcription and observe a unique chromatin signature typical of DoG-producing regions, suggesting that readthrough transcription is associated with the maintenance of an open chromatin state.


2020 ◽  
Vol 16 (11) ◽  
pp. e1008422
Author(s):  
Azusa Tanaka ◽  
Yasuhiro Ishitsuka ◽  
Hiroki Ohta ◽  
Akihiro Fujimoto ◽  
Jun-ichirou Yasunaga ◽  
...  

The huge amount of data acquired by high-throughput sequencing requires data reduction for effective analysis. Here we give a clustering algorithm for genome-wide open chromatin data using a new data reduction method. This method regards the genome as a string of 1s and 0s based on a set of peaks and calculates the Hamming distances between the strings. This algorithm with the systematically optimized set of peaks enables us to quantitatively evaluate differences between samples of hematopoietic cells and classify cell types, potentially leading to a better understanding of leukemia pathogenesis.
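The data reduction described above, encoding each sample as a binary string and comparing strings by Hamming distance, can be sketched as follows. This is an illustrative reading of the abstract, not the authors' algorithm: the fixed-size binning, the half-open peak intervals, the function names, and the toy peak sets are all assumptions.

```python
def peaks_to_bits(peaks, genome_length, bin_size):
    """Encode peaks as a 0/1 list: a bin is 1 if any peak overlaps it.

    Peaks are (start, end) pairs with half-open coordinates [start, end).
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    bits = [0] * n_bins
    for start, end in peaks:
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            bits[b] = 1
    return bits

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

# Toy peak sets for three hypothetical samples on a 100 bp "genome".
sample_a = peaks_to_bits([(0, 20), (50, 60)], genome_length=100, bin_size=10)
sample_b = peaks_to_bits([(10, 30), (50, 60)], genome_length=100, bin_size=10)
sample_c = peaks_to_bits([(70, 100)], genome_length=100, bin_size=10)

dist_ab = hamming(sample_a, sample_b)
dist_ac = hamming(sample_a, sample_c)
print(dist_ab, dist_ac)
```

A matrix of such pairwise distances could then be passed to any standard clustering routine; samples with similar open chromatin profiles, like the first two here, end up closer together than unrelated ones.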


2021 ◽  
Vol 7 (26) ◽  
pp. eabf8962
Author(s):  
Ke Xiao ◽  
Dan Xiong ◽  
Gong Chen ◽  
Jinsong Yu ◽  
Yue Li ◽  
...  

Like most DNA viruses, herpesviruses precisely deliver their genomes into the sophisticatedly organized nuclei of the infected host cells to initiate subsequent transcription and replication. However, it remains elusive how the viral genome specifically interacts with the host genome and hijacks host transcription machinery. Using pseudorabies virus (PRV) as model virus, we performed chromosome conformation capture assays to demonstrate a genome-wide specific trans-species chromatin interaction between the virus and host. Our data show that the PRV genome is delivered by the host DNA binding protein RUNX1 into the open chromatin and active transcription zone. This facilitates virus hijacking host RNAPII to efficiently transcribe viral genes, which is significantly inhibited by either a RUNX1 inhibitor or RNA interference. Together, these findings provide insights into the chromatin interaction between viral and host genomes and identify new areas of research to advance the understanding of herpesvirus genome transcription.

