F-Seq2: improving the feature density based peak caller with dynamic statistics

Abstract Genomic and epigenomic features are captured at a genome-wide level by using high-throughput sequencing (HTS) technologies. Peak calling delineates features identified in HTS experiments, such as open chromatin regions and transcription factor binding sites, by comparing the observed read distributions to a random expectation. Since its introduction, F-Seq has been widely used and shown to be the most sensitive and accurate peak caller for DNase I hypersensitive site (DNase-seq) data. However, the first release (F-Seq1) has two key limitations: lack of support for user-input control datasets, and poor test statistic reporting. These constrain its ability to capture systematic and experimental biases inherent to the background distributions in peak prediction, and to subsequently rank predicted peaks by confidence. To address these limitations, we present F-Seq2, which combines kernel density estimation and a dynamic ‘continuous’ Poisson test to account for local biases and accurately rank candidate peaks. The output of F-Seq2 is suitable for irreproducible discovery rate analysis as test statistics are calculated for individual candidate summits, allowing direct comparison of predictions across replicates. These improvements significantly boost the performance of F-Seq2 for ATAC-seq and ChIP-seq datasets, outperforming competing peak callers used by the ENCODE Consortium in terms of precision and recall.
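The two ingredients named in this abstract, a kernel density estimate of the read signal and a Poisson test against a local background, can be illustrated with a toy sketch. This is not F-Seq2's implementation: it uses a plain Gaussian kernel and an ordinary discrete Poisson tail rather than the "continuous" Poisson test the abstract describes. The read positions, bandwidth, window size and the rule for choosing the background rate are all hypothetical.

```python
import math

def gaussian_kde_signal(read_positions, length, bandwidth):
    """Sum a Gaussian kernel centred on every read over positions 0..length-1."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return [
        sum(norm * math.exp(-0.5 * ((x - r) / bandwidth) ** 2)
            for r in read_positions)
        for x in range(length)
    ]

def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam), integer k >= 0."""
    cdf = sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)

def count_in_window(reads, center, half_width):
    """Number of reads within half_width of center."""
    return sum(1 for r in reads if abs(r - center) <= half_width)

# Hypothetical toy data: treatment reads cluster near position 50.
treatment = [48, 49, 50, 50, 51, 52, 53, 10, 90]
control = [5, 30, 50, 70, 95]
length, bandwidth, window = 100, 3.0, 10

# Step 1: smooth the treatment reads and take the highest point as the summit.
signal = gaussian_kde_signal(treatment, length, bandwidth)
summit = max(range(length), key=signal.__getitem__)

# Step 2: compare the observed count at the summit with a background rate.
# Assumption: the rate is the larger of the control count in the same window
# and the control's average count per window over the whole region.
observed = count_in_window(treatment, summit, window)
average_rate = len(control) * (2 * window + 1) / length
local_rate = max(count_in_window(control, summit, window), average_rate)
p_value = poisson_sf(observed, local_rate)

print(f"summit={summit} observed={observed} "
      f"background={local_rate:.2f} p={p_value:.2e}")
```

Attaching a test statistic to each summit in this way is what allows candidate peaks to be ranked and compared across replicates.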


F-Seq2: improving the feature density based peak caller with dynamic statistics

10.1101/2020.10.06.328674 ◽

2020 ◽

Author(s):

Nanxiang Zhao ◽

Alan P. Boyle

Keyword(s):

High Throughput Sequencing ◽

Superior Performance ◽

Open Chromatin ◽

Test Statistics ◽

Peak Calling ◽

User Input ◽

Distance Analysis ◽

Sequencing Technologies ◽

A Genome ◽

Peak Caller

Abstract Genomic and epigenomic features are captured at a genome-wide level by using high-throughput sequencing technologies. Peak calling is one of the first essential steps in analyzing these features, delineating regions such as open chromatin regions and transcription factor binding sites. Our original peak calling software, F-Seq, has been widely used and shown to be the most sensitive and accurate peak caller for DNase I hypersensitive site sequencing (DNase-seq) data. However, F-Seq neither supports user-input control datasets nor reports test statistics, limiting its ability to capture systematic and experimental biases and to accurately estimate background distributions. Here we present an improved version, F-Seq2, which combines the power of kernel density estimation and a dynamic “continuous” Poisson distribution to robustly account for local biases and resolve ties when ranking candidate peaks. In F-score and motif distance analyses, we demonstrate the superior performance of F-Seq2 over competing peak callers used by the ENCODE Consortium on simulated and real ATAC-seq and ChIP-seq datasets. The output of F-Seq2 is suitable for irreproducible discovery rate (IDR) analysis because test statistics are calculated for individual candidate summits and ties are robustly resolved.


Detection and application of genome-wide variations in peach for association and genetic relationship analysis

BMC Genetics ◽

10.1186/s12863-019-0799-8 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 2

Author(s):

Liping Guan ◽

Ke Cao ◽

Yong Li ◽

Jian Guo ◽

Qiang Xu ◽

...

Keyword(s):

Genetic Relationship ◽

Dna Markers ◽

High Throughput Sequencing ◽

Prunus Persica ◽

Genetic Research ◽

Diploid Species ◽

Sequencing Data ◽

Relationship Analysis ◽

Genome Wide ◽

A Genome

Abstract Background: Peach (Prunus persica L.) is a diploid species and a model plant of the Rosaceae family. In the past decade, significant progress has been made in peach genetic research via DNA markers, but the number of these markers remains limited. Results: In this study, we performed genome-wide detection of DNA markers based on sequencing data from six distantly related peach accessions. A total of 650,693–1,053,547 single nucleotide polymorphisms (SNPs), 114,227–178,968 small insertions/deletions (InDels), 8386–12,298 structural variants (SVs), 2111–2581 copy number variants (CNVs) and 229,357–346,940 simple sequence repeats (SSRs) were detected and annotated. To demonstrate the application of DNA markers, 944 SNPs were filtered for an association study of fruit ripening time, and 15 highly polymorphic SSRs were selected to analyze the genetic relationship among 221 accessions. Conclusions: The results showed that the use of high-throughput sequencing to develop DNA markers is fast and effective. Comprehensive identification of DNA markers, including SVs and SSRs, would benefit genetic diversity evaluation, genetic mapping, and molecular breeding of peach.


A genome-wide identification, characterization and functional analysis of salt-related long non-coding RNAs in non-model plant Pistacia vera L. using transcriptome high throughput sequencing

Scientific Reports ◽

10.1038/s41598-020-62108-6 ◽

2020 ◽

Vol 10 (1) ◽

Cited By ~ 3

Author(s):

Masoomeh Jannesar ◽

Seyed Mahdi Seyedi ◽

Maryam Moazzam Jazi ◽

Vahid Niknam ◽

Hassan Ebrahimzadeh ◽

...

Keyword(s):

Functional Analysis ◽

High Throughput ◽

High Throughput Sequencing ◽

Pistacia Vera ◽

Model Plant ◽

Genome Wide ◽

A Genome ◽

Non Coding Rnas


TEGS-CN: A Statistical Method for Pathway Analysis of Genome-wide Copy Number Profile

Cancer Informatics ◽

10.4137/cin.s13978 ◽

2014 ◽

Vol 13s4 ◽

pp. CIN.S13978

Author(s):

Yen-Tsung Huang ◽

Thomas Hsu ◽

David C. Christiani

Keyword(s):

Copy Number ◽

Copy Number Data ◽

Copy Number Profile ◽

Test Statistic ◽

Gene Set ◽

Bonferroni Adjustment ◽

Gene Sets ◽

Genome Wide ◽

A Genome ◽

Pathway Analyses

The effects of copy number alterations make up a significant part of the tumor genome profile, but pathway analyses of these alterations are still not well established. We proposed a novel method to analyze multiple copy numbers of genes within a pathway, termed Test for the Effect of a Gene Set with Copy Number data (TEGS-CN). TEGS-CN was adapted from TEGS, a method that we previously developed for gene expression data using a variance component score test. With additional development, we extended the method to analyze DNA copy number data, accounting for different gene sizes and thus varying numbers of copy number probes per gene. The test statistic follows a mixture of χ² distributions that can be obtained using permutation with a scaled χ² approximation. We conducted simulation studies to evaluate the size and the power of TEGS-CN and to compare its performance with TEGS. We analyzed genome-wide copy number data from 264 patients with non-small-cell lung cancer. With the Molecular Signatures Database (MSigDB) pathway database, the genome-wide copy number data can be classified into 1814 biological pathways or gene sets. We investigated associations of the copy number profile of the 1814 gene sets with pack-years of cigarette smoking. Our analysis revealed five pathways with significant P values after Bonferroni adjustment (< 2.8 × 10⁻⁵), including the PTEN pathway (7.8 × 10⁻⁷), the gene set up-regulated under heat shock (3.6 × 10⁻⁶), the gene sets involved in the immune profile for rejection of kidney transplantation (9.2 × 10⁻⁶) and for transcriptional control of leukocytes (2.2 × 10⁻⁵), and the ganglioside biosynthesis pathway (2.7 × 10⁻⁵). In conclusion, we present a new method for pathway analyses of copy number data; the causal mechanisms of the five pathways require further study.
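The Bonferroni threshold quoted in this abstract can be checked with a few lines of arithmetic. The sketch below assumes a family-wise significance level of 0.05, which the abstract does not state. The number of gene sets and the five P values are taken from the abstract, and the dictionary keys are shorthand labels for the reported pathways.

```python
# Assumed family-wise error rate; the abstract does not state it explicitly.
alpha = 0.05
n_tests = 1814  # gene sets tested, as reported in the abstract

# Bonferroni adjustment: divide the family-wise level by the number of tests.
threshold = alpha / n_tests

# P values reported in the abstract (labels are shorthand for illustration).
reported = {
    "PTEN pathway": 7.8e-7,
    "up-regulated under heat shock": 3.6e-6,
    "kidney transplant rejection immune profile": 9.2e-6,
    "transcriptional control of leukocytes": 2.2e-5,
    "ganglioside biosynthesis": 2.7e-5,
}
significant = [name for name, p in reported.items() if p < threshold]

print(f"per-test threshold = {threshold:.2e}")
print(f"{len(significant)} of {len(reported)} reported P values fall below it")
```

Under that assumption the per-test threshold is about 2.8 × 10⁻⁵, matching the cutoff quoted above, and all five reported P values fall below it.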


Specific chromatin changes mark lateral organ founder cells in the Arabidopsis inflorescence meristem

Journal of Experimental Botany ◽

10.1093/jxb/erz181 ◽

2019 ◽

Vol 70 (15) ◽

pp. 3867-3879 ◽

Cited By ~ 7

Author(s):

Anneke Frerichs ◽

Julia Engelhorn ◽

Janine Altmüller ◽

Jose Gutierrez-Marcos ◽

Wolfgang Werr

Keyword(s):

Dna Sequences ◽

High Throughput Sequencing ◽

Gene Activation ◽

Regulatory Elements ◽

Inflorescence Meristem ◽

Genome Wide ◽

A Genome ◽

Hypersensitive Sites ◽

Lateral Organ ◽

Founder Cells

Abstract Fluorescence-activated cell sorting (FACS) and assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) were combined to analyse the chromatin state of lateral organ founder cells (LOFCs) in the peripheral zone of the Arabidopsis apetala1-1 cauliflower-1 double mutant inflorescence meristem. On a genome-wide level, we observed a striking correlation between transposase hypersensitive sites (THSs) detected by ATAC-seq and DNase I hypersensitive sites (DHSs). The mostly expanded DHSs were often substructured into several individual THSs, which correlated with phylogenetically conserved DNA sequences or enhancer elements. Comparing chromatin accessibility with available RNA-seq data, THS change configuration was reflected by gene activation or repression and chromatin regions acquired or lost transposase accessibility in direct correlation with gene expression levels in LOFCs. This was most pronounced immediately upstream of the transcription start, where genome-wide THSs were abundant in a complementary pattern to established H3K4me3 activation or H3K27me3 repression marks. At this resolution, the combined application of FACS/ATAC-seq is widely applicable to detect chromatin changes during cell-type specification and facilitates the detection of regulatory elements in plant promoters.


Genome-wide profiling of histone 3 lysine 36 trimethylation in clear cell renal cell carcinoma.

Journal of Clinical Oncology ◽

10.1200/jco.2014.32.4_suppl.464 ◽

2014 ◽

Vol 32 (4_suppl) ◽

pp. 464-464

Author(s):

Thai Huu Ho ◽

Jeong-Heon Lee ◽

Rafael Nunez Nateras ◽

Erik P. Castle ◽

Melissa L. Stanton ◽

...

Keyword(s):

Cell Lines ◽

Dna Sequences ◽

Open Chromatin ◽

Sequencing Analysis ◽

Gene Desert ◽

Cell Renal Cell Carcinoma ◽

Chip Sequencing ◽

Genome Wide ◽

A Genome ◽

Genomic Regions

464 Background: Although the von Hippel-Lindau (VHL) tumor suppressor gene is mutated in 60% of ccRCC, deletion of VHL in mice is insufficient for tumorigenesis. Sequencing of ccRCC tumors identified mutations in SETD2, a histone H3 lysine 36 (H3K36) trimethyltransferase. We hypothesize that loss of SETD2 methyltransferase activity alters the genome-wide pattern of H3K36 trimethylation (H3K36me3) in ccRCC, and contributes to the cancer phenotype. Methods: To generate a genome-wide profile of H3K36me3 in frozen nephrectomy samples and RCC cell lines, we optimized a chromatin immunoprecipitation (ChIP) protocol for the isolation of DNA associated with H3K36me3. H3K36me3 is associated with open chromatin and an H3K36me3-specific antibody was used for immunoprecipitation of endogenous H3K36me3-bound DNA. ChIP PCR primers were optimized for active genes, such as actin, glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and a “gene desert” on chromosome 12 (negative control). ChIP libraries were then generated from 3 paired uninvolved kidney and RCC and 2 RCC cell lines. In order to identify H3K36me3 upregulated regions in uninvolved kidney and RCC, reads from the ChIP sequencing were mapped to the human genome using Burrows-Wheeler Aligner and SICER algorithms. Results: Using ChIP PCR, we found that active genomic regions were enriched 15-30 fold over the negative controls indicating that the quality and yield of immunoprecipitated DNA/chromatin complexes from frozen tissue was sufficient for ChIP sequencing. A preliminary ChIP sequencing analysis of RCC cell lines and frozen ccRCC tissue indicates that H3K36me3 enriched DNA sequences were mapped to exons (31.3%) compared to introns (13.5%, p<0.001), consistent with the role of H3K36me3 in transcription. Conclusions: Genomic regions enriched for H3K36me3 binding were identified from patient-derived tissue and RCC cell lines. Current efforts are focused on comparing the H3K36me3 profiles between matched tumor and uninvolved kidney ChIP libraries to generate a genome-wide map of dysregulated H3K36me3 modifications.


A genome-wide analysis of open chromatin in human tracheal epithelial cells reveals novel candidate regulatory elements for lung function

Thorax ◽

10.1136/thoraxjnl-2011-200880 ◽

2011 ◽

Vol 67 (5) ◽

pp. 385-391 ◽

Cited By ~ 17

Author(s):

Jared M Bischof ◽

Christopher J Ott ◽

Shih-Hsing Leir ◽

Nehal Gosalia ◽

Lingyun Song ◽

...

Keyword(s):

Lung Function ◽

Epithelial Cells ◽

Regulatory Elements ◽

Open Chromatin ◽

Genome Wide Analysis ◽

Tracheal Epithelial Cells ◽

Genome Wide ◽

A Genome ◽

Tracheal Epithelial


Comparative analysis reveals genomic features of stress-induced transcriptional readthrough

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1711120114 ◽

2017 ◽

Vol 114 (40) ◽

pp. E8362-E8371 ◽

Cited By ~ 36

Author(s):

Anna Vilborg ◽

Niv Sabath ◽

Yuval Wiesel ◽

Jenny Nathans ◽

Flonia Levy-Adam ◽

...

Keyword(s):

Heat Shock ◽

Osmotic Stress ◽

Stress Responses ◽

Failure Process ◽

Mouse Fibroblast ◽

Open Chromatin ◽

Chromatin Signature ◽

Genomic Features ◽

Genome Wide ◽

A Genome

Transcription is a highly regulated process, and stress-induced changes in gene transcription have been shown to play a major role in stress responses and adaptation. Genome-wide studies reveal prevalent transcription beyond known protein-coding gene loci, generating a variety of RNA classes, most of unknown function. One such class, termed downstream of gene-containing transcripts (DoGs), was reported to result from transcriptional readthrough upon osmotic stress in human cells. However, how widespread the readthrough phenomenon is, and what its causes and consequences are, remain elusive. Here we present a genome-wide mapping of transcriptional readthrough, using nuclear RNA-Seq, comparing heat shock, osmotic stress, and oxidative stress in NIH 3T3 mouse fibroblast cells. We observe massive induction of transcriptional readthrough, both in levels and length, under all stress conditions, with significant, yet not complete, overlap of readthrough-induced loci between different conditions. Importantly, our analyses suggest that stress-induced transcriptional readthrough is not a random failure process, but is rather differentially induced across different conditions. We explore potential regulators and find a role for HSF1 in the induction of a subset of heat shock-induced readthrough transcripts. Analysis of public datasets detected increases in polymerase II occupancy in DoG regions after heat shock, supporting our findings. Interestingly, DoGs tend to be produced in the vicinity of neighboring genes, leading to a marked increase in their antisense-generating potential. Finally, we examine genomic features of readthrough transcription and observe a unique chromatin signature typical of DoG-producing regions, suggesting that readthrough transcription is associated with the maintenance of an open chromatin state.


Systematic clustering algorithm for chromatin accessibility data and its application to hematopoietic cells

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008422 ◽

2020 ◽

Vol 16 (11) ◽

pp. e1008422

Author(s):

Azusa Tanaka ◽

Yasuhiro Ishitsuka ◽

Hiroki Ohta ◽

Akihiro Fujimoto ◽

Jun-ichirou Yasunaga ◽

...

Keyword(s):

Data Reduction ◽

Clustering Algorithm ◽

High Throughput Sequencing ◽

Hematopoietic Cells ◽

Cell Types ◽

Chromatin Accessibility ◽

Open Chromatin ◽

Genome Wide ◽

Data Reduction Method ◽

Effective Analysis

The huge amount of data acquired by high-throughput sequencing requires data reduction for effective analysis. Here we give a clustering algorithm for genome-wide open chromatin data using a new data reduction method. This method regards the genome as a string of 1s and 0s based on a set of peaks and calculates the Hamming distances between the strings. This algorithm with the systematically optimized set of peaks enables us to quantitatively evaluate differences between samples of hematopoietic cells and classify cell types, potentially leading to a better understanding of leukemia pathogenesis.
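The data reduction idea in this abstract, encoding each sample as a string of 1s and 0s over a set of peaks and comparing samples by Hamming distance, might be sketched as follows. The encoding rule used here (one bit per reference peak, set to 1 when any peak in the sample overlaps it) is an assumption for illustration, as are the coordinates and function names. The abstract does not specify how the authors build or optimize their peak set.

```python
def peaks_to_bits(sample_peaks, reference_peaks):
    """One bit per reference peak: 1 if any sample peak overlaps it, else 0.

    Intervals are half-open (start, end) pairs on a single chromosome.
    """
    return [
        int(any(s_start < r_end and r_start < s_end
                for s_start, s_end in sample_peaks))
        for r_start, r_end in reference_peaks
    ]

def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Hypothetical reference peak set and two samples.
reference = [(0, 100), (200, 300), (400, 500), (600, 700)]
sample_a = [(10, 50), (220, 260)]
sample_b = [(210, 250), (450, 480), (610, 650)]

bits_a = peaks_to_bits(sample_a, reference)  # [1, 1, 0, 0]
bits_b = peaks_to_bits(sample_b, reference)  # [0, 1, 1, 1]
distance = hamming(bits_a, bits_b)           # 3

print(bits_a, bits_b, distance)
```

A matrix of such pairwise distances could then be passed to any standard clustering routine to group samples by cell type.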


RUNX1-mediated alphaherpesvirus-host trans-species chromatin interaction promotes viral transcription

Science Advances ◽

10.1126/sciadv.abf8962 ◽

2021 ◽

Vol 7 (26) ◽

pp. eabf8962

Author(s):

Ke Xiao ◽

Dan Xiong ◽

Gong Chen ◽

Jinsong Yu ◽

Yue Li ◽

...

Keyword(s):

Pseudorabies Virus ◽

Host Cells ◽

Chromatin Interaction ◽

Open Chromatin ◽

Chromosome Conformation ◽

Dna Viruses ◽

Genome Wide ◽

A Genome ◽

Genome Transcription ◽

Active Transcription

Like most DNA viruses, herpesviruses precisely deliver their genomes into the highly organized nuclei of infected host cells to initiate subsequent transcription and replication. However, it remains elusive how the viral genome specifically interacts with the host genome and hijacks the host transcription machinery. Using pseudorabies virus (PRV) as a model virus, we performed chromosome conformation capture assays to demonstrate a genome-wide specific trans-species chromatin interaction between the virus and host. Our data show that the PRV genome is delivered by the host DNA binding protein RUNX1 into the open chromatin and active transcription zone. This facilitates viral hijacking of host RNAPII to efficiently transcribe viral genes, which is significantly inhibited by either a RUNX1 inhibitor or RNA interference. Together, these findings provide insights into the chromatin interaction between viral and host genomes and identify new areas of research to advance the understanding of herpesvirus genome transcription.
