A Comparison of Peak Callers Used for DNase-seq Data

10.1101/003608 ◽

2014 ◽

Cited By ~ 1

Author(s):

Hashem Koohy ◽

Thomas Down ◽

Mikhail Spivakov ◽

Tim Hubbard

Keyword(s):

High Throughput Sequencing ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Independent Dataset ◽

Genome Wide ◽

Binding Data ◽

Threshold Setting ◽

Higher Sensitivity

Genome-wide profiling of open chromatin regions using DNase I and high-throughput sequencing (DNase-seq) is an increasingly popular approach for finding and studying regulatory elements. A variety of algorithms have been developed to identify regions of open chromatin from raw sequence-tag data, which has motivated us to assess and compare their performance. In this study, four published, publicly available peak calling algorithms used for DNase-seq data analysis (F-seq, Hotspot, MACS and ZINBA) are assessed at a range of signal thresholds on two published DNase-seq datasets for three cell types. The results were benchmarked against an independent dataset of regulatory regions derived from ENCODE in vivo transcription factor binding data for each particular cell type. The level of overlap between peak regions reported by each algorithm and this ENCODE-derived reference set was used to assess sensitivity and specificity of the algorithms. Our study suggests that F-seq has a slightly higher sensitivity than the next best algorithms. Hotspot and the ChIP-seq oriented method, MACS, both perform competitively when used with their default parameters. However, the generic peak finder ZINBA appears to be less sensitive than the other three. We also assess accuracy of each algorithm over a range of signal thresholds. In particular, we show that the accuracy of F-seq can be considerably improved by using a threshold setting that is different from the default value.
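
The overlap-based benchmarking described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact definitions of sensitivity and specificity, so the two fractions below are assumptions made for illustration: the share of reference regions touched by at least one called peak, and the share of called peaks touching at least one reference region. The intervals and the function names (`overlaps`, `fraction_overlapping`) are hypothetical toy values, not data or code from the study.

```python
from typing import List, Tuple

# Half-open interval [start, end) on a single chromosome (toy simplification).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(query: List[Interval], target: List[Interval]) -> float:
    """Fraction of query intervals that overlap at least one target interval."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, t) for t in target))
    return hits / len(query)


# Hypothetical reference regions (standing in for an ENCODE-derived set)
# and hypothetical peaks reported by one peak caller at one threshold.
reference = [(100, 200), (500, 600), (900, 1000)]
called = [(150, 250), (550, 650), (2000, 2100)]

# Assumed sensitivity-like measure: reference regions recovered by the caller.
sensitivity = fraction_overlapping(reference, called)

# Assumed precision-like measure: called peaks supported by the reference.
supported = fraction_overlapping(called, reference)

print(f"sensitivity-like fraction: {sensitivity:.3f}")
print(f"supported-call fraction:   {supported:.3f}")
```

Repeating this calculation for each signal threshold would give one point per threshold, which is one plausible way to compare callers across a threshold range. The quadratic scan is only suitable for toy inputs; genome-scale comparisons would normally use sorted intervals or an interval tree.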

Systematic clustering algorithm for chromatin accessibility data and its application to hematopoietic cells

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008422 ◽

2020 ◽

Vol 16 (11) ◽

pp. e1008422

Author(s):

Azusa Tanaka ◽

Yasuhiro Ishitsuka ◽

Hiroki Ohta ◽

Akihiro Fujimoto ◽

Jun-ichirou Yasunaga ◽

...

Keyword(s):

Data Reduction ◽

Clustering Algorithm ◽

High Throughput Sequencing ◽

Hematopoietic Cells ◽

Cell Types ◽

Chromatin Accessibility ◽

Open Chromatin ◽

Genome Wide ◽

Data Reduction Method ◽

Effective Analysis

The huge amount of data acquired by high-throughput sequencing requires data reduction for effective analysis. Here we give a clustering algorithm for genome-wide open chromatin data using a new data reduction method. This method regards the genome as a string of 1s and 0s based on a set of peaks and calculates the Hamming distances between the strings. This algorithm with the systematically optimized set of peaks enables us to quantitatively evaluate differences between samples of hematopoietic cells and classify cell types, potentially leading to a better understanding of leukemia pathogenesis.
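
The data reduction idea in this abstract (a genome represented as a string of 1s and 0s, compared by Hamming distance) can be sketched as follows. The abstract does not specify how the binary string is built, so the encoding used here is an assumption: one bit per peak in a shared reference set, set to 1 when the sample has an overlapping peak. All coordinates and the names `encode` and `hamming` are hypothetical.

```python
from typing import List, Tuple

# Half-open interval [start, end) on a single chromosome (toy simplification).
Interval = Tuple[int, int]


def encode(sample_peaks: List[Interval], reference_peaks: List[Interval]) -> List[int]:
    """Assumed encoding: bit i is 1 if any sample peak overlaps reference peak i."""
    return [
        int(any(s < r_end and r_start < e for s, e in sample_peaks))
        for r_start, r_end in reference_peaks
    ]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical shared peak set and two hypothetical samples.
reference_peaks = [(0, 100), (200, 300), (400, 500), (600, 700)]
sample_a = [(10, 50), (210, 260)]
sample_b = [(220, 280), (610, 650)]

vec_a = encode(sample_a, reference_peaks)
vec_b = encode(sample_b, reference_peaks)
distance = hamming(vec_a, vec_b)

print(vec_a, vec_b, distance)
```

A matrix of pairwise distances computed this way could then be passed to any standard clustering routine; which routine the authors use, and how they optimize the peak set, is not stated in the abstract.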

Simultaneous profiling of multiple chromatin proteins in the same cells

10.1101/2021.04.27.441642 ◽

2021 ◽

Author(s):

Sneha Gopalan ◽

Yuqing Wang ◽

Nicholas W. Harper ◽

Manuel Garber ◽

Thomas G Fazzio

Keyword(s):

Rna Polymerase Ii ◽

Direct Analysis ◽

Cell Types ◽

Regulatory Elements ◽

Genome Wide ◽

Distinct Cell ◽

Direct Measurements ◽

Cell Type Specific ◽

Chromatin Proteins

Methods derived from CUT&RUN and CUT&Tag enable genome-wide mapping of the localization of proteins on chromatin from as few as one cell. These and other mapping approaches focus on one protein at a time, preventing direct measurements of co-localization of different chromatin proteins in the same cells and requiring prioritization of targets where samples are limiting. Here we describe multi-CUT&Tag, an adaptation of CUT&Tag that overcomes these hurdles by using antibody-specific barcodes to simultaneously map multiple proteins in the same cells. Highly specific multi-CUT&Tag maps of histone marks and RNA Polymerase II uncovered sites of co-localization in the same cells, active and repressed genes, and candidate cis-regulatory elements. Single-cell multi-CUT&Tag profiling facilitated identification of distinct cell types from a mixed population and characterization of cell type-specific chromatin architecture. In sum, multi-CUT&Tag increases the information content per cell of epigenomic maps, facilitating direct analysis of the interplay of different proteins on chromatin.

HCR-FlowFISH: A flexible CRISPR screening method to identify cis-regulatory elements and their target genes

10.1101/2020.05.11.078675 ◽

2020 ◽

Author(s):

SK Reilly ◽

SJ Gosai ◽

A Gutierrez ◽

JC Ulirsch ◽

M Kanai ◽

...

Keyword(s):

Gene Expression ◽

Target Genes ◽

Screening Method ◽

Cell Types ◽

Regulatory Elements ◽

Hybridization Chain Reaction ◽

Genome Wide ◽

Wide Range ◽

Causal Variants ◽

Endogenous Loci

Abstract CRISPR screens for cis-regulatory elements (CREs) have shown unprecedented power to endogenously characterize the non-coding genome. To characterize CREs we developed HCR-FlowFISH (Hybridization Chain Reaction Fluorescent In-Situ Hybridization coupled with Flow Cytometry), which directly quantifies native transcripts within their endogenous loci following CRISPR perturbations of regulatory elements, eliminating the need for restrictive phenotypic assays such as growth or transcript-tagging. HCR-FlowFISH accurately quantifies gene expression across a wide range of transcript levels and cell types. We also developed CASA (CRISPR Activity Screen Analysis), a hierarchical Bayesian model to identify and quantify CRE activity. Using >270,000 perturbations, we identified CREs for GATA1, HDAC6, ERP29, LMO2, MEF2C, CD164, NMU, FEN1 and the FADS gene cluster. Our methods detect subtle gene expression changes and identify CREs regulating multiple genes, sometimes at different magnitudes and directions. We demonstrate the power of HCR-FlowFISH to parse genome-wide association signals by nominating causal variants and target genes.

DualSeqDB: the host–pathogen dual RNA sequencing database for infection processes

Nucleic Acids Research ◽

10.1093/nar/gkaa890 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D687-D693

Author(s):

Javier Macho Rendón ◽

Benjamin Lang ◽

Marc Ramos Llorens ◽

Gian Gaetano Tartaglia ◽

Marc Torrent Burgas

Keyword(s):

Pathogenic Bacteria ◽

High Throughput Sequencing ◽

Homo Sapiens ◽

Cell Types ◽

Natural Hosts ◽

Host Infection ◽

Infection Processes ◽

Different Strains

Abstract Despite antibiotic resistance being a matter of growing concern worldwide, the bacterial mechanisms of pathogenesis remain underexplored, restraining our ability to develop new antimicrobials. The rise of high-throughput sequencing technology has made available a massive amount of transcriptomic data that could help elucidate the mechanisms underlying bacterial infection. Here, we introduce the DualSeqDB database, a resource that helps the identification of gene transcriptional changes in both pathogenic bacteria and their natural hosts upon infection. DualSeqDB comprises nearly 300 000 entries from eight different studies, with information on bacterial and host differential gene expression under in vivo and in vitro conditions. Expression data values were calculated entirely from raw data and analyzed through a standardized pipeline to ensure consistency between different studies. It includes information on seven different strains of pathogenic bacteria and a variety of cell types and tissues in Homo sapiens, Mus musculus and Macaca fascicularis at different time points. We envisage that DualSeqDB can help the research community in the systematic characterization of genes involved in host infection and help the development and tailoring of new molecules against infectious diseases. DualSeqDB is freely available at http://www.tartaglialab.com/dualseq.

Specific chromatin changes mark lateral organ founder cells in the Arabidopsis inflorescence meristem

Journal of Experimental Botany ◽

10.1093/jxb/erz181 ◽

2019 ◽

Vol 70 (15) ◽

pp. 3867-3879 ◽

Cited By ~ 7

Author(s):

Anneke Frerichs ◽

Julia Engelhorn ◽

Janine Altmüller ◽

Jose Gutierrez-Marcos ◽

Wolfgang Werr

Keyword(s):

Dna Sequences ◽

High Throughput Sequencing ◽

Gene Activation ◽

Regulatory Elements ◽

Inflorescence Meristem ◽

Genome Wide ◽

A Genome ◽

Hypersensitive Sites ◽

Lateral Organ ◽

Founder Cells

Abstract Fluorescence-activated cell sorting (FACS) and assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) were combined to analyse the chromatin state of lateral organ founder cells (LOFCs) in the peripheral zone of the Arabidopsis apetala1-1 cauliflower-1 double mutant inflorescence meristem. On a genome-wide level, we observed a striking correlation between transposase hypersensitive sites (THSs) detected by ATAC-seq and DNase I hypersensitive sites (DHSs). The mostly expanded DHSs were often substructured into several individual THSs, which correlated with phylogenetically conserved DNA sequences or enhancer elements. Comparing chromatin accessibility with available RNA-seq data, THS change configuration was reflected by gene activation or repression and chromatin regions acquired or lost transposase accessibility in direct correlation with gene expression levels in LOFCs. This was most pronounced immediately upstream of the transcription start, where genome-wide THSs were abundant in a complementary pattern to established H3K4me3 activation or H3K27me3 repression marks. At this resolution, the combined application of FACS/ATAC-seq is widely applicable to detect chromatin changes during cell-type specification and facilitates the detection of regulatory elements in plant promoters.

Analysis of putative cis-regulatory elements regulating blood pressure variation

Human Molecular Genetics ◽

10.1093/hmg/ddaa098 ◽

2020 ◽

Vol 29 (11) ◽

pp. 1922-1932

Author(s):

Priyanka Nandakumar ◽

Dongwon Lee ◽

Thomas J Hoffmann ◽

Georg B Ehret ◽

Dan Arking ◽

...

Keyword(s):

Blood Pressure ◽

Association Studies ◽

Specific Effect ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Genome Wide Association Studies ◽

Cell Type ◽

Functional Scores ◽

Cell Type Specific

Abstract Hundreds of loci have been associated with blood pressure (BP) traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ~100 000 Genetic Epidemiology Research on Aging study participants. In the present study, we sought to fine-map known loci and identify novel genes by determining putative regulatory regions for these and other tissues relevant to BP. We constructed maps of putative cis-regulatory elements (CREs) using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. We aggregate variants within these putative CREs within 50 Kb of the start or end of ‘expressed’ genes in these tissues or cell types using public expression data and use deltaSVM scores as weights in the group-wise sequence kernel association test to identify candidates. We test for association with both BP traits and expression within these tissues or cell types of interest and identify the candidates MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B and PPCDC. Additionally, we examined two known QT interval genes, SCN5A and NOS1AP, in the Atherosclerosis Risk in Communities Study, as a positive control, and observed the expected heart-specific effect. Thus, our method identifies variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.

Make way for the ‘next generation’: application and prospects for genome-wide, epigenome-specific technologies in endocrine research

Journal of Molecular Endocrinology ◽

10.1530/jme-12-0045 ◽

2012 ◽

Vol 49 (1) ◽

pp. R19-R27 ◽

Cited By ~ 14

Author(s):

Richard D Emes ◽

William E Farrell

Keyword(s):

High Throughput Sequencing ◽

Cell Types ◽

Epigenetic Changes ◽

Disease States ◽

Base Level ◽

Genome Wide ◽

Endocrine Organs ◽

Life Threatening ◽

Mechanism Of Interaction ◽

Genome Level

Epigenetic changes, which target DNA and associated histones, can be described as a pivotal mechanism of interaction between genes and the environment. The field of epigenomics aims to detect and interpret epigenetic modifications at the whole genome level. These approaches have the potential to increase resolution of epigenetic changes to the single base level in multiple disease states or across a population of individuals. Identification and comparison of the epigenomic landscape has challenged our understanding of the regulation of phenotype. Additionally, inclusion of these marks as biomarkers in the early detection or progression monitoring of disease is providing novel avenues for future biomedical research. Cells of the endocrine organs, which include pituitary, thyroid, thymus, pancreas, ovary and testes, have been shown to be susceptible to epigenetic alteration, leading to both local and systemic changes often resulting in life-threatening metabolic disease. As with other cell types and populations, endocrine cells are susceptible to tumour development, which in turn may have resulted from aberration of epigenetic control. Techniques including high-throughput sequencing and array-based analysis to investigate these changes have rapidly emerged and are continually evolving. Here, we present a review of these methods and their promise to influence our studies on the epigenome for endocrine research and perhaps to uncover novel therapeutic options in disease states.

A genome-wide analysis of open chromatin in human tracheal epithelial cells reveals novel candidate regulatory elements for lung function

Thorax ◽

10.1136/thoraxjnl-2011-200880 ◽

2011 ◽

Vol 67 (5) ◽

pp. 385-391 ◽

Cited By ~ 17

Author(s):

Jared M Bischof ◽

Christopher J Ott ◽

Shih-Hsing Leir ◽

Nehal Gosalia ◽

Lingyun Song ◽

...

Keyword(s):

Lung Function ◽

Epithelial Cells ◽

Regulatory Elements ◽

Open Chromatin ◽

Genome Wide Analysis ◽

Tracheal Epithelial Cells ◽

Genome Wide ◽

A Genome ◽

Tracheal Epithelial

Analysis of putative cis-regulatory elements regulating blood pressure variation

10.1101/820522 ◽

2019 ◽

Author(s):

Priyanka Nandakumar ◽

Dongwon Lee ◽

Thomas J. Hoffmann ◽

Georg B. Ehret ◽

Dan Arking ◽

...

Keyword(s):

Gene Expression ◽

Blood Pressure ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Genome Wide Association Studies ◽

Cell Type ◽

Functional Scores ◽

Cell Type Specific ◽

Different Tissues

Abstract Hundreds of loci have been associated with blood pressure traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ∼100,000 Genetic Epidemiology Research on Aging (GERA) study participants. In the present study, we subsequently focused on determining putative regulatory regions for these and other tissues of relevance to blood pressure, to both fine-map these loci by pinpointing genes and variants of functional interest within them, and to identify any novel genes. We constructed maps of putative cis-regulatory elements using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Sequence variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. In order to identify genes of interest, we aggregate these variants in these putative cis-regulatory elements within 50 Kb of the start or end of genes considered as “expressed” in these tissues or cell types using publicly available gene expression data, and use the deltaSVM scores as weights in the well-known group-wise sequence kernel association test (SKAT). We test for association with both blood pressure traits as well as expression within these tissues or cell types of interest, and identify several genes, including MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B, and PPCDC. Although our study centers on blood pressure traits, we additionally examined two known genes, SCN5A and NOS1AP, involved in the cardiac trait QT interval, in the Atherosclerosis Risk in Communities Study (ARIC), as a positive control, and observed an expected heart-specific effect. Thus, our method may be used to identify variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.

Author Summary Sequence changes in genes (“variants”) are linked to the presence and severity of different traits or diseases. However, as genes may be expressed in different tissues and at different times and degrees, using this information is expected to more accurately identify genes of interest. Variants within genes are important, but so are variants in the sequences (“regulatory elements”) that control the genes’ expression in different tissues or cell types. In this study, we aim to use this information about expression and variants potentially involved in gene expression regulation to better pinpoint genes and variants in regulatory elements of interest for blood pressure regulation. We do so by taking advantage of publicly available data and by using methods that combine information about variants in aggregate within a gene’s putative regulatory elements in tissues thought to be relevant for blood pressure. We identify several genes, with the aim of enabling experimental follow-up.

CellWalker integrates single-cell and bulk data to resolve regulatory elements across cell types in complex tissues

10.1101/847657 ◽

2019 ◽

Cited By ~ 1

Author(s):

Pawel F. Przytycki ◽

Katherine S. Pollard

Keyword(s):

Single Cell ◽

Cell Types ◽

Regulatory Elements ◽

Cell Labeling ◽

Open Chromatin ◽

Specific Cell ◽

Rna Seq ◽

Data Types ◽

Bulk Data ◽

Cell Type Specific

Single-cell and bulk genomics assays have complementary strengths and weaknesses, and alone neither strategy can fully capture regulatory elements across the diversity of cells in complex tissues. We present CellWalker, a method that integrates single-cell open chromatin (scATAC-seq) data with gene expression (RNA-seq) and other data types using a network model that simultaneously improves cell labeling in noisy scATAC-seq and annotates cell-type specific regulatory elements in bulk data. We demonstrate CellWalker’s robustness to sparse annotations and noise using simulations and combined RNA-seq and ATAC-seq in individual cells. We then apply CellWalker to the developing brain. We identify cells transitioning between transcriptional states, resolve enhancers to specific cell types, and observe that autism and other neurological traits can be mapped to specific cell types through their enhancers.