Reducing mitochondrial reads in ATAC-seq using CRISPR/Cas9

10.1101/087890 ◽

2016 ◽

Author(s):

Lindsey Montefiori ◽

Liana Hernandez ◽

Zijie Zhang ◽

Yoav Gilad ◽

Carole Ober ◽

...

Keyword(s):

DNA Sequences ◽

Cost Reduction ◽

High Throughput Sequencing ◽

Nuclear Genome ◽

Cell Types ◽

Quality Data ◽

Open Chromatin ◽

Guide RNAs ◽

Considerable Cost ◽

Cell Lysis Buffer

Abstract ATAC-seq is a high-throughput sequencing technique that aims at identifying DNA sequences located in open chromatin. Depending on the cell type, ATAC-seq may yield a high number of mitochondrial sequencing reads (~20-80% of the reads). As the regions of open chromatin of interest are usually located in the nuclear genome, mitochondrial reads are typically discarded from the analysis. To decrease wasted sequencing, we performed targeted cleavage of mitochondrial DNA using CRISPR/Cas9 and 100 mtDNA-specific guide RNAs. We also tested a modified ATAC-seq protocol that does not include detergent in the cell lysis buffer. Both treatments resulted in considerable reduction of mitochondrial reads (1.7 and 3-fold, respectively). The removal of detergent, however, resulted in increased background and fewer peaks identified. The highest number of peaks and the highest-quality data were obtained by preparing samples with the original ATAC-seq protocol (using detergent) and treating them with anti-mitochondrial guide RNAs and Cas9. This strategy could lead to considerable cost reduction and improved peak calling when performing ATAC-seq on a moderate to large number of samples and in cell types that contain a large number of mitochondria.
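
As a rough illustration of the quantity the abstract reports, the sketch below computes the share of mapped reads assigned to the mitochondrial genome from per-chromosome read counts. The function name, the contig name `chrM` and the counts are hypothetical and are not data from the study.

```python
def mito_fraction(counts, mito_name="chrM"):
    """Return the fraction of mapped reads assigned to the mitochondrial contig."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return counts.get(mito_name, 0) / total


# Hypothetical per-chromosome mapped-read counts for one ATAC-seq library.
example_counts = {"chr1": 300_000, "chr2": 200_000, "chrM": 500_000}
print(f"mitochondrial fraction: {mito_fraction(example_counts):.2f}")
```

In practice such counts would come from an alignment summary; how they are produced is outside the scope of this sketch.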

Systematic clustering algorithm for chromatin accessibility data and its application to hematopoietic cells

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008422 ◽

2020 ◽

Vol 16 (11) ◽

pp. e1008422

Author(s):

Azusa Tanaka ◽

Yasuhiro Ishitsuka ◽

Hiroki Ohta ◽

Akihiro Fujimoto ◽

Jun-ichirou Yasunaga ◽

...

Keyword(s):

Data Reduction ◽

Clustering Algorithm ◽

High Throughput Sequencing ◽

Hematopoietic Cells ◽

Cell Types ◽

Chromatin Accessibility ◽

Open Chromatin ◽

Genome Wide ◽

Data Reduction Method ◽

Effective Analysis

The huge amount of data acquired by high-throughput sequencing requires data reduction for effective analysis. Here we give a clustering algorithm for genome-wide open chromatin data using a new data reduction method. This method regards the genome as a string of 1s and 0s based on a set of peaks and calculates the Hamming distances between the strings. This algorithm with the systematically optimized set of peaks enables us to quantitatively evaluate differences between samples of hematopoietic cells and classify cell types, potentially leading to a better understanding of leukemia pathogenesis.
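
The idea of treating the genome as a string of 1s and 0s and comparing samples by Hamming distance can be sketched as follows. This is a minimal illustration, not the authors' implementation: the fixed-size bin encoding, the function names and the toy peak coordinates are all assumptions, and the paper's systematically optimized peak set is not reproduced.

```python
from itertools import combinations


def binarize(peaks, genome_length, bin_size):
    """Encode peaks as a list of 0/1 flags, one per fixed-size bin (assumed encoding)."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    bits = [0] * n_bins
    for start, end in peaks:  # half-open intervals [start, end)
        if end <= start:
            continue
        first = start // bin_size
        last = min((end - 1) // bin_size, n_bins - 1)
        for b in range(first, last + 1):
            bits[b] = 1
    return bits


def hamming(a, b):
    """Count the positions at which two equal-length bit lists differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical peak calls for three samples on a toy 1,000 bp genome.
samples = {
    "A": [(0, 150), (400, 450)],
    "B": [(0, 90), (300, 350)],
    "C": [(800, 1000)],
}
encoded = {name: binarize(p, genome_length=1000, bin_size=100) for name, p in samples.items()}
for s, t in combinations(sorted(encoded), 2):
    print(s, t, hamming(encoded[s], encoded[t]))
```

The resulting pairwise distances could then be passed to any standard clustering routine; which one the authors use is not stated in the abstract.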

High-resolution TADs reveal DNA sequences underlying genome organization in flies

10.1101/115063 ◽

2017 ◽

Cited By ~ 12

Author(s):

Fidel Ramírez ◽

Vivek Bhardwaj ◽

José Villaveces ◽

Laura Arrigoni ◽

Björn A. Grüning ◽

...

Keyword(s):

High Resolution ◽

DNA Sequences ◽

Spatial Organization ◽

Molecular Mechanisms ◽

Cell Types ◽

Open Chromatin ◽

Promoter Regions ◽

DNA Motifs ◽

3D Genome ◽

Eukaryotic Chromatin

Abstract Eukaryotic chromatin is partitioned into domains called TADs that are broadly conserved between species and virtually identical among cell types within the same species. Previous studies in mammals have shown that the DNA binding protein CTCF and cohesin contribute to a fraction of TAD boundaries. Apart from this, the molecular mechanisms governing this partitioning remain poorly understood. Using our new software, HiCExplorer, we annotated high-resolution (570 bp) TAD boundaries in flies and identified eight DNA motifs enriched at boundaries. Known insulator proteins bind five of these motifs while the remaining three motifs are novel. We find that boundaries are either at core promoters of active genes or at non-promoter regions of inactive chromatin and that these two groups are characterized by different sets of DNA motifs. Most boundaries are present at divergent promoters of constitutively expressed genes and the gene expression tends to be coordinated within TADs. In contrast to mammals, the CTCF motif is only present on 2% of boundaries in flies. We demonstrate that boundaries can be accurately predicted using only the motif sequences, along with open chromatin, suggesting that DNA sequence encodes the 3D genome architecture in flies. Finally, we present an interactive online database to access and explore the spatial organization of fly, mouse and human genomes, available at http://chorogeome.ie-freiburg.mpg.de.

A Comparison of Peak Callers Used for DNase-seq Data

10.1101/003608 ◽

2014 ◽

Cited By ~ 1

Author(s):

Hashem Koohy ◽

Thomas Down ◽

Mikhail Spivakov ◽

Tim Hubbard

Keyword(s):

High Throughput Sequencing ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Independent Dataset ◽

Genome Wide ◽

Binding Data ◽

Threshold Setting ◽

Higher Sensitivity

Genome-wide profiling of open chromatin regions using DNase I and high-throughput sequencing (DNase-seq) is an increasingly popular approach for finding and studying regulatory elements. A variety of algorithms have been developed to identify regions of open chromatin from raw sequence-tag data, which has motivated us to assess and compare their performance. In this study, four published, publicly available peak calling algorithms used for DNase-seq data analysis (F-seq, Hotspot, MACS and ZINBA) are assessed at a range of signal thresholds on two published DNase-seq datasets for three cell types. The results were benchmarked against an independent dataset of regulatory regions derived from ENCODE in vivo transcription factor binding data for each particular cell type. The level of overlap between peak regions reported by each algorithm and this ENCODE-derived reference set was used to assess sensitivity and specificity of the algorithms. Our study suggests that F-seq has a slightly higher sensitivity than the next best algorithms. Hotspot and the ChIP-seq oriented method, MACS, both perform competitively when used with their default parameters. However, the generic peak finder ZINBA appears to be less sensitive than the other three. We also assess accuracy of each algorithm over a range of signal thresholds. In particular, we show that the accuracy of F-seq can be considerably improved by using a threshold setting that is different from the default value.
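
An overlap-based benchmark of this kind can be sketched as below. The abstract does not give the study's exact definitions, so this sketch assumes that a reference region counts as recovered when any called peak overlaps it by at least one base, and it reports precision (the fraction of called peaks supported by the reference) as a stand-in because specificity would require a defined set of true negatives. All names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def sensitivity(called, reference):
    """Fraction of reference regions overlapped by at least one called peak."""
    if not reference:
        raise ValueError("reference set is empty")
    hits = sum(any(overlaps(r, c) for c in called) for r in reference)
    return hits / len(reference)


def precision(called, reference):
    """Fraction of called peaks overlapping at least one reference region."""
    if not called:
        raise ValueError("no called peaks")
    hits = sum(any(overlaps(c, r) for r in reference) for c in called)
    return hits / len(called)


# Hypothetical intervals on a single chromosome.
reference = [(100, 200), (500, 600), (900, 1000)]
called = [(150, 250), (550, 560), (2000, 2100), (3000, 3100)]
print(f"sensitivity={sensitivity(called, reference):.2f}")
print(f"precision={precision(called, reference):.2f}")
```

Repeating the calculation while varying a caller's signal threshold gives the kind of threshold sweep the abstract describes.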

Abstract 066: Enhancer Repertoires That Define Renin Cell Identity

Hypertension ◽

10.1161/hyp.68.suppl_1.066 ◽

2016 ◽

Vol 68 (suppl_1) ◽

Author(s):

Maria F Martinez ◽

Silvia Medrano ◽

Masafumi Oka ◽

Ellen S Pentz ◽

Allan W Dickerman ◽

...

Keyword(s):

DNA Sequences ◽

Cell Types ◽

Chromatin Accessibility ◽

Cell Phenotype ◽

Open Chromatin ◽

Independent Manner ◽

Cell Identity ◽

Genome Wide ◽

Renin Gene ◽

Chromatin Configuration

Control of the renin cell phenotype is crucial for the regulation of blood pressure and fluid-electrolyte homeostasis. Enhancers are cis-acting DNA sequences that harbor distinct chromatin features and regulate gene expression in an orientation-independent manner. Recently, clusters of enhancers or super-enhancers (SE) highly enriched with master transcription factors, possessing open chromatin configuration and in close proximity to cell-identity genes have been proposed. We tested the hypothesis that renin cells have unique repertoires of enhancers and super-enhancers, distinct from other cell types. Those regulatory clusters may in turn confer the identity of renin cells. To define the genome-wide enhancer landscape characteristic of renin cells, we studied As4.1 cells, kidney tumor cells that express renin constitutively, and native renin cells sorted from the kidneys of Ren1cKO-YFP+ mice. In these mice, the renin promoter drives YFP expression, thus marking the renin cells. We used genome-wide ChIP-Seq for Med1 (subunit 1 of the Mediator complex), H3K27Ac (active enhancers) and Pol II (to visualize putative genomic areas undergoing transcription). The ROSE algorithm was used to ascertain super-enhancers. Chromatin accessibility genome-wide was assessed using ATAC-Seq. The results were compared to twenty-one other cell types that do not express renin. In As4.1 cells, we identified 14,871 enhancers based on H3K27Ac. Of those, 888 were classified as super-enhancers. The Med1 signal in As4.1 cells showed a SE localized 5 kb upstream of the Ren1 gene, which was ranked at position 25 among other SEs. The H3K27Ac signal showed highest occupancy in the same region. ChIP-Seq for H3K27Ac in YFP+ cells showed 211 SEs of 2,987 peaks. The SE for the renin gene possessed the highest signal and ranked number 1, indicating its importance in renin cells. One hundred and thirteen SEs were unique to renin cells, including the SE associated with the renin gene. ATAC-Seq signals overlapped with the renin SE and the classical enhancer, indicating that the chromatin was accessible for transcription. In summary, renin-expressing cells possess distinct repertoires of unique enhancers and super-enhancers that, acting in concert, are likely to determine the renin phenotype.

Cell-Specific Determinants of Peroxisome Proliferator-Activated Receptor γ Function in Adipocytes and Macrophages

Molecular and Cellular Biology ◽

10.1128/mcb.01651-09 ◽

2010 ◽

Vol 30 (9) ◽

pp. 2078-2089 ◽

Cited By ~ 153

Author(s):

Martina I. Lefterova ◽

David J. Steger ◽

David Zhuo ◽

Mohammed Qatanani ◽

Shannon E. Mullican ◽

...

Keyword(s):

Insulin Resistance ◽

Histone Acetylation ◽

Histone Modifications ◽

High Throughput Sequencing ◽

Cell Types ◽

Chromatin Accessibility ◽

Open Chromatin ◽

Peroxisome Proliferator ◽

Peroxisome Proliferator Activated Receptor ◽

Immune Genes

ABSTRACT The nuclear receptor peroxisome proliferator-activated receptor γ (PPARγ) is the target of antidiabetic thiazolidinedione drugs, which improve insulin resistance but have side effects that limit widespread use. PPARγ is required for adipocyte differentiation, but it is also expressed in other cell types, notably macrophages, where it influences atherosclerosis, insulin resistance, and inflammation. A central question is whether PPARγ binding in macrophages occurs at genomic locations the same as or different from those in adipocytes. Here, utilizing chromatin immunoprecipitation and high-throughput sequencing (ChIP-seq), we demonstrate that PPARγ cistromes in mouse adipocytes and macrophages are predominantly cell type specific. In thioglycolate-elicited macrophages, PPARγ colocalizes with the hematopoietic transcription factor PU.1 in areas of open chromatin and histone acetylation, near a distinct set of immune genes in addition to a number of metabolic genes shared with adipocytes. In adipocytes, the macrophage-unique binding regions are marked with repressive histone modifications, typically associated with local chromatin compaction and gene silencing. PPARγ, when introduced into preadipocytes, bound only to regions depleted of repressive histone modifications, where it increased DNA accessibility, enhanced histone acetylation, and induced gene expression. Thus, the cell specificity of PPARγ function is regulated by cell-specific transcription factors, chromatin accessibility, and histone marks. Our data support the existence of an epigenomic hierarchy in which PPARγ binding to cell-specific sites not marked by repressive marks opens chromatin and leads to local activation marks, including histone acetylation.

Classifying cells with Scasat - a tool to analyse single-cell ATAC-seq

10.1101/227397 ◽

2017 ◽

Cited By ~ 1

Author(s):

Syed Murtuza Baker ◽

Connor Rogerson ◽

Andrew Hayes ◽

Andrew D. Sharrocks ◽

Magnus Rattray

Keyword(s):

Single Cell ◽

DNA Sequences ◽

Mammalian Cells ◽

Cell Types ◽

Regulatory Elements ◽

Cellular Heterogeneity ◽

Supplementary Information ◽

Open Chromatin ◽

Analysis Tool ◽

Link Type

Abstract Motivation The assay for transposase-accessible chromatin using sequencing (ATAC-seq) reveals the landscape and principles of DNA regulatory mechanisms by identifying the accessible genome of mammalian cells. When done at single-cell resolution, it provides an insight into the cell-to-cell variability that emerges from identical DNA sequences by identifying the variability in the genomic location of open chromatin sites in each of the cells. Processing of single-cell ATAC-seq requires a number of steps, and a simple pipeline to process and analyse single-cell ATAC-seq is not yet available. Results This paper presents ScAsAT (single-cell ATAC-seq analysis tool), a complete pipeline to process scATAC-seq data with simple steps. The pipeline is developed in a Jupyter notebook environment that holds the executable code along with the necessary description and results. For the initial sequence processing steps, the pipeline uses a number of well-known tools which it executes from a Python environment for each of the fastq files. The functions for the data analysis part are mostly written in R. The pipeline is robust, flexible, interactive and easy to extend. The pipeline was applied to a single-cell ATAC-seq dataset in order to identify different cell types from a complex cell mixture. The results from Scasat showed that open chromatin locations corresponding to potential regulatory elements can account for cellular heterogeneity and can identify regulatory regions that separate cells from a complex population. Availability The Jupyter notebook with the complete pipeline applied to the dataset published with this paper is publicly available on GitHub (https://github.com/ManchesterBioinference/Scasat). An additional notebook is also provided for analysis of a publicly available dataset. The fastq files have been submitted to the ArrayExpress database at EMBL-EBI (www.ebi.ac.uk/arrayexpress) under accession numbers [email protected] and [email protected]. Supplementary information Supplementary data are available at bioRxiv online.

Role of Transcriptional Read-Through in PRE Activity in Drosophila melanogaster

Acta Naturae ◽

10.32607/20758251-2016-8-2-79-86 ◽

2016 ◽

Vol 8 (2) ◽

pp. 79-86 ◽

Cited By ~ 3

Author(s):

P. V. Elizar’ev ◽

D. V. Lomaev ◽

D. A. Chetverina ◽

P. G. Georgiev ◽

M. M. Erokhin

Keyword(s):

DNA Sequences ◽

Cell Types ◽

Transcription Terminator ◽

Response Elements ◽

Polycomb Response Elements ◽

The Individual ◽

Multicellular Organisms ◽

Different Cell Types ◽

Main Factor

Maintenance of the individual patterns of gene expression in different cell types is required for the differentiation and development of multicellular organisms. Expression of many genes is controlled by Polycomb (PcG) and Trithorax (TrxG) group proteins that act through association with chromatin. PcG/TrxG are assembled on the DNA sequences termed PREs (Polycomb Response Elements), the activity of which can be modulated and switched from repression to activation. In this study, we analyzed the influence of transcriptional read-through on the PRE activity switch mediated by the yeast activator GAL4. We show that a transcription terminator inserted between the promoter and PRE does not prevent switching of PRE activity from repression to activation. We demonstrate that, independently of PRE orientation, high levels of transcription fail to dislodge PcG/TrxG proteins from PRE in the absence of a terminator. Thus, transcription is not the main factor required for the PRE activity switch.

Advantages of using graph databases to explore chromatin conformation capture experiments

BMC Bioinformatics ◽

10.1186/s12859-020-03937-0 ◽

2021 ◽

Vol 22 (S2) ◽

Author(s):

Daniele D’Agostino ◽

Pietro Liò ◽

Marco Aldinucci ◽

Ivan Merelli

Keyword(s):

Web Application ◽

High Throughput Sequencing ◽

Cell Types ◽

Graph Database ◽

Graph Databases ◽

Sources Of Information ◽

Chromosome Conformation ◽

Wide Scale ◽

User Friendly ◽

Different Cell Types

Abstract Background High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. On the other hand, a graph-based representation can be advantageous to describe the complex topology achieved by the DNA in the nucleus of eukaryotic cells. Methods Here we discuss the use of a graph database for storing and analysing data achieved by performing Hi-C experiments. The main issue is the size of the produced data and, working with a graph-based representation, the consequent necessity of adequately managing a large number of edges (contacts) connecting nodes (genes), which represents the sources of information. For this, currently available graph visualisation tools and libraries fall short with Hi-C data. The use of graph databases, instead, supports both the analysis and the visualisation of the spatial pattern present in Hi-C data, in particular for comparing different experiments or for re-mapping omics data in a space-aware context efficiently. In particular, the possibility of describing graphs through statistical indicators and, even more, the capability of correlating them through statistical distributions allows highlighting similarities and differences among different Hi-C experiments, in different cell conditions or different cell types. Results These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the use of the Neo4j graph database (version 3.5). Conclusion With the accumulation of more experiments, the tool will provide invaluable support to compare neighbours of genes across experiments and conditions, helping in highlighting changes in functional domains and identifying new co-organised genomic compartments.
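
The graph-based view of Hi-C contacts, with genes as nodes and contacts as edges, can be illustrated with a small in-memory sketch. It does not use Neo4j or NeoHiC; the function names and the gene labels are hypothetical, and the degree distribution stands in for the kind of statistical indicator the abstract mentions.

```python
from collections import Counter, defaultdict


def build_contact_graph(contacts):
    """Build an undirected adjacency map from (node, node) contact pairs."""
    graph = defaultdict(set)
    for a, b in contacts:
        if a == b:
            continue  # ignore self-contacts in this sketch
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_distribution(graph):
    """Map each degree to the number of nodes having that degree."""
    return Counter(len(neighbours) for neighbours in graph.values())


def shared_neighbours(graph_1, graph_2, node):
    """Neighbours of `node` present in both experiments."""
    return graph_1.get(node, set()) & graph_2.get(node, set())


# Hypothetical contacts from two experiments.
exp_1 = build_contact_graph([("geneA", "geneB"), ("geneA", "geneC"),
                             ("geneB", "geneC"), ("geneC", "geneD")])
exp_2 = build_contact_graph([("geneA", "geneC"), ("geneC", "geneD"),
                             ("geneC", "geneE")])
print(dict(degree_distribution(exp_1)))
print(sorted(shared_neighbours(exp_1, exp_2, "geneC")))
```

A dedicated graph database becomes attractive once the number of edges is too large for structures like these to be held and queried in memory, which is the scaling issue the abstract raises.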

Heritable pattern of oxidized DNA base repair coincides with pre-targeting of repair complexes to open chromatin

Nucleic Acids Research ◽

10.1093/nar/gkaa1120 ◽

2020 ◽

Cited By ~ 1

Author(s):

Albino Bacolla ◽

Shiladitya Sengupta ◽

Zu Ye ◽

Chunying Yang ◽

Joy Mitra ◽

...

Keyword(s):

Genome Stability ◽

High Throughput Sequencing ◽

Excision Repair ◽

Essential Gene ◽

Atmospheric Oxygen ◽

Super Resolution ◽

Population Variation ◽

Mutation Rates ◽

Open Chromatin ◽

Base Excision

Abstract Human genome stability requires efficient repair of oxidized bases, which is initiated via damage recognition and excision by NEIL1 and other base excision repair (BER) pathway DNA glycosylases (DGs). However, the biological mechanisms underlying detection of damaged bases among the million-fold excess of undamaged bases remain enigmatic. Indeed, mutation rates vary greatly within individual genomes, and lesion recognition by purified DGs in the chromatin context is inefficient. Employing super-resolution microscopy and co-immunoprecipitation assays, we find that acetylated NEIL1 (AcNEIL1), but not its non-acetylated form, is predominantly localized in the nucleus in association with epigenetic marks of uncondensed chromatin. Furthermore, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) revealed non-random AcNEIL1 binding near transcription start sites of weakly transcribed genes and along highly transcribed chromatin domains. Bioinformatic analyses revealed a striking correspondence between AcNEIL1 occupancy along the genome and mutation rates, with AcNEIL1-occupied sites exhibiting fewer mutations compared to AcNEIL1-free domains, both in cancer genomes and in population variation. Intriguingly, from the evolutionarily conserved unstructured domain that targets NEIL1 to open chromatin, its damage surveillance of highly oxidation-susceptible sites to preserve essential gene function and to limit instability and cancer likely originated ∼500 million years ago during the buildup of free atmospheric oxygen.

Periphyton diversity in two different Antarctic lakes assessed using metabarcoding

Antarctic Science ◽

10.1017/s0954102021000316 ◽

2021 ◽

pp. 1-9

Author(s):

Paulo E.A.S. Câmara ◽

Láuren M.D. De Souza ◽

Otávio Henrique Bezerra Pinto ◽

Peter Convey ◽

Eduardo T. Amorim ◽

...

Keyword(s):

DNA Sequences ◽

High Throughput Sequencing ◽

Shannon Index ◽

South Shetland Islands ◽

King George Island ◽

Deception Island ◽

Maritime Antarctic ◽

Shetland Islands ◽

Antarctic Lakes ◽

Significant Difference

Abstract Antarctic lakes have generally simple periphyton communities when compared with those of lower latitudes. To date, assessment of microbial diversity in Antarctica has relied heavily on traditional direct observation and cultivation methods. In this study, sterilized cotton baits were left submerged for two years in two lakes on King George Island and Deception Island, South Shetland Islands (Maritime Antarctic), followed by assessment of diversity by metabarcoding using high-throughput sequencing. DNA sequences of 44 taxa belonging to four kingdoms and seven phyla were found. Thirty-six taxa were detected in Hennequin Lake on King George Island and 20 taxa were detected in Soto Lake on Deception Island. However, no significant difference in species composition was detected between the two assemblages (Shannon index). Our data suggest that metabarcoding provides a suitable method for the assessment of periphyton biodiversity in oligotrophic Antarctic lakes.
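
For reference, the Shannon index used to compare the two assemblages is H = -Σ p_i ln p_i, where p_i is the proportion of the i-th taxon. The sketch below computes it from per-taxon counts. The counts are hypothetical and are not the study's data, and the use of the natural logarithm is an assumption, since the abstract does not state the base.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return -sum((n / total) * math.log(n / total) for n in abundances if n > 0)


# Hypothetical read counts per taxon for two samples.
even_community = [10, 10, 10, 10]
skewed_community = [37, 1, 1, 1]
print(f"even:   {shannon_index(even_community):.3f}")
print(f"skewed: {shannon_index(skewed_community):.3f}")
```

An evenly distributed community of four taxa gives the maximum value for that richness, ln 4, while a community dominated by one taxon scores lower.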