Learning immune cell differentiation

SUMMARYThe mammalian genome contains several million cis-regulatory elements, whose differential activity marked by open chromatin determines organogenesis and differentiation. This activity is itself embedded in the DNA sequence, decoded by sequence-specific transcription factors. Leveraging a granular ATAC-seq atlas of chromatin activity across 81 immune cell-types we show that a convolutional neural network (“AI-TAC”) can learn to infer cell-type-specific chromatin activity solely from the DNA sequence. AI-TAC does so by rediscovering, with astonishing precision, binding motifs for known regulators, and some unknown ones, mapping them with high concordance to positions validated by ChIP-seq data. AI-TAC also uncovers combinatorial influences, establishing a hierarchy of transcription factors (TFs) and their interactions involved in immunocyte specification, with intriguingly different strategies between lineages. Mouse-trained AI-TAC can parse human DNA, revealing a strikingly similar ranking of influential TFs. Thus, Deep Learning can reveal the regulatory syntax that drives the full differentiative complexity of the immune system.

Download Full-text

Analysis of putative cis-regulatory elements regulating blood pressure variation

Human Molecular Genetics ◽

10.1093/hmg/ddaa098 ◽

2020 ◽

Vol 29 (11) ◽

pp. 1922-1932

Author(s):

Priyanka Nandakumar ◽

Dongwon Lee ◽

Thomas J Hoffmann ◽

Georg B Ehret ◽

Dan Arking ◽

...

Keyword(s):

Blood Pressure ◽

Association Studies ◽

Specific Effect ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Genome Wide Association Studies ◽

Cell Type ◽

Functional Scores ◽

Cell Type Specific

Abstract Hundreds of loci have been associated with blood pressure (BP) traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ~100 000 Genetic Epidemiology Research on Aging study participants. In the present study, we sought to fine-map known loci and identify novel genes by determining putative regulatory regions for these and other tissues relevant to BP. We constructed maps of putative cis-regulatory elements (CREs) using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. We aggregate variants within these putative CREs within 50 Kb of the start or end of ‘expressed’ genes in these tissues or cell types using public expression data and use deltaSVM scores as weights in the group-wise sequence kernel association test to identify candidates. We test for association with both BP traits and expression within these tissues or cell types of interest and identify the candidates MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B and PPCDC. Additionally, we examined two known QT interval genes, SCN5A and NOS1AP, in the Atherosclerosis Risk in Communities Study, as a positive control, and observed the expected heart-specific effect. Thus, our method identifies variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.

Download Full-text

Analysis of putative cis-regulatory elements regulating blood pressure variation

10.1101/820522 ◽

2019 ◽

Author(s):

Priyanka Nandakumar ◽

Dongwon Lee ◽

Thomas J. Hoffmann ◽

Georg B. Ehret ◽

Dan Arking ◽

...

Keyword(s):

Gene Expression ◽

Blood Pressure ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Genome Wide Association Studies ◽

Cell Type ◽

Functional Scores ◽

Cell Type Specific ◽

Different Tissues

AbstractHundreds of loci have been associated with blood pressure traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ∼100,000 Genetic Epidemiology Research on Aging (GERA) study participants. In the present study, we subsequently focused on determining putative regulatory regions for these and other tissues of relevance to blood pressure, to both fine-map these loci by pinpointing genes and variants of functional interest within them, and to identify any novel genes.We constructed maps of putative cis-regulatory elements using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Sequence variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. In order to identify genes of interest, we aggregate these variants in these putative cis-regulatory elements within 50Kb of the start or end of genes considered as “expressed” in these tissues or cell types using publicly available gene expression data, and use the deltaSVM scores as weights in the well-known group-wise sequence kernel association test (SKAT). We test for association with both blood pressure traits as well as expression within these tissues or cell types of interest, and identify several genes, including MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B, and PPCDC. Although our study centers on blood pressure traits, we additionally examined two known genes, SCN5A and NOS1AP involved in the cardiac trait QT interval, in the Atherosclerosis Risk in Communities Study (ARIC), as a positive control, and observed an expected heart-specific effect. Thus, our method may be used to identify variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.Author SummarySequence change in genes (“variants”) are linked to the presence and severity of different traits or diseases. However, as genes may be expressed in different tissues and at different times and degrees, using this information is expected to more accurately identify genes of interest. Variants within the genes are essential, but also in the sequences (“regulatory elements”) that control the genes’ expression in different tissues or cell types. In this study, we aim to use this information about expression and variants potentially involved in gene expression regulation to better pinpoint genes and variants in regulatory elements of interest for blood pressure regulation. We do so by taking advantage of such data that are publicly available, and use methods to combine information about variants in aggregate within a gene’s putative regulatory elements in tissues thought to be relevant for blood pressure, and identify several genes, meant to enable experimental follow-up.

Download Full-text

CellWalker integrates single-cell and bulk data to resolve regulatory elements across cell types in complex tissues

10.1101/847657 ◽

2019 ◽

Cited By ~ 1

Author(s):

Pawel F. Przytycki ◽

Katherine S. Pollard

Keyword(s):

Single Cell ◽

Cell Types ◽

Regulatory Elements ◽

Cell Labeling ◽

Open Chromatin ◽

Specific Cell ◽

Rna Seq ◽

Data Types ◽

Bulk Data ◽

Cell Type Specific

Single-cell and bulk genomics assays have complementary strengths and weaknesses, and alone neither strategy can fully capture regulatory elements across the diversity of cells in complex tissues. We present CellWalker, a method that integrates single-cell open chromatin (scATAC-seq) data with gene expression (RNA-seq) and other data types using a network model that simultaneously improves cell labeling in noisy scATAC-seq and annotates cell-type specific regulatory elements in bulk data. We demonstrate CellWalker’s robustness to sparse annotations and noise using simulations and combined RNA-seq and ATAC-seq in individual cells. We then apply CellWalker to the developing brain. We identify cells transitioning between transcriptional states, resolve enhancers to specific cell types, and observe that autism and other neurological traits can be mapped to specific cell types through their enhancers.

Download Full-text

Addiction-associated genetic variants implicate brain cell type- and region-specific cis-regulatory elements in addiction neurobiology

10.1101/2020.09.29.318329 ◽

2020 ◽

Cited By ~ 1

Author(s):

Chaitanya Srinivasan ◽

BaDoi N. Phan ◽

Alyssa J. Lawler ◽

Easwaran Ramamurthy ◽

Michael Kleyman ◽

...

Keyword(s):

Genetic Variants ◽

Cell Types ◽

Regulatory Elements ◽

Brain Regions ◽

Open Chromatin ◽

Genome Wide Association Studies ◽

Cell Type ◽

Coding Regions ◽

Cell Type Specific ◽

The Impact

ABSTRACTRecent large genome-wide association studies (GWAS) have identified multiple confident risk loci linked to addiction-associated behavioral traits. Genetic variants linked to addiction-associated traits lie largely in non-coding regions of the genome, likely disrupting cis-regulatory element (CRE) function. CREs tend to be highly cell type-specific and may contribute to the functional development of the neural circuits underlying addiction. Yet, a systematic approach for predicting the impact of risk variants on the CREs of specific cell populations is lacking. To dissect the cell types and brain regions underlying addiction-associated traits, we applied LD score regression to compare GWAS to genomic regions collected from human and mouse assays for open chromatin, which is associated with CRE activity. We found enrichment of addiction-associated variants in putative regulatory elements marked by open chromatin in neuronal (NeuN+) nuclei collected from multiple prefrontal cortical areas and striatal regions known to play major roles in reward and addiction. To further dissect the cell type-specific basis of addiction-associated traits, we also identified enrichments in human orthologs of open chromatin regions of mouse neuron subtypes: cortical excitatory, PV, D1, and D2. Lastly, we developed machine learning models from mouse cell type-specific regions of open chromatin to further dissect human NeuN+ open chromatin regions into cortical excitatory or striatal D1 and D2 neurons and predict the functional impact of addiction-associated genetic variants. Our results suggest that different neuron subtypes within the reward system play distinct roles in the variety of traits that contribute to addiction.Significance StatementOur study on cell types and brain regions contributing to heritability of addiction-associated traits suggests that the conserved non-coding regions within cortical excitatory and striatal medium spiny neurons contribute to genetic predisposition for nicotine, alcohol, and cannabis use behaviors. This computational framework can flexibly integrate epigenomic data across species to screen for putative causal variants in a cell type- and tissue-specific manner across numerous complex traits.

Download Full-text

IMPACT: Genomic annotation of cell-state-specific regulatory elements inferred from the epigenome of bound transcription factors

10.1101/366864 ◽

2018 ◽

Cited By ~ 1

Author(s):

Tiffany Amariuta ◽

Yang Luo ◽

Steven Gazal ◽

Emma E. Davenport ◽

Bryce van de Geijn ◽

...

Keyword(s):

Transcription Factors ◽

Rna Polymerase Ii ◽

Complex Trait ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Value Capture ◽

Cell State ◽

A Genome ◽

Significant Enrichment

Despite significant progress in annotating the genome with experimental methods, much of the regulatory noncoding genome remains poorly defined. Here we assert that regulatory elements may be characterized by leveraging local epigenomic signatures at sites where specific transcription factors (TFs) are bound. To link these two identifying features, we introduce IMPACT, a genome annotation strategy which identifies regulatory elements defined by cell-state-specific TF binding profiles, learned from 515 chromatin and sequence annotations. We validate IMPACT using multiple compelling applications. First, IMPACT predicts TF motif binding with high accuracy (average AUC 0.92, s.e. 0.03; across 8 TFs), a significant improvement (all p<6.9e-15) over intersecting motifs with open chromatin (average AUC 0.66, s.e. 0.11). Second, an IMPACT annotation trained on RNA polymerase II is more enriched for peripheral blood cis-eQTL variation (N=3,754) than sequence based annotations, such as promoters and regions around the TSS, (permutation p<1e-3, 25% average increase in enrichment). Third, integration with rheumatoid arthritis (RA) summary statistics from European (N=38,242) and East Asian (N=22,515) populations revealed that the top 5% of CD4+ Treg IMPACT regulatory elements capture 85.7% (s.e. 19.4%) of RA h2 (p<1.6e-5) and that the top 9.8% of Treg IMPACT regulatory elements, consisting of all SNPs with a non-zero annotation value, capture 97.3% (s.e. 18.2%) of RA h2 (p<7.6e-7), the most comprehensive explanation for RA h2 to date. In comparison, the average RA h2 captured by compared CD4+ T histone marks is 42.3% and by CD4+ T specifically expressed gene sets is 36.4%. Finally, integration with RA fine-mapping data (N=27,345) revealed a significant enrichment (2.87, p<8.6e-3) of putatively causal variants across 20 RA associated loci in the top 1% of CD4+ Treg IMPACT regulatory regions. Overall, we find that IMPACT generalizes well to other cell types in identifying complex trait associated regulatory elements.

Download Full-text

Profiling of accessible chromatin regions across multiple plant species and cell types reveals common gene regulatory principles and new control modules

10.1101/167932 ◽

2017 ◽

Cited By ~ 1

Author(s):

Kelsey A. Maher ◽

Marko Bajic ◽

Kaisa Kajala ◽

Mauricio Reynoso ◽

Germain Pauluzzi ◽

...

Keyword(s):

Hair Cell ◽

Plant Species ◽

Stress Responses ◽

Binding Sites ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Cell Type ◽

Cell Type Specific ◽

Accessible Chromatin

ABSTRACTThe transcriptional regulatory structure of plant genomes remains poorly defined relative to animals. It is unclear how many cis-regulatory elements exist, where these elements lie relative to promoters, and how these features are conserved across plant species. We employed the Assay for Transposase-Accessible Chromatin (ATAC-seq) in four plant species (Arabidopsis thaliana, Medicago truncatula, Solanum lycopersicum, and Oryza sativa) to delineate open chromatin regions and transcription factor (TF) binding sites across each genome. Despite 10-fold variation in intergenic space among species, the majority of open chromatin regions lie within 3 kb upstream of a transcription start site in all species. We find a common set of four TFs that appear to regulate conserved gene sets in the root tips of all four species, suggesting that TF-gene networks are generally conserved. Comparative ATAC-seq profiling of Arabidopsis root hair and non-hair cell types revealed extensive similarity as well as many cell type-specific differences. Analyzing TF binding sites in differentially accessible regions identified a MYB-driven regulatory module unique to the hair cell, which appears to control both cell fate regulators and abiotic stress responses. Our analyses revealed common regulatory principles among species and shed light on the mechanisms producing cell type-specific transcriptomes during development.

Download Full-text

CellWalker integrates single-cell and bulk data to resolve regulatory elements across cell types in complex tissues

Genome Biology ◽

10.1186/s13059-021-02279-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Pawel F. Przytycki ◽

Katherine S. Pollard

Keyword(s):

Single Cell ◽

Cell Types ◽

Regulatory Elements ◽

Cell Labeling ◽

Open Chromatin ◽

Specific Cell ◽

Rna Seq ◽

Data Types ◽

Bulk Data ◽

Cell Type Specific

AbstractSingle-cell and bulk genomics assays have complementary strengths and weaknesses, and alone neither strategy can fully capture regulatory elements across the diversity of cells in complex tissues. We present CellWalker, a method that integrates single-cell open chromatin (scATAC-seq) data with gene expression (RNA-seq) and other data types using a network model that simultaneously improves cell labeling in noisy scATAC-seq and annotates cell type-specific regulatory elements in bulk data. We demonstrate CellWalker’s robustness to sparse annotations and noise using simulations and combined RNA-seq and ATAC-seq in individual cells. We then apply CellWalker to the developing brain. We identify cells transitioning between transcriptional states, resolve regulatory elements to cell types, and observe that autism and other neurological traits can be mapped to specific cell types through their regulatory elements.

Download Full-text

Faculty Opinions recommendation of Human cardiac cis-regulatory elements, their cognate transcription factors, and regulatory DNA sequence variants.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.733852054.793550234 ◽

2018 ◽

Author(s):

Maria Karayiorgou

Keyword(s):

Transcription Factors ◽

Dna Sequence ◽

Regulatory Elements ◽

Sequence Variants ◽

Dna Sequence Variants ◽

Regulatory Dna

Download Full-text

Promoter G-quadruplexes and transcription factors cooperate to shape the cell type-specific transcriptome

Nature Communications ◽

10.1038/s41467-021-24198-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Sara Lago ◽

Matteo Nadai ◽

Filippo M. Cernilogar ◽

Maryam Kazerani ◽

Helena Domíniguez Moreno ◽

...

Keyword(s):

Transcription Factors ◽

Binding Sites ◽

Specific Gene ◽

Open Chromatin ◽

Rna Seq ◽

Transcript Levels ◽

Promoter Sequences ◽

G Quadruplex ◽

Cell Type Specific ◽

Folding State

AbstractCell identity is maintained by activation of cell-specific gene programs, regulated by epigenetic marks, transcription factors and chromatin organization. DNA G-quadruplex (G4)-folded regions in cells were reported to be associated with either increased or decreased transcriptional activity. By G4-ChIP-seq/RNA-seq analysis on liposarcoma cells we confirmed that G4s in promoters are invariably associated with high transcription levels in open chromatin. Comparing G4 presence, location and transcript levels in liposarcoma cells to available data on keratinocytes, we showed that the same promoter sequences of the same genes in the two cell lines had different G4-folding state: high transcript levels consistently associated with G4-folding. Transcription factors AP-1 and SP1, whose binding sites were the most significantly represented in G4-folded sequences, coimmunoprecipitated with their G4-folded promoters. Thus, G4s and their associated transcription factors cooperate to determine cell-specific transcriptional programs, making G4s to strongly emerge as new epigenetic regulators of the transcription machinery.

Download Full-text

Comprehensive analysis of single cell ATAC-seq data with SnapATAC

Nature Communications ◽

10.1038/s41467-021-21583-9 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Rongxin Fang ◽

Sebastian Preissl ◽

Yang Li ◽

Xiaomeng Hou ◽

Jacinta Lucero ◽

...

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

Expression Patterns ◽

Regulatory Elements ◽

Cellular Heterogeneity ◽

Specific Gene ◽

Open Chromatin ◽

Cell Type ◽

Process Data ◽

Cell Type Specific

AbstractIdentification of the cis-regulatory elements controlling cell-type specific gene expression patterns is essential for understanding the origin of cellular diversity. Conventional assays to map regulatory elements via open chromatin analysis of primary tissues is hindered by sample heterogeneity. Single cell analysis of accessible chromatin (scATAC-seq) can overcome this limitation. However, the high-level noise of each single cell profile and the large volume of data pose unique computational challenges. Here, we introduce SnapATAC, a software package for analyzing scATAC-seq datasets. SnapATAC dissects cellular heterogeneity in an unbiased manner and map the trajectories of cellular states. Using the Nyström method, SnapATAC can process data from up to a million cells. Furthermore, SnapATAC incorporates existing tools into a comprehensive package for analyzing single cell ATAC-seq dataset. As demonstration of its utility, SnapATAC is applied to 55,592 single-nucleus ATAC-seq profiles from the mouse secondary motor cortex. The analysis reveals ~370,000 candidate regulatory elements in 31 distinct cell populations in this brain region and inferred candidate cell-type specific transcriptional regulators.

Download Full-text