Cis-regulatory code for predicting plant cell-type specific high salinity response

Abstract Multicellular organisms have diverse cell types with distinct roles in development and in responses to the environment. At the transcriptional level, differences in environmental response between cell types are due to differences in regulatory programs. In plants, although cell-type-specific environmental responses have been examined, details of how these responses are regulated remain sparse. Here, we identify a set of putative cis-regulatory elements (pCREs) enriched in the promoters of genes responsive to high salinity stress in six Arabidopsis thaliana root cell types. Using machine learning with pCREs as predictors, we establish cis-regulatory codes, i.e. models predicting whether a gene is responsive to high salinity in each cell type. These pCRE-based models outperform models using in vitro binding data for 758 A. thaliana transcription factors. Surprisingly, organ pCREs identified from the whole-root high salinity response can predict cell-type responses as well as pCREs derived from cell-type data, because organ and cell-type pCREs predict complementary subsets of high salinity response genes. Our findings not only advance our understanding of the regulatory mechanisms of plant spatial transcriptional responses through cis-regulatory codes, but also suggest the approach is broadly applicable to any species, particularly those with little or no trans-regulatory data.
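
The abstract does not spell out how pCREs are found or encoded. A minimal sketch, assuming pCREs are k-mers tested for over-representation in promoters of responsive versus non-responsive genes with a one-sided Fisher's exact test, and that the selected k-mers become presence/absence predictors for a classifier. Reverse complements and multiple-testing correction are ignored, and the function names and toy data are hypothetical, not the authors' pipeline.

```python
from math import comb
from typing import Dict, List, Set

def fisher_right_tail(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test: P(X >= a) under the hypergeometric null.

    2x2 table:            k-mer present   k-mer absent
        responsive              a               b
        non-responsive          c               d
    """
    n_resp = a + b            # number of responsive genes
    n_with = a + c            # number of promoters containing the k-mer
    total = a + b + c + d
    denom = comb(total, n_with)
    return sum(
        comb(n_resp, x) * comb(total - n_resp, n_with - x)
        for x in range(a, min(n_resp, n_with) + 1)
    ) / denom

def enriched_kmers(promoters: Dict[str, str], responsive: Set[str],
                   k: int = 6, alpha: float = 0.01) -> List[str]:
    """Return k-mers over-represented in promoters of responsive genes."""
    present = {g: {s[i:i + k] for i in range(len(s) - k + 1)}
               for g, s in promoters.items()}
    background = [g for g in promoters if g not in responsive]
    hits = []
    for kmer in sorted(set().union(*present.values())):
        a = sum(kmer in present[g] for g in responsive)
        c = sum(kmer in present[g] for g in background)
        b, d = len(responsive) - a, len(background) - c
        if fisher_right_tail(a, b, c, d) < alpha:
            hits.append(kmer)
    return hits

def feature_matrix(promoters: Dict[str, str], kmers: List[str]) -> Dict[str, List[int]]:
    """Presence (1) or absence (0) of each selected k-mer: the model's predictors."""
    return {g: [int(km in s) for km in kmers] for g, s in promoters.items()}

# Hypothetical toy data: the three responsive promoters share ACGTG.
PROMOTERS = {"g1": "AAACGTGAAA", "g2": "TTACGTGTTT", "g3": "CCACGTGCCC",
             "g4": "GGGGGGGGGG", "g5": "TTTTTTTTTT", "g6": "CCCCCCCCCC"}
RESPONSIVE = {"g1", "g2", "g3"}
```

With these toy promoters and k=5, only ACGTG occurs in all three responsive genes and in no background gene, giving a p-value of 1/20. Per-cell-type models would repeat this with each cell type's responsive gene set.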

accuEnhancer: Accurate enhancer prediction by integration of multiple cell type data with deep learning

10.1101/2020.11.10.375717 ◽

2020 ◽

Author(s):

Yi-An Tung ◽

Wen-Tse Yang ◽

Tsung-Ting Hsieh ◽

Yu-Chuan Chang ◽

June-Tai Wu ◽

...

Keyword(s):

Deep Learning ◽

Cell Types ◽

Regulatory Elements ◽

Sequence Motifs ◽

Cell Type ◽

Enhancer Activity ◽

Multiple Cell ◽

Type Data ◽

Different Cell Types ◽

Multiple Cell Type

Abstract Enhancers are a class of regulatory elements that have been shown to act as key components helping promoters modulate gene expression in living cells. At present, the number of enhancers, as well as their activities in different cell types, remains largely unknown. Previous studies have shown that enhancer activities are associated with various functional data, such as histone modifications, sequence motifs, and chromatin accessibility. In this study, we used DNase data to build a deep learning model that predicts H3K27ac peaks, taken as active enhancers, in a target cell type. We propose joint training on multiple cell types to boost model performance in predicting the enhancer activities of an unstudied cell type. The results demonstrated that by incorporating more datasets across different cell types, deep learning models could capture complex regulatory patterns and prediction accuracy improved substantially. These analyses demonstrate that cell type-specific enhancer activity can be predicted by joint learning on multiple cell type data using only DNase data and the raw DNA sequence as input features. This highlights the importance of cross-cell type learning, and the resulting model can be applied to identify potential active enhancers in a novel cell type that does not yet have H3K27ac modification data.

Availability: The accuEnhancer package can be freely accessed at: https://github.com/callsobing/accuEnhancer
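
The abstract names the inputs (DNA sequence plus DNase data) and the label (H3K27ac peaks) but not the encoding. A common choice, assumed here, is a one-hot matrix over A/C/G/T with the per-base DNase signal appended as a fifth channel; joint training is modeled by pooling labelled windows from every cell type except the held-out target. `encode_window` and `pool_cell_types` are hypothetical helpers, not the accuEnhancer API.

```python
from typing import Dict, List, Sequence, Tuple

BASES = "ACGT"
Window = Tuple[str, Sequence[float], int]  # (sequence, DNase signal per base, H3K27ac label)

def encode_window(seq: str, dnase: Sequence[float]) -> List[List[float]]:
    """Return a len(seq) x 5 matrix: one-hot A/C/G/T plus a DNase channel.
    Ambiguous bases such as N get an all-zero one-hot row."""
    if len(seq) != len(dnase):
        raise ValueError("sequence and DNase track must have equal length")
    return [[1.0 if base == b else 0.0 for b in BASES] + [float(signal)]
            for base, signal in zip(seq.upper(), dnase)]

def pool_cell_types(data: Dict[str, List[Window]], held_out: str):
    """Joint training set: labelled windows from every cell type except the
    held-out target, whose enhancers the trained model would then predict."""
    X: List[List[List[float]]] = []
    y: List[int] = []
    for cell_type, windows in data.items():
        if cell_type == held_out:
            continue
        for seq, dnase, label in windows:
            X.append(encode_window(seq, dnase))
            y.append(label)
    return X, y
```

Because the genomic sequence of a window is the same in every cell type, only the DNase channel carries cell-type identity, which is why accessibility data for the new cell type are still required at prediction time.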

A scalable platform for the development of cell-type-specific viral drivers

eLife ◽

10.7554/elife.48089 ◽

2019 ◽

Vol 8 ◽

Cited By ~ 12

Author(s):

Sinisa Hrvatin ◽

Christopher P Tzeng ◽

M Aurel Nagy ◽

Hume Stroud ◽

Charalampia Koutsioumpa ◽

...

Keyword(s):

Gene Expression ◽

Heterologous Gene Expression ◽

High Specificity ◽

Cell Types ◽

Regulatory Elements ◽

Cell Type ◽

Cell Type Specificity ◽

Cell Type Specific ◽

The Many ◽

Dna Regulatory Elements

Enhancers are the primary DNA regulatory elements that confer cell type specificity of gene expression. Recent studies characterizing individual enhancers have revealed their potential to direct heterologous gene expression in a highly cell-type-specific manner. However, it has not yet been possible to systematically identify and test the function of enhancers for each of the many cell types in an organism. We have developed PESCA, a scalable and generalizable method that leverages ATAC- and single-cell RNA-sequencing protocols, to characterize cell-type-specific enhancers that should enable genetic access and perturbation of gene function across mammalian cell types. Focusing on the highly heterogeneous mammalian cerebral cortex, we apply PESCA to find enhancers and generate viral reagents capable of accessing and manipulating a subset of somatostatin-expressing cortical interneurons with high specificity. This study demonstrates the utility of this platform for developing new cell-type-specific viral reagents, with significant implications for both basic and translational research.

Analysis of putative cis-regulatory elements regulating blood pressure variation

Human Molecular Genetics ◽

10.1093/hmg/ddaa098 ◽

2020 ◽

Vol 29 (11) ◽

pp. 1922-1932

Author(s):

Priyanka Nandakumar ◽

Dongwon Lee ◽

Thomas J Hoffmann ◽

Georg B Ehret ◽

Dan Arking ◽

...

Keyword(s):

Blood Pressure ◽

Association Studies ◽

Specific Effect ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Genome Wide Association Studies ◽

Cell Type ◽

Functional Scores ◽

Cell Type Specific

Abstract Hundreds of loci have been associated with blood pressure (BP) traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ~100 000 Genetic Epidemiology Research on Aging study participants. In the present study, we sought to fine-map known loci and identify novel genes by determining putative regulatory regions for these and other tissues relevant to BP. We constructed maps of putative cis-regulatory elements (CREs) using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. We aggregate variants within these putative CREs within 50 kb of the start or end of ‘expressed’ genes in these tissues or cell types using public expression data and use deltaSVM scores as weights in the group-wise sequence kernel association test to identify candidates. We test for association with both BP traits and expression within these tissues or cell types of interest and identify the candidates MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B and PPCDC. Additionally, we examined two known QT interval genes, SCN5A and NOS1AP, in the Atherosclerosis Risk in Communities Study, as a positive control, and observed the expected heart-specific effect. Thus, our method identifies variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.
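
The variant-to-gene aggregation step can be sketched as follows. Assumptions: genes arrive already filtered to those called "expressed", putative CREs are half-open [start, end) intervals per chromosome, and "within 50 kb of the start or end" is read as the gene body padded by 50 kb on each side. `Gene`, `in_any`, and `group_variants` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FLANK = 50_000  # 50 kb padding on each side of the gene (assumed reading)

@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int

def in_any(pos: int, intervals: List[Tuple[int, int]]) -> bool:
    """True if pos lies in any half-open [start, end) interval."""
    return any(s <= pos < e for s, e in intervals)

def group_variants(genes: List[Gene],
                   variants: List[Tuple[str, int, str]],
                   cres: Dict[str, List[Tuple[int, int]]]) -> Dict[str, List[str]]:
    """Map each expressed gene to the IDs of variants that fall inside a
    putative CRE and inside the gene's padded window."""
    groups: Dict[str, List[str]] = {}
    for g in genes:
        lo, hi = g.start - FLANK, g.end + FLANK
        groups[g.name] = [
            vid for chrom, pos, vid in variants
            if chrom == g.chrom and lo <= pos <= hi
            and in_any(pos, cres.get(chrom, []))
        ]
    return groups
```

Each resulting group is the variant set that a group-wise test such as SKAT would then evaluate for that gene; a linear scan over CREs is used for clarity, and an interval index would be needed at genome scale.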

Evolution of regulatory signatures in primate cortical neurons at cell-type resolution

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2011884117 ◽

2020 ◽

Vol 117 (45) ◽

pp. 28422-28432

Author(s):

Alexey Kozlenkov ◽

Marit W. Vermunt ◽

Pasha Apontes ◽

Junhao Li ◽

Ke Hao ◽

...

Keyword(s):

Cortical Neurons ◽

Brain Evolution ◽

Cell Types ◽

Regulatory Elements ◽

Autism Spectrum ◽

Projection Neurons ◽

Evolutionary Divergence ◽

Cell Type ◽

Tissue Samples ◽

Functional Changes

The human cerebral cortex contains many cell types that likely underwent independent functional changes during evolution. However, cell-type–specific regulatory landscapes in the cortex remain largely unexplored. Here we report epigenomic and transcriptomic analyses of the two main cortical neuronal subtypes, glutamatergic projection neurons and GABAergic interneurons, in human, chimpanzee, and rhesus macaque. Using genome-wide profiling of the H3K27ac histone modification, we identify neuron-subtype–specific regulatory elements that previously went undetected in bulk brain tissue samples. Human-specific regulatory changes are uncovered in multiple genes, including those associated with language, autism spectrum disorder, and drug addiction. We observe preferential evolutionary divergence in neuron subtype-specific regulatory elements and show that a substantial fraction of pan-neuronal regulatory elements undergoes subtype-specific evolutionary changes. This study sheds light on the interplay between regulatory evolution and cell-type–dependent gene-expression programs, and provides a resource for further exploration of human brain evolution and function.

Analysis of putative cis-regulatory elements regulating blood pressure variation

10.1101/820522 ◽

2019 ◽

Author(s):

Priyanka Nandakumar ◽

Dongwon Lee ◽

Thomas J. Hoffmann ◽

Georg B. Ehret ◽

Dan Arking ◽

...

Keyword(s):

Gene Expression ◽

Blood Pressure ◽

Cell Types ◽

Regulatory Elements ◽

Open Chromatin ◽

Genome Wide Association Studies ◽

Cell Type ◽

Functional Scores ◽

Cell Type Specific ◽

Different Tissues

Abstract Hundreds of loci have been associated with blood pressure traits from many genome-wide association studies. We identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci in our previous work in ∼100,000 Genetic Epidemiology Research on Aging (GERA) study participants. In the present study, we focused on determining putative regulatory regions for these and other tissues of relevance to blood pressure, both to fine-map these loci by pinpointing genes and variants of functional interest within them and to identify any novel genes. We constructed maps of putative cis-regulatory elements using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Sequence variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. To identify genes of interest, we aggregate these variants in putative cis-regulatory elements within 50 kb of the start or end of genes considered “expressed” in these tissues or cell types, using publicly available gene expression data, and use the deltaSVM scores as weights in the well-known group-wise sequence kernel association test (SKAT). We test for association with both blood pressure traits and expression within these tissues or cell types of interest, and identify several genes, including MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B, and PPCDC. Although our study centers on blood pressure traits, we additionally examined two known genes involved in the cardiac QT interval trait, SCN5A and NOS1AP, in the Atherosclerosis Risk in Communities Study (ARIC) as a positive control, and observed the expected heart-specific effect. Thus, our method may be used to identify variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.

Author Summary: Sequence changes in genes (“variants”) are linked to the presence and severity of different traits and diseases. However, genes may be expressed in different tissues, at different times, and to different degrees, so using this information is expected to identify genes of interest more accurately. Variants within genes matter, but so do variants in the sequences (“regulatory elements”) that control the genes’ expression in different tissues or cell types. In this study, we use information about expression and about variants potentially involved in regulating gene expression to better pinpoint genes, and variants in regulatory elements, of interest for blood pressure regulation. We do so by taking advantage of publicly available data and of methods that aggregate variants within a gene’s putative regulatory elements in tissues thought to be relevant for blood pressure, and we identify several genes intended to enable experimental follow-up.
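
A minimal sketch of the weighted SKAT score statistic, Q = Σ_j (w_j S_j)², where S_j = Σ_i g_ij (y_i − ȳ) is the score for variant j under an intercept-only null model and w_j is the variant's deltaSVM score. Treating w_j as the square root of the kernel weight follows the convention of the R SKAT package's `weights` argument; that the study used exactly this convention is an assumption. Covariates are omitted, and the p-value, which requires Q's null distribution (a weighted sum of χ²₁ variables, e.g. via Davies' method), is not computed. `skat_q` is a hypothetical name.

```python
from typing import Sequence

def skat_q(genotypes: Sequence[Sequence[float]],
           phenotype: Sequence[float],
           weights: Sequence[float]) -> float:
    """SKAT score statistic Q = sum_j (w_j * S_j)^2 with an intercept-only null.

    genotypes[i][j]: allele dosage (0, 1, 2) of variant j in individual i.
    weights[j]: per-variant weight, here assumed to be the deltaSVM score.
    """
    n = len(phenotype)
    if len(genotypes) != n:
        raise ValueError("one genotype row per individual is required")
    mu = sum(phenotype) / n                 # fitted mean under the null model
    resid = [y - mu for y in phenotype]     # y_i - mu_hat
    q = 0.0
    for j, w in enumerate(weights):
        s_j = sum(genotypes[i][j] * resid[i] for i in range(n))  # score for variant j
        q += (w * s_j) ** 2
    return q
```

Because each weight is squared, the sign of a deltaSVM score (gain versus loss of predicted regulatory activity) does not change Q; only its magnitude up-weights a variant.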

DNA Methylation Atlas of the Mouse Brain at Single-Cell Resolution

10.1101/2020.04.30.069377 ◽

2020 ◽

Cited By ~ 1

Author(s):

Hanqing Liu ◽

Jingtian Zhou ◽

Wei Tian ◽

Chongyuan Luo ◽

Anna Bartlett ◽

...

Keyword(s):

Dna Methylation ◽

Mouse Brain ◽

Spatial Organization ◽

Brain Area ◽

Cell Types ◽

Regulatory Elements ◽

Mammalian Brain ◽

Open Chromatin ◽

Cell Type ◽

Single Nucleus

Summary Mammalian brain cells are remarkably diverse in gene expression, anatomy, and function, yet the regulatory DNA landscape underlying this extensive heterogeneity is poorly understood. We carried out a comprehensive assessment of the epigenomes of mouse brain cell types by applying single nucleus DNA methylation sequencing to profile 110,294 nuclei from 45 regions of the mouse cortex, hippocampus, striatum, pallidum, and olfactory areas. We identified 161 cell clusters with distinct spatial locations and projection targets. We constructed taxonomies of these epigenetic types, annotated with signature genes, regulatory elements, and transcription factors. These features indicate the potential regulatory landscape supporting the assignment of putative cell types, and reveal repeated usage of regulators in excitatory and inhibitory cells for determining subtypes. The DNA methylation landscape of excitatory neurons in the cortex and hippocampus varied continuously along spatial gradients. Using this deep dataset, we constructed an artificial neural network model that precisely predicts single neuron cell-type identity and brain area spatial location. Integration of high-resolution DNA methylomes with single-nucleus chromatin accessibility data allowed prediction of high-confidence enhancer-gene interactions for all identified cell types, which were subsequently validated by cell-type-specific chromatin conformation capture experiments. By combining multi-omic datasets (DNA methylation, chromatin contacts, and open chromatin) from single nuclei and annotating the regulatory genome of hundreds of cell types in the mouse brain, our DNA methylation atlas establishes the epigenetic basis for neuronal diversity and spatial organization throughout the mouse brain.

SeqEnhDL: sequence-based classification of cell type-specific enhancers using deep learning models

10.21203/rs.3.rs-94396/v1 ◽

2020 ◽

Author(s):

Yupeng Wang ◽

Rosario Jaime-Lara ◽

Abhrarup Roy ◽

Ying Sun ◽

Xinyue Liu ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Cell Types ◽

Regulatory Elements ◽

Learning Models ◽

Cell Type ◽

Coding Sequences ◽

Sequence Features ◽

A Genome ◽

Cell Type Specific

Abstract Objective: Computational identification of cell type-specific regulatory elements on a genome-wide scale is very challenging. Results: We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented, including multi-layer perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers including gkm-SVM and DanQ, with regard to distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified according to their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
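
The abstract names the features but not their exact formula. One plausible reading, assumed here: each k-mer's fold change is its relative frequency in enhancer sequences divided by its relative frequency in random non-coding background sequences (with a pseudocount), and a sequence is represented by the ordered fold changes of its overlapping k-mers. The function names are hypothetical, and the published SeqEnhDL code may compute features differently.

```python
from collections import Counter
from typing import Dict, Iterable, List

def kmer_counts(seqs: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer across a collection of sequences."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts

def fold_change_table(enhancers: Iterable[str], background: Iterable[str],
                      k: int, pseudo: float = 1.0) -> Dict[str, float]:
    """Relative k-mer frequency in enhancers divided by that in background."""
    fg, bg = kmer_counts(enhancers, k), kmer_counts(background, k)
    vocab = set(fg) | set(bg)
    fg_total = sum(fg.values()) + pseudo * len(vocab)
    bg_total = sum(bg.values()) + pseudo * len(vocab)
    return {km: ((fg[km] + pseudo) / fg_total) / ((bg[km] + pseudo) / bg_total)
            for km in vocab}

def sequential_features(seq: str, table: Dict[str, float], k: int) -> List[float]:
    """Fold change of each consecutive k-mer; unseen k-mers are neutral (1.0)."""
    seq = seq.upper()
    return [table.get(seq[i:i + k], 1.0) for i in range(len(seq) - k + 1)]
```

For the multi-k setting in the abstract, one table per k (5, 7, 9, 11) would be built from the training enhancers of a cell type, and the resulting vectors concatenated or stacked as model input; equal-length input sequences are assumed so the vectors have a fixed size.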

A unified encyclopedia of human functional DNA elements through fully automated annotation of 164 human cell types

10.1101/086025 ◽

2016 ◽

Cited By ~ 5

Author(s):

Maxwell W. Libbrecht ◽

Oscar Rodriguez ◽

Zhiping Weng ◽

Jeffrey A. Bilmes ◽

Michael M. Hoffman ◽

...

Keyword(s):

Human Cell ◽

Genome Annotation ◽

Cell Types ◽

Regulatory Elements ◽

Activity Score ◽

Data Sets ◽

Cell Type ◽

Automated Annotation ◽

Aggregate Information ◽

Genome Annotations

Abstract Semi-automated genome annotation methods such as Segway enable understanding of chromatin activity. Here we present chromatin state annotations of 164 human cell types using 1,615 genomics data sets. To produce these annotations, we developed a fully automated annotation strategy in which we train separate unsupervised annotation models on each cell type and use a machine learning classifier to automate the state interpretation step. Using these annotations, we developed a measure of the importance of each genomic position called the “conservation-associated activity score,” which we use to aggregate information across cell types into a multi-cell type view. The aggregated conservation-associated activity score provides a measure of importance directly attributable to a specific activity in a specific set of cell types. In contrast to evolutionary conservation, this measure is not biased toward detecting only elements shared with related species. Using the conservation-associated activity score, we combined all our annotations into a single, cell type-agnostic encyclopedia that catalogs all human transcriptional and regulatory elements, enabling easy and intuitive interpretation of the effect of genome variants on phenotype, such as in disease-associated, evolutionarily conserved or positively selected loci. These resources, including cell type-specific annotations, the encyclopedia, and a visualization server, are available at http://noble.gs.washington.edu/proj/encyclopedia.

Author Summary: Genome annotation algorithms are an effective class of tools for understanding the function of the genome. These algorithms take as input a set of genome-wide measurements of the activity at each base pair in a given tissue, such as where a given protein binds or how accessible the DNA is to being read by a protein. The genome is then partitioned, and each segment is assigned a label such that positions with the same label exhibit similar patterns in the input data. Such annotations are widely used for many applications, such as understanding the mechanism by which a given genetic variant has its impact. Here we present, to our knowledge, the most comprehensive set of genome annotations created so far, encompassing 164 human cell types and including 1,615 genomics data sets. These comprehensive annotations are made possible by a strategy that automates the previously manual interpretation step. Furthermore, we present several methodological innovations that make these genome annotations more useful.
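
The abstract does not give the score's formula. A minimal sketch under one plausible reading, which is an assumption: in each cell type, every label is scored by the mean conservation (e.g. phyloP) of all positions carrying it, each position inherits its label's score, and the per-cell-type scores are averaged into one multi-cell type value. The aggregation rule, label names, and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence

def label_conservation(labels: Sequence[str], conservation: Sequence[float]) -> Dict[str, float]:
    """Mean conservation of all positions carrying each label in one cell type."""
    by_label: Dict[str, List[float]] = defaultdict(list)
    for lab, c in zip(labels, conservation):
        by_label[lab].append(c)
    return {lab: mean(vals) for lab, vals in by_label.items()}

def aggregated_scores(annotations: Dict[str, Sequence[str]],
                      conservation: Sequence[float]) -> List[float]:
    """Score each position by its label's mean conservation in every cell type,
    then average those per-cell-type scores (aggregation rule assumed)."""
    per_cell = []
    for cell_type, labels in annotations.items():
        if len(labels) != len(conservation):
            raise ValueError(f"{cell_type}: label track length mismatch")
        table = label_conservation(labels, conservation)
        per_cell.append([table[lab] for lab in labels])
    return [mean(column) for column in zip(*per_cell)]
```

Because a position's score comes from its label's average conservation rather than from its own conservation, a cell type-specific element can score highly even where the underlying sequence is not conserved, which matches the abstract's contrast with evolutionary conservation.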

The single-cell epigenetic regulatory landscape in mammalian perinatal testis development

10.1101/2021.03.17.435776 ◽

2021 ◽

Author(s):

Jinyue Liao ◽

Hoi Ching Suen ◽

Shitao Rao ◽

Alfred Chun Shui Luk ◽

Ruoyu Zhang ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Fate ◽

Germ Cells ◽

Somatic Cells ◽

Cell Types ◽

Regulatory Elements ◽

Cellular Heterogeneity ◽

Cell Populations ◽

Cell Type

Abstract Spermatogenesis depends on an orchestrated series of developmental events in germ cells and on full maturation of the somatic microenvironment. To date, most efforts to study cellular heterogeneity in the testis have focused on single-cell gene expression rather than on the chromatin landscape that shapes gene expression. To advance our understanding of the regulatory programs underlying testicular cell types, we analyzed single-cell chromatin accessibility profiles in more than 25,000 cells from developing mouse testis. We showed that scATAC-Seq allowed us to deconvolve distinct cell populations and identify cis-regulatory elements (CREs) underlying cell type specification. We identified sets of transcription factors associated with cell type-specific accessibility, revealing novel regulators of cell fate specification and maintenance. Pseudotime reconstruction revealed detailed regulatory dynamics coordinating the sequential developmental progression of germ cells and somatic cells. These high-resolution data also revealed putative stem cells within the Sertoli and Leydig cell populations. Further, we defined candidate target cell types and genes for several GWAS signals, including those associated with testosterone levels and coronary artery disease. Collectively, our data provide a blueprint of the ‘regulon’ of the mouse male germline and supporting somatic cells.

Single-Cell Epigenomics and Functional Fine-Mapping of Atherosclerosis GWAS Loci

Circulation Research ◽

10.1161/circresaha.121.318971 ◽

2021 ◽

Author(s):

Tiit Örd ◽

Kadri Õunap ◽

Lindsey Stolze ◽

Rédouane Aherrahrou ◽

Valtteri Nurminen ◽

...

Keyword(s):

Smooth Muscle ◽

Smooth Muscle Cells ◽

Muscle Cells ◽

Cell Types ◽

Regulatory Elements ◽

Chromatin Accessibility ◽

Cell Type ◽

Atherosclerotic Lesions ◽

Genome Wide ◽

Single Nucleus

Rationale: Genome-wide association studies (GWAS) have identified hundreds of loci associated with coronary artery disease (CAD). Many of these loci are enriched in cis-regulatory elements (CREs) but are linked neither to cardiometabolic risk factors nor to candidate causal genes, complicating their functional interpretation. Objective: Single nucleus chromatin accessibility profiling of human atherosclerotic lesions was used to investigate cell type-specific patterns of CREs, to understand the transcription factors establishing cell identity, and to interpret CAD-relevant, non-coding genetic variation. Methods and Results: We used single nucleus ATAC-seq to generate DNA accessibility maps in >7,000 cells derived from human atherosclerotic lesions. We identified five major lesional cell types (endothelial cells, smooth muscle cells, monocyte/macrophages, NK/T-cells, and B-cells) and further investigated subtype characteristics of macrophages and of smooth muscle cells transitioning into fibromyocytes. We demonstrated that CAD-associated genetic variants are particularly enriched in endothelial and smooth muscle cell-specific open chromatin. Using single cell co-accessibility and cis-eQTL information, we prioritized putative target genes and candidate regulatory elements for ~30% of all known CAD loci. Finally, we performed genome-wide experimental fine-mapping of the CAD GWAS variants using epigenetic QTL analysis in primary human aortic endothelial cells and a STARR-Seq massively parallel reporter assay in smooth muscle cells. This analysis identified potential causal SNP(s) and the associated target gene for over 30 CAD loci. We present several examples in which chromatin accessibility and gene expression could be assigned to a single cell type, predicting the cell type of action for those CAD loci.
Conclusions: These findings highlight the potential of applying snATAC-seq to human tissues in revealing relative contributions of distinct cell types to diseases and in identifying genes likely to be influenced by non-coding GWAS variants.
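
The abstract reports enrichment of CAD variants in cell type-specific open chromatin without stating the test. As a simple stand-in, not the study's method, the sketch below computes the odds ratio of peak overlap for GWAS variants versus background variants, with a Haldane-Anscombe pseudocount of 0.5. It assumes peaks are merged, non-overlapping, half-open intervals; all names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, int]      # half-open [start, end)
Site = Tuple[str, int]          # (chromosome, position)

def build_index(peaks: Dict[str, List[Interval]]) -> Dict[str, List[Interval]]:
    """Sort each chromosome's peaks; peaks are assumed merged (non-overlapping)."""
    return {chrom: sorted(ivs) for chrom, ivs in peaks.items()}

def in_peak(index: Dict[str, List[Interval]], chrom: str, pos: int) -> bool:
    """Binary search for the last peak starting at or before pos."""
    ivs = index.get(chrom, [])
    i = bisect_right(ivs, (pos, float("inf"))) - 1
    return i >= 0 and ivs[i][0] <= pos < ivs[i][1]

def overlap_odds_ratio(gwas: Sequence[Site], background: Sequence[Site],
                       index: Dict[str, List[Interval]], pseudo: float = 0.5) -> float:
    """Odds ratio of peak overlap for GWAS vs background variants,
    with a pseudocount so empty cells do not cause division by zero."""
    a = sum(in_peak(index, ch, p) for ch, p in gwas)
    c = sum(in_peak(index, ch, p) for ch, p in background)
    b, d = len(gwas) - a, len(background) - c
    return ((a + pseudo) * (d + pseudo)) / ((b + pseudo) * (c + pseudo))
```

Running this once per cell type's specific peak set gives a per-cell-type odds ratio; in practice the background variants should be matched to the GWAS variants (e.g. on allele frequency and LD), which this sketch leaves to the caller.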
