A unified encyclopedia of human functional DNA elements through fully automated annotation of 164 human cell types

Abstract

Semi-automated genome annotation methods such as Segway enable understanding of chromatin activity. Here we present chromatin state annotations of 164 human cell types using 1,615 genomics data sets. To produce these annotations, we developed a fully automated annotation strategy in which we train separate unsupervised annotation models on each cell type and use a machine learning classifier to automate the state interpretation step. Using these annotations, we developed a measure of the importance of each genomic position, called the “conservation-associated activity score,” which we use to aggregate information across cell types into a multi-cell type view. The aggregated conservation-associated activity score provides a measure of importance directly attributable to a specific activity in a specific set of cell types. In contrast to evolutionary conservation, this measure is not biased toward detecting only elements shared with related species. Using the conservation-associated activity score, we combined all our annotations into a single, cell type-agnostic encyclopedia that catalogs all human transcriptional and regulatory elements, enabling easy and intuitive interpretation of the effect of genome variants on phenotype, such as in disease-associated, evolutionarily conserved or positively selected loci. These resources, including cell type-specific annotations, the encyclopedia, and a visualization server, are available at http://noble.gs.washington.edu/proj/encyclopedia.

Author Summary

Genome annotation algorithms are an effective class of tools for understanding the function of the genome. These algorithms take as input a set of genome-wide measurements of the activity at each base pair in a given tissue, such as where a given protein binds or how accessible the DNA is to being read by a protein. The genome is then partitioned, and each segment is assigned a label such that positions with the same label exhibit similar patterns in the input data. Such annotations are widely used for many applications, such as understanding the mechanism by which a given genetic variant has its impact. Here we present, to our knowledge, the most comprehensive set of genome annotations created so far, encompassing 164 human cell types and including 1,615 genomics data sets. These comprehensive annotations are made possible by a strategy that automates the interpretation step, which previously had to be done manually. Furthermore, we present several methodological innovations that make these genome annotations more useful.
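The abstract does not spell out how per-cell-type scores are combined into the multi-cell-type view. The sketch below only illustrates that kind of aggregation, assuming each cell type supplies one score and one state label per position; the function `aggregate_scores`, the parameter `rel_threshold`, and the take-the-maximum rule are hypothetical and are not the paper's conservation-associated activity score.

```python
from typing import Dict, List, Tuple

def aggregate_scores(
    scores: Dict[str, List[float]],
    labels: Dict[str, List[str]],
    rel_threshold: float = 0.8,
) -> List[Tuple[float, str, List[str]]]:
    """Collapse per-cell-type importance scores into one cell-type-agnostic track.

    Hypothetical rule (not the paper's): at each position keep the maximum score,
    the label assigned in the top-scoring cell type, and every cell type whose
    score is at least rel_threshold * maximum.
    """
    cell_types = sorted(scores)
    n_positions = len(scores[cell_types[0]])
    track = []
    for pos in range(n_positions):
        top_ct = max(cell_types, key=lambda ct: scores[ct][pos])
        top_score = scores[top_ct][pos]
        supporting = [
            ct for ct in cell_types
            if top_score > 0 and scores[ct][pos] >= rel_threshold * top_score
        ]
        track.append((top_score, labels[top_ct][pos], supporting))
    return track


# Toy input: two cell types, three positions (values are made up).
scores = {"liver": [0.1, 2.0, 0.0], "brain": [0.5, 1.8, 0.0]}
labels = {
    "liver": ["Quiescent", "Enhancer", "Quiescent"],
    "brain": ["Promoter", "Enhancer", "Quiescent"],
}
for pos, row in enumerate(aggregate_scores(scores, labels)):
    print(pos, row)
# 0 (0.5, 'Promoter', ['brain'])
# 1 (2.0, 'Enhancer', ['brain', 'liver'])
# 2 (0.0, 'Quiescent', [])
```

Requiring a positive maximum keeps positions that are inactive everywhere from being attributed to every cell type.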


A human cell atlas of fetal chromatin accessibility

Science · 10.1126/science.aba7612 · 2020 · Vol 370 (6518) · pp. eaba7612 · Cited by ~1

Author(s): Silvia Domcke, Andrew J. Hill, Riza M. Daza, Junyue Cao, Diana R. O’Day, ...

Keyword(s): Gene Expression; Human Cell; Single Cells; Complex Trait; Cell Types; Regulatory Elements; Chromatin Accessibility; Cell Type; Cell Type Specific

The chromatin landscape underlying the specification of human cell types is of fundamental interest. We generated human cell atlases of chromatin accessibility and gene expression in fetal tissues. For chromatin accessibility, we devised a three-level combinatorial indexing assay and applied it to 53 samples representing 15 organs, profiling ~800,000 single cells. We leveraged cell types defined by gene expression to annotate these data and cataloged hundreds of thousands of candidate regulatory elements that exhibit cell type–specific chromatin accessibility. We investigated the properties of lineage-specific transcription factors (such as POU2F1 in neurons), organ-specific specializations of broadly distributed cell types (such as blood and endothelial cells), and cell type–specific enrichments of complex trait heritability. These data represent a rich resource for the exploration of in vivo human gene regulation in diverse tissues and cell types.
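The capacity of a combinatorial indexing design grows multiplicatively with the number of indexing rounds. As a rough illustration (the well counts are hypothetical, not the assay's actual layout), the number of barcode combinations is the product of wells per round, and a birthday-problem estimate gives the expected fraction of cells whose combined barcode collides with another cell's. Function names are illustrative.

```python
import math
from typing import List

def barcode_space(wells_per_round: List[int]) -> int:
    """Distinct barcode combinations when each round adds one well-specific index."""
    return math.prod(wells_per_round)

def collision_rate(n_cells: int, n_barcodes: int) -> float:
    """Expected fraction of cells whose combined barcode is shared with at least one
    other cell, assuming each cell lands in a barcode combination uniformly at random."""
    return 1.0 - (1.0 - 1.0 / n_barcodes) ** (n_cells - 1)


# Hypothetical plate layouts: 96 wells per round, two vs. three rounds.
for rounds in ([96, 96], [96, 96, 96]):
    space = barcode_space(rounds)
    print(len(rounds), "rounds:", space, "combinations,",
          f"{collision_rate(1000, space):.4f} collision rate for 1,000 cells")
```

Under these assumptions, 1,000 cells collide roughly 10% of the time with two rounds of 96 wells but only about 0.1% of the time with three, illustrating how an extra indexing level enlarges the barcode space.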


Profiling Bioactivity of the ToxCast Chemical Library Using BioMAP Primary Human Cell Systems

Journal of Biomolecular Screening · 10.1177/1087057109345525 · 2009 · Vol 14 (9) · pp. 1054-1066 · Cited by ~77

Author(s): Keith A. Houck, David J. Dix, Richard S. Judson, Robert J. Kavlock, Jian Yang, ...

Keyword(s): Human Cell; Regulatory Networks; Cell Types; Environmental Chemicals; Data Sets; Chemical Library; Data Set; Cell Systems; Disease Biology; NFκB Pathway

The complexity of human biology has made predicting the health effects of exposure to environmental chemicals especially challenging. Complex cell systems, such as the Biologically Multiplexed Activity Profiling (BioMAP) primary human cell-based disease models, leverage cellular regulatory networks to detect and distinguish chemicals with a broad range of target mechanisms and biological processes relevant to human toxicity. Here the authors use the BioMAP human cell systems to characterize effects relevant to human tissue and inflammatory disease biology following exposure to the 320 environmental chemicals in the Environmental Protection Agency’s (EPA’s) ToxCast phase I library. The ToxCast chemicals were assayed at 4 concentrations in 8 BioMAP cell systems, with a total of 87 assay endpoints, resulting in more than 100,000 data points. Within the context of the BioMAP database, ToxCast compounds could be classified by their ability to cause overt cytotoxicity in primary human cell types or by toxicity mechanism class, derived from comparisons with the activity profiles of BioMAP reference compounds. This BioMAP analysis identified ToxCast chemicals with similarity to inducers of mitochondrial dysfunction, cAMP elevators, inhibitors of tubulin function, inducers of endoplasmic reticulum stress, or NFκB pathway inhibitors. This data set is being combined with additional ToxCast data sets for the development of predictive toxicity models at the EPA. (Journal of Biomolecular Screening 2009:1054-1066)
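The data-point count in the abstract can be checked directly (320 chemicals × 4 concentrations × 87 endpoints = 111,360). The abstract does not specify how a chemical's profile is compared with the reference compounds; the sketch below assumes Pearson correlation over endpoint profiles and a hypothetical cutoff `min_r`, and the reference profiles, class names, and function names are invented for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

# Data-point count stated in the abstract: chemicals x concentrations x endpoints.
N_DATA_POINTS = 320 * 4 * 87  # 111,360, i.e. "more than 100,000"

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation between two endpoint profiles of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile has no shape to compare
    return cov / (sx * sy)

def classify(profile: List[float], references: Dict[str, List[float]],
             min_r: float = 0.7) -> Tuple[Optional[str], float]:
    """Mechanism class of the most similar reference profile, or None if the best
    correlation is below the hypothetical cutoff min_r."""
    best_class: Optional[str] = None
    best_r = -1.0
    for mech_class, ref in references.items():
        r = pearson(profile, ref)
        if r > best_r:
            best_class, best_r = mech_class, r
    return (best_class if best_r >= min_r else None), best_r


# Made-up 4-endpoint profiles; real BioMAP profiles span many more endpoints.
references = {
    "tubulin inhibitor": [1.0, -2.0, 0.5, 3.0],
    "NFkB pathway inhibitor": [-1.0, 0.0, 2.0, -0.5],
}
print(classify([2.0, -4.0, 1.0, 6.0], references))  # proportional to the tubulin profile
```

Correlation compares the shape of a profile rather than its magnitude, so a chemical acting like a reference compound at a different potency can still match it.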


A scalable platform for the development of cell-type-specific viral drivers

eLife · 10.7554/elife.48089 · 2019 · Vol 8 · Cited by ~12

Author(s): Sinisa Hrvatin, Christopher P Tzeng, M Aurel Nagy, Hume Stroud, Charalampia Koutsioumpa, ...

Keyword(s): Gene Expression; Heterologous Gene Expression; High Specificity; Cell Types; Regulatory Elements; Cell Type; Cell Type Specificity; Cell Type Specific; The Many; DNA Regulatory Elements

Enhancers are the primary DNA regulatory elements that confer cell type specificity of gene expression. Recent studies characterizing individual enhancers have revealed their potential to direct heterologous gene expression in a highly cell-type-specific manner. However, it has not yet been possible to systematically identify and test the function of enhancers for each of the many cell types in an organism. We have developed PESCA, a scalable and generalizable method that leverages ATAC- and single-cell RNA-sequencing protocols to characterize cell-type-specific enhancers, which should enable genetic access and perturbation of gene function across mammalian cell types. Focusing on the highly heterogeneous mammalian cerebral cortex, we apply PESCA to find enhancers and generate viral reagents capable of accessing and manipulating a subset of somatostatin-expressing cortical interneurons with high specificity. This study demonstrates the utility of this platform for developing new cell-type-specific viral reagents, with significant implications for both basic and translational research.


Genomic Architecture of Cells in Tissues (GeACT): Study of Human Mid-gestation Fetus

10.1101/2020.04.12.038000 · 2020

Author(s): Feng Tian, Fan Zhou, Xiang Li, Wenping Ma, Honggui Wu, ...

Keyword(s): Transcription Factors; Single Cell; Human Cell; Expression Profiles; Single Cells; Cell Types; List Type; Cell Type; Genomic Architecture; Gene Modules

Summary

By circumventing cellular heterogeneity, single-cell omics have been widely used for cell typing in human tissues, culminating in the Human Cell Atlas effort to characterize all human cell types. Beyond cell typing, however, it is even more important to probe the gene regulatory networks, underlying chromatin architecture, and critical transcription factors of each cell type. Here we report the Genomic Architecture of Cells in Tissues (GeACT), a comprehensive genomic database that collectively addresses these needs, with the goal of understanding the functional genome in action. GeACT was made possible by our novel single-cell RNA-seq (MALBAC-DT) and ATAC-seq (METATAC) methods, which offer high detectability and precision. We demonstrated GeACT by first studying representative organs of the human mid-gestation fetus. In particular, we observed correlated gene modules (CGMs) and found them to be cell-type-dependent. We linked gene expression profiles to the underlying chromatin states and identified the key transcription factors for representative CGMs.

Highlights
Genomic Architecture of Cells in Tissues (GeACT) data for the human mid-gestation fetus
Determining correlated gene modules (CGMs) in different cell types by MALBAC-DT
Measuring chromatin open regions in single cells with high detectability by METATAC
Integrating transcriptomics and chromatin accessibility to reveal key TFs for a CGM
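The summary does not say how correlated gene modules (CGMs) are computed. One simple way to illustrate the concept, assumed here and not taken from GeACT, is to correlate genes' expression across single cells, link gene pairs whose Pearson correlation reaches a cutoff, and report connected groups of genes as modules. The cutoff `min_r`, the function names, and the toy data are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Set

def corr(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two genes' expression values across the same cells."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0

def gene_modules(expr: Dict[str, List[float]], min_r: float = 0.8) -> List[Set[str]]:
    """Link gene pairs with correlation >= min_r (hypothetical cutoff) and return the
    connected components that contain at least two genes."""
    parent = {g: g for g in expr}  # union-find forest over genes

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in combinations(sorted(expr), 2):
        if corr(expr[a], expr[b]) >= min_r:
            parent[find(a)] = find(b)

    modules: Dict[str, Set[str]] = {}
    for g in expr:
        modules.setdefault(find(g), set()).add(g)
    return [m for m in modules.values() if len(m) >= 2]


# Made-up expression of five genes in four cells.
expr = {
    "A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5],
    "C": [4, 3, 2, 1], "E": [5, 4, 3, 2.2],
    "D": [1, 3, 3, 1],
}
print(gene_modules(expr))
```

Here A and B form one module and C and E another; D is uncorrelated with every gene and is left out. Only positive correlations create links in this sketch, so anti-correlated genes such as A and C end up in different modules.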


MIXTURE: an improved algorithm for immune tumor microenvironment estimation based on gene expression data

10.1101/726562 · 2019 · Cited by ~3

Author(s): Elmer A. Fernández, Yamil D. Mahmoud, Florencia Veigas, Darío Rocha, Mónica Balzarini, ...

Keyword(s): Tumor Microenvironment; Immune Cell; Therapy Response; Cell Types; Gene Signature; Response To Therapy; Support Vector; Data Sets; Cell Type; Before And After

Abstract

RNA sequencing has proved to be an efficient high-throughput technique for robustly characterizing the presence and quantity of RNA in tumor biopsies at a given time. Importantly, it can be used to computationally estimate the composition of the tumor immune infiltrate and to infer the immunological phenotypes of those cells. Given the significant impact of anti-cancer immunotherapies, and the role of the associated immune tumor microenvironment (ITME) in prognosis and therapy response, estimating the immune cell-type content of the tumor is crucial for designing effective strategies to understand and treat cancer. Digital estimation of the ITME cell mixture content can currently be performed with a variety of analytical tools. However, current methods tend to overestimate the number of cell types present in the sample and thus underestimate the true proportions, which biases the results. We developed MIXTURE, a noise-constrained recursive feature selection for support vector regression that overcomes these limitations. MIXTURE deconvolutes cell-type proportions of bulk tumor samples from either RNA microarray or RNA-seq platforms, using a validated leukocyte gene signature. We evaluated MIXTURE on simulated and benchmark data sets. It outperforms competing methods in accuracy on both the true number of cell types present and the proportion estimates, with increased robustness to estimation bias. It also shows superior robustness to collinearity. Finally, we investigated the immune microenvironment of breast cancer, head and neck squamous cell carcinoma, and melanoma biopsies before and after anti-PD-1 immunotherapy treatment, revealing associations with response to therapy that had not been detected by previous methods.
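MIXTURE itself uses support vector regression with a noise-constrained recursive feature selection; neither is reproduced here. To keep the sketch dependency-free, it swaps in ordinary least squares and shows only the general recursive-elimination idea: fit the bulk profile as a weighted sum of cell-type signature profiles, drop cell types whose coefficients fall at or below a hypothetical noise floor `min_coef`, and refit until every remaining coefficient exceeds it. All names and numbers are illustrative.

```python
from typing import Dict, List

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def least_squares(rows: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares through the normal equations (S^T S) beta = S^T y."""
    k = len(rows[0])
    sts = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    sty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(sts, sty)

def deconvolve(signature: Dict[str, List[float]], mixture: List[float],
               min_coef: float = 0.01) -> Dict[str, float]:
    """Recursive elimination: fit, drop cell types with coefficient <= min_coef
    (hypothetical noise floor), refit; return normalized proportions of the rest."""
    kept = sorted(signature)
    while kept:
        rows = [[signature[ct][g] for ct in kept] for g in range(len(mixture))]
        beta = least_squares(rows, mixture)
        survivors = [ct for ct, b in zip(kept, beta) if b > min_coef]
        if len(survivors) == len(kept):  # nothing left to remove: report proportions
            total = sum(beta)
            return {ct: b / total for ct, b in zip(kept, beta)}
        kept = survivors
    return {}


# Made-up signature (4 genes x 3 cell types); the mixture is 60% T + 40% B, no NK.
signature = {"T": [10, 0, 1, 2], "B": [0, 8, 1, 0], "NK": [1, 1, 9, 0]}
mixture = [6.0, 3.2, 1.0, 1.2]
print(deconvolve(signature, mixture))
```

Removing cell types that contribute only noise and refitting means they are reported as absent rather than as small spurious fractions, which is the failure mode the abstract attributes to current methods.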


Panoramic stitching of heterogeneous single-cell transcriptomic data

10.1101/371179 · 2018 · Cited by ~17

Author(s): Brian Hie, Bryan Bryson, Bonnie Berger

Keyword(s): Single Cell; Cell Types; Data Sets; Cell Type; Data Set; Wide Range; Data Set Integration; Biological Patterns; Insight Into; Comprehensive Reference

Abstract

Researchers are generating single-cell RNA sequencing (scRNA-seq) profiles of diverse biological systems [1–4] and of every cell type in the human body [5]. Leveraging these data to gain unprecedented insight into biology and disease will require assembling heterogeneous cell populations across multiple experiments, laboratories, and technologies. Although methods for scRNA-seq data integration exist [6,7], they often naively merge data sets together even when the data sets have no cell types in common, leading to results that do not correspond to real biological patterns. Here we present Scanorama, inspired by algorithms for panorama stitching, which overcomes the limitations of existing methods to enable accurate integration of heterogeneous scRNA-seq data sets. Our strategy identifies and merges the shared cell types among all pairs of data sets and is orders of magnitude faster than existing techniques. We use Scanorama to combine 105,476 cells from 26 diverse scRNA-seq experiments across 9 different technologies into a single comprehensive reference, demonstrating how Scanorama can be used to obtain a more complete picture of cellular function across a wide range of scRNA-seq experiments.
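The abstract says Scanorama identifies and merges the shared cell types among all pairs of data sets. The published method does this by matching mutual nearest neighbors between data sets in a reduced-dimensional space with approximate neighbor search; the sketch below shows only a brute-force version of the matching step on made-up 2-D embeddings. The function names are hypothetical, and dimensionality reduction, approximate search, and the merging and correction steps are omitted.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Sequence[float]

def nearest(point: Point, others: List[Point], k: int) -> List[int]:
    """Indices of the k cells in `others` closest to `point` (brute-force Euclidean)."""
    order = sorted(range(len(others)), key=lambda j: math.dist(point, others[j]))
    return order[:k]

def mutual_nearest_neighbors(a: List[Point], b: List[Point], k: int = 1) -> Set[Tuple[int, int]]:
    """Pairs (i, j) such that b[j] is among a[i]'s k nearest cells in b and vice versa."""
    a_to_b = {i: set(nearest(a[i], b, k)) for i in range(len(a))}
    b_to_a = {j: set(nearest(b[j], a, k)) for j in range(len(b))}
    return {(i, j) for i, nbrs in a_to_b.items() for j in nbrs if i in b_to_a[j]}


# Toy 2-D embeddings: data set `a` has three cell types, `b` shares only the first two.
a = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
b = [(0.5, 0.0), (10.0, 9.0)]
print(sorted(mutual_nearest_neighbors(a, b)))  # [(0, 0), (1, 1)]
```

The third cell in `a` has a nearest neighbor in `b`, but that neighbor prefers a different cell, so no match is made. Requiring the match to be mutual is what keeps a cell type present in only one data set from being forced onto another.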


Analysis of putative cis-regulatory elements regulating blood pressure variation

Human Molecular Genetics · 10.1093/hmg/ddaa098 · 2020 · Vol 29 (11) · pp. 1922-1932

Author(s): Priyanka Nandakumar, Dongwon Lee, Thomas J Hoffmann, Georg B Ehret, Dan Arking, ...

Keyword(s): Blood Pressure; Association Studies; Specific Effect; Cell Types; Regulatory Elements; Open Chromatin; Genome Wide Association Studies; Cell Type; Functional Scores; Cell Type Specific

Abstract

Hundreds of loci have been associated with blood pressure (BP) traits in many genome-wide association studies. In our previous work in ~100,000 Genetic Epidemiology Research on Aging study participants, we identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci. In the present study, we sought to fine-map known loci and identify novel genes by determining putative regulatory regions for these and other tissues relevant to BP. We constructed maps of putative cis-regulatory elements (CREs) using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. Using public expression data, we aggregate variants in these putative CREs within 50 kb of the start or end of genes ‘expressed’ in these tissues or cell types, and we use deltaSVM scores as weights in the group-wise sequence kernel association test (SKAT) to identify candidates. We test for association with both BP traits and expression within these tissues or cell types of interest and identify the candidates MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B and PPCDC. Additionally, as a positive control, we examined two known QT interval genes, SCN5A and NOS1AP, in the Atherosclerosis Risk in Communities Study and observed the expected heart-specific effect. Thus, our method identifies variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.
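The two steps described above, selecting variants near a gene and testing them jointly with deltaSVM-derived weights, can be sketched as follows under stated assumptions. "Within 50 kb of the start or end" is read as the interval from 50 kb upstream of the gene start to 50 kb downstream of the gene end. For a continuous trait with no covariates, the SKAT score statistic is Q = Σ_j w_j S_j², where S_j = Σ_i G_ij (y_i − ȳ) and G_ij is the minor-allele count of sample i at variant j. Setting w_j to the squared deltaSVM score is a hypothetical choice that keeps weights non-negative, not necessarily the authors' scheme. SKAT obtains p-values from a weighted mixture of chi-square distributions; that step is omitted, and the function names are illustrative.

```python
from typing import List

def variants_in_window(positions: List[int], gene_start: int, gene_end: int,
                       flank: int = 50_000) -> List[int]:
    """Indices of variants in [gene_start - flank, gene_end + flank] (assumed reading
    of 'within 50 kb of the start or end' of a gene)."""
    lo, hi = gene_start - flank, gene_end + flank
    return [j for j, p in enumerate(positions) if lo <= p <= hi]

def skat_q(genotypes: List[List[int]], trait: List[float], weights: List[float]) -> float:
    """SKAT score statistic Q = sum_j w_j * S_j**2 for a continuous trait, no covariates.

    genotypes[i][j]: minor-allele count (0, 1, 2) of sample i at variant j.
    weights[j]: non-negative weight of variant j (hypothetically, deltaSVM_j squared).
    """
    n = len(trait)
    y_bar = sum(trait) / n
    resid = [y - y_bar for y in trait]  # residuals under the intercept-only null model
    q = 0.0
    for j, w in enumerate(weights):
        s_j = sum(genotypes[i][j] * resid[i] for i in range(n))  # per-variant score
        q += w * s_j ** 2
    return q


# Toy example: four variants, two fall in the window around a gene at 150-200 kb.
positions = [100_000, 140_000, 260_000, 400_000]
kept = variants_in_window(positions, 150_000, 200_000)      # [0, 1]
genotypes = [[0, 1], [1, 0], [2, 1], [0, 0]]                 # 4 samples x kept variants
trait = [1.0, 2.0, 4.0, 1.0]
delta_svm = [1.5, -2.0]                                      # made-up deltaSVM scores
print(kept, skat_q(genotypes, trait, [d ** 2 for d in delta_svm]))  # [0, 1] 40.0
```

Because each variant's score is squared before weighting, variants pushing the trait in opposite directions do not cancel, which is the property that distinguishes SKAT from a simple burden sum.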


Evolution of regulatory signatures in primate cortical neurons at cell-type resolution

Proceedings of the National Academy of Sciences · 10.1073/pnas.2011884117 · 2020 · Vol 117 (45) · pp. 28422-28432

Author(s): Alexey Kozlenkov, Marit W. Vermunt, Pasha Apontes, Junhao Li, Ke Hao, ...

Keyword(s): Cortical Neurons; Brain Evolution; Cell Types; Regulatory Elements; Autism Spectrum; Projection Neurons; Evolutionary Divergence; Cell Type; Tissue Samples; Functional Changes

The human cerebral cortex contains many cell types that likely underwent independent functional changes during evolution. However, cell-type–specific regulatory landscapes in the cortex remain largely unexplored. Here we report epigenomic and transcriptomic analyses of the two main cortical neuronal subtypes, glutamatergic projection neurons and GABAergic interneurons, in human, chimpanzee, and rhesus macaque. Using genome-wide profiling of the H3K27ac histone modification, we identify neuron-subtype–specific regulatory elements that previously went undetected in bulk brain tissue samples. Human-specific regulatory changes are uncovered in multiple genes, including those associated with language, autism spectrum disorder, and drug addiction. We observe preferential evolutionary divergence in neuron subtype-specific regulatory elements and show that a substantial fraction of pan-neuronal regulatory elements undergoes subtype-specific evolutionary changes. This study sheds light on the interplay between regulatory evolution and cell-type–dependent gene-expression programs, and provides a resource for further exploration of human brain evolution and function.


Analysis of putative cis-regulatory elements regulating blood pressure variation

10.1101/820522 · 2019

Author(s): Priyanka Nandakumar, Dongwon Lee, Thomas J. Hoffmann, Georg B. Ehret, Dan Arking, ...

Keyword(s): Gene Expression; Blood Pressure; Cell Types; Regulatory Elements; Open Chromatin; Genome Wide Association Studies; Cell Type; Functional Scores; Cell Type Specific; Different Tissues

Abstract

Hundreds of loci have been associated with blood pressure traits in many genome-wide association studies. In our previous work in ∼100,000 Genetic Epidemiology Research on Aging (GERA) study participants, we identified an enrichment of these loci in aorta and tibial artery expression quantitative trait loci. In the present study, we focused on determining putative regulatory regions for these and other tissues relevant to blood pressure, both to fine-map these loci by pinpointing genes and variants of functional interest within them and to identify any novel genes. We constructed maps of putative cis-regulatory elements using publicly available open chromatin data for the heart, aorta and tibial arteries, and multiple kidney cell types. Sequence variants within these regions may be evaluated quantitatively for their tissue- or cell-type-specific regulatory impact using deltaSVM functional scores, as described in our previous work. To identify genes of interest, we aggregate these variants in putative cis-regulatory elements within 50 kb of the start or end of genes considered “expressed” in these tissues or cell types according to publicly available gene expression data, and we use the deltaSVM scores as weights in the well-known group-wise sequence kernel association test (SKAT). We test for association with both blood pressure traits and expression within these tissues or cell types of interest, and identify several genes, including MTHFR, C10orf32, CSK, NOV, ULK4, SDCCAG8, SCAMP5, RPP25, HDGFRP3, VPS37B, and PPCDC. Although our study centers on blood pressure traits, we additionally examined, as a positive control, two known genes involved in the cardiac QT interval trait, SCN5A and NOS1AP, in the Atherosclerosis Risk in Communities Study (ARIC), and observed the expected heart-specific effect. Thus, our method may be used to identify variants and genes for further functional testing using tissue- or cell-type-specific putative regulatory information.

Author Summary

Sequence changes in genes (“variants”) are linked to the presence and severity of different traits or diseases. However, genes may be expressed in different tissues, at different times, and to different degrees, so using this information is expected to identify genes of interest more accurately. Variants within the genes themselves are important, but so are variants in the sequences (“regulatory elements”) that control the genes’ expression in different tissues or cell types. In this study, we use information about expression and about variants potentially involved in regulating gene expression to better pinpoint genes, and variants in regulatory elements, of interest for blood pressure regulation. To do so, we take advantage of publicly available data and use methods that combine information about variants in aggregate within a gene’s putative regulatory elements in tissues thought to be relevant to blood pressure, and we identify several genes intended to enable experimental follow-up.


DNA Methylation Atlas of the Mouse Brain at Single-Cell Resolution

10.1101/2020.04.30.069377 · 2020 · Cited by ~1

Author(s): Hanqing Liu, Jingtian Zhou, Wei Tian, Chongyuan Luo, Anna Bartlett, ...

Keyword(s): DNA Methylation; Mouse Brain; Spatial Organization; Brain Area; Cell Types; Regulatory Elements; Mammalian Brain; Open Chromatin; Cell Type; Single Nucleus

Summary

Mammalian brain cells are remarkably diverse in gene expression, anatomy, and function, yet the regulatory DNA landscape underlying this extensive heterogeneity is poorly understood. We carried out a comprehensive assessment of the epigenomes of mouse brain cell types by applying single nucleus DNA methylation sequencing to profile 110,294 nuclei from 45 regions of the mouse cortex, hippocampus, striatum, pallidum, and olfactory areas. We identified 161 cell clusters with distinct spatial locations and projection targets. We constructed taxonomies of these epigenetic types, annotated with signature genes, regulatory elements, and transcription factors. These features indicate the potential regulatory landscape supporting the assignment of putative cell types, and reveal repetitive usage of regulators in excitatory and inhibitory cells for determining subtypes. The DNA methylation landscape of excitatory neurons in the cortex and hippocampus varied continuously along spatial gradients. Using this deep dataset, we constructed an artificial neural network model that precisely predicts single neuron cell-type identity and brain area spatial location. Integration of high-resolution DNA methylomes with single-nucleus chromatin accessibility data allowed prediction of high-confidence enhancer-gene interactions for all identified cell types, which were subsequently validated by cell-type-specific chromatin conformation capture experiments. By combining multi-omic datasets (DNA methylation, chromatin contacts, and open chromatin) from single nuclei and annotating the regulatory genome of hundreds of cell types in the mouse brain, our DNA methylation atlas establishes the epigenetic basis for neuronal diversity and spatial organization throughout the mouse brain.
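Single-nucleus methylation data are sparse per nucleus, so a common approach is to pool counts across the nuclei of a cluster before comparing clusters. The sketch below assumes that pooling and flags a region as hypomethylated in one cluster when its methylation fraction is lower than every other cluster's by a hypothetical `margin`. It illustrates how low methylation can mark candidate cell-type-specific regulatory elements; it is not the atlas's actual pipeline, and the cluster names, counts, and function names are made up.

```python
import math
from typing import Dict, List, Tuple

Counts = List[Tuple[int, int]]  # (methylated calls, covered calls) per nucleus

def pooled_fraction(counts: Counts) -> float:
    """Pool one region's counts across a cluster's nuclei into a methylation fraction."""
    mc = sum(m for m, _ in counts)
    cov = sum(c for _, c in counts)
    return mc / cov if cov else math.nan

def hypomethylated_in(cluster_counts: Dict[str, Counts], target: str,
                      margin: float = 0.3) -> bool:
    """True if `target`'s pooled fraction is at least `margin` (hypothetical) below
    every other cluster's; an uncovered (NaN) cluster never passes the comparison."""
    levels = {cl: pooled_fraction(c) for cl, c in cluster_counts.items()}
    others = [v for cl, v in levels.items() if cl != target]
    if not others:
        return False  # nothing to compare against
    return all(levels[target] <= v - margin for v in others)


# Made-up counts for one region in three hypothetical clusters.
region = {
    "cluster_1": [(1, 10), (0, 8)],
    "cluster_2": [(9, 10), (7, 8)],
    "cluster_3": [(8, 9), (6, 7)],
}
print({cl: hypomethylated_in(region, cl) for cl in region})
# {'cluster_1': True, 'cluster_2': False, 'cluster_3': False}
```

Pooling trades single-nucleus resolution for coverage: each nucleus here contributes under ten calls, but the cluster totals give stable fractions (1/18 versus 16/18 and 14/16).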
