Identifying branch-specific positive selection throughout the regulatory genome using an appropriate neutral proxy

10.1101/722884 ◽

2019 ◽

Cited By ~ 1

Author(s):

Alejandro Berrio ◽

Ralph Haygood ◽

Gregory A Wray

Keyword(s):

Positive Selection ◽

Regulatory Elements ◽

Directional Selection ◽

Neutral Evolution ◽

Sequence Length ◽

Open Chromatin ◽

Sequence Alignments ◽

Noncoding Dna ◽

Region Length ◽

The Impact

Abstract Adaptive changes in cis-regulatory elements are an essential component of evolution by natural selection. Identifying adaptive and functional noncoding DNA elements throughout the genome is therefore crucial for understanding the relationship between phenotype and genotype. Here, we introduce a method we call adaptiPhy, which adds significant improvements to our earlier method that tests for branch-specific directional selection in noncoding sequences. The motivation for these improvements is to provide a more sensitive and better targeted characterization of directional selection and neutral evolution across the genome. We use ENCODE annotations to identify appropriate proxy neutral sequences and demonstrate that the conservativeness of the test can be modulated during the filtration of reference alignments. We apply the method to noncoding Human Accelerated Elements as well as open chromatin elements previously identified in 125 human tissues and cell lines to demonstrate its utility. We also simulate sequence alignments under different classes of evolution in order to validate the ability of adaptiPhy to distinguish positive selection from relaxation of constraint and neutral evolution. Finally, we evaluate the impact of query region length, proxy neutral sequence length, and branch count on test sensitivity.

Addiction-associated genetic variants implicate brain cell type- and region-specific cis-regulatory elements in addiction neurobiology

10.1101/2020.09.29.318329 ◽

2020 ◽

Cited By ~ 1

Author(s):

Chaitanya Srinivasan ◽

BaDoi N. Phan ◽

Alyssa J. Lawler ◽

Easwaran Ramamurthy ◽

Michael Kleyman ◽

...

Keyword(s):

Genetic Variants ◽

Cell Types ◽

Regulatory Elements ◽

Brain Regions ◽

Open Chromatin ◽

Genome Wide Association Studies ◽

Cell Type ◽

Coding Regions ◽

Cell Type Specific ◽

The Impact

ABSTRACT Recent large genome-wide association studies (GWAS) have identified multiple confident risk loci linked to addiction-associated behavioral traits. Genetic variants linked to addiction-associated traits lie largely in non-coding regions of the genome, likely disrupting cis-regulatory element (CRE) function. CREs tend to be highly cell type-specific and may contribute to the functional development of the neural circuits underlying addiction. Yet, a systematic approach for predicting the impact of risk variants on the CREs of specific cell populations is lacking. To dissect the cell types and brain regions underlying addiction-associated traits, we applied LD score regression to compare GWAS to genomic regions collected from human and mouse assays for open chromatin, which is associated with CRE activity. We found enrichment of addiction-associated variants in putative regulatory elements marked by open chromatin in neuronal (NeuN+) nuclei collected from multiple prefrontal cortical areas and striatal regions known to play major roles in reward and addiction. To further dissect the cell type-specific basis of addiction-associated traits, we also identified enrichments in human orthologs of open chromatin regions of mouse neuron subtypes: cortical excitatory, PV, D1, and D2. Lastly, we developed machine learning models from mouse cell type-specific regions of open chromatin to further dissect human NeuN+ open chromatin regions into cortical excitatory or striatal D1 and D2 neurons and predict the functional impact of addiction-associated genetic variants.
Our results suggest that different neuron subtypes within the reward system play distinct roles in the variety of traits that contribute to addiction. Significance Statement Our study on cell types and brain regions contributing to heritability of addiction-associated traits suggests that the conserved non-coding regions within cortical excitatory and striatal medium spiny neurons contribute to genetic predisposition for nicotine, alcohol, and cannabis use behaviors. This computational framework can flexibly integrate epigenomic data across species to screen for putative causal variants in a cell type- and tissue-specific manner across numerous complex traits.

Cell-type specific open chromatin profiling in human postmortem brain infers functional roles for non-coding schizophrenia loci

10.1101/062513 ◽

2016 ◽

Author(s):

John F. Fullard ◽

Claudia Giambartolomei ◽

Mads E. Hauberg ◽

Ke Xu ◽

Christopher Bare ◽

...

Keyword(s):

Human Brain ◽

Regulatory Elements ◽

Postmortem Brain ◽

Open Chromatin ◽

Cell Type ◽

Postmortem Human Brain ◽

Functional Roles ◽

Risk Variants ◽

Cell Type Specific ◽

The Impact

SUMMARY To better understand the role of cis regulatory elements in neuropsychiatric disorders, we applied ATAC-seq to neuronal and non-neuronal nuclei isolated from frozen postmortem human brain. Most of the identified open chromatin regions (OCRs) are differentially accessible between neurons and non-neurons, and show enrichment with known cell type markers, promoters and enhancers. Relative to those of non-neurons, neuronal OCRs are more evolutionarily conserved and are enriched in distal regulatory elements. Our data reveal sex differences in chromatin accessibility and identify novel OCRs that escape X chromosome inactivation, with implications for intellectual disability. Transcription factor footprinting analysis identifies differences in the regulome between neuronal and non-neuronal cells and ascribes putative functional roles to 16 non-coding schizophrenia risk variants. These results represent the first analysis of cell-type-specific OCRs and TF binding sites in postmortem human brain and further our understanding of the regulome and the impact of neuropsychiatric disease-associated genetic risk variants.

Mixture Density Regression reveals frequent recent adaptation in the human genome

10.1101/2021.12.20.473463 ◽

2021 ◽

Author(s):

Diego F Salazar-Tortosa ◽

Yi-Fei Huang ◽

David Enard

Keyword(s):

Positive Selection ◽

Human Genome ◽

Immune Cells ◽

Gc Content ◽

Regulatory Elements ◽

Wide Distribution ◽

Neutral Evolution ◽

Gaussian Distributions ◽

Mixture Density ◽

Genomic Adaptation

The extent to which genome differences between species reflect neutral or adaptive evolution is a central question in evolutionary genomics. In humans and other mammals, the prevalence of adaptive versus neutral genomic evolution has proven particularly difficult to quantify. The difficulty notably stems from the highly heterogeneous organization of mammalian genomes at multiple levels (functional sequence density, recombination, etc.) that complicates the interpretation and distinction of adaptive vs. neutral evolution signals. Here, we introduce Mixture Density Regressions (MDRs) for the study of the determinants of recent adaptation in the human genome. MDRs provide a flexible regression model based on multiple Gaussian distributions. We use MDRs to model the association between recent selection signals and multiple genomic factors likely to affect positive selection, if the latter was common enough in the first place to generate these associations. We find that an MDR model with two Gaussian distributions provides an excellent fit to the genome-wide distribution of a common sweep summary statistic (iHS), with one of the two distributions likely capturing the positively selected component of the genome. We further find several factors associated with recent adaptation, including the recombination rate, the density of regulatory elements in immune cells and testis, GC-content, gene expression in immune cells, the density of mammal-wide conserved elements, and the distance to the nearest virus-interacting gene. These results support the view that strong positive selection was relatively common in recent human evolution and highlight MDRs as a powerful tool to make sense of signals of recent genomic adaptation.
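The two-Gaussian idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' MDR implementation: it fits only a plain two-component Gaussian mixture by expectation-maximisation and omits the regression of mixture parameters on genomic factors. The function name `fit_two_gaussians`, the quartile initialisation, and the iteration count are assumptions made for illustration.

```python
import numpy as np

def fit_two_gaussians(values, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative only)."""
    values = np.asarray(values, dtype=float)
    # Assumed initialisation: means at the lower and upper quartiles.
    mu = np.percentile(values, [25, 75])
    sigma = np.array([values.std(), values.std()])
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        z = (values[:, None] - mu) / sigma
        dens = weight * np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means and standard deviations.
        nk = resp.sum(axis=0)
        weight = nk / values.size
        mu = (resp * values[:, None]).sum(axis=0) / nk
        var = (resp * (values[:, None] - mu) ** 2).sum(axis=0) / nk
        sigma = np.maximum(np.sqrt(var), 1e-6)  # floor avoids a collapsed component
    return weight, mu, sigma
```

In a full mixture density regression the weights, means, or variances would themselves be functions of covariates (for example recombination rate); the sketch keeps them constant to show only the mixture step.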

Positive natural selection in primate genes of the type I interferon response

BMC Ecology and Evolution ◽

10.1186/s12862-021-01783-z ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Elena N. Judd ◽

Alison R. Gilchrist ◽

Nicholas R. Meyerson ◽

Sara L. Sawyer

Keyword(s):

Natural Selection ◽

Positive Selection ◽

Type I Interferon ◽

Interferon Response ◽

Type I ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Interferon Stimulated Genes ◽

Interferon Induction

Abstract Background The Type I interferon response is an important first-line defense against viruses. In turn, viruses antagonize (i.e., degrade, mis-localize, etc.) many proteins in interferon pathways. Thus, hosts and viruses are locked in an evolutionary arms race for dominance of the Type I interferon pathway. As a result, many genes in interferon pathways have experienced positive natural selection in favor of new allelic forms that can better recognize viruses or escape viral antagonists. Here, we performed a holistic analysis of selective pressures acting on genes in the Type I interferon family. We initially hypothesized that the genes responsible for inducing the production of interferon would be antagonized more heavily by viruses than genes that are turned on as a result of interferon. Our logic was that viruses would have greater effect if they worked upstream of the production of interferon molecules because, once interferon is produced, hundreds of interferon-stimulated proteins would activate and the virus would need to counteract them one-by-one. Results We curated multiple sequence alignments of primate orthologs for 131 genes active in interferon production and signaling (herein, “induction” genes), 100 interferon-stimulated genes, and 100 randomly chosen genes. We analyzed each multiple sequence alignment for the signatures of recurrent positive selection. Counter to our hypothesis, we found the interferon-stimulated genes, and not interferon induction genes, are evolving significantly more rapidly than a random set of genes. Interferon induction genes evolve in a way that is indistinguishable from a matched set of random genes (22% and 18% of genes bear signatures of positive selection, respectively). 
In contrast, interferon-stimulated genes evolve differently, with 33% of genes evolving under positive selection and containing a significantly higher fraction of codons that have experienced selection for recurrent replacement of the encoded amino acid. Conclusion Viruses may antagonize individual products of the interferon response more often than trying to neutralize the system altogether.

Predicting pathogenic non-coding SVs disrupting the 3D genome in 1646 whole cancer genomes using multiple instance learning

Scientific Reports ◽

10.1038/s41598-021-93917-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Marleen M. Nieboer ◽

Luan Nguyen ◽

Jeroen de Ridder

Keyword(s):

Multiple Instance Learning ◽

Cancer Diagnostics ◽

Common Mechanism ◽

Open Chromatin ◽

Driver Genes ◽

3D Genome ◽

Whole Genomes ◽

Cancer Genomes ◽

Cancer Types ◽

The Impact

Abstract Over the past years, large consortia have been established to fuel the sequencing of whole genomes of many cancer patients. Despite the increased abundance of tools to study the impact of SNVs, non-coding SVs have been largely ignored in these data. Here, we introduce svMIL2, an improved version of our Multiple Instance Learning-based method to study the effect of somatic non-coding SVs disrupting boundaries of TADs and CTCF loops in 1646 cancer genomes. We demonstrate that svMIL2 predicts pathogenic non-coding SVs with an average AUC of 0.86 across 12 cancer types, and identifies non-coding SVs affecting well-known driver genes. The disruption of active (super) enhancers in open chromatin regions appears to be a common mechanism by which non-coding SVs exert their pathogenicity. Finally, our results reveal that the contribution of pathogenic non-coding SVs as opposed to driver SNVs may vary greatly between cancers, with notably high numbers of genes being disrupted by pathogenic non-coding SVs in ovarian and pancreatic cancer. Taken together, our machine learning method offers a potent way to prioritize putatively pathogenic non-coding SVs and leverage non-coding SVs to identify driver genes. Moreover, our analysis of 1646 cancer genomes demonstrates the importance of including non-coding SVs in cancer diagnostics.

Aberrant Bcl-x splicing in cancer: from molecular mechanism to therapeutic modulation

Journal of Experimental & Clinical Cancer Research ◽

10.1186/s13046-021-02001-w ◽

2021 ◽

Vol 40 (1) ◽

Author(s):

Zhihui Dou ◽

Dapeng Zhao ◽

Xiaohua Chen ◽

Caipeng Xu ◽

Xiaodong Jin ◽

...

Keyword(s):

Cancer Cells ◽

Human Cancer ◽

Structural Characteristics ◽

Therapy Resistance ◽

Regulatory Elements ◽

Aberrant Splicing ◽

Clinical Role ◽

Splicing Isoforms ◽

The Impact ◽

Splicing Patterns

Abstract Bcl-x pre-mRNA splicing serves as a typical example to study the impact of alternative splicing in the modulation of cell death. Dysregulation of Bcl-x apoptotic isoforms caused by precarious equilibrium splicing is implicated in the genesis and development of multiple human diseases, especially cancers. Exploring the mechanism of Bcl-x splicing and regulation has provided insight into the development of drugs that could contribute to sensitivity of cancer cells to death. On this basis, we review the multiple splicing patterns and structural characteristics of Bcl-x. Additionally, we outline the cis-regulatory elements, trans-acting factors and epigenetic modifications involved in the splicing regulation of Bcl-x. Furthermore, this review highlights aberrant splicing of Bcl-x involved in apoptosis evasion, autophagy, metastasis, and therapy resistance of various cancer cells. Last, emphasis is given to the clinical role of targeting Bcl-x splicing correction in human cancer based on splice-switching oligonucleotides, small molecular modulators and BH3 mimetics. This review thus highlights the significance of aberrant splicing isoforms of Bcl-x as targets for cancer therapy.

Chromatin loop anchors contain core structural components of the gene expression machinery in maize

BMC Genomics ◽

10.1186/s12864-020-07324-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Stéphane Deschamps ◽

John A. Crow ◽

Nadia Chaidir ◽

Brooke Peterson-Burch ◽

Sunil Kumar ◽

...

Keyword(s):

Gene Expression ◽

High Resolution ◽

Transcriptional Activity ◽

Spatial Organization ◽

Regulatory Elements ◽

Open Chromatin ◽

Maize Genome ◽

Chromatin Loop ◽

Structural Components ◽

Chromatin Loops

Abstract Background Three-dimensional chromatin loop structures connect regulatory elements to their target genes in regions known as anchors. In complex plant genomes, such as maize, it has been proposed that loops span heterochromatic regions marked by higher repeat content, but little is known on their spatial organization and genome-wide occurrence in relation to transcriptional activity. Results Here, ultra-deep Hi-C sequencing of maize B73 leaf tissue was combined with gene expression and open chromatin sequencing for chromatin loop discovery and correlation with hierarchical topologically-associating domains (TADs) and transcriptional activity. A majority of all anchors are shared between multiple loops from previous public maize high-resolution interactome datasets, suggesting a highly dynamic environment, with a conserved set of anchors involved in multiple interaction networks. Chromatin loop interiors are marked by higher repeat contents than the anchors flanking them. A small fraction of high-resolution interaction anchors, fully embedded in larger chromatin loops, co-locate with active genes and putative protein-binding sites. Combinatorial analyses indicate that all anchors studied here co-locate with at least 81.5% of expressed genes and 74% of open chromatin regions. Approximately 38% of all Hi-C chromatin loops are fully embedded within hierarchical TAD-like domains, while the remaining ones share anchors with domain boundaries or with distinct domains. Those various loop types exhibit specific patterns of overlap for open chromatin regions and expressed genes, but no apparent pattern of gene expression. In addition, up to 63% of all unique variants derived from a prior public maize eQTL dataset overlap with Hi-C loop anchors. Anchor annotation suggests that < 7% of all loops detected here are potentially devoid of any genes or regulatory elements. 
The overall organization of chromatin loop anchors in the maize genome suggests a loop modeling system hypothesized to resemble phase separation of repeat-rich regions. Conclusions Sets of conserved chromatin loop anchors mapping to hierarchical domains contain core structural components of the gene expression machinery in maize. The data presented here will be a useful reference to further investigate their function in regard to the formation of transcriptional complexes and the regulation of transcriptional activity in the maize genome.

EPCO-31. EPIGENOMIC INTRATUMORAL HETEROGENEITY OF GLIOBLASTOMA IN THREE-DIMENSIONAL SPACE

Neuro-Oncology ◽

10.1093/neuonc/noaa215.310 ◽

2020 ◽

Vol 22 (Supplement_2) ◽

pp. ii76-ii76

Author(s):

Radhika Mathur ◽

Sriranga Iyyanki ◽

Stephanie Hilz ◽

Chibo Hong ◽

Joanna Phillips ◽

...

Keyword(s):

Spatial Organization ◽

Dimensional Space ◽

Three Dimensional ◽

Spatial Location ◽

Therapy Resistance ◽

Regulatory Elements ◽

Chromatin Accessibility ◽

Intratumoral Heterogeneity ◽

Tumor Evolution ◽

Open Chromatin

Abstract Treatment failure in glioblastoma is often attributed to intratumoral heterogeneity (ITH), which fosters tumor evolution and generation of therapy-resistant clones. While ITH in glioblastoma has been well-characterized at the genomic and transcriptomic levels, the extent of ITH at the epigenomic level and its biological and clinical significance are not well understood. In collaboration with neurosurgeons, neuropathologists, and biomedical imaging experts, we have established a novel topographical approach towards characterizing epigenomic ITH in three-dimensional (3-D) space. We utilize pre-operative MRI scans to define tumor volume and then utilize 3-D surgical neuro-navigation to intra-operatively acquire 10+ samples representing maximal anatomical diversity. The precise spatial location of each sample is mapped by 3-D coordinates, enabling tumors to be visualized in 360-degrees and providing unprecedented insight into their spatial organization and patterning. For each sample, we conduct assay for transposase-accessible chromatin using sequencing (ATAC-Seq), which provides information on the genomic locations of open chromatin, DNA-binding proteins, and individual nucleosomes at nucleotide resolution. We additionally conduct whole-exome sequencing and RNA sequencing for each spatially mapped sample. Integrative analysis of these datasets reveals distinct patterns of chromatin accessibility within glioblastoma tumors, as well as their associations with genetically defined clonal expansions. Our analysis further reveals how differences in chromatin accessibility within tumors reflect underlying transcription factor activity at gene regulatory elements, including both promoters and enhancers, and drive expression of particular gene expression sets, including neuronal and immune programs. 
Collectively, this work provides the most comprehensive characterization of epigenomic ITH to date, establishing its importance for driving tumor evolution and therapy resistance in glioblastoma. As a resource for further investigation, we have provided our datasets on an interactive data sharing platform – The 3D Glioma Atlas – that enables 360-degree visualization of both genomic and epigenomic ITH.

Comprehensive analysis of single cell ATAC-seq data with SnapATAC

Nature Communications ◽

10.1038/s41467-021-21583-9 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Rongxin Fang ◽

Sebastian Preissl ◽

Yang Li ◽

Xiaomeng Hou ◽

Jacinta Lucero ◽

...

Keyword(s):

Single Cell ◽

Single Cell Analysis ◽

Expression Patterns ◽

Regulatory Elements ◽

Cellular Heterogeneity ◽

Specific Gene ◽

Open Chromatin ◽

Cell Type ◽

Process Data ◽

Cell Type Specific

Abstract Identification of the cis-regulatory elements controlling cell-type specific gene expression patterns is essential for understanding the origin of cellular diversity. Conventional assays to map regulatory elements via open chromatin analysis of primary tissues are hindered by sample heterogeneity. Single cell analysis of accessible chromatin (scATAC-seq) can overcome this limitation. However, the high-level noise of each single cell profile and the large volume of data pose unique computational challenges. Here, we introduce SnapATAC, a software package for analyzing scATAC-seq datasets. SnapATAC dissects cellular heterogeneity in an unbiased manner and maps the trajectories of cellular states. Using the Nyström method, SnapATAC can process data from up to a million cells. Furthermore, SnapATAC incorporates existing tools into a comprehensive package for analyzing single cell ATAC-seq datasets. As a demonstration of its utility, SnapATAC is applied to 55,592 single-nucleus ATAC-seq profiles from the mouse secondary motor cortex. The analysis reveals ~370,000 candidate regulatory elements in 31 distinct cell populations in this brain region and infers candidate cell-type specific transcriptional regulators.
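The Nyström method mentioned in the abstract above approximates a large n-by-n similarity matrix from its columns at a small set of m landmark points, so the full matrix never needs to be computed. The sketch below shows the generic technique only; it is not SnapATAC's code, and the RBF kernel, the `gamma` value, and the function names are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) similarity between the rows of a and the rows of b."""
    sq_dist = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def nystrom_approx(points, landmark_idx, gamma=1.0):
    """Approximate the full kernel matrix as K_nm @ pinv(K_mm) @ K_nm.T."""
    landmarks = points[landmark_idx]
    k_nm = rbf_kernel(points, landmarks, gamma)     # n x m block: all points vs landmarks
    k_mm = rbf_kernel(landmarks, landmarks, gamma)  # m x m block: landmarks vs landmarks
    return k_nm @ np.linalg.pinv(k_mm) @ k_nm.T
```

Only the n-by-m and m-by-m blocks are evaluated, so the cost grows linearly in the number of cells for a fixed number of landmarks. Entries of the approximation that involve only landmark points match the exact kernel up to numerical error; other entries are approximations whose quality depends on how the landmarks are chosen.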

Using a GTR+Γ substitution model for dating sequence divergence when stationarity and time-reversibility assumptions are violated

Bioinformatics ◽

10.1093/bioinformatics/btaa820 ◽

2020 ◽

Vol 36 (Supplement_2) ◽

pp. i884-i894

Author(s):

Jose Barba-Montoya ◽

Qiqing Tao ◽

Sudhir Kumar

Keyword(s):

Divergence Time ◽

Sequence Divergence ◽

Molecular Dating ◽

Divergence Times ◽

Time Reversibility ◽

Sequence Alignments ◽

Divergence Time Estimates ◽

Time Estimates ◽

Substitution Process ◽

The Impact

Abstract Motivation As the number and diversity of species and genes grow in contemporary datasets, two common assumptions made in all molecular dating methods, namely the time-reversibility and stationarity of the substitution process, become untenable. No software tools for molecular dating allow researchers to relax these two assumptions in their data analyses. Frequently the same General Time Reversible (GTR) model across lineages along with a gamma (+Γ) distributed rates across sites is used in relaxed clock analyses, which assumes time-reversibility and stationarity of the substitution process. Many reports have quantified the impact of violations of these underlying assumptions on molecular phylogeny, but none have systematically analyzed their impact on divergence time estimates. Results We quantified the bias on time estimates that resulted from using the GTR + Γ model for the analysis of computer-simulated nucleotide sequence alignments that were evolved with non-stationary (NS) and non-reversible (NR) substitution models. We tested Bayesian and RelTime approaches that do not require a molecular clock for estimating divergence times. Divergence times obtained using a GTR + Γ model differed only slightly (∼3% on average) from the expected times for NR datasets, but the difference was larger for NS datasets (∼10% on average). The use of only a few calibrations reduced these biases considerably (∼5%). Confidence and credibility intervals from GTR + Γ analysis usually contained correct times. Therefore, the bias introduced by the use of the GTR + Γ model to analyze datasets, in which the time-reversibility and stationarity assumptions are violated, is likely not large and can be reduced by applying multiple calibrations. Availability and implementation All datasets are deposited in Figshare: https://doi.org/10.6084/m9.figshare.12594638.
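The two assumptions discussed in the abstract above can be made concrete with a small sketch of a GTR rate matrix. It builds Q from six exchangeability parameters and four base frequencies, after which stationarity (pi Q = 0) and time-reversibility (pi_i Q_ij = pi_j Q_ji) can be checked numerically. The function name, the parameter ordering, and the example values are assumptions for illustration; the gamma-distributed rate variation across sites (+Γ) is omitted.

```python
import numpy as np

def gtr_rate_matrix(exchange, base_freqs):
    """Build a normalised GTR rate matrix.

    exchange: six exchangeabilities in the order AC, AG, AT, CG, CT, GT.
    base_freqs: equilibrium frequencies in the order A, C, G, T.
    """
    pi = np.asarray(base_freqs, dtype=float)
    rates = np.zeros((4, 4))
    rates[np.triu_indices(4, k=1)] = exchange  # fill the upper triangle row by row
    rates = rates + rates.T                    # symmetric exchangeabilities
    q = rates * pi[None, :]                    # Q_ij = r_ij * pi_j for i != j
    np.fill_diagonal(q, -q.sum(axis=1))        # each row sums to zero
    q /= -(pi * np.diag(q)).sum()              # one expected substitution per unit time
    return q
```

Because the exchangeabilities are symmetric, pi_i Q_ij equals pi_j Q_ji for every pair of bases, which is exactly the reversibility condition. The non-reversible and non-stationary models used in the simulations relax this symmetry or let the frequencies change across lineages.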