A novel k-mer set memory (KSM) motif representation improves regulatory variant prediction

2017
Author(s):
Yuchun Guo
Kevin Tian
Haoyang Zeng
Xiaoyun Guo
David Kenneth Gifford

ABSTRACT The representation and discovery of transcription factor (TF) sequence binding specificities are critical for understanding gene regulatory networks and for interpreting the impact of disease-associated non-coding genetic variants. We present a novel TF binding motif representation, the K-mer Set Memory (KSM), which consists of a set of aligned k-mers that are over-represented at TF binding sites, and a new method called KMAC for de novo discovery of KSMs. We find that KSMs predict in vivo binding sites more accurately than position weight matrix models (PWMs) and other more complex motif models across a large set of ChIP-seq experiments. KMAC also identifies correct motifs in more experiments than four state-of-the-art motif discovery methods. In addition, KSM-derived features outperform both PWM-derived and deep-learning-model-derived sequence features in predicting the differential regulatory activities of expression quantitative trait loci (eQTL) alleles. Finally, we have applied KMAC to 1488 ENCODE TF ChIP-seq datasets and created a public resource of KSM and PWM motifs. We expect that the KSM representation and the KMAC method will be valuable for characterizing TF binding specificities and for interpreting the effects of non-coding genetic variants.
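The abstract describes a KSM only as a set of aligned k-mers that are over-represented at TF binding sites. The sketch below illustrates that core idea under simplifying assumptions. It keeps k-mers whose per-sequence hit rate in bound sequences exceeds a background set by a fixed ratio, and it scores a new sequence by counting positions whose k-mer is in the set. It is not KMAC: it skips k-mer alignment, reverse complements and any significance test. The function names, the `min_ratio`/`min_count` thresholds and the pseudocount are hypothetical choices.

```python
from collections import Counter


def kmers(seq: str, k: int):
    """Yield every length-k substring of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def over_represented_kmers(pos, neg, k, min_ratio=2.0, min_count=2):
    """Hypothetical selection rule (not KMAC): keep k-mers whose per-sequence
    hit rate in `pos` is at least `min_ratio` times that in `neg`.
    A pseudocount of 1 keeps the background rate above zero."""
    def hit_counts(seqs):
        counts = Counter()
        for s in seqs:
            counts.update(set(kmers(s, k)))  # count each k-mer once per sequence
        return counts

    pos_hits, neg_hits = hit_counts(pos), hit_counts(neg)
    selected = set()
    for km, n in pos_hits.items():
        pos_rate = n / len(pos)
        neg_rate = (neg_hits[km] + 1) / (len(neg) + 1)
        if n >= min_count and pos_rate / neg_rate >= min_ratio:
            selected.add(km)
    return selected


def ksm_like_score(seq, kmer_set, k):
    """Count positions in seq whose k-mer belongs to the selected set."""
    return sum(1 for km in kmers(seq, k) if km in kmer_set)
```

For example, with three short "bound" sequences that all contain `ACGTG` and a background set that lacks it, `over_represented_kmers(pos, neg, k=5)` retains `ACGTG`, and `ksm_like_score` then counts how many windows of a query sequence hit the retained set. A set of exact k-mers, rather than a probability matrix, is what lets this style of model capture dependencies between positions; a real implementation would also align the k-mers and handle both strands.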

2021
Author(s):
Meghana Kshirsagar
Han Yuan
Juan Lavista Ferres
Christina Leslie

Abstract Determining the cell type-specific and genome-wide binding locations of transcription factors (TFs) is an important step towards decoding gene regulatory programs. Profiling by the assay for transposase-accessible chromatin using sequencing (ATAC-seq) reveals open chromatin sites that are potential TF binding sites but does not identify which TFs occupy a given site. We present a novel unsupervised deep learning approach called BindVAE, based on Dirichlet variational autoencoders, for jointly decoding multiple TF binding signals from open chromatin regions. Our approach automatically learns distinct groups of k-mer patterns that correspond to cell type-specific in vivo binding signals. Latent factors found by BindVAE generally map to TFs that are expressed in the input cell type. BindVAE finds different TF binding sites in different cell types and can learn composite patterns for TFs involved in cooperative binding. BindVAE therefore provides a novel unsupervised approach to deconvolving the complex TF binding signals in accessible chromatin sites.


2020
Vol 22 (Supplement_3)
pp. iii316-iii316
Author(s):
Tatsuya Ozawa
Syuzo Kaneko
Mutsumi Takadera
Eric Holland
Ryuji Hamamoto
...

Abstract A majority of supratentorial ependymomas are associated with a recurrent C11orf95-RELA fusion (RELAFUS). The presence of RELA as one component of RELAFUS suggests that NF-κB activity is involved in ependymoma formation and may therefore be a viable therapeutic target in these tumors. However, the oncogenic role of the other component, C11orf95, in tumorigenesis has not yet been determined. In this study, to clarify the molecular mechanism underlying RELAFUS-driven tumorigenesis, we performed RELAFUS ChIP-seq analysis in cultured cells expressing the RELAFUS protein. Genomic profiling of RELAFUS binding sites pinpointed the transcriptional target genes directly regulated by RELAFUS. De novo motif discovery analysis then identified a unique RELAFUS DNA binding motif that differs from the canonical NF-κB motif. A reporter assay confirmed significant responsiveness of RELAFUS, but not RELA, to this motif. An N-terminal portion of C11orf95 was sufficient for nuclear localization and for recognition of the unique motif. Interestingly, RELAFUS peaks coinciding with the unique motif were found around the transcription start sites of the RELAFUS target genes, as previously reported. These observations suggest that C11orf95 may serve as a key determinant of RELAFUS DNA binding sites, thereby inducing the aberrant gene expression necessary for ependymoma formation. Our results should provide insights for the development of new ependymoma therapies.


2001
Vol 21 (23)
pp. 8117-8128
Author(s):
Simona Grossi
Alessandro Bianchi
Pascal Damay
David Shore

ABSTRACT Rap1p, the major telomere repeat binding protein in yeast, has been implicated in both de novo telomere formation and telomere length regulation. To characterize the role of Rap1p in these processes in more detail, we studied the generation of telomeres in vivo from linear DNA substrates containing defined arrays of Rap1p binding sites. Consistent with previous work, our results indicate that synthetic Rap1p binding sites within the internal half of a telomeric array are recognized as an integral part of the telomere complex in an orientation-independent manner that is largely insensitive to the precise spacing between adjacent sites. By extending the lengths of these constructs, we found that several different Rap1p site arrays were never located at the very distal end of a telomere, even when correctly oriented. Instead, these synthetic arrays were always followed by a short (≈100-bp) “cap” of genuine TG repeat sequence, indicating a remarkably strict sequence requirement for end-specific function(s) of the telomere. Despite this, even misoriented Rap1p site arrays promote telomere formation when they are placed at the distal end of a telomere-healing substrate, provided that at least one correctly oriented site is present within the array. Surprisingly, these heterogeneous arrays of Rap1p binding sites generate telomeres through a RAD52-dependent fusion resolution reaction that results in an inversion of the original array. Our results provide new insights into the nature of telomere end capping and reveal one way in which recombination can resolve a defect in this process.


Blood
2005
Vol 106 (6)
pp. 1938-1947
Author(s):
Tomohiko Tamura
Pratima Thotakura
Tetsuya S. Tanaka
Minoru S. H. Ko
Keiko Ozato

Abstract Interferon regulatory factor-8 (IRF-8)/interferon consensus sequence–binding protein (ICSBP) is a transcription factor that controls myeloid-cell development. Microarray gene expression analysis of Irf-8-/- myeloid progenitor cells expressing an IRF-8/estrogen receptor chimera (which differentiate into macrophages after addition of estradiol) was used to identify 69 genes altered by IRF-8 during early differentiation (62 up-regulated and 7 down-regulated). Among them, 4 lysosomal/endosomal enzyme-related genes (cystatin C, cathepsin C, lysozyme, and prosaposin) did not require de novo protein synthesis for induction, suggesting that they are direct targets of IRF-8. We developed a reporter assay system employing a self-inactivating retrovirus and analyzed the cystatin C and cathepsin C promoters. We found that a unique cis element mediates IRF-8–induced activation of both promoters. Similar elements were also found in other IRF-8 target genes, with a consensus sequence (GAAANN[N]GGAA) comprising a core IRF-binding motif and an Ets-binding motif; this sequence is similar to, but distinct from, the previously reported Ets/IRF composite element. Chromatin immunoprecipitation assays demonstrated that IRF-8 and the PU.1 Ets transcription factor bind to this element in vivo. Collectively, these data indicate that IRF-8 stimulates transcription of target genes through a novel cis element to specify macrophage differentiation.
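The consensus GAAANN[N]GGAA lends itself to a simple pattern scan. The sketch below reads "[N]" as an optional third spacer base (an assumption about the notation) and searches both strands of a DNA string with Python's `re` module. It only finds exact consensus matches; it says nothing about whether IRF-8 or PU.1 actually binds a hit, and the names `ELEMENT` and `find_elements` are hypothetical.

```python
import re

# GAAA, then two or three arbitrary bases, then GGAA.
# Treating "[N]" as an optional spacer base is an assumption.
# The lookahead lets overlapping hits be reported; at each start position
# the greedy {2,3} quantifier returns at most one match.
ELEMENT = re.compile(r"(?=(GAAA[ACGT]{2,3}GGAA))")

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_elements(seq: str):
    """Return (strand, start, match) tuples for consensus hits on either strand.
    Starts are 0-based in the coordinates of the strand being scanned."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in ELEMENT.finditer(s):
            hits.append((strand, m.start(), m.group(1)))
    return hits
```

For instance, `find_elements("ttGAAACTGGAAcc")` reports a single forward-strand hit starting at position 2, while the reverse complement of that element is picked up on the "-" strand.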


2018
Author(s):
Doris Bachtrog
Chris Ellison

The repeatability or predictability of evolution is a central question in evolutionary biology and is most often addressed in experimental evolution studies. Here, we infer how genetically heterogeneous natural systems acquire the same molecular changes, to address how genomic background affects adaptation in natural populations. In particular, we take advantage of independently formed neo-sex chromosomes in Drosophila species that have evolved dosage compensation by co-opting the dosage compensation (MSL) complex, to study the mutational paths that have led to the acquisition of hundreds of novel binding sites for the MSL complex in different species. This complex recognizes a conserved 21-bp GA-rich sequence motif that is enriched on the X chromosome, and newly formed X chromosomes recruit the MSL complex through de novo acquisition of this binding motif. We identify recently formed sex chromosomes in the Drosophila repleta and robusta species groups by genome sequencing, and generate genomic occupancy maps of the MSL complex to infer the location of novel binding sites. We find that diverse mutational paths were used in each species to evolve hundreds of de novo binding motifs along the neo-X, including expansions of microsatellites and transposable element insertions. However, the propensity to use a particular mutational path differs between independently formed X chromosomes and appears to be contingent on genomic properties of that species, such as simple repeat or transposable element density. This establishes the “genomic environment” as an important determinant in predicting the outcome of evolutionary adaptations.


2003
Vol 23 (7)
pp. 2379-2394
Author(s):
Hisashi Tamaru
Eric U. Selker

ABSTRACT Most 5-methylcytosine in Neurospora crassa occurs in A:T-rich sequences high in TpA dinucleotides, hallmarks of repeat-induced point mutation. To investigate how such sequences induce methylation, we developed a sensitive in vivo system. Tests of various 25- to 100-bp synthetic DNA sequences revealed that both T and A residues were required on a given strand to induce appreciable methylation. Segments composed of (TAAA)n or (TTAA)n were the most potent signals; 25-mers induced robust methylation at the special test site, and a 75-mer induced methylation elsewhere. G:C base pairs inhibited methylation, and cytosines 5′ of ApT dinucleotides were particularly inhibitory. Weak signals could be strengthened by extending their lengths. A:T tracts as short as two were found to cooperate to induce methylation. Distamycin, which, like the AT-hook DNA binding motif found in proteins such as mammalian HMG-I, binds to the minor groove of A:T-rich sequences, suppressed DNA methylation and gene silencing. We also found a correlation between the strength of methylation signals and their binding to an AT-hook protein (HMG-I) and to activities in a Neurospora extract. We propose that de novo DNA methylation in Neurospora cells is triggered by cooperative recognition of the minor groove of multiple short A:T tracts. Similarities between sequences subjected to repeat-induced point mutation in Neurospora crassa and A:T-rich repeated sequences in heterochromatin in other organisms suggest that related mechanisms control silent chromatin in fungi, plants, and animals.
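As a rough illustration of the sequence features this abstract highlights, the sketch below lists maximal A:T tracts and counts TpA dinucleotides in a DNA string. It is a hypothetical descriptive helper: the names `at_tracts` and `tpa_count` and the `min_len` threshold are illustrative assumptions. It is not the authors' in vivo assay, and it does not predict whether a sequence will be methylated.

```python
import re


def at_tracts(seq: str, min_len: int = 2):
    """Return half-open (start, end) spans of maximal runs of A/T bases
    that are at least min_len long (min_len is an illustrative threshold)."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[AT]+", seq.upper())
            if m.end() - m.start() >= min_len]


def tpa_count(seq: str) -> int:
    """Count 5'-TA-3' (TpA) dinucleotides on the given strand; overlaps allowed."""
    s = seq.upper()
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "TA")
```

Applied to a (TAAA)n-style segment such as `TAAATAAAGC`, this reports one 8-bp A:T tract containing two TpA steps, whereas a G:C-interrupted sequence breaks into several short tracts.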


2008
Vol 9 (S7)
Author(s):
Victor Jin
Alina Rabinovich
Henny O'Geen
Sushma Iyengar
Peggy Farnham

2018
Vol 15 (138)
pp. 20170809
Author(s):
Zhipeng Wang
Davit A. Potoyan
Peter G. Wolynes

Gene regulatory networks must relay information from extracellular signals to downstream genes in an efficient, timely and coherent manner. Many complex functional tasks, such as the immune response, require system-wide broadcasting of information not to one but to many genes carrying out distinct functions whose dynamical binding and unbinding characteristics are widely distributed. In such broadcasting networks, the intended target sites are also often dwarfed in number by the even more numerous non-functional binding sites. Taking the genetic regulatory network of NF-κB as an exemplary system, we explore the impact of having numerous distributed sites on the stochastic dynamics of oscillatory broadcasting genetic networks, pointing out how resonances in binding cycles control the network's specificity and performance. We also show that active kinetic regulation of binding and unbinding through molecular stripping of DNA-bound transcription factors can lead to higher coherence of gene co-expression and synchronous clearance.


Cells
2020
Vol 9 (12)
pp. 2690
Author(s):
Mónica Fernández-Cortés
Eduardo Andrés-León
Francisco Javier Oliver

In highly metastatic tumors, vasculogenic mimicry (VM) involves the acquisition by tumor cells of endothelial-like traits. Poly-(ADP-ribose) polymerase (PARP) inhibitors are currently used against tumors displaying BRCA1/2-dependent deficient homologous recombination, and they may have antimetastatic activity. Long non-coding RNAs (lncRNAs) are emerging as key species-specific regulators of cellular and disease processes. To evaluate the impact of olaparib treatment in the context of non-coding RNA, we analyzed lncRNA expression after performing unbiased whole-transcriptome profiling of human uveal melanoma cells cultured to form VM. RNA-seq revealed that the non-coding transcriptomic landscape differed between olaparib-treated and untreated cells: olaparib significantly modulated the expression of 20 lncRNAs (11 upregulated and 9 downregulated). We analyzed the data with several bioinformatics tools and public databases. We found that copy-number variation alterations in some olaparib-modulated lncRNAs had a statistically significant correlation with alterations in some key tumor suppressor genes. Furthermore, the lncRNAs modulated by olaparib appeared to be regulated by common transcription factors: ETS1 had high-score binding sites in the promoters of all olaparib-upregulated lncRNAs, while MZF1, RHOXF1 and NR2C2 had high-score binding sites in the promoters of all olaparib-downregulated lncRNAs. Finally, we predicted that olaparib-modulated lncRNAs could further regulate several transcription factors and their downstream target genes in melanoma, suggesting that olaparib may trigger a major shift in gene expression mediated by lncRNA regulation. Overall, olaparib changed the lncRNA expression landscape during VM, affecting angiogenesis-related genes.


Blood
2012
Vol 120 (21)
pp. 650-650
Author(s):
Cailin Collins
Jingya Wang
Joel Bronstein
Jay L. Hess

Abstract 650 HOXA9 is a homeodomain-containing transcription factor that plays important roles in both development and hematopoiesis. Deregulation of HOXA9 occurs in a variety of acute lymphoid and myeloid leukemias and plays a key role in their pathogenesis. More than 50% of acute myeloid leukemia (AML) cases show up-regulation of HOXA9, which correlates strongly with poor prognosis. HOXA9 expression is increased in nearly all cases of AML with mixed lineage leukemia (MLL) translocations, as well as in cases with mutation of the nucleophosmin gene NPM1, overexpression of CDX2, or fusions of NUP98. Despite the crucial role that HOXA9 plays in development, hematopoiesis and leukemia, its transcriptional targets and mechanisms of action are poorly understood. Previously, we identified Hoxa9 and Meis1 binding sites in myeloblastic cells, profiled their epigenetic modifications, and identified the target genes regulated by Hoxa9. Hoxa9 and Meis1 co-bind at hundreds of promoter-distal, highly evolutionarily conserved sites showing high levels of histone H3K4 monomethylation and CBP/p300 binding, characteristic of enhancers. Hoxa9 association at these sites correlates strongly with increases in histone H3K27 acetylation and activation of downstream target genes, including many proleukemic gene loci. De novo motif analysis of Hoxa9 binding sites shows a marked enrichment of motifs for transcription factors in the C/EBP and ETS families, and C/ebpα and the ETS transcription factor Pu.1 were found to co-bind at Hoxa9-regulated enhancers. Both C/ebpα and Pu.1 are known to play critical roles in the establishment of functional enhancers during normal myeloid development and are mutated or otherwise deregulated in various myeloid leukemias. To determine the importance of the co-association of Hoxa9, C/ebpα and Pu.1 at myeloid enhancers, we generated cell lines from C/ebpα and Pu.1 conditional knockout mice (kindly provided by Dr. Daniel Tenen, Harvard University) by immortalization with Hoxa9 and Meis1. In addition, we transformed bone marrow with a tamoxifen-regulated form of Hoxa9. Strikingly, loss of C/ebpα or Pu.1, or inactivation of Hoxa9, blocks proliferation and leads to myeloid differentiation. ChIP experiments show that both C/ebpα and Pu.1 remain bound to Hoxa9 binding sites in the absence of Hoxa9. After the loss of Pu.1, both Hoxa9 and C/ebpα dissociate from Hoxa9 binding sites, with a corresponding decrease in target gene expression. In contrast, loss of C/ebpα does not lead to an immediate decrease in either Hoxa9 or Pu.1 binding, suggesting that C/ebpα may play a regulatory rather than a scaffolding role at enhancers. Current work focuses on ChIP-seq analysis to assess how C/ebpα and Pu.1 affect Hoxa9 and Meis1 binding and epigenetic modifications genome-wide, and on in vivo leukemogenesis assays to confirm the requirement for both Pu.1 and C/ebpα in the establishment and maintenance of leukemias with high levels of Hoxa9. Collectively, our findings implicate C/ebpα and Pu.1 as members of a critical transcription factor network required for Hoxa9-mediated transcriptional regulation in leukemia. Disclosures: No relevant conflicts of interest to declare.

