Exploiting XGBoost For Predicting Enhancer-Promoter Interactions.

2020 ◽  
Vol 15 ◽  
Author(s):  
Xiaojuan Yu ◽  
Jianguo Zhou ◽  
Mingming Zhao ◽  
Chao Yi ◽  
Qing Duan ◽  
...  

It is well known that gene expression and disease control are co-regulated by the interaction between the distal enhancer and the proximal promoter, and the study of enhancer-promoter interactions (EPIs) can help us gain insight into the genetic basis of diseases. Although the recent emergence of high-throughput sequencing methods has given us a deeper understanding of EPIs, accurate prediction of EPIs still has some limitations. In this paper, we trained an XGBoost-based model and introduced two sets of features (i.e., epigenomic and sequence features) to predict the interactions between the enhancer and the promoter in different cell lines. We compared XGBoost with four other methods. Extensive experimental results show that the XGBoost-based method is effective in predicting EPIs across three cell lines. In particular, epigenomic and sequence features can boost prediction.

2007 ◽  
Vol 19 (1) ◽  
pp. 227 ◽  
Author(s):  
J. A. Dahl ◽  
C. K. Taranger ◽  
P. Collas

Interactions between proteins and DNA are essential for cellular functions such as genomic stability, DNA replication and repair, chromosome segregation, transcription, and epigenetic silencing of gene expression. Chromatin immunoprecipitation (ChIP) is a key technique for mapping histone modifications and transcription factor binding on DNA and thereby unraveling the role of epigenetics in the regulation of gene expression. Current ChIP protocols require extensive sample handling and large numbers of cells (5-10 million), primarily owing to ample loss of material during the procedure. We altered critical steps of conventional ChIP to develop a quick and quantitative (Q2) ChIP assay suitable for cell numbers 100- to 1000-fold lower than those required for conventional ChIP. Key modifications of the ChIP procedure include (i) formaldehyde DNA–protein cross-linking in suspended cells, (ii) cross-linking in the presence of 20 mM sodium butyrate to enhance specificity of precipitation of acetylated histones, (iii) transfer of washed precipitated immune complexes to a clean tube ('tube shift') to increase ChIP specificity by virtually eliminating nonspecifically bound chromatin, and (iv) combination of cross-link reversal, protein digestion, and DNA elution into a single 2-h step. We used Q2ChIP to monitor changes in 6 histone H3 modifications on the human developmentally regulated genes OCT4 (POU5F1), NANOG, and LMNA (lamin A) in the context of retinoic acid (RA)-mediated differentiation of embryonal carcinoma cells and upon reprogramming of kidney epithelial 293T cells to pluripotency in carcinoma cell extract (Taranger et al. 2005 Mol. Biol. Cell 16, 5719–5735). Real-time PCR analysis of precipitated DNA unravels an unexpected two-step heterochromatin assembly elicited by RA on the OCT4 proximal promoter, proximal enhancer, and distal enhancer, and on the NANOG promoter, whereby methylation of H3K9 and H3K27 is followed by H3K9 deacetylation. H3K4 di- and trimethylation remain relatively unaffected by RA treatment. In contrast, reprogramming of 293T cells in carcinoma extract promotes assembly of histone marks characteristic of transcriptional induction of OCT4 and NANOG, such as acetylation and demethylation of H3K9. The results argue toward ordered chromatin repackaging at developmentally regulated promoters upon differentiation or, conversely, nuclear reprogramming to pluripotency.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Wei Song ◽  
Roded Sharan ◽  
Ivan Ovcharenko

Abstract Background: Robustness and evolutionary stability of gene expression in the human genome are established by an array of redundant enhancers. Results: Using Hi-C data in multiple cell lines, we report a comprehensive map of promoters and active enhancers connected by chromatin contacts, spanning 9000 enhancer chains in 4 human cell lines associated with 2600 human genes. We find that the first enhancer in a chain that directly contacts the target promoter is commonly located at a greater genomic distance from the promoter than the second enhancer in a chain, 96 kb vs. 45 kb, respectively. The first enhancer also features higher similarity to the promoter in terms of tissue specificity and higher enrichment of loop factors, suggestive of a stable primary contact with the promoter. In contrast, a chain of enhancers which connects to the target promoter through a neutral DNA segment instead of an enhancer is associated with a significant decrease in target gene expression, suggesting an important role of the first enhancer in initiating transcription using the target promoter and bridging the promoter with other regulatory elements in the locus. Conclusions: The widespread chained structure of gene enhancers in humans reveals that the primary, critical enhancer is distal, commonly located further away than other enhancers. This first, distal enhancer establishes contacts with multiple regulatory elements and safeguards a complex regulatory program of its target gene.


Blood ◽  
2009 ◽  
Vol 113 (10) ◽  
pp. 2145-2153 ◽  
Author(s):  
Christine M. Hartford ◽  
Shiwei Duan ◽  
Shannon M. Delaney ◽  
Shuangli Mi ◽  
Emily O. Kistner ◽  
...  

Abstract Cytarabine arabinoside (ara-C) is an antimetabolite used to treat hematologic malignancies. Resistance is a common reason for treatment failure with adverse side effects contributing to morbidity and mortality. Identification of genetic factors important in susceptibility to ara-C cytotoxicity may allow for individualization of treatment. We used an unbiased whole-genome approach using lymphoblastoid cell lines derived from persons of European (CEU) or African (YRI) ancestry to identify these genetic factors. We interrogated more than 2 million single nucleotide polymorphisms (SNPs) for association with susceptibility to ara-C and narrowed our focus by concentrating on SNPs that affected gene expression. We identified a unique pharmacogenetic signature consisting of 4 SNPs explaining 51% of the variability in sensitivity to ara-C among the CEU and 5 SNPs explaining 58% of the variation among the YRI. Population-specific signatures were secondary to either (1) polymorphic SNPs in one population but monomorphic in the other, or (2) significant associations of SNPs with cytotoxicity or gene expression in one population but not the other. We validated the gene expression-cytotoxicity relationship for a subset of genes in a separate group of lymphoblastoid cell lines. These unique genetic signatures comprise novel genes that can now be studied further in functional studies.


2020 ◽  
Author(s):  
Qi Jiang ◽  
Guifang Du ◽  
Junting Wang ◽  
XiaoHan Tang ◽  
Xuejun wang ◽  
...  

Abstract Background: Angiotensin-converting enzyme 2 (ACE2) has been confirmed to be a receptor for the newly discovered severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, cell surface ACE2 expression is reported to be inconsistent with clinical tissue tropism of SARS-CoV-2, which complicates understanding of the pathogenesis of 2019 novel coronavirus disease (COVID-19). The consumption of ACE2 by internalization and shedding processes may explain this discordance. Results: To understand the discordance between ACE2 expression and the tissue tropism of SARS-CoV-2, we examined the chromatin accessibility of the ACE2 promoter in hundreds of tissues and cell lines using public DNase-seq and assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) data. We find that the ACE2 promoter is accessible in only three tissues: lung, large intestine and placenta. We also examined tumor tissues, and the ACE2 promoter is observed to be accessible in five tumors with reported SARS-CoV-2 susceptibility. We confirmed the susceptibility by performing SARS-CoV-2 pseudovirus infection in several cell lines. Conclusions: We propose that open chromatin at the promoter mediates the ACE2 supplementary effect and ensures the entry of SARS-CoV-2. This hypothesis provides a new view and potential clues for further investigation of COVID-19 pathogenesis.


Development ◽  
2002 ◽  
Vol 129 (19) ◽  
pp. 4571-4580 ◽  
Author(s):  
Lydia Teboul ◽  
Juliette Hadchouel ◽  
Philippe Daubas ◽  
Dennis Summerbell ◽  
Margaret Buckingham ◽  
...  

Vertebrate myogenesis is controlled by four transcription factors known as the myogenic regulatory factors (MRFs): Myf5, Mrf4, myogenin and MyoD. During mouse development Myf5 is the first MRF to be expressed and it acts by integrating multiple developmental signals to initiate myogenesis. Numerous discrete regulatory elements are involved in the activation and maintenance of Myf5 gene expression in the various muscle precursor populations, reflecting the diversity of the signals that control myogenesis. Here we focus on the enhancer that recapitulates the first phase of Myf5 expression in the epaxial domain of the somite, in order to identify the subset of cells that first transcribes the gene and therefore gain insight into molecular, cellular and anatomical facets of early myogenesis. Deletion of this enhancer from a YAC reporter construct that recapitulates the Myf5 expression pattern demonstrates that this regulatory element is necessary for expression in the early epaxial somite but in no other site of myogenesis. Importantly, Myf5 is subsequently expressed in the epaxial myotome under the control of other elements located far upstream of the gene. Our data suggest that the inductive signals that control Myf5 expression switch rapidly from those that impinge on the early epaxial enhancer to those that impinge on the other enhancers that act later in the epaxial somite, indicating that there are significant changes in either the signalling environment or the responsiveness of the cells along the rostrocaudal axis. We propose that the first phase of Myf5 epaxial expression, driven by the early epaxial enhancer in the dermomyotome, is necessary for early myotome formation, while the subsequent phases are associated with cytodifferentiation within the myotome.


Blood ◽  
2010 ◽  
Vol 116 (21) ◽  
pp. 4643-4643
Author(s):  
Maria Adamaki ◽  
George I Lambrou ◽  
Aspasia Divane ◽  
Maria Moschovi

Abstract 4643 Introduction: Childhood AL is characterized by abnormal chromosomal translocations and gene fusions. The most common cytogenetic aberrations are: TEL/AML1 (25%), E2A/PBX1 (4.8%), BCR/ABL (1.6%), MLL rearrangements (1.6%) and AML1/ETO (exclusively found in acute myeloid leukemia (AML)). Their high prognostic significance determines the intensification (e.g. for MLL rearrangements) or de-intensification (e.g. for TEL/AML1) of therapy, even though the mechanisms through which these fusions affect the prognostic outcome are largely unknown. However, the majority (~70%) of childhood patients is negative for these translocations and, even though most present with numeric cytogenetic abnormalities (e.g. aneuploidies) at diagnosis, they are treated with the same therapeutic protocols as those with known translocations. Only BCR/ABL+ AL patients receive individualized treatment (addition of imatinib mesylate) that offers a better prognostic outcome, thus representing an example of how childhood leukemia can benefit from individualized therapy. The study of the genes implicated in leukemic lesions can give insight into the pathogenesis of leukemia and hence provide new diagnostic markers and therapeutic targets. It is therefore fundamental to make scientific hypotheses based on the behaviour of such genes and then perform experiments to validate or reject them. In the present study we investigate the expression of 4 such genes: HOXA9 and MEIS1, implicated in both normal hematopoiesis and MLL+ leukemias; AML1, an important regulator of B-cell differentiation and the most frequent target of chromosomal translocations; and IRF4, a transcription factor of lymphocyte differentiation, also implicated in leukemias and lymphomas. The aim of this study is to examine whether there is a correlation between the expression of the above genes and the leukemic cytogenetic profile. Materials and Methods: RNA was extracted with Trizol (Invitrogen Inc.) from bone marrow samples of 43 newly diagnosed children with leukemia (39 acute lymphoblastic leukemia (ALL) and 4 AML), 20 healthy children (controls) and from 4 cell lines, each representing a different type of childhood leukemia: CCRF-CEM (T-cell ALL), CCRF-SB (B-cell ALL), Reh (non-T, non-B ALL) and THP-1 (acute monocytic leukemia). Gene expression was investigated with Real-Time Reverse Transcription PCR (qRT-PCR), using the Plexor one-step qRT-PCR kit (Promega Inc.). Fluorescence in situ hybridization (FISH) was performed with dual-color probes (Abbott Molecular, Inc.). Results: Cytogenetic data were available for 41 patients (pts): TEL/AML1+: 5 pts (ALL) (12.2%), MLL+: 4 pts (3 ALL and 1 AML) (9.7%), BCR/ABL+: 3 pts (ALL) (7.3%), E2A/PBX+: 2 pts (ALL) (4.8%), AML1/ETO+: 1 pt (AML); the remaining 26 patients (63%) were negative for these fusions but 6 (14.6%) had extra copies of the AML1 gene. Gene expression was as follows: HOXA9: up-regulated in 20 patients (46.51%) and the CCRF-SB, Reh and THP-1 cell lines; MEIS1: up-regulated in 19 patients (44.19%); AML1: up-regulated in 21 patients (48.84%) and all 4 cell lines; IRF4: up-regulated in 25 patients (58.14%) and down-regulated in the CCRF-SB cell line. Simultaneous co-expression of HOXA9 and MEIS1 was present in 15 patients (34.88%). Of the 6 patients having extra copies of AML1, only one showed over-expression of the gene. Conclusions: In our group of patients the frequency of the TEL/AML1 fusion appears to be lower, whereas the frequency of the MLL rearrangement appears to be higher, than that reported for western European countries. We find all of the 4 genes studied significantly up-regulated in certain groups of patients, as compared to controls, regardless of leukemic subtype. Up-regulation of HOXA9 and MEIS1 did not appear to be an exclusive characteristic of the MLL+ group. In the case of the AML1 gene, we find that gene amplification does not correlate with over-expression of the gene. In addition, IRF4, a known tumor suppressor gene which would be expected to be down-regulated in newly diagnosed leukemias, seems to be the one most frequently up-regulated (58.14% of patients). Our findings lead us to suggest that there might be common patterns of aberrant gene expression among childhood leukemia patients regardless of cytogenetic subtype. Such patterns could be further investigated in an attempt to gain insight into the etiology of leukemogenesis. Disclosures: No relevant conflicts of interest to declare.
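
The up-regulation percentages quoted in the Results above can be re-derived from the stated patient counts. The short Python sketch below does so under one assumption that the abstract does not state explicitly: that each percentage is taken over the full cohort of 43 patients. The variable and function names are hypothetical and chosen only for illustration.

```python
# Patient counts with up-regulated expression, as quoted in the abstract.
# Assumption (not stated in the abstract): the denominator is all 43 patients.
TOTAL_PATIENTS = 43

upregulated_counts = {
    "HOXA9": 20,
    "MEIS1": 19,
    "AML1": 21,
    "IRF4": 25,
    "HOXA9+MEIS1 co-expression": 15,
}


def percent(count, total):
    """Return count/total as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Map each gene (or gene pair) to its percentage of the cohort.
percentages = {
    name: percent(count, TOTAL_PATIENTS)
    for name, count in upregulated_counts.items()
}

for name, value in percentages.items():
    print(f"{name}: {value}%")
```

Under that assumption the computed values agree with the figures given in the abstract to two decimal places.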


2007 ◽  
Vol 35 (6) ◽  
pp. 1629-1633 ◽  
Author(s):  
W.A. Miller ◽  
Z. Wang ◽  
K. Treder

Many plant viral RNAs lack the 5′-cap structure that is required on all host mRNAs for interacting with essential translation initiation factors. Instead, uncapped viral RNAs take over the host translation machinery by harbouring sequences that functionally replace the 5′-cap. Recent reports reveal at least eight different classes of CITE (cap-independent translation element) located in the 3′-UTRs (untranslated regions) of various viruses. We describe how the structure and behaviour of each class of element differs from the other classes, suggesting that they recruit translation factors and, ultimately, the ribosome by diverse mechanisms. These results greatly expand our understanding of ways in which mRNAs can recruit ribosomes, and they provide insight into the regulation of virus gene expression.


Author(s):  
Monique Rijnkels ◽  
Elena Kabotyanski ◽  
Amy Shore ◽  
Jeffrey M. Rosen

Abstract For several decades, the regulation of casein gene expression by the lactogenic hormones, prolactin and glucocorticoids, has provided an excellent model system in which to study how steroid and peptide hormones regulate gene expression. Early studies of casein gene regulation defined conserved sequence elements in the 5′ flanking region of these genes, one of which was identified as a γ-interferon activation sequence (GAS). Although this site was thought to interact with a mammary gland-specific factor, purification and cloning of this factor by Bernd Groner and his colleagues revealed it was instead a new member of the signal transducers and activators of transcription family, Stat5, which was expressed in many tissues. The exquisite tissue-specific expression of the casein genes was subsequently shown to depend not on a single transcription factor but on composite response elements that interacted with a number of ubiquitous transcription factors in response to the combinatorial effects of peptide and steroid hormone signaling. More recent studies have defined cooperative effects of prolactin and glucocorticoids as well as antagonistic effects of progesterone on the chromatin structure of both the casein gene proximal promoter region and a distal enhancer. Local chromatin modifications as well as long-range interactions facilitated by DNA looping are required for the hormonal regulation of β-


Author(s):  
Yuka Ono ◽  
Kohsuke Kataoka

Glucose transporter type 2 (GLUT2), encoded by the SLC2A2 gene, is an essential component of glucose-stimulated insulin secretion in pancreatic islet β-cells. Like that of the gene encoding insulin, expression of the SLC2A2 gene is closely linked to β-cell functionality in rodents, but the mechanism by which β-cell-specific expression of SLC2A2 is controlled remains unclear. In this report, to identify putative enhancer elements of the mouse Slc2a2 gene, we examined evolutionary conservation of the nucleotide sequence of its genomic locus, together with ChIP-seq data of histone modifications and various transcription factors published in previous studies. Using luciferase reporter assays, we found that an evolutionarily conserved region located approximately 40 kbp downstream of the transcription start site of Slc2a2 functions as an active enhancer in the MIN6 β-cell line. We also found that three β-cell-enriched transcription factors, MafA, NeuroD1, and HNF1β, synergistically activate transcription through this 3’ downstream distal enhancer (ECR3’) and the proximal promoter region of the gene. Our data also indicate that the simultaneous binding of HNF1β to its target sites within the promoter and ECR3’ of Slc2a2 is indispensable for transcriptional activation, and that binding of MafA and NeuroD1 to their respective target sites within the ECR3’ enhances transcription. Co-immunoprecipitation experiments suggested that MafA, NeuroD1, and HNF1β interact with each other. Overall, these results suggest that promoter-enhancer communication through MafA, NeuroD1, and HNF1β is critical for Slc2a2 gene expression. These findings provide clues to help elucidate the mechanism of regulation of Slc2a2 gene expression in β-cells.


2019 ◽  
Author(s):  
Thomas Yssing Michaelsen ◽  
Jakob Brandt ◽  
Caitlin Singleton ◽  
Rasmus Hansen Kirkegaard ◽  
Nicola Segata ◽  
...  

Abstract High-throughput sequencing has allowed unprecedented insight into the composition and function of complex microbial communities. With the onset of metatranscriptomics, it is now possible to interrogate the transcriptome of multiple organisms simultaneously to get an overview of the gene expression of the entire community. Studies have successfully used metatranscriptomics to identify and describe relationships between gene expression levels and community characteristics. However, metatranscriptomic datasets contain a rich suite of additional information which is just beginning to be explored. In this minireview we discuss the different computational strategies for handling antisense expression in metatranscriptomic samples and highlight their potentially detrimental effects on downstream analysis and interpretation. We also surveyed the antisense transcriptome of multiple genomes and metagenome-assembled genomes (MAGs) from five different datasets and found high variability in the level of antisense transcription for individual species, which was consistent across samples. Importantly, we tested the hypothesis that antisense transcription is primarily the product of transcriptional noise and found mixed support, suggesting that the total observed antisense RNA in complex communities arises from a compounded effect of random, biological and technical factors. Antisense transcription can provide a rich set of information, from technical details about data quality to novel insight into the biology of complex microbial communities. Key points: Several fundamentally different approaches are used to handle antisense RNA. Prevalence of antisense RNA is highly variable between communities, genomes, and genes. Antisense RNA is likely an opaque mixture of technical, biological and random effects.

