Genome-wide oscillations in G + C density and sequence conservation

2021 ◽  
Author(s):  
Zarmik Moqtaderi ◽  
Susan Brown ◽  
Welcome Bender

Eukaryotic genomes typically show a uniform G + C content among chromosomes, but on smaller scales, many species have a G + C density that fluctuates with a characteristic wavelength. This oscillation is evident in many insect species, with wavelengths ranging between 700 bp and 4 kb. Measures of evolutionary conservation oscillate in phase with G + C content, with conserved regions having higher G + C. Loci with large regulatory regions show more regular oscillations; coding sequences and heterochromatic regions show little or no oscillation. There is little oscillation in vertebrate genomes in regions with densely distributed mobile repetitive elements. However, species with few repeats show oscillation in both G + C density and sequence conservation. These oscillations may reflect optimal spacing of cis-regulatory elements.
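
The kind of measurement described above can be illustrated with a minimal sketch. It assumes a plain sliding-window G + C fraction followed by a naive discrete Fourier transform; the function names, window size, step, and toy sequence are illustrative assumptions, not the authors' method.

```python
import math


def gc_density(seq, window, step):
    """Fraction of G or C bases in sliding windows along seq."""
    seq = seq.upper()
    values = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        values.append(sum(base in "GC" for base in chunk) / window)
    return values


def dominant_wavelength(values, step):
    """Wavelength (in bp) of the strongest non-zero frequency.

    Uses a naive O(n^2) DFT of the mean-centred signal, which is
    adequate for short illustrative inputs only.
    """
    n = len(values)
    if n < 2:
        return None
    mean = sum(values) / n
    centred = [v - mean for v in values]
    best_k, best_power = None, 0.0
    for k in range(1, n // 2 + 1):
        re = sum(c * math.cos(2 * math.pi * k * i / n) for i, c in enumerate(centred))
        im = sum(c * math.sin(2 * math.pi * k * i / n) for i, c in enumerate(centred))
        power = re * re + im * im
        if power > best_power:
            best_k, best_power = k, power
    if best_k is None:
        return None
    return n * step / best_k


# Toy sequence (hypothetical): GC-rich and AT-rich blocks of 500 bp each,
# so the G + C density repeats every 1000 bp.
toy = ("GC" * 250 + "AT" * 250) * 8
density = gc_density(toy, window=100, step=50)
print(dominant_wavelength(density, step=50))  # close to 1000 bp
```

On real genomes the window and step would need to be well below the expected wavelength (700 bp to 4 kb in the insect examples above) for the oscillation to be resolved.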

Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 654-654
Author(s):  
Salima Benbarche ◽  
Cécile K Lopez ◽  
Thomas Mercher ◽  
Camille Lobry

Abstract In recent years, massively parallel sequencing approaches have allowed the identification of hundreds of mutated genes in leukemia. Although these data have provided an unprecedented amount of information about mechanisms of leukemia cell maintenance and/or progression, the functional characterization of genes that are key players in regulating cancer development remains laborious. Analysis at the single-gene level often fails to identify gene or pathway collaborations leading to transformation. Studies aimed at depicting new oncogene cooperation would involve the generation of challenging mouse models or the deployment of tedious screening pipelines, which would be inadequate to depict new oncogene circuitry in cancer. Genome-wide mapping of epigenetic modifications on histone tails or of the binding of factors such as MED1 and BRD4 has allowed the identification of clusters of regulatory elements, termed Super Enhancers. Functional annotation of these regions revealed their high relevance during normal hematopoiesis and leukemogenesis. We hypothesized that these regulatory regions could simultaneously regulate the expression of genes cooperating to promote leukemia development. We thus developed a novel genome-wide CRISPRi-based screening approach to directly target these regulatory regions. CRISPRi technology relies on a deactivated Cas9 that cannot cut DNA and that is fused to the repressive KRAB domain (dCas9-KRAB). dCas9-KRAB that is properly targeted by single guide RNAs therefore recruits chromatin-modifying factors and triggers the generation of heterochromatin, thus inhibiting enhancer function. We performed this screen using an acute megakaryoblastic leukemia (AMKL) model driven by the CBFA2T3-GLIS2 fusion, the most frequent fusion oncogene in this disease, which we recently identified as being associated with Super Enhancers (Thirant et al, Cancer Cell 2017).
To inhibit Super Enhancer activity, we integrated H3K27ac ChIP-seq data and ATAC-seq data to define open chromatin regions located in Super Enhancers. We designed a library of 7995 single guide RNAs targeting 450 Super Enhancer regions found to be active in the CBFA2T3-GLIS2-bearing cell line M07e and in primary AMKL patient samples. This screening methodology allowed us to nominate Super Enhancer regions that are functionally linked to leukemia progression. In particular, we pinpointed a novel Super Enhancer region, induced by the CBFA2T3-GLIS2 fusion, that regulates the expression of both KIT and PDGFRA, two tyrosine kinase receptors. We were able to show that this Super Enhancer region is not active during normal megakaryocytic development and is aberrantly induced by CBFA2T3-GLIS2 expression. RNA-sequencing and 4C-seq (chromatin conformation capture) experiments showed that this Super Enhancer directly regulates KIT and PDGFRA expression. Whereas inhibition of either gene alone, using shRNA or small-molecule inhibitors, only modestly affects leukemic cell growth, concomitant inhibition of the two receptors synergizes to impair the growth and survival of AMKL cell lines and primary patient cells. In vivo targeting of this Super Enhancer in patient-derived xenograft models using CRISPRi significantly reduced tumor burden and increased overall survival. Our results demonstrate that genome-wide screening of regulatory DNA elements can identify co-regulated genes collaborating to promote leukemia progression and could open new avenues for the design of combination therapies.
Reference: Thirant C, Ignacimouttou C, Lopez CK, Diop M, Le Mouël L, Thiollier C, Siret A, Dessen P, Aid Z, Rivière J, Rameau P, Lefebvre C, Khaled M, Leverger G, Ballerini P, Petit A, Raslova H, Carmichael CL, Kile BT, Soler E, Crispino JD, Wichmann C, Pflumio F, Schwaller J, Vainchenker W, Lobry C, Droin N, Bernard OA, Malinge S, Mercher T (2017). ETO2-GLIS2 Hijacks Transcriptional Complexes to Drive Cellular Identity and Self-Renewal in Pediatric Acute Megakaryoblastic Leukemia. Cancer Cell. 31(3):452-465.
Disclosures No relevant conflicts of interest to declare.


2021 ◽  
Author(s):  
Ramzan Umarov ◽  
Yu Li ◽  
Takahiro Arakawa ◽  
Satoshi Takizawa ◽  
Xin Gao ◽  
...  

Regulatory elements control gene expression through transcription initiation (promoters) and by enhancing transcription at distant regions (enhancers). Accurate identification of regulatory elements is fundamental for annotating genomes and understanding gene expression patterns. Although there have been many attempts to develop computational methods for identifying promoters and enhancers, reliable tools for analyzing long genomic sequences are still lacking. Prediction methods often perform poorly on the genome-wide scale because the number of negatives is much higher than in the training sets. To address this issue, we propose a dynamic negative set updating scheme with a two-model approach, using one model for scanning the genome and the other for testing candidate positions. The developed method achieves good genome-level performance and maintains robust performance when applied to other species without re-training. Moreover, the unannotated regulatory regions predicted on the human genome are enriched for disease-associated variants, suggesting that they are potentially true regulatory elements rather than false positives. We tested high-scoring "false positive" predictions using a reporter assay, and all tested candidates were successfully validated, demonstrating the ability of our method to discover novel human regulatory regions.


2017 ◽  
Author(s):  
Xinchen Wang ◽  
Liang He ◽  
Sarah Goggin ◽  
Alham Saadat ◽  
Li Wang ◽  
...  

Abstract Genome-wide epigenomic maps revealed millions of regions showing signatures of enhancers, promoters, and other gene-regulatory elements [1]. However, high-throughput experimental validation of their function and high-resolution dissection of their driver nucleotides remain limited in their scale and length of regions tested. Here, we present a new method, HiDRA (High-Definition Reporter Assay), that overcomes these limitations by combining components of Sharpr-MPRA [2] and STARR-Seq [3] with genome-wide selection of accessible regions from ATAC-Seq [4]. We used HiDRA to test ~7 million DNA fragments preferentially selected from accessible chromatin in the GM12878 lymphoblastoid cell line. By design, accessibility-selected fragments were highly overlapping (up to 370 per region), enabling us to pinpoint driver regulatory nucleotides by exploiting subtle differences in reporter activity between partially overlapping fragments, using a new machine learning model, SHARPR2. Our resulting maps include ~65,000 regions showing significant enhancer function and enriched for endogenous active histone marks (including H3K9ac, H3K27ac), regulatory sequence motifs, and regions bound by immune regulators. Within them, we discover ~13,000 high-resolution driver elements enriched for regulatory motifs and evolutionarily conserved nucleotides, and help predict causal genetic variants underlying disease from genome-wide association studies. Overall, HiDRA provides a general, scalable, high-throughput, and high-resolution approach for experimental dissection of regulatory regions and driver nucleotides in the context of human biology and disease.


2021 ◽  
Vol 17 (9) ◽  
pp. e1009376
Author(s):  
Ramzan Umarov ◽  
Yu Li ◽  
Takahiro Arakawa ◽  
Satoshi Takizawa ◽  
Xin Gao ◽  
...  

Regulatory elements control gene expression through transcription initiation (promoters) and by enhancing transcription at distant regions (enhancers). Accurate identification of regulatory elements is fundamental for annotating genomes and understanding gene expression patterns. Although there have been many attempts to develop computational methods for identifying promoters and enhancers, reliable tools for analyzing long genomic sequences are still lacking. Prediction methods often perform poorly on the genome-wide scale because the number of negatives is much higher than in the training sets. To address this issue, we propose a dynamic negative set updating scheme with a two-model approach, using one model for scanning the genome and the other for testing candidate positions. The developed method achieves good genome-level performance and maintains robust performance when applied to other vertebrate species without re-training. Moreover, the unannotated regulatory regions predicted on the human genome are enriched for disease-associated variants, suggesting that they are potentially true regulatory elements rather than false positives. We tested high-scoring “false positive” predictions using a reporter assay, and all tested candidates were successfully validated, demonstrating the ability of our method to discover novel human regulatory regions.


Author(s):  
Yanrong Ji ◽  
Zhihan Zhou ◽  
Han Liu ◽  
Ramana V Davuluri

Abstract Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. The gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios. Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture a global and transferable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide prediction of regulatory elements and demonstrated its ease of use, accuracy and efficiency. We show that the single pre-trained transformer model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites after easy fine-tuning using small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationships within input sequences, for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that DNABERT pre-trained on the human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned for many other sequence analysis tasks. Availability and implementation: The source code and the pre-trained and fine-tuned models for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT). Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Sarah E. Pierce ◽  
Jeffrey M. Granja ◽  
William J. Greenleaf

Abstract Chromatin accessibility profiling can identify putative regulatory regions genome wide; however, pooled single-cell methods for assessing the effects of regulatory perturbations on accessibility are limited. Here, we report a modified droplet-based single-cell ATAC-seq protocol for perturbing and evaluating dynamic single-cell epigenetic states. This method (Spear-ATAC) enables simultaneous read-out of chromatin accessibility profiles and integrated sgRNA spacer sequences from thousands of individual cells at once. Spear-ATAC profiling of 104,592 cells representing 414 sgRNA knock-down populations reveals the temporal dynamics of epigenetic responses to regulatory perturbations in cancer cells and the associations between transcription factor binding profiles.


Lab Animal ◽  
2020 ◽  
Vol 50 (1) ◽  
pp. 17-17
Author(s):  
Alexandra Le Bras

Cancers ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2547
Author(s):  
Keunsoo Kang ◽  
Yoonjung Choi ◽  
Hyeonjin Moon ◽  
Chaelin You ◽  
Minjin Seo ◽  
...  

RAD51 is a recombinase that plays a pivotal role in homologous recombination. Although the role of RAD51 in homologous recombination has been extensively studied, it is unclear whether RAD51 can be involved in gene regulation as a co-factor. In this study, we found evidence that RAD51 may contribute, together with E-box proteins such as USF1, USF2, and/or MITF, to the regulation of genes involved in the autophagy pathway in GM12878, HepG2, K562, and MCF-7 cell lines. The canonical USF binding motif (CACGTG) was significantly enriched at RAD51-bound cis-regulatory elements in all four cell lines. In addition, genome-wide USF1, USF2, and/or MITF-binding regions significantly coincided with the RAD51-associated cis-regulatory elements in the same cell line. Interestingly, the promoters of genes associated with the autophagy pathway, such as ATG3 and ATG5, were significantly occupied and regulated by RAD51 in HepG2 and MCF-7 cell lines. Taken together, these results unveil a novel role of RAD51 and provide evidence that RAD51-associated cis-regulatory elements may be involved in regulating autophagy-related genes together with E-box binding proteins.
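
As a small illustration of what locating the CACGTG motif in a sequence involves, the sketch below scans a DNA string for exact matches. The function names and the toy sequence are hypothetical, and exact string matching is a simplification: motif analyses of ChIP-seq peaks typically use position weight matrices and statistical enrichment tests, which are not shown here.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(seq, motif="CACGTG"):
    """Return 0-based start positions of every exact occurrence of motif."""
    seq = seq.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


# CACGTG is palindromic, so scanning one strand also covers the other.
print(reverse_complement("CACGTG") == "CACGTG")

# Toy sequence (hypothetical) with two E-box motifs.
print(find_motif("ttCACGTGaaCACGTG"))  # [2, 10]
```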


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ruifeng Cui ◽  
Xiaoge Wang ◽  
Waqar Afzal Malik ◽  
Xuke Lu ◽  
Xiugui Chen ◽  
...  

Abstract Background: The raffinose synthetase (RAFS) gene superfamily is critical for the synthesis of raffinose, which accumulates in plant leaves under abiotic stress. However, it remains unclear whether RAFS contributes to resistance to abiotic stress in plants, specifically in Gossypium species. Results: In this study, we identified 74 RAFS genes from G. hirsutum, G. barbadense, G. arboreum and G. raimondii by using a series of bioinformatic methods. Phylogenetic analysis showed that the RAFS gene family in the four Gossypium species could be divided into four major clades; the number of genes per species was relatively uniform, ranging from 12 to 25 depending on species ploidy, most likely as a result of an ancient whole-genome polyploidization. Gene motif analysis showed that the RAFS gene structure was relatively conserved. Promoter analysis for cis-regulatory elements showed that some RAFS genes might be regulated by gibberellins and abscisic acid, which might influence their expression levels. Moreover, we further examined the functions of RAFS under cold, heat, salt and drought stress conditions, based on the expression profile and co-expression network of RAFS genes in Gossypium species. Transcriptome analysis suggested that RAFS genes in clade III are highly expressed in organs such as seed, root, cotyledon, ovule and fiber, and under abiotic stress in particular, indicating the involvement of clade III genes in resistance to abiotic stress. Gene co-expression network analysis showed that GhRFS2A-GhRFS6A, GhRFS6D, GhRFS7D and GhRFS8A-GhRFS11A were key genes, with high expression levels under salt, drought, cold and heat stress. Conclusion: The findings may provide insights into the evolutionary relationships and expression patterns of RAFS genes in Gossypium species and a theoretical basis for the identification of stress resistance materials in cotton.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Zihan Cheng ◽  
Xuemei Zhang ◽  
Wenjing Yao ◽  
Kai Zhao ◽  
Lin Liu ◽  
...  

Abstract Background: The Late Embryogenesis-Abundant (LEA) gene families, which play significant roles in the regulation of tolerance to abiotic stresses, are widespread in higher plants. Poplar is a tree species of important ecological and economic value. However, systematic studies of this gene family have not yet been reported in poplar. Results: Based on a genome-wide search, we identified 88 LEA genes from Populus trichocarpa and renamed them PtrLEA. The PtrLEA genes have fewer introns, and their promoters contain more cis-regulatory elements related to abiotic stress tolerance. Our results from comparative genomics indicated that the PtrLEA genes are conserved and homologous to related genes in other species, such as Eucalyptus robusta, Solanum lycopersicum and Arabidopsis. Using RNA-Seq data collected from poplar under two conditions (with and without salt treatment), we detected 24, 22 and 19 differentially expressed genes (DEGs) in roots, stems and leaves, respectively. We then performed spatiotemporal expression analysis of the four up-regulated DEGs shared by the tissues, constructed gene co-expression networks, and investigated gene function annotations. Conclusion: Several lines of evidence indicated that the PtrLEA genes play significant roles in poplar growth and development, as well as in responses to salt stress.

