Genome-wide cis-decoding for expression designing in tomato using cistrome data and explainable deep learning

2021
Author(s): Takashi Akagi, Kanae Masuda, Eriko Kuwada, Kouki Takeshita, Taiji Kawakatsu, ...

In the evolutionary paths of plants, variations in cis-regulatory elements (CREs) that diversify expression have played a central role in establishing lineage-specific traits. However, it is difficult to predict expression behavior from CRE patterns, and hence to harness them properly, mainly because the underlying biological processes are complex. In this study, we used cistrome datasets and explainable convolutional neural network (CNN) frameworks to predict genome-wide expression patterns in tomato fruits from the DNA sequences of gene regulatory regions. By fixing the effects of trans-elements, using single cell-type spatiotemporal transcriptome data as the response variables, we developed a model that predicts a key expression pattern for the initiation of tomato fruit ripening. Feature visualization of the CNNs identified nucleotide residues critical to the target expression pattern in each gene, and their effects were validated experimentally in ripening tomato fruits. This cis-decoding framework will not only contribute to understanding the regulatory networks derived from interactions between CREs and transcription factors, but also provide a flexible way of designing alleles with optimized expression.
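
To make the idea of reading regulatory sequence with a CNN concrete, here is a minimal pure-Python sketch, not the authors' model: a DNA sequence is one-hot encoded, scanned by a single convolutional filter (which behaves like a position weight matrix), and then "explained" by in silico mutagenesis, i.e. measuring how much the filter's peak activation drops when each base is mutated. The filter weights and its preference for "CACG" are hypothetical, and in silico mutagenesis is a swapped-in attribution method; the abstract does not say which feature-visualization technique was used.

```python
# Minimal sketch (not the published model): one-hot encoding, a single
# convolutional filter, ReLU, global max pooling, and in silico mutagenesis
# as a simple attribution method. All weights are hypothetical.

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as one 4-element row per base (A, C, G, T); other symbols become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv1d(encoded, kernel):
    """Valid 1D cross-correlation of a one-hot sequence with a k x 4 kernel (stride 1)."""
    k = len(kernel)
    scores = []
    for start in range(len(encoded) - k + 1):
        window = encoded[start:start + k]
        scores.append(sum(w * x
                          for k_row, x_row in zip(kernel, window)
                          for w, x in zip(k_row, x_row)))
    return scores


def relu(values):
    return [max(0.0, v) for v in values]


def max_activation(seq, kernel):
    """Global max pooling over the ReLU-activated filter track (0.0 if the sequence is too short)."""
    return max(relu(conv1d(one_hot(seq), kernel)), default=0.0)


def in_silico_mutagenesis(seq, kernel):
    """For each position, the largest drop in max activation over the three alternative bases."""
    reference = max_activation(seq, kernel)
    effects = []
    for i, base in enumerate(seq.upper()):
        drops = [reference - max_activation(seq[:i] + alt + seq[i + 1:], kernel)
                 for alt in BASES if alt != base]
        effects.append(max(drops))
    return effects


# Hypothetical 4-bp filter: +1 for the preferred base at each position, -1 otherwise.
# Columns are A, C, G, T; the preferred word is "CACG".
KERNEL = [
    [-1.0, 1.0, -1.0, -1.0],  # C
    [1.0, -1.0, -1.0, -1.0],  # A
    [-1.0, 1.0, -1.0, -1.0],  # C
    [-1.0, -1.0, 1.0, -1.0],  # G
]

if __name__ == "__main__":
    seq = "TTCACGTG"
    print(relu(conv1d(one_hot(seq), KERNEL)))  # [0.0, 0.0, 4.0, 0.0, 0.0]
    print(in_silico_mutagenesis(seq, KERNEL))  # [0.0, 0.0, 2.0, 2.0, 2.0, 2.0, 0.0, 0.0]
```

Only the four bases inside the matched word have non-zero effects, which is the kind of per-nucleotide importance map that attribution methods aim to recover from a trained network with many filters and layers.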

Author(s): Yanrong Ji, Zhihan Zhou, Han Liu, Ramana V Davuluri

Abstract
Motivation: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. The gene regulatory code is highly complex owing to polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios.
Results: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture a global and transferable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts. We compared DNABERT with the most widely used programs for genome-wide prediction of regulatory elements and demonstrated its ease of use, accuracy and efficiency. We show that a single pre-trained transformer model can simultaneously achieve state-of-the-art performance in predicting promoters, splice sites and transcription factor binding sites, after simple fine-tuning on small task-specific labeled datasets. Further, DNABERT enables direct visualization of nucleotide-level importance and of semantic relationships within input sequences, improving interpretability and enabling accurate identification of conserved sequence motifs and candidate functional genetic variants. Finally, we demonstrate that DNABERT pre-trained on the human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned for many other sequence analysis tasks.
Availability and implementation: The source code and the pre-trained and fine-tuned DNABERT models are available at GitHub (https://github.com/jerryji1993/DNABERT).
Supplementary information: Supplementary data are available at Bioinformatics online.
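
The abstract does not describe DNABERT's input representation; the published models tokenize DNA into overlapping k-mers (k from 3 to 6). A minimal sketch of that idea follows, assuming a made-up vocabulary order and special-token ids rather than the released vocabulary files.

```python
# Minimal sketch of overlapping k-mer tokenization for a BERT-style DNA model.
# The vocabulary order and special-token ids are hypothetical; they do not
# reproduce DNABERT's released vocabulary files.
from itertools import product

SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def kmer_tokenize(seq, k=6):
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k=6):
    """Assign ids to the special tokens first, then to all 4**k k-mers over ACGT."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {token: idx for idx, token in enumerate(SPECIAL_TOKENS + kmers)}


def encode(seq, vocab, k=6):
    """Wrap the k-mer tokens in [CLS] ... [SEP] and map them to ids; unseen k-mers become [UNK]."""
    tokens = ["[CLS]"] + kmer_tokenize(seq, k) + ["[SEP]"]
    return [vocab.get(token, vocab["[UNK]"]) for token in tokens]


if __name__ == "__main__":
    vocab3 = build_vocab(k=3)
    print(kmer_tokenize("ACGTAC", k=3))  # ['ACG', 'CGT', 'GTA', 'TAC']
    print(len(vocab3))                   # 69
    print(encode("ACGN", vocab3, k=3))   # [2, 11, 1, 3]
```

Because the stride is 1, each interior base appears in k consecutive tokens, so a single-base change alters up to k tokens; nucleotide-level importance therefore has to be mapped back from token-level scores.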


2021, Vol 22 (1)
Author(s): Ruifeng Cui, Xiaoge Wang, Waqar Afzal Malik, Xuke Lu, Xiugui Chen, ...

Abstract
Background: The raffinose synthetase (RAFS) gene superfamily is critical for the synthesis of raffinose, which accumulates in plant leaves under abiotic stress. However, it remains unclear whether RAFS contributes to abiotic stress resistance in plants, particularly in Gossypium species.
Results: In this study, we identified 74 RAFS genes from G. hirsutum, G. barbadense, G. arboreum and G. raimondii using a series of bioinformatic methods. Phylogenetic analysis showed that the RAFS gene family in the four Gossypium species could be divided into four major clades. Gene numbers were distributed relatively uniformly, ranging from 12 to 25 per species according to ploidy, most likely as a result of an ancient whole-genome polyploidization. Gene motif analysis showed that RAFS gene structure was relatively conserved. Promoter analysis of cis-regulatory elements showed that some RAFS genes might be regulated by gibberellins and abscisic acid, which might influence their expression levels. We further examined the functions of RAFS under cold, heat, salt and drought stress, based on the expression profiles and co-expression network of RAFS genes in Gossypium species. Transcriptome analysis suggested that RAFS genes in clade III are highly expressed in organs such as seed, root, cotyledon, ovule and fiber, and particularly under abiotic stress, indicating that clade III genes are involved in abiotic stress resistance. Gene co-expression network analysis showed that GhRFS2A-GhRFS6A, GhRFS6D, GhRFS7D and GhRFS8A-GhRFS11A were key genes, with high expression levels under salt, drought, cold and heat stress.
Conclusion: These findings may provide insights into the evolutionary relationships and expression patterns of RAFS genes in Gossypium species and a theoretical basis for identifying stress-resistant materials in cotton.
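
The abstract does not state how the co-expression network was built. As a hedged illustration only, one common approach connects genes whose expression profiles across conditions have a Pearson correlation above a cutoff, then ranks genes by their number of connections (degree) to nominate hubs. The gene names, expression values and the 0.9 cutoff below are hypothetical.

```python
# Minimal sketch of a correlation-based co-expression network with degree-based hubs.
# Gene names, values and the threshold are hypothetical placeholders.
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles; 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles, threshold=0.9):
    """Keep gene pairs whose |r| meets the threshold (positive and negative co-expression)."""
    edges = []
    for (g1, x), (g2, y) in combinations(profiles.items(), 2):
        r = pearson(x, y)
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


def hub_genes(edges):
    """Rank genes by degree (number of edges), ties broken alphabetically."""
    degree = {}
    for g1, g2, _ in edges:
        degree[g1] = degree.get(g1, 0) + 1
        degree[g2] = degree.get(g2, 0) + 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical expression values across four stress conditions.
    profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.0],
        "geneC": [4.0, 3.0, 2.0, 1.0],
        "geneD": [1.0, 3.0, 2.0, 4.0],
    }
    edges = coexpression_edges(profiles)
    print(hub_genes(edges))  # [('geneA', 2), ('geneB', 2), ('geneC', 2)]
```

Here geneD correlates at only |r| = 0.8 with the others and drops out. Hard thresholds on |r| are the simplest choice; soft-thresholded approaches such as WGCNA weight edges instead of cutting them.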


2021, Vol 21 (1)
Author(s): Zhixuan Du, Qitao Su, Zheng Wu, Zhou Huang, Jianzhong Bao, ...

Abstract
Multidrug and toxic compound extrusion (MATE) proteins are involved in many physiological functions in plant growth and development. Although an increasing number of MATE proteins have been identified, understanding of MATE proteins in rice remains very limited. In this study, 46 MATE proteins were identified from the rice (Oryza sativa) genome by homology searches and domain prediction. The rice MATE family was divided into four subfamilies based on the phylogenetic tree. Tandem and segmental duplications contributed to the expansion of the rice MATE gene family. Analyses of gene structure and cis-regulatory elements suggest potential functions of MATE genes. Expression patterns in different tissues were analyzed using RNA-seq, which showed that most MATE genes were constitutively expressed. Furthermore, qRT-PCR analysis showed differential expression patterns in response to salt and drought stress. This study provides comprehensive information on the MATE gene family in rice and will aid in understanding the functional divergence of MATE genes.


2019
Author(s): Robin A. Sorg, Clement Gallay, Jan-Willem Veening

Abstract
Streptococcus pneumoniae can cause disease in various human tissues and organs, including the ear, the brain, the blood and the lung, and thus in highly diverse and dynamic environments. It is challenging to study how pneumococci control virulence factor expression, because cues from natural environments and the presence of an immune system are difficult to simulate in vitro. Here, we apply synthetic biology methods to reverse-engineer gene expression control in S. pneumoniae. We describe a selection platform that allows straightforward identification of transcriptional regulatory elements from combinatorial libraries. We present TetR- and LacI-regulated promoters whose expression ranges span four orders of magnitude. Based on these promoters, regulatory networks of higher complexity are assembled, such as logic AND and IMPLY gates. Finally, we demonstrate single-copy, genome-integrated toggle switches that give rise to bimodal population distributions. The tools described here can be used to mimic complex expression patterns, such as those found for pneumococcal virulence factors, paving the way for in vivo investigations of the importance of gene expression control for the pathogenicity of S. pneumoniae.
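
The bistability behind a toggle switch can be illustrated with the standard two-repressor mutual-inhibition model, in which each repressor (u and v; for instance, these could be TetR and LacI) represses the other's promoter. A minimal deterministic sketch, assuming symmetric, hypothetical parameters and forward-Euler integration: cells that start on either side of the diagonal u = v settle into different stable states, which is one way a population can become bimodal. This is a textbook model, not the circuit or parameters used in the study.

```python
# Minimal sketch of the two-repressor toggle-switch model:
#   du/dt = alpha1 / (1 + v**beta)  - u
#   dv/dt = alpha2 / (1 + u**gamma) - v
# Parameters are hypothetical; integration uses forward Euler.
import random


def simulate_toggle(u0, v0, alpha1=10.0, alpha2=10.0, beta=2.0, gamma=2.0,
                    dt=0.01, steps=5000):
    """Integrate the toggle ODEs from (u0, v0) and return the final (u, v)."""
    u, v = u0, v0
    for _ in range(steps):
        du = alpha1 / (1.0 + v ** beta) - u
        dv = alpha2 / (1.0 + u ** gamma) - v
        u, v = u + dt * du, v + dt * dv
    return u, v


def population_states(n_cells=100, seed=0, steps=3000):
    """Start cells at random initial repressor levels and count which state each settles in."""
    rng = random.Random(seed)
    high_u = high_v = 0
    for _ in range(n_cells):
        u, v = simulate_toggle(rng.uniform(0.0, 4.0), rng.uniform(0.0, 4.0), steps=steps)
        if u > v:
            high_u += 1
        else:
            high_v += 1
    return high_u, high_v


if __name__ == "__main__":
    print(simulate_toggle(5.0, 0.0))  # settles with u high, v low
    print(simulate_toggle(0.0, 5.0))  # settles with v high, u low
    print(population_states())        # two non-empty groups of cells
```

The model is deterministic, so the spread of cells between the two states comes only from the random starting levels; in real cells, gene-expression noise can also drive switching between states.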


2020, Vol 21 (17), pp. 5947
Author(s): Hao Zhang, Shuang Li, Mengyao Shi, Sheliang Wang, Lei Shi, ...

NITRATE TRANSPORTER 1 (NRT1)/PEPTIDE TRANSPORTER (PTR) family (NPF) proteins can transport various substrates and play crucial roles in governing plant nitrogen (N) uptake and distribution. However, little is known about the NPF genes in Brassica napus. Here, a comprehensive genome-wide characterization of the NPF family identified 193 NPF genes in the B. napus genome. The BnaNPF family exhibited high genetic diversity among subfamilies but was conserved within each subfamily. Whole-genome duplication and segmental duplication played a major role in BnaNPF evolution. Expression analysis showed that individual genes displayed a broad range of expression patterns in response to multiple nutrient stresses, including N, phosphorus (P) and potassium (K) deficiencies, as well as ammonium toxicity. Furthermore, 10 core BnaNPF genes responsive to N stress were identified. These genes contain 6–13 transmembrane domains, localize to the plasma membrane, and respond differently to N deficiency in different tissues. Robust cis-regulatory elements were identified within the promoter regions of the core genes. Taken together, our results suggest that BnaNPFs are versatile transporters that might have evolved new functions in B. napus. Our findings should benefit future research on this gene family.
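
Identifying cis-regulatory elements in promoters usually means scanning the upstream sequence on both strands for known motifs, often with a database such as PlantCARE. A minimal sketch, assuming motifs written in IUPAC code and an exact-match scan; the ABRE-like (ACGTG) and W-box-like (TTGACY) cores and the promoter string are illustrative placeholders, not elements reported in this study.

```python
# Minimal sketch: scan a promoter on both strands for IUPAC-coded motifs.
# Motifs and the promoter sequence are illustrative placeholders.
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, including IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Translate an IUPAC motif into a regular expression."""
    return "".join(IUPAC[c] for c in motif.upper())


def scan_promoter(promoter, motifs):
    """Return (motif name, strand, 0-based + strand start) for every (possibly overlapping) hit."""
    seq = promoter.upper()
    hits = []
    for name, motif in motifs.items():
        for strand, pattern in (("+", motif), ("-", revcomp(motif))):
            # A zero-width lookahead lets finditer report overlapping matches.
            for m in re.finditer(f"(?=({motif_regex(pattern)}))", seq):
                hits.append((name, strand, m.start()))
    return sorted(hits, key=lambda h: (h[2], h[0], h[1]))


if __name__ == "__main__":
    motifs = {"ABRE-like": "ACGTG", "W-box-like": "TTGACY"}
    print(scan_promoter("TTACGTGAACACGTAATTGACCA", motifs))
    # [('ABRE-like', '+', 2), ('ABRE-like', '-', 9), ('W-box-like', '+', 16)]
```

Minus-strand hits are reported at their leftmost plus-strand coordinate, and a palindromic motif would be reported once per strand; dedicated tools also score degenerate matches rather than requiring exact ones.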


2019, Vol 70 (15), pp. 3867-3879
Author(s): Anneke Frerichs, Julia Engelhorn, Janine Altmüller, Jose Gutierrez-Marcos, Wolfgang Werr

Abstract
Fluorescence-activated cell sorting (FACS) and assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) were combined to analyse the chromatin state of lateral organ founder cells (LOFCs) in the peripheral zone of the Arabidopsis apetala1-1 cauliflower-1 double mutant inflorescence meristem. On a genome-wide level, we observed a striking correlation between transposase hypersensitive sites (THSs) detected by ATAC-seq and DNase I hypersensitive sites (DHSs). The DHSs, which were mostly broader, were often subdivided into several individual THSs, which correlated with phylogenetically conserved DNA sequences or enhancer elements. When chromatin accessibility was compared with available RNA-seq data, changes in THS configuration were reflected in gene activation or repression, and chromatin regions gained or lost transposase accessibility in direct correlation with gene expression levels in LOFCs. This was most pronounced immediately upstream of the transcription start site, where THSs were abundant genome-wide in a pattern complementary to the established H3K4me3 activation or H3K27me3 repression marks. At this resolution, combined FACS/ATAC-seq is widely applicable for detecting chromatin changes during cell-type specification and facilitates the detection of regulatory elements in plant promoters.
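
The THS/DHS correspondence described above is, at its core, an interval-overlap question. A minimal sketch, assuming peaks are given as half-open (chromosome, start, end) intervals as in BED files; the coordinates are made up, and a real analysis would typically use a dedicated tool such as bedtools intersect.

```python
# Minimal sketch: fraction of peaks in one set (e.g. ATAC-seq THSs) that overlap
# any peak in another set (e.g. DNase-seq DHSs). Intervals are half-open [start, end).
from bisect import bisect_left


def merge_by_chrom(peaks):
    """Sort and merge overlapping or touching intervals per chromosome."""
    by_chrom = {}
    for chrom, start, end in sorted(peaks):
        merged = by_chrom.setdefault(chrom, [])
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return by_chrom


def overlaps_any(chrom, start, end, merged, starts):
    """True if [start, end) overlaps a merged interval on the same chromosome."""
    if chrom not in merged:
        return False
    # Last merged interval that starts before `end`; intervals are disjoint and
    # sorted, so if this one ends at or before `start`, no earlier one can overlap.
    i = bisect_left(starts[chrom], end) - 1
    return i >= 0 and merged[chrom][i][1] > start


def fraction_overlapping(query_peaks, reference_peaks):
    """Share of query peaks that overlap at least one reference peak."""
    merged = merge_by_chrom(reference_peaks)
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in merged.items()}
    hits = sum(overlaps_any(c, s, e, merged, starts) for c, s, e in query_peaks)
    return hits / len(query_peaks) if query_peaks else 0.0


if __name__ == "__main__":
    dhs = [("chr1", 100, 300), ("chr1", 500, 800), ("chr2", 50, 150)]
    ths = [("chr1", 120, 180), ("chr1", 300, 350), ("chr1", 790, 900),
           ("chr2", 10, 40), ("chr3", 0, 100)]
    print(fraction_overlapping(ths, dhs))  # 0.4
```

The peak ("chr1", 300, 350) does not count, because half-open intervals that merely touch at position 300 share no base.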


2013, Vol 368 (1632), pp. 20130022
Author(s): Noboru Jo Sakabe, Marcelo A. Nobrega

The complex expression patterns observed for many genes are often regulated by distal transcriptional enhancers. Changes in the nucleotide sequences of enhancers may therefore lead to changes in gene expression, representing a central mechanism by which organisms evolve. With the development of chromatin immunoprecipitation (ChIP), in which discrete genomic regions bound by specific proteins can be identified, it is now possible to identify transcription factor binding events (putative cis-regulatory elements) across entire genomes. Comparing protein–DNA binding maps allows us, for the first time, to attempt to identify regulatory differences and infer global patterns of change in gene expression across species. Here, we review studies that used genome-wide ChIP to study the evolution of enhancers. The trend is one of high divergence of cis-regulatory elements between species, possibly compensated by extensive creation and loss of regulatory elements and rewiring of their target genes. We speculate on the meaning of the observed differences and note that, although ChIP experiments identify the biochemical event of protein–DNA interaction, they cannot determine whether the event results in a biological function; therefore, more studies are required to establish the effect of divergence in binding events on species-specific gene expression.


2014, Vol 35 (5), pp. 770-777
Author(s): Sharon Schlesinger, Stephen P. Goff

Retroviruses have evolved complex transcriptional enhancers and promoters that allow their replication in a wide range of tissue and cell types. Embryonic stem (ES) cells, however, characteristically suppress transcription of proviruses formed after infection by exogenous retroviruses and also of most members of the vast array of endogenous retroviruses in the genome. These cells have unusual profiles of transcribed genes and are poised to make rapid changes in those profiles upon induction of differentiation. Many of the transcription factors in ES cells control both host and retroviral genes coordinately, such that retroviral expression patterns can serve as markers of ES cell pluripotency. This overlap is not coincidental; retrovirus-derived regulatory sequences are often used to control cellular genes important for pluripotency. These sequences specify the temporal control, and perhaps the "noisy" control, of cellular genes that direct proper gene expression in primitive cells and their differentiating progeny. The evidence suggests that the viral elements have been domesticated for host needs, reflecting the wide-ranging exploitation of any and all available DNA sequences in assembling regulatory networks.


2020
Author(s): Duo Lv, Gang Wang, Yue Chen, Liang-Rong Xiong, Jing-Xian Sun, ...

Abstract
Background: Lectin receptor-like kinases (LecRLKs) are a class of plant membrane proteins involved in diverse functions, including plant development and stress responses. Although LecRLK families have been identified in a variety of plants, a comprehensive analysis has not yet been undertaken in cucumber (Cucumis sativus L.).
Results: In this study, 46 putative LecRLK genes were identified in the cucumber genome, including 23 G-type, 22 L-type and 1 C-type LecRLK genes. They were unevenly distributed across all 7 chromosomes, with a tendency to cluster. Most genes in the cucumber LecRLK (CsLecRLK) gene family lacked introns. In addition, the promoters of these genes contained many regulatory elements associated with phytohormone and stress responses. Transcriptome data revealed distinct expression patterns of CsLecRLK genes in various tissues. Furthermore, quantitative real-time PCR (qRT-PCR) analysis showed that each member of the CsLecRLK family had its own unique expression pattern under hormone and stress treatments.
Conclusion: This study provides a better understanding of the evolution and function of the LecRLK gene family in cucumber, and opens the possibility of exploring the roles that LecRLKs might play in the life cycle of cucumber.


2021
Author(s): Jakub Jankowski, Hye Kyung Lee, Julia Wilflingseder, Lothar Hennighausen

Summary
Recently, a short, interferon-inducible isoform of Angiotensin-Converting Enzyme 2 (ACE2), dACE2, was identified. ACE2 is a SARS-CoV-2 receptor, and changes in its renal expression have been linked to several human nephropathies. These changes were never analyzed in the context of dACE2, as its expression had not been investigated in the kidney. We used Human Primary Proximal Tubule (HPPT) cells to characterize genome-wide gene expression patterns after cytokine stimulation, with emphasis on the ACE2/dACE2 locus. Putative regulatory elements controlling dACE2 expression were identified using ChIP-seq and RNA-seq. qRT-PCR distinguishing between ACE2 and dACE2 revealed 300- and 600-fold upregulation of dACE2 by IFNα and IFNβ, respectively, while full-length ACE2 expression was almost unchanged. The JAK inhibitor ruxolitinib abolished STAT1 and dACE2 expression after interferon treatment. Finally, using RNA-seq, we identified a set of largely immune-related genes induced by cytokine treatment. These gene expression profiles provide new insights into the cytokine response of proximal tubule cells.
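
Fold changes like the 300- and 600-fold values above are commonly derived from qRT-PCR cycle thresholds (Ct) with the 2^(-ddCt) method; the abstract does not say how they were computed. A minimal sketch, assuming a single reference gene and about 100% amplification efficiency (a factor of 2 per cycle); the Ct values are hypothetical.

```python
# Minimal sketch of the 2^(-ddCt) relative-quantification method.
# Ct values are hypothetical; efficiency=2.0 assumes perfect doubling per cycle.
import math


def delta_delta_ct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """ddCt = (Ct_target - Ct_ref)_treated - (Ct_target - Ct_ref)_control."""
    return (ct_target_treated - ct_ref_treated) - (ct_target_control - ct_ref_control)


def fold_change(ddct, efficiency=2.0):
    """Relative expression of treated vs control: efficiency ** (-ddCt)."""
    return efficiency ** (-ddct)


def ddct_for_fold(fold, efficiency=2.0):
    """Inverse: the ddCt that corresponds to a given fold change."""
    return -math.log(fold, efficiency)


if __name__ == "__main__":
    ddct = delta_delta_ct(20.0, 18.0, 29.0, 18.0)
    print(ddct, fold_change(ddct))  # -9.0 512.0
    print(round(ddct_for_fold(300), 2), round(ddct_for_fold(600), 2))  # -8.23 -9.23
```

Under these assumptions, a 300-fold induction corresponds to the target crossing threshold roughly 8.2 cycles earlier relative to the reference gene, and doubling the fold change shifts ddCt by exactly one cycle.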
