Predicting gene expression level in E. coli from mRNA sequence information

2016 ◽  
Author(s):  
Linlin Zhao ◽  
Nima Abedpour ◽  
Christopher Blum ◽  
Petra Kolkhof ◽  
Mathias Beller ◽  
...  

Motivation: The accurate characterization of the translational mechanism is crucial for enhancing our understanding of the relationship between genotype and phenotype. In particular, predicting the impact of genetic variants on gene expression will make it possible to optimize specific pathways and functions for engineering new biological systems. In this context, the development of accurate methods for predicting translation efficiency from the nucleotide sequence is a key challenge in computational biology. Methods: In this work we present PGExpress, a binary classifier that discriminates between mRNA sequences with low and high translation efficiency in E. coli. The PGExpress algorithm takes as input 12 features corresponding to RNA folding and anti-Shine-Dalgarno hybridization free energies. The method was trained on a set of 1,772 sequence variants (WT-High) of 137 essential E. coli genes. For each gene, we considered 13 sequence variants of the first 33 nucleotides, encoding the same amino acids, followed by the superfolder GFP. Each gene variant is represented by sequence blocks that include the Ribosome Binding Site (RBS), the first 33 nucleotides of the coding region (C33), the remaining part of the coding region (CC), and their combinations. Results: Our logistic regression-based tool (PGExpress) was trained using a 20-fold gene-based cross-validation procedure on the WT-High dataset. In this test PGExpress achieved an overall accuracy of 74%, a Matthews correlation coefficient of 0.49, and an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.81. Tested on 3 sets of sequences with different Ribosome Binding Sites, PGExpress reaches a similar AUC. Finally, we validated our method by performing in-house experiments on five newly generated mRNA sequence variants. The predictions of the expression level of the new variants are in agreement with our experimental results in E. coli.
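
The abstract reports accuracy and a Matthews correlation coefficient under a gene-based cross-validation. The following is a minimal sketch of those two ideas, not the PGExpress code: the function names `gene_based_folds` and `accuracy_and_mcc` are hypothetical, and the round-robin fold assignment is an assumption about how variants of one gene could be kept in the same fold.

```python
import math


def gene_based_folds(gene_ids, n_folds=20):
    """Assign a fold index to each sample so that all variants of a gene
    land in the same fold (hypothetical round-robin scheme)."""
    genes = sorted(set(gene_ids))
    fold_of_gene = {gene: i % n_folds for i, gene in enumerate(genes)}
    return [fold_of_gene[gene] for gene in gene_ids]


def accuracy_and_mcc(y_true, y_pred):
    """Return (accuracy, Matthews correlation coefficient) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc
```

Keeping every variant of a gene in one fold prevents near-identical sequences from appearing in both the training and the test split, which would otherwise inflate the reported scores.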

2019 ◽  
Author(s):  
Mariana H. Moreira ◽  
Géssica C. Barros ◽  
Rodrigo D. Requião ◽  
Silvana Rossetto ◽  
Tatiana Domitrovic ◽  
...  

ABSTRACT Translation initiation is a critical step in the regulation of protein synthesis, and it is subject to different control mechanisms, such as 5’ UTR secondary structure and initiation codon context, that can influence the rates at which initiation and consequently translation occur. For some genes, translation elongation also affects the rate of protein synthesis. With a GFP library containing nearly all possible combinations of nucleotides from the 3rd to the 5th codon positions in the protein coding region of the mRNA, it was previously demonstrated that some nucleotide combinations increased GFP expression up to four orders of magnitude. While it is clear that the codon region from positions 3 to 5 can influence protein expression levels of artificial constructs, its impact on endogenous proteins is still unknown. Through bioinformatics analysis, we identified the nucleotide combinations of the GFP library in Escherichia coli genes and examined the correlation between the expected levels of translation according to the GFP data and the experimental measures of protein expression. We observed that E. coli genes were enriched with the nucleotide compositions that enhanced protein expression in the GFP library, but surprisingly, these compositions seemed to affect the translation efficiency only marginally. Nevertheless, our data indicate that different enterobacteria present similar nucleotide composition enrichment as E. coli, suggesting an evolutionary pressure towards the conservation of short translational enhancer sequences.
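
To make the "3rd to 5th codon positions" concrete, here is a minimal sketch of how that 9-nucleotide window could be pulled out of a coding sequence and tallied across genes. The helper names `codons_3_to_5` and `tally_windows` are hypothetical, and the sketch assumes each input sequence starts at the first base of the start codon; it is not the authors' pipeline.

```python
from collections import Counter


def codons_3_to_5(cds):
    """Return the 9 nucleotides covering codons 3, 4 and 5 of a coding
    sequence that begins at its start codon (codon 1 = bases 1-3)."""
    seq = cds.upper().replace("U", "T")
    if len(seq) < 15:
        raise ValueError("coding sequence shorter than five codons")
    # Codon 3 starts at 0-based index 6; codon 5 ends at index 14.
    return seq[6:15]


def tally_windows(coding_sequences):
    """Count how often each codon 3-5 window occurs in a set of genes."""
    return Counter(codons_3_to_5(cds) for cds in coding_sequences)
```

For example, `codons_3_to_5("ATGAAACCCGGGTTTAAA")` yields the window spanning the codons CCC, GGG and TTT.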


Gene ◽  
2021 ◽  
pp. 145862
Author(s):  
Lu-Qiang Zhang ◽  
Jun-Jie Liu ◽  
Li Liu ◽  
Guo-Liang Fan ◽  
Yan-Nan Li ◽  
...  

2020 ◽  
Vol 202 (9) ◽  
Author(s):  
Tien G. Nguyen ◽  
Diego A. Vargas-Blanco ◽  
Louis A. Roberts ◽  
Scarlet S. Shell

ABSTRACT Regulation of gene expression is critical for Mycobacterium tuberculosis to tolerate stressors encountered during infection and for nonpathogenic mycobacteria such as Mycobacterium smegmatis to survive environmental stressors. Unlike better-studied models, mycobacteria express ∼14% of their genes as leaderless transcripts. However, the impacts of leaderless transcript structures on mRNA half-life and translation efficiency in mycobacteria have not been directly tested. For leadered transcripts, the contributions of 5′ untranslated regions (UTRs) to mRNA half-life and translation efficiency are similarly unknown. In M. tuberculosis and M. smegmatis, the essential sigma factor, SigA, is encoded by a transcript with a relatively short half-life. We hypothesized that the long 5′ UTR of sigA causes this instability. To test this, we constructed fluorescence reporters and measured protein abundance, mRNA abundance, and mRNA half-life and calculated relative transcript production rates. The sigA 5′ UTR conferred an increased transcript production rate, shorter mRNA half-life, and decreased apparent translation rate compared to a synthetic 5′ UTR commonly used in mycobacterial expression plasmids. Leaderless transcripts appeared to be translated with similar efficiency as those with the sigA 5′ UTR but had lower predicted transcript production rates. A global comparison of M. tuberculosis mRNA and protein abundances failed to reveal systematic differences in protein/mRNA ratios for leadered and leaderless transcripts, suggesting that variability in translation efficiency is largely driven by factors other than leader status. Our data are also discussed in light of an alternative model that leads to different conclusions and suggests leaderless transcripts may indeed be translated less efficiently. IMPORTANCE Tuberculosis, caused by Mycobacterium tuberculosis, is a major public health problem killing 1.5 million people globally each year. During infection, M. tuberculosis must alter its gene expression patterns to adapt to the stress conditions it encounters. Understanding how M. tuberculosis regulates gene expression may provide clues for ways to interfere with the bacterium’s survival. Gene expression encompasses transcription, mRNA degradation, and translation. Here, we used Mycobacterium smegmatis as a model organism to study how 5′ untranslated regions affect these three facets of gene expression in multiple ways. We furthermore provide insight into the expression of leaderless mRNAs, which lack 5′ untranslated regions and are unusually prevalent in mycobacteria.
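
The abstract does not state how relative transcript production rates were derived from mRNA abundance and half-life. One common way to do it, shown here purely as an illustrative assumption, is a steady-state model with first-order decay, where production balances degradation: rate = abundance × ln 2 / half-life. The function names are hypothetical.

```python
import math


def decay_constant(half_life):
    """First-order decay constant k = ln(2) / half-life."""
    return math.log(2) / half_life


def production_rate(mrna_abundance, half_life):
    """Steady-state production rate, assuming production equals decay
    (d[mRNA]/dt = rate - k * abundance = 0)."""
    return mrna_abundance * decay_constant(half_life)


def relative_production_rate(abund_a, half_life_a, abund_b, half_life_b):
    """Production rate of transcript A relative to transcript B."""
    return production_rate(abund_a, half_life_a) / production_rate(abund_b, half_life_b)
```

Under this assumption, two transcripts with equal abundance but a twofold difference in half-life differ twofold in inferred production rate, with the shorter-lived transcript being produced faster. Abundance and half-life can be in any units as long as both transcripts use the same ones.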


Viruses ◽  
2020 ◽  
Vol 12 (7) ◽  
pp. 749 ◽  
Author(s):  
Melanie Hiltbrunner ◽  
Gerald Heckel

Research on the ecology and evolution of viruses is often hampered by the limitation of sequence information to short parts of the genomes or single genomes derived from cultures. In this study, we use hybrid sequence capture enrichment in combination with high-throughput sequencing to provide efficient access to full genomes of European hantaviruses from rodent samples obtained in the field. We applied this methodology to Tula (TULV) and Puumala (PUUV) orthohantaviruses for which analyses from natural host samples are typically restricted to partial sequences of their tri-segmented RNA genome. We assembled a total of ten novel hantavirus genomes de novo with very high coverage (on average >99%) and sequencing depth (average >247×). A comparison with partial Sanger sequences indicated an accuracy of >99.9% for the assemblies. An analysis of two common vole (Microtus arvalis) samples infected with two TULV strains each allowed for the de novo assembly of all four TULV genomes. Combining the novel sequences with all available TULV and PUUV genomes revealed very similar patterns of sequence diversity along the genomes, except for remarkably higher diversity in the non-coding region of the S-segment in PUUV. The genomic distribution of polymorphisms in the coding sequence was similar between the species, but differed between the segments with the highest sequence divergence of 0.274 for the M-segment, 0.265 for the S-segment, and 0.248 for the L-segment (overall 0.258). Phylogenetic analyses showed the clustering of genome sequences consistent with their geographic distribution within each species. Genome-wide data yielded extremely high node support values, despite the impact of strong mutational saturation that is expected for hantavirus sequences obtained over large spatial distances. 
We conclude that genome sequencing based on capture enrichment protocols provides an efficient means for ecological and evolutionary investigations of hantaviruses at an unprecedented completeness and depth.


Author(s):  
Huang Yaoxing ◽  
Yu Danchun ◽  
Sun Xiaojuan ◽  
Jiang Shuman ◽  
Yan Qingqing ◽  
...  

Gastric cancer (GC) is one of the most common causes of cancer-related deaths in the world. This cancer has been regarded as a biologically and genetically heterogeneous disease with a poorly understood carcinogenesis at the molecular level. Thousands of biomarkers and susceptible loci have been explored via experimental and computational methods, but their effects on disease outcome are still unknown. Genome-wide association studies (GWAS) have identified multiple susceptible loci for GC, but due to linkage disequilibrium (LD), single-nucleotide polymorphisms (SNPs) may fall within non-coding regions and exert their biological function by modulating the gene expression level. In this study, we collected 1,091 cases and 410,350 controls from the GWAS catalog database. Integrating these with gene expression level data obtained from stomach tissue, we applied a machine learning-based method to predict GC-susceptible genes. As a result, we identified 787 novel susceptible genes related to GC, which will provide new insight into the genetic and biological basis for the mechanism and pathology of GC development.


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Zhiliang Yu ◽  
Ning Zhou ◽  
Hua Qiao ◽  
Juanping Qiu

L-amino acid oxidase (LAAO) is attracting increasing attention due to its broad and important biological functions. Recently, an LAAO-producing marine microorganism (strain B3) was isolated from the intertidal zone of Dinghai sea area, China. Physiological, biochemical, and molecular identifications together with phylogenetic analysis congruously suggested that it belonged to the genus Pseudoalteromonas. Therefore, it was designated as Pseudoalteromonas sp. B3. Its capability of LAAO production was cross-confirmed by measuring the products H2O2, α-keto acids, and NH4+ in the oxidation reaction. Two rounds of PCR were performed to obtain the entire B3-LAAO gene sequence, 1608 bp in length and encoding 535 amino acid residues. The deduced amino acid sequence had a calculated molecular mass of 60 kDa, supporting the SDS-PAGE result. Like most flavoproteins, B3-LAAO also contained two conserved typical motifs, the GG-motif and the βαβ-dinucleotide-binding domain motif. On the other hand, its unique substrate spectra and sequence information suggested that B3-LAAO was a novel LAAO. Our results revealed that it could be functionally expressed in E. coli BL21(DE3) using the vectors pET28b(+) and pET20b(+). However, compared with the native LAAO, the expression level of the recombinant one was relatively low, most probably due to the formation of inclusion bodies. Several approaches are currently being tested in our lab to increase its expression level.
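
The gene length, residue count, and molecular mass quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the 1608 bp figure includes the stop codon and uses the rule-of-thumb average of about 110 Da per amino acid residue; both are assumptions for illustration, and the function names are hypothetical.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average mass per residue (assumption)


def residues_from_orf(orf_length_bp, includes_stop=True):
    """Number of encoded residues for an open reading frame of given length."""
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_length_bp // 3
    # The stop codon does not encode an amino acid.
    return codons - 1 if includes_stop else codons


def approx_mass_kda(n_residues):
    """Rough protein mass estimate in kDa from the residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0
```

With these assumptions, 1608 bp corresponds to 536 codons, hence 535 residues, and a rough mass of about 59 kDa, in line with the 60 kDa reported.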


2017 ◽  
Author(s):  
Olivier Borkowski ◽  
Carlos Bricio ◽  
Michaela Murgiano ◽  
Guy-Bart Stan ◽  
Tom Ellis

Translating heterologous proteins places significant burden on host cells, consuming expression resources leading to slower cell growth and productivity. Yet predicting the cost of protein production for any gene is a major challenge, as multiple processes and factors determine translation efficiency. Here, to enable prediction of the cost of gene expression in bacteria, we describe a standard cell-free lysate assay that determines the relationship between in vivo and cell-free measurements and γ, a relative measure of the resource consumption when a given protein is expressed. When combined with a computational model of translation, this enables prediction of the in vivo burden placed on growing E. coli cells for a variety of proteins of different functions and lengths. Using this approach, we can predict the burden of expressing multigene operons of different designs and differentiate between the fraction of burden related to gene expression compared to action of a metabolic pathway.


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Jacek Pietrzak ◽  
Marek Mirowski ◽  
Rafał Świechowski ◽  
Damian Wodziński ◽  
Agnieszka Wosiak ◽  
...  

Acute myeloid leukemia (AML) is a group of hematological neoplasms characterized by a heterogeneous course and high mortality. An important factor in the neoplastic process is the metalloproteinases, proteolytic enzymes capable of degrading various components of the extracellular matrix, which take an active part in modifying the functioning of the cell, including transformation to a cancer cell. They interact with numerous signaling pathways responsible for cell growth, proliferation, or apoptosis. In the present study, changes in the expression of the MMP2, MMP9, and MMP16 genes between patients with AML and people without cancer were examined. The impact of cytogenetic changes in neoplastic cells on the expression level of MMP2, MMP9, and MMP16 was also assessed, as well as the impact of the altered expression on the effectiveness of the first cycle of remission-inducing therapy. To evaluate the expression of the studied genes MMP2, MMP9, and MMP16, a SYBR Green-based real-time PCR method was used; the reference gene was GAPDH. For two of the investigated genes, MMP2 and MMP16, a lower expression level was observed in patients with AML when compared to healthy people. The MMP9 gene expression level did not differ between patients with AML and healthy individuals, which may indicate a different regulation of gene expression in acute myeloid leukemia. However, no correlation was observed between the expression of any of the tested metalloproteinase genes and the result of cytoreductive treatment or the presence of cytogenetic changes. The obtained results show that the expression of the MMP2 and MMP16 genes is reduced while the expression of MMP9 is unchanged in patients with acute myeloid leukemia. This may indicate a different regulation of the expression of these genes: possible disruptions in gene transcription or posttranscriptional mechanisms may affect the MMP2 and MMP16 genes without affecting the level of MMP9 expression. The results obtained in AML patients contrast with those for various types of solid tumors, where increased expression is usually observed.


2016 ◽  
Vol 94 (2) ◽  
pp. 95-100 ◽  
Author(s):  
S. Austin Hammond ◽  
Christopher J. Nelson ◽  
Caren C. Helbing

Herpetofauna (amphibians and reptiles) and fish represent important sentinel and indicator species for environmental and ecosystem health. It is widely accepted that the epigenome plays an important role in gene expression regulation. Environmental stimuli, including temperature and pollutants, influence gene activity, and there is growing evidence demonstrating that an important mechanism is through modulation of the epigenome. This has been primarily studied in human and mammalian models; relatively little is known about the impact of environmental conditions or pollutants on herpetofauna or fish epigenomes and the regulatory consequences of these changes on gene expression. Herein we review recent studies that have begun to address this deficiency, which have mainly focused on limited specific epigenetic marks and individual genes or large-scale global changes in DNA methylation, owing to the comparative ease of measurement. Greater understanding of the epigenetic influences of these environmental factors will depend on increased availability of relevant species-specific genomic sequence information to facilitate chromatin immunoprecipitation and DNA methylation experiments.

