Predicting gene expression level in E. coli from mRNA sequence information

Author(s):  
Linlin Zhao ◽  
Nima Abedpour ◽  
Christopher Blum ◽  
Petra Kolkhof ◽  
Mathias Beller ◽  
...  
2016 ◽  

Motivation: The accurate characterization of the translational mechanism is crucial for enhancing our understanding of the relationship between genotype and phenotype. In particular, predicting the impact of genetic variants on gene expression will make it possible to optimize specific pathways and functions for engineering new biological systems. In this context, the development of accurate methods for predicting translation efficiency from the nucleotide sequence is a key challenge in computational biology. Methods: In this work we present PGExpress, a binary classifier that discriminates between mRNA sequences with low and high translation efficiency in E. coli. The PGExpress algorithm takes as input 12 features corresponding to RNA folding and anti-Shine-Dalgarno hybridization free energies. The method was trained on a set of 1,772 sequence variants (WT-High) of 137 essential E. coli genes. For each gene, we considered 13 sequence variants of the first 33 nucleotides, encoding the same amino acids and followed by the superfolder GFP. Each gene variant is represented by sequence blocks that include the Ribosome Binding Site (RBS), the first 33 nucleotides of the coding region (C33), the remaining part of the coding region (CC), and their combinations. Results: Our logistic regression-based tool (PGExpress) was trained using a 20-fold gene-based cross-validation procedure on the WT-High dataset. In this test PGExpress achieved an overall accuracy of 74%, a Matthews correlation coefficient of 0.49 and an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.81. Tested on 3 sets of sequences with different Ribosome Binding Sites, PGExpress reaches a similar AUC. Finally, we validated our method by performing in-house experiments on five newly generated mRNA sequence variants. The predictions of the expression level of the new variants are in agreement with our experimental results in E. coli.
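The evaluation terms used in this abstract (accuracy, Matthews correlation coefficient, gene-based cross-validation) can be illustrated with a minimal sketch. This is not the PGExpress code: the function names, the round-robin fold assignment, and any counts passed in are assumptions made for illustration only.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the usual convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def gene_based_folds(gene_ids, n_folds):
    """Assign a fold index to every sample so that all variants of a gene
    land in the same fold (hypothetical round-robin assignment).
    """
    unique_genes = sorted(set(gene_ids))
    fold_of_gene = {g: i % n_folds for i, g in enumerate(unique_genes)}
    return [fold_of_gene[g] for g in gene_ids]
```

Keeping all variants of one gene in a single fold prevents near-identical sequences from appearing in both the training and the test split, which is presumably why a gene-based split is used rather than a random one.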


2021 ◽  
Author(s):  
Wen Sun ◽  
Ke Yang ◽  
Risheng Li ◽  
Tianqing Chen ◽  
Longfei Xia ◽  
...  

Abstract Using samples collected in Shahe Reservoir in the upper North Canal in China, this research analyzes the structure of the microbial community in sediment and the gene expression levels of two typical pathogenic bacteria (Escherichia coli and Enterococcus), and their relationship with environmental factors including total nitrogen (TN) and total phosphorus (TP). The study of samples collected from the surface (0–20 cm) and sediment cores shows that, in the horizontal distribution, the absolute gene expression level of E. coli in the sediment is higher than the relative gene expression level downstream of the reservoir and in the contaminated area. In the vertical distribution, the absolute gene expression level of the two pathogenic bacteria in the sediment tends to decrease with increasing depth, although the relative gene expression level has its highest value at 10–30 cm depth. The relative gene expression level of the two pathogenic bacteria is much greater in the sediment of Shahe Reservoir, where the horizontal group structure includes Clostridium sensu stricto, unclassified Anaerolineaceae, and Povalibacter, while Anaerolineaceae is much more abundant in the group structure of the vertical distribution. Pearson correlation analysis suggests a positive correlation in the horizontal distribution between E. coli and both TN and TP (P < 0.05) and between Enterococcus and TP (P < 0.05). The results suggest that the amount of pathogenic bacteria in the sediment in Shahe Reservoir is most likely due to water eutrophication.
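For readers unfamiliar with the statistic, the Pearson correlation coefficient mentioned above can be computed as in the following minimal sketch. The function name is an assumption, no data from the study are reproduced, and the significance test (the P value) is not covered here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

A value near +1 for paired measurements such as gene expression level and TN concentration at the same sampling sites would correspond to the positive correlation the abstract reports.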


2010 ◽  
Vol 27 ◽  
pp. S66
Author(s):  
M. Piechota ◽  
A. Banaszewska ◽  
E. Guzniczak ◽  
G. Rosinski ◽  
T. Siminiak ◽  
...  

Gene ◽  
2021 ◽  
pp. 145862
Author(s):  
Lu-Qiang Zhang ◽  
Jun-Jie Liu ◽  
Li Liu ◽  
Guo-Liang Fan ◽  
Yan-Nan Li ◽  
...  

Author(s):  
Rajnics P ◽  
Kellner A ◽  
Nagy F ◽  
Alföldi V ◽  
...  

Purpose: An elevated level of Lipocalin-2 (LCN2), a new acute phase adipokine, has been described after ischemic stroke. Some researchers suggest that LCN2 originates from the infiltrating neutrophils and other cells in the brain after stroke. Others have measured elevated LCN2 expression in arteriosclerotic plaque. We therefore investigated the relative LCN2 gene expression level of blood neutrophil granulocytes in patients with ischemic stroke to assess whether elevated LCN2 is the cause or the consequence of ischemic stroke. Methods: Laboratory and anamnestic data that could have a role in the development of thrombo-embolic events were collected from patients with ischemic stroke. An RNA-based method was used to evaluate the relative gene expression level of LCN2. We calculated the Odds Ratio (OR) and Confidence Interval (CI) for the association between LCN2 and ischemic stroke. Results: 34 samples were available for evaluation. The LCN2 relative gene expression level was decreased in 12 cases. In this group, 91% of patients had Atrial Fibrillation (AF) at the time of hospitalisation. The mean LCN2 relative gene expression value was 64.25% (range: 34%-115%) in patients with AF. It was significantly lower than in patients with normal sinus rhythm (409.2%; range: 127%-1127%; p=0.0003). An elevated LCN2 relative gene expression level significantly (p=0.012) increases the risk of stroke (OR: 12.6) independently of other factors. Conclusions: A high LCN2 expression level seems to have a strong positive predictive value for ischemic stroke, and may be useful in thrombotic risk stratification of plaque vulnerability in these patients.
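As an illustration of the odds ratio and confidence interval named in the Methods, the sketch below computes an unadjusted OR from a 2x2 table with a Woolf (log-based) interval. It is an assumption that this resembles the calculation; the abstract does not give the table or the method, and an OR reported as independent of other factors would normally come from an adjusted model instead. The function name and cell layout are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf confidence interval.

    a: exposed cases        b: exposed non-cases
    c: unexposed cases      d: unexposed non-cases
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

An interval whose lower bound stays above 1 indicates an association at the chosen confidence level; with small samples such as the 34 reported here, the interval is typically wide.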

