SAPPHIRE: a neural network based classifier for σ70 promoter prediction in Pseudomonas

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Lucas Coppens ◽  
Rob Lavigne

Abstract Background: In silico promoter prediction represents an important challenge in bioinformatics, as it provides a first-line approach to identifying regulatory elements to support wet-lab experiments. Historically, available promoter prediction software has focused on sigma factor-associated promoters in the model organism E. coli. As a consequence, traditional promoter predictors yield suboptimal predictions when applied to other prokaryotic genera, such as Pseudomonas, a Gram-negative bacterium of crucial medical and biotechnological importance. Results: We developed SAPPHIRE, a promoter predictor for σ70 promoters in Pseudomonas. It relies on an artificial neural network that evaluates sequences on their similarity to the −35 and −10 boxes of σ70 promoters found experimentally in P. aeruginosa and P. putida. SAPPHIRE currently outperforms established predictive software when classifying Pseudomonas σ70 promoters and was built to allow further expansion in the future. Conclusions: SAPPHIRE is the first predictive tool for bacterial σ70 promoters in Pseudomonas. It is free, publicly available and can be accessed online at www.biosapphire.com. Alternatively, users can download the tool from this site as a Python 3 script for local use.
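To make "similarity to the −35 and −10 boxes" concrete, the following minimal sketch scans a sequence for the best-matching pair of hexamers. It assumes the E. coli σ70 consensus sequences TTGACA (−35) and TATAAT (−10) separated by a 15 to 19 nt spacer; it is not SAPPHIRE's neural network, Pseudomonas boxes may deviate from this consensus, and the function names are hypothetical.

```python
# Hypothetical sketch: identity-count scoring of a -35/-10 box pair.
# Assumes the E. coli sigma70 consensus; NOT SAPPHIRE's actual model.
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"
SPACER_RANGE = range(15, 20)  # 15..19 nt between the two hexamers


def matches(a: str, b: str) -> int:
    """Count positions where two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


def best_box_pair(seq: str):
    """Return (score, start of -35 box, spacer length) for the best pair.

    score = identities to the -35 consensus + identities to the -10
    consensus, so a perfect pair scores 12. Returns (-1, -1, -1) if the
    sequence is too short to hold any pair.
    """
    seq = seq.upper()
    best = (-1, -1, -1)
    for start35 in range(len(seq) - 6 + 1):
        box35 = seq[start35:start35 + 6]
        for spacer in SPACER_RANGE:
            start10 = start35 + 6 + spacer
            box10 = seq[start10:start10 + 6]
            if len(box10) < 6:
                break  # longer spacers would run even further off the end
            score = matches(box35, MINUS35) + matches(box10, MINUS10)
            if score > best[0]:
                best = (score, start35, spacer)
    return best


if __name__ == "__main__":
    candidate = "GC" + "TTGACA" + "A" * 17 + "TATAAT" + "GC"
    print(best_box_pair(candidate))
```

A hand-crafted identity count like this treats every position equally; a trained classifier can instead learn which positions and which deviations from the consensus matter for a given genus.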

2021 ◽  
Vol 7 ◽  
pp. e365
Author(s):  
Nikita Bhandari ◽  
Satyajeet Khare ◽  
Rahee Walambe ◽  
Ketan Kotecha

Gene promoters are key DNA regulatory elements positioned around transcription start sites and are responsible for regulating the gene transcription process. Various alignment-based, signal-based and content-based approaches have been reported for the prediction of promoters. However, since not all promoter sequences show explicit features, the prediction performance of these techniques is poor. Therefore, many machine learning and deep learning models have been proposed for promoter prediction. In this work, we studied methods for vector encoding and promoter classification using genome sequences of three distinct higher eukaryotes, viz. yeast (Saccharomyces cerevisiae), A. thaliana (plant) and human (Homo sapiens). We compared one-hot vector encoding with frequency-based tokenization (FBT) for data pre-processing on a 1-D Convolutional Neural Network (CNN) model. We found that FBT gives a shorter input dimension, reducing training time without affecting the sensitivity and specificity of classification. We employed deep learning techniques, mainly a CNN and a recurrent neural network with Long Short-Term Memory (LSTM), as well as a random forest (RF) classifier, for promoter classification at k-mer sizes of 2, 4 and 8. We found the CNN to be superior both in distinguishing promoters from non-promoter sequences (binary classification) and in species-specific classification of promoter sequences (multiclass classification). In summary, the contribution of this work lies in the use of a synthetic shuffled negative dataset and frequency-based tokenization for pre-processing. This study provides a comprehensive and generic framework for classification tasks in genomic applications and can be extended to various classification problems.
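The two pre-processing schemes compared in this abstract, plus a shuffled negative, can be sketched as follows. This is a minimal illustration assuming that frequency-based tokenization splits a sequence into non-overlapping k-mers and replaces each with its frequency rank in the training corpus; the paper's exact scheme may differ (for example in k-mer overlap or vocabulary handling), and all function names are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def one_hot(seq):
    """One row of 4 indicator values per base (order A, C, G, T)."""
    return [[1 if b == base else 0 for base in BASES] for b in seq.upper()]


def kmers(seq, k):
    """Non-overlapping k-mers; a trailing remainder shorter than k is dropped."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


def build_vocab(seqs, k):
    """Map each k-mer to its frequency rank (1 = most frequent).

    Ties are broken alphabetically; ID 0 is reserved for unseen k-mers.
    """
    counts = Counter(km for s in seqs for km in kmers(s, k))
    ranked = sorted(counts, key=lambda km: (-counts[km], km))
    return {km: rank for rank, km in enumerate(ranked, start=1)}


def tokenize(seq, vocab, k):
    """Replace each k-mer by its integer ID (0 if not in the vocabulary)."""
    return [vocab.get(km, 0) for km in kmers(seq, k)]


def shuffled_negative(seq, rng):
    """Permute the bases of a positive sequence: nucleotide composition is
    kept, while positional motifs such as promoter boxes are destroyed."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


if __name__ == "__main__":
    promoters = ["TTGACAATTAATCATCGGCTCGTATAATGT",
                 "TTTACAGCTAGCTCAGTCCTAGGTATAATG"]
    vocab = build_vocab(promoters, k=4)
    seq = promoters[0]
    print(len(one_hot(seq)) * 4, "one-hot values vs",
          len(tokenize(seq, vocab, 4)), "tokens")
    print(shuffled_negative(seq, random.Random(0)))
```

For a sequence of length L, the one-hot input holds 4L values, while k-mer tokenization yields about L/k integers, which is where the shorter input dimension comes from.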


2021 ◽  
Author(s):  
Vineet Thumuluri ◽  
Hannah-Marie Martiny ◽  
Jose J. Almagro Armenteros ◽  
Jesper Salomon ◽  
Henrik Nielsen ◽  
...  

Solubility and expression levels of proteins can be a limiting factor for large-scale studies and industrial production. By determining solubility and expression directly from the protein sequence, the success rate of wet-lab experiments can be increased. In this study, we focus on predicting, directly from the sequence, the solubility and usability for purification of proteins expressed in Escherichia coli. Our model, NetSolP, is based on deep-learning protein language models known as transformers, and we show that it achieves state-of-the-art performance and improves extrapolation across datasets. Because we find that current methods are built on biased datasets, we curate existing datasets using strict sequence-identity partitioning to minimize bias in the sequences.
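Strict sequence-identity partitioning means that no sequence in one fold is too similar to any sequence in another fold. A minimal sketch, assuming single-linkage clustering at a chosen identity threshold followed by assignment of whole clusters to folds. Here difflib's `SequenceMatcher.ratio()` is only a stand-in for alignment-based percent identity, which a real pipeline would compute with a dedicated aligner or clustering tool; the threshold and function names are hypothetical and not taken from NetSolP.

```python
from difflib import SequenceMatcher


def identity(a, b):
    """Crude similarity proxy in [0, 1] (stand-in for alignment identity)."""
    return SequenceMatcher(None, a, b).ratio()


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def cluster(seqs, threshold):
    """Single-linkage clusters: any pair at or above threshold is merged."""
    parent = list(range(len(seqs)))
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if identity(seqs[i], seqs[j]) >= threshold:
                parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(parent, i), []).append(i)
    return list(groups.values())


def partition(seqs, n_folds, threshold):
    """Assign whole clusters to folds (largest first, into the currently
    smallest fold), so every similar pair ends up in the same fold."""
    folds = [[] for _ in range(n_folds)]
    for members in sorted(cluster(seqs, threshold), key=len, reverse=True):
        min(folds, key=len).extend(members)
    return folds


if __name__ == "__main__":
    seqs = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGGGGGGGG", "GGGGGGGGGA"]
    print(partition(seqs, n_folds=2, threshold=0.8))
```

The all-pairs loop is quadratic, which is fine for illustration but is the reason large datasets are normally clustered with specialized tools.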


2016 ◽  
Vol 39 (1) ◽  
Author(s):  
Sharda Choudhary ◽  
Ravindra Singh ◽  
R. S. Meena ◽  
Geetika Jethra

In silico development of protein models in fenugreek (Trigonella foenum-graecum) using modern computational tools has opened up new avenues. The identification of fenugreek proteins with homology to protein domains from humans and E. coli shows that the data available on the Apiaceae family are very limited. The estimated molecular weight of the identified fenugreek AM-1 protein was 11372.8 Da, and the protein was predicted to be basic. A channel through which a ligand (an ion or a small molecule) might pass has been identified in a fenugreek transmembrane protein. These findings may be a valuable addition to the proteomic information available on fenugreek. Further validation can be performed using wet-lab experiments.
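A molecular weight and a basic/acidic call like those reported above are typically derived from the amino-acid sequence alone. A minimal sketch, assuming average residue masses as commonly tabulated (e.g., by ExPASy) and one textbook pKa set in a Henderson-Hasselbalch charge model; prediction tools use different pKa scales, so their pI values will differ somewhat. Calling a protein basic when its pI exceeds 7 is an assumed convention: the abstract does not say how the call was made, and the AM-1 sequence is not given here.

```python
# Average residue masses (Da): free amino acid minus one water.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini

# Assumed pKa values (one textbook set); other tools use other scales.
PKA_BASIC = {"K": 10.5, "R": 12.4, "H": 6.0}
PKA_ACIDIC = {"D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}
PKA_NTERM, PKA_CTERM = 9.69, 2.34


def molecular_weight(seq):
    """Average molecular weight (Da) of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq, ph):
    """Henderson-Hasselbalch estimate of net charge at the given pH."""
    seq = seq.upper()
    pos = 1 / (1 + 10 ** (ph - PKA_NTERM))
    pos += sum(seq.count(aa) / (1 + 10 ** (ph - pka))
               for aa, pka in PKA_BASIC.items())
    neg = 1 / (1 + 10 ** (PKA_CTERM - ph))
    neg += sum(seq.count(aa) / (1 + 10 ** (pka - ph))
               for aa, pka in PKA_ACIDIC.items())
    return pos - neg


def isoelectric_point(seq, tol=1e-4):
    """Bisection on pH: net charge falls monotonically as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def is_basic(seq):
    """Assumed convention: basic if the estimated pI is above 7."""
    return isoelectric_point(seq) > 7.0
```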


2021 ◽  
Vol 22 (3) ◽  
pp. 1015
Author(s):  
Vu Thu Thuy Nguyen ◽  
Jason Sallbach ◽  
Malena dos Santos Guilherme ◽  
Kristina Endres

Four drugs are currently approved by the FDA for the treatment of Alzheimer's disease (AD). Three of these drugs (donepezil, rivastigmine, and galantamine) belong to the class of acetylcholine esterase inhibitors. Memantine, an NMDA receptor antagonist, represents the fourth, and a combination of donepezil and memantine the fifth, treatment option. Recently, the gut and its inhabitants, the microbiome, have come into the focus of AD research, adding another important factor to therapeutic considerations. While the first data provide evidence that AD patients might carry an altered microbiome, the influence of administered drugs on gut properties and commensals has so far been largely ignored. However, the occurrence of digestive side effects with these drugs, together with the knowledge that cholinergic transmission is crucial for several gut functions, raises the question of whether, and how, this medication influences the gastrointestinal system and its microbial colonization. Here, using the mouse as a model organism, we investigated how assumed intestinal concentrations of the four drugs affect microbial viability, colonic propulsion, and properties of enteric neurons. None of the drugs administered ex vivo had a direct effect on fecal bacteria viability, and only a high dosage of memantine reduced biofilm formation by E. coli. Memantine was also the only compound that elevated calcium influx in enteric neurons, while all acetylcholine esterase inhibitors significantly reduced esterase activity in colonic tissue specimens and prolonged propulsion time. Neither the acetylcholine esterase inhibitors nor memantine affected the general viability or neurite outgrowth of enteric neurons. In sum, our findings indicate that all symptomatic AD drugs have the potential to affect distinct intestinal functions and thereby, directly or indirectly, microbial commensals.


Author(s):  
Yahui Long ◽  
Min Wu ◽  
Yong Liu ◽  
Jie Zheng ◽  
Chee Keong Kwoh ◽  
...  

Abstract Motivation: Synthetic lethality (SL) plays an increasingly critical role in targeted anticancer therapeutics. In addition, identifying SL interactions can create opportunities to selectively kill cancer cells without harming normal cells. Given the high cost of wet-lab experiments, in silico prediction of SL interactions can be a rapid and cost-effective alternative for guiding the experimental screening of candidate SL pairs. Several matrix factorization-based methods have recently been proposed for human SL prediction. However, they are limited in capturing the dependencies of neighbors. In addition, it is highly challenging to make accurate predictions for new genes without any known SL partners. Results: In this work, we propose a novel graph contextualized attention network named GCATSL to learn gene representations for SL prediction. First, we leverage different data sources to construct multiple feature graphs for genes, which serve as the feature inputs for our GCATSL method. Second, for each feature graph, we design a node-level attention mechanism to effectively capture the importance of local and global neighbors and to learn local and global representations for the nodes, respectively. We further exploit a multi-layer perceptron (MLP) to aggregate the original features with the local and global representations and thereby derive the feature-specific representations. Third, to derive the final representations, we design a feature-level attention mechanism that integrates the feature-specific representations by taking the importance of the different feature graphs into account. Extensive experimental results on three datasets under different settings demonstrate that our GCATSL model consistently outperforms 14 state-of-the-art methods. In addition, case studies further validated the effectiveness of our proposed model in identifying novel SL pairs.
Availability: Python code and datasets are freely available on GitHub (https://github.com/longyahui/GCATSL) and Zenodo (https://zenodo.org/record/4522679) under the MIT license.
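The two attention stages described in this abstract can be illustrated with a generic pure-Python sketch: a GAT-style node-level attention layer and a HAN-style feature-level attention that fuses one node's embeddings from several feature graphs. This is an assumption-laden illustration of the general technique, not GCATSL's code (which is in the repository linked above); it omits the output nonlinearity, multi-head attention, the local/global split and the MLP, and all names are hypothetical.

```python
import math


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(xs):
    top = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def node_attention(h, adj, W, a):
    """GAT-style layer over N(i) plus i itself:
    alpha_ij = softmax_j(LeakyReLU(a . [W h_i || W h_j])),
    out_i    = sum_j alpha_ij * W h_j   (output nonlinearity omitted)."""
    Wh = [matvec(W, x) for x in h]
    out = []
    for i, wh_i in enumerate(Wh):
        nbrs = [i] + [j for j in adj[i] if j != i]
        scores = [leaky_relu(sum(p * q for p, q in zip(a, wh_i + Wh[j])))
                  for j in nbrs]
        alpha = softmax(scores)
        out.append([sum(w * Wh[j][d] for w, j in zip(alpha, nbrs))
                    for d in range(len(wh_i))])
    return out


def feature_attention(reps, M, q):
    """Fuse one node's K feature-graph embeddings z_k:
    beta_k = softmax_k(q . tanh(M z_k)),  fused = sum_k beta_k * z_k."""
    scores = [sum(qi * math.tanh(v) for qi, v in zip(q, matvec(M, z)))
              for z in reps]
    beta = softmax(scores)
    fused = [sum(b * z[d] for b, z in zip(beta, reps))
             for d in range(len(reps[0]))]
    return fused, beta


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy node features
    adj = [[1], [0, 2], [1]]                  # toy path graph 0-1-2
    W = [[1.0, 0.0], [0.0, 1.0]]
    local = node_attention(h, adj, W, a=[0.5, -0.5, 1.0, 0.2])
    print(feature_attention([local[1], h[1]], W, q=[1.0, -1.0]))
```

Attention weights are normalized per neighborhood (node level) and per node across feature graphs (feature level), which is what lets such a model weight neighbors and data sources differently.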


Author(s):  
Katalin Csiszár ◽  
Tamás Lukacsovich ◽  
Pál Venetianer

2018 ◽  
Vol 85 (2) ◽  
Author(s):  
Shireen M. Kotay ◽  
Rodney M. Donlan ◽  
Christine Ganim ◽  
Katie Barry ◽  
Bryan E. Christensen ◽  
...  

ABSTRACT An alarming rise in hospital outbreaks implicating hand-washing sinks has led to widespread acknowledgment that sinks are a major reservoir of antibiotic-resistant pathogens in patient care areas. An earlier study using green fluorescent protein (GFP)-expressing Escherichia coli (GFP-E. coli) as a model organism demonstrated dispersal from drain biofilms in contaminated sinks. The present study further characterizes the dispersal of microorganisms from contaminated sinks. Replicate hand-washing sinks were inoculated with GFP-E. coli, and dispersal was measured using qualitative (settle plates) and quantitative (air sampling) methods. When bacteria were present on the drain, dispersal caused by faucet water was captured by both settle plates and air sampling. In contrast, no dispersal was captured in the absence of, or between, faucet events, amending an earlier theory that bacteria aerosolize from the P-trap and disperse. Numbers of dispersed GFP-E. coli cells diminished substantially within 30 minutes after faucet usage, suggesting that the organisms were associated with larger droplet-sized particles that do not remain suspended in the air for long periods. IMPORTANCE Among the possible environmental reservoirs in a patient care environment, sink drains are increasingly recognized as a potential reservoir of multidrug-resistant health care-associated pathogens for hospitalized patients. With increasing antimicrobial resistance limiting therapeutic options for patients, a better understanding of how pathogens disseminate from sink drains is urgently needed. Once this knowledge gap has narrowed, interventions can be engineered to decrease or eliminate transmission from hospital sink drains to patients. The current study further defines the mechanisms of transmission for bacteria that colonize sink drains.
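The inference that dispersed cells were carried on larger droplets can be checked with a back-of-the-envelope settling estimate. A minimal sketch, assuming spherical water droplets settling in still air under Stokes' law, v = (ρp − ρf) g d² / (18 μ), with a hypothetical release height of 1.5 m. It ignores evaporation, air currents and droplet shape, so the numbers are only orders of magnitude, and the function names are illustrative.

```python
def stokes_velocity(d, rho_p=1000.0, rho_f=1.2, mu=1.81e-5, g=9.81):
    """Terminal settling velocity (m/s) of a sphere of diameter d (m).

    Defaults (assumed): water droplet (1000 kg/m^3) in air (1.2 kg/m^3,
    dynamic viscosity 1.81e-5 Pa*s). Valid only at low Reynolds number.
    """
    return (rho_p - rho_f) * g * d ** 2 / (18 * mu)


def settling_time(d, height=1.5, **kwargs):
    """Seconds to fall `height` metres (hypothetical 1.5 m default)."""
    return height / stokes_velocity(d, **kwargs)


if __name__ == "__main__":
    for d_um in (1, 10):
        t = settling_time(d_um * 1e-6)
        print(f"{d_um} um droplet: {t / 60:.1f} min to settle 1.5 m")
```

Under these assumptions a 10 µm droplet falls 1.5 m in about 8 minutes, whereas a 1 µm droplet needs roughly 14 hours, which is consistent with larger droplets clearing from the air well within a 30-minute window.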

