Computational prediction and interpretation of both general and specific types of promoters in Escherichia coli by exploiting a stacked ensemble-learning framework

Author(s):  
Fuyi Li ◽  
Jinxiang Chen ◽  
Zongyuan Ge ◽  
Ya Wen ◽  
Yanwei Yue ◽  
...  

Abstract Promoters are short consensus sequences of DNA that are responsible for the activation or repression of gene transcription. There are many types of promoters in bacteria, and they play important roles in initiating gene transcription. Therefore, solving promoter-identification problems has important implications for improving the understanding of their functions. To this end, computational methods targeting promoter classification have been established; however, their performance remains unsatisfactory. In this study, we present a novel stacked-ensemble approach (termed SELECTOR) for identifying promoters and classifying them by type. SELECTOR combines four feature encodings (the composition of k-spaced nucleic acid pairs, parallel correlation pseudo-dinucleotide composition, position-specific trinucleotide propensity based on single-strand, and DNA strand features) and uses five popular tree-based ensemble learning algorithms to build a stacked model. Both 5-fold cross-validation tests using benchmark datasets and independent tests using a newly collected independent test dataset showed that SELECTOR outperformed state-of-the-art methods in predicting both general and specific types of promoters in Escherichia coli. Furthermore, this framework provides interpretations that aid understanding of the model's success by leveraging the Shapley Additive exPlanation algorithm, thereby highlighting the features most relevant for predicting both general and specific types of promoters and overcoming the limitations of existing ‘Black-box’ approaches, which are unable to reveal causal relationships from large numbers of initially encoded features.
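One of the encodings named in the abstract, the composition of k-spaced nucleic acid pairs, can be illustrated with a short sketch. The version below assumes the commonly used definition: for each gap k from 0 to a chosen maximum, count the 16 ordered nucleotide pairs whose members are separated by k positions, then divide by the number of pairs counted. The function name `cksnap`, the default `k_max=2`, and the handling of non-ACGT characters are assumptions made for illustration; the abstract does not specify the parameters used in SELECTOR.

```python
from itertools import product

ALPHABET = "ACGT"
# The 16 ordered nucleotide pairs, in a fixed order: AA, AC, AG, AT, CA, ...
PAIRS = ["".join(p) for p in product(ALPHABET, repeat=2)]


def cksnap(seq, k_max=2):
    """Return k-spaced pair frequencies for gaps k = 0..k_max.

    The result has 16 * (k_max + 1) values: one block of 16 pair
    frequencies per gap. Pairs containing a character outside ACGT
    are skipped (an assumption; other conventions exist).
    """
    seq = seq.upper()
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        # Position i is paired with position i + k + 1 (k bases in between).
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


vec = cksnap("ACGT", k_max=1)
print(len(vec))  # 32 values: 16 pairs for k=0, then 16 pairs for k=1
```

For the sequence ACGT with gap 0, the pairs AC, CG and GT each occur once out of three, so each gets frequency 1/3; with gap 1, AG and CT each get 1/2. A fixed-length vector like this is what a tree-based learner would take as input.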

2019 ◽  
Vol 20 (5) ◽  
pp. 565-578 ◽  
Author(s):  
Lidong Wang ◽  
Ruijun Zhang

Ubiquitination is an important post-translational modification (PTM) process for the regulation of protein functions, which is associated with cancer, cardiovascular and other diseases. Recent initiatives have focused on the detection of potential ubiquitination sites with the aid of physicochemical test approaches in conjunction with the application of computational methods. The identification of ubiquitination sites using laboratory tests is especially susceptible to the temporality and reversibility of the ubiquitination processes, and is also costly and time-consuming. It has been demonstrated that computational methods are effective in extracting potential rules or inferences from biological sequence collections. To date, computational strategies have been among the critical research approaches applied to the identification of ubiquitination sites, and numerous state-of-the-art computational methods based on machine learning and statistical analysis have been developed for this task. In the present study, the construction of benchmark datasets is summarized, together with feature representation methods, feature selection approaches and the classifiers involved in several previous publications. In an attempt to explore pertinent development trends for the identification of ubiquitination sites, an independent test dataset was constructed and the prediction results obtained from five prediction tools are reported here, together with some related discussions.


Synlett ◽  
2020 ◽  
Author(s):  
Akira Yada ◽  
Kazuhiko Sato ◽  
Tarojiro Matsumura ◽  
Yasunobu Ando ◽  
Kenji Nagata ◽  
...  

Abstract The prediction of the initial reaction rate in the tungsten-catalyzed epoxidation of alkenes by using a machine learning approach is demonstrated. The ensemble learning framework used in this study consists of random sampling with replacement from the training dataset, the construction of several predictive models (weak learners), and the combination of their outputs. This approach enables us to obtain a reasonable prediction model that avoids the problem of overfitting, even when analyzing a small dataset.
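The three steps described in this abstract (sampling with replacement, fitting several weak learners, combining their outputs) correspond to bootstrap aggregation, and can be sketched in a few lines. The weak learner here is a one-variable least-squares line, chosen only to keep the example small; it is an assumption and not the model used in the study. The function names, the number of models, and the toy data are likewise hypothetical.

```python
import random
from statistics import mean


def fit_line(points):
    """Fit y = a*x + b by least squares; this is one weak learner."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical in this resample: fall back to a constant.
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return a, my - a * mx


def bagged_fit(points, n_models=50, seed=0):
    """Fit n_models weak learners, each on a bootstrap resample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Random sampling with replacement, same size as the training set.
        sample = rng.choices(points, k=len(points))
        models.append(fit_line(sample))
    return models


def bagged_predict(models, x):
    """Combine the weak learners by averaging their predictions."""
    return mean([a * x + b for a, b in models])


# Toy data (hypothetical): ten points lying on the line y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
models = bagged_fit(data, n_models=20, seed=1)
print(bagged_predict(models, 10.0))
```

Averaging over resamples reduces the variance of the combined prediction, which is the usual argument for why this kind of ensemble is less prone to overfitting on a small dataset.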


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 3675-3693 ◽  
Author(s):  
Salman Salloum ◽  
Joshua Zhexue Huang ◽  
Yulin He ◽  
Xiaojun Chen

2000 ◽  
Vol 182 (18) ◽  
pp. 5225-5230 ◽  
Author(s):  
Eliana Schlosser-Silverman ◽  
Maya Elgrably-Weiss ◽  
Ilan Rosenshine ◽  
Ron Kohen ◽  
Shoshy Altuvia

ABSTRACT Macrophages are armed with multiple oxygen-dependent and -independent bactericidal properties. However, the respiratory burst, generating reactive oxygen species, is believed to be a major cause of bacterial killing. We exploited the susceptibility of Escherichia coli in macrophages to characterize the effects of the respiratory burst on intracellular bacteria. We show that E. coli strains recovered from J774 macrophages exhibit high rates of mutations. We report that the DNA damage generated inside macrophages includes DNA strand breaks and the modification 8-oxo-2′-deoxyguanosine, which are typical oxidative lesions. Interestingly, we found that under these conditions, early in the infection the majority of E. coli cells are viable but gene expression is inhibited. Our findings demonstrate that macrophages can cause severe DNA damage to intracellular bacteria. Our results also suggest that protection against the macrophage-induced DNA damage is an important component of the bacterial defense mechanism within macrophages.

