Predicting β-Turns in Protein Using Kernel Logistic Regression

2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Murtada Khalafallah Elbashir ◽  
Yu Sheng ◽  
Jianxin Wang ◽  
FangXiang Wu ◽  
Min Li

A β-turn is a secondary protein structure type that plays a significant role in protein configuration and function. On average, 25% of amino acids in protein structures are located in β-turns. It is very important to develop an accurate and efficient method for β-turns prediction. Most of the current successful β-turns prediction methods use support vector machines (SVMs) or neural networks (NNs). Kernel logistic regression (KLR) is a powerful classification technique that has been applied successfully in many classification problems. However, it is rarely used in β-turns classification, mainly because it is computationally expensive. In this paper, we used KLR to obtain sparse β-turns prediction in short evolution time. Secondary structure information and position-specific scoring matrices (PSSMs) are utilized as input features. We achieved a Qtotal of 80.7% and an MCC of 50% on the BT426 dataset. These results show that the KLR method with the right algorithm can yield performance equivalent to or even better than NNs and SVMs in β-turns prediction. In addition, KLR yields a probabilistic outcome and has a well-defined extension to the multiclass case.
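
For readers unfamiliar with kernel logistic regression, the following is a minimal Python sketch of the general idea: a logistic model whose decision function is a weighted sum of kernel evaluations against the training points. It is not the authors' algorithm (the abstract does not describe it); plain gradient descent, the RBF kernel, and every name and parameter here (fit_klr, predict_proba, gamma, lam, lr, n_iter) are assumptions chosen for illustration. Because the full n-by-n kernel matrix is built and used at every step, the sketch also hints at why KLR is considered computationally expensive.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_klr(X, y, gamma=1.0, lam=0.01, lr=0.5, n_iter=2000):
    """Fit dual coefficients alpha and a bias by gradient descent.

    Minimizes the logistic loss plus (lam / 2) * alpha^T K alpha,
    where K is the kernel matrix and labels y are 0 or 1.
    """
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    bias = 0.0
    for _ in range(n_iter):
        f = [sum(K[i][j] * alpha[j] for j in range(n)) + bias for i in range(n)]
        resid = [sigmoid(f[i]) - y[i] for i in range(n)]
        # Gradient w.r.t. alpha is K (p - y) + lam * K alpha = K (resid + lam * alpha).
        g = [resid[j] + lam * alpha[j] for j in range(n)]
        grad_alpha = [sum(K[i][j] * g[j] for j in range(n)) for i in range(n)]
        grad_bias = sum(resid)
        alpha = [alpha[i] - lr * grad_alpha[i] / n for i in range(n)]
        bias -= lr * grad_bias / n
    return alpha, bias


def predict_proba(x, X, alpha, bias, gamma=1.0):
    """Probability of class 1 for a new point x."""
    f = sum(a * rbf_kernel(xi, x, gamma) for a, xi in zip(alpha, X)) + bias
    return sigmoid(f)


# Toy usage on XOR-style data, which a linear logistic model cannot separate.
X_toy = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y_toy = [0, 1, 1, 0]
alpha_toy, bias_toy = fit_klr(X_toy, y_toy, gamma=2.0)
probs_toy = [predict_proba(x, X_toy, alpha_toy, bias_toy, gamma=2.0) for x in X_toy]
```

The model outputs a probability rather than a bare class label, which is the "probabilistic outcome" the abstract refers to.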

mBio ◽  
2020 ◽  
Vol 11 (3) ◽  
Author(s):  
Begüm D. Topçuoğlu ◽  
Nicholas A. Lesniak ◽  
Mack T. Ruffin ◽  
Jenna Wiens ◽  
Patrick D. Schloss

ABSTRACT Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, there appears to be a preference by many researchers to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen relevant neoplasias (SRNs) (n = 490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739) but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability.

IMPORTANCE Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. However, the estimated performance associated with these models is likely overoptimistic. Moreover, there is a trend toward using black box models without a discussion of the difficulty of interpreting such models when trying to identify microbial biomarkers of disease. This work represents a step toward developing more-reproducible ML practices in applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not particular to the study of human health but can also be applied to environmental microbiology studies.
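
The abstract above reports performance as an AUROC with an interquartile range. As a hedged illustration of what those two numbers mean (this is not the authors' pipeline), the Python sketch below computes AUROC as the fraction of case/control pairs that a model ranks correctly, and summarizes a list of AUROC values by their median and IQR. The function names auroc and median_and_iqr, and the choice of the "inclusive" quantile method, are assumptions made for this example.

```python
from statistics import quantiles


def auroc(labels, scores):
    """AUROC as the probability that a random case outscores a random control.

    labels are 0 (control) or 1 (case); ties count as half a correct ranking.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def median_and_iqr(values):
    """Median and (Q1, Q3) of a list of scores, e.g. AUROCs from repeated splits."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)


# Toy usage: one AUROC, then a summary over five hypothetical repeated splits.
single = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
median, (q1, q3) = median_and_iqr([0.60, 0.65, 0.70, 0.75, 0.80])
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so values near 0.68 to 0.70 indicate modest discrimination.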


2015 ◽  
Vol 1 (9) ◽  
pp. e1501188 ◽  
Author(s):  
Andrew E. Brereton ◽  
P. Andrew Karplus

During protein folding and as part of some conformational changes that regulate protein function, the polypeptide chain must traverse high-energy barriers that separate the commonly adopted low-energy conformations. How distortions in peptide geometry allow these barrier-crossing transitions is a fundamental open question. One such important transition involves the movement of a non-glycine residue between the left side of the Ramachandran plot (that is, ϕ < 0°) and the right side (that is, ϕ > 0°). We report that high-energy conformations with ϕ ~ 0°, normally expected to occur only as fleeting transition states, are stably trapped in certain highly resolved native protein structures and that an analysis of these residues provides a detailed, experimentally derived map of the bond angle distortions taking place along the transition path. This unanticipated information lays to rest any uncertainty about whether such transitions are possible and how they occur, and in doing so lays a firm foundation for theoretical studies to better understand the transitions between basins that have been little studied but are integrally involved in protein folding and function. Also, the context of one such residue shows that even a designed highly stable protein can harbor substantial unfavorable interactions.


2019 ◽  
Author(s):  
Begüm D. Topçuoğlu ◽  
Nicholas A. Lesniak ◽  
Mack Ruffin ◽  
Jenna Wiens ◽  
Patrick D. Schloss

Abstract Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made towards developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, there appears to be a preference by many researchers to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen relevant neoplasias (SRNs; n=490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, decision trees, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs with an AUROC of 0.695 [IQR 0.651-0.739] but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 [IQR 0.625-0.735], trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability.

Importance Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. However, the estimated performance associated with these models is likely over-optimistic. Moreover, there is a trend towards using black box models without a discussion of the difficulty of interpreting such models when trying to identify microbial biomarkers of disease. This work represents a step towards developing more reproducible ML practices in applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not particular to the study of human health but can also be applied to environmental microbiology studies.


2018 ◽  
Vol 8 (12) ◽  
pp. 2540 ◽  
Author(s):  
Wei Chen ◽  
Himan Shahabi ◽  
Shuai Zhang ◽  
Khabat Khosravi ◽  
Ataollah Shirzadi ◽  
...  

Landslides cause a considerable amount of damage around the world every year. Landslide susceptibility assessments are useful for the mitigation of the associated potential risks to local economic development, land use planning, and decision makers. The main aim of this study was to present a novel hybrid approach of bagging (B)-based kernel logistic regression (KLR), named the BKLR model, for spatial prediction of landslides in Shangnan County, China. We first selected 15 conditioning factors for landslide susceptibility modeling. Then, the prediction capability of all conditioning factors was evaluated using the least square support vector machine method. Model validation and comparison were performed based on the area under the receiver operating characteristic curve and several statistical-based indexes, including positive predictive rate, negative predictive rate, sensitivity, specificity, kappa index, and root mean square error. Results indicated that the BKLR ensemble model outperformed the KLR and the benchmark support vector machine model. Our findings overall confirmed that a combination of the meta model with a decision tree classifier based on a functional algorithm can decrease the over-fitting and variance problems of data, which could enhance the prediction power of the landslide model. The resultant susceptibility maps could be useful for hazard mitigation in the study area and other similar landslide-prone areas.
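
The bagging idea behind the BKLR model above can be sketched in a few lines of Python: train several copies of a base classifier on bootstrap resamples of the training data and average their predicted probabilities, which tends to reduce variance. This is a generic illustration, not the authors' implementation. In particular, the base learner here is a deliberately simple nearest-centroid rule standing in for KLR, and all names (fit_bagged, predict_bagged, fit_centroid, n_models, seed) are hypothetical.

```python
import random


def fit_centroid(X, y):
    """Hypothetical stand-in base learner: nearest class centroid.

    Returns a function mapping a point to 1.0 (class 1) or 0.0 (class 0).
    If a bootstrap sample contains only one class, it predicts that class.
    """
    groups = {0: [x for x, l in zip(X, y) if l == 0],
              1: [x for x, l in zip(X, y) if l == 1]}
    if not groups[0] or not groups[1]:
        only = 1.0 if groups[1] else 0.0
        return lambda x: only
    dim = len(X[0])
    cents = {c: [sum(p[d] for p in pts) / len(pts) for d in range(dim)]
             for c, pts in groups.items()}

    def predict(x):
        d0 = sum((xi - ci) ** 2 for xi, ci in zip(x, cents[0]))
        d1 = sum((xi - ci) ** 2 for xi, ci in zip(x, cents[1]))
        return 1.0 if d1 < d0 else 0.0

    return predict


def fit_bagged(fit_base, X, y, n_models=25, seed=0):
    """Train n_models base learners, each on a bootstrap resample of (X, y)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))
    return models


def predict_bagged(models, x):
    """Average the base learners' outputs to get an ensemble score in [0, 1]."""
    return sum(m(x) for m in models) / len(models)


# Toy usage on one-dimensional data with two well-separated classes.
X_toy = [[0.0], [0.2], [0.4], [1.6], [1.8], [2.0]]
y_toy = [0, 0, 0, 1, 1, 1]
ensemble = fit_bagged(fit_centroid, X_toy, y_toy)
low_score = predict_bagged(ensemble, [0.1])
high_score = predict_bagged(ensemble, [1.9])
```

Passing the base learner as a function keeps the ensemble code independent of the model being bagged, so a KLR fitting routine could be substituted for fit_centroid without changing fit_bagged or predict_bagged.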


2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Ahmed A. M. Hamed ◽  
Renfa Li ◽  
Zhang Xiaoming ◽  
Cheng Xu

Due to the widening semantic gap of videos, computational tools that classify videos into different genres are highly needed to narrow it. Classifying videos accurately demands a good representation of video data and an efficient and effective model to carry out the classification task. Kernel Logistic Regression (KLR), the kernel version of logistic regression (LR), has proved efficient as a classifier that naturally provides probabilities and extends to multiclass classification problems. In this paper, a Weighted Kernel Logistic Regression (WKLR) algorithm is implemented for video genre classification; it achieves good accuracy and faster results.


Author(s):  
Anantvir Singh Romana

Accurate diagnostic detection of disease in a patient is critical and may alter the subsequent treatment and increase the chances of survival. Machine learning techniques have been instrumental in disease detection and are currently being used in various classification problems due to their accurate prediction performance. Different techniques may provide different accuracies, and it is therefore imperative to use the most suitable method, the one that provides the best results. This research seeks to provide a comparative analysis of Support Vector Machine, Naïve Bayes, J48 Decision Tree, and neural network classifiers on breast cancer and diabetes datasets.

