TP-MV: Therapeutic protein prediction by multi-view learning

2021 ◽  
Vol 17 ◽  
Author(s):  
Ke Yan ◽  
Hongwu Lv ◽  
Yichen Guo ◽  
Jie Wen ◽  
Bin Liu

Background: Therapeutic peptide prediction is critical for drug development and therapy. Researchers have been studying this essential task, developing several computational methods to identify different therapeutic peptide types. Objective: Most predictors are specific to certain peptide types. Developing methods that predict multiple peptide types remains challenging, as does combining different features for therapeutic peptide prediction. Method: In this paper, we propose TP-MV, a new ensemble method for general therapeutic peptide recognition. TP-MV is developed using the stacking framework in conjunction with KNN, SVM, ET, RF, and XGB. TP-MV then constructs a multi-view learning model as the meta-classifier to extract discriminative features for different peptides. Results: In the experiments, the proposed method outperforms the other existing methods on the benchmark datasets, indicating that it can predict multiple therapeutic peptides simultaneously. Conclusion: TP-MV is a useful tool for predicting therapeutic peptides.

2020 ◽  
Vol 36 (13) ◽  
pp. 3982-3987 ◽  
Author(s):  
Yu P Zhang ◽  
Quan Zou

Motivation: Peptides are promising candidates for therapeutic and diagnostic development due to their great physiological versatility and structural simplicity. Thus, identifying therapeutic peptides and investigating their properties are fundamentally important. As an inexpensive and fast approach, machine learning-based predictors have shown their strength in therapeutic peptide identification because they excel at processing massive data. To date, no reported therapeutic peptide predictor can perform high-quality generic prediction and informative physicochemical properties (IPPs) identification simultaneously. Results: In this work, the Physicochemical Property-based Therapeutic Peptide Predictor (PPTPP), a Random Forest-based prediction method, is presented to address this issue. A novel feature encoding and learning scheme was introduced to produce and rank physicochemical property-related features. Besides predicting multiple therapeutic peptides with performance comparable to established predictors, the presented method is also able to identify peptides' IPPs. The results presented in this work not only illustrate the soundness of the method but also demonstrate its potential for investigating other therapeutic peptides. Availability and implementation: https://github.com/YPZ858/PPTPP. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Lijun Cai ◽  
Li Wang ◽  
Xiangzheng Fu ◽  
Chenxing Xia ◽  
Xiangxiang Zeng ◽  
...  

The peptide therapeutics market is providing new opportunities for the biotechnology and pharmaceutical industries. Therefore, identifying therapeutic peptides and exploring their properties are important. Although several studies have proposed different machine learning methods to predict whether peptides are therapeutic, most do not explain the decision factors of the model in detail. In this work, an Interpretable Therapeutic Peptide Prediction (ITP-Pred) model based on efficient feature fusion was developed. First, we proposed three kinds of feature descriptors based on sequence and physicochemical property encoding, namely amino acid composition (AAC), group AAC and coding autocorrelation, and concatenated them to obtain the feature representation of a therapeutic peptide. Then, we fed this representation into a CNN-Bi-directional Long Short-Term Memory (BiLSTM) model to automatically learn to recognize therapeutic peptides. The cross-validation and independent verification results indicated that ITP-Pred has a higher prediction performance on the benchmark dataset than the other methods compared. Finally, we analyzed the output of the model from two aspects, sequence order and physicochemical properties, mining important features as guidance for the design of better models that can complement existing methods.
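Of the descriptors named in this abstract, amino acid composition (AAC) is the simplest to illustrate: it is the fraction of each of the 20 standard residues in a sequence. The following is a minimal sketch, not the ITP-Pred implementation; the function name `amino_acid_composition`, the residue ordering, and the choice to ignore non-standard characters are assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids; the alphabetical ordering is an assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard residue in `sequence`.

    Characters outside the 20 standard residues are ignored (an assumption).
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    # One value per residue, in the fixed order of AMINO_ACIDS.
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

The result is a fixed-length vector of 20 values that sums to 1, so peptides of different lengths can be concatenated with other descriptors into one feature representation.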


2019 ◽  
Vol 15 (5) ◽  
pp. 472-485 ◽  
Author(s):  
Kuo-Chen Chou ◽  
Xiang Cheng ◽  
Xuan Xiao

Background/Objective: Information on protein subcellular localization is crucially important for both basic research and drug development. With the explosive growth of protein sequences discovered in the post-genomic age, there is high demand for powerful bioinformatics tools that identify their subcellular localization in a timely and effective manner based on sequence information alone. Recently, a predictor called “pLoc-mEuk” was developed for identifying the subcellular localization of eukaryotic proteins. Its performance is overwhelmingly better than that of the other predictors for the same purpose, particularly in dealing with multi-label systems where many proteins, called “multiplex proteins”, may simultaneously occur in two or more subcellular locations. Although it is indeed a very powerful predictor, more effort is needed to further improve it. This is because pLoc-mEuk was trained on an extremely skewed dataset in which some subset was about 200 times the size of the other subsets. Accordingly, it cannot avoid the bias caused by such an uneven training dataset. Methods: To alleviate such bias, we have developed a new predictor called pLoc_bal-mEuk by quasi-balancing the training dataset. Cross-validation tests on exactly the same experiment-confirmed dataset have indicated that the proposed new predictor is remarkably superior to pLoc-mEuk, the existing state-of-the-art predictor for identifying the subcellular localization of eukaryotic proteins. It has not escaped our notice that the quasi-balancing treatment can also be used to deal with many other biological systems. Results: To maximize the convenience for most experimental scientists, a user-friendly web-server for the new predictor has been established at http://www.jci-bioinfo.cn/pLoc_bal-mEuk/. Conclusion: It is anticipated that the pLoc_bal-mEuk predictor holds very high potential to become a useful high-throughput tool for identifying the subcellular localization of eukaryotic proteins, particularly for finding multi-target drugs, which is currently a very hot trend in drug development.


Author(s):  
Hezhen Hu ◽  
Wengang Zhou ◽  
Junfu Pu ◽  
Houqiang Li

Sign language recognition (SLR) is a challenging problem, involving complex manual features (i.e., hand gestures) and fine-grained non-manual features (NMFs) (i.e., facial expression, mouth shapes, etc.). Although manual features are dominant, non-manual features also play an important role in the expression of a sign word. Specifically, many sign words convey different meanings due to non-manual features, even though they share the same hand gestures. This ambiguity introduces great challenges in the recognition of sign words. To tackle the above issue, we propose a simple yet effective architecture called Global-Local Enhancement Network (GLE-Net), including two mutually promoted streams toward different crucial aspects of SLR. Of the two streams, one captures the global contextual relationship, while the other stream captures the discriminative fine-grained cues. Moreover, due to the lack of datasets explicitly focusing on this kind of feature, we introduce the first non-manual-feature-aware isolated Chinese sign language dataset (NMFs-CSL) with a total vocabulary size of 1,067 sign words in daily life. Extensive experiments on NMFs-CSL and SLR500 datasets demonstrate the effectiveness of our method.


Author(s):  
S Rao Chintalapudi ◽  
M. H. M. Krishna Prasad

Community structure is one of the most important properties of social networks. Detecting such structures is a challenging problem in the area of social network analysis. A community is a collection of nodes with denser connections among themselves than with the rest of the network. Community detection is similar to the clustering problem, in which the intra-cluster edge density is higher than the inter-cluster edge density. Community detection algorithms fall into two categories: disjoint community detection, in which a node can be a member of at most one community, and overlapping community detection, in which a node can be a member of more than one community. This chapter reviews the state-of-the-art disjoint and overlapping community detection algorithms. The measures needed to evaluate disjoint and overlapping community detection algorithms are also discussed in detail.
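The density criterion mentioned in this abstract can be made concrete for a disjoint partition of an undirected graph: count the fraction of same-community node pairs that are linked (intra-cluster density) and the fraction of cross-community pairs that are linked (inter-cluster density). The sketch below is illustrative only and is not taken from the chapter; the function name `edge_densities` and the pair-based definition of density are assumptions.

```python
from itertools import combinations


def edge_densities(edges, communities):
    """Return (intra_density, inter_density) for a disjoint partition.

    edges: iterable of (u, v) pairs of an undirected simple graph.
    communities: list of node sets; each node belongs to exactly one set.
    """
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    edge_set = {frozenset(e) for e in edges}
    intra_edges = intra_pairs = inter_edges = inter_pairs = 0
    # Visit every unordered node pair once and classify it.
    for u, v in combinations(sorted(label), 2):
        linked = frozenset((u, v)) in edge_set
        if label[u] == label[v]:
            intra_pairs += 1
            intra_edges += linked
        else:
            inter_pairs += 1
            inter_edges += linked
    intra = intra_edges / intra_pairs if intra_pairs else 0.0
    inter = inter_edges / inter_pairs if inter_pairs else 0.0
    return intra, inter
```

For two triangles joined by a single bridge edge, split into one community per triangle, every same-community pair is linked while only one of the nine cross-community pairs is, which matches the intuition of a good partition.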


Pharmaceutics ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 993
Author(s):  
Mie Kristensen ◽  
Ragna Guldsmed Diedrichsen ◽  
Valeria Vetri ◽  
Vito Foderà ◽  
Hanne Mørck Nielsen

Oral delivery of therapeutic peptides is hampered by their large molecular size and labile nature, thus limiting their permeation across the intestinal epithelium. Promising approaches to overcome the latter include co-administration with carrier peptides. In this study, the cell-penetrating peptide penetratin was employed to investigate effects of co-administration with insulin and the pharmacologically active part of parathyroid hormone (PTH(1-34)) at pH 5, 6.5, and 7.4 with respect to complexation, enzymatic stability, and transepithelial permeation of the therapeutic peptide in vitro and in vivo. Complex formation between insulin or PTH(1-34) and penetratin was pH-dependent. Micron-sized complexes dominated in the samples prepared at pH values at which penetratin interacts electrostatically with the therapeutic peptide. The association efficiency was more pronounced between insulin and penetratin than between PTH(1-34) and penetratin. Despite the high degree of complexation, penetratin retained its membrane activity when applied to liposomal structures. The enzymatic stability of penetratin during incubation on polarized Caco-2 cell monolayers was pH-dependent, with a prolonged half-life determined at pH 5 when compared to pH 6.5 and 7.4. Also, the penetratin-mediated transepithelial permeation of insulin and PTH(1-34) was increased in vitro and in vivo upon lowering the sample pH from 7.4 or 6.5 to 5. Thus, the formation of penetratin-cargo complexes with several molecular entities is not a prerequisite for penetratin-mediated transepithelial permeation of a therapeutic peptide. Rather, a sample pH that improves the penetratin stability appears to optimize the penetratin-mediated transepithelial permeation of insulin and PTH(1-34).


2019 ◽  
Vol 20 (S25) ◽  
Author(s):  
Zhengwei Li ◽  
Ru Nie ◽  
Zhuhong You ◽  
Chen Cao ◽  
Jiashu Li

Background: The interactions among proteins play crucial roles in most cellular processes. Despite the enormous effort put into identifying protein-protein interactions (PPIs) from a large number of organisms, existing firsthand biological experimental methods suffer from high cost, low efficiency, and a high false-positive rate. The application of in silico methods opens new doors for predicting interactions among proteins and has attracted a great deal of attention in recent decades. Results: Here we present a novel computational model that adopts our proposed Discriminative Vector Machine (DVM) model and a 2-Dimensional Principal Component Analysis (2DPCA) descriptor to identify candidate PPIs based only on protein sequences. More specifically, a 2DPCA descriptor is employed to capture discriminative feature information from the Position-Specific Scoring Matrix (PSSM) of amino acid sequences generated by PSI-BLAST. Then, a robust and powerful DVM classifier is employed to infer PPIs. When applied to the gold-standard benchmark datasets of Yeast and H. pylori, our model obtained mean prediction accuracies as high as 97.06% and 92.89%, respectively, a noticeable improvement over some state-of-the-art methods. Moreover, we constructed a Support Vector Machine (SVM) based predictive model and compared it with our model on the Human benchmark dataset. In addition, to further demonstrate the predictive reliability of our proposed method, we also carried out extensive experiments for identifying cross-species PPIs on five other species datasets. Conclusions: All the experimental results indicate that our method is very effective for identifying potential PPIs and could serve as a practical approach to aid biological experiments in proteomics research.


2015 ◽  
Vol 2015 ◽  
pp. 1-12 ◽  
Author(s):  
Xiaosong Zhao ◽  
Yao Huang

Missing data is an inevitable problem when measuring CO2, water, and energy fluxes between the biosphere and the atmosphere with eddy covariance systems. To find the optimum gap-filling method for short vegetation, we review three methods, mean diurnal variation (MDV), look-up tables (LUT), and nonlinear regression (NLR), for estimating missing values of net ecosystem CO2 exchange (NEE) in eddy covariance time series, and evaluate their performance for different artificial gap scenarios based on benchmark datasets from marsh and cropland sites in China. The cumulative errors of the three methods show no consistent bias trend and ranged between −30 and +30 mgCO2 m−2 from May to October at the three sites. To minimize the cumulative bias, a combination of gap-filling methods was selected for short vegetation. The NLR or LUT method was used from the onset of rapid plant growth in spring until the end of the growing period, and the MDV method was used for the remaining stages. The sum relative error (SRE) of the optimum method ranged between −2 and +4% across the four gap levels at the three sites, except for the 55% gap level at the soybean site, and the method also clearly reduced the standard deviation of the error.
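Of the three gap-filling methods named here, mean diurnal variation is the easiest to sketch: a missing value is replaced by the mean of the valid values observed at the same time of day on adjacent days. The code below is a minimal illustration under stated assumptions, not the procedure evaluated in the paper; the function name `fill_gaps_mdv`, the `None` marker for gaps, and the default window of 7 days on either side are all hypothetical choices.

```python
def fill_gaps_mdv(series, steps_per_day, window_days=7):
    """Fill None entries using mean diurnal variation.

    series: list of flux values at a fixed time step, with None for gaps.
    steps_per_day: number of time steps per day (48 for half-hourly data).
    window_days: days before and after the gap to average over (assumed).
    """
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Collect observed values at the same time of day on nearby days.
        neighbours = []
        for d in range(-window_days, window_days + 1):
            j = i + d * steps_per_day
            if d != 0 and 0 <= j < len(series) and series[j] is not None:
                neighbours.append(series[j])
        if neighbours:
            filled[i] = sum(neighbours) / len(neighbours)
    return filled
```

Only originally observed values enter each average, so a filled value never feeds into another fill; gaps with no valid neighbour in the window are left as `None`.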


2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Rianon Zaman ◽  
Shahana Yasmin Chowdhury ◽  
Mahmood A. Rashid ◽  
Alok Sharma ◽  
Abdollah Dehzangi ◽  
...  

DNA-binding proteins often play an important role in various processes within the cell. Over the last decade, a wide range of classification algorithms and feature extraction techniques have been used to predict DNA-binding proteins. In this paper, we propose a novel DNA-binding protein prediction method called HMMBinder. HMMBinder uses monogram and bigram features extracted from the HMM profiles of the protein sequences. To the best of our knowledge, this is the first application of HMM profile-based features to the DNA-binding protein prediction problem. We applied Support Vector Machines (SVM) as the classification technique in HMMBinder. Our method was tested on standard benchmark datasets. We experimentally show that our method outperforms the state-of-the-art methods found in the literature.
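Monogram and bigram features are commonly computed from a profile matrix with one row per sequence position and one column per amino acid: the monogram is the column-wise average, and the bigram sums the products of values in consecutive rows for every ordered pair of columns. The sketch below follows that common definition and is not taken from the HMMBinder code; the function names and the normalisation by profile length are assumptions.

```python
def monogram(profile):
    """Column-wise mean of a profile (rows = positions, columns = residues)."""
    if not profile:
        raise ValueError("profile is empty")
    length, width = len(profile), len(profile[0])
    return [sum(row[a] for row in profile) / length for a in range(width)]


def bigram(profile):
    """Pairwise features over consecutive positions, flattened row-major.

    Entry (a, b) sums profile[i][a] * profile[i + 1][b] over positions i,
    then divides by the profile length (the normalisation is an assumption).
    """
    if not profile:
        raise ValueError("profile is empty")
    length, width = len(profile), len(profile[0])
    features = []
    for a in range(width):
        for b in range(width):
            total = sum(
                profile[i][a] * profile[i + 1][b] for i in range(length - 1)
            )
            features.append(total / length)
    return features
```

For a 20-column profile this yields 20 monogram and 400 bigram values regardless of sequence length, which is what allows a fixed-size input to a classifier such as an SVM.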


2014 ◽  
Vol 281 (1797) ◽  
pp. 20141861 ◽  
Author(s):  
Nathan S. McClure ◽  
Troy Day

Drug resistance is a serious public health problem that threatens to thwart our ability to treat many infectious diseases. Repeatedly, the introduction of new drugs has been followed by the evolution of resistance. In principle, there are two complementary ways to address this problem: (i) enhancing drug development and (ii) slowing the evolution of drug resistance through evolutionary management. Although these two strategies are not mutually exclusive, it is nevertheless worthwhile considering whether one might be inherently more effective than the other. We present a simple mathematical model that explores how interventions aimed at these two approaches affect the availability of effective drugs. Our results identify an interesting feature of evolution management that, all else equal, tends to make it more effective than enhancing drug development. Thus, although enhancing drug development will necessarily be a central part of addressing the problem of resistance, our results lend support to the idea that evolution management is probably a very significant component of the solution as well.

