dipeptide composition
Recently Published Documents


TOTAL DOCUMENTS: 37 (FIVE YEARS: 8)
H-INDEX: 12 (FIVE YEARS: 3)

2021, Vol 22 (1)
Author(s): Alvaro Ras-Carmona, Marta Gomez-Perosanz, Pedro A. Reche

Abstract. Motivation: In eukaryotes, proteins targeted for secretion contain a signal peptide, which allows them to proceed through the conventional ER/Golgi-dependent pathway. However, a substantial number of proteins lacking a signal peptide can be secreted through unconventional routes, including that mediated by exosomes. Currently, no method is available to predict protein secretion via exosomes. Results: Here, we first assembled a dataset including the sequences of 2992 proteins secreted by exosomes and 2961 proteins that are not secreted by exosomes. Subsequently, we trained different random forest models on feature vectors derived from the sequences in this dataset. In tenfold cross-validation, the best model was trained on dipeptide composition, reaching an accuracy of 69.88% ± 2.08 and an area under the curve (AUC) of 0.76 ± 0.03. On an independent dataset, this model reached an accuracy of 75.73% and an AUC of 0.840. Based on these results, we developed ExoPred, a web-based tool that uses random forests to predict protein secretion by exosomes. Conclusion: ExoPred is available for free public use at http://imath.med.ucm.es/exopred/. Datasets are available at http://imath.med.ucm.es/exopred/datasets/.
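As a rough illustration of the dipeptide composition feature named in the abstract above, the sketch below turns a protein sequence into a 400-dimensional frequency vector. It assumes the common definition (count of each adjacent residue pair divided by the number of counted pairs); the function name and the choice to skip pairs containing non-standard residues are assumptions made here, not details taken from ExoPred.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 20 x 20 = 400 ordered residue pairs, in a fixed order.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400 dipeptide frequencies of a protein sequence.

    Assumption: pairs containing a non-standard residue (e.g. 'X')
    are skipped and do not count toward the total.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

Vectors of this kind could then be passed to any standard classifier, such as a random forest, as the abstract describes.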


2021, Vol 22 (S3)
Author(s): Shunfang Wang, Lin Deng, Xinnan Xia, Zicheng Cao, Yu Fei

Abstract. Background: Antifreeze proteins (AFPs) are a group of proteins that inhibit the growth of ice crystals in body fluids and thus improve biological antifreeze ability. They are vital to the survival of living organisms in extremely cold environments. However, little research has been performed on sequence feature extraction and selection for antifreeze protein classification in structure and function prediction, which is of great significance. Results: In this paper, to predict antifreeze proteins, we propose a feature representation, weighted generalized dipeptide composition (W-GDipC), and an ensemble feature selection method based on two stages and multiple regressions (LRMR-Ri). Specifically, four feature selection algorithms (Lasso regression, Ridge regression, maximal information coefficient and Relief) are each used to select a feature set, which is the first stage of the LRMR-Ri method. If a common feature subset exists among these four sets, it is taken as the optimal subset; otherwise, Ridge regression is used to select the optimal subset from the set pooled from the four sets, which is the second stage of LRMR-Ri. The LRMR-Ri method combined with W-GDipC was evaluated both on the antifreeze protein dataset (binary classification) and on the membrane protein dataset (multi-class classification). Experimental results show that this method performs well with support vector machine (SVM), decision tree (DT) and stochastic gradient descent (SGD) classifiers. The ACC, RE and MCC values of LRMR-Ri with W-GDipC on the antifreeze protein dataset with the SVM classifier reached 95.56%, 97.06% and 0.9105, respectively, much higher than those of each single method (Lasso, Ridge, MIC and Relief), and nearly 13% higher in ACC than Lasso alone. Conclusion: The experimental results show that the proposed LRMR-Ri and W-GDipC method can significantly improve the accuracy of antifreeze protein prediction compared with other similar single feature methods. In addition, our method also achieved good results in the classification and prediction of membrane proteins, which verifies its wide reliability to a certain extent.
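The two-stage selection logic described in this abstract can be sketched as follows. This is only a schematic reading of the text: the four first-stage selectors are represented by their output sets, and `fallback_selector` is a hypothetical callable standing in for the Ridge-based second stage, whose details the abstract does not give.

```python
def two_stage_select(feature_sets, fallback_selector):
    """Combine several selected feature sets in two stages.

    Stage 1: if the sets share a non-empty common subset, return it.
    Stage 2: otherwise pool all sets and let `fallback_selector`
    (hypothetical; e.g. a Ridge-based ranking) pick from the pool.
    """
    sets = [set(s) for s in feature_sets]
    if not sets:
        raise ValueError("at least one feature set is required")
    common = set.intersection(*sets)
    if common:
        return common
    pooled = set.union(*sets)
    return set(fallback_selector(pooled))
```

For example, with first-stage outputs `{"f1", "f2"}`, `{"f2", "f3"}`, `{"f2"}` and `{"f2", "f4"}`, the common subset `{"f2"}` is returned and the fallback is never called.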


2021, Vol 11 (5), pp. 2316
Author(s): Anum Rauf, Aqsa Kiran, Malik Tahir Hassan, Sajid Mahmood, Ghulam Mustafa, ...

Heart attack and other heart-related diseases are among the main causes of fatalities in the world. These diseases, and some other severe problems such as kidney failure and paralysis, are mainly caused by hypertension. Since bioactive peptides extracted from naturally existing food substances possess antihypertensive activity, these antihypertensive peptides (AHTPs) can function as prospective replacements for existing pharmacological drugs, with no or fewer side effects. Such naturally existing peptides can be identified using in-silico approaches, which have been shown to save large amounts of time and money in the identification of effective peptides. The proposed methodology is a deep learning-based in-silico approach for the identification of AHTPs. An ensemble method is proposed that combines convolutional neural network (CNN) and support vector machine (SVM) classifiers. Amino acid composition (AAC) and g-gap dipeptide composition (DPC) techniques are used for feature extraction. The proposed methodology has been evaluated on two standard antihypertensive peptide sequence datasets. The model yields 95% accuracy on the benchmarking dataset and 88.9% accuracy on the independent dataset. Comparative analysis is provided to demonstrate that the proposed method outperforms existing state-of-the-art methods on both the benchmarking and independent datasets.
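For readers unfamiliar with g-gap dipeptide composition, the following sketch shows one common formulation: residue pairs separated by `gap` intervening positions are counted and normalized by the number of counted pairs, so that `gap=0` reduces to the ordinary adjacent dipeptide composition. The function name, the normalization, and the skipping of non-standard residues are assumptions for illustration and are not taken from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(sequence, gap=0):
    """Frequencies of residue pairs separated by `gap` positions.

    Assumption: pairs with a non-standard residue are skipped.
    Returns a dict mapping each of the 400 pairs to its frequency.
    """
    if gap < 0:
        raise ValueError("gap must be non-negative")
    seq = sequence.upper()
    pairs = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return {p: (counts[p] / total if total else 0.0) for p in pairs}
```

Feature vectors for several gap values can be concatenated if a richer representation is wanted, at the cost of 400 extra dimensions per gap.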


2020, Vol 23 (6), pp. 536-545
Author(s): Haoyue Zhang, Qilemuge Xi, Shenghui Huang, Lei Zheng, Wuritu Yang, ...

Background: As the pathogen of malaria, the malaria parasite secretes a variety of proteins for its growth and reproduction. Objective: Identifying the secretory proteins of the malaria parasite is of crucial reference value for the development of anti-malaria vaccines and medicines. Methods: In this study, a computational classification method was developed to identify the secreted proteins of Plasmodium. Amino acid composition, dipeptide composition, and tripeptide composition, as well as reduced amino acid alphabets, were used to represent protein sequences. We then used SVM for training and prediction with each representation and optimized the features. Results: 74 types of reduced amino acid alphabets were employed to predict secretory proteins. The results showed that the accuracy improved to 91.67%, with a Matthews correlation coefficient (MCC) of 0.84, using dipeptide composition, and the highest prediction accuracy reached 92.26% after feature selection, which demonstrates that our method is effective and reliable for predicting malaria parasite secreted proteins. Conclusion: An intuitive web server, iSP-RAAC (http://bioinfor.imu.edu.cn/isppseraac), was established for the convenience of most experimental scientists.
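Since several abstracts on this page report accuracy together with the Matthews correlation coefficient (MCC), a small sketch of how both are computed from a binary confusion matrix may be useful. The function names are illustrative; returning 0.0 when the MCC denominator is zero is a common convention and is assumed here.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Assumption: return 0.0 when the denominator is zero.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, MCC stays informative when the two classes are of very different sizes, which is one reason it is often reported alongside it.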


2020, Vol 2020, pp. 1-8
Author(s): Feng-Min Li, Xiao-Wei Gao

There are many bacteria in the environment, and Gram-positive bacteria are the most common ones. Some Gram-positive bacteria are very harmful to the human body, so predicting the subcellular location of Gram-positive bacterial proteins is significant, and identifying it is important for developing effective drugs. In this paper, a new Gram-positive bacterial protein subcellular location dataset was established. The amino acid composition, the gene ontology annotation information, the hydropathy dipeptide composition information, the amino acid dipeptide composition information, and the autocovariance average chemical shift information were selected as characteristic parameters, and these parameters were then combined. The locations of Gram-positive bacterial proteins were predicted by the Support Vector Machine (SVM) algorithm, and the overall accuracy (OA) reached 86.1% under the Jackknife test. The OA of our predictive model was higher than that of existing methods. This improved method may be helpful for protein function prediction.
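The Jackknife test mentioned above is a leave-one-out evaluation: each sample is held out in turn, the model is trained on the rest, and the overall accuracy is the fraction of held-out samples predicted correctly. The sketch below illustrates the procedure only; `nearest_neighbor` is a deliberately simple stand-in classifier chosen to keep the example self-contained, not the SVM used in the paper.

```python
def jackknife_accuracy(samples, labels, train_and_predict):
    """Leave-one-out overall accuracy.

    `train_and_predict(train_x, train_y, query)` must return the
    predicted label for `query` after training on the given data.
    """
    if not samples:
        raise ValueError("no samples given")
    samples, labels = list(samples), list(labels)
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if train_and_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def nearest_neighbor(train_x, train_y, query):
    """Stand-in classifier (assumption): label of the closest sample."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    best = min(range(len(train_x)), key=lambda j: sq_dist(train_x[j], query))
    return train_y[best]
```

Any classifier can be substituted by wrapping its fit and predict steps in a function with the same three-argument shape.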


Cells, 2020, Vol 9 (2), pp. 353
Author(s): Phasit Charoenkwan, Sakawrat Kanthawong, Nalini Schaduangrat, Janchai Yana, Watshara Shoombuatong

Although existing methods have been successful in predicting phage (or bacteriophage) virion proteins (PVPs) using various types of protein features and complex classifiers, such as support vector machine and naïve Bayes, these two methods do not allow interpretability. However, the characterization and analysis of PVPs might be of great significance for understanding the molecular mechanisms of bacteriophage genetics and for the development of antibacterial drugs. Hence, we herein propose a novel method (PVPred-SCM) based on the scoring card method (SCM) in conjunction with dipeptide composition to identify and characterize PVPs. In PVPred-SCM, the propensity scores of 400 dipeptides were calculated using the statistical discrimination approach. A rigorous independent validation test showed that PVPred-SCM utilizing only dipeptide composition yielded an accuracy of 77.56%, indicating that PVPred-SCM performed well relative to the state-of-the-art method utilizing a number of protein features. Furthermore, the propensity scores of dipeptides were used to provide insights into the biochemical and biophysical properties of PVPs. Upon comparison, PVPred-SCM was found to be superior to the existing methods in terms of simplicity, interpretability, and implementation. Finally, to facilitate high-throughput prediction of PVPs, we provide a user-friendly web server for estimating the likelihood that given sequences are PVPs. It is anticipated that PVPred-SCM will become a useful tool, or at least a complement to existing methods, for predicting and analyzing PVPs.
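The abstract does not spell out the "statistical discrimination approach", so the sketch below assumes one simple reading of a scoring-card style method: each dipeptide's propensity score is its frequency in the positive set minus its frequency in the negative set, and a sequence is scored by the average propensity of its dipeptides. All function names are hypothetical, and any score rescaling, optimization step, or decision threshold used by PVPred-SCM is not reproduced here.

```python
from collections import Counter


def dipeptide_freqs(sequences):
    """Pooled frequency of each adjacent residue pair in a set."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def propensity_scores(positives, negatives):
    """Assumed scoring: positive-set minus negative-set frequency."""
    fp, fn = dipeptide_freqs(positives), dipeptide_freqs(negatives)
    return {k: fp.get(k, 0.0) - fn.get(k, 0.0) for k in set(fp) | set(fn)}


def score_sequence(sequence, scores):
    """Average propensity over a sequence's dipeptides.

    Dipeptides never seen in training contribute 0 (assumption).
    """
    n = len(sequence) - 1
    if n <= 0:
        return 0.0
    return sum(scores.get(sequence[i:i + 2], 0.0) for i in range(n)) / n
```

Because each dipeptide carries a single signed score, the resulting table can be inspected directly, which is the kind of interpretability the abstract emphasizes.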


2019, Vol 26 (5), pp. 339-347
Author(s): Dilani G. Gamage, Ajith Gunaratne, Gopal R. Periyannan, Timothy G. Russell

Background: The dipeptide composition-based Instability Index (II) is one of the protein primary structure-dependent methods available for in vivo protein stability predictions. According to this method, proteins with an II value below 40 are stable. Intracellular protein stability principles guided the original development of the II method. However, the use of the II method for in vitro protein stability predictions raises questions about the validity of applying it under experimental conditions that differ from the in vivo setting. Objective: The aim of this study is to experimentally test the validity of using II as an in vitro protein stability predictor. Methods: A representative protein, CCM (Caulobacter crescentus metalloprotein), that rapidly degrades under in vitro conditions was used to probe the dipeptide sequence-dependent degradation properties of CCM by generating CCM mutants representing stable and unstable II values. A comparative degradation analysis was carried out under in vitro conditions using wild-type CCM, CCM mutants and two other candidate proteins, metallo-β-lactamase L1 and α-S1-casein, representing stable, borderline stable/unstable, and unstable proteins according to the II predictions. The effect of temperature and a protein stabilizing agent on CCM degradation was also tested. Results: Data support the dipeptide composition-dependent protein stability/instability in wt-CCM and mutants as predicted by the II method under in vitro conditions. However, the II failed to accurately represent the stability of the other tested proteins. Data indicate the influence of protein environmental factors on the autoproteolysis of proteins. Conclusion: Broader application of the II method for the prediction of protein stability under in vitro conditions is questionable, as the stability of a protein may depend not only on its intrinsic nature but also on the conditions of the protein milieu.
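To make the Instability Index concrete, the sketch below uses its commonly cited form, II = (10 / L) × Σ DIWV(xᵢxᵢ₊₁), where L is the sequence length and DIWV is a table of dipeptide instability weight values. The full 400-entry weight table is not reproduced here; `diwv` is a caller-supplied mapping, and treating missing dipeptides as weight 0 is an assumption made only so that a small table can be used for demonstration.

```python
def instability_index(sequence, diwv):
    """II = (10 / L) * sum of dipeptide instability weights.

    `diwv` maps two-letter dipeptides to weight values; the real
    table must be supplied by the caller. Missing pairs count as
    0.0 here, which is an assumption for demonstration only.
    """
    length = len(sequence)
    if length < 2:
        raise ValueError("sequence must contain at least two residues")
    total = sum(diwv.get(sequence[i:i + 2], 0.0) for i in range(length - 1))
    return (10.0 / length) * total


def is_predicted_stable(ii_value, threshold=40.0):
    """Per the abstract, an II value below 40 indicates a stable protein."""
    return ii_value < threshold
```

With a made-up two-entry table such as `{"AC": 8.0, "CA": 4.0}`, the sequence `"ACA"` gives (10 / 3) × 12, which sits right at the threshold; such toy weights are purely illustrative.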


2019, Vol 16 (4), pp. 325-331
Author(s): Xianfang Wang, Hongfei Li, Peng Gao, Yifeng Liu, Wenjing Zeng

The catalytic activity of an enzyme differs from that of an inorganic catalyst. In a high-temperature, overly acidic or overly alkaline environment, the structure of the enzyme is destroyed and it loses its activity. Although biochemical experiments can measure the optimal pH environment of an enzyme, these methods are inefficient and costly. To address these problems, a computational model can be established to determine the optimal acidic or alkaline environment of an enzyme. In this paper, we first introduce a new feature called dual g-gap dipeptide composition to represent enzyme samples. Subsequently, the best features were selected using the F value calculated from analysis of variance. Finally, a support vector machine was used to build a prediction model for distinguishing acidic from alkaline enzymes. An overall accuracy of 95.9% was achieved with Jackknife cross-validation, which indicates that our method is effective and efficient for acidic and alkaline enzyme prediction. The feature proposed in this paper could also be applied in other fields of bioinformatics.
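The F-value-based selection step can be illustrated with a plain one-way analysis of variance, where F is the between-group mean square divided by the within-group mean square and features are ranked by descending F. This is a generic sketch under that textbook definition; the function names and the row-per-sample data layout are assumptions, and the paper's exact selection procedure may differ.

```python
def anova_f_value(groups):
    """One-way ANOVA F statistic for a single feature.

    `groups` is a list of non-empty value lists, one per class.
    F = [SS_between / (k - 1)] / [SS_within / (N - k)].
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need >= 2 non-empty groups and N > k")
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = ss_within = 0.0
    for g in groups:
        mean = sum(g) / len(g)
        ss_between += len(g) * (mean - grand_mean) ** 2
        ss_within += sum((x - mean) ** 2 for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    if ms_within == 0:
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def rank_features(class_a, class_b):
    """Feature indices sorted by descending F value.

    Each argument is a list of samples; each sample is a list of
    feature values (assumed layout).
    """
    n_features = len(class_a[0])
    scores = [
        anova_f_value([[row[j] for row in class_a],
                       [row[j] for row in class_b]])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

A feature whose class means are far apart relative to its within-class spread receives a high F value and is kept first when the feature set is truncated.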

