Self-Interacting Proteins Prediction from PSSM Based on Evolutionary Information

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Zheng Wang ◽  
Yang Li ◽  
Li-Ping Li ◽  
Zhu-Hong You ◽  
Wen-Zhun Huang

Self-interacting proteins (SIPs) play an influential role in regulating cell structure and function. Thus, it is critically important to identify whether a protein interacts with itself. Although experimental methods for self-interaction recognition exist, they are both expensive and time-consuming. Therefore, it is necessary to develop an efficient and stable computational method for predicting SIPs. In this study, we develop an effective computational method for predicting SIPs based on a rotation forest (RF) classifier, combined with histogram of oriented gradients (HOG) and synthetic minority oversampling technique (SMOTE). When performing SIP prediction on yeast and human datasets, the proposed method achieves accuracies of 97.28% and 89.41%, respectively. In addition, the proposed approach was compared with the state-of-the-art support vector machine (SVM) classifier and other methods on the same datasets. The experimental results demonstrate that our method has good robustness and effectiveness and can be regarded as a useful tool for SIP prediction.
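As a rough illustration of the oversampling step named in the abstract above, the sketch below implements the core idea of SMOTE in NumPy: each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote_oversample`, its parameters, and the toy data are assumptions made for illustration; the abstract does not describe the authors' implementation or settings.

```python
import numpy as np


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples and their neighbours."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = minority.shape[0]
    k = min(k, n - 1)  # cannot use more neighbours than there are other samples

    # Pairwise Euclidean distances within the minority class.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_synthetic)  # sample to start from
    pick = rng.integers(0, k, size=n_synthetic)  # which neighbour to move toward
    gap = rng.random((n_synthetic, 1))           # interpolation factor in [0, 1)
    target = minority[neighbours[base, pick]]
    return minority[base] + gap * (target - minority[base])


# Hypothetical toy data: 6 minority-class samples with 3 features each.
toy_minority = np.array([
    [0.0, 0.0, 1.0],
    [0.1, 0.2, 0.9],
    [0.2, 0.1, 1.1],
    [0.3, 0.3, 1.0],
    [0.1, 0.4, 0.8],
    [0.4, 0.2, 1.2],
])
synthetic = smote_oversample(toy_minority, n_synthetic=10, k=3)
```

Because every synthetic point is a convex combination of two real minority samples, it always stays inside the bounding box of the original minority class.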

2020 ◽  
Vol 16 ◽  
pp. 117693432092467
Author(s):  
Ji-Yong An ◽  
Yong Zhou ◽  
Zi-Ji Yan ◽  
Yu-Jun Zhao

Self-interacting proteins (SIPs) play crucial roles in biological activities of organisms. Many high-throughput methods can be used to identify SIPs. However, these methods are both time-consuming and expensive. Developing effective computational approaches for identifying SIPs is a challenging task. In this article, we present a novel computational method called RNN-SIFT, which combines the recurrent neural network (RNN) with scale invariant feature transform (SIFT) to predict SIPs based on protein evolutionary information. The main advantage of the proposed RNN-SIFT model is that it uses SIFT to extract key features by exploring the evolutionary information embedded in the Position-Specific Iterated BLAST–constructed position-specific scoring matrix and employs an RNN classifier to perform classification based on the extracted features. Extensive experiments show that RNN-SIFT obtained average accuracies of 94.34% and 97.12% on the yeast and human datasets, respectively. We also compared our performance with the back propagation neural network (BPNN), the state-of-the-art support vector machine (SVM), and other existing methods. The experimental comparison shows that RNN-SIFT performs significantly better than the BPNN, SVM, and other previous methods in the domain. Therefore, we conclude that the proposed RNN-SIFT model is a useful tool for predicting SIPs, as well as for other bioinformatics tasks. To facilitate wider study and encourage future proteomics research, a freely available web server called RNN-SIFT-SIPs was developed at http://219.219.62.123:8888/RNNSIFT/ including the source code and the SIP datasets.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Ji-Yong An ◽  
Fan-Rong Meng ◽  
Zi-Ji Yan

Abstract Background: Prediction of novel Drug–Target interactions (DTIs) plays an important role in discovering new drug candidates and finding new proteins to target. Because experimental methods are time-consuming and expensive, developing efficient computational approaches that accurately predict potential associations between drugs and targets is a challenging task. Results: In this paper, we propose a novel computational method called WELM-SURF, based on drug fingerprints and protein evolutionary information, for identifying DTIs. More specifically, to exploit protein sequence features, the Position Specific Scoring Matrix (PSSM) is used to capture protein evolutionary information, and Speeded-Up Robust Features (SURF) is employed to extract key sequence features from the PSSM. For drug fingerprints, the chemical structure of molecular substructure fingerprints is used to represent each drug as a feature vector. The Weighted Extreme Learning Machine (WELM) has a short training time, good generalization ability, and, most importantly, the ability to perform classification efficiently by optimizing the loss function of the weight matrix; the WELM classifier is therefore used to carry out classification on the extracted features for predicting DTIs. The performance of the WELM-SURF model was evaluated on enzyme, ion channel, GPCR, and nuclear receptor datasets using fivefold cross-validation. WELM-SURF obtained average accuracies of 93.54%, 90.58%, 85.43%, and 77.45% on the enzyme, ion channel, GPCR, and nuclear receptor datasets, respectively. We also compared our performance with the Extreme Learning Machine (ELM) and the state-of-the-art Support Vector Machine (SVM) on the enzyme and ion channel datasets, and with other existing methods on all four datasets. The experimental comparison shows that WELM-SURF performs significantly better than ELM, SVM, and other previous methods in the domain. Conclusion: The results demonstrate that the proposed WELM-SURF model is competent for predicting DTIs with high accuracy and robustness. It is anticipated that the WELM-SURF method will be a useful computational tool to facilitate a wide range of bioinformatics studies related to DTI prediction.
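The fivefold cross-validation mentioned in the abstract above can be sketched in a few lines of plain Python: the samples are shuffled once, dealt into five folds, and each fold serves as the test set exactly once while the other four are used for training. The helper name `k_fold_indices`, the seed, and the sample count are hypothetical; the abstract does not say how the authors partitioned their data.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical example: 23 samples split into 5 folds.
splits = list(k_fold_indices(n_samples=23, k=5))
```

The reported average accuracy would then be the mean of the five per-fold test accuracies.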


Author(s):  
Arianna Filntisi ◽  
Nikitas Papangelopoulos ◽  
Elena Bencurova ◽  
Ioannis Kasampalidis ◽  
George Matsopoulos ◽  
...  

Artificial neural networks (ANNs) are a well-established computational method inspired by the structure and function of biological central nervous systems. Since their conception, ANNs have been utilized in a vast variety of applications due to their impressive information processing abilities. ANNs, a vibrant field, have been utilized in bioinformatics, a general term for the combination of informatics, biology and medicine. This article investigates recent advances in bioinformatics applications of ANNs, with emphasis on disease diagnosis, genetics, proteomics, and chemoinformatics. The combination of neural networks and game theory in some of these applications is also discussed.


2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Xin-Xin Chen ◽  
Hua Tang ◽  
Wen-Chao Li ◽  
Hao Wu ◽  
Wei Chen ◽  
...  

Owing to the abuse of antibiotics, drug resistance of pathogenic bacteria is becoming more and more serious. Therefore, it is of interest to develop a more reasonable way to address this issue. Because they can destroy the bacterial cell structure and thereby kill the infectious bacterium, bacterial cell wall lyases are suitable candidates as antibacterial sources. Thus, it is urgent to develop an accurate and efficient computational method to predict the lyases. With this in mind, in this paper, a set of objective and rigorous data was collected by searching the Universal Protein Resource (the UniProt database), after which a feature selection technique based on the analysis of variance (ANOVA) was used to acquire the optimal feature subset. Finally, the support vector machine (SVM) was used to perform prediction. The jackknife cross-validated results showed that an optimal average accuracy of 84.82% was achieved, with a sensitivity of 76.47% and a specificity of 93.16%. For the convenience of other scholars, we built a free online server called Lypred. We believe that Lypred will become a practical tool for research on cell wall lyases and the development of antimicrobial agents.
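The abstract above does not define "average accuracy", but the three reported figures are consistent with it being the mean of sensitivity and specificity: (76.47 + 93.16) / 2 = 84.815, which rounds to 84.82. The short sketch below spells out the standard definitions and checks that arithmetic. The interpretation is an assumption, and the confusion-matrix counts in the example are hypothetical, not taken from the paper.

```python
def sensitivity(tp, fn):
    """Fraction of true positives that were correctly predicted."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true negatives that were correctly predicted."""
    return tn / (tn + fp)


def average_accuracy(sn, sp):
    """Mean of sensitivity and specificity (assumed meaning of 'average accuracy')."""
    return (sn + sp) / 2


# Hypothetical counts, only to show how the two rates are computed.
toy_sn = sensitivity(tp=3, fn=1)   # 3 of 4 positives found
toy_sp = specificity(tn=9, fp=1)   # 9 of 10 negatives rejected

# Percentages reported in the abstract.
reported_sn = 76.47
reported_sp = 93.16
mean_of_rates = average_accuracy(reported_sn, reported_sp)
```

Averaging the two rates, rather than counting correct predictions over all samples, keeps the score from being dominated by the larger class.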


2020 ◽  
Author(s):  
Ji-Yong An

Abstract Self-interacting proteins (SIPs) play crucial roles in biological activities of organisms. Many high-throughput methods can be used to identify SIPs. However, these methods are both time-consuming and expensive. Developing effective computational approaches for identifying SIPs is a challenging task. In this paper, we present a novel computational method called RNN-SIFT, which combines the Recurrent Neural Network (RNN) with Scale Invariant Feature Transform (SIFT) to predict SIPs based on protein evolutionary information. The main advantage of the proposed RNN-SIFT model is that it uses SIFT to extract key features by exploring the evolutionary information embedded in the PSI-BLAST-constructed position specific scoring matrix (PSSM) and employs an RNN classifier to carry out classification based on the extracted features. Extensive experiments show that RNN-SIFT obtained average accuracies of 94.34% and 97.12% on the yeast and human datasets, respectively. We also compared our performance with the Back Propagation Neural Network (BPNN), the state-of-the-art support vector machine (SVM) and other existing methods. The experimental comparison shows that RNN-SIFT performs significantly better than the BPNN, SVM and other previous methods in the domain. Therefore, we conclude that the proposed RNN-SIFT model is a useful tool that performs well for predicting SIPs, as well as for other bioinformatics tasks. To facilitate wider study and encourage future proteomics research, a freely available web server called RNN-SIFT-SIPs was developed; it is available at http://219.219.62.123:8888/RNNSIFT/ and includes the source code and SIP datasets.


Author(s):  
Liang Kong ◽  
Lingfu Kong ◽  
Rong Jing ◽  

Protein structural class prediction is beneficial for studying protein function, regulation and interactions. However, protein structural class prediction for low-similarity sequences (i.e., below 40% in pairwise sequence similarity) remains a challenging problem at present. In this study, a novel computational method is proposed to accurately predict protein structural class for low-similarity sequences. This method is based on a support vector machine in conjunction with integrated features from evolutionary information generated with the position specific iterative basic local alignment search tool (PSI-BLAST) and predicted secondary structure. Prediction accuracies evaluated by jackknife tests are reported on two widely used low-similarity benchmark datasets (25PDB and 1189), reaching overall accuracies of 89.3% and 87.9%, which are significantly higher than those achieved by state-of-the-art methods in protein structural class prediction. The experimental results suggest that our method could serve as an effective alternative to existing methods in protein structural classification, especially for low-similarity sequences.
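The jackknife test used for evaluation above is leave-one-out cross-validation: each sample is held out in turn, the classifier is trained on all remaining samples, and the held-out sample is then predicted. The sketch below shows that loop in plain Python. A 1-nearest-neighbour rule stands in for the SVM purely to keep the example self-contained, and the function names and toy data are hypothetical.

```python
def nearest_neighbour_label(train_x, train_y, query):
    """Stand-in classifier: label of the closest training sample (squared Euclidean)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]


def jackknife_accuracy(features, labels, predict):
    """Leave-one-out accuracy: every sample is tested once against all the others."""
    hits = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            hits += 1
    return hits / len(features)


# Hypothetical toy data: two well-separated classes in two dimensions.
toy_features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 1.1], [1.2, 0.9], [0.9, 1.0]]
toy_labels = ["a", "a", "a", "b", "b", "b"]
toy_accuracy = jackknife_accuracy(toy_features, toy_labels, nearest_neighbour_label)
```

Unlike k-fold cross-validation, the jackknife has no random partitioning, so it gives the same result for a given dataset every time it is run.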


Genes ◽  
2018 ◽  
Vol 9 (9) ◽  
pp. 449 ◽  
Author(s):  
JiaRui Li ◽  
Lei Chen ◽  
Yu-Hang Zhang ◽  
XiangYin Kong ◽  
Tao Huang ◽  
...  

Tissue-specific gene expression has long been recognized as a crucial key for understanding tissue development and function. Efforts have been made in the past decade to identify tissue-specific expression profiles, such as the Human Proteome Atlas and FANTOM5. However, these studies mainly focused on “qualitatively tissue-specific expressed genes” which are highly enriched in one or a group of tissues but paid less attention to “quantitatively tissue-specific expressed genes”, which are expressed in all or most tissues but with differential expression levels. In this study, we applied machine learning algorithms to build a computational method for identifying “quantitatively tissue-specific expressed genes” capable of distinguishing 25 human tissues from their expression patterns. Our results uncovered the expression of 432 genes as optimal features for tissue classification, which were obtained with a Matthews Correlation Coefficient (MCC) of more than 0.99 yielded by a support vector machine (SVM). This constructed model was superior to the SVM model using tissue enriched genes and yielded MCC of 0.985 on an independent test dataset, indicating its good generalization ability. These 432 genes were proven to be widely expressed in multiple tissues and a literature review of the top 23 genes found that most of them support their discriminating powers. As a complement to previous studies, our discovery of these quantitatively tissue-specific genes provides insights into the detailed understanding of tissue development and function.
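For readers unfamiliar with the metric quoted above, the sketch below computes the Matthews Correlation Coefficient from a confusion matrix using the standard multi-class generalization, which reduces to the familiar binary formula when there are two classes. The abstract does not state which MCC variant the authors used for the 25-tissue problem, so this is an assumption; the function name and the example matrices are hypothetical.

```python
import math


def multiclass_mcc(confusion):
    """MCC from a square confusion matrix (rows = true class, columns = predicted class)."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))
    true_counts = [sum(confusion[k]) for k in range(n_classes)]
    pred_counts = [sum(confusion[r][k] for r in range(n_classes)) for k in range(n_classes)]

    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    # Convention: return 0.0 when the coefficient is undefined (zero denominator).
    return numerator / denominator if denominator else 0.0


# Hypothetical matrices: a perfect 3-class result and an imperfect binary one.
perfect = [[2, 0, 0], [0, 3, 0], [0, 0, 4]]
binary = [[3, 1], [1, 5]]  # [[TN, FP], [FN, TP]]
mcc_perfect = multiclass_mcc(perfect)
mcc_binary = multiclass_mcc(binary)
```

An MCC of 1 means every sample is classified correctly, and a value near 0 means the predictions are no better than chance.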


Molecules ◽  
2019 ◽  
Vol 24 (16) ◽  
pp. 2999 ◽  
Author(s):  
Yang Li ◽  
Yu-An Huang ◽  
Zhu-Hong You ◽  
Li-Ping Li ◽  
Zheng Wang

The identification of drug-target interactions (DTIs) is a critical step in drug development. Experimental methods that are based on clinical trials to discover DTIs are time-consuming, expensive, and challenging. Therefore, as a complement to them, developing new computational methods for predicting novel DTIs is of great significance with regard to saving cost and shortening the development period. In this paper, we present a novel computational model for predicting DTIs, which uses the sequence information of proteins and a rotation forest classifier. Specifically, all of the target protein sequences are first converted to a position-specific scoring matrix (PSSM) to retain evolutionary information. We then use local phase quantization (LPQ) descriptors to extract evolutionary information from the PSSM. In addition, substructure fingerprint information is utilized to extract the features of the drug. We finally combine the features of the drug and the protein to represent each drug-target pair and use a rotation forest classifier to calculate interaction scores for a global DTI prediction. The experimental results indicate that the proposed model is effective, achieving average accuracies of 89.15%, 86.01%, 82.20%, and 71.67% on four datasets (i.e., enzyme, ion channel, G protein-coupled receptors (GPCR), and nuclear receptor), respectively. In addition, we compared the prediction performance of the rotation forest classifier with another popular classifier, support vector machine, on the same datasets. Several types of previously proposed methods are also implemented on the same datasets for performance comparison. The comparison results demonstrate the superiority of the proposed method over the others. We anticipate that the proposed method can be used as an effective tool for predicting drug-target interactions on a large scale, given the information of protein sequences and drug fingerprints.


2020 ◽  
Author(s):  
Ji-Yong An ◽  
Yong Zhou ◽  
Zi-Ji Yan ◽  
Yu-Jun Zhao

Abstract Background: Self-interacting proteins (SIPs) play a key role in a variety of biological activities of organisms. High-throughput methods are time-consuming and expensive, and the numbers of positive and negative samples in SIP datasets are highly imbalanced. Developing accurate and efficient computational approaches to assist and accelerate the identification of SIPs is therefore a challenging task. Results: In this work, we propose a new computational method called WELM-SURF for predicting SIPs. More specifically, to exploit protein sequence features, the Position Specific Scoring Matrix (PSSM) is used to capture protein evolutionary information, and Speeded-Up Robust Features (SURF) is employed to extract key features of the protein sequence from the PSSM. The Weighted Extreme Learning Machine (WELM) has a short training time, good generalization ability, and, most importantly, the ability to perform classification efficiently on imbalanced class samples by optimizing the loss function of the weight matrix; the WELM classifier is therefore used to perform classification on the extracted features for predicting SIPs. A large number of experiments show that the average accuracy of WELM-SURF is 95.25% and 98.79% on the yeast and human datasets, respectively. We also compared our performance with the Extreme Learning Machine (ELM), the state-of-the-art Support Vector Machine (SVM), and other existing methods. The experimental comparison shows that WELM-SURF performs clearly better than ELM, SVM and other previous methods in this domain. Conclusion: These experimental results show that the proposed WELM-SURF model is competent for predicting SIPs with high accuracy and robustness. It is anticipated that the WELM-SURF method will be a useful computational tool to facilitate a wide range of bioinformatics studies related to SIP prediction. To further encourage future proteomics research, we developed a freely available web server called WELM-SURF-SIPs. It is available at http://219.219.62.123:8888/WELMSURF/ and includes the SIP datasets and source code.
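To make the role of the weight matrix in the abstract above more concrete, here is a minimal NumPy sketch of a weighted extreme learning machine in its commonly cited closed form, beta = (I/C + H^T W H)^-1 H^T W y, where H holds the outputs of a random, untrained hidden layer and W is a diagonal matrix of per-sample weights. Everything specific here is an assumption for illustration: the 1/(class size) weighting scheme, the sigmoid activation, the hyperparameters `n_hidden` and `C`, the function names, and the toy data. None of it is taken from the WELM-SURF source code.

```python
import numpy as np


def train_welm(X, y, n_hidden=50, C=1000.0, seed=0):
    """Fit a weighted ELM; y holds labels in {-1, +1}. Returns (A, b, beta)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))       # sigmoid hidden-layer outputs

    # Per-sample weight = 1 / (size of the sample's class), so both classes
    # contribute equally to the loss despite the imbalance.
    n_pos = np.sum(y == 1)
    n_neg = np.sum(y == -1)
    w = np.where(y == 1, 1.0 / n_pos, 1.0 / n_neg)

    # Closed-form weighted ridge solution for the output weights.
    HtW = H.T * w  # scales column i of H^T by w_i, i.e. H^T W
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ y)
    return A, b, beta


def predict_welm(model, X):
    """Predict labels in {-1, +1} from the sign of the network output."""
    A, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return np.where(H @ beta >= 0, 1, -1)


# Hypothetical imbalanced toy data: 200 negatives, 20 positives, 2 features.
rng = np.random.default_rng(1)
X_toy = np.vstack([rng.normal(0.0, 0.5, (200, 2)), rng.normal(3.0, 0.5, (20, 2))])
y_toy = np.concatenate([-np.ones(200), np.ones(20)])

model = train_welm(X_toy, y_toy)
preds = predict_welm(model, X_toy)
accuracy = float(np.mean(preds == y_toy))
minority_recall = float(np.mean(preds[y_toy == 1] == 1))
```

Only the output weights are solved for, in a single linear system, which is why this family of models trains quickly. The per-sample weights keep the minority class from being ignored when the classes are highly imbalanced.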


2020 ◽  
Author(s):  
An Ji Yong ◽  
Meng Fan-Rong ◽  
Yan Zi-Ji

Abstract Background: Prediction of novel Drug–Target interactions (DTIs) plays an important role in discovering new drug candidates and finding new proteins to target. Because experimental methods are time-consuming and expensive, developing efficient computational approaches that accurately predict potential associations between drugs and targets is a challenging task. Results: In this paper, we propose a novel computational method called WELM-SURF, based on drug fingerprints and protein evolutionary information, for identifying DTIs. More specifically, to exploit protein sequence features, the Position Specific Scoring Matrix (PSSM) is used to capture protein evolutionary information, and Speeded-Up Robust Features (SURF) is employed to extract key sequence features from the PSSM. For drug fingerprints, the chemical structure of molecular substructure fingerprints is used to represent each drug as a feature vector. The Weighted Extreme Learning Machine (WELM) has a short training time, good generalization ability, and, most importantly, the ability to perform classification efficiently by optimizing the loss function of the weight matrix; the WELM classifier is therefore used to carry out classification on the extracted features for predicting DTIs. The performance of the WELM-SURF model was evaluated on enzyme, ion channel, GPCR, and nuclear receptor datasets using fivefold cross-validation. WELM-SURF obtained average accuracies of 93.54%, 90.58%, 85.43%, and 77.45% on the enzyme, ion channel, GPCR, and nuclear receptor datasets, respectively. We also compared our performance with the Extreme Learning Machine (ELM) and the state-of-the-art Support Vector Machine (SVM) on the enzyme and ion channel datasets, and with other existing methods on all four datasets. The experimental comparison shows that WELM-SURF performs significantly better than ELM, SVM, and other previous methods in the domain. Conclusion: The results demonstrate that the proposed WELM-SURF model is competent for predicting DTIs with high accuracy and robustness. It is anticipated that the WELM-SURF method will be a useful computational tool to facilitate a wide range of bioinformatics studies related to DTI prediction.

