Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis

2013 ◽  
Vol 14 (S8) ◽  
Author(s):  
Zhu-Hong You ◽  
Ying-Ke Lei ◽  
Lin Zhu ◽  
Junfeng Xia ◽  
Bing Wang
2021 ◽  
Vol 3 (2) ◽  
pp. 25-34
Author(s):  
Hilmi Farhan Ramadhani ◽  
Annisa ◽  
Wisnu Ananta Kusuma

Coronavirus Disease 2019 (COVID-19) can cause disease complications and organ damage due to excessive inflammatory reactions if left untreated. Computational analysis of protein-protein interactions can be carried out in various ways, including topological analysis and clustering of protein-protein interaction networks. Topological analysis can identify significant proteins by ranking nodes with centrality measures. Using Principal Component Analysis (PCA), several centrality measures were combined into a single overall centrality value. The study aimed to find significant proteins in COVID-19 protein-protein interactions using PCA and ClusterONE. This study used 57 proteins associated with COVID-19 to obtain protein networks. All of these proteins are from Homo sapiens. The network built from these 57 proteins contained 357 proteins and 1686 interactions. The analysis yielded two clusters; the best cluster was the first, which had a lower p-value and an average overall centrality value close to that of the second cluster. There are twenty important proteins in that cluster, and all of them are related to COVID-19. These proteins are expected to be useful in the discovery of medicinal compounds for COVID-19.
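
The idea of combining several centrality measures into one overall value with PCA can be illustrated with a minimal sketch. It assumes that the overall value is each node's score on the first principal component of the standardized centrality measures; the abstract does not state the exact combination rule, so this is an assumption. The toy graph, the choice of degree and closeness centrality, and all function names are hypothetical and are not taken from the study.

```python
from collections import deque

import numpy as np


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = adj.shape[0]
    return adj.sum(axis=1) / (n - 1)


def closeness_centrality(adj):
    """(n - 1) / sum of shortest-path distances, found by breadth-first search."""
    n = adj.shape[0]
    out = np.zeros(n)
    for source in range(n):
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(adj[u]):
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(d for d in dist if d > 0)
        out[source] = (n - 1) / total if total > 0 else 0.0
    return out


def overall_centrality(measures):
    """Score of each node on the first principal component (assumed rule)."""
    # Standardize each centrality measure so no single scale dominates.
    z = (measures - measures.mean(axis=0)) / measures.std(axis=0)
    # PCA via SVD: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    pc1 = vt[0]
    # The sign of a principal component is arbitrary; orient it so that
    # larger centrality values give larger scores.
    if pc1.sum() < 0:
        pc1 = -pc1
    return z @ pc1


# Hypothetical toy network: node 0 is a hub, nodes 1 and 2 are also linked.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2)]
adj = np.zeros((5, 5), dtype=int)
for a, b in edges:
    adj[a, b] = adj[b, a] = 1

measures = np.column_stack([degree_centrality(adj), closeness_centrality(adj)])
scores = overall_centrality(measures)
print(scores)
```

In this toy network the hub node receives the largest overall score. A real analysis would use more centrality measures and a full interaction network, and the resulting ranking would then be compared against the clusters found by a tool such as ClusterONE.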


Icarus ◽  
2003 ◽  
Vol 166 (2) ◽  
pp. 403-409 ◽  
Author(s):  
Evan D. Dorn ◽  
Gene D. McDonald ◽  
Michael C. Storrie-Lombardi ◽  
Kenneth H. Nealson

BMC Genomics ◽  
2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Alhadi Bustamam ◽  
Mohamad I. S. Musti ◽  
Susilo Hartomo ◽  
Shirley Aprilia ◽  
Patuan P. Tampubolon ◽  
...  

Abstract

Background: There are two significant problems associated with predicting protein-protein interactions using the sequences of amino acids. The first problem is representing each sequence as a feature vector, and the second is designing a model that can identify the protein interactions. Thus, effective feature extraction methods can lead to improved model performance. In this study, we used two types of feature extraction methods, global encoding and pseudo-substitution matrix representation (PseudoSMR), to represent the sequences of amino acids in human proteins and Human Immunodeficiency Virus type 1 (HIV-1) to address the classification problem of predicting protein-protein interactions. We also compared principal component analysis (PCA) with independent principal component analysis (IPCA) as methods for transforming Rotation Forest.

Results: The results show that using global encoding and PseudoSMR as a feature extraction method successfully represents the amino acid sequence for the Rotation Forest classifier with PCA or with IPCA. This can be seen from the comparison of the results of evaluation metrics, which were >73% across the six different parameters. The accuracy of both methods was >74%. The results for the other model performance criteria, such as sensitivity, specificity, precision, and F1-score, were all >73%. The data used in this study can be accessed using the following link: https://www.dsc.ui.ac.id/research/amino-acid-pred/.

Conclusions: Both global encoding and PseudoSMR can successfully represent the sequences of amino acids. Rotation Forest (PCA) performed better than Rotation Forest (IPCA) in terms of predicting protein-protein interactions between HIV-1 and human proteins. Both the Rotation Forest (PCA) classifier and the Rotation Forest (IPCA) classifier performed better than other classifiers, such as Gradient Boosting, K-Nearest Neighbor, Logistic Regression, Random Forest, and Support Vector Machine (SVM). Rotation Forest (PCA) and Rotation Forest (IPCA) have accuracy, sensitivity, specificity, precision, and F1-score values >70% while the other classifiers have values <70%.
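
The role PCA plays inside Rotation Forest can be sketched as follows. This is a minimal illustration of the rotation step in the general Rotation Forest algorithm, not the authors' implementation: the features are split into random disjoint subsets, PCA is fitted on a resampled portion of the data for each subset, and the resulting loadings are assembled into one rotation matrix for an ensemble member. The function name, parameter names, and the random stand-in data are hypothetical; the class-subset sampling of the original algorithm and the training of the decision tree on the rotated data are omitted.

```python
import numpy as np


def rotation_matrix(X, n_subsets, sample_frac=0.75, rng=None):
    """Build a PCA-based rotation matrix for one ensemble member (sketch)."""
    rng = np.random.default_rng(rng)
    n_samples, n_features = X.shape
    # Randomly partition the feature indices into disjoint subsets.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    for idx in subsets:
        # Resample rows so each subset sees a different view of the data.
        n_rows = max(len(idx), int(sample_frac * n_samples))
        rows = rng.choice(n_samples, size=n_rows, replace=True)
        block = X[np.ix_(rows, idx)]
        block = block - block.mean(axis=0)
        # PCA via SVD; all components are kept, so the block is orthogonal.
        _, _, vt = np.linalg.svd(block, full_matrices=False)
        # Place the loadings in the rows and columns of this feature subset.
        R[np.ix_(idx, idx)] = vt.T
    return R


# Hypothetical stand-in for encoded sequence features: 40 samples, 6 features.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(40, 6))

R = rotation_matrix(X, n_subsets=3, rng=0)
X_rot = X @ R  # rotated features on which a base decision tree would be trained
print(X_rot.shape)
```

Because every principal component is retained, the rotation changes the coordinate axes without discarding information, which gives each tree in the ensemble a different view of the same data. Swapping the SVD step for another transform is where a variant such as IPCA would plug in.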

