Performance of rotation forest ensemble classifier and feature extractor in predicting protein interactions using amino acid sequences

BMC Genomics ◽  
2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Alhadi Bustamam ◽  
Mohamad I. S. Musti ◽  
Susilo Hartomo ◽  
Shirley Aprilia ◽  
Patuan P. Tampubolon ◽  
...  

Abstract Background There are two significant problems associated with predicting protein-protein interactions using the sequences of amino acids. The first problem is representing each sequence as a feature vector, and the second is designing a model that can identify the protein interactions. Thus, effective feature extraction methods can lead to improved model performance. In this study, we used two feature extraction methods, global encoding and pseudo-substitution matrix representation (PseudoSMR), to represent the amino acid sequences of human and Human Immunodeficiency Virus type 1 (HIV-1) proteins and to address the classification problem of predicting protein-protein interactions. We also compared principal component analysis (PCA) with independent principal component analysis (IPCA) as the feature transformation step within Rotation Forest. Results The results show that global encoding and PseudoSMR, used as feature extraction methods, successfully represent the amino acid sequences for the Rotation Forest classifier with PCA or with IPCA. This can be seen from the comparison of the results of evaluation metrics, which were >73% across the six different parameters. The accuracy of both methods was >74%. The results for the other model performance criteria, such as sensitivity, specificity, precision, and F1-score, were all >73%. The data used in this study can be accessed using the following link: https://www.dsc.ui.ac.id/research/amino-acid-pred/. Conclusions Both global encoding and PseudoSMR can successfully represent the sequences of amino acids. Rotation Forest (PCA) performed better than Rotation Forest (IPCA) in terms of predicting protein-protein interactions between HIV-1 and human proteins. Both the Rotation Forest (PCA) classifier and the Rotation Forest (IPCA) classifier performed better than other classifiers, such as Gradient Boosting, K-Nearest Neighbor, Logistic Regression, Random Forest, and Support Vector Machine (SVM).
Rotation Forest (PCA) and Rotation Forest (IPCA) have accuracy, sensitivity, specificity, precision, and F1-score values >70% while the other classifiers have values <70%.
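The five evaluation metrics named in this abstract can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch, not the authors' code; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, precision, and F1-score
    from confusion-matrix counts (interacting pairs are the positive class)."""
    sensitivity = _ratio(tp, tp + fn)   # recall on interacting pairs
    specificity = _ratio(tn, tn + fp)   # recall on non-interacting pairs
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical counts, not taken from the paper.
example = binary_metrics(tp=80, fp=20, tn=75, fn=25)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters for interaction data, where one class can dominate and inflate accuracy on its own.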

2021 ◽  
Author(s):  
Babu Sudhamalla ◽  
Anirban Roy ◽  
Soumen Barman ◽  
Jyotirmayee Padhan

The site-specific installation of light-activable crosslinker unnatural amino acids offers a powerful approach to trap transient protein-protein interactions both in vitro and in vivo. Herein, we engineer a bromodomain to...


2004 ◽  
Vol 24 (12) ◽  
pp. 5521-5533 ◽  
Author(s):  
David A. Mangus ◽  
Matthew C. Evans ◽  
Nathan S. Agrin ◽  
Mandy Smith ◽  
Preetam Gongidi ◽  
...  

ABSTRACT PAN, a yeast poly(A) nuclease, plays an important nuclear role in the posttranscriptional maturation of mRNA poly(A) tails. The activity of this enzyme is dependent on its Pan2p and Pan3p subunits, as well as the presence of poly(A)-binding protein (Pab1p). We have identified and characterized the associated network of factors controlling the maturation of mRNA poly(A) tails in yeast and defined its relevant protein-protein interactions. Pan3p, a positive regulator of PAN activity, interacts with Pab1p, thus providing substrate specificity for this nuclease. Pab1p also regulates poly(A) tail trimming by interacting with Pbp1p, a factor that appears to negatively regulate PAN. Pan3p and Pbp1p both interact with themselves and with the C terminus of Pab1p. However, the domains required for Pan3p and Pbp1p binding on Pab1p are distinct. Single amino acid changes that disrupt Pan3p interaction with Pab1p have been identified and define a binding pocket in helices 2 and 3 of Pab1p's carboxy terminus. The importance of these amino acids for Pab1p-Pan3p interaction, and poly(A) tail regulation, is underscored by experiments demonstrating that strains harboring substitutions in these residues accumulate mRNAs with long poly(A) tails in vivo.


Molecules ◽  
2020 ◽  
Vol 25 (8) ◽  
pp. 1841 ◽  
Author(s):  
Da Xu ◽  
Hanxiao Xu ◽  
Yusen Zhang ◽  
Wei Chen ◽  
Rui Gao

Identification of protein-protein interactions (PPIs) plays an essential role in the understanding of protein functions and cellular biological activities. However, the traditional experiment-based methods are time-consuming and laborious. Therefore, developing new reliable computational approaches has great practical significance for the identification of PPIs. In this paper, a novel prediction method is proposed for predicting PPIs using graph energy, named PPI-GE. Particularly, in the process of feature extraction, we designed two new feature extraction methods, the physicochemical graph energy based on the ionization equilibrium constant and isoelectric point and the contact graph energy based on the contact information of amino acids. The dipeptide composition method was used for order information of amino acids. After multi-information fusion, principal component analysis (PCA) was implemented for eliminating noise and a robust weighted sparse representation-based classification (WSRC) classifier was applied for sample classification. The prediction accuracies based on the five-fold cross-validation of the human, Helicobacter pylori (H. pylori), and yeast data sets were 99.49%, 97.15%, and 99.56%, respectively. In addition, in five independent data sets and two significant PPI networks, the comparative experimental results also demonstrate that PPI-GE obtained better performance than the compared methods.
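The dipeptide composition mentioned in this abstract is commonly computed as the relative frequency of each of the 400 ordered amino acid pairs in a sequence. The sketch below follows that common definition; whether PPI-GE normalizes or handles non-standard residues in the same way is an assumption, and the function name and example sequence are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so feature positions are stable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-dimensional vector of dipeptide frequencies.

    Pairs containing a non-standard residue (e.g. 'X') are skipped,
    which is an assumption made for this sketch.
    """
    sequence = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Hypothetical toy sequence, far shorter than a real protein.
features = dipeptide_composition("MKVLAAGMKV")
print(len(features), features[DIPEPTIDES.index("MK")])
```

Because the vector has the same length for every protein, it can be concatenated with other descriptors before a dimensionality reduction step such as PCA.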


Genes ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 826
Author(s):  
Roman Matyášek ◽  
Kateřina Řehůřková ◽  
Kristýna Berta Marošiová ◽  
Aleš Kovařík

The genomic diversity of SARS-CoV-2 has been a focus during the ongoing COVID-19 pandemic. Here, we analyzed the distribution and character of emerging mutations in a data set comprising more than 95,000 virus genomes covering eight major SARS-CoV-2 lineages in the GISAID database, including genotypes arising during COVID-19 therapy. Globally, the C>U transitions and G>U transversions were the most represented mutations, accounting for the majority of single-nucleotide variations. Mutational spectra were not influenced by the time the virus had been circulating in its host or medical treatment. At the amino acid level, we observed about a 2-fold excess of substitutions in favor of hydrophobic amino acids over the reverse. However, most mutations constituting variants of interest of the S-protein (spike) lead to hydrophilic amino acids, counteracting the global trend. The C>U and G>U substitutions altered codons towards increased amino acid hydrophobicity values in more than 80% of cases. The bias is explained by the existing differences in the codon composition for amino acids bearing contrasting biochemical properties. Mutation asymmetries apparently influence the biochemical features of SARS-CoV-2 proteins, which may impact protein–protein interactions, fusion of viral and cellular membranes, and virion assembly.
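The codon-level argument in this abstract can be explored with the standard genetic code and a hydropathy scale. The sketch below is an illustration under stated assumptions, not the authors' analysis: it uses the Kyte-Doolittle scale (the abstract does not name a scale), weights every codon position equally rather than by observed mutation counts, and ignores synonymous and stop-codon changes. All function names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in UCAG order for each codon position.
AA_STRING = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AA_STRING)
}

# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy_shift(codon, position, new_base):
    """Hydropathy change caused by one substitution, or None if the change
    is synonymous or involves a stop codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    old_aa, new_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    if "*" in (old_aa, new_aa) or old_aa == new_aa:
        return None
    return KD[new_aa] - KD[old_aa]


def fraction_more_hydrophobic(old_base, new_base):
    """Share of amino-acid-changing substitutions that raise hydropathy,
    counting every codon position once (an unweighted simplification)."""
    shifts = []
    for codon in CODON_TABLE:
        for position, base in enumerate(codon):
            if base == old_base:
                shift = hydropathy_shift(codon, position, new_base)
                if shift is not None:
                    shifts.append(shift)
    return sum(s > 0 for s in shifts) / len(shifts)


for old, new in (("C", "U"), ("G", "U")):
    print(f"{old}>{new}: {fraction_more_hydrophobic(old, new):.2f}")
```

Because the count is unweighted, the printed fractions are a property of the genetic code alone and need not match the figure reported for the observed SARS-CoV-2 mutations.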


2005 ◽  
Vol 289 (5) ◽  
pp. H1941-H1950 ◽  
Author(s):  
Seth L. Robia ◽  
Misuk Kang ◽  
Jeffery W. Walker

The Z-line represents a critical link between the transverse tubule network and cytoskeleton of cardiac cells with a role in anchoring structural proteins, ion channels, and signaling molecules. Protein kinase C-ε (PKC-ε) regulates cardiac excitability, cardioprotection, and growth, possibly as a consequence of translocation to the Z-line/T tubule region. To investigate the mechanism of PKC-ε translocation, fragments of its NH2-terminal 144-amino acid variable domain, εV1, were fused with green fluorescent protein and evaluated by quantitative Fourier image analysis of decorated myocytes. Deletion of 23 amino acids from the NH2-terminus of εV1, including an EAVSLKPT motif important for binding to a receptor for activated C kinase (RACK2), reduced but did not abolish Z-line binding. Further deletions of up to 84 amino acids from the NH2-terminus of εV1 also did not prevent Z-line decoration. However, deletions of residues 85–144 from the COOH-terminus strongly reduced Z-line binding. COOH-terminal deletions caused 2.5-fold greater loss of binding energy (ΔΔG) than did NH2-terminal deletions. Synthetic peptides derived from these regions modulated εV1 binding and cardiac myocyte function, but also revealed considerable heterogeneity within populations of adult cardiac myocytes. The COOH-terminal subdomain important for Z-line anchoring maps to a surface in the εV1 crystal structure that complements the eight-amino acid RACK2 binding site and two previously identified membrane docking motifs. PKC-ε anchoring at the cardiac Z-line/T tubule appears to rely on multiple points of contact probably involving protein-lipid and protein-protein interactions.


2019 ◽  
Vol 15 ◽  
pp. 117693431987992 ◽  
Author(s):  
Ji-Yong An ◽  
Yong Zhou ◽  
Yu-Jun Zhao ◽  
Zi-Ji Yan

Background: Increasing evidence has indicated that protein-protein interactions (PPIs) play important roles in various aspects of the structural and functional organization of a cell. Thus, continuing to uncover potential PPIs is an important topic in the biomedical domain. Although various feature extraction methods with machine learning approaches have enhanced the prediction of PPIs, there remains room for improvement by developing novel and effective feature extraction methods and classifier approaches to identify PPIs. Method: In this study, we proposed a sequence-based feature extraction method called LCPSSMMF, which combined local coding position-specific scoring matrix (PSSM) with multifeatures fusion. First, we used a novel local coding method based on PSSM to build a new PSSM (CPSSM); the advantage of this method is that it incorporated global and local feature extraction, which can account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. Second, we adopted 2 different feature extraction methods (Local Average Group [LAG] and Bigram Probability [BP]) to capture multiple key feature information by employing the evolutionary information embedded in the CPSSM matrix. Finally, feature vectors were acquired by using a multifeatures fusion method. Result: To evaluate the performance of the proposed feature extraction approach, we employed support vector machine (SVM) as a prediction classifier and applied this method to yeast and human PPI datasets. The prediction accuracies of LCPSSMMF were 93.43% and 90.41% on the yeast and human datasets, respectively. Moreover, we also compared the proposed method with the previous sequence-based approaches on the yeast datasets by using the same SVM classifier. The experimental results indicated that the performance of LCPSSMMF significantly exceeded that of several other state-of-the-art methods.
It is proven that the LCPSSMMF approach can capture more local and global discriminatory information than almost all previous methods and can function remarkably well in identifying PPIs. To facilitate extensive research in future proteomics studies, we developed an LCPSSMMFSVM server, which is freely available for academic use at http://219.219.62.123:8888/LCPSSMMFSVM .
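A bigram feature over a PSSM is often defined as the sum, over consecutive sequence positions, of the product of the score for residue type i at one position and residue type j at the next, giving a 20 x 20 = 400-dimensional vector. The sketch below implements that common definition; whether the BP step of LCPSSMMF uses exactly this form, or applies it to the CPSSM with extra normalization, is an assumption. The function name and the toy matrix are hypothetical.

```python
def bigram_features(pssm):
    """Flattened bigram matrix B, where B[i][j] sums pssm[k][i] * pssm[k+1][j]
    over consecutive rows k. `pssm` is a list of rows, one per residue."""
    n_cols = len(pssm[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Pair every row with its successor along the sequence.
    for row, next_row in zip(pssm, pssm[1:]):
        for i, score_here in enumerate(row):
            for j, score_next in enumerate(next_row):
                bigram[i][j] += score_here * score_next
    return [value for bigram_row in bigram for value in bigram_row]


# Toy matrix with 3 positions and 2 columns; a real PSSM has 20 columns.
toy_pssm = [
    [1.0, 0.0],
    [0.5, 0.5],
    [0.0, 1.0],
]
toy_features = bigram_features(toy_pssm)
print(toy_features)
```

The output length depends only on the number of columns, so proteins of different lengths map to vectors of the same size, which is what a classifier such as an SVM requires.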

