Using discriminative vector machine model with 2DPCA to predict interactions among proteins

2019 ◽  
Vol 20 (S25) ◽  
Author(s):  
Zhengwei Li ◽  
Ru Nie ◽  
Zhuhong You ◽  
Chen Cao ◽  
Jiashu Li

Abstract. Background: Interactions among proteins play crucial roles in most cellular processes. Despite the enormous effort devoted to identifying protein-protein interactions (PPIs) in a large number of organisms, existing firsthand biological experimental methods suffer from high cost, low efficiency, and a high false-positive rate. The application of in silico methods opens new doors for predicting interactions among proteins and has attracted a great deal of attention in recent decades. Results: Here we present a novel computational model that combines our proposed Discriminative Vector Machine (DVM) model with a 2-Dimensional Principal Component Analysis (2DPCA) descriptor to identify candidate PPIs based only on protein sequences. Specifically, the 2DPCA descriptor captures discriminative feature information from the Position-Specific Scoring Matrix (PSSM) of each amino acid sequence, generated with PSI-BLAST. A robust and powerful DVM classifier is then employed to infer PPIs. When applied to the gold-standard benchmark datasets of Yeast and H. pylori, our model obtained mean prediction accuracies of 97.06% and 92.89%, respectively, a noticeable improvement over several state-of-the-art methods. Moreover, we constructed a Support Vector Machine (SVM) based predictive model and compared it with our model on the Human benchmark dataset. In addition, to further demonstrate the predictive reliability of the proposed method, we carried out extensive experiments on identifying cross-species PPIs on five other species datasets. Conclusions: All the experimental results indicate that our method is very effective for identifying potential PPIs and could serve as a practical approach to aid biological experiments in proteomics research.
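The 2DPCA step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard 2DPCA formulation (an image covariance matrix built directly from the 2D matrices, followed by projection onto its leading eigenvector) and uses tiny fixed-size toy matrices in place of real PSSMs. The function names, the power-iteration eigen-solver, and the single projection axis are illustrative assumptions, not the authors' implementation.

```python
import math

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def image_covariance(mats):
    """G = (1/M) * sum_i (A_i - mean)^T (A_i - mean); G is n x n for m x n inputs."""
    count = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    mean = [[sum(a[r][c] for a in mats) / count for c in range(cols)]
            for r in range(rows)]
    g = [[0.0] * cols for _ in range(cols)]
    for a in mats:
        centered = [[a[r][c] - mean[r][c] for c in range(cols)] for r in range(rows)]
        prod = matmul(transpose(centered), centered)
        for i in range(cols):
            for j in range(cols):
                g[i][j] += prod[i][j] / count
    return g

def top_eigenvector(g, iters=500):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    n = len(g)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: keep the current vector
            break
        v = [x / norm for x in w]
    return v

def project(a, axis):
    """Feature vector Y = A x for one projection axis x (length = columns of A)."""
    return [sum(entry * x for entry, x in zip(row, axis)) for row in a]

# Toy stand-ins for PSSMs (hypothetical 2 x 3 matrices; real PSSMs are L x 20).
toy_matrices = [
    [[1.0, 5.0, 7.0], [1.0, 6.0, 8.0]],
    [[2.0, 5.0, 7.0], [2.0, 6.0, 8.0]],
    [[3.0, 5.0, 7.0], [3.0, 6.0, 8.0]],
]
axis = top_eigenvector(image_covariance(toy_matrices))
features = [project(a, axis) for a in toy_matrices]
```

In practice several leading eigenvectors would be kept rather than one, and PSSMs of different lengths would need to be brought to a common size first; the abstract does not say how that is handled.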

2021 ◽  
Author(s):  
Tim Brandes ◽  
Stefano Scarso ◽  
Christian Koch ◽  
Stephan Staudacher

Abstract. A numerical experiment of intentionally reduced complexity is used to demonstrate a method for classifying flight missions in terms of the operational severity experienced by the engines. In this proof of concept, the general term of severity is limited to the erosion of the core flow compressor blade and vane leading edges. A Monte Carlo simulation of varying operational conditions generates the required database of 10,000 flight missions. Each flight is sampled at a rate of 1 Hz. Eleven measurable or synthesizable physical parameters are deemed to be relevant for the problem. They are reduced to seven universal non-dimensional groups, which are averaged for each flight. The application of principal component analysis allows a further reduction to three principal components. These are used to run a support-vector machine model in order to classify the flights. A linear kernel function is chosen for the support-vector machine due to its low computation time compared to other functions. The robustness of the classification approach against measurement precision error is evaluated. In addition, a minimum number of flights required for training and a sensible number of severity classes are documented. Furthermore, the importance of training the algorithms on a sufficiently wide range of operations is presented.


Symmetry ◽  
2019 ◽  
Vol 11 (3) ◽  
pp. 380 ◽  
Author(s):  
Kai Ye

When the key features of a network intrusion signal are identified with the GA-RBF algorithm (which uses a genetic algorithm to optimize the radial basis function), the pre-processing of the network intrusion signal data is neglected. This increases the noise in the network signal data and reduces the accuracy of key feature recognition. Therefore, a key feature recognition algorithm for network intrusion signals based on a neural network and a support vector machine is proposed. A principal component neural network (PCNN) is used to extract the characteristics of the network intrusion signal, and a support vector machine (SVM) multi-classifier is constructed. The feature extraction result is fed into the SVM classifier, so that the combination of PCNN and SVM identifies the key features of network intrusion signals. The experimental results show that the algorithm offers high precision and a low false positive rate, and that the recognition time for key features of the R2L data set (R2L is a common type of network intrusion attack) is only 3.18 ms.


2020 ◽  
Vol 16 (1) ◽  
pp. 155014772090363 ◽  
Author(s):  
Ying Liu ◽  
Lihua Huang

Recently, support vector machines, a supervised learning algorithm, have been widely used in credit risk management. However, noise may increase the complexity of building the model and degrade the performance of the classifier. In our work, we propose an ensemble support vector machine model, combined with a noise reduction method, for risk assessment in supply chain finance. The main characteristics of this approach are that (1) a novel noise filtering scheme based on fuzzy clustering and principal component analysis is proposed to remove both attribute noise and class noise and obtain an optimal clean set, and (2) support vector machine classifiers, tuned with an improved particle swarm optimization algorithm, serve as the component classifiers. The final classification results are then obtained by combining the individual predictions through the AdaBoost algorithm on the new sample set. Experiments are conducted on supply chain financial data of China’s listed companies. Results indicate that the credit assessment accuracy can be increased by applying this approach.
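The combination step described here, where component classifiers are merged through AdaBoost, can be sketched as follows. This is a minimal illustration of the textbook binary AdaBoost rules (vote weight from the weighted error, sample re-weighting, sign of the weighted vote). The component predictions are hypothetical placeholders for the SVM outputs, and nothing here reproduces the authors' noise filtering or particle swarm tuning.

```python
import math

def classifier_weight(weighted_error):
    """AdaBoost vote weight: alpha = 0.5 * ln((1 - err) / err)."""
    eps = 1e-12  # guard against log(0) and division by zero
    err = min(max(weighted_error, eps), 1.0 - eps)
    return 0.5 * math.log((1.0 - err) / err)

def update_sample_weights(weights, labels, predictions, alpha):
    """Up-weight misclassified samples, then renormalize so weights sum to 1.

    Labels and predictions are in {-1, +1}.
    """
    raw = [w * math.exp(-alpha * y * p)
           for w, y, p in zip(weights, labels, predictions)]
    total = sum(raw)
    return [r / total for r in raw]

def combined_prediction(alphas, predictions):
    """Sign of the alpha-weighted vote over component predictions in {-1, +1}."""
    score = sum(a * p for a, p in zip(alphas, predictions))
    return 1 if score >= 0 else -1

# Hypothetical example: three component classifiers with different error rates.
alphas = [classifier_weight(e) for e in (0.1, 0.4, 0.4)]
decision = combined_prediction(alphas, [1, -1, -1])
```

In this toy example the single accurate classifier outvotes the two weak ones, which is the behaviour the weighting is designed to produce.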


2012 ◽  
Vol 2012 ◽  
pp. 1-23
Author(s):  
J. M. Urquiza ◽  
I. Rojas ◽  
H. Pomares ◽  
J. Herrera ◽  
J. P. Florido ◽  
...  

Protein-protein interactions (PPIs) play a crucial role in cellular processes. In the present work, a new approach is proposed to construct a PPI predictor by training a support vector machine model, using a mutual information filter-wrapper parallel feature selection algorithm and an iterative, hierarchical clustering procedure to select a relevant negative training set. Using the selected suboptimal set of features, the resulting support vector machine model is able to classify PPIs with high accuracy on any positive and negative datasets.


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Wenzheng Ma ◽  
Yi Cao ◽  
Wenzheng Bao ◽  
Bin Yang ◽  
Yuehui Chen

Interactions between proteins play important roles in many organisms and are involved in almost all activities in the cell. Research on protein-protein interactions (PPIs) can make a major contribution to the prevention and treatment of diseases. Many machine learning-based methods have been proposed to predict PPIs. In this article, we propose a novel method, ACT-SVM, that can effectively predict PPIs. The ACT-SVM model maps protein sequences to numerical features, performs feature extraction twice on each protein sequence to obtain vector A and descriptor CT, and combines them into a single vector. Then, the feature vectors of the two proteins in a pair are merged as the input of the support vector machine (SVM) classifier. We use nonredundant H. pylori and human datasets to verify the prediction performance of our method. The proposed method achieves a prediction accuracy of 0.727897 on the H. pylori data and 0.838799 on the human dataset. The results demonstrate that this method is a stable and reliable prediction model for PPIs.
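The "extract twice, combine, then merge the pair" pipeline can be sketched as below. The abstract does not define vector A or descriptor CT, so this example makes two loudly labeled assumptions: amino-acid composition stands in for vector A, and CT is taken to be a conjoint-triad-style descriptor over the seven residue classes commonly used for that encoding. All function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Seven residue classes commonly used for conjoint-triad encoding (assumed here;
# the abstract does not state what "CT" denotes).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

def composition(seq):
    """20-dim amino-acid frequency vector (a stand-in for 'vector A')."""
    n = len(seq) or 1  # avoid division by zero on empty input
    return [seq.count(aa) / n for aa in AMINO_ACIDS]

def conjoint_triad(seq):
    """343-dim vector of frequencies of consecutive residue-class triads."""
    classes = [CLASS_OF[aa] for aa in seq if aa in CLASS_OF]
    counts = [0] * (7 ** 3)
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        counts[a * 49 + b * 7 + c] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)

def protein_features(seq):
    """Concatenate the two descriptors into one vector per protein."""
    return composition(seq) + conjoint_triad(seq)

def pair_features(seq_a, seq_b):
    """Merge the two proteins' vectors into a single classifier input."""
    return protein_features(seq_a) + protein_features(seq_b)

# Hypothetical toy sequences; the result has 2 * (20 + 343) = 726 entries.
example_input = pair_features("MKVLA", "GAVDE")
```

Simple concatenation makes the representation order-dependent: swapping the two proteins yields a different vector, so a training set may need both orderings if symmetry matters.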


2011 ◽  
Vol 7 (S285) ◽  
pp. 344-346
Author(s):  
Dae-Won Kim ◽  
Pavlos Protopapas ◽  
Markos Trichas ◽  
Michael Rowan-Robinson ◽  
Roni Khardon ◽  
...  

Abstract. We present 663 QSO candidates in the Large Magellanic Cloud (LMC) that were selected using multiple diagnostics. We started with a set of 2,566 QSO candidates selected using the methodology presented in our previous work, which is based on the time variability of the MACHO LMC light curves. We then obtained additional information for the candidates by cross-matching them with the Spitzer SAGE, 2MASS, Chandra, XMM, and LMC UBVI catalogues. Using that information, we specified diagnostic features based on mid-IR colours, photometric redshifts from SED template fitting, and X-ray luminosities, in order to identify more high-confidence QSO candidates in the absence of spectral information. We then trained a one-class Support Vector Machine model using those diagnostic features. We applied the trained model to the original candidates and finally selected 663 high-confidence QSO candidates. We cross-matched those 663 QSO candidates with 152 newly confirmed QSOs and 275 non-QSOs in the LMC fields, and found that the false positive rate was less than 1%.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Hao Wang ◽  
Yijie Ding ◽  
Jijun Tang ◽  
Quan Zou ◽  
Fei Guo

Abstract. Background: The biological functions of biomolecules depend on the cellular compartments in which they are located. Importantly, RNAs are localized to specific parts of a cell, enabling the cell to carry out diverse biochemical processes concurrently. However, many existing RNA subcellular localization classifiers only solve the single-label classification problem. Extending RNA subcellular localization to a multi-label classification problem is of great practical significance. Results: In this study, we extract multi-label classification datasets of RNA-associated subcellular localizations for various types of RNAs, and then construct subcellular localization datasets for four RNA categories. In order to study Homo sapiens, we further establish human RNA subcellular localization datasets. Furthermore, we use different nucleotide property composition models to extract effective features that adequately represent the important information in nucleotide sequences. In the most critical part, we address a major challenge: fusing the multivariate information through multiple kernel learning based on the Hilbert-Schmidt independence criterion. The optimal combined kernel is then fed into an integrated support vector machine model for identifying multi-label RNA subcellular localizations. Our method obtained average precision values of 0.703, 0.757, 0.787, and 0.800 on the four RNA data sets, respectively. Conclusion: Our novel method outperforms other prediction tools on the novel benchmark datasets. Moreover, we establish a user-friendly web server implementing our method.


Electronics ◽  
2020 ◽  
Vol 9 (9) ◽  
pp. 1531
Author(s):  
Shanshan Huang ◽  
Yikun Yang ◽  
Xin Jin ◽  
Ya Zhang ◽  
Qian Jiang ◽  
...  

Multi-sensor image fusion is used to combine the complementary information of source images from multiple sensors. Conventional image fusion schemes based on signal processing techniques have been studied extensively, and machine learning-based techniques have recently been introduced into image fusion because of their prominent advantages. In this work, a new multi-sensor image fusion method based on the support vector machine and principal component analysis is proposed. First, the key features of the source images are extracted by combining the sliding window technique with five effective evaluation indicators. Second, a trained support vector machine model is used to separate the focus region from the non-focus region of the source images according to the extracted image features; a fusion decision is thereby obtained for each source image. Then, a consistency verification operation is used to absorb isolated singular points in the decisions of the trained classifier. Finally, a novel method based on principal component analysis and the multi-scale sliding window is proposed to handle the disputed areas in the fusion decision pair. Experiments are performed to verify the performance of the new combined method.
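The abstract does not describe how principal component analysis is applied to the disputed areas. As a point of reference only, the sketch below shows the classical PCA weighting scheme for fusing two image patches: the leading eigenvector of their 2 x 2 covariance matrix is normalized into two blending weights. The function names and the flattened-list patch representation are illustrative assumptions, and this is not the authors' multi-scale method.

```python
import math

def pca_fusion_weights(patch_a, patch_b):
    """Blending weights from the leading eigenvector of the 2 x 2 covariance.

    patch_a and patch_b are equal-length lists of pixel intensities.
    """
    n = len(patch_a)
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    var_a = sum((x - mean_a) ** 2 for x in patch_a) / n
    var_b = sum((y - mean_b) ** 2 for y in patch_b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(patch_a, patch_b)) / n
    # Largest eigenvalue of [[var_a, cov], [cov, var_b]] in closed form.
    lam = (var_a + var_b) / 2 + math.sqrt(((var_a - var_b) / 2) ** 2 + cov ** 2)
    if cov != 0.0:
        # Eigenvector is (cov, lam - var_a); absolute values keep weights
        # non-negative (a design choice for negatively correlated patches).
        v = (abs(cov), abs(lam - var_a))
    else:
        v = (1.0, 0.0) if var_a >= var_b else (0.0, 1.0)
    total = v[0] + v[1]
    return v[0] / total, v[1] / total

def fuse(patch_a, patch_b):
    """Pixel-wise weighted average using the PCA-derived weights."""
    w_a, w_b = pca_fusion_weights(patch_a, patch_b)
    return [w_a * x + w_b * y for x, y in zip(patch_a, patch_b)]

# Hypothetical patches: the higher-variance patch receives the larger weight.
fused = fuse([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

The scheme favours the source with more variance, which is often used as a rough proxy for sharper, better-focused content.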

