FWHT-RF: A Novel Computational Approach to Predict Plant Protein-Protein Interactions via an Ensemble Learning Method

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jie Pan ◽  
Li-Ping Li ◽  
Chang-Qing Yu ◽  
Zhu-Hong You ◽  
Zhong-Hao Ren ◽  
...  

Protein-protein interactions (PPIs) in plants are crucial for understanding biological processes. Although high-throughput techniques have produced valuable information for identifying PPIs in plants, they are usually expensive, inefficient, and extremely time-consuming. Hence, there is an urgent need to develop novel computational methods to predict PPIs in plants. In this article, we propose a novel approach that predicts PPIs in plants using only protein sequence information. Specifically, plant protein sequences are first converted into position-specific scoring matrices (PSSMs); then, the fast Walsh–Hadamard transform (FWHT) algorithm is used to extract feature vectors from the PSSMs, capturing the evolutionary information of plant proteins. Lastly, a rotation forest (RF) classifier is trained for prediction and produces a series of evaluation results. We named this approach FWHT-RF because FWHT and RF are used for feature extraction and classification, respectively. When FWHT-RF was applied to three plant PPI datasets, Maize, Rice, and Arabidopsis thaliana (Arabidopsis), its average accuracies under 5-fold cross-validation were 95.20%, 94.42%, and 83.85%, respectively. To further evaluate the predictive power of FWHT-RF, we compared it with the state-of-the-art support vector machine (SVM) and K-nearest neighbor (KNN) classifiers in several respects. The experimental results demonstrated that FWHT-RF can be a useful supplementary method for predicting potential PPIs in plants.
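To make the feature-extraction step concrete, the following is a minimal Python sketch of the fast Walsh–Hadamard transform applied column by column to a PSSM-like matrix. The abstract does not say how the PSSM is padded or which coefficients are retained, so `pad_to_power_of_two`, `pssm_column_features`, and the `keep` parameter are illustrative assumptions and not the authors' procedure.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order."""
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a


def pad_to_power_of_two(values):
    """Assumption: zero-pad a column so its length is a power of two."""
    n = 1
    while n < len(values):
        n *= 2
    return list(values) + [0] * (n - len(values))


def pssm_column_features(pssm, keep=4):
    """Hypothetical descriptor: transform each PSSM column and keep
    the first `keep` coefficients, concatenated into one vector."""
    features = []
    for column in zip(*pssm):
        coefficients = fwht(pad_to_power_of_two(column))
        features.extend(coefficients[:keep])
    return features
```

For a real PSSM with 20 columns, `pssm_column_features(pssm, keep=4)` would yield an 80-dimensional vector regardless of sequence length, which is the property a fixed-input classifier needs.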

2020 ◽  
Author(s):  
Aras Masood Ismael ◽  
Ömer F Alçin ◽  
Karmand H Abdalla ◽  
Abdulkadir K Sengur

In this paper, a novel approach based on two-step majority voting is proposed for efficient EEG-based emotion classification. Emotion recognition is important for human-machine interaction. Approaches based on facial features and body gestures have generally been proposed for emotion recognition. Recently, EEG-based approaches have become more popular. In the proposed approach, the raw EEG signals are first low-pass filtered for noise removal, and band-pass filters are used to extract the EEG rhythms. For each rhythm, the best-performing EEG channels are determined using wavelet-based entropy features and fractal-dimension-based features. The k-nearest neighbor (KNN) classifier is used for classification. In the first voting step, the best five EEG channels are combined by majority voting to obtain a prediction for each EEG rhythm. In the second majority voting step, the predictions from all rhythms are combined to obtain the final prediction. The DEAP dataset is used in the experiments, and classification accuracy, sensitivity, and specificity are used as performance evaluation metrics. The experiments address two binary classification tasks: high valence (HV) vs low valence (LV) and high arousal (HA) vs low arousal (LA). The experiments show that an HV vs LV discrimination accuracy of 86.3% and an HA vs LA discrimination accuracy of 85.0% are obtained. The results are also compared with some existing methods. The comparisons show that the proposed method has potential for use in EEG-based emotion classification.
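The two voting steps described above can be sketched in a few lines of Python. This is only an illustration of the voting logic: the rhythm names and the per-channel labels in `example` are made up, and the KNN predictions that would feed the vote are assumed to exist already.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def two_step_vote(channel_predictions):
    """channel_predictions maps a rhythm name to the labels predicted
    from that rhythm's selected channels (five per rhythm in the abstract)."""
    # Step 1: one vote per rhythm across its channels.
    per_rhythm = {
        rhythm: majority_vote(labels)
        for rhythm, labels in channel_predictions.items()
    }
    # Step 2: one vote across the rhythm-level decisions.
    final = majority_vote(list(per_rhythm.values()))
    return final, per_rhythm


# Hypothetical predictions for one trial (rhythm names are assumptions).
example = {
    "delta": ["HV", "HV", "LV", "HV", "LV"],
    "theta": ["LV", "LV", "LV", "HV", "LV"],
    "alpha": ["HV", "HV", "HV", "HV", "LV"],
    "beta": ["HV", "LV", "HV", "HV", "HV"],
    "gamma": ["LV", "LV", "HV", "LV", "HV"],
}
final_label, rhythm_labels = two_step_vote(example)
```

With an odd number of voters and two classes at each step, as in this toy input, no tie can occur, so the tie-breaking rule in `majority_vote` never comes into play.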


2015 ◽  
Vol 11 (A29A) ◽  
pp. 209-209
Author(s):  
Bo Han ◽  
Hongpeng Ding ◽  
Yanxia Zhang ◽  
Yongheng Zhao

Catastrophic failure is a long-standing unsolved problem in most photometric redshift estimation approaches. In this study, we propose a novel approach that integrates the k-nearest-neighbor (KNN) and support vector machine (SVM) methods. Experiments based on the quasar sample from SDSS show that the fusion approach can significantly mitigate catastrophic failure and improve the accuracy of photometric redshift estimation.


Author(s):  
SHITALA PRASAD ◽  
GYANENDRA K. VERMA ◽  
BHUPESH KUMAR SINGH ◽  
PIYUSH KUMAR

This paper proposes a novel approach to feature extraction based on the segmentation and morphological alteration of handwritten multi-lingual characters. We explore multi-resolution and multi-directional transforms such as the wavelet, curvelet, and ridgelet transforms to extract classifying features from handwritten multi-lingual images. The pros and cons of each multi-resolution algorithm are discussed, and we conclude that curvelet-based feature extraction is the most promising for multi-lingual character recognition. We also apply morphological operations such as thinning and thickening; feature-level fusion is then performed to create a robust feature vector for classification. Classification is performed with K-nearest neighbor (K-NN) and support vector machine (SVM) classifiers, and their relative performance is compared. We experiment with our in-house dataset, compiled in our lab by more than 50 personnel.
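As a rough illustration of feature-level fusion followed by K-NN classification, the Python sketch below concatenates two feature vectors per sample and classifies a query by majority vote among its nearest neighbors. The abstract does not specify the fusion rule, the distance metric, or the value of k, so concatenation, Euclidean distance, and `k=3` are assumptions, and the helper names are hypothetical.

```python
from collections import Counter
from math import dist


def fuse(*feature_vectors):
    """Assumed fusion rule: concatenate the per-method feature vectors."""
    return [value for vector in feature_vectors for value in vector]


def knn_predict(training_set, query, k=3):
    """training_set is a list of (fused_vector, label) pairs.
    Returns the majority label among the k nearest vectors (Euclidean)."""
    nearest = sorted(training_set, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In practice the fused components would usually be scaled to comparable ranges first, since otherwise the component with the largest numeric range dominates the distance.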


Author(s):  
Piyali Chatterjee ◽  
Subhadip Basu ◽  
Mahantapas Kundu ◽  
Mita Nasipuri ◽  
Dariusz Plewczynski

Protein-protein interactions (PPI) control most of the biological processes in a living cell. In order to fully understand protein functions, knowledge of protein-protein interactions is necessary. Prediction of PPI is challenging, especially when the three-dimensional structure of the interacting partners is not known. Recently, a novel prediction method was proposed that exploits physical interactions of constituent domains. We propose here a novel knowledge-based prediction method, namely PPI_SVM, which predicts interactions between two protein sequences by exploiting their domain information. We trained a two-class support vector machine on a benchmarking set of pairs of interacting proteins extracted from the Database of Interacting Proteins (DIP). Unlike most existing approaches, the method considers all possible combinations of constituent domains between two protein sequences. Moreover, it deals with both single-domain and multi-domain proteins; therefore, it can be applied to the whole proteome in high-throughput studies. Our machine learning classifier, following a brainstorming approach, achieves an accuracy of 86%, with a specificity of 95% and a sensitivity of 75%, which are better results than those of most previous methods that sacrifice recall in order to boost overall precision. Our method has on average better sensitivity combined with good selectivity on the benchmarking dataset. The PPI_SVM source code, train/test datasets and supplementary files are freely available in the public domain at: http://code.google.com/p/cmater-bioinfo/.
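Because the abstract contrasts accuracy, specificity, sensitivity (recall), and precision, a short Python sketch of how these follow from a confusion matrix may help. The counts in the usage note are hypothetical: they are chosen to mirror the reported 75% sensitivity and 95% specificity on an imagined balanced test set and are not the paper's data.

```python
def binary_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a class or prediction is absent.
        return numerator / denominator if denominator else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # recall on interacting pairs
        "specificity": ratio(tn, tn + fp),  # recall on non-interacting pairs
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }
```

For example, `binary_metrics(tp=75, tn=95, fp=5, fn=25)` gives a sensitivity of 0.75, a specificity of 0.95, and an accuracy of 0.85. On a balanced set, accuracy is the mean of sensitivity and specificity, so raising one at the cost of the other can leave accuracy unchanged.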


2020 ◽  
Vol 22 (10) ◽  
pp. 705-715 ◽  
Author(s):  
Min Liu ◽  
Guangzhong Liu

Background: Citrullination, an important post-translational modification of proteins, alters the molecular weight and electrostatic charge of the protein side chains. The formation of citrulline in protein sequences is catalyzed by a class of enzymes called peptidyl arginine deiminases (PADs). The Ca2+-dependent PADs include five isozymes: PAD 1, 2, 3, 4/5, and 6. Citrullinated proteins have been identified in many biological and pathological processes. Abnormal protein citrullination can lead to serious human diseases, including multiple sclerosis and rheumatoid arthritis. Objective: It is important to identify the citrullination sites in protein sequences. Accurate identification of citrullination sites may contribute to studies of the molecular functions and pathological mechanisms of related diseases. Methods and Results: In this study, after the training set (containing 116 positive and 348 negative samples) was encoded into a feature matrix, the mRMR method was used to rank the 941-dimensional features by importance. Then, a predictive model based on a self-normalizing neural network (SNN) was proposed to predict the citrullination sites in protein sequences. Incremental Feature Selection (IFS) and 10-fold cross-validation were used to evaluate the model. Three classical machine learning models, namely random forest, support vector machine, and k-nearest neighbor, were compared with the SNN prediction model using the same evaluation methods. SNN may be the best tool for citrullination site prediction. The maximum Matthews Correlation Coefficient (MCC) reached 0.672404 with the optimal SNN classifier. Conclusion: The results showed that the SNN-based prediction method performed better when evaluated by common metrics such as MCC, accuracy, and F1-measure. The SNN prediction model also achieved a better balance in the classification and recognition of positive and negative samples compared with the other three models.
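With 116 positive and 348 negative samples, accuracy alone can mislead, which is one reason MCC is reported. The Python sketch below computes MCC from confusion-matrix counts; the class sizes come from the abstract, while the predictions in the usage note are hypothetical. Returning 0 when the denominator vanishes is a common convention, assumed here.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumed): MCC is 0 when any marginal total is zero.
    return numerator / denominator if denominator else 0.0


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)
```

A classifier that labels every sample negative scores `accuracy(0, 348, 0, 116) == 0.75` yet `matthews_corrcoef(0, 348, 0, 116) == 0.0`, so a high MCC indicates that both classes are being recognized.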


2019 ◽  
Vol 15 ◽  
pp. 117693431987992 ◽  
Author(s):  
Ji-Yong An ◽  
Yong Zhou ◽  
Yu-Jun Zhao ◽  
Zi-Ji Yan

Background: Increasing evidence has indicated that protein-protein interactions (PPIs) play important roles in various aspects of the structural and functional organization of a cell. Thus, continuing to uncover potential PPIs is an important topic in the biomedical domain. Although various feature extraction methods combined with machine learning approaches have enhanced the prediction of PPIs, there remains room for improvement by developing novel and effective feature extraction methods and classifiers to identify PPIs. Method: In this study, we proposed a sequence-based feature extraction method called LCPSSMMF, which combines a local coding position-specific scoring matrix (PSSM) with multifeatures fusion. First, we used a novel local coding method based on PSSM to build a new PSSM (CPSSM); the advantage of this method is that it incorporates global and local feature extraction, which can account for the interactions between residues in both continuous and discontinuous regions of amino acid sequences. Second, we adopted 2 different feature extraction methods (Local Average Group [LAG] and Bigram Probability [BP]) to capture multiple key features from the evolutionary information embedded in the CPSSM matrix. Finally, feature vectors were obtained by using the multifeatures fusion method. Result: To evaluate the performance of the proposed feature extraction approach, we employed a support vector machine (SVM) as the prediction classifier and applied this method to yeast and human PPI datasets. The prediction accuracies of LCPSSMMF were 93.43% and 90.41% on the yeast and human datasets, respectively. Moreover, we compared the proposed method with previous sequence-based approaches on the yeast dataset by using the same SVM classifier. The experimental results indicated that the performance of LCPSSMMF significantly exceeded that of several other state-of-the-art methods. 
The results show that the LCPSSMMF approach can capture more local and global discriminatory information than almost all previous methods and performs remarkably well in identifying PPIs. To facilitate extensive research in future proteomics studies, we developed an LCPSSMMFSVM server, which is freely available for academic use at http://219.219.62.123:8888/LCPSSMMFSVM .
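Of the two descriptors named in the Method, Bigram Probability is commonly computed from a PSSM by summing products of scores in consecutive rows, B[m][n] = sum over i of P[i][m] * P[i+1][n]. The Python sketch below assumes that common definition on a plain matrix; the paper applies the descriptor to its locally coded CPSSM and may scale or normalize differently.

```python
def pssm_bigram(pssm):
    """Bigram matrix B[m][n] = sum_i P[i][m] * P[i+1][n].
    `pssm` is a list of rows, one row per residue position."""
    n_cols = len(pssm[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Pair every row with the row that follows it in the sequence.
    for current_row, next_row in zip(pssm, pssm[1:]):
        for m, score_m in enumerate(current_row):
            for n, score_n in enumerate(next_row):
                bigram[m][n] += score_m * score_n
    return bigram


def flatten(matrix):
    """Turn the bigram matrix into a fixed-length feature vector."""
    return [value for row in matrix for value in row]
```

For a standard 20-column PSSM, `flatten(pssm_bigram(pssm))` has 400 entries whatever the sequence length, so proteins of different lengths map to vectors of the same size.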


Author(s):  
Glenda Anak Kaya ◽  
Nor Ashikin Mohamad Kamal

As the number of protein sequences in databases increases, effective and efficient techniques are needed to make these data meaningful. These protein sequences contain redundant and irrelevant features that lower classification accuracy and increase the running time of the computational algorithm. In this paper, we select the best features using the Minimum Redundancy Maximum Relevance (mRMR) and Correlation-based Feature Selection (CFS) methods. Two datasets of human membrane proteins, S1 and S2, are used. After the features have been selected by mRMR and CFS, K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers are used to classify these membrane proteins. The performance of these techniques is measured using accuracy, specificity, sensitivity, and F-measure. The proposed algorithm achieved 76% accuracy for S1 and 73% accuracy for S2. Finally, our proposed methods present competitive results when compared with previous works on membrane protein classification.
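The idea behind mRMR can be sketched as a greedy loop that rewards mutual information with the class label and penalizes mutual information with features already chosen. The Python sketch below uses the difference form of the criterion on discrete features; the abstract does not state which mRMR variant or discretization was used, so both are assumptions, and the feature names in the usage note are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def mrmr_score(name, features, relevance, selected):
    """Relevance minus mean redundancy with already selected features."""
    if not selected:
        return relevance[name]
    redundancy = sum(
        mutual_information(features[name], features[s]) for s in selected
    ) / len(selected)
    return relevance[name] - redundancy


def mrmr_select(features, labels, k):
    """features maps a feature name to its list of discrete values."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda f: mrmr_score(f, features, relevance, selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected
```

If a feature `b` duplicates an already selected feature `a`, its redundancy cancels its relevance, so a less correlated feature `c` is picked ahead of it even when all three are equally informative on their own.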


Chronic kidney disease is a serious health problem that is becoming more widespread owing to lifestyle changes such as food habits, changes in the atmosphere, etc. The biosciences have advanced considerably and have produced huge amounts of data from electronic health records. The primary aim of this paper is to classify chronic kidney disease using various classification techniques: Logistic Regression (LR), K-Nearest Neighbor (KNN), Decision Tree, Random Forest, Support Vector Machine (SVM), and SGD classifiers. According to the health statistics of India, 63,538 cases of chronic renal disorder have been registered. Men and women susceptible to renal disorders are, on average, between 48 and 70 years of age.

