PSSM-Suc: Accurately predicting succinylation using position specific scoring matrix into bigram for feature extraction

2017 ◽  
Vol 425 ◽  
pp. 97-102 ◽  
Author(s):  
Abdollah Dehzangi ◽  
Yosvany López ◽  
Sunil Pranit Lal ◽  
Ghazaleh Taherzadeh ◽  
Jacob Michaelson ◽  
...  
Author(s):  
Harsh Saini ◽  
Gaurav Raicar ◽  
Alok Sharma ◽  
Sunil Lal ◽  
...  

Protein structural class prediction (SCP) is an important task in identifying protein tertiary structure and protein function. In this study, we propose a feature extraction technique to predict secondary structures. The technique uses bigram information (for adjacent and k-separated amino acids) derived from the Position Specific Scoring Matrix (PSSM). The technique has shown promising results when evaluated on the benchmark Ding and Dubchak dataset.
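A minimal sketch of the bigram idea follows. The abstract does not give a formula, so this assumes the commonly used definition in which the feature for a column pair (m, n) is the sum of P[i][m] * P[i+k][n] over all positions i, with k = 1 for adjacent residues. The function name `pssm_bigram` and the toy matrix are hypothetical, and the PSSM is assumed to be already parsed into a list of rows.

```python
def pssm_bigram(pssm, k=1):
    """Flattened bigram features from a PSSM given as a list of rows.

    Assumed definition (not stated in the abstract):
        B[m][n] = sum over i of pssm[i][m] * pssm[i + k][n]
    where k = 1 pairs adjacent residues and k > 1 pairs k-separated ones.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not pssm:
        return []
    n_cols = len(pssm[0])  # 20 for a standard amino acid PSSM
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Accumulate the outer product of row i and row i + k.
    for i in range(len(pssm) - k):
        row_a, row_b = pssm[i], pssm[i + k]
        for m in range(n_cols):
            for n in range(n_cols):
                bigram[m][n] += row_a[m] * row_b[n]
    # Flatten row by row into a fixed-length vector (n_cols * n_cols values).
    return [value for row in bigram for value in row]


# Toy 3-residue, 2-column matrix (a real PSSM has 20 columns).
toy_pssm = [[1, 0], [0, 1], [1, 1]]
adjacent = pssm_bigram(toy_pssm, k=1)
separated = pssm_bigram(toy_pssm, k=2)
```

The resulting vector has a fixed length regardless of sequence length, which is what allows sequences of different lengths to be passed to a standard classifier.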


2020 ◽  
Vol 15 ◽  
Author(s):  
Li Qian ◽  
Yu Jiang ◽  
Yan YuXuan ◽  
Chen Yuan ◽  
Tan SiQiao

Background: Predicting protein-ATP binding sites is a highly imbalanced binary classification problem, and higher-precision prediction with machine learning methods is of great significance for research on protein function and for drug design. Objective: Most existing studies choose a window length of 17aa by experience, extract features from the Position-Specific Scoring Matrix (PSSM), and then build prediction models with SVC. However, the independent prediction results reported in these studies are either overly high (ACC) or low (MCC), so there is considerable room to improve prediction precision. Methods: This paper uses mutual information, I, to set the window length to 15aa, extracts features with the Pseudo Position-Specific Scoring Matrix (PsePSSM), which is more fault-tolerant, trains multiple 1:1 SVC classifiers, and combines them by simple voting. Results: The prediction results on two protein-ATP binding site datasets, ATP168 and ATP227, are consistently superior to the independent prediction results obtained with the Reference Feature Extraction Approach. With our approach, the MCC values improve from the ranges 0.3110 ~ 0.5360 and 0.3060 ~ 0.553 to 0.7512 and 0.7106, respectively. Conclusion: We further explain why the PsePSSM approach is more fault-tolerant. This approach has promising application prospects in feature extraction from protein sequences.
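A minimal sketch of the two steps named in the Methods follows. The abstract does not give formulas, so this assumes the widely used PsePSSM formulation: per-column means over the window, followed by lag-dependent terms that average the squared difference between rows `lag` positions apart. The names `pse_pssm`, `max_lag`, and `majority_vote` are hypothetical, PSSM normalization is omitted, and the tie rule in the vote is an arbitrary choice.

```python
def pse_pssm(window, max_lag=1):
    """PsePSSM features for one window of PSSM rows (assumed formulation).

    Output: the column means, then for each lag from 1 to max_lag the mean
    of (window[i][j] - window[i + lag][j]) ** 2 for every column j.
    """
    length = len(window)
    if length == 0 or max_lag >= length:
        raise ValueError("window must be longer than max_lag")
    n_cols = len(window[0])
    # Part 1: average score of each column across the window.
    features = [sum(row[j] for row in window) / length for j in range(n_cols)]
    # Part 2: sequence-order terms for each lag.
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (window[i][j] - window[i + lag][j]) ** 2
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


def majority_vote(votes):
    """Combine 0/1 predictions from several classifiers by simple voting.

    Ties are resolved as 0 (non-binding); this tie rule is an assumption.
    """
    return 1 if 2 * sum(votes) > len(votes) else 0


# Toy 3-residue, 2-column window (a real window would be 15 rows by 20 columns).
toy_window = [[1, 0], [3, 2], [5, 4]]
toy_features = pse_pssm(toy_window, max_lag=1)
toy_label = majority_vote([1, 0, 1])
```

With a 15-residue window and 20 columns, the feature vector has 20 * (1 + max_lag) entries, one such vector per candidate residue.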


2009 ◽  
Vol 02 (01) ◽  
pp. 51-56 ◽  
Author(s):  
Rong-Quan Xiao ◽  
Yan-Zhi Guo ◽  
Yu-Hong Zeng ◽  
Hai-Feng Tan ◽  
...  
