Bigram-PGK: phosphoglycerylation prediction using the technique of bigram probabilities of position specific scoring matrix

Background: Post-translational modification (PTM) is a biological process in which the proteome is modified, affecting normal cell biology and hence pathogenesis. A number of PTMs have been discovered in recent years, and lysine phosphoglycerylation is one of the fairly recent ones. Even with a large number of proteins being sequenced in the post-genomic era, identifying phosphoglycerylation remains a major challenge because experimental approaches are costly, time-consuming and inefficient. To overcome this issue, computational techniques have emerged to identify phosphoglycerylated lysine residues. However, the computational techniques proposed so far are limited in their ability to correctly predict this covalent modification. Results: We propose a new predictor, Bigram-PGK, which uses evolutionary information of amino acids to predict phosphoglycerylated sites. A benchmark dataset containing experimentally labelled sites is employed for this purpose, and profile bigram occurrences are calculated from the position specific scoring matrices of amino acids in the protein sequences. The reported sensitivity, specificity, precision, accuracy, Matthews correlation coefficient and area under the ROC curve are 0.9642, 0.8973, 0.8253, 0.9193, 0.8330 and 0.9306, respectively. Conclusions: The proposed predictor, based on evolutionary information features and a support vector machine classifier, has shown great potential to effectively distinguish phosphoglycerylated from non-phosphoglycerylated lysine residues when compared against existing predictors. The data and software of this work can be acquired from https://github.com/abelavit/Bigram-PGK.
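As a rough illustration of the profile bigram idea described in the abstract above, the following Python sketch accumulates, for every ordered pair of PSSM columns, the product of the scores at consecutive sequence positions. It assumes the PSSM is supplied as a list of rows (one per residue) that have already been normalized; the function name `profile_bigram` and the toy matrix are hypothetical, and details such as the window taken around each lysine are not given in the abstract and may differ in the authors' released software.

```python
def profile_bigram(pssm):
    """Flattened bigram matrix of a PSSM given as a list of equal-length rows.

    Entry (m, n) is the sum over consecutive positions i, i+1 of
    pssm[i][m] * pssm[i+1][n]. Assumption: rows are already normalized.
    """
    n_cols = len(pssm[0])
    bigram = [[0.0] * n_cols for _ in range(n_cols)]
    # Pair every row with the row that follows it in the sequence.
    for row, next_row in zip(pssm, pssm[1:]):
        for m, p in enumerate(row):
            for n, q in enumerate(next_row):
                bigram[m][n] += p * q
    # Flatten row by row into one feature vector of length n_cols ** 2.
    return [value for matrix_row in bigram for value in matrix_row]


# Hypothetical toy PSSM: 3 positions, 2 columns instead of the usual 20.
toy_pssm = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
features = profile_bigram(toy_pssm)
```

With a standard 20-column PSSM this produces a 400-dimensional vector whose length does not depend on the sequence length, which is what makes it convenient as input to a fixed-size classifier such as an SVM.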

RAM-PGK: Prediction of Lysine Phosphoglycerylation Based on Residue Adjacency Matrix

Genes ◽

10.3390/genes11121524 ◽

2020 ◽

Vol 11 (12) ◽

pp. 1524

Author(s):

Abel Avitesh Chandra ◽

Alok Sharma ◽

Abdollah Dehzangi ◽

Tatsuhiko Tsunoda

Keyword(s):

Adjacency Matrix ◽

Cell Biology ◽

Performance Metrics ◽

Support Vector ◽

Computational Techniques ◽

Identification System ◽

Amino Acid Residues ◽

Post Translational Modification ◽

Lysine Residues ◽

Recent Developments

Background: Post-translational modification (PTM) is a biological process associated with the modification of the proteome, which alters normal cell biology and pathogenesis. Numerous PTMs have been reported in recent years, among which lysine phosphoglycerylation is one of the most recent. The traditional methods of identifying phosphoglycerylated residues, which are experimental procedures such as mass spectrometry, have proven to be time-consuming and cost-inefficient, despite the abundance of proteins being sequenced in this post-genomic era. Due to these drawbacks, computational techniques are being sought to establish an effective identification system for phosphoglycerylated lysine residues. This is not the first predictor for phosphoglycerylation, but a new one is needed because the latest predictor falls short in adequately distinguishing phosphoglycerylated from non-phosphoglycerylated lysine residues. Results: In this work, we introduce a new predictor named RAM-PGK, which uses sequence-based information relating to amino acid residues to predict phosphoglycerylated and non-phosphoglycerylated sites. A benchmark dataset containing experimentally identified phosphoglycerylated and non-phosphoglycerylated lysine residues was employed for this purpose. From the dataset, we extracted the residue adjacency matrix pertaining to each lysine residue in the protein sequences and converted these matrices into feature vectors, which are used to build the phosphoglycerylation predictor. Conclusion: RAM-PGK, which is based on sequential features and support vector machine classifiers, has shown a noteworthy improvement in performance in comparison to some of the recent prediction methods. The performance metrics of the RAM-PGK predictor are: 0.5741 sensitivity, 0.6436 specificity, 0.0531 precision, 0.6414 accuracy, and 0.0824 Matthews correlation coefficient.
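The five metrics quoted at the end of the abstract above all follow from the four cells of a binary confusion matrix. The Python sketch below uses the standard textbook definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "precision": _ratio(tp, tp + fp),     # positive predictive value
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=8, fp=5, tn=85, fn=2)
```

Because precision depends on class prevalence while sensitivity and specificity do not, a low precision next to moderate sensitivity and specificity, as reported for RAM-PGK, is what one would expect when negatives heavily outnumber positives in the evaluation data; the abstract itself does not state the class ratio.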

PhoglyStruct: Prediction of phosphoglycerylated lysine residues using structural properties of amino acids

10.21203/rs.2.1673/v1 ◽

2019 ◽

Author(s):

Abel Chandra ◽

Alok Sharma

Keyword(s):

Amino Acids ◽

Cell Biology ◽

Structural Information ◽

Accessible Surface Area ◽

Post Translational Modification ◽

Torsion Angles ◽

Software Packages ◽

Lysine Residues ◽

Accessible Surface ◽

Sensitivity Specificity

The biological process known as post-translational modification (PTM) contributes to diversifying the proteome, hence affecting many aspects of normal cell biology and pathogenesis. Many PTMs have been reported recently, and lysine phosphoglycerylation has emerged as the most recent subject of interest. Despite a large number of proteins being sequenced, experimental detection of phosphoglycerylated residues remains an expensive, time-consuming and inefficient endeavor in the post-genomic era. Computational methods are therefore being proposed for accurately predicting phosphoglycerylated lysines. Though a number of predictors are available, performance in detecting phosphoglycerylated lysine residues is still limited. In this paper, we propose a new predictor called PhoglyStruct that utilizes structural information of amino acids alongside a multilayer perceptron classifier for predicting phosphoglycerylated and non-phosphoglycerylated lysine residues. For the experiment, we located phosphoglycerylated and non-phosphoglycerylated lysines in our employed benchmark. We then derived and integrated properties such as accessible surface area, backbone torsion angles, and local structure conformations. PhoglyStruct showed significant improvement in the ability to distinguish phosphoglycerylated residues from non-phosphoglycerylated ones when compared to previous predictors. The sensitivity, specificity, accuracy, Matthews correlation coefficient and AUC were 0.8542, 0.7597, 0.7834, 0.5468 and 0.8077, respectively. The data and Matlab/Octave software packages are available at https://github.com/abelavit/PhoglyStruct.

PupStruct: Prediction of Pupylated Lysine Residues Using Structural Properties of Amino Acids

Genes ◽

10.3390/genes11121431 ◽

2020 ◽

Vol 11 (12) ◽

pp. 1431

Author(s):

Vineet Singh ◽

Alok Sharma ◽

Abdollah Dehzangi ◽

Tatsuhiko Tsunoda

Keyword(s):

Amino Acids ◽

State Of The Art ◽

Structural Information ◽

Computational Method ◽

Support Vector ◽

Biological Processes ◽

Post Translational Modification ◽

Lysine Residues ◽

Biological Reaction ◽

Statistical Metrics

Post-translational modification (PTM) is a critical biological reaction which adds to the diversification of the proteome. With numerous known modifications being studied, pupylation has gained focus in the scientific community due to its significant role in regulating biological processes. The traditional experimental practice for detecting pupylation sites has proved to be expensive and to require a lot of time and resources. Thus, many computational predictors have been developed to address this issue. However, performance is still limited. In this study, we propose another computational method, named PupStruct, which uses the structural information of amino acids with a radial basis kernel function Support Vector Machine (SVM) to predict pupylated lysine residues. We compared PupStruct with three state-of-the-art predictors from the literature, and PupStruct showed a significant improvement in performance over them on a benchmark dataset in terms of statistical metrics such as sensitivity (0.9234), specificity (0.9359), accuracy (0.9296), precision (0.9349), and Matthews correlation coefficient (0.8616).

Incorporating Amino Acids Composition and Functional Domains for Identifying Bacterial Toxin Proteins

BioMed Research International ◽

10.1155/2014/972692 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 2

Author(s):

Min-Gang Su ◽

Chien-Hsun Huang ◽

Tzong-Yi Lee ◽

Yu-Ju Chen ◽

Hsin-Yi Wu

Keyword(s):

Amino Acids ◽

Cell Biology ◽

Predictive Performance ◽

Computational Prediction ◽

Amino Acid Sequences ◽

Bacterial Toxins ◽

Bacterial Toxin ◽

Support Vector ◽

Functional Domain ◽

Domain Information

Aside from their role in pathogenesis, bacterial toxins have also been used for medical purposes, such as drugs for cancer and immune diseases. Correctly identifying bacterial toxins and their types (endotoxins and exotoxins) has a great impact on cell biology research and therapy development. However, experimental methods for bacterial toxin identification are time-consuming and labor-intensive, implying an urgent need for computational prediction. Thus, we are motivated to develop a method for computational identification of bacterial toxins based on amino acid sequences and functional domain information. In this study, a nonredundant dataset of 167 bacterial toxins, including 77 exotoxins and 90 endotoxins, is adopted to learn the predictive model using support vector machines (SVMs). The cross-validation evaluation shows that the SVM models trained with amino acid and dipeptide composition yield accuracies of 96.07% and 92.50%, respectively. For discriminating endotoxins from exotoxins, the SVM models trained with amino acid and dipeptide composition achieve accuracies of 95.71% and 92.86%, respectively. After incorporating functional domain information, the predictive performance is further improved. The proposed method has been demonstrated to identify and classify bacterial toxins more effectively than the other two features on an independent dataset, which may aid in bacterial biomedical development.
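The two sequence encodings named in the abstract above, amino acid composition and dipeptide composition, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: only the 20 standard residues are counted, non-standard characters are dropped before pairing (a simplification), and the function names are hypothetical rather than the authors' code.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def _standard_residues(sequence):
    """Keep only the 20 standard residues (assumption: others are dropped)."""
    return [residue for residue in sequence.upper() if residue in AMINO_ACIDS]


def amino_acid_composition(sequence):
    """20-dimensional vector of residue frequencies."""
    residues = _standard_residues(sequence)
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """400-dimensional vector of frequencies of adjacent residue pairs."""
    residues = _standard_residues(sequence)
    pairs = [first + second for first, second in zip(residues, residues[1:])]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = Counter(pairs)
    return [counts[d] / len(pairs) for d in DIPEPTIDES]


aac = amino_acid_composition("AAC")
dpc = dipeptide_composition("AAC")
```

Both vectors have a fixed length regardless of protein length, so either can be passed directly to an SVM; the dipeptide vector additionally captures local residue order at the cost of twenty times as many features.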

Human Platelet Protein Ubiquitylation and Changes following GPVI Activation

Thrombosis and Haemostasis ◽

10.1055/s-0038-1676344 ◽

2018 ◽

Vol 119 (01) ◽

pp. 104-116 ◽

Cited By ~ 8

Author(s):

Amanda Unsworth ◽

Izabela Bombik ◽

Adan Pinto-Fernandez ◽

Joanna McGouran ◽

Rebecca Konietzny ◽

...

Keyword(s):

Human Platelet ◽

Proteasome Inhibition ◽

Signal Propagation ◽

Human Platelets ◽

Covalent Modification ◽

Therapeutic Interventions ◽

Post Translational Modification ◽

Target Proteins ◽

Platelet Protein ◽

Lysine Residues

Platelet activators stimulate post-translational modification of signalling proteins to change their activity or their molecular interactions, leading to signal propagation. One covalent modification is attachment of the small protein ubiquitin to lysine residues in target proteins. Modification by ubiquitin can either target proteins for degradation by the proteasome or act as a scaffold for other proteins. Pharmacological inhibition of deubiquitylases or the proteasome inhibits platelet activation by collagen, demonstrating a role for ubiquitylation, but relatively few substrates for ubiquitin have been identified and the molecular basis of inhibition is not established. Here, we report the ubiquitome of human platelets and changes in ubiquitylated proteins following stimulation by collagen-related peptide (CRP-XL). Using platelets from six individuals over three independent experiments, we identified 1,634 ubiquitylated peptides derived from 691 proteins, revealing extensive ubiquitylation in resting platelets. Note that 925 of these peptides show an increase of more than twofold following stimulation with CRP-XL. Multiple sites of ubiquitylation were identified on several proteins including Syk, filamin and integrin heterodimer sub-units. This work reveals extensive protein ubiquitylation during activation of human platelets and opens the possibility of novel therapeutic interventions targeting the ubiquitin machinery.

predPhogly-Site: Predicting phosphoglycerylation sites by incorporating probabilistic sequence-coupling information into PseAAC and addressing data imbalance

PLoS ONE ◽

10.1371/journal.pone.0249396 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0249396

Author(s):

Sabit Ahmed ◽

Afrida Rahman ◽

Md. Al Mehedi Hasan ◽

Md Khaled Ben Islam ◽

Julia Rahman ◽

...

Keyword(s):

Cell Biology ◽

Cross Validation ◽

Glycolytic Enzyme ◽

Covalent Modification ◽

Training Dataset ◽

Variable Cost ◽

Amino Acid Residues ◽

Web Interface ◽

Post Translational Modification ◽

Fold Cross Validation

Post-translational modification (PTM) involves covalent modification after the biosynthesis process and plays an essential role in the study of cell biology. Lysine phosphoglycerylation is a newly discovered reversible type of PTM that affects glycolytic enzyme activities and is responsible for a wide variety of diseases, such as heart failure, arthritis, and degeneration of the nervous system. Our goal is to computationally characterize potential phosphoglycerylation sites to understand their functionality and causality more accurately. In this study, a novel computational tool, referred to as predPhogly-Site, has been developed to predict phosphoglycerylation sites in proteins. It utilizes the probabilistic sequence-coupling information among the amino acid residues near phosphoglycerylation sites, along with a variable cost adjustment for the skewed training dataset, to enhance the prediction characteristics. It has achieved around 99% accuracy with more than 0.96 MCC and 0.97 AUC in both 10-fold cross-validation and an independent test. Even the standard deviation in 10-fold cross-validation is almost negligible. This performance indicates that predPhogly-Site remarkably outperformed the existing prediction tools and can be used as a promising predictor, preferably with its web interface at http://103.99.176.239/predPhogly-Site.
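The abstract above does not spell out how its variable cost adjustment is computed. One common way to assign different misclassification costs on a skewed dataset is inverse class-frequency weighting, sketched below in Python. This is a swapped-in, generic heuristic offered for illustration, not the authors' formula, and both function names are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class cost n / (k * n_c): rarer classes get larger costs.

    n is the number of samples, k the number of classes and n_c the
    number of samples in class c. Assumption: a generic heuristic.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def weighted_error(y_true, y_pred, weights):
    """Error rate in which each mistake costs the weight of its true class."""
    total = sum(weights[t] for t in y_true)
    wrong = sum(weights[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Hypothetical skewed labels: one positive site among three negatives.
labels = [1, 0, 0, 0]
weights = inverse_frequency_weights(labels)
all_negative_error = weighted_error(labels, [0, 0, 0, 0], weights)
```

Under these weights a classifier that predicts every site as negative no longer looks good: its weighted error on the toy labels is one half, whereas the unweighted error would be one quarter.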

Review of Progress in Predicting Protein Methylation Sites

Current Organic Chemistry ◽

10.2174/1385272823666190723141347 ◽

2019 ◽

Vol 23 (15) ◽

pp. 1663-1670 ◽

Cited By ~ 1

Author(s):

Chunyan Ao ◽

Shunshan Jin ◽

Yuan Lin ◽

Quan Zou

Keyword(s):

Gene Expression ◽

Signal Transduction ◽

Transcriptional Activity ◽

Regulation Of Gene Expression ◽

Protein Methylation ◽

Biological Processes ◽

Post Translational Modification ◽

Regulatory Enzymes ◽

Lysine Residues ◽

Arginine Residues

Protein methylation is an important and reversible post-translational modification that regulates many biological processes in cells. It occurs mainly on lysine and arginine residues and involves many important biological processes, including transcriptional activity, signal transduction, and the regulation of gene expression. Protein methylation and its regulatory enzymes are related to a variety of human diseases, so improved identification of methylation sites is useful for designing drugs for a variety of related diseases. In this review, we systematically summarize and analyze the tools used for the prediction of protein methylation sites on arginine and lysine residues over the last decade.

Based on 9-gram Coding of Amino Acids Predicting Proteases Types by Using Support Vector Machine

Recent Patents on Computer Science ◽

10.2174/2213275911205030220 ◽

2012 ◽

Vol 5 (3) ◽

pp. 220-225 ◽

Cited By ~ 1

Author(s):

Cunshuan Xu ◽

Ruijia Shi

Keyword(s):

Amino Acids ◽

Support Vector Machine ◽

Support Vector

An efficient computational method for predicting drug-target interactions using weighted extreme learning machine and speed up robot features

BioData Mining ◽

10.1186/s13040-021-00242-1 ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Ji-Yong An ◽

Fan-Rong Meng ◽

Zi-Ji Yan

Keyword(s):

Ion Channel ◽

Extreme Learning Machine ◽

Nuclear Receptor ◽

Drug Target ◽

Computational Method ◽

Evolutionary Information ◽

Support Vector ◽

Weighted Extreme Learning Machine ◽

Speed Up ◽

Learning Machine

Background: Prediction of novel Drug–Target interactions (DTIs) plays an important role in discovering new drug candidates and finding new proteins to target. Because experimental methods are time-consuming and expensive, developing efficient computational approaches that accurately predict potential associations between drugs and targets is an important but challenging task. Results: In this paper, we propose a novel computational method called WELM-SURF, based on drug fingerprints and protein evolutionary information, for identifying DTIs. More specifically, to exploit protein sequence features, the Position Specific Scoring Matrix (PSSM) is applied to capture protein evolutionary information, and Speed up robot features (SURF) is employed to extract key sequence features from the PSSM. For drug fingerprints, molecular substructure fingerprints of the chemical structure were used to represent each drug as a feature vector. The Weighted Extreme Learning Machine (WELM) has the advantages of short training time, good generalization ability and, most importantly, the ability to classify efficiently by optimizing the loss function of the weight matrix. Therefore, the WELM classifier is used to carry out classification based on the extracted features for predicting DTIs. The performance of the WELM-SURF model was evaluated on enzyme, ion channel, GPCR and nuclear receptor datasets using a fivefold cross-validation test. WELM-SURF obtained average accuracies of 93.54, 90.58, 85.43 and 77.45% on the enzyme, ion channel, GPCR and nuclear receptor datasets, respectively. We also compared its performance with the Extreme Learning Machine (ELM) and the state-of-the-art Support Vector Machine (SVM) on the enzyme and ion channel datasets, and with other existing methods on all four datasets. According to the experimental results, the performance of WELM-SURF is significantly better than that of ELM, SVM and other previous methods in the domain. Conclusion: The results demonstrated that the proposed WELM-SURF model is competent for predicting DTIs with high accuracy and robustness. It is anticipated that the WELM-SURF method will be a useful computational tool to facilitate a wide range of bioinformatics studies related to DTI prediction.
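The fivefold cross-validation protocol mentioned in the abstract above can be sketched in Python as an index splitter. This is a generic illustration under stated assumptions: the shuffling seed, the fold assignment and the function name `k_fold_indices` are hypothetical, and the abstract does not say whether the authors stratified their folds.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once with a fixed seed, then dealt into k folds
    so that every sample is used for testing exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [index
                 for fold_number, fold in enumerate(folds)
                 if fold_number != held_out
                 for index in fold]
        yield train, test


# Hypothetical dataset of 10 samples split into 5 folds.
splits = list(k_fold_indices(10, k=5))
```

An average accuracy such as those quoted in the abstract would then be the mean of the per-fold accuracies obtained by training on each `train` list and scoring on the matching `test` list.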

A Novel Disease Detection and Classification Method Using Improved Fusion Random Weight Support Vector Machine

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2021.3903 ◽

2021 ◽

Vol 11 (12) ◽

pp. 2976-2986

Author(s):

M. Usha Rani ◽

N. Saravana Selvam

Keyword(s):

Support Vector Machine ◽

Human Health ◽

Health Informatics ◽

Research Work ◽

Noise Removal ◽

Support Vector ◽

Computational Techniques ◽

Disease Detection ◽

Successful Implementation ◽

Leaf Disease

Health informatics is one of the main branches of engineering, providing solutions to a variety of problems such as delayed, missed or incorrect diagnoses with the help of computational techniques. With technologies such as bio-computing and health informatics, the impact of disasters on both human health and biological factors can be reduced to a large extent. These computational technologies can also boost a country's economy and help counter the rise in disease-causing pathogens, which directly affect the human health system. In this research work, different types of sugarcane disease are detected and classified, because manual identification is difficult and time-consuming. Lacking a better solution, farmers generally resort to stubble burning, which is an alarming issue for both human and environmental wellness. The burning of bagasse causes bagassosis, an interstitial lung disease that affects the tissues present in the lung through the air sacs. Sugarcane disease detection therefore needs to be done early to avoid various health and environmental issues. The proposed work detects four types of sugarcane leaf disease directly from the field. The sequence of methods is: capturing images with WSN nodes, pre-processing with image enhancement and noise removal (IENR), segmentation with Fuzzy membership function and clustering (FMFC), feature extraction using Gray Level Co-occurrence Matrix Vector (GLCMV), and classification using Support Vector Machine (SVM). The proposed method obtained the highest values of precision, accuracy, sensitivity, and specificity for sugarcane leaf disease. The accuracy and execution time reported for the four sugarcane diseases are: Smut disease (87.12, 1.01 sec), Rust disease (90.23, 1.02 sec), Grassy Shoot disease (95.34, 1.047 sec), and Red Rot disease (95.51, 1.04 sec).
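The feature extraction step in the pipeline above builds on the gray level co-occurrence matrix. The Python sketch below shows the plain textbook GLCM for a single pixel offset together with one classic texture feature, contrast. It assumes the image has already been quantized to integer gray levels; it is not the authors' GLCMV variant, whose details the abstract does not give, and the function names are hypothetical.

```python
def gray_level_cooccurrence(image, levels, dx=1, dy=0):
    """Count how often gray level i has gray level j at offset (dx, dy).

    `image` is a list of rows of integers in range(levels).
    The default offset pairs each pixel with its right-hand neighbour.
    """
    matrix = [[0] * levels for _ in range(levels)]
    n_rows, n_cols = len(image), len(image[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def glcm_contrast(matrix):
    """Contrast: sum of (i - j)^2 weighted by normalized co-occurrence."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    return sum((i - j) ** 2 * count
               for i, row in enumerate(matrix)
               for j, count in enumerate(row)) / total


# Hypothetical 2 x 3 image with two gray levels.
toy_image = [[0, 0, 1], [1, 1, 0]]
toy_glcm = gray_level_cooccurrence(toy_image, levels=2)
toy_contrast = glcm_contrast(toy_glcm)
```

Texture statistics of this kind, computed for several offsets, are typically concatenated into the feature vector that is handed to the SVM classifier.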
