DeepPhos: prediction of protein phosphorylation sites with deep learning

2019 ◽  
Vol 35 (16) ◽  
pp. 2766-2773 ◽  
Author(s):  
Fenglin Luo ◽  
Minghui Wang ◽  
Yu Liu ◽  
Xing-Ming Zhao ◽  
Ao Li

Abstract Motivation Phosphorylation is the most studied post-translational modification and is crucial for multiple biological processes. Recently, much effort has gone into developing computational predictors of phosphorylation sites, but most of them rely on feature selection and discriminative classification. It is therefore useful to develop a novel and highly accurate predictor that can uncover intricate patterns in protein phosphorylation sites automatically. Results In this study we present DeepPhos, a novel deep learning architecture for predicting protein phosphorylation. Unlike multi-layer convolutional neural networks, DeepPhos consists of densely connected convolutional neural network blocks that capture multiple representations of sequences through intra-block and inter-block concatenation layers to make the final phosphorylation prediction. DeepPhos can also be used for kinase-specific prediction at the group, family, subfamily and individual kinase levels. The experimental results demonstrate that DeepPhos outperforms competitive predictors in general and kinase-specific phosphorylation site prediction. Availability and implementation The source code of DeepPhos is publicly deposited at https://github.com/USTCHIlab/DeepPhos. Supplementary information Supplementary data are available at Bioinformatics online.
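
The dense connectivity mentioned in the abstract can be illustrated with a toy sketch. This is not the DeepPhos implementation (see the repository linked above for that). It only shows the general idea that each layer in a block receives the concatenation of the block input and all earlier layer outputs, and that the outputs of several blocks are concatenated again. The function names, the fixed averaging weights and the kernel widths 3 and 5 are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

Channel = List[float]  # one feature channel along the sequence


def conv1d_same(channels: List[Channel], weights: List[List[float]]) -> Channel:
    """Zero-padded 1D convolution mixing all input channels into one output channel.

    weights[c][k] is kernel tap k for input channel c (odd kernel width assumed).
    """
    length = len(channels[0])
    half = len(weights[0]) // 2
    out = []
    for pos in range(length):
        total = 0.0
        for chan, kernel in zip(channels, weights):
            for k, w in enumerate(kernel):
                src = pos + k - half
                if 0 <= src < length:  # positions outside the sequence count as zero
                    total += w * chan[src]
        out.append(max(total, 0.0))  # ReLU activation
    return out


def dense_block(inputs: List[Channel], n_layers: int, width: int = 3) -> List[Channel]:
    """Each layer sees the block input plus every earlier layer output."""
    features = list(inputs)
    for _ in range(n_layers):
        # Hypothetical fixed averaging weights; a trained model would learn these.
        weights = [[1.0 / (width * len(features))] * width for _ in features]
        features.append(conv1d_same(features, weights))  # intra-block concatenation
    return features


def inter_block_concat(inputs: List[Channel], widths: Sequence[int] = (3, 5)) -> List[Channel]:
    """Run one dense block per kernel width and concatenate all resulting channels."""
    combined: List[Channel] = []
    for width in widths:
        combined.extend(dense_block(inputs, n_layers=2, width=width))
    return combined
```

With a single input channel and two layers per block, each block yields three channels (the input plus two layer outputs), so `inter_block_concat` over two kernel widths returns six channels.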

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Niraj Thapa ◽  
Meenal Chaudhari ◽  
Anthony A. Iannetta ◽  
Clarence White ◽  
Kaushik Roy ◽  
...  

Abstract Protein phosphorylation, one of the most important post-translational modifications (PTMs), is involved in regulating myriad cellular processes. Herein, we present a novel deep learning based approach for organism-specific protein phosphorylation site prediction in Chlamydomonas reinhardtii, a model algal phototroph. An ensemble model combining convolutional neural networks and long short-term memory (LSTM) achieves the best performance in predicting phosphorylation sites in C. reinhardtii. For this model, named Chlamy-EnPhosSite, the best measured AUC and MCC are 0.90 and 0.64, respectively, for a combined dataset of serine (S) and threonine (T) in independent testing, higher than the corresponding measures for other predictors. When applied to the entire C. reinhardtii proteome (totaling 1,809,304 S and T sites), Chlamy-EnPhosSite yielded 499,411 phosphorylated sites with a cut-off value of 0.5 and 237,949 phosphorylated sites with a cut-off value of 0.7. These predictions were compared to an experimental dataset of phosphosites identified by liquid chromatography-tandem mass spectrometry (LC–MS/MS) in a blinded study: approximately 89.69% of 2,663 C. reinhardtii S and T phosphorylation sites were successfully predicted by Chlamy-EnPhosSite at a probability cut-off of 0.5, and 76.83% of sites were successfully identified at a more stringent 0.7 cut-off. Interestingly, Chlamy-EnPhosSite also successfully predicted experimentally confirmed phosphorylation sites in a protein sequence (e.g., RPS6 S245) which did not appear in the training dataset, highlighting prediction accuracy and the power of leveraging predictions to identify biologically relevant PTM sites. These results demonstrate that our method represents a robust and complementary technique for high-throughput phosphorylation site prediction in C. reinhardtii. It has potential to serve as a useful tool to the community.
Chlamy-EnPhosSite will contribute to the understanding of how protein phosphorylation influences various biological processes in this important model microalga.
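
The cut-off figures quoted in the abstract above can be read as follows: a site counts as predicted phosphorylated when its score reaches the cut-off, so raising the cut-off from 0.5 to 0.7 trades recovered sites for stringency. The sketch below is a minimal illustration of that bookkeeping, not the authors' evaluation code. The function names are hypothetical, and treating a score exactly equal to the cut-off as positive is an assumption.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(scores: Sequence[float], labels: Sequence[int],
                     cutoff: float) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) when score >= cutoff is called phosphorylated."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= cutoff  # assumption: ties count as positive
        if predicted and label == 1:
            tp += 1
        elif predicted and label == 0:
            fp += 1
        elif not predicted and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def mcc_at_cutoff(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> float:
    """Matthews correlation coefficient at a given probability cut-off."""
    tp, fp, tn, fn = confusion_counts(scores, labels, cutoff)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def recall_at_cutoff(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> float:
    """Fraction of known phosphosites whose score reaches the cut-off."""
    tp, _, _, fn = confusion_counts(scores, labels, cutoff)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

The percentages of experimentally identified sites recovered at each cut-off correspond to what `recall_at_cutoff` computes, while MCC also accounts for false positives and true negatives.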


2021 ◽  
Author(s):  
Niraj Thapa ◽  
Meenal Chaudhari ◽  
Anthony A. Iannetta ◽  
Clarence White ◽  
Kaushik Roy ◽  
...  

Abstract Protein phosphorylation is one of the most important post-translational modifications (PTMs) and is involved in myriad cellular processes. Although many non-organism-specific computational phosphorylation site prediction tools and a few tools for organism-specific phosphorylation site prediction exist, none are currently available for Chlamydomonas reinhardtii. Herein, we present a novel deep learning (DL) based approach for organism-specific protein phosphorylation site prediction in Chlamydomonas reinhardtii, a model algal phototroph. Our novel approach, called Chlamy-EnPhosSite (an ensemble approach combining convolutional neural networks (CNN) and long short-term memory (LSTM)), produces an AUC and MCC of 0.90 and 0.64, respectively, for a combined dataset of serine (S) and threonine (T) in independent testing. When applied to the entire C. reinhardtii proteome (totaling 1,809,304 S and T sites), Chlamy-EnPhosSite yielded 499,411 phosphorylated sites with a cut-off value of 0.5 and 237,949 phosphorylated sites with a cut-off value of 0.7. These predictions were compared to an experimental dataset of phosphosites identified by liquid chromatography-tandem mass spectrometry (LC-MS/MS) in a blinded study: approximately 90% of 2,663 C. reinhardtii S and T phosphorylation sites were successfully predicted by Chlamy-EnPhosSite at a probability cut-off of 0.5, and 77% of sites were successfully identified at a more stringent 0.7 cut-off. Interestingly, Chlamy-EnPhosSite also successfully predicted experimentally confirmed phosphorylation sites in a protein sequence (e.g., RPS6 S245) which did not appear in the training dataset, highlighting prediction accuracy and the power of leveraging predictions to identify biologically relevant PTM sites. These results demonstrate that our method represents a robust and complementary technique for high-throughput phosphorylation site prediction in C. reinhardtii. It has potential to serve as a useful tool to the community.
Chlamy-EnPhosSite will contribute to the understanding of how protein phosphorylation influences various biological processes in this important model microalga.


2019 ◽  
Author(s):  
Thanh Hai Dang ◽  
Quang Thinh Trac ◽  
Huy Kinh Phan ◽  
Manh Cuong Nguyen ◽  
Quynh Trang Pham Thi

Abstract Motivation Phosphorylation, which is catalyzed by kinase proteins, is among the two most common and widely studied types of essential post-translational protein modification (PTM). Phosphorylation is known to regulate most cellular processes, such as protein synthesis, cell division, signal transduction, cell growth, development and aging. Various phosphorylation site prediction models have been developed, which can be broadly categorized as kinase-specific or non-kinase-specific (general). Unlike the latter, the former requires a sufficiently large number of experimentally known phosphorylation sites annotated with a given kinase for training the model, which is not the case in reality: less than 3% of the phosphorylation sites known to date have been annotated with a responsible kinase. To date, a few non-kinase-specific phosphorylation site prediction models have been proposed. Results This paper proposes SKIPHOS, a non-kinase-specific phosphorylation site prediction model based on random forests on top of a continuous distributed representation of amino acids. Experimental results on the benchmark dataset and the independent test set demonstrate that SKIPHOS compares favorably to recent state-of-the-art related methods for three phosphorylation residues. Although trained on phosphorylation sites in mammals, SKIPHOS can yield better predictions for Y residues than PHOSFER, a recently proposed plant-specific phosphorylation prediction model. Availability and Implementation The SKIPHOS web server is freely available for non-commercial use at http://fit.uet.vnu.edu.vn/SKIPHOS or http://112.137.130.46:[email protected] Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Shaofeng Lin ◽  
Chenwei Wang ◽  
Jiaqi Zhou ◽  
Ying Shi ◽  
Chen Ruan ◽  
...  

Abstract As an important post-translational modification (PTM), protein phosphorylation is involved in the regulation of almost all biological processes in eukaryotes. Due to the rapid progress in mass spectrometry-based phosphoproteomics, a large number of phosphorylation sites (p-sites) have been characterized but remain to be curated. Here, we briefly summarize current progress in the development of data resources for the collection, curation, integration and annotation of p-sites in eukaryotic proteins. We also designed the eukaryotic phosphorylation site database (EPSD), which contains 1 616 804 experimentally identified p-sites in 209 326 phosphoproteins from 68 eukaryotic species. In EPSD, we not only collected 1 451 629 newly identified p-sites from high-throughput (HTP) phosphoproteomic studies, but also integrated known p-sites from 13 additional databases. Moreover, we carefully annotated the phosphoproteins and p-sites of eight model organisms by integrating the knowledge from 100 additional resources that covered 15 aspects, including phosphorylation regulator, genetic variation and mutation, functional annotation, structural annotation, physicochemical property, functional domain, disease-associated information, protein-protein interaction, drug-target relation, orthologous information, biological pathway, transcriptional regulator, mRNA expression, protein expression/proteomics and subcellular localization. We anticipate that the EPSD can serve as a useful resource for further analysis of eukaryotic phosphorylation. With a data volume of 14.1 GB, EPSD is free for all users at http://epsd.biocuckoo.cn/.


2014 ◽  
Author(s):  
Qiuming Yao

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Protein post-translational modification (PTM) occurs broadly during or after protein biosynthesis, to assist folding or activate function during the protein lifetime. Among all types of PTMs, protein phosphorylation is widely recognized as the most pervasive enzyme-catalyzed post-translational modification in eukaryotes. In particular, plants rely on this signaling mechanism to a greater degree than other eukaryotes, as measured by the frequency of protein kinases within the genome. Phosphorylation site mapping using high-resolution mass spectrometry has grown exponentially. In Arabidopsis alone there are thousands of experimentally determined phosphorylation sites. Data on other types of post-translational modification are also increasing rapidly; the acetylation proteome is another large PTM data set. To provide easy access to these modification events in a user-intuitive format, we have developed P3DB, the Plant Protein Phosphorylation Database (p3db.org). This database is a repository for plant protein phosphorylation site data. These data can be queried for a protein of interest using an integrated BLAST function to search for similar sequences with known phosphorylation sites among the multiple plants currently investigated. Thus, this resource can help identify functionally conserved phosphorylation sites in plants using a multi-system approach. Centered on these phosphorylation data, multiple related data and annotations are provided, including protein-protein interaction (PPI), gene ontology, protein tertiary structures, orthologous sequences, kinase/phosphatase classification and Kinase Client Assay (KiC Assay) data. P3DB is thus not only a repository, but also a context provider for studying phosphorylation events.
In addition, P3DB incorporates multiple network viewers for the above features, such as a PPI network, kinase-substrate network, phosphatase-substrate network and domain co-occurrence network, to help study phosphorylation from a systems point of view. Furthermore, P3DB reflects a community-based design through which users can share data sets and automate data deposition for publication purposes. Since P3DB is a comprehensive, systematic and interactive platform for phosphoproteomics research, many data analyses can be built on it. For example, disorder analysis and sequence conservation analysis can be performed on the P3DB datasets. Many researchers have downloaded the data and carried out meaningful analyses based on the P3DB infrastructure. Although high-resolution mass spectrometry can now reliably identify protein phosphorylation sites, the experimental approach is time-consuming and resource-dependent. Furthermore, it is unlikely that an experimental approach could catalog an entire phosphoproteome. Computational prediction of phosphorylation sites provides an efficient and flexible way to reveal potential phosphorylation sites, facilitate experimental phosphorylation site identification and provide hypotheses for experimental design. Musite is a powerful tool that we developed to predict phosphorylation sites based solely on protein sequence. Musite integrates data preprocessing, feature extraction, machine-learning methods and prediction models into one comprehensive tool. Musite (http://musite.net) can be extended to the study of all types of post-translational modification, as long as the dataset contains sufficient modification sites. To further improve the performance of Musite, a generalized motif tree applying fuzzy logic is introduced to complement the machine learning based prediction.
On one hand, using a tree-based approach and fuzzy variables helps to interpret the final rules, so that biologists can obtain the significant patterns. On the other hand, its extracted rule sets essentially generalize the motifs and reveal more information. It can be paired with a traditional classification method to provide better interpretation, pre-filtering and analytical power. Compared with traditional motif extraction, the fuzzy motif decision tree is able to borrow more information from the observations and thus may extract more novel motifs or more comprehensive patterns. It can be applied to kinase-specific phosphorylated peptides to gain more insight into phosphorylation events. Combined, a comprehensive database (P3DB), a well-developed prediction tool (Musite) and a generalized motif constructor (Fuzzy Motif Tree) enable researchers to investigate phosphorylation and other post-translational modification events more thoroughly and thus to reveal more of their underlying biological significance.


2020 ◽  
Vol 21 (21) ◽  
pp. 7891
Author(s):  
Chi-Wei Chen ◽  
Lan-Ying Huang ◽  
Chia-Feng Liao ◽  
Kai-Po Chang ◽  
Yen-Wei Chu

Protein phosphorylation is one of the most important post-translational modifications. Many biological processes, such as DNA repair, transcriptional regulation and signal transduction, are related to phosphorylation, and abnormal regulation of phosphorylation therefore usually causes disease. Accurate prediction of human phosphorylation sites could thus help in addressing human diseases. We therefore developed a kinase-specific phosphorylation prediction system, GasPhos, and proposed a new feature selection approach, called Gas, based on the ant colony system and a genetic algorithm. We also used performance evaluation strategies focused on different kinases to choose the best learning model. Gas uses the mean decrease Gini index (MDGI) as a heuristic value for path selection and adopts binary transformation strategies and new state transition rules. GasPhos can predict phosphorylation sites for six kinases and showed better performance than other phosphorylation prediction tools. The disease-related phosphorylated proteins that were predicted with GasPhos are also discussed. Finally, Gas can be applied to other problems that require feature selection, which could help to improve prediction performance.


2018 ◽  
Vol 21 (2) ◽  
pp. 595-608 ◽  
Author(s):  
Man Cao ◽  
Guodong Chen ◽  
Jialin Yu ◽  
Shaoping Shi

Abstract Protein phosphorylation is a reversible and ubiquitous post-translational modification that primarily occurs at serine, threonine and tyrosine residues and regulates a variety of biological processes. In this paper, we first briefly summarize current progress in the computational prediction of eukaryotic protein phosphorylation sites, which has mainly focused on animals and plants, especially human, and to a lesser extent on fungi. Since the number of identified fungal phosphorylation sites has greatly increased in a wide variety of organisms and their roles in pathological physiology remain largely unknown, more attention has been paid to the identification of fungi-specific phosphorylation. Here, experimental fungal phosphorylation site data were collected, and most of the sites were classified into different types to be encoded with various features and trained via a two-step feature optimization method. A novel method for the prediction of species-specific fungal phosphorylation, PreSSFP, was developed, which can identify fungal phosphorylation in seven species for specific serine, threonine and tyrosine residues (http://computbiol.ncu.edu.cn/PreSSFP). We also critically evaluated the performance of PreSSFP and compared it with other existing tools. The satisfactory results showed that PreSSFP is a robust predictor. Feature analyses showed that there are some significant differences among the seven species. Species-specific prediction that uses the two-step feature optimization method to mine important features for training could considerably improve prediction performance. We anticipate that our study provides a new lead for future computational analysis of fungal phosphorylation.


Author(s):  
Min Zeng ◽  
Fuhao Zhang ◽  
Fang-Xiang Wu ◽  
Yaohang Li ◽  
Jianxin Wang ◽  
...  

Abstract Motivation Protein–protein interactions (PPIs) play important roles in many biological processes. Conventional biological experiments for identifying PPI sites are costly and time-consuming. Thus, many computational approaches have been proposed to predict PPI sites. Existing computational methods usually use local contextual features to predict PPI sites. However, global features of protein sequences are also critical for PPI site prediction. Results A new end-to-end deep learning framework, named DeepPPISP, is proposed for PPI site prediction; it combines local contextual and global sequence features. For local contextual features, we use a sliding window to capture features of neighbors of a target amino acid, as in previous studies. For global sequence features, a text convolutional neural network is applied to extract features from the whole protein sequence. Then the local contextual and global sequence features are combined to predict PPI sites. By integrating local contextual and global sequence features, DeepPPISP achieves state-of-the-art performance, outperforming the other competing methods. To investigate whether global sequence features are helpful in our deep learning model, we removed or changed some components in DeepPPISP. Detailed analyses show that global sequence features play important roles in DeepPPISP. Availability and implementation The DeepPPISP web server is available at http://bioinformatics.csu.edu.cn/PPISP/. The source code can be obtained from https://github.com/CSUBioGroup/DeepPPISP. Supplementary information Supplementary data are available at Bioinformatics online.
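
The sliding-window step for local contextual features can be sketched as below. This is an illustrative sketch only, not code from the DeepPPISP repository: the function name, the padding character `X` and the default half-width of 3 are assumptions, and the abstract does not state the window size actually used.

```python
from typing import List


def sliding_windows(sequence: str, half_width: int = 3, pad: str = "X") -> List[str]:
    """Return one window per residue, centered on that residue.

    The sequence is padded on both ends so that terminal residues also get a
    full-length window of 2 * half_width + 1 characters.
    """
    padded = pad * half_width + sequence + pad * half_width
    size = 2 * half_width + 1
    return [padded[i:i + size] for i in range(len(sequence))]


def local_and_global(sequence: str, half_width: int = 3) -> List[tuple]:
    """Pair each residue's local window with the whole sequence (the global context)."""
    return [(window, sequence) for window in sliding_windows(sequence, half_width)]
```

Each target residue thus gets a fixed-length neighborhood for the local branch, while the full sequence is what a global branch (a text CNN in the abstract) would consume before the two feature sets are combined.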

