EPSD: a well-annotated data resource of protein phosphorylation sites in eukaryotes

Author(s):  
Shaofeng Lin ◽  
Chenwei Wang ◽  
Jiaqi Zhou ◽  
Ying Shi ◽  
Chen Ruan ◽  
...  

Abstract As an important post-translational modification (PTM), protein phosphorylation is involved in the regulation of almost all biological processes in eukaryotes. Owing to rapid progress in mass spectrometry-based phosphoproteomics, a large number of phosphorylation sites (p-sites) have been characterized but remain to be curated. Here, we briefly summarize current progress in the development of data resources for the collection, curation, integration and annotation of p-sites in eukaryotic proteins. We also designed the eukaryotic phosphorylation site database (EPSD), which contains 1 616 804 experimentally identified p-sites in 209 326 phosphoproteins from 68 eukaryotic species. In EPSD, we not only collected 1 451 629 newly identified p-sites from high-throughput (HTP) phosphoproteomic studies, but also integrated known p-sites from 13 additional databases. Moreover, we carefully annotated the phosphoproteins and p-sites of eight model organisms by integrating knowledge from 100 additional resources covering 15 aspects: phosphorylation regulator, genetic variation and mutation, functional annotation, structural annotation, physicochemical property, functional domain, disease-associated information, protein-protein interaction, drug-target relation, orthologous information, biological pathway, transcriptional regulator, mRNA expression, protein expression/proteomics and subcellular localization. We anticipate that EPSD can serve as a useful resource for further analysis of eukaryotic phosphorylation. With a data volume of 14.1 GB, EPSD is freely available to all users at http://epsd.biocuckoo.cn/.

2019 ◽  
Vol 35 (16) ◽  
pp. 2766-2773 ◽  
Author(s):  
Fenglin Luo ◽  
Minghui Wang ◽  
Yu Liu ◽  
Xing-Ming Zhao ◽  
Ao Li

Abstract Motivation Phosphorylation is the most studied post-translational modification and is crucial for multiple biological processes. Recently, considerable effort has gone into developing computational predictors of phosphorylation sites, but most of them are based on feature selection and discriminative classification. It is therefore useful to develop a novel and highly accurate predictor that can automatically unveil intricate patterns in protein phosphorylation sites. Results In this study we present DeepPhos, a novel deep learning architecture for prediction of protein phosphorylation. Unlike multi-layer convolutional neural networks, DeepPhos consists of densely connected convolutional neural network blocks, which capture multiple representations of sequences and produce the final phosphorylation prediction through intra-block and inter-block concatenation layers. DeepPhos can also be used for kinase-specific prediction at the group, family, subfamily and individual kinase levels. The experimental results demonstrate that DeepPhos outperforms competing predictors in both general and kinase-specific phosphorylation site prediction. Availability and implementation The source code of DeepPhos is publicly deposited at https://github.com/USTCHIlab/DeepPhos. Supplementary information Supplementary data are available at Bioinformatics online.
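The dense connectivity described in the DeepPhos abstract can be illustrated with a minimal pure-Python sketch. It is not the DeepPhos implementation: the function names (`conv1d_same`, `dense_block`, `inter_block_concat`), the random weights, the growth rate and the kernel sizes are all hypothetical, and the abstract does not say how the blocks are actually configured. The sketch only shows the idea that each layer inside a block sees the concatenation of all earlier feature maps (intra-block concatenation) and that the outputs of several blocks are joined before a final prediction layer (inter-block concatenation).

```python
import random

def conv1d_same(x, weights, bias):
    """Zero-padded 1D convolution followed by ReLU.

    x: list of input channels, each a list of floats of equal length.
    weights: weights[out_channel][in_channel][k]; bias: one float per out_channel.
    The output channels have the same length as the input channels.
    """
    length = len(x[0])
    pad = len(weights[0][0]) // 2
    out = []
    for oc, w_oc in enumerate(weights):
        channel = []
        for pos in range(length):
            total = bias[oc]
            for ic, w_ic in enumerate(w_oc):
                for j, w in enumerate(w_ic):
                    idx = pos + j - pad
                    if 0 <= idx < length:  # positions outside the sequence count as zero
                        total += w * x[ic][idx]
            channel.append(max(0.0, total))  # ReLU
        out.append(channel)
    return out

def dense_block(x, n_layers, growth, kernel, rng):
    """Each layer reads ALL feature maps produced so far and appends `growth` new ones."""
    features = list(x)
    for _ in range(n_layers):
        in_ch = len(features)
        # Random weights stand in for trained parameters (assumption for illustration).
        weights = [[[rng.uniform(-0.1, 0.1) for _ in range(kernel)]
                    for _ in range(in_ch)] for _ in range(growth)]
        new_maps = conv1d_same(features, weights, [0.0] * growth)
        features = features + new_maps  # intra-block concatenation
    return features

def inter_block_concat(block_outputs):
    """Join the channel lists of several blocks into one list of channels."""
    merged = []
    for channels in block_outputs:
        merged.extend(channels)
    return merged

# Toy input: 3 channels over a window of 7 residues (values are arbitrary).
rng = random.Random(0)
window = [[0.1 * i for i in range(7)] for _ in range(3)]
block_a = dense_block(window, n_layers=2, growth=4, kernel=3, rng=rng)
block_b = dense_block(window, n_layers=1, growth=2, kernel=5, rng=rng)
merged = inter_block_concat([block_a, block_b])
```

A block with `n_layers` layers and growth rate `growth` returns the input channels plus `n_layers * growth` new ones, which is what distinguishes dense connectivity from a plain stack of convolutions where each layer sees only its predecessor.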


2014 ◽  
Author(s):  
Qiuming Yao

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Protein post-translational modification (PTM) occurs broadly during or after protein biosynthesis, assisting folding or activating function during the protein's lifetime. Among all types of PTMs, protein phosphorylation is widely recognized as the most pervasive enzyme-catalyzed post-translational modification in eukaryotes. In particular, plants rely on this signaling mechanism to a greater degree than other eukaryotes, as judged by the frequency of protein kinases in the genome. Phosphorylation site mapping using high-resolution mass spectrometry has grown exponentially. In Arabidopsis alone there are thousands of experimentally determined phosphorylation sites. Data on other types of post-translational modification are also increasing rapidly; the acetylation proteome is another large PTM data set. To provide easy access to these modification events in a user-intuitive format, we have developed P3DB, the Plant Protein Phosphorylation Database (p3db.org). This database is a repository for plant protein phosphorylation site data. These data can be queried for a protein of interest using an integrated BLAST function that searches for similar sequences with known phosphorylation sites among the multiple plants currently investigated. Thus, this resource can help identify functionally conserved phosphorylation sites in plants using a multi-system approach. Built around these phosphorylation data, multiple related data and annotations are provided, including protein-protein interaction (PPI), gene ontology, protein tertiary structures, orthologous sequences, kinase/phosphatase classification and Kinase Client Assay (KiC Assay) data. P3DB is thus not only a repository but also a context provider for studying phosphorylation events.
In addition, P3DB incorporates multiple network viewers for the above features, such as the PPI network, kinase-substrate network, phosphatase-substrate network and domain co-occurrence network, to help study phosphorylation from a systems point of view. Furthermore, P3DB reflects a community-based design through which users can share data sets and automate data deposition for publication purposes. Since P3DB is a comprehensive, systematic and interactive platform for phosphoproteomics research, many data analyses can be built on it. For example, disorder analysis and sequence conservation analysis can be carried out on the P3DB datasets. Many researchers have downloaded the data and performed meaningful analyses based on the P3DB infrastructure. Although protein phosphorylation sites can be reliably identified with high-resolution mass spectrometry, the experimental approach is time-consuming and resource-dependent. Furthermore, it is unlikely that an experimental approach could catalog an entire phosphoproteome. Computational prediction of phosphorylation sites provides an efficient and flexible way to reveal potential phosphorylation sites, facilitate experimental phosphorylation site identification and provide hypotheses for experimental design. Musite is a powerful tool that we developed to predict phosphorylation sites based solely on protein sequence. Musite integrates data preprocessing, feature extraction, machine-learning methods and prediction models into one comprehensive tool. Musite (http://musite.net) can be extended to the study of all types of post-translational modification, as long as the dataset contains sufficient modification sites. To further improve the performance of Musite, a generalized motif tree applying fuzzy logic is introduced to complement the machine-learning-based prediction.
On the one hand, using a tree-based approach and fuzzy variables helps to interpret the final rules, so that biologists can obtain the significant patterns. On the other hand, the extracted rule sets essentially generalize the motifs and reveal more information. The approach can be paired with a traditional classification method to provide better interpretation, pre-filtering and analytical power. Compared with traditional motif extraction, the fuzzy motif decision tree is able to borrow more information from the observations and thus may extract more novel motifs or more comprehensive patterns. It can be applied to kinase-specific phosphorylated peptides to gain more insight into phosphorylation events. Together, a comprehensive database (P3DB), a well-developed prediction tool (Musite) and a generalized motif constructor (Fuzzy Motif Tree) enable researchers to investigate phosphorylation and other post-translational modification events more thoroughly and thus to reveal more of their underlying biological significance.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Niraj Thapa ◽  
Meenal Chaudhari ◽  
Anthony A. Iannetta ◽  
Clarence White ◽  
Kaushik Roy ◽  
...  

Abstract Protein phosphorylation, which is one of the most important post-translational modifications (PTMs), is involved in regulating myriad cellular processes. Herein, we present a novel deep learning based approach for organism-specific protein phosphorylation site prediction in Chlamydomonas reinhardtii, a model algal phototroph. An ensemble model combining convolutional neural networks and long short-term memory (LSTM) achieves the best performance in predicting phosphorylation sites in C. reinhardtii. Named Chlamy-EnPhosSite, the model achieves a best AUC and MCC of 0.90 and 0.64, respectively, on a combined dataset of serine (S) and threonine (T) sites in independent testing, higher than the values obtained by other predictors. When applied to the entire C. reinhardtii proteome (totaling 1,809,304 S and T sites), Chlamy-EnPhosSite yielded 499,411 phosphorylated sites with a cut-off value of 0.5 and 237,949 phosphorylated sites with a cut-off value of 0.7. These predictions were compared to an experimental dataset of phosphosites identified by liquid chromatography-tandem mass spectrometry (LC–MS/MS) in a blinded study: approximately 89.69% of 2,663 C. reinhardtii S and T phosphorylation sites were successfully predicted by Chlamy-EnPhosSite at a probability cut-off of 0.5, and 76.83% of sites were successfully identified at a more stringent 0.7 cut-off. Interestingly, Chlamy-EnPhosSite also successfully predicted experimentally confirmed phosphorylation sites in a protein sequence (e.g., RPS6 S245) that did not appear in the training dataset, highlighting prediction accuracy and the power of leveraging predictions to identify biologically relevant PTM sites. These results demonstrate that our method represents a robust and complementary technique for high-throughput phosphorylation site prediction in C. reinhardtii. It has the potential to serve as a useful tool for the community.
Chlamy-EnPhosSite will contribute to the understanding of how protein phosphorylation influences various biological processes in this important model microalga.
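The effect of the two probability cut-offs in the abstract above can be sketched in a few lines of Python. This is an illustration only: the function names (`call_sites`, `recovered_fraction`), the site keys and the scores are hypothetical toy values, not data or code from Chlamy-EnPhosSite. The sketch shows how raising the cut-off shrinks the set of predicted sites and, with it, the fraction of experimentally observed sites that is recovered.

```python
def call_sites(scores, cutoff):
    """Return the sites whose predicted probability is at or above the cut-off."""
    return {site for site, prob in scores.items() if prob >= cutoff}

def recovered_fraction(predicted, experimental):
    """Fraction of experimentally observed sites that were also predicted."""
    if not experimental:
        raise ValueError("experimental set must not be empty")
    return len(predicted & experimental) / len(experimental)

# Hypothetical predictions keyed by (protein, residue, position).
scores = {
    ("P1", "S", 10): 0.95,
    ("P1", "T", 22): 0.62,
    ("P2", "S", 5): 0.40,
    ("P2", "S", 7): 0.71,
}
# Hypothetical experimentally observed phosphosites.
experimental = {("P1", "S", 10), ("P1", "T", 22), ("P2", "S", 5)}

for cutoff in (0.5, 0.7):
    predicted = call_sites(scores, cutoff)
    print(cutoff, len(predicted), recovered_fraction(predicted, experimental))

# Share of all proteome S/T sites called at each cut-off, using the counts
# reported in the abstract.
share_at_05 = 499_411 / 1_809_304
share_at_07 = 237_949 / 1_809_304
```

The fraction computed here corresponds to recall against the experimental set; it says nothing about how many of the remaining predicted sites are false positives, which is why the abstract reports AUC and MCC separately.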


2020 ◽  
Vol 21 (21) ◽  
pp. 7891
Author(s):  
Chi-Wei Chen ◽  
Lan-Ying Huang ◽  
Chia-Feng Liao ◽  
Kai-Po Chang ◽  
Yen-Wei Chu

Protein phosphorylation is one of the most important post-translational modifications, and many biological processes, such as DNA repair, transcriptional regulation and signal transduction, are related to it. Abnormal regulation of phosphorylation therefore often causes disease, and accurate prediction of human phosphorylation sites could help in addressing human diseases. We developed a kinase-specific phosphorylation prediction system, GasPhos, and proposed a new feature selection approach, called Gas, based on the ant colony system and a genetic algorithm. We used performance evaluation strategies focused on different kinases to choose the best learning model. Gas uses the mean decrease Gini index (MDGI) as a heuristic value for path selection and adopts binary transformation strategies and new state transition rules. GasPhos can predict phosphorylation sites for six kinases and showed better performance than other phosphorylation prediction tools. The disease-related phosphorylated proteins that were predicted with GasPhos are also discussed. Finally, Gas can be applied to other problems that require feature selection, which could help to improve prediction performance.
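How a heuristic value such as MDGI can steer path selection in an ant colony system is sketched below. The abstract does not spell out the "new state transition rules" used by Gas, so this example falls back on the classic ant colony system pseudo-random proportional rule as a stand-in. The function name `choose_feature`, the parameters `beta` and `q0`, and the pheromone and MDGI values are all hypothetical.

```python
import random

def choose_feature(candidates, pheromone, heuristic, beta=2.0, q0=0.9, rng=random):
    """Pick the next feature with the classic ACS pseudo-random proportional rule.

    With probability q0 the ant exploits: it takes the candidate with the
    highest pheromone * heuristic**beta. Otherwise it explores: it samples a
    candidate with probability proportional to that same score.
    """
    if not candidates:
        raise ValueError("no candidate features left")
    scores = {f: pheromone[f] * heuristic[f] ** beta for f in candidates}
    if rng.random() < q0:
        return max(scores, key=scores.get)  # exploitation
    threshold = rng.random() * sum(scores.values())
    running = 0.0
    for feature, score in scores.items():  # roulette-wheel exploration
        running += score
        if threshold <= running:
            return feature
    return feature  # guard against floating-point rounding at the upper end

# Hypothetical values: equal pheromone everywhere, MDGI as the heuristic.
pheromone = {"f1": 1.0, "f2": 1.0, "f3": 1.0}
mdgi = {"f1": 0.2, "f2": 0.7, "f3": 0.1}
picked = choose_feature(["f1", "f2", "f3"], pheromone, mdgi, rng=random.Random(42))
```

Setting `q0` close to 1 makes the search greedy with respect to the heuristic, while lower values leave more room for exploring features with small MDGI that may still be useful in combination.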


2018 ◽  
Vol 21 (2) ◽  
pp. 595-608 ◽  
Author(s):  
Man Cao ◽  
Guodong Chen ◽  
Jialin Yu ◽  
Shaoping Shi

Abstract Protein phosphorylation is a reversible and ubiquitous post-translational modification that primarily occurs at serine, threonine and tyrosine residues and regulates a variety of biological processes. In this paper, we first briefly summarize current progress in the computational prediction of eukaryotic protein phosphorylation sites, which has mainly focused on animals and plants, especially human, and to a lesser extent on fungi. Since the number of identified fungal phosphorylation sites has greatly increased across a wide variety of organisms and their roles in pathological physiology remain largely unknown, more attention has been paid to the identification of fungi-specific phosphorylation. Here, experimental fungal phosphorylation site data were collected, and most of the sites were classified into different types, encoded with various features and trained via a two-step feature optimization method. A novel method for prediction of species-specific fungal phosphorylation, PreSSFP, was developed, which can identify fungal phosphorylation in seven species for specific serine, threonine and tyrosine residues (http://computbiol.ncu.edu.cn/PreSSFP). We also critically evaluated the performance of PreSSFP and compared it with other existing tools. The satisfactory results showed that PreSSFP is a robust predictor. Feature analyses showed that there are some significant differences among the seven species. Species-specific prediction using the two-step feature optimization method to mine important features for training could considerably improve prediction performance. We anticipate that our study provides a new lead for future computational analysis of fungal phosphorylation.


2021 ◽  
Author(s):  
Niraj Thapa ◽  
Meenal Chaudhari ◽  
Anthony A. Iannetta ◽  
Clarence White ◽  
Kaushik Roy ◽  
...  

Abstract Protein phosphorylation is one of the most important post-translational modifications (PTMs) and is involved in myriad cellular processes. Although many non-organism-specific computational phosphorylation site prediction tools and a few tools for organism-specific phosphorylation site prediction exist, none are currently available for Chlamydomonas reinhardtii. Herein, we present a novel deep learning (DL) based approach for organism-specific protein phosphorylation site prediction in Chlamydomonas reinhardtii, a model algal phototroph. Our approach, called Chlamy-EnPhosSite (an ensemble combining convolutional neural networks (CNN) and long short-term memory (LSTM)), achieves an AUC and MCC of 0.90 and 0.64, respectively, on a combined dataset of serine (S) and threonine (T) sites in independent testing. When applied to the entire C. reinhardtii proteome (totaling 1,809,304 S and T sites), Chlamy-EnPhosSite yielded 499,411 phosphorylated sites with a cut-off value of 0.5 and 237,949 phosphorylated sites with a cut-off value of 0.7. These predictions were compared to an experimental dataset of phosphosites identified by liquid chromatography-tandem mass spectrometry (LC-MS/MS) in a blinded study: approximately 90% of 2,663 C. reinhardtii S and T phosphorylation sites were successfully predicted by Chlamy-EnPhosSite at a probability cut-off of 0.5, and 77% of sites were successfully identified at a more stringent 0.7 cut-off. Interestingly, Chlamy-EnPhosSite also successfully predicted experimentally confirmed phosphorylation sites in a protein sequence (e.g., RPS6 S245) that did not appear in the training dataset, highlighting prediction accuracy and the power of leveraging predictions to identify biologically relevant PTM sites. These results demonstrate that our method represents a robust and complementary technique for high-throughput phosphorylation site prediction in C. reinhardtii. It has the potential to serve as a useful tool for the community.
Chlamy-EnPhosSite will contribute to the understanding of how protein phosphorylation influences various biological processes in this important model microalga.


2008 ◽  
Vol 26 (12) ◽  
pp. 1339-1340 ◽  
Author(s):  
Bernd Bodenmiller ◽  
David Campbell ◽  
Bertran Gerrits ◽  
Henry Lam ◽  
Marko Jovanovic ◽  
...  

Author(s):  
Young IRIVBOJE ◽  
Adeboye FAFIOLU ◽  
Oluwabusayo IRIVBOJE ◽  
Christian IKEOBI

HSP90AA1, an isoform of HSP90, has been shown to play important roles in basic cellular events. It is activated in chicken in response to heat stress. This study aimed at a computational analysis of the biochemical and structural features of the HSP90AA1 gene, and of its evolutionary relationships, in Dominant Brown layers (DBL) and selected avian species using bioinformatics tools: ProtParam for physicochemical properties, ScanProsite for post-translational modification sites, NetPhos-3.1 for phosphorylation sites, the BDM-PUB program for ubiquitination sites, PDBsum for secondary structure and SWISS-MODEL for homology modelling. The findings revealed that intron 7 and exon 8 of HSP90AA1 protein in DBL had a molecular weight of 24681.19 Da and an instability index of 27.60, and contained N-myristoylation, protein kinase C phosphorylation and tyrosine kinase phosphorylation site-2 post-translational modification sites, 4 serine and 2 threonine phosphorylation sites, and 12 ubiquitination sites. The evolutionary relationship study found Japanese quail to be in a sister branch close to DBL and chicken. Motifs detected in the avian species showed the gene to be highly conserved. The secondary structure consisted of 16 helices, 3 sheets and 14 strands. The homology model had 87.25% sequence identity with human MC-HSP90-alpha. The study elucidates the components and characteristics of HSP90AA1 in DBL in response to heat stress.


2020 ◽  
Author(s):  
Yujia Xiang ◽  
Quan Zou ◽  
Lilin Zhao

Abstract In viruses, post-translational modifications (PTMs) are essential for the life cycle. Recognizing viral PTMs is very important for better understanding the mechanisms of viral infection and for finding potential drug targets. However, few studies have investigated the roles of viral PTMs in virus-human interactions using comprehensive viral PTM datasets. To fill this gap, we first developed a viral post-translational modification database (VPTMdb) for collecting systematic information on viral PTMs. VPTMdb contains 912 PTM sites, integrating 414 experimentally confirmed PTM sites in 98 proteins from 45 human viruses, manually extracted from 162 publications, with 498 PTMs extracted from UniProtKB/Swiss-Prot. Second, we investigated the viral PTM sequence motifs, the functions of target human proteins and the characteristics of PTM protein domains. The results showed that (i) viral PTMs share consensus motifs with human proteins in phosphorylation, SUMOylation and N-glycosylation; (ii) the functions of human proteins targeted by viral PTM proteins are related to protein targeting, translation and localization; and (iii) viral PTMs are more likely to be enriched in protein domains. The findings should make an important contribution to the field of virus-human interaction. Moreover, we created a novel sequence-based classifier named VPTMpre to help users predict viral protein phosphorylation sites. Finally, an online web server was implemented for users to download viral protein PTM data and predict phosphorylation sites of interest.
Author summary Post-translational modifications (PTMs) play an important role in the regulation of viral proteins; however, owing to the limitations of available data sets, there has been no detailed investigation of the characteristics of viral protein PTMs. In this manuscript, we collected experimentally verified viral protein post-translational modification sites and analysed viral PTM data from a bioinformatics perspective. In addition, we constructed a novel feature-based machine learning model for predicting phosphorylation sites. This is the first study to explore the roles of viral protein modification in virus infection using computational methods. The valuable viral protein PTM data resource will provide new insights into virus-host interaction.


Biomolecules ◽  
2019 ◽  
Vol 9 (2) ◽  
pp. 39 ◽  
Author(s):  
Zi-Shu Lu ◽  
Qian-Si Chen ◽  
Qing-Xia Zheng ◽  
Juan-Juan Shen ◽  
Zhao-Peng Luo ◽  
...  

Tobacco mosaic virus (TMV) is a common source of biological stress that significantly affects plant growth and development. It is also useful as a model in studies designed to clarify the mechanisms involved in plant viral disease. Plant responses to abiotic stress were recently reported to be regulated by complex mechanisms at the post-translational modification (PTM) level. Protein phosphorylation is one of the most widespread and major PTMs in organisms. Using immobilized metal ion affinity chromatography (IMAC) enrichment, high-pH C18 chromatography fractionation and high-accuracy mass spectrometry (MS), a set of proteins and phosphopeptides was identified in both TMV-infected and control tobacco. A total of 4905 proteins and 3998 phosphopeptides with 3063 phosphorylation sites were identified. These 3998 phosphopeptides were assigned to 1311 phosphoproteins, as some proteins carried multiple phosphorylation sites. Among them, 530 proteins and 337 phosphopeptides corresponding to 277 phosphoproteins differed between the two groups. There were 43 upregulated phosphoproteins, including phosphoglycerate kinase, pyruvate phosphate dikinase, protein phosphatase 2C and serine/threonine protein kinase. To the best of our knowledge, this is the first phosphoproteomic analysis of leaves from the tobacco cultivar K326. The results of this study advance our understanding of tobacco development and TMV action at the protein phosphorylation level.

