Computational prediction and analysis of species-specific fungi phosphorylation via feature optimization strategy

2018 ◽  
Vol 21 (2) ◽  
pp. 595-608 ◽  
Author(s):  
Man Cao ◽  
Guodong Chen ◽  
Jialin Yu ◽  
Shaoping Shi

Abstract Protein phosphorylation is a reversible and ubiquitous post-translational modification that occurs primarily at serine, threonine and tyrosine residues and regulates a variety of biological processes. In this paper, we first briefly summarize current progress in the computational prediction of eukaryotic protein phosphorylation sites, which has focused mainly on animals and plants, especially humans, and to a lesser extent on fungi. Because the number of identified fungal phosphorylation sites has greatly increased across a wide variety of organisms while their roles in pathological physiology remain largely unknown, more attention is being paid to the identification of fungi-specific phosphorylation. Here, experimental fungal phosphorylation site data were collected, and most of the sites were classified into different types, encoded with various features and trained via a two-step feature optimization method. A novel method for the prediction of species-specific fungi phosphorylation, PreSSFP, was developed; it can identify phosphorylation of serine, threonine and tyrosine residues in seven fungal species (http://computbiol.ncu.edu.cn/PreSSFP). We critically evaluated the performance of PreSSFP and compared it with other existing tools. The results showed that PreSSFP is a robust predictor. Feature analyses revealed significant differences among the seven species. Species-specific prediction, using the two-step feature optimization method to mine important features for training, could considerably improve prediction performance. We anticipate that our study provides a new lead for future computational analysis of fungi phosphorylation.

2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Qingyu Xiao ◽  
Benpeng Miao ◽  
Jie Bi ◽  
Zhen Wang ◽  
Yixue Li

Abstract Protein phosphorylation is an important type of post-translational modification that is involved in a variety of biological activities. Most phosphorylation events occur on serine, threonine and tyrosine residues in eukaryotes. In recent years, many phosphorylation sites have been identified as a result of advances in mass-spectrometric techniques. However, a large percentage of phosphorylation sites may be non-functional. Systematically prioritizing functional sites from a large number of phosphorylation sites will be increasingly important for the study of their biological roles. This study focused on exploring the intrinsic features of functional phosphorylation sites to predict whether a phosphosite is likely to be functional. We found significant differences in the distribution of evolutionary conservation, kinase association, disorder score, and secondary structure between known functional and background phosphorylation datasets. We built four different types of classifiers based on the most representative features and found that their performances were similar. We also prioritized 213,837 human phosphorylation sites from a variety of phosphorylation databases, which will be helpful for subsequent functional studies. All predicted results are available for query and download on our website (Predict Functional Phosphosites, PFP, http://pfp.biosino.org/pfp).


2019 ◽  
Vol 35 (16) ◽  
pp. 2766-2773 ◽  
Author(s):  
Fenglin Luo ◽  
Minghui Wang ◽  
Yu Liu ◽  
Xing-Ming Zhao ◽  
Ao Li

Abstract Motivation Phosphorylation is the most studied post-translational modification and is crucial for multiple biological processes. Recently, many efforts have been made to develop computational predictors of phosphorylation sites, but most of them are based on feature selection and discriminative classification. It is therefore useful to develop a novel, highly accurate predictor that can automatically uncover intricate patterns in protein phosphorylation sites. Results In this study we present DeepPhos, a novel deep learning architecture for the prediction of protein phosphorylation. Unlike multi-layer convolutional neural networks, DeepPhos consists of densely connected convolutional neural network blocks, which capture multiple representations of sequences and combine them through intra-block and inter-block concatenation layers to make the final phosphorylation prediction. DeepPhos can also be used for kinase-specific prediction at the group, family, subfamily and individual kinase levels. The experimental results demonstrated that DeepPhos outperforms competitive predictors in both general and kinase-specific phosphorylation site prediction. Availability and implementation The source code of DeepPhos is publicly deposited at https://github.com/USTCHIlab/DeepPhos. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Shaofeng Lin ◽  
Chenwei Wang ◽  
Jiaqi Zhou ◽  
Ying Shi ◽  
Chen Ruan ◽  
...  

Abstract As an important post-translational modification (PTM), protein phosphorylation is involved in the regulation of almost all biological processes in eukaryotes. Owing to rapid progress in mass spectrometry-based phosphoproteomics, a large number of phosphorylation sites (p-sites) have been characterized but remain to be curated. Here, we briefly summarize current progress in the development of data resources for the collection, curation, integration and annotation of p-sites in eukaryotic proteins. We also designed the eukaryotic phosphorylation site database (EPSD), which contains 1 616 804 experimentally identified p-sites in 209 326 phosphoproteins from 68 eukaryotic species. In EPSD, we not only collected 1 451 629 newly identified p-sites from high-throughput (HTP) phosphoproteomic studies, but also integrated known p-sites from 13 additional databases. Moreover, we carefully annotated the phosphoproteins and p-sites of eight model organisms by integrating the knowledge from 100 additional resources that cover 15 aspects, including phosphorylation regulator, genetic variation and mutation, functional annotation, structural annotation, physicochemical property, functional domain, disease-associated information, protein-protein interaction, drug-target relation, orthologous information, biological pathway, transcriptional regulator, mRNA expression, protein expression/proteomics and subcellular localization. We anticipate that EPSD can serve as a useful resource for further analysis of eukaryotic phosphorylation. With a data volume of 14.1 GB, EPSD is freely available to all users at http://epsd.biocuckoo.cn/.


Biomolecules ◽  
2019 ◽  
Vol 9 (2) ◽  
pp. 39 ◽  
Author(s):  
Zi-Shu Lu ◽  
Qian-Si Chen ◽  
Qing-Xia Zheng ◽  
Juan-Juan Shen ◽  
Zhao-Peng Luo ◽  
...  

Tobacco mosaic virus (TMV) is a common source of biological stress that significantly affects plant growth and development. It is also useful as a model in studies designed to clarify the mechanisms involved in plant viral disease. Plant responses to abiotic stress were recently reported to be regulated by complex mechanisms at the post-translational modification (PTM) level. Protein phosphorylation is one of the most widespread and important PTMs in organisms. Using immobilized metal ion affinity chromatography (IMAC) enrichment, high-pH C18 chromatographic fractionation and high-accuracy mass spectrometry (MS), proteins and phosphopeptides were identified in both TMV-infected and control tobacco. A total of 4905 proteins and 3998 phosphopeptides with 3063 phosphorylation sites were identified. These 3998 phosphopeptides were assigned to 1311 phosphoproteins, as some proteins carried multiple phosphorylation sites. Among them, 530 proteins and 337 phosphopeptides corresponding to 277 phosphoproteins differed between the two groups. There were 43 upregulated phosphoproteins, including phosphoglycerate kinase, pyruvate phosphate dikinase, protein phosphatase 2C and serine/threonine protein kinase. To the best of our knowledge, this is the first phosphoproteomic analysis of leaves from the tobacco cultivar K326. The results of this study advance our understanding of tobacco development and TMV action at the protein phosphorylation level.


2014 ◽  
Author(s):  
Qiuming Yao

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Protein posttranslational modification (PTM) occurs broadly during or after protein biosynthesis, to assist folding or to activate function during the protein's lifetime. Among all types of PTMs, protein phosphorylation is widely recognized as the most pervasive enzyme-catalyzed post-translational modification in eukaryotes. In particular, plants use this signaling mechanism on a larger scale than other eukaryotes, as reflected by the frequency of protein kinases within the genome. Phosphorylation site mapping using high-resolution mass spectrometry has grown exponentially; in Arabidopsis alone there are thousands of experimentally determined phosphorylation sites. Likewise, other types of post-translational modification data are rapidly increasing; the acetylation proteome is another large PTM data set. To provide easy access to these modification events in an intuitive format, we have developed P3DB, the Plant Protein Phosphorylation Database (p3db.org). This database is a repository for plant protein phosphorylation site data. These data can be queried for a protein of interest using an integrated BLAST function, which searches for similar sequences with known phosphorylation sites among the multiple plants currently investigated. This resource can thus help identify functionally conserved phosphorylation sites in plants using a multi-system approach. Built around these phosphorylation data, P3DB provides multiple related data sets and annotations, including protein-protein interactions (PPI), gene ontology, protein tertiary structures, orthologous sequences, kinase/phosphatase classification and Kinase Client Assay (KiC Assay) data. P3DB is thus not only a repository but also a context provider for studying phosphorylation events.
In addition, P3DB incorporates multiple network viewers for the above features, such as the PPI network, kinase-substrate network, phosphatase-substrate network and domain co-occurrence network, to help study phosphorylation from a systems point of view. Furthermore, P3DB has a community-based design through which users can share data sets and automate data deposition for publication purposes. Because P3DB is a comprehensive, systematic and interactive platform for phosphoproteomics research, many data analyses can be built on it; for example, disorder analysis and sequence conservation analysis can be performed on the P3DB data sets. Many researchers have downloaded P3DB data and carried out meaningful analyses on the P3DB infrastructure. Although high-resolution mass spectrometry can now reliably identify protein phosphorylation sites, the experimental approach is time-consuming and resource-dependent, and it is unlikely that an experimental approach could catalog an entire phosphoproteome. Computational prediction of phosphorylation sites provides an efficient and flexible way to reveal potential phosphorylation sites, facilitate experimental phosphorylation site identification and generate hypotheses for experimental design. Musite is a powerful tool that we developed to predict phosphorylation sites based solely on protein sequence. Musite integrates data preprocessing, feature extraction, machine-learning methods and prediction models into one comprehensive tool. Musite (http://musite.net) can be extended to any type of post-translational modification, as long as the data set contains sufficient modification sites. To further improve the performance of Musite, a generalized motif tree using fuzzy logic is introduced to complement the machine-learning-based prediction.
On the one hand, the tree-based approach and fuzzy variables make the final rules interpretable, helping biologists obtain significant patterns. On the other hand, the extracted rule sets generalize the motifs and reveal more information. The approach can be paired with traditional classification methods to provide better interpretation, pre-filtering and analytical power. Compared with traditional motif extraction, the fuzzy motif decision tree can borrow more information from the observations and may therefore extract more novel motifs or more comprehensive patterns. It can be applied to kinase-specific phosphorylated peptides to gain more insight into phosphorylation events. Together, a comprehensive database (P3DB), a well-developed prediction tool (Musite) and a generalized motif constructor (Fuzzy Motif Tree) enable researchers to investigate phosphorylation and other post-translational modification events more thoroughly, and thus to reveal more of their underlying biological significance.


2020 ◽  
Vol 17 (4) ◽  
pp. 645-649 ◽
Author(s):  
Doan Minh Thu ◽  
Nguyen Thi Minh Viet ◽  
Pham Thi Kim Lien

Protein phosphorylation plays an important role in many cellular signaling pathways related to many diseases. Therefore, a variety of biochemical techniques have been developed to study protein phosphorylation in cells. Protein phosphorylation has traditionally been detected by labeling proteins with radioactive phosphate from radioactive ATP. Phosphorylation site-specific antibodies are now available for analyzing phosphorylation status at target sites; however, these antibodies cannot be used to detect unidentified phosphorylation sites. Recently, Phos-tag technology has been developed to overcome the disadvantages and limitations of these methods. Phos-tag and its derivatives conjugated to biotin, acrylamide or agarose can capture phosphate monoester dianions bound to serine, threonine and tyrosine residues in an amino acid sequence-independent manner. Binding of the Phos-tag group alters the mobility of a protein on the gel depending on the number of phosphorylated serine, threonine or tyrosine residues. Here, we describe the use of phosphate-affinity Phos-tag SDS-PAGE to detect phosphorylation of the Pop2 protein, one of the exonucleases in the Ccr4-Not complex, which regulates the shortening of mRNA poly(A) tails. We observed four clear electrophoretic shift bands of Pop2-3XFlag under unstressed conditions. This is the first study to observe Pop2 phosphorylation under normal culture conditions. This study shows the convenience and advantages of Phos-tag SDS-PAGE for research on the molecular mechanisms regulating protein function.


2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Salma Jamal ◽  
Waseem Ali ◽  
Priya Nagpal ◽  
Abhinav Grover ◽  
Sonam Grover

Abstract Background Post-translational modification (PTM) is a biological process that alters proteins and is therefore involved in the regulation of various cellular activities and in pathogenesis. Protein phosphorylation is an essential process and one of the most-studied PTMs: it occurs when a phosphate group is added to a serine (Ser, S), threonine (Thr, T) or tyrosine (Tyr, Y) residue. Dysregulation of protein phosphorylation can lead to various diseases, most commonly neurological disorders, Alzheimer's disease and Parkinson's disease, which makes it necessary to predict which S/T/Y residues in an uncharacterized amino acid sequence can be phosphorylated. Despite a surplus of sequencing data, current experimental methods for identifying PTM sites are time-consuming, costly and error-prone, so a number of computational methods have been proposed to replace them. However, phosphorylation prediction remains limited owing to substrate specificity, performance and the diversity of its features. Methods In the present study we propose machine-learning-based predictors that use the physicochemical, sequence, structural and functional information of proteins to classify S/T/Y phosphorylation sites. Rigorous feature selection, using the minimum redundancy maximum relevance (mRMR) approach and the symmetrical uncertainty method, was employed to extract the most informative features for training the models. Results The random forest (RF) and support vector machine (SVM) models generated using diverse feature types were highly accurate, as evidenced by good values across different statistical measures. Moreover, independent test sets and benchmark validations indicated that the proposed method clearly outperformed existing methods, demonstrating its ability to accurately predict protein phosphorylation.
Conclusions The results obtained in the present work indicate that the proposed computational methodology can be used effectively to predict putative phosphorylation sites, thereby facilitating the discovery of the mechanisms underlying various biological processes.
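The minimum redundancy maximum relevance (mRMR) criterion named in this abstract can be illustrated with a small greedy selector. This is a hypothetical sketch, not the authors' pipeline: it assumes features are already discretized, estimates mutual information from plug-in frequency counts, and uses the "difference" form of mRMR (relevance to the label minus mean mutual information with the features already chosen). The names `entropy`, `mutual_information` and `mrmr` are illustrative.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Plug-in Shannon entropy (bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from co-occurrence counts."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mrmr(features, labels, k):
    """Greedy mRMR (difference form).

    features: dict mapping a feature name to its list of discrete values.
    At each step, pick the feature with the largest relevance I(f; y)
    minus its mean redundancy I(f; s) over the already selected features s.
    """
    remaining = list(features)
    selected = []
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)  # ties go to the earliest feature
        selected.append(best)
        remaining.remove(best)
    return selected
```

For example, with a four-class label and three binary features where `b` duplicates `a` and `c` carries independent information, `mrmr` picks `a` first and then `c` rather than the redundant `b`. Real pipelines typically work on continuous physicochemical descriptors, so a discretization or a continuous MI estimator would be needed first.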


2021 ◽  
Vol 18 ◽  
Author(s):  
Min Liu ◽  
Lu Zhang ◽  
Xinyi Qin ◽  
Tao Huang ◽  
Ziwei Xu ◽  
...  

Background: Nitration is an important post-translational modification (PTM) occurring on the tyrosine residues of proteins. The occurrence of protein tyrosine nitration under disease conditions is inevitable and represents a shift from the physiological signal-transducing actions of •NO to oxidative and potentially pathogenic pathways. Abnormal protein nitration can lead to serious human diseases, including neurodegenerative diseases, acute respiratory distress, organ transplant rejection and lung cancer. Objective: It is necessary and important to identify nitration sites in protein sequences. Predicting which tyrosine residues in a protein sequence are nitrated and which are not is of great significance for the study of the nitration mechanism and related diseases. Methods: In this study, a prediction model for nitration sites was proposed that combines stacking ensemble learning over multiple fused features with an over- and under-sampling strategy and the fast correlation-based filter (FCBF) method. First, each protein sequence sample was encoded by 2701-dimensional fused features (PseAAC, PSSM, AAIndex, CKSAAP, Disorder). Second, a ranked feature set was generated by the FCBF method according to the symmetric uncertainty metric. Third, during model training, over- and under-sampling was used to tackle the imbalanced data set. Finally, the Incremental Feature Selection (IFS) method was adopted to obtain an optimal classifier based on 10-fold cross-validation. Results and Conclusion: The results show that the model has significant performance advantages in indicators such as MCC, Recall and F1-score, whether compared with other classifiers on the independent test set or evaluated by cross-validation with single-type or fused features on the training set.
By integrating FCBF feature ranking, over- and under-sampling and a stacking model composed of multiple base classifiers, an effective prediction model for nitration PTM sites was built, which achieves a better recall rate when the ratio of positive to negative samples is highly imbalanced.
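The ranking and IFS steps described in this abstract can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' code. It assumes discretized features and plug-in entropy estimates, and it uses the standard definition of symmetric uncertainty, SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)). Only the relevance-ranking part of FCBF is shown; the full filter's redundancy-elimination stage is omitted. The `evaluate` callback is a placeholder for whatever score the stacking model would report, such as mean MCC over 10-fold cross-validation.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Plug-in Shannon entropy (bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:  # both variables constant: no information to share
        return 0.0
    mi = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * mi / (hx + hy)


def rank_by_su(features, labels):
    """Order feature names by descending SU with the class label."""
    return sorted(
        features,
        key=lambda name: symmetric_uncertainty(features[name], labels),
        reverse=True,
    )


def incremental_feature_selection(ranked, evaluate):
    """IFS: score the top-1, top-2, ... prefixes of the ranking, keep the best.

    evaluate: hypothetical callback taking a list of feature names and
    returning a score to maximize (e.g. cross-validated MCC).
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranked[:best_k], best_score
```

Because IFS re-trains and re-scores the model for every prefix of the ranking, its cost grows linearly with the number of ranked features. With 2701 candidate dimensions, a cheap filter ranking such as SU is therefore a practical first step before the expensive wrapper search.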


Viruses ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1393 ◽
Author(s):  
Thanyaporn Dechtawewat ◽  
Sittiruk Roytrakul ◽  
Yodying Yingchutrakul ◽  
Sawanya Charoenlappanit ◽  
Bunpote Siridechadilok ◽  
...  

Dengue virus (DENV) infection causes a spectrum of dengue diseases whose underlying mechanisms remain unclear. Nonstructural protein 1 (NS1) is a multifunctional protein of DENV that is involved in DENV infection and dengue pathogenesis. This study investigated the potential post-translational modification of DENV NS1 by phosphorylation following DENV infection. Using liquid chromatography-tandem mass spectrometry (LC-MS/MS), 24 potential phosphorylation sites were identified in both cell-associated and extracellular NS1 proteins from three different cell lines infected with DENV. Cell-free kinase assays also demonstrated kinase activity in purified preparations of DENV NS1 proteins. Further studies were conducted to determine the roles of specific phosphorylation sites on NS1 proteins by site-directed mutagenesis with alanine substitution. The T27A and Y32A mutations had a deleterious effect on DENV infectivity. The T29A, T230A and S233A mutations significantly decreased the production of infectious DENV but did not affect relative levels of intracellular DENV NS1 expression or NS1 secretion. Only the T230A mutation led to a significant reduction of detectable DENV NS1 dimers in virus-infected cells; however, none of the mutations interfered with DENV NS1 oligomer formation. These findings highlight the importance of DENV NS1 phosphorylation, which may pave the way for future target-specific antiviral drug design.

