DeepNGlyPred: A Deep Neural Network-Based Approach for Human N-Linked Glycosylation Site Prediction

Molecules ◽  
2021 ◽  
Vol 26 (23) ◽  
pp. 7314
Author(s):  
Subash C. Pakhrin ◽  
Kiyoko F. Aoki-Kinoshita ◽  
Doina Caragea ◽  
Dukka B. KC

Protein N-linked glycosylation is a post-translational modification that plays an important role in a myriad of biological processes. Computational prediction approaches serve as complementary methods for the characterization of glycosylation sites. Most existing predictors for N-linked glycosylation rely on the observation that the glycosylation site occurs at the N-X-[S/T] sequon, where X is any amino acid except proline. Not all N-X-[S/T] sequons are glycosylated, so the N-X-[S/T] sequon is a necessary but not sufficient determinant for protein glycosylation. In that regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem. Here, we report DeepNGlyPred, a deep learning-based approach that encodes the positive and negative sequences in the human proteome dataset (extracted from N-GlycositeAtlas) using sequence-based features (gapped-dipeptide), predicted structural features, and evolutionary information. DeepNGlyPred produces SN, SP, MCC, and ACC of 88.62%, 73.92%, 0.60, and 79.41%, respectively, on the N-GlyDE independent test set, which is better than the compared approaches. These results demonstrate that DeepNGlyPred is a robust computational technique for predicting N-linked glycosylation sites confined to N-X-[S/T] sequons. DeepNGlyPred will be a useful resource for the glycobiology community.
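The sequon rule described in this abstract (N, then any residue except proline, then S or T) can be illustrated with a short scan over a protein sequence. This is only a minimal sketch of candidate-site enumeration, not the DeepNGlyPred method itself; the function name `find_sequons` and the example sequence are hypothetical. As the abstract stresses, a match marks a candidate site only, since not every sequon is glycosylated.

```python
import re
from typing import List

# N followed by any residue except proline (P), then serine (S) or threonine (T).
# The lookahead keeps matches zero-width after N, so overlapping sequons are found.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence: str) -> List[int]:
    """Return 1-based positions of Asn residues that start an N-X-[S/T] sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, chosen only to exercise the rule.
    print(find_sequons("MNGSANPTNNTS"))  # [2, 9, 10]
```

In the toy sequence, the asparagine at position 6 is skipped because it is followed by proline, while positions 9 and 10 show that overlapping sequons are both reported.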

1993 ◽  
Vol 289 (3) ◽  
pp. 681-686 ◽  
Author(s):  
M M P Hermans ◽  
H A Wisselaar ◽  
M A Kroos ◽  
B A Oostra ◽  
A J J Reuser

N-linked glycosylation is one of the important events in the post-translational modification of human lysosomal alpha-glucosidase. Phosphorylation of mannose residues ensures efficient transport of the enzyme to the lysosomes via the mannose 6-phosphate receptor. The primary structure of lysosomal alpha-glucosidase, as deduced from the cDNA sequence, indicates that there are seven potential glycosylation sites. We have eliminated these sites individually by site-directed mutagenesis and thereby demonstrated that all seven sites are glycosylated. The sites at Asn-882 and Asn-925 were found to be located in a C-terminal propeptide which is cleaved off during maturation. Evidence is presented that at least two of the oligosaccharide side chains of human lysosomal alpha-glucosidase are phosphorylated. Elimination of six of the seven sites does not disturb enzyme synthesis or function. However, removal of the second glycosylation site at Asn-233 interferes dramatically with the formation of mature enzyme. The mutant precursor is synthesized normally and assembles in the endoplasmic reticulum, but immunoelectron microscopy reveals a deficiency of alpha-glucosidase in the Golgi complex and in the more distal compartments of the lysosomal transport pathway.


2020 ◽  
Vol 27 (3) ◽  
pp. 178-186 ◽  
Author(s):  
Ganesan Pugalenthi ◽  
Varadharaju Nithya ◽  
Kuo-Chen Chou ◽  
Govindaraju Archunan

Background: N-glycosylation is one of the most important post-translational mechanisms in eukaryotes. N-glycosylation predominantly occurs in the N-X-[S/T] sequon, where X is any amino acid other than proline. However, not all N-X-[S/T] sequons in proteins are glycosylated. Therefore, accurate prediction of N-glycosylation sites is essential to understanding the N-glycosylation mechanism. Objective: In this article, our motivation is to develop a computational method to predict N-glycosylation sites in eukaryotic protein sequences. Methods: In this article, we report a random forest method, Nglyc, to predict N-glycosylation sites from protein sequence, using 315 sequence features. The method was trained using a dataset of 600 N-glycosylation sites and 600 non-glycosylation sites and tested on a dataset containing 295 N-glycosylation sites and 253 non-glycosylation sites. Nglyc prediction was compared with the NetNGlyc, EnsembleGly and GPP methods. Further, the performance of Nglyc was evaluated using human and mouse N-glycosylation sites. Results: The Nglyc method achieved an overall training accuracy of 0.8033 with all 315 features. Performance comparison with the NetNGlyc, EnsembleGly and GPP methods shows that Nglyc performs better than the other methods, with high sensitivity and specificity. Conclusion: Our method achieved an overall accuracy of 0.8248 with 0.8305 sensitivity and 0.8182 specificity. The comparison study shows that our method performs better than the other methods. The applicability and success of our method were further evaluated using human and mouse N-glycosylation sites. The Nglyc method is freely available at https://github.com/bioinformaticsML/Ngly.


1997 ◽  
Vol 326 (1) ◽  
pp. 243-247 ◽  
Author(s):  
Gilles MILLAT ◽  
Roseline FROISSART ◽  
Irène MAIRE ◽  
Dominique BOZON

Iduronate sulphatase (IDS) is responsible for mucopolysaccharidosis type II, a rare recessive X-linked lysosomal storage disease. The aim of this work was to evaluate the functional importance of each N-glycosylation site, and of the cysteine-84 residue. IDS mutant cDNAs, lacking one of the eight potential N-glycosylation sites, were expressed in COS cells. Although each of the potential sites was used, none of the eight glycosylation sites appeared to be essential for lysosomal targeting. Another important sulphatase co- or post-translational modification for generating catalytic activity involves the conversion of a cysteine residue surrounded by a conserved sequence C-X-P-S-R into a 2-amino-3-oxopropionic acid residue [Schmidt, Selmer, Ingendoh and von Figura (1995) Cell 82, 271–278]. This conserved cysteine, located at amino acid position 84 in IDS, was replaced either by an alanine (C84A) or by a threonine (C84T) using site-directed mutagenesis. C84A and C84T mutant cDNAs were expressed either in COS cells or in human lymphoblastoid cells deleted for the IDS gene. C84A had a drastic effect both for IDS processing and for catalytic activity. The C84T mutation produced a small amount of mature forms but also abolished enzyme activity, confirming that the cysteine residue at position 84 is required for IDS activity.


2020 ◽  
Author(s):  
Stefan Schulze ◽  
Anne Oltmanns ◽  
Christian Fufezan ◽  
Julia Krägenbring ◽  
Michael Mormann ◽  
...  

Motivation: Protein glycosylation is a complex post-translational modification with crucial cellular functions in all domains of life. Currently, large-scale glycoproteomics approaches rely on glycan database dependent algorithms and are thus unsuitable for discovery-driven analyses of glycoproteomes. Results: Therefore, we devised SugarPy, a glycan database independent Python module, and validated it on the glycoproteome of human breast milk. We further demonstrated its applicability by analyzing glycoproteomes with uncommon glycans stemming from the green alga Chlamydomonas reinhardtii and the archaeon Haloferax volcanii. SugarPy also facilitated the novel characterization of glycoproteins from the red alga Cyanidioschyzon merolae. Availability: The source code is freely available on GitHub (https://github.com/SugarPy/SugarPy), and its implementation in Python ensures support for all operating systems. Supplementary information: Supplementary data are available online.


Viruses ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1281
Author(s):  
Wei Liu ◽  
Junhua Li ◽  
Hongli Du ◽  
Zhihua Ou

Human papillomavirus type 16 (HPV16) is the most prevalent HPV type causing cervical cancers. Herein, using 1597 full genomes, we systematically investigated the mutation profiles, surface protein glycosylation sites and the codon usage bias (CUB) of HPV16 from different lineages and sublineages. Multiple lineage- or sublineage-conserved mutation sites were identified. Glycosylation analysis showed that, relative to lineage A, HPV16 lineage D contained the highest number of differing glycosylation sites in both the L1 and L2 capsid proteins, which might contribute to the antigenic distance between the two lineages. CUB analysis showed that the HPV16 open reading frames (ORFs) preferred codons ending with A/T. The CUB of HPV16 ORFs was mainly affected by natural selection except for E1, E5 and L2. HPV16 only shared some of the preferred codons with humans, which might help reduce competition for translational resources. These findings increase our understanding of the heterogeneity between HPV16 lineages and sublineages, and the adaptation mechanism of HPV in human cells. In summary, this study might facilitate HPV classification and improve vaccine development and application.


2020 ◽  
Vol 19 (3) ◽  
pp. 529-539 ◽  
Author(s):  
Freja Scheys ◽  
Els J. M. Van Damme ◽  
Jarne Pauwels ◽  
An Staes ◽  
Kris Gevaert ◽  
...  

Glycosylation is a common modification of proteins and critical for a wide range of biological processes. Differences in protein glycosylation between sexes have already been observed in humans, nematodes and trematodes, and have recently also been reported in the rice pest insect Nilaparvata lugens. Although protein N-glycosylation in insects is nowadays of high interest because of its potential for exploitation in pest control strategies, the functionality of differential N-glycosylation between sexes is yet unknown. In this study, therefore, the occurrence and role of sex-related protein N-glycosylation in insects were examined. A comprehensive investigation of the N-glycosylation sites from the adult stages of N. lugens was conducted, allowing a qualitative and quantitative comparison between sexes at the glycopeptide level. N-glycopeptide enrichment via lectin capturing using the high mannose/paucimannose-binding lectin Concanavalin A, or the Rhizoctonia solani agglutinin which interacts with complex N-glycans, resulted in the identification of over 1300 N-glycosylation sites derived from over 600 glycoproteins. Comparison of these N-glycopeptides revealed striking differences in protein N-glycosylation between sexes. Male- and female-specific N-glycosylation sites were identified, and some of these sex-specific N-glycosylation sites were shown to be derived from proteins with a putative role in insect reproduction. In addition, differential glycan composition between males and females was observed for proteins shared across sexes. Both lectin blotting experiments as well as transcript expression analyses with complete insects and insect tissues confirmed the observed differences in N-glycosylation of proteins between sexes. In conclusion, this study provides further evidence for protein N-glycosylation to be sex-related in insects. Furthermore, original data on N-glycosylation sites of N. lugens adults are presented, providing novel insights into planthopper's biology and information for future biological pest control strategies.


2015 ◽  
Vol 9 ◽  
pp. BBI.S26864 ◽  
Author(s):  
Hebatallah Hassan ◽  
Amr Badr ◽  
M. B. Abdelhalim

O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs at specific serine (S) or threonine (T) sites. Several O-glycosylation site predictors have been developed. However, a need for even better prediction tools remains. One challenge in training the classifiers is that the available datasets are highly imbalanced, which makes the classification accuracy for the minority class unsatisfactory. In our previous work, we proposed a new classification approach based on particle swarm optimization (PSO) and random forest (RF); this approach addresses the imbalanced dataset problem. The PSO parameter settings used in the training process affect the classification accuracy. Thus, in this paper, we perform parameter optimization for the PSO algorithm, based on a genetic algorithm, in order to increase the classification accuracy. Our proposed genetic algorithm-based approach has shown better performance than existing predictors in terms of area under the receiver operating characteristic curve. In addition, we implemented a glycosylation predictor tool based on that approach, and we demonstrated that this tool could successfully identify candidate glycosylation sites in a case-study protein.


1995 ◽  
Vol 311 (3) ◽  
pp. 959-967 ◽  
Author(s):  
C Kronman ◽  
B Velan ◽  
D Marcus ◽  
A Ordentlich ◽  
S Reuveny ◽  
...  

The possible role of post-translational modifications such as subunit oligomerization, protein glycosylation and oligosaccharide processing on the circulatory life-time of proteins was studied using recombinant human acetylcholinesterase (rHuAChE). Different preparations of rHuAChE containing various amounts of tetramers, dimers and monomers are cleared at similar rates from the circulation, suggesting that oligomerization does not play an important role in determining the rate of clearance. An engineered rHuAChE mutant containing only one N-glycosylation site was cleared from the circulation more rapidly than the wild-type triglycosylated enzyme. On the other hand, hyperglycosylated mutants containing either four or five occupied N-glycosylation sites, analogous to those present on the slowly cleared fetal bovine serum acetylcholinesterase (FBS-AChE), were also cleared more rapidly from the bloodstream than the wild-type species. Furthermore, the two different tetraglycosylated mutants were cleared at different rates while the pentaglycosylated mutant exhibited the most rapid clearance profile. These results imply that though the number of N-glycosylation sites plays a role in the circulatory life-time of the enzyme, the number of N-glycan units in itself does not determine the rate of clearance. When saturating amounts of asialofetuin were administered together with rHuAChE, the circulatory half-life of the enzyme was dramatically increased (from 80 min to 19 h) and was found to be similar to that displayed by plasma-derived cholinesterases while desialylation of these enzymes caused a sharp decrease in the circulatory half-life to approximately 3-5 min. Determination of the average number of sialic acid residues per enzyme subunit of the five different N-glycosylation species generated revealed that the rate of clearance is not a function of the absolute number of appended sialic acid moieties but rather of the number of unoccupied sialic acid attachment sites per enzyme molecule. Specifically, we demonstrate an inverse-linear relationship between the number of vacant sialic acid attachment sites and the values of the enzyme residence time within the bloodstream.


2020 ◽  
pp. mcp.R120.002093 ◽  
Author(s):  
Tomislav Caval ◽  
Albert J. R. Heck ◽  
Karli R. Reiding

Mass spectrometry-based glycoproteomics has gone through some incredible developments over the last few years. Technological advances in glycopeptide enrichment, fragmentation methods, and data analysis workflows have enabled the transition of glycoproteomics from a niche application, mainly focused on the characterization of isolated glycoproteins, to a mature technology capable of profiling thousands of intact glycopeptides at once. In addition to numerous biological discoveries catalyzed by the technology, we are also observing an increase in studies focusing on global protein glycosylation and the relationship between multiple glycosylation sites on the same protein. It has become apparent that just describing protein glycosylation in terms of micro- and macro-heterogeneity, respectively the variation and occupancy of glycans at a given site, is not sufficient to describe the observed interactions between sites. In this perspective we propose a new term, meta-heterogeneity, to describe a higher level of glycan regulation: the variation in glycosylation across multiple sites of a given protein. We provide literature examples of extensive meta-heterogeneity on relevant proteins such as antibodies, erythropoietin, myeloperoxidase and a number of serum and plasma proteins. Furthermore, we postulate on the possible biological reasons and causes behind the intriguing meta-heterogeneity observed in glycoproteins.


Author(s):  
Stefan Schulze ◽  
Anne Oltmanns ◽  
Christian Fufezan ◽  
Julia Krägenbring ◽  
Michael Mormann ◽  
...  

Motivation: Protein glycosylation is a complex post-translational modification with crucial cellular functions in all domains of life. Currently, large-scale glycoproteomics approaches rely on glycan database dependent algorithms and are thus unsuitable for discovery-driven analyses of glycoproteomes. Results: Therefore, we devised SugarPy, a glycan database independent Python module, and validated it on the glycoproteome of human breast milk. We further demonstrated its applicability by analyzing glycoproteomes with uncommon glycans stemming from the green alga Chlamydomonas reinhardtii and the archaeon Haloferax volcanii. SugarPy also facilitated the novel characterization of glycoproteins from the red alga Cyanidioschyzon merolae. Availability and implementation: The source code is freely available on GitHub (https://github.com/SugarPy/SugarPy), and its implementation in Python ensures support for all operating systems. Supplementary information: Supplementary data are available at Bioinformatics online.

