Nglyc: A Random Forest Method for Prediction of N-Glycosylation Sites in Eukaryotic Protein Sequence

2020 ◽  
Vol 27 (3) ◽  
pp. 178-186 ◽  
Author(s):  
Ganesan Pugalenthi ◽  
Varadharaju Nithya ◽  
Kuo-Chen Chou ◽  
Govindaraju Archunan

Background: N-glycosylation is one of the most important post-translational mechanisms in eukaryotes. It occurs predominantly at the N-X-[S/T] sequon, where X is any amino acid other than proline. However, not all N-X-[S/T] sequons in proteins are glycosylated. Accurate prediction of N-glycosylation sites is therefore essential for understanding the N-glycosylation mechanism. Objective: Our motivation is to develop a computational method to predict N-glycosylation sites in eukaryotic protein sequences. Methods: We report a random forest method, Nglyc, that predicts N-glycosylation sites from protein sequence using 315 sequence features. The method was trained on a dataset of 600 N-glycosylation sites and 600 non-glycosylation sites and tested on a dataset containing 295 N-glycosylation sites and 253 non-glycosylation sites. Nglyc predictions were compared with those of the NetNGlyc, EnsembleGly and GPP methods. Further, the performance of Nglyc was evaluated using human and mouse N-glycosylation sites. Results: The Nglyc method achieved an overall training accuracy of 0.8033 with all 315 features. Comparison with the NetNGlyc, EnsembleGly and GPP methods shows that Nglyc performs better than the other methods, with high sensitivity and specificity. Conclusion: Our method achieved an overall accuracy of 0.8248 with 0.8305 sensitivity and 0.8182 specificity. The comparison study shows that our method performs better than the other methods. The applicability and success of our method were further evaluated using human and mouse N-glycosylation sites. The Nglyc method is freely available at https://github.com/bioinformaticsML/ Ngly.
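The sequon rule stated in the abstract (N-X-[S/T], with X any residue except proline) can be illustrated with a short scan for candidate sites. This is a minimal sketch, not the Nglyc implementation: the function name `find_sequons` and the toy sequence are hypothetical, and the scan only lists candidates, since the abstract notes that not every sequon is actually glycosylated.

```python
import re

# N-X-[S/T] sequon with X != proline. The zero-width lookahead lets
# overlapping sequons (e.g. "NNTS") be reported separately.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of the Asn that starts each N-X-[S/T] sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence, not taken from the paper.
toy = "MKNGSAANPTLLNNTSQ"
print(find_sequons(toy))  # [3, 13, 14]; "NPT" at position 8 is skipped
```

A classifier such as a random forest would then score each candidate position as glycosylated or not; that step is outside this sketch.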

2012 ◽  
Vol 19 (1) ◽  
pp. 50-56 ◽  
Author(s):  
Ganesan Pugalenthi ◽  
Krishna Kumar Kandaswamy ◽  
Kuo-Chen Chou ◽  
Saravanan Vivekanandan ◽  
Prasanna Kolatkar

Molecules ◽  
2021 ◽  
Vol 26 (23) ◽  
pp. 7314
Author(s):  
Subash C. Pakhrin ◽  
Kiyoko F. Aoki-Kinoshita ◽  
Doina Caragea ◽  
Dukka B. KC

Protein N-linked glycosylation is a post-translational modification that plays an important role in a myriad of biological processes. Computational prediction approaches serve as complementary methods for the characterization of glycosylation sites. Most existing predictors for N-linked glycosylation use the information that the glycosylation site occurs at the N-X-[S/T] sequon, where X is any amino acid except proline. Not all N-X-[S/T] sequons are glycosylated, so the sequon is a necessary but not sufficient determinant of protein glycosylation. In that regard, computational prediction of N-linked glycosylation sites confined to N-X-[S/T] sequons is an important problem. Here, we report DeepNGlyPred, a deep learning-based approach that encodes the positive and negative sequences in the human proteome dataset (extracted from N-GlycositeAtlas) using sequence-based features (gapped dipeptide), predicted structural features, and evolutionary information. DeepNGlyPred produces SN, SP, MCC, and ACC of 88.62%, 73.92%, 0.60, and 79.41%, respectively, on the N-GlyDE independent test set, which is better than the compared approaches. These results demonstrate that DeepNGlyPred is a robust computational technique for predicting N-linked glycosylation sites confined to the N-X-[S/T] sequon. DeepNGlyPred will be a useful resource for the glycobiology community.
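The four figures quoted above (SN, SP, MCC, ACC) are standard functions of a binary confusion matrix. The sketch below uses the usual textbook definitions; the function name and the counts in the example are hypothetical and are not the DeepNGlyPred test-set counts, which the abstract does not give.

```python
import math


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Compute sensitivity, specificity, accuracy and MCC from confusion counts."""
    sn = tp / (tp + fn)                      # sensitivity (recall on positives)
    sp = tn / (tn + fp)                      # specificity (recall on negatives)
    acc = (tp + tn) / (tp + fn + tn + fp)    # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is taken as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}


# Hypothetical counts chosen only to illustrate the formulas.
metrics = classification_metrics(tp=80, fn=20, tn=70, fp=30)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC is reported alongside accuracy because it uses all four cells of the confusion matrix and so stays informative when the positive and negative classes differ in size.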


2020 ◽  
Vol 10 (5) ◽  
pp. 6306-6316

Protein fold prediction is a milestone step towards predicting protein tertiary structure from protein sequence. It is one of the most researched topics in computational biology and has applications in structural biology and medicine. Extracting sensitive features is a key step in protein fold prediction. Here, the actionable features are extracted from keywords in the sequence header and from secondary structure representations of the protein sequence. Keywords holding species information are used as features after verification against the UniRef100 dataset using the TaxId. Prominent patterns are identified experimentally based on the nature of the protein structural class and protein fold. Global and native features are extracted to capture the nature of these patterns. Keyword-based features are found to correlate positively with protein folds. Keywords indicating species are important for observing functional differences, which helps guide the prediction process. The SCOPe 2.07 and EDD datasets are used: EDD is a benchmark dataset, and SCOPe 2.07 is the latest and largest dataset holding ASTRAL protein sequences. A random forest model is trained on the SCOPe 2.07 training set using a 93-dimensional feature vector. Accuracy on the SCOPe 2.07 test set is better than 95%. Accuracy on the benchmark EDD dataset is better than 93%, which is the best reported to our knowledge.


2015 ◽  
Vol 9 ◽  
pp. BBI.S26864 ◽  
Author(s):  
Hebatallah Hassan ◽  
Amr Badr ◽  
M. B. Abdelhalim

O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs at particular serine (S) or threonine (T) sites. Several O-glycosylation site predictors have been developed, but better prediction tools are still needed. One challenge in training the classifiers is that the available datasets are highly imbalanced, which makes the classification accuracy for the minority class unsatisfactory. In our previous work, we proposed a classification approach based on particle swarm optimization (PSO) and random forest (RF) that addresses the imbalanced dataset problem. The PSO parameter settings used during training affect the classification accuracy. Thus, in this paper, we optimize the PSO parameters with a genetic algorithm in order to increase the classification accuracy. Our proposed genetic algorithm-based approach outperforms existing predictors in terms of area under the receiver operating characteristic curve. In addition, we implemented a glycosylation predictor tool based on this approach and demonstrated that it could successfully identify candidate glycosylation sites in a case study protein.
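The abstract evaluates predictors by area under the ROC curve rather than plain accuracy, which suits imbalanced data. The sketch below computes AUC with the rank-based (Mann-Whitney) formulation: the fraction of positive/negative pairs in which the positive example receives the higher score. The function name, labels and scores are hypothetical toy values, not data or code from the paper.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy set: 2 positives, 8 negatives.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1, 0.1, 0.05, 0.3, 0.2]

# A classifier that always predicts "negative" is right for every negative
# example, so its accuracy equals the share of negatives in the data.
majority_accuracy = labels.count(0) / len(labels)
print(f"majority-class accuracy: {majority_accuracy:.2f}")
print(f"AUC of the scores:       {roc_auc(labels, scores):.4f}")
```

On this toy set, predicting "negative" everywhere still yields a high accuracy while finding no positive site, which is why a ranking measure such as AUC is the more informative comparison.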


2020 ◽  
Vol 16 ◽  
pp. 117693432093449
Author(s):  
Xin-Ke Zhan ◽  
Zhu-Hong You ◽  
Li-Ping Li ◽  
Yang Li ◽  
Zheng Wang ◽  
...  

Protein-protein interactions (PPIs) play a crucial role in the life cycles of living cells, so it is important to understand their underlying mechanisms. Although many high-throughput technologies have generated large amounts of PPI data in different organisms, the experiments for detecting PPIs are still costly and time-consuming. Novel computational methods for predicting PPIs are therefore urgently needed and are drawing more and more attention. In this study, we propose a computational method for predicting PPIs based on texture features of the protein sequence. Specifically, the Gabor feature is used to extract texture features and protein evolutionary information from the Position-Specific Scoring Matrix, which is generated by the Position-Specific Iterated Basic Local Alignment Search Tool. Random forest–based classifiers are then used to infer the protein interactions. When applied to PPI data sets of yeast, human, and Helicobacter pylori, the method obtained good results, with average accuracies of 92.10%, 97.03%, and 86.45%, respectively. To better evaluate the proposed method, we compared the Gabor feature with the Discrete Cosine Transform and Local Phase Quantization. Our results show that the proposed method is both feasible and stable and that the Gabor feature descriptor is reliable in extracting protein sequence information. Furthermore, additional experiments were conducted to predict PPIs on data sets from four other species. The promising results indicate that our proposed method is both powerful and robust.


Author(s):  
Erick Renata ◽  
Mewati Ayub

Abstract — Peer-to-peer lending (P2PL) is a form of financial technology (fintech) that is growing very fast in society. However, P2PL projects carry many risks. The risk of a P2PL project can be analyzed using classification. A loan falls into one of two conditions: a good loan or a bad loan. This study uses two methods to analyze a P2PL dataset: Random Forest and Logistic Regression. The data are taken from the P2PL loan dataset provided by Data World, which contains 887,379 entries with 74 features. The result of the experiments is a model that can be used to classify a P2PL loan as good or bad. Keywords — Fintech; Logistic Regression; Peer to peer lending; Random forest


2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Apilak Worachartcheewan ◽  
Watshara Shoombuatong ◽  
Phannee Pidetcha ◽  
Wuttichai Nopnithipat ◽  
Virapong Prachayasittikul ◽  
...  

Aims. This study proposes a computational method for determining the prevalence of metabolic syndrome (MS) and predicting its occurrence using the National Cholesterol Education Program Adult Treatment Panel III (NCEP ATP III) criteria. The Random Forest (RF) method is also applied to identify significant health parameters. Materials and Methods. We used data from 5,646 adults aged 18–78 years residing in Bangkok who had received an annual health check-up in 2008. MS was identified using the NCEP ATP III criteria. The RF method was applied to predict the occurrence of MS and to identify important health parameters surrounding this disorder. Results. The overall prevalence of MS was 23.70% (34.32% for males and 17.74% for females). RF accuracy for predicting MS in an adult Thai population was 98.11%. Further, based on RF, triglyceride levels were the most important health parameter associated with MS. Conclusion. RF was shown to predict MS in an adult Thai population with an accuracy >98%, and triglyceride levels were identified as the most informative variable associated with MS. Therefore, using RF to predict MS may be beneficial in identifying MS status for preventing the development of diabetes mellitus and cardiovascular diseases.


2000 ◽  
Vol 348 (3) ◽  
pp. 507-515 ◽  
Author(s):  
Ralph MELCHER ◽  
Alexandra HILLEBRAND ◽  
Ute BAHR ◽  
Bernd SCHRÖDER ◽  
Michael KARAS ◽  
...  

We have studied the elongation of oligosaccharides containing N-acetyl-lactosamine repeats using glycosylated human lysozyme mutants as a model. We reported previously that a combination of glycosylation sites at the 49th (site IV) and 68th (site II) amino acid residues of the protein particularly stimulates the synthesis of N-acetyl-lactosamine repeats [Melcher, Grosch, Grosse and Hasilik (1998) Glycoconjugate J. 15, 987-993]. In the present study we show that it is the carbohydrate attached to site IV that is selectively affected. It contains more N-acetyl-lactosamine repeats when site II is glycosylated in the same molecule. As a corollary of the glycosylation at site II, the synthesis of a third antenna at site IV is increased. The triantennary oligosaccharides at site IV contain more N-acetyl-lactosamine repeats than the biantennary ones. Thus placing a carbohydrate at site II stimulates the branching and the elongation of the carbohydrate at the other site.


Author(s):  
A. V. Crewe

We have become accustomed to differentiating between the scanning microscope and the conventional transmission microscope according to the resolving power which the two instruments offer. The conventional microscope is capable of a point resolution of a few angstroms and line resolutions of periodic objects of about 1Å. On the other hand, the scanning microscope, in its normal form, is not ordinarily capable of a point resolution better than 100Å. Upon examining reasons for the 100Å limitation, it becomes clear that this is based more on tradition than reason, and in particular, it is a condition imposed upon the microscope by adherence to thermal sources of electrons.


Author(s):  
Maxim B. Demchenko ◽  

The sphere of the unknown, supernatural and miraculous is one of the most popular subjects for everyday discussions in Ayodhya – the last of the provinces of the Mughal Empire, which entered the British Raj in 1859, and in the distant past – the space of many legendary and mythological events. Mostly they concern encounters with inhabitants of the “other world” (spirits, ghosts, jinns), miraculous healings following magic rituals or meetings with the so-called saints of different religions (Hindu sadhus, Sufi dervishes), and incomprehensible and frightening natural phenomena. According to the author’s observations, ideas of the unknown are codified and structured in Avadh better than in other parts of India. Local people can clearly define whether they are witnessing a bhut or a jinn and whether a disease is caused by witchcraft or by other reasons. Perhaps that is due to the presence in the holy town of a persistent tradition of katha, the public presentation of plots from the Ramayana epic in narrative, poetic and performative forms. But are the events and phenomena in question a miracle for the Avadhvasis, residents of Ayodhya and its environs, or are they so commonplace that they do not surprise or fascinate? That is the subject of this essay, written on the basis of materials collected by the author in Ayodhya during the period 2010–2019. The author would like to express his appreciation to Mr. Alok Sharma (Faizabad) for his advice and cooperation.

