Recent development of machine learning methods in sumoylation sites prediction

Sumoylation is an important reversible post-translational modification of proteins that mediates a variety of cellular processes. SUMO-modified proteins can change their subcellular localization, activity and stability. Sumoylation also plays an important role in cellular processes such as transcriptional regulation and signal transduction. Abnormal sumoylation is involved in many diseases, including neurodegenerative and immune-related diseases, as well as in the development of cancer. Identifying sumoylation sites (SUMO sites) is therefore fundamental to understanding the molecular mechanisms and regulatory roles of this modification. Compared with labor-intensive and costly experimental approaches, computational (in silico) prediction of sumoylation sites has attracted much attention for its accuracy, convenience and speed. Many computational models have been developed to identify SUMO sites, but they have not yet been comprehensively summarized and reviewed. In this paper, we summarize and discuss the research progress of these models and briefly review the development of bioinformatics methods for sumoylation site prediction, focusing mainly on benchmark dataset construction, feature extraction, machine learning methods, published results and online tools. We hope this review will be of help to experimental researchers.
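Many sequence-based SUMO-site predictors begin by cutting a fixed-length window around every candidate lysine and, in some tools, by flagging the canonical ψKx(E/D) consensus (ψ a large hydrophobic residue, x any residue). The Python sketch below shows that preprocessing step. The window half-width, the padding character `X`, the ψ residue set, and the function name `lysine_windows` are illustrative assumptions, not settings taken from any specific tool covered by the review.

```python
from typing import List, Tuple

# Assumed membership of the large hydrophobic residues allowed at the psi
# position; published definitions of psi differ slightly between studies.
HYDROPHOBIC = set("AILMFV")
ACIDIC = {"E", "D"}


def lysine_windows(seq: str, half: int = 10, pad: str = "X") -> List[Tuple[int, str, bool]]:
    """Return (1-based position, window, matches_psiKxE) for every lysine.

    Windows have length 2 * half + 1 and are padded with `pad` at the
    termini so that every lysine, even near the ends, gets a full window.
    """
    padded = pad * half + seq + pad * half
    results: List[Tuple[int, str, bool]] = []
    for i, residue in enumerate(seq):
        if residue != "K":
            continue
        # seq[i] sits at padded[i + half], so this slice is centred on the K.
        window = padded[i : i + 2 * half + 1]
        # Consensus psi-K-x-(E/D): hydrophobic residue at -1, acidic at +2.
        prev_ok = i >= 1 and seq[i - 1] in HYDROPHOBIC
        next_ok = i + 2 < len(seq) and seq[i + 2] in ACIDIC
        results.append((i + 1, window, prev_ok and next_ok))
    return results
```

For example, `lysine_windows("AVKTEGKR", half=2)` returns `[(3, "AVKTE", True), (7, "EGKRX", False)]`: the first lysine sits in a VKTE consensus, while the second is preceded by glycine and is padded on the right. The windows would then be passed to a feature encoder before classification.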

PredNTS: Improved and Robust Prediction of Nitrotyrosine Sites by Integrating Multiple Sequence Features

International Journal of Molecular Sciences ◽

10.3390/ijms22052704 ◽

2021 ◽

Vol 22 (5) ◽

pp. 2704

Author(s):

Andi Nur Nilamyani ◽

Firda Nurul Auliah ◽

Mohammad Ali Moni ◽

Watshara Shoombuatong ◽

Md Mehedi Hasan ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Web Application ◽

Computational Prediction ◽

Vital Role ◽

Machine Learning Algorithms ◽

Recursive Feature Elimination ◽

Post Translational Modification ◽

Multiple Sequence ◽

Sequence Features

Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identifying site-specific nitration of tyrosine is a prerequisite for understanding the molecular function of nitrated proteins. Thanks to progress in machine learning, computational prediction can play a vital role before biological experimentation. Herein, we developed a computational predictor, PredNTS, by integrating multiple sequence features, including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. Important features were selected by recursive feature elimination using a random forest classifier. Finally, we linearly combined the probability scores generated by the individual random forest (RF) models, each employing a single encoding. The resulting PredNTS predictor achieved an area under the curve (AUC) of 0.910 in five-fold cross-validation and outperformed existing predictors on a comprehensive, independent dataset. Furthermore, we evaluated several other machine learning algorithms to demonstrate the superiority of the employed RF algorithm. PredNTS is a useful computational resource for predicting nitrotyrosine sites; the web application, together with the curated datasets, is publicly available.
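CKSAAP, one of the encodings PredNTS integrates, counts how often each of the 400 amino acid pairs occurs with exactly k residues between its two members. The following is a minimal Python sketch of a commonly used formulation (counts normalized by the number of pairs for each gap), together with a weighted linear combination of per-encoding probability scores. The gap range `k_max`, the normalization, the handling of padding residues, and the weights passed to the hypothetical `combine_scores` helper are assumptions; the abstract does not give PredNTS's actual settings or combination weights.

```python
from itertools import product
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(window: str, k_max: int = 3) -> List[float]:
    """Composition of k-spaced amino acid pairs for gaps k = 0..k_max.

    For each gap k, count pairs (window[i], window[i + k + 1]) and divide
    by the number of such pairs, giving 400 * (k_max + 1) features.
    Pairs containing non-standard residues (e.g. padding 'X') are not
    counted but still contribute to the denominator.
    """
    features: List[float] = []
    for k in range(k_max + 1):
        counts: Dict[str, int] = dict.fromkeys(PAIRS, 0)
        n_pairs = len(window) - k - 1
        for i in range(max(n_pairs, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        denom = n_pairs if n_pairs > 0 else 1
        features.extend(counts[p] / denom for p in PAIRS)
    return features


def combine_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted linear combination of per-encoding RF probability scores."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total
```

For `cksaap("ACAC", k_max=1)` the gap-0 block assigns 2/3 to "AC" and 1/3 to "CA", and the gap-1 block assigns 0.5 each to "AA" and "CC". Normalizing the weights keeps the combined score on the same 0 to 1 scale as the individual probabilities.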

Bioinformatics of Metalloproteins and Metalloproteomes

Molecules ◽

10.3390/molecules25153366 ◽

2020 ◽

Vol 25 (15) ◽

pp. 3366

Author(s):

Yan Zhang ◽

Junge Zheng

Keyword(s):

Comparative Genomics ◽

Metal Binding ◽

Computational Prediction ◽

Bioinformatic Analysis ◽

Research Progress ◽

Evolutionary Patterns ◽

Cellular Processes ◽

Metal Binding Sites ◽

Domains Of Life ◽

Systematic Understanding

Trace metals are inorganic elements that all organisms require in very small quantities. They serve as cofactors and activators of metalloproteins involved in a variety of key cellular processes. While substantial effort has been made to characterize metalloproteins and their functions experimentally, the application of bioinformatics to metalloprotein and metalloproteome research is still limited. In the last few years, computational prediction and comparative genomics of metalloprotein genes have emerged, providing significant insights into their distribution, function, and evolution in nature. This review offers an overview of recent advances in the bioinformatic analysis of metalloproteins, focusing mainly on metalloprotein prediction and on the use of different metals across the tree of life. We describe current computational approaches for identifying metalloprotein genes and metal-binding sites/patterns in proteins, and then introduce a set of related databases. Furthermore, we discuss the latest research progress in the comparative genomics of several important metals in both prokaryotes and eukaryotes, which reveals divergent and dynamic evolutionary patterns of different metalloprotein families and metalloproteomes. Overall, bioinformatic studies of metalloproteins provide a foundation for a systematic understanding of trace metal utilization in all three domains of life.
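One family of approaches for detecting metal-binding patterns searches protein sequences for curated motifs written in PROSITE pattern syntax. As a minimal sketch under simplifying assumptions, the Python code below converts a subset of that syntax (`x`, `[..]`, `{..}`, repeat counts, and the `<` and `>` terminal anchors) to a regular expression and scans a sequence with it. The C2H2 zinc-finger pattern serves only as an illustration, and the helper names `prosite_to_regex` and `find_motifs` are hypothetical.

```python
import re
from typing import List, Tuple

# Illustrative C2H2 zinc-finger signature written in PROSITE syntax.
C2H2 = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."

# One pattern element: optional N-anchor, residue class, optional repeat, optional C-anchor.
_ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified subset of PROSITE pattern syntax to a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_anchor, core, lo, hi, c_anchor = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if n_anchor else "") + core + ("$" if c_anchor else ""))
    return "".join(parts)


def find_motifs(seq: str, pattern: str) -> List[Tuple[int, int, str]]:
    """Return non-overlapping matches as (1-based start, 1-based end, match)."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in regex.finditer(seq)]
```

Here `prosite_to_regex(C2H2)` yields `C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H`. Because `re.finditer` reports non-overlapping matches only, overlapping motif instances would need a lookahead-based search instead.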

In silico prediction of chemical neurotoxicity using machine learning

Toxicology Research ◽

10.1093/toxres/tfaa016 ◽

2020 ◽

Vol 9 (3) ◽

pp. 164-172

Author(s):

Changsheng Jiang ◽

Piaopiao Zhao ◽

Weihua Li ◽

Yun Tang ◽

Guixia Liu

Keyword(s):

Machine Learning ◽

Regression Models ◽

Cross Validation ◽

Prediction Models ◽

Drug Withdrawal ◽

Molecular Descriptors ◽

Computational Prediction ◽

Machine Learning Algorithms ◽

Training Set ◽

Data Set

Neurotoxicity is one of the main causes of drug withdrawal, and biological experimental methods for detecting neurotoxicity are time-consuming and laborious. In addition, existing computational models for predicting neurotoxicity still have shortcomings. To address them, we collected a large neurotoxicity data set and used PyBioMed molecular descriptors and eight machine learning algorithms to construct regression models of chemical neurotoxicity. Cross-validation and test-set validation showed that the extra-trees regressor model had the best predictive performance ($q_{\mathrm{test}}^2 = 0.784$). In addition, we defined the applicability domain of the models by calculating the standard deviation distance and the leverage distance with respect to the training set. By calculating the contribution of each molecular descriptor to the models, we also found that some descriptors are closely related to neurotoxicity. Given the accuracy of the regression models, we recommend the extra-trees regressor model for predicting chemical autonomic neurotoxicity.
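The abstract reports a test-set $q^2$ and an applicability domain based on leverage. A minimal pure-Python sketch of common QSAR conventions follows: leverage $h = x^{\top}(X^{\top}X)^{-1}x$ computed with an intercept column, the widely used warning threshold $h^* = 3(p+1)/n$ for $p$ descriptors and $n$ training compounds, and $q^2_{\mathrm{test}} = 1 - \sum(y - \hat{y})^2 / \sum(y - \bar{y}_{\mathrm{train}})^2$. Whether the authors used exactly these formulas is an assumption, the standard deviation distance is not sketched, and all function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def _solve(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = s / m[r][r]
    return z


def leverages(x_train: Sequence[Sequence[float]], x_query: Sequence[Sequence[float]]) -> List[float]:
    """Leverage h = x^T (X^T X)^{-1} x of each query, with an intercept column."""
    X = [[1.0, *row] for row in x_train]
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    out = []
    for row in x_query:
        x = [1.0, *row]
        z = _solve(xtx, x)  # z = (X^T X)^{-1} x without forming the inverse
        out.append(sum(a * b for a, b in zip(x, z)))
    return out


def leverage_threshold(n_train: int, n_descriptors: int) -> float:
    """Common warning leverage h* = 3 (p + 1) / n."""
    return 3 * (n_descriptors + 1) / n_train


def q2_test(y_true: Sequence[float], y_pred: Sequence[float], y_train_mean: float) -> float:
    """External q^2 using the training-set mean as the reference."""
    ss_res = sum((t - yp) ** 2 for t, yp in zip(y_true, y_pred))
    ss_tot = sum((t - y_train_mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

A query compound whose leverage exceeds `leverage_threshold(n, p)` would be flagged as lying outside the applicability domain, so its prediction is treated as an extrapolation. Some authors compute $q^2_{\mathrm{test}}$ against the test-set mean instead, which gives a slightly different value.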

When the chains do not break: the role of USP10 in physiology and pathology

Cell Death and Disease ◽

10.1038/s41419-020-03246-7 ◽

2020 ◽

Vol 11 (12) ◽

Author(s):

Udayan Bhattacharya ◽

Fiifi Neizer-Ashun ◽

Priyabrata Mukherjee ◽

Resham Bhattacharya

Keyword(s):

Complex Formation ◽

Molecular Mechanisms ◽

Protein Homeostasis ◽

Post Translational Modification ◽

Intracellular Protein ◽

Lysosomal Degradation ◽

Life Activity ◽

Cellular Processes ◽

Pathological Conditions

Deubiquitination is now understood to be as important as its partner, ubiquitination, for maintaining protein half-life, activity, and localization under both normal and pathological conditions. The enzymes that remove ubiquitin from target proteins are called deubiquitinases (DUBs), and they regulate a plethora of cellular processes. DUBs are essential enzymes that maintain intracellular protein homeostasis by recycling ubiquitin. Ubiquitination is a post-translational modification in which ubiquitin molecules are added to proteins, thus influencing their activation, localization, and complex formation. Ubiquitin also acts as a tag for protein degradation, especially by the proteasomal or lysosomal degradation systems. With ~100 members, DUBs form a large enzyme family, of which the ubiquitin-specific peptidases (USPs) are the largest group. USP10, an important member of this family, is significant in diverse cellular processes and many human diseases. In this review, we discuss recent studies that define the roles of USP10 in maintaining cellular function, its involvement in human pathologies, and the molecular mechanisms underlying its association with cancer and neurodegenerative diseases. We also discuss efforts to modulate USPs as therapy in these diseases.

Combined use of feature engineering and machine-learning to predict essential genes in Drosophila melanogaster

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa051 ◽

2020 ◽

Vol 2 (3) ◽

Cited By ~ 1

Author(s):

Tulio L Campos ◽

Pasi K Korhonen ◽

Andreas Hofmann ◽

Robin B Gasser ◽

Neil D Young

Keyword(s):

Machine Learning ◽

Drosophila Melanogaster ◽

Molecular Mechanisms ◽

Deep Understanding ◽

Computational Prediction ◽

Essential Genes ◽

Model Species ◽

Combined Use ◽

Vinegar Fly ◽

Computational Predictions

Characterizing genes that are critical for the survival of an organism (i.e. essential genes) is important for gaining a deep understanding of the fundamental cellular and molecular mechanisms that sustain life. Functional genomic investigations of the vinegar fly, Drosophila melanogaster, have unravelled the functions of numerous genes of this model species, but results from phenomic experiments can sometimes be ambiguous. Moreover, the features underlying gene essentiality are poorly understood, posing challenges for computational prediction. Here, we harnessed comprehensive genomic-phenomic datasets publicly available for D. melanogaster and a machine-learning-based workflow to predict essential genes of this fly. We discovered strong predictors of such genes, paving the way for computational predictions of essentiality in less-studied arthropod pests and vectors of infectious diseases.

Computational analysis and prediction of lysine malonylation sites by exploiting informative features in an integrative machine-learning framework

Briefings in Bioinformatics ◽

10.1093/bib/bby079 ◽

2018 ◽

Vol 20 (6) ◽

pp. 2185-2199 ◽

Cited By ~ 32

Author(s):

Yanju Zhang ◽

Ruopeng Xie ◽

Jiawei Wang ◽

André Leier ◽

Tatiana T Marquez-Lago ◽

...

Keyword(s):

Machine Learning ◽

Computational Methods ◽

Prediction Models ◽

Gradient Boosting ◽

Support Vector ◽

Post Translational Modification ◽

K Nearest Neighbor ◽

Ensemble Models ◽

Light Gradient ◽

Optimal Ensemble

As a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of the different features and machine learning (ML) methods required to construct the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from the residue sequences of Kmal sites. We identify optimized feature sets with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed method (Light Gradient Boosting Machine, LightGBM) are trained on data from three species, namely Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integrating the single method-based models through ensemble learning further improves prediction performance and model robustness on the independent test. Compared with the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/.
We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites.
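The malonylation study integrates single-method models through ensemble learning and compares predictors by AUC. The sketch below assumes the simplest such scheme, an unweighted average of per-model probability scores, and computes AUC as the Mann-Whitney probability that a randomly chosen positive site scores higher than a randomly chosen negative one (ties count one half). The model names and scores in the usage example are made up, and the paper's actual ensemble weighting is not described in the abstract.

```python
from typing import Dict, List, Sequence


def ensemble_average(model_scores: Dict[str, Sequence[float]]) -> List[float]:
    """Unweighted mean of per-model probability scores for each sample."""
    columns = list(model_scores.values())
    n = len(columns[0])
    if any(len(c) != n for c in columns):
        raise ValueError("all models must score the same samples")
    return [sum(c[i] for c in columns) / len(columns) for i in range(n)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0]  # hypothetical site labels
    scores = {"rf": [0.9, 0.4, 0.5, 0.1], "svm": [0.7, 0.8, 0.6, 0.2]}
    print(roc_auc(labels, scores["rf"]))              # prints 0.75
    print(roc_auc(labels, ensemble_average(scores)))  # prints 1.0
```

In this toy case the averaged scores rank both positives above both negatives even though the random forest alone misranks one pair, which illustrates why combining models can improve robustness. The pairwise loop is quadratic in the number of samples; a rank-based formula scales better for large test sets.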

Molecular basis for specificity of the Met1-linked polyubiquitin signal

Biochemical Society Transactions ◽

10.1042/bst20160227 ◽

2016 ◽

Vol 44 (6) ◽

pp. 1581-1602 ◽

Cited By ~ 7

Author(s):

Paul R. Elliott

Keyword(s):

Molecular Mechanisms ◽

Cellular Responses ◽

Signalling Pathways ◽

Post Translational Modification ◽

Polyubiquitin Chains ◽

Cellular Processes ◽

Protein Ubiquitination ◽

Ubiquitin Binding ◽

Inflammatory Signalling ◽

Unique Ability

The post-translational modification of proteins provides a rapid and versatile system for regulating all signalling pathways. Protein ubiquitination is one such type of post-translational modification involved in controlling numerous cellular processes. The unique ability of ubiquitin to form polyubiquitin chains creates a highly complex code responsible for different subsequent signalling outcomes. Specialised enzymes (‘writers’) generate the ubiquitin code, whereas other enzymes (‘erasers’) disassemble it. Importantly, the ubiquitin code is deciphered by different ubiquitin-binding proteins (‘readers’) functioning to elicit particular cellular responses. Ten years ago, the methionine1 (Met1)-linked (linear) polyubiquitin code was first identified and the intervening years have witnessed a seismic shift in our understanding of Met1-linked polyubiquitin in cellular processes, particularly inflammatory signalling. This review will discuss the molecular mechanisms of specificity determination within Met1-linked polyubiquitin signalling.

Protein SUMOylation modification and its associations with disease

Open Biology ◽

10.1098/rsob.170167 ◽

2017 ◽

Vol 7 (10) ◽

pp. 170167 ◽

Cited By ~ 48

Author(s):

Yanfang Yang ◽

Yu He ◽

Xixi Wang ◽

Ziwei Liang ◽

Gu He ◽

...

Keyword(s):

Mass Spectrometry ◽

Cell Growth ◽

High Throughput ◽

Molecular Mechanisms ◽

Cellular Responses ◽

Biological Functions ◽

Post Translational Modification ◽

Sumoylation Site ◽

Signal Crosstalk ◽

Protein Sumoylation

SUMOylation, a post-translational modification, plays essential roles in various biological functions, including cell growth, migration, cellular responses to stress and tumorigenesis. An imbalance between SUMOylation and deSUMOylation has been associated with the occurrence and progression of various diseases. Herein, we summarize and discuss the signal crosstalk between SUMOylation and ubiquitination of proteins, the relationship between protein SUMOylation and several diseases, and approaches for identifying SUMOylation sites. With the continuous development of bioinformatics and mass spectrometry, several accurate and high-throughput methods have been implemented to explore small ubiquitin-like modifier (SUMO)-modified substrates and sites, which helps to decipher the molecular mechanisms of disease mediated by protein SUMOylation.

Into the Seed: Auxin Controls Seed Development and Grain Yield

International Journal of Molecular Sciences ◽

10.3390/ijms21051662 ◽

2020 ◽

Vol 21 (5) ◽

pp. 1662 ◽

Cited By ~ 3

Author(s):

Jinshan Cao ◽

Guoji Li ◽

Dejie Qu ◽

Xia Li ◽

Youning Wang

Keyword(s):

Grain Yield ◽

Seed Development ◽

Seed Weight ◽

Molecular Mechanisms ◽

Crop Yields ◽

Molecular Regulation ◽

Research Progress ◽

Comprehensive Review ◽

Cellular Processes

Seed development, which involves mainly the embryo, endosperm and integuments, is regulated by different signaling pathways, leading to various changes in seed size or seed weight. Therefore, uncovering the genetic and molecular mechanisms of seed development has great potential for improving crop yields. The phytohormone auxin is a key regulator required for modulating different cellular processes involved in seed development. Here, we provide a comprehensive review of the role of auxin biosynthesis, transport, signaling, conjugation, and catabolism during seed development. More importantly, we not only summarize the research progress on the genetic and molecular regulation of seed development mediated by auxin but also discuss the potential of manipulating auxin metabolism and its signaling pathway for improving crop seed weight.

Post-translational lysine ac(et)ylation in health, ageing and disease

Biological Chemistry ◽

10.1515/hsz-2021-0139 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

Anna-Theresa Blasl ◽

Sabrina Schulze ◽

Chuan Qin ◽

Leonie G. Graf ◽

Robert Vogt ◽

...

Keyword(s):

Protein Function ◽

Molecular Mechanisms ◽

Research Field ◽

Ageing Process ◽

Post Translational Modification ◽

Direct Role ◽

Translational Machinery ◽

Beneficial Effects ◽

Cellular Processes ◽

Technological Advances

The acetylation/acylation (ac(et)ylation) of lysine side chains is a dynamic post-translational modification (PTM) that regulates fundamental cellular processes with implications for the ageing process of organisms: metabolism, transcription, translation, cell proliferation, regulation of the cytoskeleton and DNA damage repair. First identified on histones, lysine ac(et)ylation was later found in organisms of all kingdoms of life, in proteins covering all essential cellular processes. A remarkable finding showed that the NAD+-dependent sirtuin deacetylase Sir2 affects replicative lifespan in Saccharomyces cerevisiae, suggesting that lysine acetylation has a direct role in the ageing process. Later studies identified sirtuins as mediators of the beneficial effects of caloric/dietary restriction on organisms' healthspan or lifespan. However, the molecular mechanisms underlying these effects are only incompletely understood. Progress in mass spectrometry, structural biology, and synthetic and semi-synthetic biology has deepened our understanding of this PTM. This review summarizes recent developments in the research field. It shows how lysine ac(et)ylation regulates protein function, how it is regulated enzymatically and non-enzymatically, and how dysfunction in this post-translational machinery contributes to disease development. A focus is placed on sirtuins and lysine acyltransferases, as these are direct sensors and mediators of the cellular metabolic state. Finally, this review highlights technological advances for studying lysine ac(et)ylation.
