Moonlighting protein prediction using physico-chemical and evolutional properties via machine learning methods

Abstract Background: Moonlighting proteins (MPs) are a subclass of multifunctional proteins in which more than one independent or usually distinct function occurs in a single polypeptide chain. Identification of unknown cellular processes, understanding novel protein mechanisms, improving the prediction of protein functions, and gaining information about protein evolution are the main reasons to study MPs. They also play an important role in disease pathways and drug-target discovery. Since detecting MPs experimentally is quite a challenge, most of them were detected by chance. Therefore, introducing an appropriate computational approach seems rational. Results: In this study, we present a model for distinguishing moonlighting from non-moonlighting proteins using features extracted from protein sequences. We then present a scheme for detecting outlier proteins. To do so, 15 distinct feature vectors were used to study each one's effect on detecting MPs. Furthermore, 8 different classification methods were assessed to find the best performance. To detect outliers, each classification method was run 100 times with 10-fold cross-validation on each feature vector, and the proteins that were misclassified 80 times or more were grouped. This process was applied to every feature vector, and in the end the intersection of these groups was taken as the set of outlier proteins. The results of 10-fold cross-validation on a dataset of 351 samples (containing 215 moonlighting and 136 non-moonlighting proteins) show that the decision tree method on all feature vectors has the highest performance among all methods in this research and also among other available methods. Besides, the study of outliers shows that 57 of the 351 proteins in the dataset could be appropriate outlier candidates. Among the outlier proteins, there are non-moonlighting proteins (such as P69797) that were misclassified by 8 different classification methods with 16 different feature types. Because these proteins were obtained by computational methods, the results of this study could cast doubt on the hypothesis that they are non-moonlighting. Conclusions: Moonlighting proteins are difficult to identify by experiments. Our method enables identification of novel moonlighting proteins using distinct feature vectors. It also indicates that a number of non-moonlighting proteins are likely to be moonlighting.

Moonlighting protein prediction using physico-chemical and evolutional properties via machine learning methods

BMC Bioinformatics ◽

10.1186/s12859-021-04194-5 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Farshid Shirafkan ◽

Sajjad Gharaghani ◽

Karim Rahimian ◽

Reza Hasan Sajedi ◽

Javad Zahiri

Keyword(s):

Cross Validation ◽

Classification Methods ◽

Drug Target Discovery ◽

Feature Vectors ◽

Single Polypeptide Chain ◽

Protein Functions ◽

Moonlighting Proteins ◽

Tenfold Cross Validation ◽

Single Polypeptide ◽

Set Up

Abstract Background Moonlighting proteins (MPs) are a subclass of multifunctional proteins in which more than one independent or usually distinct function occurs in a single polypeptide chain. Identification of unknown cellular processes, understanding novel protein mechanisms, improving the prediction of protein functions, and gaining information about protein evolution are the main reasons to study MPs. They also play an important role in disease pathways and drug-target discovery. Since detecting MPs experimentally is quite a challenge, most of them have been detected by chance. Therefore, introducing an appropriate computational approach to predict MPs seems reasonable. Results In this study, we introduced a model for distinguishing moonlighting from non-moonlighting proteins using features extracted from protein sequences. We also set up a scheme for detecting outlier proteins. To this end, 37 distinct feature vectors were used to study each vector's impact on detecting MPs. Furthermore, 8 different classification methods were assessed to find the best performance. To detect outliers, each classification method was executed 100 times with tenfold cross-validation on each feature vector, and the proteins that were misclassified 90 times or more were grouped. This process was applied to every feature vector, and the intersection of these groups was taken as the set of outlier proteins. The results of tenfold cross-validation on a dataset of 351 samples (containing 215 moonlighting and 136 non-moonlighting proteins) reveal that the SVM method on all feature vectors has the highest performance among all methods in this study and other available methods. Besides, the study of outliers showed that 57 of the 351 proteins in the dataset could be appropriate outlier candidates. Among the outlier proteins, there were non-MPs (such as P69797) that were misclassified by 8 different classification methods with 16 different feature vectors. Because these proteins were obtained by computational methods, the results of this study could cast doubt on the hypothesis that they are non-moonlighting at all. Conclusions MPs are difficult to identify through experimentation. Using distinct feature vectors, our method enabled identification of novel moonlighting proteins. The study also indicated that a number of non-MPs are likely to be moonlighting.
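
The outlier-detection scheme described in the abstract above (repeat k-fold cross-validation, count how often each sample is misclassified, keep the samples at or above a threshold, then intersect the groups across feature vectors) could be sketched roughly as follows. This is a minimal illustration, not the authors' code: the nearest-centroid classifier is a toy stand-in for the eight classifiers the paper evaluates, and the function names and the `seed` parameter are assumptions introduced here. The defaults (100 repeats, 10 folds, threshold 90) follow the numbers quoted in the abstract.

```python
import random
from collections import Counter


def nearest_centroid_predict(train_X, train_y, test_X):
    """Toy stand-in classifier: assign each test point to the nearest class mean."""
    sums, counts = {}, Counter(train_y)
    for x, y in zip(train_X, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in test_X]


def misclassification_counts(X, y, n_repeats=100, n_folds=10, seed=0):
    """Count, per sample, in how many repeated k-fold runs it was misclassified."""
    rng = random.Random(seed)
    n = len(X)
    wrong = [0] * n
    for _ in range(n_repeats):
        order = list(range(n))
        rng.shuffle(order)
        # Each sample lands in exactly one test fold per repeat.
        folds = [order[i::n_folds] for i in range(n_folds)]
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            preds = nearest_centroid_predict(
                [X[i] for i in train_idx],
                [y[i] for i in train_idx],
                [X[i] for i in test_idx],
            )
            for i, p in zip(test_idx, preds):
                if p != y[i]:
                    wrong[i] += 1
    return wrong


def find_outliers(feature_sets, y, threshold=90, **kwargs):
    """Intersect, across feature vectors, the samples misclassified >= threshold times."""
    groups = []
    for X in feature_sets:
        wrong = misclassification_counts(X, y, **kwargs)
        groups.append({i for i, w in enumerate(wrong) if w >= threshold})
    return set.intersection(*groups) if groups else set()
```

Here `feature_sets` is a list of alternative feature matrices for the same samples, one per feature vector type. The intersection makes the criterion strict: a sample is reported only if it is persistently misclassified under every representation. The abstract also repeats this over several classifiers, which the sketch leaves out.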

A neural network-based method for polypharmacy side effects prediction

BMC Bioinformatics ◽

10.1186/s12859-021-04298-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Raziyeh Masumshah ◽

Rosa Aghdam ◽

Changiz Eslahchi

Keyword(s):

Neural Network ◽

Side Effects ◽

Network Architecture ◽

Cross Validation ◽

Running Time ◽

Feature Vectors ◽

Concurrent Use ◽

Large Numbers ◽

New Feature ◽

Fold Cross Validation

Abstract Background Polypharmacy is a type of treatment that involves the concurrent use of multiple medications. Drugs may interact when they are used simultaneously, so understanding and mitigating polypharmacy side effects is critical for patient safety and health. Since the known polypharmacy side effects are rare and are not detected in clinical trials, computational methods have been developed to model polypharmacy side effects. Results We propose a neural network-based method for polypharmacy side effects prediction (NNPS) that uses novel feature vectors based on mono side effects and drug–protein interaction information. The proposed method is fast and efficient, which allows the investigation of large numbers of polypharmacy side effects. Our novelty is defining new feature vectors for drugs and combining them with a neural network architecture for polypharmacy side effects prediction. We compare NNPS with 5 well-established methods on a benchmark dataset for predicting 964 polypharmacy side effects and show that NNPS achieves better results than all 5 methods in terms of accuracy, complexity, and running time. With 5-fold cross-validation, NNPS outperforms the best of the other well-established methods (the Decagon method) by about 9.2% in Area Under the Receiver-Operating Characteristic, 12.8% in Area Under the Precision–Recall Curve, 8.6% in F-score, 10.3% in Accuracy, and 18.7% in Matthews Correlation Coefficient. The running time also drops from 15 days for one fold of cross-validation with the Decagon method to 8 h with NNPS. Conclusions The performance of NNPS is benchmarked against 5 well-known methods (Decagon, Concatenated drug features, Deep Walk, DEDICOM, and RESCAL) for 964 polypharmacy side effects. We run 5-fold cross-validation for 50 iterations and use the average of the results to assess the performance of the NNPS method. The evaluation of NNPS against these five methods in terms of accuracy, complexity, and running time demonstrates the performance of the presented method on an essential and challenging problem in pharmacology. Datasets and code for the NNPS algorithm are freely accessible at https://github.com/raziyehmasumshah/NNPS.
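
For readers unfamiliar with the metrics named in the abstract above, the following sketch shows how accuracy, F-score, Matthews Correlation Coefficient and AUROC can be computed for a binary task. It is a generic illustration using the standard textbook definitions, not code from the NNPS repository; the function names are assumptions introduced here, and the Area Under the Precision–Recall Curve is omitted for brevity.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and MCC from 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # MCC uses all four confusion-matrix cells; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def auroc_pairwise(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. This is O(n_pos * n_neg), fine for small examples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Accuracy, F-score and MCC are computed from hard predictions, whereas AUROC needs the continuous scores a model outputs, which is why the two functions take different inputs.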

Collaborative Classification Approach for Airline Tweets Using Sentiment Analysis

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i3.1639 ◽

2021 ◽

Vol 12 (3) ◽

pp. 3597-3603

Author(s):

M. Veera Kumari et al.

Keyword(s):

Sentiment Analysis ◽

Cross Validation ◽

Majority Voting ◽

Quality Of Services ◽

Classification Methods ◽

K Nearest Neighbors ◽

Classification Techniques ◽

Improve Accuracy ◽

Fold Cross Validation

Many airlines around the world offer a range of services to their customers, who may or may not be satisfied with them. Because customers cannot always give their comments immediately, airlines use Twitter to collect feedback on their services. The use of Twitter to improve quality of service has increased [4]. This paper develops several classification techniques to improve the accuracy of sentiment analysis. Tweets about airline services are classified into three polarities: positive, negative and neutral. The classification methods are Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), Naïve Bayes (NB), Decision Tree (DTC), Extreme Gradient Boost (XGB), combinations of two, three and four classification techniques with a majority Voting Classifier, and AdaBoost. Accuracy was measured in the validation phase using 20-fold and 30-fold cross-validation. The paper also proposes a new ensemble Bagging approach for different classifiers [10]. Precision, recall, F1-score, micro average, macro average and accuracy are reported for all of the above classification techniques. In addition, the average predictions of the classifiers, and the accuracy of those averaged predictions, were calculated to support good quality of service. The results show that bagging classifiers achieve better accuracy than non-bagging classifiers.
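
The majority-voting step mentioned in the abstract above can be illustrated with a short sketch. It assumes each base classifier has already produced one label per tweet; the function names and the tie-breaking rule (the label voted for by the earliest-listed classifier wins) are assumptions made here, not details taken from the paper.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Combine label predictions from several classifiers by majority vote.

    predictions_per_classifier: list of equal-length label lists, one per classifier.
    """
    n_samples = len(predictions_per_classifier[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in predictions_per_classifier)
        # On a tie, Counter keeps first-seen order, so the label from the
        # earliest-listed classifier is returned.
        combined.append(votes.most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, combining three hypothetical classifiers on four tweets:

```python
rf = ["positive", "negative", "neutral", "negative"]
lr = ["positive", "positive", "neutral", "neutral"]
knn = ["negative", "negative", "neutral", "positive"]
print(majority_vote([rf, lr, knn]))
# → ['positive', 'negative', 'neutral', 'negative']
```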

Multitalented actors inside and outside the cell: recent discoveries add to the number of moonlighting proteins

Biochemical Society Transactions ◽

10.1042/bst20190798 ◽

2019 ◽

Vol 47 (6) ◽

pp. 1941-1948 ◽

Cited By ~ 9

Author(s):

Constance J. Jeffery

Keyword(s):

Molecular Mechanisms ◽

Food Crops ◽

Bacterial Strains ◽

Cellular Processes ◽

Biochemical Pathways ◽

The Past ◽

Single Polypeptide Chain ◽

New Antibiotics ◽

Moonlighting Proteins ◽

Single Polypeptide

During the past few decades, it's become clear that many enzymes evolved not only to act as specific, finely tuned and carefully regulated catalysts, but also to perform a second, completely different function in the cell. In general, these moonlighting proteins have a single polypeptide chain that performs two or more distinct and physiologically relevant biochemical or biophysical functions. This mini-review describes examples of moonlighting proteins that have been found within the past few years, including some that play key roles in human and animal diseases and in the regulation of biochemical pathways in food crops. Several belong to two of the most common subclasses of moonlighting proteins: trigger enzymes and intracellular/surface moonlighting proteins, but a few represent less often observed combinations of functions. These examples also help illustrate some of the current methods used for identifying proteins with multiple functions. In general, a greater understanding about the functions and molecular mechanisms of moonlighting proteins, their roles in the regulation of cellular processes, and their involvement in health and disease could aid in many areas including developing new antibiotics, predicting the functions of the millions of proteins being identified through genome sequencing projects, designing novel proteins, using biological circuitry analysis to construct bacterial strains that are better producers of materials for industrial use, and developing methods to tweak biochemical pathways for increasing yields of food crops.

An introduction to protein moonlighting

Biochemical Society Transactions ◽

10.1042/bst20140226 ◽

2014 ◽

Vol 42 (6) ◽

pp. 1679-1683 ◽

Cited By ~ 79

Author(s):

Constance J. Jeffery

Keyword(s):

Transcription Factors ◽

Protein Sequence ◽

Polypeptide Chain ◽

Biochemical Pathways ◽

Single Polypeptide Chain ◽

Potential Benefits ◽

Moonlighting Proteins ◽

Structure Databases ◽

Single Polypeptide ◽

And Function

Moonlighting proteins comprise a class of multifunctional proteins in which a single polypeptide chain performs multiple physiologically relevant biochemical or biophysical functions. Almost 300 proteins have been found to moonlight. The known examples of moonlighting proteins include diverse types of proteins, including receptors, enzymes, transcription factors, adhesins and scaffolds, and different combinations of functions are observed. Moonlighting proteins are expressed throughout the evolutionary tree and function in many different biochemical pathways. Some moonlighting proteins can perform both functions simultaneously, but for others, the protein's function changes in response to changes in the environment. The diverse examples of moonlighting proteins already identified, and the potential benefits moonlighting proteins might provide to the organism, such as through coordinating cellular activities, suggest that many more moonlighting proteins are likely to be found. Continuing studies of the structures and functions of moonlighting proteins will aid in predicting the functions of proteins identified through genome sequencing projects, in interpreting results from proteomics experiments, in understanding how different biochemical pathways interact in systems biology, in annotating protein sequence and structure databases, in studies of protein evolution and in the design of proteins with novel functions.

MoonProt 3.0: an update of the moonlighting proteins database

Nucleic Acids Research ◽

10.1093/nar/gkaa1101 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D368-D372

Author(s):

Chang Chen ◽

Haipeng Liu ◽

Shadi Zabad ◽

Nina Rivera ◽

Emily Rowin ◽

...

Keyword(s):

Data Bank ◽

Transmembrane Helices ◽

Additional Information ◽

Single Polypeptide Chain ◽

Open Access Database ◽

Moonlighting Proteins ◽

Version 2.0 ◽

Single Polypeptide ◽

Scop Classification ◽

Relationship Of

Abstract MoonProt 3.0 (http://moonlightingproteins.org) is an updated open-access database storing expert-curated annotations for moonlighting proteins. Moonlighting proteins have two or more physiologically relevant distinct biochemical or biophysical functions performed by a single polypeptide chain. Here, we describe an expansion in the database since our previous report in the Database Issue of Nucleic Acids Research in 2018. For this release, the number of proteins annotated has been expanded to over 500 proteins and dozens of protein annotations have been updated with additional information, including more structures in the Protein Data Bank, compared with version 2.0. The new entries include more examples from humans, plants and archaea, more proteins involved in disease and proteins with different combinations of functions. More kinds of information about the proteins and the species in which they have multiple functions have been added, including CATH and SCOP classification of structure, known and predicted disorder, predicted transmembrane helices, type of organism, relationship of the protein to disease, and relationship of organism to cause of disease.

Pathogen Moonlighting Proteins: From Ancestral Key Metabolic Enzymes to Virulence Factors

Microorganisms ◽

10.3390/microorganisms9061300 ◽

2021 ◽

Vol 9 (6) ◽

pp. 1300

Author(s):

Luis Franco-Serrano ◽

David Sánchez-Redondo ◽

Araceli Nájar-García ◽

Sergio Hernández ◽

Isaac Amela ◽

...

Keyword(s):

Primary Metabolism ◽

Protein Targets ◽

Pathogen Virulence ◽

Single Polypeptide Chain ◽

Moonlighting Proteins ◽

Single Polypeptide ◽

Main Ideas ◽

Moonlighting Functions ◽

Work First ◽

Host Tissues

Moonlighting and multitasking proteins refer to proteins with two or more functions performed by a single polypeptide chain. An amazing example of the Gain of Function (GoF) phenomenon of these proteins is that 25% of the moonlighting functions of our Multitasking Proteins Database (MultitaskProtDB-II) are related to pathogen virulence activity. Moreover, they usually have a canonical function belonging to highly conserved ancestral key functions, and their moonlighting functions are often involved in inducing extracellular matrix (ECM) protein remodeling. There are three main questions in the context of moonlighting proteins in pathogen virulence: (A) Why are a high percentage of pathogen moonlighting proteins involved in virulence? (B) Why do most of the canonical functions of these moonlighting proteins belong to primary metabolism? Moreover, why are they common in many pathogen species? (C) How are these different protein sequences and structures able to bind the same set of host ECM protein targets, mainly plasminogen (PLG), and colonize host tissues? By means of an extensive bioinformatics analysis, we suggest answers and approaches to these questions. There are three main ideas derived from the work: first, moonlighting proteins are not good candidates for vaccines. Second, several motifs that might be important in the adhesion to the ECM were identified. Third, an overrepresentation of GO codes related to virulence in moonlighting proteins was seen.

Combination of Support Vector Machine and K-Fold cross-validation for prediction of long-term degradation of the compressive strength of marine concrete

International Journal of Computational Physics Series ◽

10.29167/a1i1p120-130 ◽

2018 ◽

Vol 1 (1) ◽

pp. 120-130 ◽

Cited By ~ 1

Author(s):

Chunxiang Qian ◽

Wence Kang ◽

Hao Ling ◽

Hua Dong ◽

Chengyao Liang ◽

...

Keyword(s):

Support Vector Machine ◽

Environmental Factors ◽

Cross Validation ◽

Concrete Strength ◽

Simulation Method ◽

Support Vector ◽

Svm Model ◽

Artificial Neural Network Ann ◽

Influence Degree ◽

Fold Cross Validation

A Support Vector Machine (SVM) model optimized by K-Fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other models, such as an Artificial Neural Network (ANN) and a Decision Tree (DT), were also built and compared with the SVM to determine which one could make the most accurate predictions. Both material factors and environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amounts of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻ and Cl⁻, temperature, and exposure time. The prediction results indicated that the optimized SVM model performed better than the other models in predicting the concrete strength. Based on the SVM model, a simulation method of variables limitation was used to determine the sensitivity of the various factors and the degree to which each influences the degradation of concrete strength.

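Rancang Bangun Sistem Informasi Untuk Menentukan Kapabilitas Konsumen Dalam Mengambil Pinjaman KPR [Design of an Information System for Determining Consumers' Capability to Take Out a Home Mortgage (KPR) Loan]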

Jurnal ULTIMA InfoSys ◽

10.31937/si.v7i2.543 ◽

2016 ◽

Vol 7 (2) ◽

pp. 75-80

Author(s):

Adhi Kusnadi ◽

Risyad Ananda Putra

Keyword(s):

Data Mining ◽

Low Income ◽

Cross Validation ◽

Classification Tree ◽

Large Population ◽

Housing Development ◽

Good Precision ◽

Index Terms ◽

The Government ◽

Fold Cross Validation

Indonesia has a relatively large population. Over a five-year period, the government runs an annual program to procure 1 million FLPP housing units, in an effort to provide decent homes for low-income people. FLPP housing development requires precision and speed on the part of the developer, but it is often hampered by the bank process, because the outcome and speed of data processing at the bank are difficult to predict. Knowing in advance whether consumers can obtain subsidized credit has several advantages: developers can plan cash flow better, and they can replace consumers who would be rejected before those consumers enter the bank process. For that reason, a system was built to help developers. Many methods could be used to build this application; one of them is data mining with a classification tree. In 10-fold cross-validation, the application achieved an accuracy of 92%. Index Terms: Data Mining, Classification Tree, Housing, FLPP, 10-fold Cross Validation, Consumer Capability

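Klasifikasi Berita Kriminal Menggunakan Naïve Bayes Classifier (NBC) dengan Pengujian K-Fold Cross Validation [Classification of Crime News Using the Naïve Bayes Classifier (NBC) with K-Fold Cross Validation Testing]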

Jurnal Sains dan Informatika ◽

10.34128/jsi.v5i2.177 ◽

2019 ◽

Vol 5 (2) ◽

pp. 108-117

Author(s):

Herfia Rhomadhona ◽

Jaka Permadi

Keyword(s):

Cross Validation ◽

Online Media ◽

Bayes Classifier ◽

Naïve Bayes ◽

Fold Cross Validation

Crime news is always a trending topic in every mass media outlet, especially online mass media. Online mass media provide several facilities that make it easier for the public to find news by topic, and they label each news item according to its category. However, they do not assign subcategories to the news. For example, if a user opens the crime category, all types of crime news are displayed without any specific information about the type of crime. This problem can be addressed by classifying crime news by subcategory. This study uses the Naïve Bayes Classifier (NBC) method to classify news by subcategory. The subcategories are divided into 5 categories: corruption, drugs, theft, rape and murder. The study aims to determine the ability of NBC to classify news by testing it with the K-Fold Cross Validation technique with values of K from 3 to 10. The test results show that NBC is capable of classifying crime news, with a precision of 98.53%, a recall of 98.44% and an accuracy of 99.38%.
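
The evaluation protocol described in the abstract above, a Naïve Bayes text classifier tested with K-fold cross-validation for K from 3 to 10, might look roughly like the sketch below. It is an illustration under stated assumptions rather than the authors' implementation: it uses a multinomial model with add-one smoothing, whitespace tokenization, unshuffled strided folds, and reports accuracy only; all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (counts only; smoothing at predict time)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)
        for token in doc.lower().split():
            if token in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][token] + 1)
                                  / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def kfold_accuracy(docs, labels, k):
    """Accuracy over k folds; fold f tests samples f, f+k, f+2k, ..."""
    n = len(docs)
    correct = 0
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model = train_nb([docs[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
    return correct / n


def sweep_k(docs, labels, k_values=range(3, 11)):
    """Run k-fold evaluation for each K (default 3..10, as in the abstract)."""
    return {k: kfold_accuracy(docs, labels, k) for k in k_values}
```

A real evaluation would normally shuffle or stratify the folds and report per-class precision and recall as well. The strided split here is only meant to keep the example short and deterministic.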
