XGBoost and Network Analysis for Prediction of Proteins Affecting Insulin based on Protein Protein Interactions

Author(s):  
Mohammad Hamim Zajuli Al Faroby ◽  
Mohammad Isa Irawan ◽  
Ni Nyoman Tri Puspaningsih

Protein-protein interaction (PPI) analysis can be used to identify proteins that support the function of a main protein, especially in the synthesis process. Insulin is synthesized by proteins that share the same molecular function while covering different but mutually supportive roles. To identify this function, the translation of Gene Ontology (GO) gives certain characteristics to each protein. This study aims to predict proteins that interact with insulin, using centrality methods as a feature extractor and extreme gradient boosting (XGBoost) as the classification algorithm. Characterization using the centrality methods produces  features as a central function of protein. Classification results are evaluated using precision, recall, and ROC scores. Optimizing the model by finding the right parameters produces an accuracy of  and a ROC score of . The prediction model produced by XGBoost performs above the average of the other machine learning methods.
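The idea of using centrality as a feature extractor can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the toy network, the node labels `P1` to `P4`, and the choice of degree and closeness centrality are assumptions made for the example, and the classifier step is omitted.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {node: len(neigh) / (n - 1) for node, neigh in graph.items()}


def closeness_centrality(graph):
    """Number of reachable nodes divided by the sum of shortest-path distances."""
    result = {}
    for source in graph:
        # Breadth-first search gives shortest-path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        reachable = len(dist) - 1
        result[source] = reachable / total if total > 0 else 0.0
    return result


# Hypothetical toy network (edges are illustrative, not real interaction data).
ppi = {
    "INS": {"P1", "P2", "P3"},
    "P1": {"INS", "P4"},
    "P2": {"INS", "P4"},
    "P3": {"INS"},
    "P4": {"P1", "P2"},
}

deg = degree_centrality(ppi)
clo = closeness_centrality(ppi)

# One feature vector per protein; these rows would be passed to a classifier.
features = {node: (deg[node], clo[node]) for node in ppi}
```

Each protein ends up with a short numeric vector describing its position in the network, which is the kind of input a gradient-boosted classifier can consume.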

2021 ◽  
Vol 12 ◽  
Author(s):  
Pan Wang ◽  
Guiyang Zhang ◽  
Zu-Guo Yu ◽  
Guohua Huang

Knowledge about protein-protein interactions is beneficial in understanding cellular mechanisms. Protein-protein interactions are usually determined according to their protein-protein interaction sites. Due to the limitations of current techniques, it is still a challenging task to detect protein-protein interaction sites. In this article, we presented a method based on deep learning and XGBoost (called DeepPPISP-XGB) for predicting protein-protein interaction sites. The deep learning model served as a feature extractor to remove redundant information from protein sequences. The Extreme Gradient Boosting algorithm was used to construct a classifier for predicting protein-protein interaction sites. The DeepPPISP-XGB achieved the following results: area under the receiver operating characteristic curve of 0.681, a recall of 0.624, and area under the precision-recall curve of 0.339, being competitive with the state-of-the-art methods. We also validated the positive role of global features in predicting protein-protein interaction sites.
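The metrics quoted above, the area under the ROC curve and recall, can be computed as in the sketch below. The labels, scores, and the 0.5 decision threshold are made-up values for illustration and are unrelated to the DeepPPISP-XGB results.

```python
def recall(y_true, y_pred):
    """True positives divided by all actual positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def auroc(y_true, scores):
    """Probability that a random positive scores above a random negative.

    Ties count as half a win. This pairwise form is equivalent to the
    area under the ROC curve.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-residue labels (1 = interaction site) and classifier scores.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

toy_auroc = auroc(y_true, scores)
toy_recall = recall(y_true, y_pred)
```

The pairwise form is quadratic in the number of samples, which is fine for a small illustration; a rank-based computation would be preferable on large datasets.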


Protein-protein interactions (PPIs) play a significant role in biological functions such as cell metabolism, immune response, and signal transduction. Hot spots are a small fraction of interface residues that contribute substantial binding energy in PPIs. Identifying hot spots is therefore important for discovering and analyzing molecular medicines and diseases. The current experimental strategy, alanine scanning, is not suited to large-scale applications because the technique is costly and time-consuming. Existing computational methods show poor classification performance and prediction accuracy, and they are concerned with the topological structure and gene expression of hub proteins. The proposed system focuses on hot spots of hub proteins, eliminating redundant and highly correlated features using the Pearson correlation coefficient and support vector machine based feature elimination. Extreme Gradient Boosting and LightGBM algorithms are used to combine a set of weak classifiers into a strong classifier. The proposed system shows better accuracy than the existing computational methods. The model can also be used to predict accurate molecular inhibitors for specific PPIs.
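The correlation-based part of the feature elimination described above might look like the following sketch. The feature names, the toy values, and the 0.9 threshold are assumptions for illustration; the SVM-based elimination step and the boosting ensemble are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.9):
    """Greedily keep a feature only if |r| with every kept feature is below threshold."""
    kept = []
    for name in features:  # dicts preserve insertion order
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns: f2 is an exact multiple of f1, f3 is not.
features = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10],
    "f3": [5, 1, 4, 2, 3],
}

selected = drop_correlated(features)
```

The greedy pass depends on column order: the first feature of a correlated pair is the one that survives.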


Author(s):  
João Botelho ◽  
Paulo Mascarenhas ◽  
José João Mendes ◽  
Vanessa Machado

Recent studies supported a clinical association between Parkinson’s Disease (PD) and periodontitis. Hence, investigating possible protein interactions between these two conditions is of interest. In this study, we conducted a protein-protein network interaction analysis with recognized genes encoding proteins for PD and periodontitis. Genes of interest were collected via the GWAS database. Then, we conducted a protein interaction analysis using the STRING database, with a highest confidence cut-off of 0.9. Our protein network provided a comprehensive analysis of potential protein-protein interactions between PD and periodontitis. This analysis may underpin valuable information for new candidate molecular mechanisms between PD and periodontitis and may provide new potential targets for research purposes. These results should be carefully interpreted given the limitations of this approach.
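Applying a confidence cut-off to an interaction list and then looking for links between two gene sets can be sketched as below. The gene labels and scores are invented, and the edges are assumed to carry a combined score on a 0 to 1000 scale, so that a cut-off of 0.9 corresponds to 900; this is an illustration, not the authors' workflow.

```python
def filter_edges(edges, min_score=900):
    """Keep only interactions at or above the confidence cut-off."""
    return [(a, b, s) for a, b, s in edges if s >= min_score]


def cross_edges(edges, set_a, set_b):
    """Interactions that link a protein from one gene set to one from the other."""
    return [
        (a, b, s)
        for a, b, s in edges
        if (a in set_a and b in set_b) or (a in set_b and b in set_a)
    ]


# Hypothetical gene sets and interaction scores (placeholders, not real data).
pd_genes = {"A1", "A2"}
perio_genes = {"B1", "B2"}
edges = [
    ("A1", "B1", 950),
    ("A1", "A2", 990),
    ("A2", "B2", 700),
    ("B1", "B2", 905),
]

confident = filter_edges(edges)
shared = cross_edges(confident, pd_genes, perio_genes)
```

Only the cross-set edges that survive the cut-off are candidates for a link between the two conditions.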


2021 ◽  
Vol 2072 (1) ◽  
pp. 012005
Author(s):  
M Sumanto ◽  
M A Martoprawiro ◽  
A L Ivansyah

Machine learning is a form of artificial intelligence in which a system learns automatically from experience without being explicitly programmed. The learning process starts by observing the data and then looking for patterns in it. The main purpose of this process is to make computers learn automatically. In this study, we use machine learning to predict molecular atomization energy. Of the various machine learning methods, we use two: a neural network and Extreme Gradient Boosting. Both methods have several parameters that must be tuned so that the predicted atomization energy has the lowest possible error, and we try to find suitable parameter values for both. For the neural network, finding suitable values is quite difficult because training takes a long time before one can tell whether the model is good or bad; for Extreme Gradient Boosting, training is shorter, so suitable parameter values are easier to find. This study also looks at the effects of modifying the dataset: transforming the output by normalization and standardization, removing molecules containing Br atoms, and setting an entry in the Coulomb matrix to 0 if the distance between the corresponding atoms exceeds 2 angstrom.
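The Coulomb matrix modification mentioned above can be illustrated with a short sketch. It assumes the commonly used definition (diagonal entries 0.5 Z_i^2.4, off-diagonal entries Z_i Z_j / d_ij) and takes coordinates directly in angstrom without unit conversion; the three-atom geometry is hypothetical and the authors' exact preprocessing may differ.

```python
import math


def coulomb_matrix(charges, coords, cutoff=None):
    """Coulomb matrix with optional distance cutoff.

    Off-diagonal entries whose interatomic distance exceeds `cutoff`
    are set to 0; diagonal entries are always kept.
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                d = math.dist(coords[i], coords[j])
                if cutoff is None or d <= cutoff:
                    matrix[i][j] = charges[i] * charges[j] / d
    return matrix


# Hypothetical linear arrangement: nuclear charges and positions in angstrom.
charges = [8, 1, 1]
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

full = coulomb_matrix(charges, coords)
truncated = coulomb_matrix(charges, coords, cutoff=2.0)
```

With the 2 angstrom cutoff, the pair separated by 3 angstrom is zeroed while the pair at exactly 2 angstrom is kept, since only distances exceeding the cutoff are removed.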


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Akshi Kumar ◽  
Shubham Dikshit ◽  
Victor Hugo C. Albuquerque

Sarcasm detection in dialogues has been gaining popularity among natural language processing (NLP) researchers with the increased use of conversational threads on social media. Capturing the knowledge of the domain of discourse, context propagation during the course of dialogue, and situational context and tone of the speaker are some important features to train the machine learning models for detecting sarcasm in real time. As situational comedies vibrantly represent human mannerism and behaviour in everyday real-life situations, this research demonstrates the use of an ensemble supervised learning algorithm to detect sarcasm in the benchmark dialogue dataset, MUStARD. The punch-line utterance and its associated context are taken as features to train the eXtreme Gradient Boosting (XGBoost) method. The primary goal is to predict sarcasm in each utterance of the speaker using the chronological nature of a scene. Further, it is vital to prevent model bias and help decision makers understand how to use the models in the right way. Therefore, as a twin goal of this research, we make the learning model used for conversational sarcasm detection interpretable. This is done using two post hoc interpretability approaches, Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive exPlanations (SHAP), to generate explanations for the output of a trained classifier. The classification results clearly depict the importance of capturing the intersentence context to detect sarcasm in conversational threads. The interpretability methods show the words (features) that influence the decision of the model the most and help the user understand how the model is making the decision for detecting sarcasm in dialogues.
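The attribution idea behind SHAP can be shown with exact Shapley values on a tiny model. This sketch does not use the SHAP or LIME libraries and does not reproduce the paper's classifier; the three words and the scoring function `toy_score` are hypothetical stand-ins for word features and a trained model.

```python
from itertools import permutations


def shapley_values(value_fn, players):
    """Exact Shapley values: average marginal contribution over all orderings."""
    contrib = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for p in order:
            coalition.add(p)
            cur = value_fn(frozenset(coalition))
            contrib[p] += cur - prev
            prev = cur
    return {p: c / len(orders) for p, c in contrib.items()}


def toy_score(words):
    """Hypothetical sarcasm score for a set of words present in an utterance."""
    score = 0.0
    if "great" in words:
        score += 0.6
    if "oh" in words:
        score += 0.1
    if "oh" in words and "great" in words:
        score += 0.2  # interaction term, shared between the two words
    return score


words = ["oh", "great", "weather"]
attributions = shapley_values(toy_score, words)
```

The attributions sum to the difference between the score of the full utterance and the score of the empty one, and the interaction term is split evenly between the two words that produce it. Enumerating all orderings grows factorially with the number of features, so practical tools rely on approximations.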


Author(s):  
Lara Marie Demajo ◽  
Vince Vella ◽  
Alexiei Dingli

With the recent boosted enthusiasm in Artificial Intelligence (AI) and Financial Technology (FinTech), applications such as credit scoring have gained substantial academic interest. However, despite the ever-growing achievements, the biggest obstacle in most AI systems is their lack of interpretability. This deficiency of transparency limits their application in different domains including credit scoring. Credit scoring systems help financial experts make better decisions regarding whether or not to accept a loan application so that loans with a high probability of default are not accepted. Apart from the noisy and highly imbalanced data challenges faced by such credit scoring models, recent regulations such as the `right to explanation' introduced by the General Data Protection Regulation (GDPR) and the Equal Credit Opportunity Act (ECOA) have added the need for model interpretability to ensure that algorithmic decisions are understandable and coherent. A recently introduced concept is eXplainable AI (XAI), which focuses on making black-box models more interpretable. In this work, we present a credit scoring model that is both accurate and interpretable. For classification, state-of-the-art performance on the Home Equity Line of Credit (HELOC) and Lending Club (LC) Datasets is achieved using the Extreme Gradient Boosting (XGBoost) model. The model is then further enhanced with a 360-degree explanation framework, which provides different explanations (i.e. global, local feature-based and local instance-based) that are required by different people in different situations. Evaluation through the use of functionally-grounded, application-grounded and human-grounded analysis shows that the explanations provided are simple and consistent as well as correct, effective, easy to understand, sufficiently detailed and trustworthy.


2003 ◽  
Vol 285 (2) ◽  
pp. F178-F190 ◽  
Author(s):  
Bruce C. Kone ◽  
Teresa Kuncewicz ◽  
Wenzheng Zhang ◽  
Zhi-Yuan Yu

Nitric oxide (NO) is a potent cell-signaling, effector, and vasodilator molecule that plays important roles in diverse biological effects in the kidney, vasculature, and many other tissues. Because of its high biological reactivity and diffusibility, multiple tiers of regulation, ranging from transcriptional to posttranslational controls, tightly control NO biosynthesis. Interactions of each of the major NO synthase (NOS) isoforms with heterologous proteins have emerged as a mechanism by which the activity, spatial distribution, and proximity of the NOS isoforms to regulatory proteins and intended targets are governed. Dimerization of the NOS isozymes, required for their activity, exhibits distinguishing features among these proteins and may serve as a regulated process and target for therapeutic intervention. An increasingly wide array of proteins, ranging from scaffolding proteins to membrane receptors, has been shown to function as NOS-binding partners. Neuronal NOS interacts via its PDZ domain with several PDZ-domain proteins. Several resident and recruited proteins of plasmalemmal caveolae, including caveolins, anchoring proteins, G protein-coupled receptors, kinases, and molecular chaperones, modulate the activity and trafficking of endothelial NOS in the endothelium. Inducible NOS (iNOS) interacts with the inhibitory molecules kalirin and NOS-associated protein 110 kDa, as well as activator proteins, the Rac GTPases. In addition, protein-protein interactions of proteins governing iNOS transcription function to specify activation or suppression of iNOS induction by cytokines. The calpain and ubiquitin-proteasome pathways are the major proteolytic systems responsible for the regulated degradation of NOS isozymes. The experimental basis for these protein-protein interactions, their functional importance, and potential implication for renal and vascular physiology and pathophysiology is reviewed.


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Gustav N. Sundell ◽  
Ylva Ivarsson

Phage display is a powerful technique for profiling specificities of peptide binding domains. The method is suited for the identification of high-affinity ligands with inhibitor potential when using highly diverse combinatorial peptide phage libraries. Such experiments further provide consensus motifs for genome-wide scanning of ligands of potential biological relevance. A complementary but considerably less explored approach is to display expression products of genomic DNA, cDNA, open reading frames (ORFs), or oligonucleotide libraries designed to encode defined regions of a target proteome on phage particles. One of the main applications of such proteomic libraries has been the elucidation of antibody epitopes. This review is focused on the use of proteomic phage display to uncover protein-protein interactions of potential relevance for cellular function. The method is particularly suited for the discovery of interactions between peptide binding domains and their targets. We discuss the largely unexplored potential of this method in the discovery of domain-motif interactions of potential biological relevance.


2020 ◽  
Author(s):  
Lara Marie Demajo ◽  
Vince Vella ◽  
Alexiei Dingli

With the ever-growing achievements in Artificial Intelligence (AI) and the recent boosted enthusiasm in Financial Technology (FinTech), applications such as credit scoring have gained substantial academic interest. Credit scoring helps financial experts make better decisions regarding whether or not to accept a loan application, such that loans with a high probability of default are not accepted. Apart from the noisy and highly imbalanced data challenges faced by such credit scoring models, recent regulations such as the `right to explanation' introduced by the General Data Protection Regulation (GDPR) and the Equal Credit Opportunity Act (ECOA) have added the need for model interpretability to ensure that algorithmic decisions are understandable and coherent. An interesting concept that has been recently introduced is eXplainable AI (XAI), which focuses on making black-box models more interpretable. In this work, we present a credit scoring model that is both accurate and interpretable. For classification, state-of-the-art performance on the Home Equity Line of Credit (HELOC) and Lending Club (LC) Datasets is achieved using the Extreme Gradient Boosting (XGBoost) model. The model is then further enhanced with a 360-degree explanation framework, which provides different explanations (i.e. global, local feature-based and local instance-based) that are required by different people in different situations. Evaluation through the use of functionally-grounded, application-grounded and human-grounded analysis shows that the explanations provided are simple and consistent, and that they satisfy the six predetermined hypotheses testing for correctness, effectiveness, easy understanding, detail sufficiency and trustworthiness.

