Learning predictive models of drug side-effect relationships from distributed representations of literature-derived semantic predications

Abstract. Objective: The aim of this work is to leverage relational information extracted from biomedical literature using a novel synthesis of unsupervised pretraining, representational composition, and supervised machine learning for drug safety monitoring. Methods: Using ≈80 million concept-relationship-concept triples extracted from the literature using the SemRep Natural Language Processing system, distributed vector representations (embeddings) were generated for concepts as functions of their relationships utilizing two unsupervised representational approaches. Embeddings for drugs and side effects of interest from two widely used reference standards were then composed to generate embeddings of drug/side-effect pairs, which were used as input for supervised machine learning. This methodology was developed and evaluated using cross-validation strategies and compared to contemporary approaches. To qualitatively assess generalization, models trained on the Observational Medical Outcomes Partnership (OMOP) drug/side-effect reference set were evaluated against a list of ≈1100 drugs from an online database. Results: The employed method improved performance over previous approaches. Cross-validation results advance the state of the art (AUC 0.96; F1 0.90 and AUC 0.95; F1 0.84 across the two sets), outperforming methods utilizing literature and/or spontaneous reporting system data. Examination of predictions for unseen drug/side-effect pairs indicates the ability of these methods to generalize, with over tenfold label support enrichment in the top 100 predictions versus the bottom 100 predictions. Discussion and Conclusion: Our methods can assist the pharmacovigilance process using information from the biomedical literature. Unsupervised pretraining generates a rich relationship-based representational foundation for machine learning techniques to classify drugs in the context of a putative side effect, given known examples.
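
The composition step described in the Methods can be illustrated with a minimal sketch. The abstract does not say which composition operator was used, so the concatenation-plus-elementwise-product scheme below, the function name `compose_pair`, and the toy three-dimensional vectors are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Sequence


def compose_pair(drug_vec: Sequence[float], effect_vec: Sequence[float]) -> List[float]:
    """Build one feature vector for a drug/side-effect pair.

    Assumed scheme (not taken from the paper): concatenate both
    embeddings and append their elementwise product.
    """
    if len(drug_vec) != len(effect_vec):
        raise ValueError("embeddings must have the same dimensionality")
    product = [d * e for d, e in zip(drug_vec, effect_vec)]
    return list(drug_vec) + list(effect_vec) + product


# Toy embeddings; real ones would come from unsupervised pretraining.
drug = [0.5, -1.0, 2.0]
effect = [2.0, 0.5, 1.0]
features = compose_pair(drug, effect)
```

A vector such as `features` would then be passed, together with a label from the reference standard, to any supervised classifier.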

Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature

BMC Bioinformatics ◽

10.1186/1471-2105-16-s5-s6 ◽

2015 ◽

Vol 16 (S5) ◽

Cited By ~ 10

Author(s):

Rong Xu ◽

QuanQiu Wang

Keyword(s):

Machine Learning ◽

Side Effect ◽

Large Scale ◽

Biomedical Literature ◽

Supervised Machine Learning ◽

Free Text ◽

Learning Approach ◽

Drug Side Effect ◽

Machine Learning Approach

Machine Learning Approaches to Retrieve High-Quality, Clinically Relevant, Evidence from the Biomedical Literature: A Systematic Review (Preprint)

10.2196/preprints.30401 ◽

2021 ◽

Author(s):

Wael Abdelkader ◽

Tamara Navarro ◽

Rick Parrish ◽

Chris Cotoi ◽

Federico Germini ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Strong Evidence ◽

Clinical Care ◽

Biomedical Literature ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Learning Approaches ◽

High Quality ◽

Applied Machine Learning

BACKGROUND The rapid growth of the biomedical literature makes identifying strong evidence a time-consuming task. Applying machine learning to the process could be a viable solution that limits effort while maintaining accuracy. OBJECTIVE To summarize the nature and comparative performance of machine learning approaches that have been applied to retrieve high-quality evidence for clinical consideration from the biomedical literature. METHODS We conducted a systematic review of studies that applied machine learning techniques to identify high-quality clinical articles in the biomedical literature. Multiple databases were searched to July 2020. Extracted data focused on the applied machine learning model, steps in the development of the models, and model performance. RESULTS From 3918 retrieved studies, 10 met our inclusion criteria. All followed a supervised machine learning approach and applied, from a limited range of options, a high-quality standard for the training of their model. The results show that machine learning can achieve a sensitivity of 95% while maintaining a high precision of 86%. CONCLUSIONS Applying machine learning to distinguish studies with strong evidence for clinical care has the potential to decrease the workload of manually identifying these. The evidence base is active and evolving. Reported methods were variable across the studies but focused on supervised machine learning approaches. Performance may improve by applying more sophisticated approaches such as active learning, auto-machine learning, and unsupervised machine learning approaches.
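
The two figures quoted in the Results, sensitivity and precision, are both ratios over confusion-matrix counts. The sketch below shows the arithmetic; the counts are invented for illustration and are not data from the review.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly relevant articles that the classifier retrieves."""
    return tp / (tp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of retrieved articles that are truly relevant."""
    return tp / (tp + fp)


# Hypothetical counts: 95 relevant articles found, 5 missed, 15 false alarms.
sens = sensitivity(tp=95, fn=5)
prec = precision(tp=95, fp=15)
```

With these made-up counts, `sens` is 0.95 and `prec` is roughly 0.86, which shows how a screening model can miss few relevant articles while still returning some irrelevant ones.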

Machine Learning Techniques Application

Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing ◽

10.4018/978-1-7998-5339-8.ch068 ◽

2021 ◽

pp. 1396-1417

Author(s):

Karthikeyan P. ◽

Karunakaran Velswamy ◽

Pon Harshavardhanan ◽

Rajagopal R. ◽

JeyaKrishnan V. ◽

...

Keyword(s):

Machine Learning ◽

Language Processing ◽

Character Recognition ◽

Optical Character Recognition ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Modern World ◽

Interdisciplinary Field ◽

Sound Image ◽

Learning Techniques

Machine learning is the branch of artificial intelligence that enables machines to learn without being explicitly programmed. Machine learning applications have shaped the modern world. Machine learning techniques are mainly classified into three categories: supervised, unsupervised, and semi-supervised. Machine learning is an interdisciplinary field that can be applied in different areas, including science, business, and research. Supervised techniques are applied in agriculture, email spam and malware filtering, online fraud detection, optical character recognition, natural language processing, and face detection. Unsupervised techniques are applied in market segmentation, sentiment analysis, and anomaly detection. Deep learning is being used for sound, image, video, time series, and text data. This chapter covers applications of various machine learning techniques in social media, agriculture, and task scheduling in a distributed system.

Algebraic Shortcuts for Leave-One-Out Cross-Validation in Supervised Network Inference

10.1101/242321 ◽

2018 ◽

Author(s):

Michiel Stock ◽

Tapio Pahikkala ◽

Antti Airola ◽

Willem Waegeman ◽

Bernard De Baets

Keyword(s):

Machine Learning ◽

Biological Networks ◽

Regulatory Networks ◽

Network Inference ◽

Cross Validation ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Ligand Interaction ◽

Learning Techniques ◽

Leave One Out

Abstract. Motivation: Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using the model to either predict new interactions in a given network or to predict interactions for a new vertex not present in the original network. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. Results: We present a series of leave-one-out cross-validation shortcuts to rapidly estimate the performance of state-of-the-art kernel-based network inference techniques. Availability: The machine learning techniques with the algebraic shortcuts are implemented in the RLScore software package.
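
To give a flavour of what an algebraic leave-one-out shortcut looks like, the sketch below uses the classical identity for ridge regression with a single feature and no intercept: the held-out prediction for sample i equals (ŷ_i − h_i·y_i) / (1 − h_i), where h_i is the i-th diagonal entry of the hat matrix. This is a deliberately simplified stand-in, not the kernel-based network inference shortcuts of the paper and not the RLScore API; all function names and data are hypothetical.

```python
from typing import List


def ridge_weight(xs: List[float], ys: List[float], lam: float) -> float:
    """Closed-form ridge solution for one feature, no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def loo_brute_force(xs: List[float], ys: List[float], lam: float) -> List[float]:
    """Refit the model n times, each time leaving one sample out."""
    preds = []
    for i in range(len(xs)):
        w = ridge_weight(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        preds.append(xs[i] * w)
    return preds


def loo_shortcut(xs: List[float], ys: List[float], lam: float) -> List[float]:
    """Same held-out predictions from a single fit on all samples."""
    denom = sum(x * x for x in xs) + lam
    w = ridge_weight(xs, ys, lam)
    preds = []
    for x, y in zip(xs, ys):
        h = x * x / denom  # diagonal entry of the hat matrix
        preds.append((x * w - h * y) / (1.0 - h))
    return preds


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
slow = loo_brute_force(xs, ys, lam=0.5)
fast = loo_shortcut(xs, ys, lam=0.5)
```

Both functions return the same held-out predictions up to floating-point error, but the shortcut fits the model once instead of once per sample.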

Machine Learning Techniques Application

Handbook of Research on Applications and Implementations of Machine Learning Techniques - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-9902-9.ch020 ◽

2020 ◽

pp. 380-401

Author(s):

Karthikeyan P. ◽

Karunakaran Velswamy ◽

Pon Harshavardhanan ◽

Rajagopal R. ◽

JeyaKrishnan V. ◽

...

Keyword(s):

Machine Learning ◽

Language Processing ◽

Character Recognition ◽

Optical Character Recognition ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Modern World ◽

Interdisciplinary Field ◽

Sound Image ◽

Learning Techniques

Machine learning approach for the binary classification of biomedical literature

10.21203/rs.3.rs-16326/v1 ◽

2020 ◽

Author(s):

Anna Price ◽

Matthew Mort ◽

David N. Cooper ◽

Kevin E. Ashelford

Keyword(s):

Machine Learning ◽

Language Processing ◽

Full Text ◽

Biomedical Literature ◽

Machine Learning Techniques ◽

Validation Dataset ◽

Learning Models ◽

Clinical Databases ◽

Abstract Data ◽

Machine Learning Models

Abstract Background: We have applied machine learning techniques to automate the screening of biomedical literature prior to the manual curation of clinical databases such as performed by the Human Gene Mutation Database (HGMD). Methods: We have developed two machine learning models, one based on title and abstract data only, the other on the full text of the article. The models were built using a Natural Language Processing (NLP) pipeline and a logistic regression classifier. Our pipelines are implemented in Python and can be run using Docker. They are made available to the wider community via GitHub (https://github.com/annacprice/nlp-bio-tools) and Docker Hub. Results: During testing, both models performed well, correctly predicting HGMD-relevant articles more than 93% of the time and correctly discarding irrelevant articles more than 96% of the time, with Matthews Correlation Coefficients (MCCs) of over 0.89. Evaluation of the finalised model using an unseen validation dataset demonstrated that the full text model correctly predicted HGMD-relevant articles more than 97% of the time, an accuracy 9.5% higher than that obtained with the title/abstract model. Conclusions: Through this work we have demonstrated that machine learning models can act as an effective pre-screen of biomedical literature, with the results indicating that a full text approach to screening biomedical literature is preferable to using just the title/abstract data.
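
The Matthews Correlation Coefficient reported above combines all four cells of a binary confusion matrix into a single score between -1 and 1. A minimal sketch of the standard formula follows; the function name `mcc` and the counts are hypothetical and are not taken from the paper or its repository.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative counts only: 100 relevant and 100 irrelevant articles.
score = mcc(tp=93, tn=96, fp=4, fn=7)
```

Unlike plain accuracy, the score stays informative when relevant and irrelevant articles are unevenly represented, which is typical of literature screening.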

PHRASE STRUCTURE BASED ENGLISH TO KANNADA SENTENCE TRANSLATION

International Journal of Computer and Communication Technology ◽

10.47893/ijcct.2017.1407 ◽

2017 ◽

pp. 96-100

Author(s):

SHARANBASAPPA HONNASHETTY ◽

MALLAMMA V REDDY ◽

DR. M. HANUMANTHAPPA

Keyword(s):

Language Processing ◽

Processing System ◽

Phrase Structure ◽

Tree Structure ◽

Supervised Machine Learning ◽

Syntactic Analysis ◽

Phrase Structure Grammar ◽

Natural Language Processing System ◽

Part Of Speech ◽

Novel Approach

To build a natural language processing system, words must first be placed into a structured form that leads to a syntactically correct sentence. Syntactic analysis of a sentence is performed by a parsing technique. This paper explores a novel approach in which the shift-reduce parsing technique is used to translate English sentences into grammatically correct Kannada sentences by reordering the English parse tree structure and by generating and implementing a phrase structure grammar (PSG) for Kannada sentences. The recursive descent parsing technique is used to generate the English phrase tree structure, terminal symbols are tagged with equivalent Kannada words, and the shift-reduce parsing technique is then used to construct a Kannada sentence. A part-of-speech (POS) tagger is used to tag Kannada words to English words. It is implemented using a supervised machine learning approach.
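
For readers unfamiliar with shift-reduce parsing, the following sketch shows the core loop on a toy phrase structure grammar. The grammar, the tag names, and the English example sentence are assumptions made for illustration; the paper's actual grammar, Kannada lexicon, and reordering rules are not reproduced here.

```python
from typing import List, Tuple

# Hypothetical toy grammar: (left-hand side, right-hand side).
RULES = [
    ("S", ("NP", "VP")),
    ("VP", ("V", "NP")),
    ("NP", ("Det", "N")),
]


def shift_reduce(tagged: List[Tuple[str, str]]) -> list:
    """Parse (word, tag) pairs; return the final stack of tree nodes.

    Each node is (label, word) for a leaf or (label, children) otherwise.
    """
    stack: list = []
    for word, tag in tagged:
        stack.append((tag, word))  # shift the next token onto the stack
        reduced = True
        while reduced:  # reduce greedily while any rule matches the stack top
            reduced = False
            for lhs, rhs in RULES:
                n = len(rhs)
                top_labels = tuple(node[0] for node in stack[-n:])
                if len(stack) >= n and top_labels == rhs:
                    children = stack[-n:]
                    del stack[-n:]
                    stack.append((lhs, children))
                    reduced = True
                    break
    return stack


sentence = [("the", "Det"), ("dog", "N"), ("saw", "V"), ("a", "Det"), ("cat", "N")]
parse = shift_reduce(sentence)
```

A successful parse leaves a single `S` node on the stack. A greedy strategy like this one works for the toy grammar but would need conflict resolution for a realistic one.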

Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

10.26434/chemrxiv.5513581.v1 ◽

2017 ◽

Author(s):

Sabrina Jaeger ◽

Simone Fulle ◽

Samo Turk

Keyword(s):

Machine Learning ◽

Language Processing ◽

Supervised Machine Learning ◽

Learning Approach ◽

Learning Approaches ◽

Unsupervised Machine Learning ◽

Feature Representations ◽

Machine Learning Approach ◽

The Individual ◽

Vector Representations

Inspired by natural language processing techniques, we here introduce Mol2vec, which is an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to the Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also be easily used for proteins with low sequence similarities.
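
The compound-encoding step, summing the vectors of a compound's substructures, can be sketched in a few lines. The substructure identifiers, the two-dimensional vectors, and the choice to skip unknown substructures are assumptions for illustration only; this is not the Mol2vec package's API.

```python
from typing import Dict, List, Sequence


def embed_compound(
    substructures: Sequence[str],
    vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Encode a compound as the sum of its substructure vectors."""
    total = [0.0] * dim
    for identifier in substructures:
        vec = vectors.get(identifier)
        if vec is None:
            continue  # assumption: ignore substructures without a vector
        for i, value in enumerate(vec):
            total[i] += value
    return total


# Hypothetical pre-trained substructure embeddings.
toy_vectors = {"sub_a": [1.0, 0.0], "sub_b": [0.5, 2.0]}
compound = embed_compound(["sub_a", "sub_b", "sub_a", "sub_x"], toy_vectors, dim=2)
```

Because a substructure that occurs twice contributes twice, the resulting dense vector reflects substructure counts as well as identity.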

Application of Machine Learning Techniques to Predict Binding Affinity for Drug Targets: A Study of Cyclin-Dependent Kinase 2

Current Medicinal Chemistry ◽

10.2174/2213275912666191102162959 ◽

2020 ◽

Vol 28 (2) ◽

pp. 253-265 ◽

Cited By ~ 3

Author(s):

Gabriela Bitencourt-Ferreira ◽

Amauri Duarte da Silva ◽

Walter Filgueira de Azevedo

Keyword(s):

Machine Learning ◽

Binding Affinity ◽

Predictive Performance ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Scoring Functions ◽

Cyclin Dependent Kinase ◽

Learning Models ◽

Learning Techniques ◽

Machine Learning Models

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed to identify new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and Autodock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.

Predictive Modelling of Employee Turnover in Indian IT Industry Using Machine Learning Techniques

Vision The Journal of Business Perspective ◽

10.1177/0972262918821221 ◽

2019 ◽

Vol 23 (1) ◽

pp. 12-21 ◽

Cited By ~ 2

Author(s):

Shikha N. Khera ◽

Divya

Keyword(s):

Machine Learning ◽

Learning Algorithm ◽

Confusion Matrix ◽

Predictive Modelling ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

It Industry ◽

Knowledge Based ◽

Employee Attrition

The information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based losses to the companies. The aim of this research is to develop a model to predict employee attrition and provide organizations with opportunities to address any issue and improve retention. A predictive model was developed based on a supervised machine learning algorithm, the support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from the Human Resource databases of three IT companies in India, including their employment status (response variable) at the time of collection. Accuracy results from the confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. The results also show that the model performs better at predicting who will leave the firm than at predicting who will not.
