A survey on adverse drug reaction studies: data, tasks and machine learning methods

Author(s):  
Duc Anh Nguyen ◽  
Canh Hao Nguyen ◽  
Hiroshi Mamitsuka

Abstract Motivation Adverse drug reaction (ADR) or drug side effect studies play a crucial role in drug discovery. Recently, with the rapid increase of both clinical and non-clinical data, machine learning methods have emerged as prominent tools to support analyzing and predicting ADRs. Nonetheless, challenges remain in ADR studies. Results In this paper, we summarize ADR data sources and review ADR studies in three tasks: drug–ADR benchmark data creation, drug–ADR prediction and ADR mechanism analysis. We focus on the machine learning methods used in each task and then compare the performance of the methods on the drug–ADR prediction task. Finally, we discuss open problems for further ADR studies. Availability Data and code are available at https://github.com/anhnda/ADRPModels.

2021 ◽  
Author(s):  
Yu Rang Park

UNSTRUCTURED An adverse drug reaction (ADR) is an unintended response induced by a drug. It is important to determine the association between drugs and ADRs, and many methods exist to demonstrate this association. This systematic review aimed to examine the analysis tools by considering original articles that introduced statistical and machine learning methods for predicting ADRs in humans. A systematic literature review of EMBASE and PubMed was conducted based on articles published from January 2015 to March 2020. The search keywords, matched against titles and abstracts, covered statistical, machine learning, and deep learning methods for the detection of ADR signals. We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses statement guidelines. In total, 72 articles were included in the current systematic review; of these, 51 and 21 addressed statistical and machine learning methods, respectively. This study provides a graphical overview of data-driven methods for detecting ADRs with multiple data sources for patient drug safety.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11900
Author(s):  
Xinyi Liu ◽  
Yueyue Shen ◽  
Youhua Zhang ◽  
Fei Liu ◽  
Zhiyu Ma ◽  
...  

Background A moonlighting protein is a protein that can perform two or more functions. Current moonlighting protein prediction tools mainly focus on proteins in animals and microorganisms, and the differences in cells and proteins between animals and plants may cause these tools to predict plant moonlighting proteins inaccurately. Hence, a benchmark data set and a prediction tool specific to plant moonlighting proteins are needed. Methods This study used several protein feature classes from a data set constructed in house to develop a web-based prediction tool. First, we built a data set of plant proteins and removed redundant sequences. We then performed feature selection, feature normalization and feature dimensionality reduction on the training data. Next, machine learning methods for preliminary modeling were used to select the feature classes that performed best in plant moonlighting protein prediction. The selected feature classes were incorporated into the final plant protein prediction tool. After that, we compared five machine learning methods, used grid search to optimize parameters, and chose the most suitable method as the final model. Results The prediction results indicated that eXtreme Gradient Boosting (XGBoost) performed best, so it was used as the algorithm to construct the prediction tool, called IdentPMP (Identification of Plant Moonlighting Proteins). The results on the independent test set show that the area under the precision-recall curve (AUPRC) and the area under the receiver operating characteristic curve (AUC) of IdentPMP are 0.43 and 0.68, which are 19.44% (0.43 vs. 0.36) and 13.33% (0.68 vs. 0.60) higher than state-of-the-art non-plant-specific methods, respectively. This further demonstrated that a benchmark data set and a plant-specific prediction tool were required for plant moonlighting protein studies. Finally, we implemented the tool as a web version, which users can access freely at http://identpmp.aielab.net/.
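The percentage gains quoted in this abstract are relative improvements over the baseline scores. A minimal Python sketch of that arithmetic follows; the function name `relative_gain` is illustrative only and is not part of IdentPMP.

```python
def relative_gain(score: float, baseline: float) -> float:
    """Return the improvement of `score` over `baseline` as a percentage of the baseline."""
    return (score - baseline) / baseline * 100.0


# Values reported in the abstract: IdentPMP versus the best non-plant-specific method.
auprc_gain = relative_gain(0.43, 0.36)
auc_gain = relative_gain(0.68, 0.60)
print(f"AUPRC gain: {auprc_gain:.2f}%")  # AUPRC gain: 19.44%
print(f"AUC gain: {auc_gain:.2f}%")      # AUC gain: 13.33%
```

Note that these are relative gains; the absolute differences are 0.07 for AUPRC and 0.08 for AUC.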


2020 ◽  
Author(s):  
Edwin Tse ◽  
Laksh Aithani ◽  
Mark Anderson ◽  
Jonathan Cardoso-Silva ◽  
Giovanni Cincilla ◽  
...  

The discovery of new antimalarial medicines with novel mechanisms of action is key to combating the problem of increasing resistance to our frontline treatments. The Open Source Malaria (OSM) consortium has been developing compounds ("Series 4") that have potent activity against Plasmodium falciparum in vitro and in vivo and that have been suggested to act through the inhibition of PfATP4, an essential membrane ion pump that regulates the parasite’s intracellular Na+ concentration. The structure of PfATP4 is yet to be determined. In the absence of structural information about this target, a public competition was created to develop a model that would allow the prediction of anti-PfATP4 activity among Series 4 compounds, thereby reducing project costs associated with the unnecessary synthesis of inactive compounds. In the first round, in 2016, six participants used the open data collated by OSM to develop moderately predictive models using diverse methods. Notably, all submitted models were available to all other participants in real time. Since then further bioactivity data have been acquired and machine learning methods have rapidly developed, so a second round of the competition was undertaken, in 2019, again with freely-donated models that other participants could see. The best-performing models from this second round were used to predict novel inhibitory molecules, of which several were synthesised and evaluated against the parasite. One such compound, containing a motif that the human chemists familiar with this series would have dismissed as ill-advised, was active. The project demonstrated the abilities of new machine learning methods in the prediction of active compounds where there is no biological target structure, frequently the central problem in phenotypic drug discovery. Since all data and participant interactions remain in the public domain, this research project “lives” and may be improved by others.


2020 ◽  
Author(s):  
Thomas R. Lane ◽  
Daniel H. Foil ◽  
Eni Minerali ◽  
Fabio Urbina ◽  
Kimberley M. Zorn ◽  
...  

Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and applications beyond. In recent studies we have applied multiple machine learning algorithms and modeling metrics, and in some cases compared molecular descriptors, to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL in order to evaluate machine learning methods of interest to them. The largest of these types of studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from ChEMBL for use with the ECFP6 fingerprint and for comparison of our proprietary software Assay Central(TM) with random forest, k-Nearest Neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (3 levels). Model performance was assessed using an array of five-fold cross-validation metrics including area-under-the-curve, F1 score, Cohen’s kappa and Matthews correlation coefficient. Based on ranked normalized scores for the metrics or datasets, all methods appeared comparable, while the distance from the top indicated that Assay Central(TM) and support vector classification were comparable. Unlike prior studies, which have placed considerable emphasis on deep neural networks (deep learning), no advantage was seen in this case, where minimal tuning of any of the methods was performed. If anything, Assay Central(TM) may have been at a slight advantage, as the activity cutoff for each of the over 5000 datasets representing over 570,000 unique compounds was based on Assay Central(TM) performance, but support vector classification seems to be a strong competitor. We also apply Assay Central(TM) to prospective predictions for PXR and hERG to further validate these models. This work currently appears to be the largest comparison of machine learning algorithms to date. Future studies will likely evaluate additional databases, descriptors and algorithms, as well as further refining methods for evaluating and comparing models.
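Three of the metrics named in this abstract (F1 score, Cohen’s kappa and Matthews correlation coefficient) can all be derived from a binary confusion matrix. The sketch below uses the standard textbook formulas in plain Python; it is not taken from Assay Central, the function name `binary_metrics` is hypothetical, and it assumes every row and column total of the confusion matrix is nonzero.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute F1, Cohen's kappa and MCC from confusion-matrix counts.

    Assumes no row or column total is zero (otherwise a division by zero occurs).
    """
    n = tp + fp + fn + tn
    # F1: harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    # Matthews correlation coefficient: correlation between labels and predictions.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"f1": f1, "kappa": kappa, "mcc": mcc}


# Toy counts chosen for illustration only (balanced classes, 80% accuracy).
scores = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print({name: round(value, 3) for name, value in scores.items()})
```

On balanced toy data such as this, kappa and MCC coincide; they diverge as the class balance shifts, which is one reason to report several metrics side by side.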


2022 ◽  
Vol 16 (1) ◽  
pp. e0010056
Author(s):  
Emmanuelle Sylvestre ◽  
Clarisse Joachim ◽  
Elsa Cécilia-Joseph ◽  
Guillaume Bouzillé ◽  
Boris Campillo-Gimenez ◽  
...  

Background Traditionally, dengue surveillance is based on case reporting to a central health agency. However, the delay between a case and its notification can limit the system's responsiveness. Machine learning methods have been developed to reduce the reporting delays and to predict outbreaks, based on non-traditional and non-clinical data sources. The aim of this systematic review was to identify studies that used real-world data, Big Data and/or machine learning methods to monitor and predict dengue-related outcomes. Methodology/Principal findings We performed a search in PubMed, Scopus, Web of Science and grey literature between January 1, 2000 and August 31, 2020. The review (ID: CRD42020172472) focused on data-driven studies. Reviews, randomized controlled trials and descriptive studies were not included. Among the 119 studies included, 67% were published between 2016 and 2020, and 39% used at least one novel data stream. The aim of the included studies was to predict a dengue-related outcome (55%), assess the validity of data sources for dengue surveillance (23%), or both (22%). Most studies (60%) used a machine learning approach. Studies on dengue prediction compared different prediction models, or identified significant predictors among several covariates in a model. The most significant predictors were rainfall (43%), temperature (41%), and humidity (25%). The two models with the highest performances were Neural Networks and Decision Trees (52%), followed by Support Vector Machine (17%). We cannot rule out a selection bias in our study because of our two main limitations: we did not include preprints and could not obtain the opinion of other international experts. Conclusions/Significance Combining real-world data and Big Data with machine learning methods is a promising approach to improve dengue prediction and monitoring. Future studies should focus on how to better integrate all available data sources and methods to improve the response and dengue management by stakeholders.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Khushnood Abbas ◽  
Alireza Abbasi ◽  
Shi Dong ◽  
Ling Niu ◽  
Laihang Yu ◽  
...  

Abstract Background Technological and research advances have produced large volumes of biomedical data. When represented as a network (graph), these data become useful for modeling entities and interactions in biological and similar complex systems. In the field of network biology and network medicine, there is a particular interest in predicting results from drug–drug, drug–disease, and protein–protein interactions to advance the speed of drug discovery. Existing data and modern computational methods make it possible to identify potentially beneficial and harmful interactions, and therefore to narrow drug trials ahead of actual clinical trials. Such automated data-driven investigation relies on machine learning techniques. However, traditional machine learning approaches require extensive preprocessing of the data, which makes them impractical for large datasets. This study presents a wide range of machine learning methods for predicting outcomes from biomedical interactions and compares the performance of traditional methods with that of more recent network-based approaches. Results We applied 32 different network-based machine learning models to five commonly available biomedical datasets, and evaluated their performance using three important evaluation metrics, namely AUROC, AUPR, and F1-score. We achieved this by casting the link prediction problem as a binary classification problem: existing links were treated as positive examples, and negative examples were randomly sampled from the set of nonexistent links. After experimental evaluation, we found that Prone, ACT and $LRW_5$ are the top three performers on all five datasets. Conclusions This work presents a comparative evaluation of network-based machine learning algorithms for predicting network links, with applications in the prediction of drug–target and drug–drug interactions, and applies well-known network-based machine learning methods. This work can help guide researchers in selecting appropriate machine learning methods for pharmaceutical tasks.
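The evaluation setup described in this abstract, in which link prediction is treated as binary classification with randomly sampled non-links as negatives, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy graph, the held-out edges, and the common-neighbors scorer are stand-ins chosen here, not one of the 32 models or five datasets from the study, and all function names are hypothetical.

```python
import itertools
import random


def sample_negatives(nodes, edges, k, seed=0):
    """Randomly pick k node pairs that do not appear in the observed edge set."""
    observed = {frozenset(e) for e in edges}
    candidates = [p for p in itertools.combinations(sorted(nodes), 2)
                  if frozenset(p) not in observed]
    return random.Random(seed).sample(candidates, k)


def build_adjacency(nodes, edges):
    """Undirected adjacency sets built from a list of edges."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors(adj, u, v):
    """Stand-in link score: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


# Toy graph (assumption): two 4-node cliques joined by the bridge edge (3, 4).
nodes = list(range(8))
edges = (list(itertools.combinations(range(4), 2))
         + list(itertools.combinations(range(4, 8), 2))
         + [(3, 4)])
test_pos = [(0, 1), (4, 5)]                      # held-out positive links
train = [e for e in edges if e not in test_pos]  # links visible to the scorer
test_neg = sample_negatives(nodes, edges, k=len(test_pos))

adj = build_adjacency(nodes, train)
pos_scores = [common_neighbors(adj, u, v) for u, v in test_pos]
neg_scores = [common_neighbors(adj, u, v) for u, v in test_neg]
print("AUROC:", auroc(pos_scores, neg_scores))
```

Holding the test links out of the training graph matters: scoring a link with an adjacency structure that already contains it would leak the label into the features.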


2021 ◽  
pp. 11-21
Author(s):  
Miki Sirola ◽  
John Einar Hulsund

In the Long-Term Degradation Management (LTDM) project, we approach component ageing problems with data-analysis methods. The work includes a literature review of related work. We have used several data sources: water chemistry data from the Halden reactor, simulator data from the HAMBO simulator, and data from a local coffee machine instrumented with sensors. K-means clustering is used for cluster analysis of nuclear power plant data. A method for detecting trends in selected clusters is developed. Prognosis models are developed and tested, using ARIMA models and gamma processes. The focus is on tasks such as classification and time-series prediction. Methodologies are tested in experiments. Practical applications are implemented with Jupyter Notebook and the Python 3 programming language. Failure rates and drifts from normal operating states can be the first symptoms of an approaching fault. The problem is to find data sources with enough transients and events to create prognostic models. Prognosis models for predicting possible developing ageing features in nuclear power plant data, using machine learning or closely related methods, are demonstrated.
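As a rough illustration of the K-means step mentioned in this abstract, the following Python 3 sketch runs Lloyd's algorithm on one-dimensional readings. It is not the LTDM project's code: the function names, the sample values, and the fixed initial centers are all assumptions made for the example.

```python
def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            for v in values]


def kmeans_1d(values, init_centers, max_iter=50):
    """Lloyd's algorithm in one dimension: alternate assignment and mean update."""
    centers = list(init_centers)
    for _ in range(max_iter):
        labels = assign(values, centers)
        updated = []
        for i, center in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == i]
            # Keep the old center if a cluster ends up empty.
            updated.append(sum(members) / len(members) if members else center)
        if updated == centers:  # converged: centers no longer move
            break
        centers = updated
    return centers, assign(values, centers)


# Hypothetical sensor readings forming two operating regimes.
readings = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, labels = kmeans_1d(readings, init_centers=[0.0, 6.0])
print(centers, labels)
```

Real plant data would be multivariate and would call for a library implementation with several random restarts; the fixed initial centers here only keep the example deterministic.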

