Target identification of drug candidates with machine-learning algorithms: how to choose negative examples for training

2021 ◽  
Vol 22 (10) ◽  
pp. 5118
Author(s):  
Matthieu Najm ◽  
Chloé-Agathe Azencott ◽  
Benoit Playe ◽  
Véronique Stoven

Abstract (1) Background: Identification of the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine-learning algorithms can help accelerate this search, limiting the number of required experiments. However, the drug-target interaction databases used for training present a strong statistical bias, leading to a high number of false positive predicted targets and thus increasing the time and cost of experimental validation campaigns. (2) Methods: To minimize the number of false positives among predicted targets, we propose a new scheme for choosing negative examples, so that each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for three specific drugs, and more globally for 200 approved drugs. (3) Results: For the three detailed drug examples, and for the larger set of 200 drugs, training with the proposed scheme for the choice of negative examples improved target prediction: the average number of false positives among the top-ranked predicted targets decreased, and overall the rank of the true targets improved. (4) Conclusion: Our method corrects the databases' statistical bias and reduces the number of false positive predictions, and therefore the number of useless experiments potentially undertaken.
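The balancing scheme described above (each drug and each protein appearing equally often in positive and negative examples) can be sketched as follows. This is a minimal, hypothetical illustration of the sampling idea, not the authors' actual code; the function name and data layout are assumptions.

```python
import random
from collections import Counter

def balanced_negatives(positives, drugs, proteins, seed=0):
    """Sample negative (drug, protein) pairs so that each drug and each
    protein appears as often in negatives as in positives.
    Hypothetical sketch of the balancing scheme, not the authors' code."""
    rng = random.Random(seed)
    pos = set(positives)
    drug_need = Counter(d for d, _ in positives)   # negatives owed per drug
    prot_need = Counter(p for _, p in positives)   # negatives owed per protein
    negatives = []
    for d in drugs:
        for _ in range(drug_need[d]):
            # prefer proteins that still owe negatives and are not known partners
            candidates = [p for p in proteins
                          if prot_need[p] > 0 and (d, p) not in pos]
            if not candidates:
                candidates = [p for p in proteins if (d, p) not in pos]
            p = rng.choice(candidates)
            prot_need[p] -= 1
            negatives.append((d, p))
    return negatives
```

Under this construction, a classifier trained on positives plus these negatives cannot exploit the per-drug or per-protein frequency bias that a naive random negative sample would leave in the data.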


2012 ◽  
pp. 830-850
Author(s):  
Abhilash Alexander Miranda ◽  
Olivier Caelen ◽  
Gianluca Bontempi

This chapter presents a comprehensive scheme for automated detection of colorectal polyps in computed tomography colonography (CTC), with particular emphasis on robust learning algorithms that differentiate polyps from non-polyp shapes. The authors' automated CTC scheme introduces two orientation-independent features that encode the shape characteristics, enabling classification of polyps and non-polyps with high accuracy, a low false positive rate, and low computational cost, making the scheme suitable for colorectal cancer screening initiatives. Experiments using state-of-the-art machine learning algorithms, viz. lazy learning, support vector machines, and naïve Bayes classifiers, show the robustness of the two features in detecting polyps at 100% sensitivity for polyps with diameter greater than 10 mm, while attaining low total false positive rates of 3.05, 3.47, and 0.71 per CTC dataset, respectively, at specificities above 99% when tested on 58 CTC datasets. The results were validated against colonoscopy reports provided by expert radiologists.


Author(s):  
David J Armstrong ◽  
Jevgenij Gamper ◽  
Theodoros Damoulas

Abstract Over 30% of the ∼4000 known exoplanets to date have been discovered using 'validation', where the statistical likelihood of a transit arising from a false positive (FP), non-planetary scenario is calculated. For the large majority of these validated planets, calculations were performed using the vespa algorithm (Morton et al. 2016). Regardless of the strengths and weaknesses of vespa, it is highly desirable for the catalogue of known planets not to be dependent on a single method. We demonstrate the use of machine learning algorithms, specifically a Gaussian process classifier (GPC) reinforced by other models, to perform probabilistic planet validation incorporating prior probabilities for possible FP scenarios. The GPC attains a mean log-loss per sample of 0.54 when separating confirmed planets from FPs in the Kepler threshold crossing event (TCE) catalogue. Our models can validate thousands of unseen candidates in seconds once the applicable vetting metrics are calculated, and can be adapted to work with the active TESS mission, where the large number of observed targets necessitates the use of automated algorithms. We discuss the limitations and caveats of this methodology and, after accounting for possible failure modes, newly validate 50 Kepler candidates as planets, sanity-checking the validations by confirming them with vespa using up-to-date stellar information. Concerning discrepancies with vespa arise for many other candidates, which typically resolve in favour of our models. Given such issues, we caution against using single-method planet validation with either method until the discrepancies are fully understood.
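The probabilistic-classification step of this approach can be sketched with scikit-learn's Gaussian process classifier. The synthetic features below are hypothetical stand-ins for per-candidate vetting metrics (the real pipeline uses metrics computed from Kepler/TESS light curves); label 1 denotes a planet, 0 a false positive.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
# Hypothetical vetting-metric features for 120 candidates.
n = 120
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(int)

# Fit a GPC on 80 candidates; score the remaining 40.
clf = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
                                random_state=0)
clf.fit(X[:80], y[:80])
proba = clf.predict_proba(X[80:])[:, 1]   # probability of the planet class
loss = log_loss(y[80:], proba)            # the paper's evaluation metric
```

The GPC's posterior probabilities can then be combined with scenario priors, as the abstract describes, to obtain a final validation probability per candidate.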



2020 ◽  
Author(s):  
T.V. Sundar ◽  
K. Menaka ◽  
G. Vinotha

Abstract The COVID-19 pandemic is now almost half a year old and still disrupts human routine and peace to an unimaginable extent. As approved vaccines have yet to be developed and standard therapeutic procedures against the new coronavirus have not been established, several treatment modalities are being suggested and tried by the scientific community. Many of these follow a drug repurposing approach, since a repurposed remedy could prevent a great amount of loss in a shorter span of time. In this context, we report our attempt to identify a solution to this malady with a similar strategy. We used machine learning algorithms and the structural information of already approved drugs to identify potential therapeutics for managing the COVID-19 crisis. The experiments were performed with a group of 77 antiviral molecules (for the training phase of machine learning) and another group comprising 9 antivirals and 11 antimalarials (for the testing phase). All the chosen molecules are approved drugs with significant action against viruses. The identified molecules were validated by docking studies with recently released crystal structures of the coronavirus. The binding affinities of the tested small molecules with three selected severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) structures were computed and compared with the affinity scores of five other medications, viz. hydroxychloroquine, favipiravir, dexamethasone, dichlorobenzyl alcohol, and amyl metacresol, and the results were subjected to a statistical test of ANOVA. The predicted therapeutics, in conjunction with their already established characteristics, could be further evaluated in approved clinical trials to determine their efficacy against COVID-19 infection.
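The final statistical step, comparing groups of binding-affinity scores with one-way ANOVA, can be sketched with SciPy. The score lists below are hypothetical placeholders; real values would come from the docking runs against the SARS-CoV-2 structures.

```python
from scipy.stats import f_oneway

# Hypothetical docking scores (kcal/mol, more negative = stronger binding)
# for three drug groups against one SARS-CoV-2 structure.
group_a = [-7.1, -6.8, -7.4, -6.9]
group_b = [-6.2, -6.0, -6.5, -6.1]
group_c = [-7.0, -7.3, -6.7, -7.2]

# One-way ANOVA: do the group means differ significantly?
f_stat, p_value = f_oneway(group_a, group_b, group_c)
```

A small p-value indicates that at least one drug group binds differently from the others; post-hoc pairwise tests would then identify which.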


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. First, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosted trees, and random forest) are employed, and the impact of each variable group is assessed by comparing model performance. Then the number of target classes is increased to five and eight, and the results are compared with previously developed models in the literature.
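The study's setup, discretizing attendance into classes and comparing several classifiers, can be sketched with scikit-learn. The features and attendance values below are synthetic stand-ins for the real 1559-movie dataset, and the model choices mirror the four algorithm families named above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
# Hypothetical pre-release features (budget, screens, genre code, ...).
X = rng.normal(size=(300, 5))
attendance = X[:, 0] * 2 + X[:, 1] + rng.normal(size=300)
# Discretize attendance into three equal-frequency classes (0, 1, 2).
y = np.digitize(attendance, np.quantile(attendance, [1 / 3, 2 / 3]))

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "neural_net": MLPClassifier(max_iter=500, random_state=0),
}
# Mean 3-fold cross-validated accuracy per model.
scores = {name: cross_val_score(m, X, y, cv=3).mean()
          for name, m in models.items()}
```

Re-running this comparison while adding or dropping a feature group shows that group's impact on performance, which is the comparison the study performs.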


2020 ◽  
pp. 1-11
Author(s):  
Jie Liu ◽  
Lin Lin ◽  
Xiufang Liang

The online English teaching system places certain requirements on intelligent scoring, and the most difficult stage of intelligent scoring in English tests is scoring English compositions with an intelligent model. To improve the intelligence of English composition scoring, this study builds on machine learning algorithms, combines them with intelligent image recognition technology, and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, to verify that the proposed algorithm model meets the requirements of composition scoring, that is, to verify the feasibility of the algorithm, the performance of the proposed model is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are input into the model as constraints. The results show that the proposed algorithm has practical effect and can be applied to English assessment systems and to online homework evaluation systems.


2019 ◽  
Vol 1 (2) ◽  
pp. 78-80
Author(s):  
Eric Holloway

Detecting some patterns is a simple task for humans, but nearly impossible for current machine learning algorithms. Here, the "checkerboard" pattern is examined, where human prediction nears 100% and machine prediction drops significantly below 50%.
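The gap described above can be illustrated with a small experiment: on a checkerboard labeling, a linear classifier hovers near chance, while the one-line parity rule a human would state is perfect. This is a hedged sketch of the phenomenon, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Checkerboard labels: parity of the integer cell a 2-D point falls in.
X = rng.uniform(0, 8, size=(2000, 2))
y = ((np.floor(X[:, 0]) + np.floor(X[:, 1])).astype(int)) % 2

# A linear model cannot represent the alternating pattern: near-chance accuracy.
clf = LogisticRegression().fit(X[:1500], y[:1500])
acc = clf.score(X[1500:], y[1500:])

# The parity rule a human would immediately state classifies perfectly.
rule = ((np.floor(X[1500:, 0]) + np.floor(X[1500:, 1])).astype(int)) % 2
rule_acc = float((rule == y[1500:]).mean())
```

More flexible models (kernel methods, deep networks) can fit a checkerboard given enough data, but the contrast with the trivially stated human rule is the point of the example.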

