Target identification of drug candidates with machine-learning algorithms: how to choose negative examples for training

2021 ◽  
Vol 22 (10) ◽  
pp. 5118
Author(s):  
Matthieu Najm ◽  
Chloé-Agathe Azencott ◽  
Benoit Playe ◽  
Véronique Stoven

Abstract (1) Background: Identification of the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine-learning algorithms can help accelerate this search, limiting the number of required experiments. However, the drug-target interaction databases used for training present a strong statistical bias, leading to a high number of false positive predicted targets and thus increasing the time and cost of experimental validation campaigns. (2) Methods: To minimize the number of false positives among predicted targets, we propose a new scheme for choosing negative examples, so that each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for three specific drugs, and more globally for 200 approved drugs. (3) Results: For the three detailed drug examples, and for the larger set of 200 drugs, training with the proposed scheme for the choice of negative examples improved target prediction: the average number of false positives among the top-ranked predicted targets decreased, and overall the rank of the true targets improved. (4) Conclusion: Our method corrects the databases' statistical bias and reduces the number of false positive predictions, and therefore the number of useless experiments potentially undertaken.
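The balancing scheme described above (each drug and each protein appearing equally often in positive and negative examples) can be sketched as follows. This is a minimal, hypothetical illustration of the sampling idea, not the authors' actual code; the function name and data layout are assumptions.

```python
import random
from collections import Counter

def balanced_negatives(positives, drugs, proteins, seed=0):
    """Sample negative (drug, protein) pairs so that each drug and each
    protein appears as often in negatives as in positives.
    Hypothetical sketch of the balancing scheme, not the authors' code."""
    rng = random.Random(seed)
    pos = set(positives)
    drug_need = Counter(d for d, _ in positives)   # negatives owed per drug
    prot_need = Counter(p for _, p in positives)   # negatives owed per protein
    negatives = []
    for d in drugs:
        for _ in range(drug_need[d]):
            # prefer proteins that still owe negatives and are not known partners
            candidates = [p for p in proteins
                          if prot_need[p] > 0 and (d, p) not in pos]
            if not candidates:
                candidates = [p for p in proteins if (d, p) not in pos]
            p = rng.choice(candidates)
            prot_need[p] -= 1
            negatives.append((d, p))
    return negatives
```

Under this construction, a classifier trained on positives plus these negatives cannot exploit the per-drug or per-protein frequency bias that a naive random negative sample would leave in the data.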


2012 ◽  
pp. 830-850
Author(s):  
Abhilash Alexander Miranda ◽  
Olivier Caelen ◽  
Gianluca Bontempi

This chapter presents a comprehensive scheme for automated detection of colorectal polyps in computed tomography colonography (CTC), with particular emphasis on robust learning algorithms that differentiate polyps from non-polyp shapes. The authors' automated CTC scheme introduces two orientation-independent features that encode the shape characteristics, enabling classification of polyps and non-polyps with high accuracy, a low false positive rate, and low computational cost, making the scheme suitable for colorectal cancer screening initiatives. Experiments using state-of-the-art machine learning algorithms, viz. lazy learning, support vector machines, and naïve Bayes classifiers, show the robustness of the two features in detecting polyps at 100% sensitivity for polyps with diameter greater than 10 mm, while attaining low total false positive rates of 3.05, 3.47, and 0.71 per CTC dataset, respectively, at specificities above 99% when tested on 58 CTC datasets. The results were validated against colonoscopy reports provided by expert radiologists.


Author(s):  
David J Armstrong ◽  
Jevgenij Gamper ◽  
Theodoros Damoulas

Abstract Over 30% of the ∼4000 known exoplanets to date have been discovered using 'validation', where the statistical likelihood of a transit arising from a false positive (FP), non-planetary scenario is calculated. For the large majority of these validated planets, calculations were performed using the vespa algorithm (Morton et al. 2016). Regardless of the strengths and weaknesses of vespa, it is highly desirable for the catalogue of known planets not to be dependent on a single method. We demonstrate the use of machine learning algorithms, specifically a Gaussian process classifier (GPC) reinforced by other models, to perform probabilistic planet validation incorporating prior probabilities for possible FP scenarios. The GPC attains a mean log-loss per sample of 0.54 when separating confirmed planets from FPs in the Kepler threshold crossing event (TCE) catalogue. Our models can validate thousands of unseen candidates in seconds once the applicable vetting metrics are calculated, and can be adapted to work with the active TESS mission, where the large number of observed targets necessitates the use of automated algorithms. We discuss the limitations and caveats of this methodology and, after accounting for possible failure modes, newly validate 50 Kepler candidates as planets, sanity-checking the validations by confirming them with vespa using up-to-date stellar information. Concerning discrepancies with vespa arise for many other candidates, which typically resolve in favour of our models. Given such issues, we caution against using single-method planet validation with either method until the discrepancies are fully understood.
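The probabilistic-classification step of this approach can be sketched with scikit-learn's Gaussian process classifier. The synthetic features below are hypothetical stand-ins for per-candidate vetting metrics (the real pipeline uses metrics computed from Kepler/TESS light curves); label 1 denotes a planet, 0 a false positive.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
# Hypothetical vetting-metric features for 120 candidates.
n = 120
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(int)

# Fit a GPC on 80 candidates; score the remaining 40.
clf = GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0),
                                random_state=0)
clf.fit(X[:80], y[:80])
proba = clf.predict_proba(X[80:])[:, 1]   # probability of the planet class
loss = log_loss(y[80:], proba)            # the paper's evaluation metric
```

The GPC's posterior probabilities can then be combined with scenario priors, as the abstract describes, to obtain a final validation probability per candidate.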



2020 ◽  
Author(s):  
T.V. Sundar ◽  
K. Menaka ◽  
G. Vinotha

Abstract The COVID-19 pandemic is now almost half a year old and still disrupts human routine and peace to an unimaginable extent. As approved vaccines have yet to be developed and standard therapeutic procedures against the new coronavirus have not been established, several treatment modalities are being suggested and tried by the scientific community. Many of these follow a drug repurposing approach, since a repurposed remedy could prevent a great amount of loss in a shorter span of time. In this context, we report our attempt to identify a solution to this malady with a similar strategy. We used machine learning algorithms and the structural information of already approved drugs to identify potential therapeutics for managing the COVID-19 crisis. The experiments were performed with a group of 77 antiviral molecules (for the training phase of machine learning) and another group comprising 9 antivirals and 11 antimalarials (for the testing phase). All the chosen molecules are approved drugs with significant action against viruses. The identified molecules were validated by docking studies with recently released crystal structures of the coronavirus. The binding affinities of the tested small molecules with three selected severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) structures were computed and compared with the affinity scores of five other medications, viz. hydroxychloroquine, favipiravir, dexamethasone, dichlorobenzyl alcohol, and amyl metacresol, and the results were subjected to a statistical test of ANOVA. The predicted therapeutics, in conjunction with their already established characteristics, could be further evaluated in approved clinical trials to determine their efficacy against COVID-19 infection.
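The final statistical step, comparing groups of binding-affinity scores with one-way ANOVA, can be sketched with SciPy. The score lists below are hypothetical placeholders; real values would come from the docking runs against the SARS-CoV-2 structures.

```python
from scipy.stats import f_oneway

# Hypothetical docking scores (kcal/mol, more negative = stronger binding)
# for three drug groups against one SARS-CoV-2 structure.
group_a = [-7.1, -6.8, -7.4, -6.9]
group_b = [-6.2, -6.0, -6.5, -6.1]
group_c = [-7.0, -7.3, -6.7, -7.2]

# One-way ANOVA: do the group means differ significantly?
f_stat, p_value = f_oneway(group_a, group_b, group_c)
```

A small p-value indicates that at least one drug group binds differently from the others; post-hoc pairwise tests would then identify which.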


2020 ◽  
Vol 39 (5) ◽  
pp. 6579-6590
Author(s):  
Sandy Çağlıyor ◽  
Başar Öztayşi ◽  
Selime Sezgin

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. In this study, the aim is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. From various sources, a dataset of 1559 movies is constructed. First, independent variables are grouped as pre-release, distributor type, and international distribution based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosted trees, and random forest) are employed, and the impact of each variable group is assessed by comparing model performance. Then the number of target classes is increased to five and eight, and the results are compared with previously developed models in the literature.
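The study's setup, discretizing attendance into classes and comparing several classifiers, can be sketched with scikit-learn. The features and attendance values below are synthetic stand-ins for the real 1559-movie dataset, and the model choices mirror the four algorithm families named above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
# Hypothetical pre-release features (budget, screens, genre code, ...).
X = rng.normal(size=(300, 5))
attendance = X[:, 0] * 2 + X[:, 1] + rng.normal(size=300)
# Discretize attendance into three equal-frequency classes (0, 1, 2).
y = np.digitize(attendance, np.quantile(attendance, [1 / 3, 2 / 3]))

models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "neural_net": MLPClassifier(max_iter=500, random_state=0),
}
# Mean 3-fold cross-validated accuracy per model.
scores = {name: cross_val_score(m, X, y, cv=3).mean()
          for name, m in models.items()}
```

Re-running this comparison while adding or dropping a feature group shows that group's impact on performance, which is the comparison the study performs.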


2020 ◽  
pp. 1-11
Author(s):  
Jie Liu ◽  
Lin Lin ◽  
Xiufang Liang

The online English teaching system places certain requirements on intelligent scoring, and the most difficult stage of intelligent scoring in English tests is scoring English compositions with an intelligent model. To improve the intelligence of English composition scoring, this study builds on machine learning algorithms, combines them with intelligent image recognition technology, and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, to verify that the proposed algorithm model meets the requirements of composition scoring, that is, to verify the feasibility of the algorithm, the performance of the proposed model is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are input into the model as constraints. The results show that the proposed algorithm has practical effect and can be applied to English assessment systems and to online homework evaluation systems.


2019 ◽  
Vol 1 (2) ◽  
pp. 78-80
Author(s):  
Eric Holloway

Detecting some patterns is a simple task for humans, but nearly impossible for current machine learning algorithms. Here, the "checkerboard" pattern is examined, where human prediction nears 100% and machine prediction drops significantly below 50%.
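The gap described above can be illustrated with a small experiment: on a checkerboard labeling, a linear classifier hovers near chance, while the one-line parity rule a human would state is perfect. This is a hedged sketch of the phenomenon, not the paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Checkerboard labels: parity of the integer cell a 2-D point falls in.
X = rng.uniform(0, 8, size=(2000, 2))
y = ((np.floor(X[:, 0]) + np.floor(X[:, 1])).astype(int)) % 2

# A linear model cannot represent the alternating pattern: near-chance accuracy.
clf = LogisticRegression().fit(X[:1500], y[:1500])
acc = clf.score(X[1500:], y[1500:])

# The parity rule a human would immediately state classifies perfectly.
rule = ((np.floor(X[1500:, 0]) + np.floor(X[1500:, 1])).astype(int)) % 2
rule_acc = float((rule == y[1500:]).mean())
```

More flexible models (kernel methods, deep networks) can fit a checkerboard given enough data, but the contrast with the trivially stated human rule is the point of the example.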

