Compound Collections at KU 1947-2017: Cheminformatic Analysis and Computational Protein Target Prediction

2020 ◽
Author(s):
Zachary Pearson ◽
Manvendra Singh ◽
Zarko Boskovic

We report a comparison of two small-molecule collections synthesized at KU in two different eras. We used a machine learning tool to classify the compounds in these collections by their predicted protein targets. The analyses shed light on the evolution of medicinal chemistry research at the University of Kansas and reveal several new associations between compounds and protein targets.
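
The abstract does not describe the target-prediction tool itself. The sketch below illustrates only the comparison step, under the assumption that each compound has already been assigned a single predicted target label. The input format, compound IDs and labels are hypothetical. It reports how the share of compounds assigned to each target changes between an older and a newer collection.

```python
from collections import Counter


def target_profile(predictions):
    """Share of compounds assigned to each predicted target.

    predictions: dict mapping a compound ID to one predicted target label
    (a hypothetical input format).
    """
    counts = Counter(predictions.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {target: n / total for target, n in counts.items()}


def compare_collections(older, newer):
    """Change in each target's share from the older to the newer collection."""
    p_old, p_new = target_profile(older), target_profile(newer)
    targets = sorted(set(p_old) | set(p_new))
    return {t: p_new.get(t, 0.0) - p_old.get(t, 0.0) for t in targets}


if __name__ == "__main__":
    older = {"c1": "GPCR", "c2": "GPCR", "c3": "kinase", "c4": "ion channel"}
    newer = {"n1": "kinase", "n2": "kinase"}
    print(compare_collections(older, newer))
    # {'GPCR': -0.5, 'ion channel': -0.25, 'kinase': 0.75}
```

Shares are used rather than raw counts because the two collections need not be the same size.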


2021 ◽  
Vol 22 (10) ◽  
pp. 5118
Author(s):  
Matthieu Najm ◽  
Chloé-Agathe Azencott ◽  
Benoit Playe ◽  
Véronique Stoven

Identification of the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine learning algorithms can help accelerate this search, limiting the number of required experiments. However, the Drug-Target Interaction databases used for training present high statistical bias, leading to a high number of false positives and thus increasing the time and cost of experimental validation campaigns. To minimize the number of false positives among predicted targets, we propose a new scheme for choosing negative examples, so that each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for three specific drugs, and more globally for 200 approved drugs. For the three detailed drug examples, and for the larger set of 200 drugs, training with the proposed scheme for the choice of negative examples improved target prediction results: the average number of false positives among the top-ranked predicted targets decreased, and, overall, the rank of the true targets improved. Our method corrects the statistical bias of the databases and reduces the number of false-positive predictions, and therefore the number of useless experiments potentially undertaken.
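
The abstract does not say how the balanced negatives are actually drawn. The negative-sampling constraint above requires each drug and each protein to appear as often among negatives as among positives. One way to meet it exactly is to pose it as a max-flow problem. This formulation is our illustrative choice, not necessarily the authors' procedure. The graph has three kinds of edges:

- source → drug, with capacity equal to that drug's number of positives;
- drug → protein, with capacity 1 for every pair with no known interaction;
- protein → sink, with capacity equal to that protein's number of positives.

A flow that saturates every source edge gives an exactly balanced negative set. The function reports when no such set exists. Treating unlabeled pairs as candidate negatives is itself an assumption.

```python
from collections import Counter, deque


def balanced_negatives(positives, drugs, proteins):
    """Choose negative (drug, protein) pairs so that every drug and every protein
    appears as many times among the negatives as among the positives.

    Illustrative max-flow formulation (Edmonds-Karp); not necessarily the
    authors' procedure. Returns (negatives, balanced), where balanced is False
    when no exactly balanced negative set exists.
    """
    pos = set(positives)
    need_drug = Counter(d for d, _ in pos)  # negatives required per drug
    need_prot = Counter(p for _, p in pos)  # negatives required per protein
    source, sink = "SOURCE", "SINK"
    cap = {}  # residual capacities: cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0) + c
        cap[v].setdefault(u, 0)  # reverse edge of the residual graph

    for d in drugs:
        add_edge(source, ("drug", d), need_drug[d])
        for p in proteins:
            if (d, p) not in pos:  # only unlabeled pairs may become negatives
                add_edge(("drug", d), ("prot", p), 1)
    for p in proteins:
        add_edge(("prot", p), sink, need_prot[p])

    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Trace the path back from the sink and push its bottleneck capacity
        path, node = [], sink
        while parent[node] is not None:
            path.append((parent[node], node))
            node = parent[node]
        push = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= push
            cap[w][u] += push
        flow += push

    # A drug -> protein edge carries flow exactly when its reverse capacity is positive
    negatives = [
        (d, p)
        for d in drugs
        for p in proteins
        if (d, p) not in pos and cap[("prot", p)][("drug", d)] > 0
    ]
    return negatives, flow == len(pos)


if __name__ == "__main__":
    positives = [("d1", "p1"), ("d2", "p2"), ("d3", "p3")]
    negatives, ok = balanced_negatives(positives, ["d1", "d2", "d3"], ["p1", "p2", "p3"])
    print(ok, negatives)
```

The sketch is deterministic. Shuffling the `drugs` and `proteins` lists before the call is one simple way to obtain different balanced negative sets.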


2021 ◽  
Author(s):  
Matthieu Najm ◽  
Chloé-Agathe Azencott ◽  
Benoit Playe ◽  
Véronique Stoven

(1) Background: Identification of the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine-learning algorithms can help accelerate this search, limiting the number of required experiments. However, the Drug-Target Interaction databases used for training present high statistical bias, leading to a high number of false-positive predicted targets and thus increasing the time and cost of experimental validation campaigns. (2) Methods: To minimize the number of false-positive predicted proteins, we propose a new scheme for choosing negative examples, so that each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for three particular drugs, and more globally for 200 approved drugs. (3) Results: For the three detailed drug examples, and for the larger set of 200 drugs, training with the proposed scheme for the choice of negative examples improved target prediction results: the average number of false positives among the top-ranked predicted targets decreased and, overall, the rank of the true targets improved. (4) Conclusion: Our method corrects the statistical bias of the databases and reduces the number of false-positive predictions, and therefore the number of useless experiments potentially undertaken.
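
The evaluation above rests on two quantities per drug:

- the rank of each true target in the ranked list of predicted proteins;
- the number of false positives among the top-ranked predictions.

A minimal sketch of both follows. It assumes predictions arrive as a ranked list of protein IDs per drug and treats the cutoff `k` as a free parameter, since the abstract does not state one. All drug and protein IDs are hypothetical.

```python
from statistics import mean


def true_target_ranks(ranked_proteins, true_targets):
    """1-based rank of each known target in a drug's ranked prediction list."""
    return {p: i + 1 for i, p in enumerate(ranked_proteins) if p in true_targets}


def false_positives_at_k(ranked_proteins, true_targets, k):
    """Number of non-targets among the k top-ranked predicted proteins."""
    return sum(1 for p in ranked_proteins[:k] if p not in true_targets)


def mean_false_positives_at_k(rankings, truth, k):
    """Average false positives at k over all drugs.

    rankings: drug -> ranked list of proteins; truth: drug -> set of true targets.
    """
    return mean(false_positives_at_k(rankings[d], truth[d], k) for d in rankings)


if __name__ == "__main__":
    rankings = {"drugA": ["P3", "P7", "P1", "P9"], "drugB": ["P2", "P5", "P8", "P4"]}
    truth = {"drugA": {"P1"}, "drugB": {"P2"}}
    print(true_target_ranks(rankings["drugA"], truth["drugA"]))  # {'P1': 3}
    print(mean_false_positives_at_k(rankings, truth, k=3))
```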


2020 ◽  
Author(s):  
Lewis Mervin ◽  
Avid M. Afzal ◽  
Ola Engkvist ◽  
Andreas Bender

In the context of bioactivity prediction, the question of how to calibrate a score produced by a machine learning method into a reliable probability of binding to a protein target has not yet been satisfactorily addressed. In this study, we compared the performance of three calibration methods, namely Platt scaling, isotonic regression and Venn-ABERS, in calibrating the prediction scores of ligand-target prediction models built with the Naïve Bayes, Support Vector Machine and Random Forest algorithms. The models used bioactivity data available at AstraZeneca: 40 million data points (compound-target pairs) across 2112 targets. Performance was assessed using Stratified Shuffle Split (SSS) and Leave 20% of Scaffolds Out (L20SO) validation.
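
Of the three calibration methods compared, isotonic regression is the simplest to sketch. The version below is a minimal pure-Python illustration of the pool-adjacent-violators algorithm, not the authors' pipeline. It assumes raw classifier scores paired with binary activity labels (1 = active) from a held-out calibration set. It returns a step function, whereas implementations often interpolate linearly between blocks instead. Platt scaling would instead fit a sigmoid to the same pairs, and Venn-ABERS is not shown.

```python
import bisect
from itertools import groupby


def fit_isotonic(scores, labels):
    """Fit a non-decreasing map from raw score to P(active) with the
    pool-adjacent-violators algorithm (PAVA).

    Returns a list of blocks (lowest_score, highest_score, calibrated_probability).
    """
    blocks = []  # each block: [sum_of_labels, count, lowest_score, highest_score]
    pairs = sorted(zip(scores, labels))
    for score, group in groupby(pairs, key=lambda pair: pair[0]):
        ys = [y for _, y in group]  # tied scores are pooled from the start
        blocks.append([float(sum(ys)), len(ys), score, score])
        # Merge backwards while the previous block's mean exceeds the newest one's
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, _, highest = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][3] = highest
    return [(lo, hi, total / count) for total, count, lo, hi in blocks]


def calibrate(model, score):
    """Map a raw score to a probability using the fitted step function.

    Scores below the fitted range are clamped to the first block; a score
    falling between two blocks takes the value of the lower block.
    """
    lows = [lo for lo, _, _ in model]
    i = bisect.bisect_right(lows, score) - 1
    return model[max(i, 0)][2]


if __name__ == "__main__":
    raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    is_active = [0, 1, 0, 0, 1, 1]
    model = fit_isotonic(raw_scores, is_active)
    print(calibrate(model, 0.35))  # 0.3333333333333333
```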


2021 ◽  
Author(s):  
Norberto Sánchez-Cruz ◽  
Jose L. Medina-Franco

Epigenetic targets are a significant focus of drug discovery research, as demonstrated by the eight approved epigenetic drugs for the treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large body of structure-activity relationships that has not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26,318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different designs, we built highly accurate predictive models for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions of up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported here have considerable potential to identify small molecules with epigenetic activity. We have therefore implemented them as a freely accessible, easy-to-use web application.
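
The abstract does not specify which learners or fingerprints performed best. The sketch below therefore swaps in the simplest fingerprint-based baseline, a k-nearest-neighbour similarity search, purely to show how a fingerprint representation supports multi-target profiling. It assumes fingerprints are already computed and stored as sets of on-bit indices. The bit indices and activity annotations in the usage example are made up.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_target_profile(query, training, k=3):
    """Rank targets for a query compound by nearest-neighbour similarity.

    training: list of (fingerprint, set_of_active_targets) pairs.
    Each target scores the summed Tanimoto similarity of those among the
    k most similar training compounds that are annotated as active on it.
    """
    scored = [(tanimoto(query, fp), targets) for fp, targets in training]
    scored.sort(key=lambda item: item[0], reverse=True)  # stable for ties
    scores = {}
    for sim, targets in scored[:k]:
        for target in targets:
            scores[target] = scores.get(target, 0.0) + sim
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    training = [
        ({1, 2, 3, 4}, {"DNMT1"}),
        ({1, 2, 3, 5}, {"DNMT1", "HDAC1"}),
        ({10, 11, 12}, {"BRD4"}),
    ]
    print(knn_target_profile({1, 2, 3}, training, k=2))
    # [('DNMT1', 1.5), ('HDAC1', 0.75)]
```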


Author(s):  
Dhruvil Shah ◽  
Devarsh Patel ◽  
Jainish Adesara ◽  
Pruthvi Hingu ◽  
Manan Shah

Although the education sector is improving more quickly than ever with the help of advancing technologies, many areas remain unexplored, and there will always be room for further improvement. Two of the most disruptive technologies, machine learning (ML) and blockchain, have helped replace conventional approaches in the education sector with highly technical and effective methods. In this study, we propose a system that combines these two technologies to help resolve problems such as forged educational records and fake degrees. The idea is that if these technologies are merged into a system that uses blockchain to store student data and ML to accurately predict students' future job roles after graduation, further counterfeiting of student achievements, and the insecurity it creates, can be avoided. Further, ML models will be trained on, and make predictions from, the validated data. The system will provide the university with an official, decentralized database of the records of students who have graduated from it. In addition, it provides employers with a platform on which employees' educational records can be verified. Students can share their educational information in their e-portfolios on platforms such as LinkedIn, a platform for managing professional profiles. This makes it easier for students, companies, and other industries to verify student data.
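
The abstract does not describe the blockchain design. The sketch below is a minimal, hypothetical illustration of why hash-linked storage makes forged records detectable: it chains each student record to the SHA-256 hash of its predecessor. It omits everything that makes a real blockchain decentralized (consensus, replication, signatures), and the record fields are invented.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first block


def block_hash(record, prev_hash):
    """SHA-256 over a canonical JSON encoding of the record and its predecessor's hash."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link records so that each block commits to every block before it."""
    chain, prev = [], GENESIS
    for record in records:
        h = block_hash(record, prev)
        chain.append({"record": record, "prev": prev, "hash": h})
        prev = h
    return chain


def verify_chain(chain):
    """Recompute every hash; verification fails if any record or link was altered."""
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev or block_hash(block["record"], prev) != block["hash"]:
            return False
        prev = block["hash"]
    return True


if __name__ == "__main__":
    chain = build_chain([
        {"student_id": "S001", "degree": "BSc", "year": 2020},
        {"student_id": "S002", "degree": "MSc", "year": 2021},
    ])
    print(verify_chain(chain))  # True
    chain[0]["record"]["degree"] = "PhD"  # simulate a forged degree
    print(verify_chain(chain))  # False
```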


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Scott Broderick ◽  
Ruhil Dongol ◽  
Tianmu Zhang ◽  
Krishna Rajan

This paper introduces the use of topological data analysis (TDA) as an unsupervised machine learning tool to uncover classification criteria in complex inorganic crystal chemistries. Using apatite chemistry as a template, we use persistent homology to track the topological connectivity of input crystal chemistry descriptors in defining similarity between different apatite stoichiometries. We show that TDA automatically identifies a hierarchical classification scheme within apatites, based on the common number of discrete coordination polyhedra that make up the structural building units shared among the compounds. This information is presented as a visualization scheme: a barcode of homology classifications that tracks the persistence of similarity between compounds. Unlike traditional structure maps, this new “Materials Barcode” schema serves as an automated exploratory machine learning tool that can uncover structural associations in crystal chemistry databases and provide more nuanced insight into what defines similarity among homologous compounds.
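
Persistent homology is easiest to see in dimension 0. There, each bar records how long a cluster of compounds stays separate as the distance threshold grows. The sketch below computes only this 0-dimensional barcode for a Vietoris-Rips filtration (equivalent to single-linkage clustering), using Euclidean distance between numeric descriptor vectors. The descriptor values are hypothetical, and the paper's analysis may use other distances and higher homology dimensions.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """0-dimensional persistence barcode of the Vietoris-Rips filtration.

    Every point is born at scale 0; a component dies at the distance at which
    it first merges with another, and one component persists forever.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, d))  # one component dies when two clusters merge at d
    if n:
        bars.append((0.0, math.inf))
    return bars


if __name__ == "__main__":
    descriptors = [(0.0,), (1.0,), (5.0,)]
    print(h0_barcode(descriptors))  # [(0.0, 1.0), (0.0, 4.0), (0.0, inf)]
```

Long bars correspond to groups of compounds that stay distinct over a wide range of similarity thresholds.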

