Deriving Knowledge through Data Mining High-Throughput Screening Data

2004 ◽  
Vol 47 (25) ◽  
pp. 6373-6383 ◽  
Author(s):  
David J. Diller ◽  
Doug W. Hobbs

2019 ◽  
Author(s):  
Mohammad Atif Faiz Afzal ◽  
Mojtaba Haghighatlari ◽  
Sai Prasad Ganesh ◽  
Chong Cheng ◽  
Johannes Hachmann

We present a high-throughput computational study to identify novel polyimides (PIs) with exceptional refractive index (RI) values for use as optical or optoelectronic materials. Our study utilizes an RI prediction protocol, developed in previous work, that combines first-principles and data modeling, and we apply it to a large-scale PI candidate library generated with the ChemLG code. We deploy the virtual screening software ChemHTPS to automate the assessment of this extensive pool of PI structures and determine the performance potential of each candidate. This rapid and efficient approach yields a number of highly promising lead compounds. Using the data mining and machine learning program package ChemML, we analyze the top candidates with respect to prevalent structural features and feature combinations that distinguish them from less promising ones. In particular, we explore the utility of various strategies that introduce highly polarizable moieties into the PI backbone to increase its RI. The derived insights provide a foundation for rational and targeted design that goes beyond traditional trial-and-error searches.


2019 ◽  
Vol 123 (23) ◽  
pp. 14610-14618 ◽  
Author(s):  
Mohammad Atif Faiz Afzal ◽  
Mojtaba Haghighatlari ◽  
Sai Prasad Ganesh ◽  
Chong Cheng ◽  
Johannes Hachmann

2002 ◽  
Vol 7 (4) ◽  
pp. 341-351 ◽  
Author(s):  
Michael F.M. Engels ◽  
Luc Wouters ◽  
Rudi Verbeeck ◽  
Greet Vanhoof

A data mining procedure for the rapid scoring of high-throughput screening (HTS) compounds is presented. The method is particularly useful for monitoring the quality of HTS data and tracking outliers in automated pharmaceutical or agrochemical screening, thus providing more complete and thorough structure-activity relationship (SAR) information. The method relies on the assumed relationship between the structure of the screened compounds and their biological activity in a given screen, expressed on a binary scale. By means of a data mining method, a SAR description of the data is developed that assigns each compound in the screen a probability of being a hit. Then, an inconsistency score expressing the degree of deviation between the SAR description and the actual biological activity is computed. The inconsistency score enables the identification of potential outliers that can be prioritized for validation experiments. The approach is particularly useful for detecting false-negative outliers and for identifying SAR-compliant hit/nonhit borderline compounds, both of which are classes of compounds that can contribute substantially to the development and understanding of robust SARs. In a first implementation of the method, one- and two-dimensional descriptors are used for encoding molecular structure information and logistic regression for calculating hit/nonhit probability scores. The approach was validated on three data sets: the first a publicly available screening data set, and the second and third from in-house HTS campaigns. Because of its simplicity, robustness, and accuracy, the procedure is suitable for automation.
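
The scoring idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the formula for the inconsistency score, so the sketch assumes one plausible definition: the absolute gap between the observed binary activity and the hit probability predicted by logistic regression. The descriptor values, the function names, and the plain gradient-descent fit are all hypothetical stand-ins, not the authors' implementation.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=3000, l2=0.01):
    """Fit L2-regularized logistic regression by batch gradient descent.

    Returns (weights, bias). The optimizer is an illustrative choice only.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def hit_probabilities(X, w, b):
    # Probability of being a hit for each compound under the fitted model.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def inconsistency_scores(y, p):
    # ASSUMED definition: |observed activity - predicted hit probability|.
    return [abs(yi - pi) for yi, pi in zip(y, p)]

# Toy screen: two hypothetical descriptors per compound, binary activity.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
     [0.9, 0.8], [0.8, 0.9], [0.7, 0.7],
     [0.85, 0.9]]
# The last compound resembles the hits but was measured as a nonhit,
# i.e. a candidate false negative.
y = [0, 0, 0, 1, 1, 1, 0]

w, b = fit_logistic(X, y)
p = hit_probabilities(X, w, b)
scores = inconsistency_scores(y, p)

# Compounds ordered from most to least inconsistent with the SAR model.
ranked = sorted(range(len(y)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

Under this assumed definition, compounds at the top of `ranked` are the ones whose measured activity disagrees most with the model, which is where a candidate false negative such as the last toy compound would be expected to appear. Scores near 0.5 would correspond to the borderline hit/nonhit compounds the abstract mentions.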


2005 ◽  
Vol 10 (7) ◽  
pp. 653-657 ◽  
Author(s):  
Nadine H. Elowe ◽  
Jan E. Blanchard ◽  
Jonathan D. Cechetto ◽  
Eric D. Brown

High-throughput screening (HTS) generates an abundance of data that are a valuable resource to be mined. Dockers and data miners can use “real-world” HTS data to test and further develop their tools. A screen of 50,000 diverse small molecules was carried out against Escherichia coli dihydrofolate reductase (DHFR) and compared with a previous screen of 50,000 compounds against the same target. Identical assays and conditions were maintained for both studies. Prior to the completion of the second screen, the original screening data were publicly released for use as a “training set,” and computational chemists and data analysts were challenged to predict the activity of compounds in this second “test set.” Upon completion, the primary screen of the test set generated no potent inhibitors of DHFR activity.


Author(s):  
Kamal Azzaoui ◽  
John P. Priestle ◽  
Thibault Varin ◽  
Ansgar Schuffenhauer ◽  
Jeremy L. Jenkins ◽  
...  

ChemInform ◽  
2007 ◽  
Vol 38 (7) ◽  
Author(s):  
S. Frank Yan ◽  
Frederick J. King ◽  
Yun He ◽  
Jeremy S. Caldwell ◽  
Yingyao Zhou

2000 ◽  
Vol 22 (6) ◽  
pp. 171-173 ◽  
Author(s):  
Michael Entzeroth ◽  
Béatrice Chapelain ◽  
Jacques Guilbert ◽  
Valérie Hamon

High throughput screening has significantly contributed to advances in drug discovery. The great increase in the number of samples screened has been accompanied by increases in costs and in the data required for the investigated compounds. High throughput profiling addresses the issues of compound selectivity and specificity. It combines conventional screening with data mining technologies to give a full set of data, enabling development candidates to be more fully compared.

