Deriving Knowledge through Data Mining High-Throughput Screening Data

We present a high-throughput computational study to identify novel polyimides (PIs) with exceptional refractive index (RI) values for use as optic or optoelectronic materials. Our study utilizes an RI prediction protocol based on a combination of first-principles and data modeling developed in previous work, which we employ on a large-scale PI candidate library generated with the ChemLG code. We deploy the virtual screening software ChemHTPS to automate the assessment of this extensive pool of PI structures in order to determine the performance potential of each candidate. This rapid and efficient approach yields a number of highly promising lead compounds. Using the data mining and machine learning program package ChemML, we analyze the top candidates with respect to prevalent structural features and feature combinations that distinguish them from less promising ones. In particular, we explore the utility of various strategies that introduce highly polarizable moieties into the PI backbone to increase its RI yield. The derived insights provide a foundation for rational and targeted design that goes beyond traditional trial-and-error searches.

Lessons Learned from High Throughput Screening of >250,000 Cells: New Cell Evaluation Methods and Data Mining Techniques

ECS Meeting Abstracts ◽ 10.1149/ma2014-01/1/152 ◽ 2014

Keyword(s): Data Mining ◽ High Throughput ◽ High Throughput Screening ◽ Evaluation Methods ◽ Lessons Learned ◽ Data Mining Techniques

Accelerated Discovery of High-Refractive-Index Polyimides via First-Principles Molecular Modeling, Virtual High-Throughput Screening, and Data Mining

The Journal of Physical Chemistry C ◽ 10.1021/acs.jpcc.9b01147 ◽ 2019 ◽ Vol 123 (23) ◽ pp. 14610-14618 ◽ Cited By ~ 9

Author(s): Mohammad Atif Faiz Afzal ◽ Mojtaba Haghighatlari ◽ Sai Prasad Ganesh ◽ Chong Cheng ◽ Johannes Hachmann

Keyword(s): Data Mining ◽ Refractive Index ◽ Molecular Modeling ◽ High Throughput ◽ First Principles ◽ High Throughput Screening ◽ High Refractive Index

Outlier Mining in High Throughput Screening Experiments

Journal of Biomolecular Screening ◽ 10.1177/108705710200700406 ◽ 2002 ◽ Vol 7 (4) ◽ pp. 341-351 ◽ Cited By ~ 16

Author(s): Michael F.M. Engels ◽ Luc Wouters ◽ Rudi Verbeeck ◽ Greet Vanhoof

Keyword(s): Data Mining ◽ Biological Activity ◽ High Throughput ◽ High Throughput Screening ◽ False Negative ◽ Data Sets ◽ Data Set ◽ Structure Information ◽ Screening Experiments ◽ Mining Procedure

A data mining procedure for the rapid scoring of high-throughput screening (HTS) compounds is presented. The method is particularly useful for monitoring the quality of HTS data and tracking outliers in automated pharmaceutical or agrochemical screening, thus providing more complete and thorough structure-activity relationship (SAR) information. The method is based on the utilization of the assumed relationship between the structure of the screened compounds and the biological activity on a given screen expressed on a binary scale. By means of a data mining method, a SAR description of the data is developed that assigns probabilities of being a hit to each compound of the screen. Then, an inconsistency score expressing the degree of deviation between the adequacy of the SAR description and the actual biological activity is computed. The inconsistency score enables the identification of potential outliers that can be primed for validation experiments. The approach is particularly useful for detecting false-negative outliers and for identifying SAR-compliant hit/nonhit borderline compounds, both of which are classes of compounds that can contribute substantially to the development and understanding of robust SARs. In a first implementation of the method, one- and two-dimensional descriptors are used for encoding molecular structure information and logistic regression for calculating hits/nonhits probability scores. The approach was validated on three data sets, the first one from a publicly available screening data set and the second and third from in-house HTS screening campaigns. Because of its simplicity, robustness, and accuracy, the procedure is suitable for automation.
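
As a rough illustration of the scoring idea described in the abstract above, the following Python sketch fits a logistic regression to binary hit/nonhit labels and ranks compounds by an inconsistency score. The toy descriptor values, the gradient-descent fit, and the score definition (the absolute gap between the predicted hit probability and the observed label) are assumptions made for illustration only; the abstract does not give the authors' exact formula or descriptors.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.01):
    """Fit weights and bias by batch gradient descent on the L2-regularized log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def hit_probabilities(X, w, b):
    # Probability of being a hit for each compound under the fitted SAR model.
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def inconsistency_scores(probs, y):
    # Assumed definition: |observed label - predicted hit probability|.
    return [abs(yi - p) for p, yi in zip(probs, y)]


# Hypothetical two-descriptor toy screen (not data from the paper).
# The last compound resembles the hits but was recorded as a nonhit,
# so it plays the role of a possible false negative.
X = [[0.90, 0.80], [0.80, 0.90], [0.85, 0.75],
     [0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
     [0.88, 0.82]]
y = [1, 1, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
probs = hit_probabilities(X, w, b)
scores = inconsistency_scores(probs, y)

# Rank compounds from most to least inconsistent with the fitted model.
ranking = sorted(range(len(y)), key=lambda i: scores[i], reverse=True)
for i in ranking:
    print(f"compound {i}: label={y[i]} p_hit={probs[i]:.2f} score={scores[i]:.2f}")
```

In this toy setup the mislabeled-looking compound should rank first, which is the kind of candidate the abstract suggests flagging for a validation experiment. A real application would use proper molecular descriptors and a held-out evaluation rather than scoring the training data directly.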

Experimental Screening of Dihydrofolate Reductase Yields a “Test Set” of 50,000 Small Molecules for a Computational Data-Mining and Docking Competition

Journal of Biomolecular Screening ◽ 10.1177/1087057105281173 ◽ 2005 ◽ Vol 10 (7) ◽ pp. 653-657 ◽ Cited By ~ 22

Author(s): Nadine H. Elowe ◽ Jan E. Blanchard ◽ Jonathan D. Cechetto ◽ Eric D. Brown

Keyword(s): Escherichia Coli ◽ Data Mining ◽ Small Molecules ◽ High Throughput ◽ Dihydrofolate Reductase ◽ High Throughput Screening ◽ Valuable Resource ◽ Training Set ◽ Test Set ◽ Computational Data

High-throughput screening (HTS) generates an abundance of data that are a valuable resource to be mined. Dockers and data miners can use “real-world” HTS data to test and further develop their tools. A screen of 50,000 diverse small molecules was carried out against Escherichia coli dihydrofolate reductase (DHFR) and compared with a previous screen of 50,000 compounds against the same target. Identical assays and conditions were maintained for both studies. Prior to the completion of the second screen, the original screening data were publicly released for use as a “training set,” and computational chemists and data analysts were challenged to predict the activity of compounds in this second “test set.” Upon completion, the primary screen of the test set generated no potent inhibitors of DHFR activity.

The Role of Data Mining in the Identification of Bioactive Compounds via High-Throughput Screening

Methods and Principles in Medicinal Chemistry - Data Mining in Drug Discovery ◽ 10.1002/9783527655984.ch06 ◽ 2013 ◽ pp. 131-154

Author(s): Kamal Azzaoui ◽ John P. Priestle ◽ Thibault Varin ◽ Ansgar Schuffenhauer ◽ Jeremy L. Jenkins ◽ ...

Keyword(s): Data Mining ◽ Bioactive Compounds ◽ High Throughput ◽ High Throughput Screening

Learning from the Data: Mining of Large High-Throughput Screening Databases.

ChemInform ◽ 10.1002/chin.200707209 ◽ 2007 ◽ Vol 38 (7)

Author(s): S. Frank Yan ◽ Frederick J. King ◽ Yun He ◽ Jeremy S. Caldwell ◽ Yingyao Zhou

Keyword(s): Data Mining ◽ High Throughput ◽ High Throughput Screening

Accelerated Discovery of High-Refractive-Index Polyimides via First-Principles Molecular Modeling, Virtual High-Throughput Screening, and Data Mining

10.26434/chemrxiv.7670903 ◽ 2019

Author(s): Mohammad Atif Faiz Afzal ◽ Mojtaba Haghighatlari ◽ Sai Prasad Ganesh ◽ Chong Cheng ◽ Johannes Hachmann

Keyword(s): Data Mining ◽ Refractive Index ◽ High Throughput ◽ First Principles ◽ High Throughput Screening ◽ Large Scale ◽ Computational Study ◽ High Refractive Index ◽ Structural Features ◽ Learning Program

We present a high-throughput computational study to identify novel polyimides (PIs) with exceptional refractive index (RI) values for use as optic or optoelectronic materials. Our study utilizes an RI prediction protocol based on a combination of first-principles and data modeling developed in previous work, which we employ on a large-scale PI candidate library generated with the ChemLG code. We deploy the virtual screening software ChemHTPS to automate the assessment of this extensive pool of PI structures in order to determine the performance potential of each candidate. This rapid and efficient approach yields a number of highly promising lead compounds. Using the data mining and machine learning program package ChemML, we analyze the top candidates with respect to prevalent structural features and feature combinations that distinguish them from less promising ones. In particular, we explore the utility of various strategies that introduce highly polarizable moieties into the PI backbone to increase its RI yield. The derived insights provide a foundation for rational and targeted design that goes beyond traditional trial-and-error searches.

High throughput drug profiling

Journal of Automated Methods and Management in Chemistry ◽ 10.1155/s1463924600000304 ◽ 2000 ◽ Vol 22 (6) ◽ pp. 171-173 ◽ Cited By ~ 6

Author(s): Michael Entzeroth ◽ Béatrice Chapelain ◽ Jacques Guilbert ◽ Valérie Hamon

Keyword(s): Data Mining ◽ Drug Discovery ◽ High Throughput ◽ High Throughput Screening ◽ Great Increase

High throughput screening has significantly contributed to advances in drug discovery. The great increase in the number of samples screened has been accompanied by increases in costs and in the data required for the investigated compounds. High throughput profiling addresses the issues of compound selectivity and specificity. It combines conventional screening with data mining technologies to give a full set of data, enabling development candidates to be more fully compared.

Screening of Supports and Additives for Heteropoly Acid Catalysts for Friedel-Crafts Reaction by High-throughput Screening and Data Mining

Journal of the Japan Petroleum Institute ◽ 10.1627/jpi.54.114 ◽ 2011 ◽ Vol 54 (2) ◽ pp. 114-118 ◽ Cited By ~ 4

Author(s): Kohji Omata

Keyword(s): Data Mining ◽ High Throughput ◽ High Throughput Screening ◽ Heteropoly Acid ◽ Acid Catalysts
