Analysis of a High-Throughput Screening Data Set Using Potency-Scaled Molecular Similarity Algorithms

2007 ◽  
Vol 47 (2) ◽  
pp. 367-375 ◽  
Author(s):  
Ingo Vogt ◽  
Jürgen Bajorath
2001 ◽  
Vol 3 (3) ◽  
pp. 267-277 ◽  
Author(s):  
A. Michiel van Rhee ◽  
Jon Stocker ◽  
David Printzenhoff ◽  
Chris Creech ◽  
P. Kay Wagoner ◽  
...  

2013 ◽  
Vol 19 (3) ◽  
pp. 344-353 ◽  
Author(s):  
Keith R. Shockley

Quantitative high-throughput screening (qHTS) experiments can simultaneously produce concentration-response profiles for thousands of chemicals. In a typical qHTS study, a large chemical library is subjected to a primary screen to identify candidate hits for secondary screening, validation studies, or prediction modeling. Different algorithms, usually based on the Hill equation logistic model, have been used to classify compounds as active or inactive (or inconclusive). However, observed concentration-response activity relationships may not adequately fit a sigmoidal curve. Furthermore, it is unclear how to prioritize chemicals for follow-up studies given the large uncertainties that often accompany parameter estimates from nonlinear models. Weighted Shannon entropy can address these concerns by ranking compounds according to profile-specific statistics derived from estimates of the probability mass distribution of response at the tested concentration levels. This strategy can be used to rank all tested chemicals in the absence of a prespecified model structure, or the approach can complement existing activity call algorithms by ranking the returned candidate hits. The weighted entropy approach was evaluated here using data simulated from the Hill equation model. The procedure was then applied to a chemical genomics profiling data set interrogating compounds for androgen receptor agonist activity.
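
To make the ranking idea concrete, the following is a minimal Python sketch of a weighted-entropy score for a single concentration-response profile. The normalization of responses into a probability mass function over the tested concentrations follows the description above; the specific weighting scheme and the toy profiles are illustrative assumptions, not the published definitions.

```python
import numpy as np

def weighted_entropy_score(responses, weights=None, eps=1e-12):
    """Weighted Shannon entropy of one concentration-response profile.

    The absolute responses are normalized into a probability mass function
    over the tested concentrations; a profile whose response is concentrated
    at a few (high) concentrations gets low entropy, while a flat, noisy
    profile gets high entropy, so compounds can be ranked without assuming
    a sigmoidal (Hill) model.
    """
    r = np.abs(np.asarray(responses, dtype=float))
    p = r / max(r.sum(), eps)                    # probability mass over concentrations
    if weights is None:
        # Illustrative weights that emphasize the higher test concentrations.
        weights = np.linspace(0.5, 1.5, len(p))
    return float(-np.sum(weights * p * np.log2(p + eps)))

# Hypothetical 11-point profiles (percent activity at increasing concentrations).
flat_profile    = [1, 2, 1, 3, 2, 1, 2, 3, 2, 1, 2]        # inactive-looking
sigmoid_profile = [0, 0, 0, 0, 0, 2, 5, 60, 95, 98, 99]    # active-looking

print(weighted_entropy_score(flat_profile))     # higher entropy, ranked lower
print(weighted_entropy_score(sigmoid_profile))  # lower entropy, ranked higher
```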


2002 ◽  
Vol 45 (14) ◽  
pp. 3082-3093 ◽  
Author(s):  
Susan Y. Tamura ◽  
Patricia A. Bacha ◽  
Heather S. Gruver ◽  
Ruth F. Nutt

2003 ◽  
Vol 101 (9) ◽  
pp. 1325-1328 ◽  
Author(s):  
MEIR GLICK ◽  
ANTHONY E. KLON ◽  
PIERRE ACKLIN ◽  
JOHN W. DAVIES

2010 ◽  
Vol 29 (8) ◽  
pp. 667-677 ◽  
Author(s):  
Edward J Calabrese ◽  
George R Hoffmann ◽  
Edward J Stanek ◽  
Marc A Nascarella

This article assesses the response below a toxicological threshold for 1888 antibacterial agents in Escherichia coli, using 11 concentrations with twofold concentration spacing in a high-throughput study. The data set had important strengths, including low variability in the controls (2%-3% SD), a repeat measure of all wells, and built-in replication. Bacterial growth at concentrations below the toxic threshold was significantly greater than that in the controls, consistent with a hormetic concentration response. These findings, along with analyses of the published literature and complementary evaluations of concentration-response model predictions of low-concentration effects in yeast, indicate a lack of support for the broadly and historically accepted threshold model for responses at concentrations below the toxic threshold.


2002 ◽  
Vol 7 (4) ◽  
pp. 341-351 ◽  
Author(s):  
Michael F.M. Engels ◽  
Luc Wouters ◽  
Rudi Verbeeck ◽  
Greet Vanhoof

A data mining procedure for the rapid scoring of high-throughput screening (HTS) compounds is presented. The method is particularly useful for monitoring the quality of HTS data and tracking outliers in automated pharmaceutical or agrochemical screening, thus providing more complete and thorough structure-activity relationship (SAR) information. The method is based on the assumed relationship between the structure of the screened compounds and their biological activity on a given screen, expressed on a binary scale. A data mining model first develops a SAR description of the data that assigns each compound a probability of being a hit. An inconsistency score is then computed, expressing the degree of deviation between the SAR-based prediction and the actual biological activity. The inconsistency score enables the identification of potential outliers that can be primed for validation experiments. The approach is particularly useful for detecting false-negative outliers and for identifying SAR-compliant hit/nonhit borderline compounds, both classes of compounds that can contribute substantially to the development and understanding of robust SARs. In a first implementation of the method, one- and two-dimensional descriptors are used to encode molecular structure information, and logistic regression is used to calculate hit/nonhit probability scores. The approach was validated on three data sets: the first from a publicly available screening data set and the second and third from in-house HTS screening campaigns. Because of its simplicity, robustness, and accuracy, the procedure is suitable for automation.
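
The scoring step can be illustrated with a short sketch. The random descriptor matrix, the simulated activity labels, and the choice of negative log-likelihood as the inconsistency measure are assumptions for illustration; only the overall idea, logistic-regression hit probabilities compared against the binary screen outcome, follows the description above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in data: X holds 1D/2D molecular descriptors, y the binary screen
# outcome (1 = hit). Neither reproduces the authors' descriptors or assays.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=500) > 1.5).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
p_hit = model.predict_proba(X)[:, 1]      # SAR-based probability of being a hit

# Inconsistency score: how strongly the SAR description disagrees with the
# observed activity call. Here: negative log-likelihood of the observed label,
# which is large when a predicted hit was scored inactive or vice versa.
eps = 1e-12
inconsistency = -(y * np.log(p_hit + eps) + (1 - y) * np.log(1 - p_hit + eps))

# Candidate false negatives: inactive on the screen but SAR-consistent with hits.
candidates = np.argsort(-inconsistency * (y == 0))[:10]
print(candidates)
```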


2009 ◽  
Vol 14 (10) ◽  
pp. 1236-1244 ◽  
Author(s):  
Swapan Chakrabarti ◽  
Stan R. Svojanovsky ◽  
Romana Slavik ◽  
Gunda I. Georg ◽  
George S. Wilson ◽  
...  

Artificial neural networks (ANNs) were trained using high-throughput screening (HTS) data to recover active compounds from a large data set. Improved classification performance was obtained by combining the predictions made by multiple ANNs. The HTS data, acquired from a methionine aminopeptidase inhibition study, consisted of a library of 43,347 compounds, and the ratio of active to nonactive compounds, RA/N, was 0.0321. Back-propagation ANNs were trained and validated using principal components derived from the physicochemical features of the compounds. With carefully selected training parameters, a single ANN recovered one-third of all active compounds from the validation set with a 3-fold gain in the RA/N value. Further gains in RA/N were obtained by combining the predictions of several ANNs. To exploit the generalization property of back-propagation ANNs, the networks were trained on the same training samples after being initialized with different sets of random weights. As a result, only 10% of all available compounds were needed for training and validation, and the rest of the data set was screened with more than a 10-fold gain over the original RA/N value. Thus, ANNs trained with limited HTS data may prove useful for recovering active compounds from large data sets.
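
A compact sketch of the ensemble strategy described above, using scikit-learn in place of the authors' implementation: descriptors are compressed by PCA, several back-propagation networks are trained on the same ~10% training split with different random initializations, and their probability outputs are averaged. The toy data, network size, and cutoff are illustrative assumptions, not the published configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

# Stand-in descriptors and an imbalanced activity label (not the 43,347-compound library).
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 50))
y = (X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=2000) > 2.5).astype(int)

X_pc = PCA(n_components=10).fit_transform(X)     # principal components of the descriptors
train, rest = slice(0, 200), slice(200, None)    # ~10% used for training/validation

# Ensemble: same training samples, different random weight initializations.
nets = [MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=seed)
        .fit(X_pc[train], y[train]) for seed in range(5)]

# Combine the networks by averaging their predicted class probabilities.
p = np.mean([net.predict_proba(X_pc[rest])[:, 1] for net in nets], axis=0)

top = np.argsort(-p)[:100]                       # screen the remaining compounds by score
print("actives recovered in top 100:", int(y[rest][top].sum()))
```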


Genes ◽  
2020 ◽  
Vol 11 (10) ◽  
pp. 1181 ◽  
Author(s):  
Yurika Fujita ◽  
Osamu Morita ◽  
Hiroshi Honda

In silico tools to predict genotoxicity have become important for high-throughput screening of chemical substances. However, current in silico tools to evaluate chromosomal damage do not discriminate in vitro-specific positives that can be followed up by in vivo tests. Herein, we establish an in silico model for chromosomal damage with the following approaches: (1) re-categorizing a previous data set into three groups (positives, negatives, and misleading positives) according to current reports that use weight-of-evidence approaches and expert judgment; (2) using a generalized linear model (Elastic Net) with partial structures of chemicals (organic functional groups) as the explanatory variables; and (3) interpreting the mode of action in terms of the chemical structures identified. The accuracy of our model was 85.6%, 80.3%, and 87.9% for positive, negative, and misleading-positive predictions, respectively. The organic functional groups selected in the positive-prediction models have been reported to induce genotoxicity via various modes of action (e.g., DNA adduct formation), whereas those selected for misleading positives were not clearly related to genotoxicity (e.g., low pH, cytotoxicity induction). Therefore, the present model may contribute to high-throughput screening in material design or drug discovery by verifying the relevance of predicted positives in light of their mechanisms of action.
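
A minimal sketch of the modelling approach, assuming functional-group counts as features and a three-category outcome; the synthetic data and the particular Elastic Net settings (scikit-learn's saga solver, l1_ratio = 0.5) are illustrative, not the fitted model from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in data: each row counts hypothetical organic functional groups in a chemical;
# the label is 0 = positive, 1 = misleading positive, or 2 = negative.
rng = np.random.default_rng(2)
n, n_groups = 600, 40
X = rng.poisson(lam=0.3, size=(n, n_groups)).astype(float)   # functional-group counts

# Synthetic labels: the first five groups push toward "positive", the next five toward
# "misleading positive"; chemicals carrying neither tend to come out "negative".
scores = np.stack([1.5 * X[:, :5].sum(axis=1),
                   1.5 * X[:, 5:10].sum(axis=1),
                   np.full(n, 2.0)], axis=1) + rng.normal(scale=0.5, size=(n, 3))
y = scores.argmax(axis=1)

# The Elastic Net penalty mixes L1 (sparse selection of functional groups) with L2
# shrinkage, so the fitted coefficients point to the groups driving each category.
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000).fit(X, y)

print("training accuracy:", clf.score(X, y))
print("nonzero coefficients per class:", (np.abs(clf.coef_) > 1e-6).sum(axis=1))
```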


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Samuel Goodwin ◽  
Golnaz Shahtahmassebi ◽  
Quentin S. Hanley

High throughput screening (HTS) interrogates compound libraries to find those that are “active” in an assay. To better understand compound behavior in HTS, we assessed an existing binomial survivor function (BSF) model of “frequent hitters” using 872 publicly available HTS data sets. We found large numbers of “infrequent hitters” under this model, leading us to reject the BSF for identifying “frequent hitters.” As alternatives, we investigated generalized logistic, gamma, and negative binomial distributions as models of compound behavior. The gamma model reduced the proportion of both frequent and infrequent hitters relative to the BSF. Within this data set, conclusions about individual compound behavior were limited by the number of times individual compounds were tested (1–1613 times) and by the disproportionate testing of some compounds. Specifically, most tests (78%) were performed on a 309,847-compound subset (17.6% of compounds), each compound tested ≥ 300 times. We concluded that the disproportionate retesting of some compounds represents compound repurposing at scale rather than drug discovery. The approach to drug discovery represented by these 872 data sets characterizes the assays well, by challenging them with many compounds, while characterizing each compound poorly, with a single assay. Aggregating the testing information for each compound across the multiple screens yielded a continuum with no clear boundary between normal and frequent-hitting compounds.
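
The binomial survivor function being assessed can be sketched as follows: under the null hypothesis that a compound hits each assay independently with the library-wide hit rate, the survivor function gives the probability of a hit count at least as large as the one observed. The hit rate and compound counts below are hypothetical, and the gamma and negative binomial alternatives are not shown.

```python
from scipy.stats import binom

def bsf_pvalue(hits, tests, overall_hit_rate):
    """P(X >= hits) under a binomial null model for one compound.

    A compound whose observed hit count is very improbable under this null
    (tiny survivor-function value) would be flagged as a frequent hitter.
    Note that binom.sf(k, n, p) returns P(X > k), so we pass hits - 1.
    """
    return binom.sf(hits - 1, tests, overall_hit_rate)

# Hypothetical compounds: (number of hits, number of times tested).
library_hit_rate = 0.01
for hits, tests in [(2, 50), (30, 400), (120, 1600)]:
    p = bsf_pvalue(hits, tests, library_hit_rate)
    print(f"{hits:>4} hits / {tests:>5} tests -> P(X >= hits) = {p:.3g}")
```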


2013 ◽  
Vol 18 (9) ◽  
pp. 1121-1131 ◽  
Author(s):  
Xin Wei ◽  
Lin Gao ◽  
Xiaolei Zhang ◽  
Hong Qian ◽  
Karen Rowan ◽  
...  

High-throughput screening (HTS) has been widely used to identify active compounds (hits) that bind to biological targets. Because of cost concerns, the comprehensive screening of millions of compounds is typically conducted without replication. Real hits that fail to exhibit measurable activity in the primary screen due to random experimental error are lost as false negatives. Conceivably, the projected false-negative rate is a parameter that reflects screening quality; furthermore, it can be used to guide the selection of an optimal number of compounds for hit confirmation. A method that predicts false-negative rates from the primary screening data is therefore extremely valuable. In this article, we describe the implementation of a pilot screen on a representative fraction (1%) of the screening library to obtain information about assay variability as well as a preliminary hit-activity distribution profile. Using this training data set, we then developed an algorithm based on Bayesian logic and Monte Carlo simulation to estimate the number of true active compounds and potential missed hits in the full library screen. We have applied this strategy to five screening projects. The results demonstrate that the method produces useful predictions of the number of false negatives.
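
A minimal sketch of the Monte Carlo part of such an approach, assuming the pilot screen supplies an assay-noise estimate and an activity distribution for true hits. All numeric values and the gamma-shaped activity distribution are illustrative assumptions rather than the authors' fitted quantities, and the Bayesian estimation of the number of true actives is not shown.

```python
import numpy as np

# Illustrative inputs that a pilot screen might provide.
rng = np.random.default_rng(3)
assay_sd = 8.0          # single-measurement noise (% inhibition), from the pilot screen
hit_threshold = 30.0    # primary-screen activity cutoff (% inhibition)
n_true_hits = 500       # assumed number of true actives in the full library
n_sim = 10_000          # Monte Carlo repetitions

false_negative_counts = []
for _ in range(n_sim):
    # Draw true activities from an assumed hit-activity profile, then add the
    # noise of a single, unreplicated primary-screen measurement.
    true_activity = rng.gamma(shape=4.0, scale=15.0, size=n_true_hits)
    observed = true_activity + rng.normal(scale=assay_sd, size=n_true_hits)
    # A false negative is a true hit whose noisy reading falls below the cutoff.
    false_negative_counts.append(np.sum((true_activity >= hit_threshold)
                                        & (observed < hit_threshold)))

print("expected missed hits:", np.mean(false_negative_counts))
print("90% interval:", np.percentile(false_negative_counts, [5, 95]))
```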

