Analysis of a High-Throughput Screening Data Set Using Potency-Scaled Molecular Similarity Algorithms

2007 ◽  
Vol 47 (2) ◽  
pp. 367-375 ◽  
Author(s):  
Ingo Vogt ◽  
Jürgen Bajorath
2001 ◽  
Vol 3 (3) ◽  
pp. 267-277 ◽  
Author(s):  
A. Michiel van Rhee ◽  
Jon Stocker ◽  
David Printzenhoff ◽  
Chris Creech ◽  
P. Kay Wagoner ◽  
...  

2013 ◽  
Vol 19 (3) ◽  
pp. 344-353 ◽  
Author(s):  
Keith R. Shockley

Quantitative high-throughput screening (qHTS) experiments can simultaneously produce concentration-response profiles for thousands of chemicals. In a typical qHTS study, a large chemical library is subjected to a primary screen to identify candidate hits for secondary screening, validation studies, or prediction modeling. Different algorithms, usually based on the Hill equation logistic model, have been used to classify compounds as active or inactive (or inconclusive). However, observed concentration-response activity relationships may not adequately fit a sigmoidal curve. Furthermore, it is unclear how to prioritize chemicals for follow-up studies given the large uncertainties that often accompany parameter estimates from nonlinear models. Weighted Shannon entropy can address these concerns by ranking compounds according to profile-specific statistics derived from estimates of the probability mass distribution of response at the tested concentration levels. This strategy can be used to rank all tested chemicals in the absence of a prespecified model structure, or the approach can complement existing activity call algorithms by ranking the returned candidate hits. The weighted entropy approach was evaluated here using data simulated from the Hill equation model. The procedure was then applied to a chemical genomics profiling data set interrogating compounds for androgen receptor agonist activity.
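
To make the ranking idea concrete, the following is a minimal Python sketch of a weighted-entropy score for a single concentration-response profile. The normalization of responses into a probability mass function over the tested concentrations follows the description above; the specific weighting scheme and the toy profiles are illustrative assumptions, not the published definitions.

```python
import numpy as np

def weighted_entropy_score(responses, weights=None, eps=1e-12):
    """Weighted Shannon entropy of one concentration-response profile.

    The absolute responses are normalized into a probability mass function
    over the tested concentrations; a profile whose response is concentrated
    at a few (high) concentrations gets low entropy, while a flat, noisy
    profile gets high entropy, so compounds can be ranked without assuming
    a sigmoidal (Hill) model.
    """
    r = np.abs(np.asarray(responses, dtype=float))
    p = r / max(r.sum(), eps)                    # probability mass over concentrations
    if weights is None:
        # Illustrative weights that emphasize the higher test concentrations.
        weights = np.linspace(0.5, 1.5, len(p))
    return float(-np.sum(weights * p * np.log2(p + eps)))

# Hypothetical 11-point profiles (percent activity at increasing concentrations).
flat_profile    = [1, 2, 1, 3, 2, 1, 2, 3, 2, 1, 2]        # inactive-looking
sigmoid_profile = [0, 0, 0, 0, 0, 2, 5, 60, 95, 98, 99]    # active-looking

print(weighted_entropy_score(flat_profile))     # higher entropy, ranked lower
print(weighted_entropy_score(sigmoid_profile))  # lower entropy, ranked higher
```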


2002 ◽  
Vol 45 (14) ◽  
pp. 3082-3093 ◽  
Author(s):  
Susan Y. Tamura ◽  
Patricia A. Bacha ◽  
Heather S. Gruver ◽  
Ruth F. Nutt

2003 ◽  
Vol 101 (9) ◽  
pp. 1325-1328 ◽  
Author(s):  
MEIR GLICK ◽  
ANTHONY E. KLON ◽  
PIERRE ACKLIN ◽  
JOHN W. DAVIES

2010 ◽  
Vol 29 (8) ◽  
pp. 667-677 ◽  
Author(s):  
Edward J Calabrese ◽  
George R Hoffmann ◽  
Edward J Stanek ◽  
Marc A Nascarella

This article assesses the response below a toxicological threshold for 1888 antibacterial agents in Escherichia coli, using 11 concentrations with twofold concentration spacing in a high-throughput study. The data set had important strengths, including low variability in the controls (2%-3% SD), a repeat measure of all wells, and built-in replication. Bacterial growth at concentrations below the toxic threshold was significantly greater than that in the controls, consistent with a hormetic concentration response. These findings, along with analyses of the published literature and complementary evaluations of concentration-response model predictions of low-concentration effects in yeast, indicate a lack of support for the broadly and historically accepted threshold model for responses at concentrations below the toxic threshold.


2002 ◽  
Vol 7 (4) ◽  
pp. 341-351 ◽  
Author(s):  
Michael F.M. Engels ◽  
Luc Wouters ◽  
Rudi Verbeeck ◽  
Greet Vanhoof

A data mining procedure for the rapid scoring of high-throughput screening (HTS) compounds is presented. The method is particularly useful for monitoring the quality of HTS data and tracking outliers in automated pharmaceutical or agrochemical screening, thus providing more complete and thorough structure-activity relationship (SAR) information. The method is based on the assumed relationship between the structure of the screened compounds and their biological activity on a given screen, expressed on a binary scale. A data mining model first develops a SAR description of the data that assigns each compound a probability of being a hit. An inconsistency score is then computed, expressing the degree of deviation between the SAR-based prediction and the actual biological activity. The inconsistency score enables the identification of potential outliers that can be primed for validation experiments. The approach is particularly useful for detecting false-negative outliers and for identifying SAR-compliant hit/nonhit borderline compounds, both classes of compounds that can contribute substantially to the development and understanding of robust SARs. In a first implementation of the method, one- and two-dimensional descriptors are used to encode molecular structure information, and logistic regression is used to calculate hit/nonhit probability scores. The approach was validated on three data sets: the first from a publicly available screening data set and the second and third from in-house HTS screening campaigns. Because of its simplicity, robustness, and accuracy, the procedure is suitable for automation.
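
The scoring step can be illustrated with a short sketch. The random descriptor matrix, the simulated activity labels, and the choice of negative log-likelihood as the inconsistency measure are assumptions for illustration; only the overall idea, logistic-regression hit probabilities compared against the binary screen outcome, follows the description above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in data: X holds 1D/2D molecular descriptors, y the binary screen
# outcome (1 = hit). Neither reproduces the authors' descriptors or assays.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.8, size=500) > 1.5).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)
p_hit = model.predict_proba(X)[:, 1]      # SAR-based probability of being a hit

# Inconsistency score: how strongly the SAR description disagrees with the
# observed activity call. Here: negative log-likelihood of the observed label,
# which is large when a predicted hit was scored inactive or vice versa.
eps = 1e-12
inconsistency = -(y * np.log(p_hit + eps) + (1 - y) * np.log(1 - p_hit + eps))

# Candidate false negatives: inactive on the screen but SAR-consistent with hits.
candidates = np.argsort(-inconsistency * (y == 0))[:10]
print(candidates)
```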


2009 ◽  
Vol 14 (10) ◽  
pp. 1236-1244 ◽  
Author(s):  
Swapan Chakrabarti ◽  
Stan R. Svojanovsky ◽  
Romana Slavik ◽  
Gunda I. Georg ◽  
George S. Wilson ◽  
...  

Artificial neural networks (ANNs) were trained using high-throughput screening (HTS) data to recover active compounds from a large data set. Improved classification performance was obtained by combining the predictions made by multiple ANNs. The HTS data, acquired from a methionine aminopeptidase inhibition study, consisted of a library of 43,347 compounds, and the ratio of active to nonactive compounds, RA/N, was 0.0321. Back-propagation ANNs were trained and validated using principal components derived from the physicochemical features of the compounds. With carefully selected training parameters, a single ANN recovered one-third of all active compounds from the validation set with a 3-fold gain in the RA/N value. Further gains in RA/N were obtained by combining the predictions of several ANNs. To exploit the generalization property of back-propagation ANNs, the networks were trained on the same training samples after being initialized with different sets of random weights. As a result, only 10% of all available compounds were needed for training and validation, and the rest of the data set was screened with more than a 10-fold gain over the original RA/N value. Thus, ANNs trained with limited HTS data may prove useful for recovering active compounds from large data sets.
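
A compact sketch of the ensemble strategy described above, using scikit-learn in place of the authors' implementation: descriptors are compressed by PCA, several back-propagation networks are trained on the same ~10% training split with different random initializations, and their probability outputs are averaged. The toy data, network size, and cutoff are illustrative assumptions, not the published configuration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

# Stand-in descriptors and an imbalanced activity label (not the 43,347-compound library).
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 50))
y = (X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=2000) > 2.5).astype(int)

X_pc = PCA(n_components=10).fit_transform(X)     # principal components of the descriptors
train, rest = slice(0, 200), slice(200, None)    # ~10% used for training/validation

# Ensemble: same training samples, different random weight initializations.
nets = [MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=seed)
        .fit(X_pc[train], y[train]) for seed in range(5)]

# Combine the networks by averaging their predicted class probabilities.
p = np.mean([net.predict_proba(X_pc[rest])[:, 1] for net in nets], axis=0)

top = np.argsort(-p)[:100]                       # screen the remaining compounds by score
print("actives recovered in top 100:", int(y[rest][top].sum()))
```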


Genes ◽  
2020 ◽  
Vol 11 (10) ◽  
pp. 1181 ◽  
Author(s):  
Yurika Fujita ◽  
Osamu Morita ◽  
Hiroshi Honda

In silico tools to predict genotoxicity have become important for high-throughput screening of chemical substances. However, current in silico tools to evaluate chromosomal damage do not discriminate in vitro-specific positives that can be followed up by in vivo tests. Herein, we establish an in silico model for chromosomal damage with the following approaches: (1) re-categorizing a previous data set into three groups (positives, negatives, and misleading positives) according to current reports that use weight-of-evidence approaches and expert judgment; (2) using a generalized linear model (Elastic Net) with partial structures of chemicals (organic functional groups) as the explanatory variables; and (3) interpreting the mode of action in terms of the chemical structures identified. The accuracy of our model was 85.6%, 80.3%, and 87.9% for positive, negative, and misleading-positive predictions, respectively. The organic functional groups selected in the positive-prediction models have been reported to induce genotoxicity via various modes of action (e.g., DNA adduct formation), whereas those selected for misleading positives were not clearly related to genotoxicity (e.g., low pH, cytotoxicity induction). Therefore, the present model may contribute to high-throughput screening in material design or drug discovery by verifying the relevance of predicted positives in light of their mechanisms of action.
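
A minimal sketch of the modelling approach, assuming functional-group counts as features and a three-category outcome; the synthetic data and the particular Elastic Net settings (scikit-learn's saga solver, l1_ratio = 0.5) are illustrative, not the fitted model from the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in data: each row counts hypothetical organic functional groups in a chemical;
# the label is 0 = positive, 1 = misleading positive, or 2 = negative.
rng = np.random.default_rng(2)
n, n_groups = 600, 40
X = rng.poisson(lam=0.3, size=(n, n_groups)).astype(float)   # functional-group counts

# Synthetic labels: the first five groups push toward "positive", the next five toward
# "misleading positive"; chemicals carrying neither tend to come out "negative".
scores = np.stack([1.5 * X[:, :5].sum(axis=1),
                   1.5 * X[:, 5:10].sum(axis=1),
                   np.full(n, 2.0)], axis=1) + rng.normal(scale=0.5, size=(n, 3))
y = scores.argmax(axis=1)

# The Elastic Net penalty mixes L1 (sparse selection of functional groups) with L2
# shrinkage, so the fitted coefficients point to the groups driving each category.
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000).fit(X, y)

print("training accuracy:", clf.score(X, y))
print("nonzero coefficients per class:", (np.abs(clf.coef_) > 1e-6).sum(axis=1))
```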


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Samuel Goodwin ◽  
Golnaz Shahtahmassebi ◽  
Quentin S. Hanley

High throughput screening (HTS) interrogates compound libraries to find those that are “active” in an assay. To better understand compound behavior in HTS, we assessed an existing binomial survivor function (BSF) model of “frequent hitters” using 872 publicly available HTS data sets. We found large numbers of “infrequent hitters” under this model, leading us to reject the BSF for identifying “frequent hitters.” As alternatives, we investigated generalized logistic, gamma, and negative binomial distributions as models of compound behavior. The gamma model reduced the proportion of both frequent and infrequent hitters relative to the BSF. Within this data set, conclusions about individual compound behavior were limited by the number of times individual compounds were tested (1–1613 times) and by the disproportionate testing of some compounds. Specifically, most tests (78%) were performed on a 309,847-compound subset (17.6% of compounds), each compound tested ≥ 300 times. We concluded that the disproportionate retesting of some compounds represents compound repurposing at scale rather than drug discovery. The approach to drug discovery represented by these 872 data sets characterizes the assays well, by challenging them with many compounds, while characterizing each compound poorly, with a single assay. Aggregating the testing information for each compound across the multiple screens yielded a continuum with no clear boundary between normal and frequent-hitting compounds.
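
The binomial survivor function being assessed can be sketched as follows: under the null hypothesis that a compound hits each assay independently with the library-wide hit rate, the survivor function gives the probability of a hit count at least as large as the one observed. The hit rate and compound counts below are hypothetical, and the gamma and negative binomial alternatives are not shown.

```python
from scipy.stats import binom

def bsf_pvalue(hits, tests, overall_hit_rate):
    """P(X >= hits) under a binomial null model for one compound.

    A compound whose observed hit count is very improbable under this null
    (tiny survivor-function value) would be flagged as a frequent hitter.
    Note that binom.sf(k, n, p) returns P(X > k), so we pass hits - 1.
    """
    return binom.sf(hits - 1, tests, overall_hit_rate)

# Hypothetical compounds: (number of hits, number of times tested).
library_hit_rate = 0.01
for hits, tests in [(2, 50), (30, 400), (120, 1600)]:
    p = bsf_pvalue(hits, tests, library_hit_rate)
    print(f"{hits:>4} hits / {tests:>5} tests -> P(X >= hits) = {p:.3g}")
```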


2013 ◽  
Vol 18 (9) ◽  
pp. 1121-1131 ◽  
Author(s):  
Xin Wei ◽  
Lin Gao ◽  
Xiaolei Zhang ◽  
Hong Qian ◽  
Karen Rowan ◽  
...  

High-throughput screening (HTS) has been widely used to identify active compounds (hits) that bind to biological targets. Because of cost concerns, the comprehensive screening of millions of compounds is typically conducted without replication. Real hits that fail to exhibit measurable activity in the primary screen due to random experimental error are lost as false negatives. Conceivably, the projected false-negative rate is a parameter that reflects screening quality; furthermore, it can be used to guide the selection of an optimal number of compounds for hit confirmation. A method that predicts false-negative rates from the primary screening data is therefore extremely valuable. In this article, we describe the implementation of a pilot screen on a representative fraction (1%) of the screening library to obtain information about assay variability as well as a preliminary hit-activity distribution profile. Using this training data set, we then developed an algorithm based on Bayesian logic and Monte Carlo simulation to estimate the number of true active compounds and potential missed hits in the full library screen. We have applied this strategy to five screening projects. The results demonstrate that the method produces useful predictions of the number of false negatives.
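
A minimal sketch of the Monte Carlo part of such an approach, assuming the pilot screen supplies an assay-noise estimate and an activity distribution for true hits. All numeric values and the gamma-shaped activity distribution are illustrative assumptions rather than the authors' fitted quantities, and the Bayesian estimation of the number of true actives is not shown.

```python
import numpy as np

# Illustrative inputs that a pilot screen might provide.
rng = np.random.default_rng(3)
assay_sd = 8.0          # single-measurement noise (% inhibition), from the pilot screen
hit_threshold = 30.0    # primary-screen activity cutoff (% inhibition)
n_true_hits = 500       # assumed number of true actives in the full library
n_sim = 10_000          # Monte Carlo repetitions

false_negative_counts = []
for _ in range(n_sim):
    # Draw true activities from an assumed hit-activity profile, then add the
    # noise of a single, unreplicated primary-screen measurement.
    true_activity = rng.gamma(shape=4.0, scale=15.0, size=n_true_hits)
    observed = true_activity + rng.normal(scale=assay_sd, size=n_true_hits)
    # A false negative is a true hit whose noisy reading falls below the cutoff.
    false_negative_counts.append(np.sum((true_activity >= hit_threshold)
                                        & (observed < hit_threshold)))

print("expected missed hits:", np.mean(false_negative_counts))
print("90% interval:", np.percentile(false_negative_counts, [5, 95]))
```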

