Abstract
Background: 3D similarity is useful for predicting the profiles of unprecedented molecular frameworks that are dissimilar in 2D to known compounds. When compound pairs are compared, their 3D similarity depends on the conformational sampling of the compounds, the alignment method, the chosen descriptors, and the similarity metric, and it therefore shows limited discriminative power. In addition to these four factors, 3D chemocentric target prediction for an unknown compound requires compound-target associations. For target prediction, these associations replace compound-to-compound comparison with compound-to-target comparison. Results: Quantitative comparison of query compounds to target classes (one-to-group) could be obtained using two types of similarity distributions: one from maximum likelihood (ML) estimation of queries and the other from a Gaussian mixture model (GMM) of target classes. While the Jaccard-Tanimoto similarity of query-to-ligand pairs could be transformed into a query distribution through ML estimation, the similarity of ligand pairs within each target class could be transformed into the representative distribution of that target class through a GMM, parameterized through the expectation-maximization (EM) algorithm. To quantify the discriminativeness of a query ligand against target classes, the Kullback-Leibler (K-L) divergence was calculated between the two distributions. Conclusions: A stratified sample of 14K ligands from four target classes, estrogen receptor alpha (ESR), vitamin D receptor (VDR), cyclooxygenase-2 (COX2), and cathepsin D (CTSD), showed through comparison of K-L divergence values whether each query can be a representative ligand of each target. The feasibility index, Fm, and the probability derived from the K-L divergence could summarize the 3D chemocentric relationships between target classes.
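
The one-to-group comparison summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic similarity scores, the function names, the single-Gaussian form of the query distribution, the two-component mixture, and the direction of the divergence (query relative to target class) are all assumptions made for illustration.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution; broadcasts over components."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def fit_gaussian_ml(samples):
    """ML estimate of a single Gaussian (assumed form of the query distribution)."""
    x = np.asarray(samples, dtype=float)
    return x.mean(), max(x.std(), 1e-3)

def fit_gmm_em(samples, n_components=2, n_iter=200):
    """Fit a 1D Gaussian mixture with a plain EM loop (illustrative, no convergence check)."""
    x = np.asarray(samples, dtype=float)
    means = np.quantile(x, (np.arange(n_components) + 0.5) / n_components)
    stds = np.full(n_components, max(x.std(), 1e-3))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        dens = weights * gaussian_pdf(x[:, None], means, stds)
        resp = dens / np.maximum(dens.sum(axis=1, keepdims=True), 1e-300)
        # M-step: update weights, means, and standard deviations
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        stds = np.sqrt(np.maximum(var, 1e-6))
    return weights, means, stds

def gmm_pdf(x, weights, means, stds):
    """Mixture density evaluated on an array of points."""
    return (weights * gaussian_pdf(np.asarray(x)[:, None], means, stds)).sum(axis=1)

def kl_divergence(p, q, dx):
    """Numerical K-L divergence D(p || q) for densities sampled on a uniform grid."""
    eps = 1e-12
    p = p / (p.sum() * dx)  # renormalize on the grid
    q = q / (q.sum() * dx)
    return float(np.sum(p * np.log((p + eps) / (q + eps))) * dx)

# Synthetic similarity scores in [0, 1] (hypothetical data, not from the study)
rng = np.random.default_rng(0)
target_sims = np.clip(np.concatenate([rng.normal(0.35, 0.05, 300),
                                      rng.normal(0.60, 0.08, 200)]), 0.0, 1.0)
query_close = np.clip(rng.normal(0.55, 0.08, 100), 0.0, 1.0)  # resembles the class
query_far = np.clip(rng.normal(0.15, 0.04, 100), 0.0, 1.0)    # does not

grid = np.linspace(0.0, 1.0, 1001)
dx = grid[1] - grid[0]
weights, means, stds = fit_gmm_em(target_sims)
target_density = gmm_pdf(grid, weights, means, stds)

kl_close = kl_divergence(gaussian_pdf(grid, *fit_gaussian_ml(query_close)), target_density, dx)
kl_far = kl_divergence(gaussian_pdf(grid, *fit_gaussian_ml(query_far)), target_density, dx)
print(f"K-L (close query): {kl_close:.3f}, K-L (far query): {kl_far:.3f}")
```

In this sketch, a smaller divergence indicates that the query's similarity distribution lies closer to the representative distribution of the target class. How the feasibility index Fm and the probability are derived from the divergence is not specified here, so they are not reproduced.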