<p>In recent years, deep molecular generative models have emerged as novel methods for <i>de
novo</i> molecular design. Thanks to the rapid advance of deep learning techniques, deep learning architectures such as recurrent neural networks,
generative autoencoders, and adversarial networks, to give a few examples,
have been employed to construct generative models. However, the metrics used so
far to evaluate these deep generative models are not sufficiently discriminative
to distinguish the performance of various state-of-the-art models. This work
presents a novel metric for evaluating deep molecular generative models that is
based on the chemical space coverage of a reference database; it compares not
only the molecular structures but also the ring systems and functional groups
reproduced from a reference dataset of 1M structures. In this study, the performance of 7
different molecular generative models was compared by calculating their
structure and substructure coverage of the GDB-13 database while using a 1M
subset of GDB-13 for training. Our study shows that, under the benchmarking
metrics introduced herein, the performance of the generative models varies
significantly, such that their generalization capability can be clearly
differentiated. Additionally, the coverage of ring systems and functional groups
present in GDB-13 was also compared across the models. Our study thus provides a
useful new metric for evaluating and comparing generative models.</p>
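<p>At its core, the coverage metric described above can be sketched as a set-intersection fraction: the share of a reference set (molecules, ring systems, or functional groups) that the generated set reproduces. The snippet below is an illustrative sketch only — the helper name is ours, and it assumes both inputs have already been canonicalized (e.g. to canonical SMILES) by a chemistry toolkit, which the paper's actual pipeline would handle.</p>

```python
# Illustrative sketch of a coverage metric (hypothetical helper; assumes
# inputs are already canonical structure identifiers such as canonical SMILES).

def coverage(generated, reference):
    """Fraction of the reference set reproduced by the generated set.

    Duplicates in `generated` do not inflate the score, since both
    collections are reduced to sets before intersecting.
    """
    ref = set(reference)
    if not ref:
        return 0.0
    return len(ref & set(generated)) / len(ref)

# Toy example: 3 of the 4 reference structures are reproduced.
ref = ["C1CC1", "c1ccccc1", "C1CCNCC1", "C1COC1"]
gen = ["C1CC1", "c1ccccc1", "C1CCNCC1", "CCO", "C1CC1"]
print(coverage(gen, ref))  # -> 0.75
```

<p>The same function applies unchanged to structure, ring-system, and functional-group coverage; only the identifier sets passed in differ.</p>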