<p>Maximum diversification
of data is a central theme in building generalized and accurate machine
learning (ML) models. In chemistry, ML has been used to develop models for
predicting molecular properties, for example quantum mechanically (QM) calculated
potential energy surfaces and atomic charge models. The ANI-1x and ANI-1ccx
ML-based general-purpose potentials for organic molecules were developed through
active learning, an automated data diversification process. Here, we describe
the ANI-1x and ANI-1ccx data sets. To demonstrate data set diversity, we
visualize them with a dimensionality reduction scheme and contrast them against
existing data sets. The ANI-1x data set contains multiple QM properties from 5M
density functional theory calculations, while the ANI-1ccx data set contains
500k data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately
14 million CPU core-hours were expended to generate this data. Multiple QM
properties from density functional theory and coupled cluster are provided:
energies, atomic forces, multipole moments, atomic charges, and more. We
provide this data to the community to aid research and development of ML models
for chemistry.</p>