The
acid dissociation constant p<i>K</i><sub>a</sub>
dictates a molecule’s ionic status and is a critical physicochemical property
for rationalizing acid-base chemistry in solution and in many biological
contexts. Although numerous theoretical approaches have been developed for
predicting aqueous p<i>K</i><sub>a</sub>, fast
and accurate prediction of non-aqueous p<i>K</i><sub>a</sub>s
has remained a major challenge. On the basis of the <i>i</i>BonD experimental p<i>K</i><sub>a</sub>
database, curated across 39 solvents, a holistic p<i>K</i><sub>a</sub> prediction model was established using a machine
learning approach. Combined structural and physical organic parameter
descriptors (SPOC) were introduced to represent the electronic and structural
features of molecules. With SPOC and ionic status labelling (ISL), the holistic models trained with neural network or XGBoost algorithms
showed the best prediction performance, with a mean absolute error (MAE) as
low as 0.87 p<i>K</i><sub>a</sub> unit. The
holistic model outperformed all of the tested single-solvent
models (SSMs), confirming its transfer learning capability. Its ability to
predict p<i>K</i><sub>a</sub> in diverse solvents allows for comprehensive mapping of all
possible p<i>K</i><sub>a</sub> correlations
between different solvents. The <i>i</i>BonD
holistic model was validated by accurately predicting the aqueous p<i>K</i><sub>a</sub> and micro-p<i>K</i><sub>a</sub> values
of pharmaceutical molecules and the p<i>K</i><sub>a</sub> values
of organocatalysts in DMSO and MeCN. An online prediction platform
(<a href="http://pka.luoszgroup.com/">http://pka.luoszgroup.com</a>) was built on the current model.
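The two quantities the abstract relies on, the mean absolute error used to rank models and the p<i>K</i><sub>a</sub> correlations between solvents, can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ code: the function names are hypothetical, and the correlation helper assumes a simple linear relationship pKa(B) ≈ slope × pKa(A) + intercept fitted by ordinary least squares, a form the abstract does not specify.

```python
import math
from typing import Sequence, Tuple


def mean_absolute_error(experimental: Sequence[float],
                        predicted: Sequence[float]) -> float:
    """MAE in pKa units: the average of |predicted - experimental|."""
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) for e, p in zip(experimental, predicted))
    return total / len(experimental)


def fit_solvent_correlation(pka_solvent_a: Sequence[float],
                            pka_solvent_b: Sequence[float]
                            ) -> Tuple[float, float, float]:
    """Hypothetical helper: least-squares fit of pKa(B) = slope * pKa(A) + intercept
    for the same molecules in two solvents (assumes a linear relationship).

    Returns (slope, intercept, pearson_r).
    """
    x, y = list(pka_solvent_a), list(pka_solvent_b)
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired pKa values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("pKa values must vary in both solvents")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, pearson_r
```

In use, one would pass experimental and predicted values for the same test molecules to mean_absolute_error, and predicted values for a shared set of molecules in two solvents to fit_solvent_correlation; repeating the fit over every solvent pair is one simple way to build a pairwise correlation map.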