A curated diverse molecular database of blood-brain barrier permeability with chemical descriptors
Abstract
The highly selective blood-brain barrier (BBB) prevents neurotoxic substances in the blood from crossing into the extracellular fluid of the central nervous system (CNS). The BBB is therefore closely tied to the development and treatment of CNS diseases, and predicting whether a substance crosses the BBB is a key task in lead discovery for CNS drugs. Machine learning (ML) is a promising strategy for predicting BBB permeability, but existing studies have been limited by small datasets with limited chemical diversity. To mitigate this issue, we present a large benchmark dataset, B3DB, compiled from 50 published resources and categorized based on experimental uncertainty. A subset of the molecules in B3DB has numerical log BB values (1058 compounds), while the whole dataset has categorical (BBB+ or BBB−) BBB permeability labels (7807 compounds). The dataset is freely available at https://github.com/theochem/B3DB and under the DOI 10.6084/m9.figshare.15634230.v3 (version 3). We also provide some physicochemical properties of the molecules. By analyzing these properties, we demonstrate some physicochemical similarities and differences between BBB+ and BBB− compounds.