Abstract. Cloud condensation nuclei (CCN) number concentrations are an important aspect of aerosol–cloud interactions and the subsequent climate effects; however, measurements of them are very limited. We use a machine learning tool, random decision forests, to develop a Random Forest Regression Model (RFRM) that derives CCN at 0.4 % supersaturation ([CCN0.4]) from commonly available measurements. The RFRM is trained on long-term simulations from a global size-resolved particle microphysics model. Using atmospheric state and composition variables as predictors, the RFRM learns the underlying dependence of [CCN0.4] on these predictors from the associations in their variability. The predictors are 8 fractions of PM2.5 (NH4, SO4, NO3, secondary organic aerosol (SOA), black carbon (BC), primary organic carbon (POC), dust, and salt), 7 gaseous species (NOx, NH3, O3, SO2, OH, isoprene, and monoterpene), and 4 meteorological variables (temperature (T), relative humidity (RH), precipitation, and solar radiation). The RFRM is highly robust: median mean fractional bias (MFB) of 4.4 % with ~ 96.33 % of the derived [CCN0.4] within a good agreement range of −60 %