Predicting natural language descriptions of smells
There has been recent progress in predicting whether common verbal descriptors such as “fishy”, “floral” or “fruity” apply to the smell of odorous molecules. However, the number of descriptors for which such a prediction is currently possible is very small compared to the large number of descriptors that have been suggested for the profiling of smells. We show here that applying natural language semantic representations to a small set of general olfactory perceptual descriptors allows for the accurate inference of perceptual ratings for mono-molecular odorants over a large and potentially arbitrary set of descriptors. This is noteworthy given the prevailing view that humans’ capacity to identify or characterize odors by name is poor [1, 2, 3, 4, 5]. Our methods, when combined with a molecule-to-ratings model using chemoinformatic features, also allow for the zero-shot inference [6, 7] of perceptual ratings for arbitrary molecules. We successfully applied our semantics-based approach to predict perceptual ratings with an accuracy higher than 0.5 for up to 70 olfactory perceptual descriptors in a well-known dataset, a ten-fold increase in the number of descriptors over previous attempts. Moreover, we accurately predict paradigm odors of four common families of molecules with an AUC of up to 0.75. Our approach removes the need for the time-consuming tasks of handcrafting domain-specific sets of descriptors in olfaction and of collecting ratings for large numbers of descriptors and odorants [8, 9, 10, 11], while establishing that the semantic distance between descriptors defines the equivalent of an odor wheel.
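To make the central idea concrete, the following is a minimal sketch of how ratings on a small set of source descriptors might be transferred to new descriptors through semantic similarity. It is an illustration under stated assumptions, not the model used in this work: the three-dimensional word vectors are made-up toy values, the function names `cosine` and `infer_ratings` are hypothetical, and the similarity-weighted average is one simple choice of transfer rule among many.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def infer_ratings(source_ratings, source_vecs, target_vecs):
    """Estimate ratings for target descriptors from rated source descriptors.

    Assumption: each target rating is the average of the source ratings,
    weighted by the non-negative cosine similarity between word vectors.
    """
    predictions = {}
    for target, t_vec in target_vecs.items():
        weights = {
            s: max(cosine(t_vec, source_vecs[s]), 0.0) for s in source_ratings
        }
        total = sum(weights.values())
        if total == 0.0:
            # No semantically related source descriptor: fall back to zero.
            predictions[target] = 0.0
        else:
            predictions[target] = (
                sum(weights[s] * source_ratings[s] for s in source_ratings) / total
            )
    return predictions


# Toy, hand-made vectors (hypothetical values; real ones would come from a
# pretrained semantic model).
SOURCE_VECS = {
    "fishy": [1.0, 0.0, 0.0],
    "floral": [0.0, 1.0, 0.0],
    "fruity": [0.0, 0.0, 1.0],
}
TARGET_VECS = {
    "rose": [0.0, 1.0, 0.0],
    "seafood": [1.0, 0.0, 0.0],
}

# Made-up ratings of one odorant on the source descriptors.
ratings = {"fishy": 0.1, "floral": 0.8, "fruity": 0.3}
predicted = infer_ratings(ratings, SOURCE_VECS, TARGET_VECS)
```

In this toy setting, a target descriptor whose vector coincides with that of a source descriptor simply inherits its rating, while a descriptor lying between several source descriptors would receive a blend of their ratings.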