Abstract

Animals are able to reach a desired state in an environment by controlling various behavioral patterns. Identification of the behavioral strategy used for this control is important for understanding animals' decision-making and is fundamental to dissecting the information processing performed by the nervous system. However, methods for quantifying such behavioral strategies have not been fully established. In this study, we developed an inverse reinforcement learning (IRL) framework to identify an animal's behavioral strategy from behavioral time-series data. As a particular target, we applied this framework to C. elegans thermotactic behavior: after cultivation at a constant temperature with or without food, fed and starved worms respectively prefer and avoid the cultivation temperature on a thermal gradient. Our IRL approach revealed that the fed worms used both the absolute temperature and its temporal derivative, and that their strategy comprised a mixture of two strategies: directed migration (DM) and isothermal migration (IM). DM is a strategy by which the worms efficiently reach a specific temperature, and it explained the thermotactic behavior of the fed worms. IM is a strategy by which the worms track a constant temperature, reflecting the isothermal tracking frequently observed in previous studies. We also revealed the neural basis underlying these strategies by applying our method to thermosensory neuron-deficient worms. In contrast to the fed animals, the strategy of the starved animals indicated that they escaped the cultivation temperature using only the absolute temperature, not its temporal derivative. Thus, our IRL-based approach is capable of identifying animal strategies from behavioral time-series data and will be applicable to a wide range of behavioral studies, including studies of decision-making in other organisms.

Author Summary

Understanding animal decision-making has been a fundamental problem in neuroscience and behavioral ecology.
Many studies analyze actions that represent decision-making in behavioral tasks in which rewards are artificially designed with specific objectives. However, such artificially designed experiments cannot be extended to natural environments, because the rewards for freely behaving animals in nature cannot be clearly defined. To this end, we must reverse the current paradigm so that rewards are identified from behavioral data. Here, we propose a reverse-engineering approach (inverse reinforcement learning) that can estimate a behavioral strategy from time-series data of freely behaving animals. By applying this technique to thermotaxis in C. elegans, we successfully identified the worms' reward-based behavioral strategy.