In interactive e-learning environments such as Intelligent Tutoring Systems, there are pedagogical decisions to make at two main levels of granularity: whole problems and single steps. In recent years, there is growing interest in applying data-driven techniques for adaptive decision making that can dynamically tailor students' learning experiences. Most existing data-driven approaches, however, treat these pedagogical decisions equally, or independently, disregarding the long-term impact that tutor decisions may have across these two levels of granularity. In this paper, we propose and apply an offline Gaussian Processes based Hierarchical Reinforcement Learning (HRL) framework to induce a hierarchical pedagogical policy that makes decisions at both problem and step levels. An empirical classroom study shows that the HRL policy is significantly more effective than a Deep Q-Network (DQN) induced policy and a random yet reasonable baseline policy.