Impact study of data locality on task-based applications through the Heteroprio scheduler
The task-based approach has gained much attention to use modern heterogeneous computing nodes. It allows parallelizing with an abstraction of the hardware by delegating task distribution and load balancing to a dynamic scheduler. In this organization, the scheduler is the most critical component that solves the DAG-scheduling problem in order to select the right processing unit for the computation of each task. In this work, we extend our Heteroprio scheduler that was originally created to execute the fast multipole method on multi-GPUs nodes. We improve Heteroprio by taking into account data locality during task assignation. The main principle is to use different task-lists for the different memory nodes and to investigate how locality affinity between the tasks and the different memory nodes can be evaluated without looking at the tasks' dependencies. The interest of the present method was evaluated on two linear algebra applications and a stencil code. It was deduced that simple heuristics can provide significant performance improvement and cut by more than half the total memory transfer of an execution.