Formalizing Data Locality in Task Parallel Applications

Scheduling Task-parallel Applications in Dynamically Asymmetric Environments

49th International Conference on Parallel Processing - ICPP : Workshops ◽

10.1145/3409390.3409408 ◽

2020 ◽

Author(s):

Jing Chen ◽

Pirah Noor Soomro ◽

Mustafa Abduljabbar ◽

Madhavan Manivannan ◽

Miquel Pericas

Keyword(s):

Parallel Applications ◽

Task Parallel

Download Full-text

Characterizing and Mitigating Work Time Inflation in Task Parallel Programs

Scientific Programming ◽

10.1155/2013/898597 ◽

2013 ◽

Vol 21 (3-4) ◽

pp. 123-136 ◽

Cited By ~ 1

Author(s):

Stephen L. Olivier ◽

Bronis R. de Supinski ◽

Martin Schulz ◽

Jan F. Prins

Keyword(s):

Poor Performance ◽

Data Access ◽

Parallel Applications ◽

Task Parallelism ◽

Improve Performance ◽

Additional Time ◽

Work Time ◽

Task Parallel ◽

Time Required ◽

Data Access Latency

Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, andwork time inflation– additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems. Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.

Download Full-text

Extreme-scale scripting: Opportunities for large task-parallel applications on petascale computers

Journal of Physics Conference Series ◽

10.1088/1742-6596/180/1/012046 ◽

2009 ◽

Vol 180 ◽

pp. 012046 ◽

Cited By ~ 13

Author(s):

Michael Wilde ◽

Ioan Raicu ◽

Allan Espinosa ◽

Zhao Zhang ◽

Ben Clifford ◽

...

Keyword(s):

Parallel Applications ◽

Task Parallel ◽

Extreme Scale

Download Full-text

Interactive visualization of cross-layer performance anomalies in dynamic task-parallel applications and systems

2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) ◽

10.1109/ispass.2016.7482102 ◽

2016 ◽

Cited By ~ 5

Author(s):

Andi Drebes ◽

Antoniu Pop ◽

Karine Heydemann ◽

Albert Cohen

Keyword(s):

Interactive Visualization ◽

Parallel Applications ◽

Cross Layer ◽

Dynamic Task ◽

Task Parallel

Download Full-text

Impact study of data locality on task-based applications through the Heteroprio scheduler

PeerJ Computer Science ◽

10.7717/peerj-cs.190 ◽

2019 ◽

Vol 5 ◽

pp. e190 ◽

Cited By ~ 3

Author(s):

Bérenger Bramas

Keyword(s):

Heterogeneous Computing ◽

Fast Multipole Method ◽

Data Locality ◽

Parallel Applications ◽

Processing Unit ◽

Task Distribution ◽

Significant Performance ◽

The Right ◽

Dynamic Scheduler ◽

Simple Heuristics

The task-based approach has emerged as a viable way to effectively use modern heterogeneous computing nodes. It allows the development of parallel applications with an abstraction of the hardware by delegating task distribution and load balancing to a dynamic scheduler. In this organization, the scheduler is the most critical component that solves the DAG scheduling problem in order to select the right processing unit for the computation of each task. In this work, we extend our Heteroprio scheduler that was originally created to execute the fast multipole method on multi-GPUs nodes. We improve Heteroprio by taking into account data locality during task distribution. The main principle is to use different task-lists for the different memory nodes and to investigate how locality affinity between the tasks and the different memory nodes can be evaluated without looking at the tasks’ dependencies. We evaluate the benefit of our method on two linear algebra applications and a stencil code. We show that simple heuristics can provide significant performance improvement and cut by more than half the total memory transfer of an execution.

Download Full-text

User-defined Tools for Characterizing Task-Parallel Applications and Predicting Load Imbalance

10.1109/acomp53746.2021.00020 ◽

2021 ◽

Author(s):

Minh Thanh Chung ◽

Dieter Kranzlmuller

Keyword(s):

Parallel Applications ◽

Load Imbalance ◽

Task Parallel

Download Full-text

COHESION — A microkernel based Desktop Grid platform for irregular task-parallel applications

Future Generation Computer Systems ◽

10.1016/j.future.2007.06.005 ◽

2008 ◽

Vol 24 (5) ◽

pp. 354-370 ◽

Cited By ~ 17

Author(s):

Sven Schulz ◽

Wolfgang Blochinger ◽

Markus Held ◽

Clemens Dangelmayr

Keyword(s):

Parallel Applications ◽

Desktop Grid ◽

Task Parallel ◽

Grid Platform

Download Full-text

CHRT: A criticality- and heterogeneity-aware runtime system for task-parallel applications

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 ◽

10.23919/date.2017.7927126 ◽

2017 ◽

Cited By ~ 2

Author(s):

Myeonggyun Han ◽

Jinsu Park ◽

Woongki Baek

Keyword(s):

Parallel Applications ◽

Runtime System ◽

Task Parallel

Download Full-text

Visualization Aided Performance Tuning of Irregular Task-Parallel Computations

Information Visualization ◽

10.1057/palgrave.ivs.9500123 ◽

2006 ◽

Vol 5 (2) ◽

pp. 81-94 ◽

Cited By ~ 1

Author(s):

Wolfgang Blochinger ◽

Michael Kaufmann ◽

Martin Siebenhaller

Keyword(s):

Parallel Computations ◽

Parallel Programs ◽

Work Flow ◽

Performance Tuning ◽

Parallel Applications ◽

Development Environment ◽

Integrated Development ◽

Task Parallel ◽

Automatic Layout ◽

Individual Root

This paper deals with a visualization-based approach to performance analyzing and tuning of highly irregular task-parallel applications. At its core lies a novel automatic layout algorithm for execution graphs which is based on Sugiyama's framework. Our visualization enables the application designer to reliably detect manifestations of parallel overhead and to investigate on their individual root causes. We particularly focus on structural properties of task-parallel computations which are hard to detect in a more analytical way, for example, false sharing and false parallelism. In addition, we discuss embedding our visualization into an integrated development environment, realizing a seamless work-flow for implementation, execution, analysis, and tuning of parallel programs.

Download Full-text