Tiling arbitrarily nested loops by means of the transitive

A novel approach to generating tiled code for arbitrarily nested loops is presented. It is derived by combining the polyhedral and iteration space slicing frameworks. Instead of program transformations represented by a set of affine functions, one for each statement, it uses the transitive closure of a loop nest dependence graph to correct the original rectangular tiles so that all dependences of the original loop nest are preserved under the lexicographic order of the target tiles. Parallel tiled code can then be generated from valid serial tiled code by applying either affine transformations or transitive closure to an inter-tile dependence graph, whose vertices are the target tiles and whose edges connect dependent target tiles. We demonstrate how a relation describing such a graph can be formed. The main merit of the presented approach in comparison with well-known ones is that it does not require full permutability of loops to generate either serial or parallel tiled code, which increases the scope of loop nests that can be tiled.
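To make the role of the inter-tile dependence graph concrete, here is a small Python sketch. It is not the paper's algorithm: it enumerates iterations explicitly (no polyhedral library, no parametric bounds) and performs no transitive-closure correction of tiles. It tiles a 4 x 4 iteration domain into 2 x 2 rectangles, builds the inter-tile edges for the uniform dependence of the hypothetical loop body `a[i][j] = a[i-1][j+1]` (distance (1, -1), so the loops are not fully permutable), and shows that running the uncorrected tiles in lexicographic order violates dependences. As a generic alternative that is also not taken from the paper, it groups tiles into wavefronts by topological levels of the inter-tile graph.

```python
from itertools import product


def tile_of(it, size):
    """Coordinates of the rectangular tile that contains iteration `it`."""
    return tuple(x // s for x, s in zip(it, size))


def inter_tile_edges(n, distances, size):
    """Edges between distinct tiles induced by uniform dependence distances
    on the square iteration domain 0 <= i, j < n."""
    edges = set()
    for src in product(range(n), repeat=2):
        for d in distances:
            dst = tuple(a + b for a, b in zip(src, d))
            if all(0 <= x < n for x in dst):
                ts, td = tile_of(src, size), tile_of(dst, size)
                if ts != td:
                    edges.add((ts, td))
    return edges


def lex_violations(edges):
    """Edges whose source tile comes lexicographically after its target tile.
    Any such edge means that running tiles in lexicographic order is invalid.
    (Python compares tuples lexicographically.)"""
    return sorted((s, t) for s, t in edges if s > t)


def wavefronts(tiles, edges):
    """Group tiles into levels with Kahn's algorithm; tiles in the same level
    have no dependences among them. Returns None if the graph has a cycle."""
    succs = {t: set() for t in tiles}
    indeg = {t: 0 for t in tiles}
    for s, t in edges:
        succs[s].add(t)
        indeg[t] += 1
    level = sorted(t for t in tiles if indeg[t] == 0)
    levels, seen = [], 0
    while level:
        levels.append(level)
        seen += len(level)
        nxt = []
        for s in level:
            for t in succs[s]:
                indeg[t] -= 1
                if indeg[t] == 0:
                    nxt.append(t)
        level = sorted(nxt)
    return levels if seen == len(tiles) else None


n, size = 4, (2, 2)
tiles = sorted({tile_of(it, size) for it in product(range(n), repeat=2)})
skewed = inter_tile_edges(n, [(1, -1)], size)  # a[i][j] = a[i-1][j+1]
print(lex_violations(skewed))    # [((0, 1), (0, 0)), ((1, 1), (1, 0))]
print(wavefronts(tiles, skewed))  # [[(0, 1)], [(0, 0), (1, 1)], [(1, 0)]]
```

Tiles in one wavefront can run in parallel. The schedule is valid here only because the inter-tile graph is acyclic and each tile runs atomically, with its own iterations in lexicographic order (which respects the lexicographically positive distance).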

GENERALISING THE UNIMODULAR APPROACH TO RESTRUCTURE IMPERFECTLY NESTED LOOPS

Parallel Processing Letters, Vol 06 (03), 1996, pp. 401-414
DOI: 10.1142/s0129626496000388

Author(s): Jingling Xue

Keyword(s): Code Generation, Lexicographic Order, Generation Algorithm, Loop Nest, Loop Nests, Iteration Space, New Concepts, Affine Constraints, Nested Loops, Unimodular Transformations

Although they overcome some limitations of the generate-and-test approach, unimodular transformations are limited to perfect loop nests. Extending the unimodular approach, this paper describes a framework that enables the use of unimodular transformations to restructure imperfect loop nests. The concepts previously used for perfect loop nests, such as iteration vector, iteration space and lexicographic order, are generalised, and some new concepts, such as the preorder tree, are introduced. Multiple unimodular transformations are allowed, one for each statement in the loop nest. A code generation algorithm is provided that produces a possibly imperfect loop nest to scan an iteration space given as a union of sets of affine constraints.
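As background for what the paper generalises, the classical unimodular legality test for a perfect loop nest fits in a few lines. This Python sketch covers only that perfect-nest case (one matrix for the whole nest, dependences given as distance vectors); it does not implement the paper's per-statement transformations, preorder tree or code generation. The helper names are illustrative.

```python
def det(m):
    """Integer determinant by cofactor expansion (adequate for small loop depths)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))


def transform(t, v):
    """Matrix-vector product T * v over the integers."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in t)


def lex_positive(v):
    """True if the first nonzero component of v is positive."""
    for x in v:
        if x != 0:
            return x > 0
    return False


def is_legal_unimodular(t, distances):
    """T is a legal transformation of a perfect nest if it is unimodular
    (|det T| = 1) and maps every dependence distance vector to a
    lexicographically positive vector."""
    return abs(det(t)) == 1 and all(lex_positive(transform(t, d)) for d in distances)


deps = [(1, -1), (1, 0)]
print(is_legal_unimodular([[0, 1], [1, 0]], deps))  # interchange: False
print(is_legal_unimodular([[1, 1], [1, 0]], deps))  # skew, then interchange: True
```

The second matrix is the product of the interchange `[[0, 1], [1, 0]]` and the skew `[[1, 0], [1, 1]]`, which maps (1, -1) to (0, 1) and (1, 0) to (1, 1).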

Parallel tiled Nussinov RNA folding loop nest generated using both dependence graph transitive closure and loop skewing

BMC Bioinformatics, Vol 18 (1), 2017
DOI: 10.1186/s12859-017-1707-8
Cited by ~9

Author(s): Marek Palkowski, Wlodzimierz Bielecki

Keyword(s): RNA Folding, Transitive Closure, Dependence Graph, Loop Nest, Loop Skewing

TRANSFORMATION OF NESTED LOOPS WITH MODULO INDEXING TO AFFINE RECURRENCES

Parallel Processing Letters, Vol 04 (03), 1994, pp. 271-280
DOI: 10.1142/s0129626494000260
Cited by ~6

Author(s): Florin Balasa, Frank H.M. Franssen, Francky V.M. Catthoor, Hugo J. De Man

Keyword(s): Code Generation, State Of The Art, Transformation Method, Control Flow, Code Optimization, Transformation Techniques, Hermite Normal Form, Nested Loops, Affine Functions, Systems Transformation

For multi-dimensional (M-D) signal and data processing systems, transformation of algorithmic specifications is a major instrument both in code optimization and code generation for parallelizing compilers and in control flow optimization as a preprocessor for architecture synthesis. State-of-the-art transformation techniques are limited to affine index expressions. However, this is not sufficient for many important applications in image, speech and numerical processing. In this paper, a novel transformation method is introduced for the subclass of algorithm specifications that index M-D signals with modulo expressions of affine functions. The method makes extensive use of the Hermite normal form. The transformation can be carried out in polynomial time using only integer arithmetic.
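The basic reason a modulo subscript can be made affine is that, for a fixed integer modulus m, the index i can be rewritten as i = m*q + r with 0 <= r < m, so that `i % m` becomes the new loop index r. The Python sketch below shows only this simple substitution on a hypothetical accumulation loop; the paper's general method, based on the Hermite normal form and covering modulo expressions of arbitrary affine functions, is not reproduced here.

```python
def accumulate_mod(x, m):
    """Original form: the subscript i % m is not affine in i."""
    y = [0] * m
    for i in range(len(x)):
        y[i % m] += x[i]
    return y


def accumulate_affine(x, m):
    """Rewritten form for a fixed integer m: i = m*q + r with 0 <= r < m, so
    both subscripts (r and m*q + r) are affine in the loop indices q and r."""
    n = len(x)
    y = [0] * m
    for q in range((n + m - 1) // m):
        # min(...) trims the last, possibly partial, block of m iterations.
        for r in range(min(m, n - m * q)):
            y[r] += x[m * q + r]
    return y


data = [1, 2, 3, 4, 5, 6, 7]
print(accumulate_mod(data, 3))     # [12, 7, 9]
print(accumulate_affine(data, 3))  # [12, 7, 9]
```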

Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling

Parallel Processing Letters, Vol 07 (04), 1997, pp. 379-392
DOI: 10.1142/s0129626497000383
Cited by ~20

Author(s): Alain Darte, Georges-André Silber, Frédéric Vivien

Keyword(s): Dependence Graph, Loop Tiling, Loop Parallelization, Fine Grain, Loop Body, Nested Loops, Single Block, The Way

Tiling is a technique used to exploit medium-grain parallelism in nested loops. It relies on a first step that detects sets of permutable nested loops. All algorithms developed so far treat the statements of the loop body as a single block; in other words, they cannot take advantage of the structure of dependences between different statements. In this paper, we overcome this limitation by showing how the structure of the reduced dependence graph can be taken into account to detect more permutable loops. Our method combines graph retiming techniques and graph scheduling techniques. It can be viewed as an extension of Wolf and Lam's algorithm to loops with multiple statements. Loop-independent dependences play a particular role in our study, and we show how our way of handling them can also be useful for fine-grain loop parallelization.
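The idea that shifting individual statements (retiming) can expose permutable loops that a single-block view misses can be illustrated in a few lines of Python. This is a sketch under assumptions: the reduced dependence graph is hypothetical, the retiming vectors are chosen by hand rather than computed by the paper's combined retiming and scheduling algorithm, and permutability is tested with the usual criterion that every distance vector be componentwise nonnegative.

```python
def retime(edges, shift):
    """Retime each statement S by the vector shift[S]: its instance i is executed
    at iteration i + shift[S]. A dependence u -> v with distance d therefore gets
    the distance d + shift[v] - shift[u]."""
    return [(u, v, tuple(dk + rv - ru for dk, rv, ru in zip(d, shift[v], shift[u])))
            for u, v, d in edges]


def fully_permutable(edges):
    """All loops of the nest are permutable if every distance vector is
    componentwise nonnegative. (Zero, i.e. loop-independent, distances also
    need the source statement to precede the target in the body; not checked.)"""
    return all(all(x >= 0 for x in d) for _, _, d in edges)


# Hypothetical reduced dependence graph with two statements in a 2-D nest.
edges = [("S1", "S2", (1, -1)), ("S2", "S1", (0, 2)), ("S1", "S1", (1, 0))]
print(fully_permutable(edges))  # False: (1, -1) has a negative component
retimed = retime(edges, {"S1": (0, 0), "S2": (0, 1)})
print([d for _, _, d in retimed])  # [(1, 0), (0, 1), (1, 0)]
print(fully_permutable(retimed))   # True
```

Shifting only S2 fixes the negative component on the S1 -> S2 edge, which a transformation applied uniformly to the whole body, as in the single-block view, cannot do by shifting alone.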

Transitive Closure of a Union of Dependence Relations for Parameterized Perfectly-Nested Loops

Lecture Notes in Computer Science - Parallel Computing Technologies, 2013, pp. 37-50
DOI: 10.1007/978-3-642-39958-9_4
Cited by ~2

Author(s): Włodzimierz Bielecki, Krzysztof Kraska, Tomasz Klimek

Keyword(s): Transitive Closure, Dependence Relations, Nested Loops

Tile Merging Technique to Generate Valid Tiled Code by Means of the Transitive Closure of a Dependence Graph

Hard and Soft Computing for Artificial Intelligence, Multimedia and Security - Advances in Intelligent Systems and Computing, 2016, pp. 315-327
DOI: 10.1007/978-3-319-48429-7_29

Author(s): Włodzimierz Bielecki, Piotr Skotnicki

Keyword(s): Transitive Closure, Dependence Graph

Concurrent Start Tiling of Stencil Computations based on the Transitive Closure of a Data Dependence Graph

Przegląd Elektrotechniczny, Vol 1 (11), 2015, pp. 169-172
DOI: 10.15199/48.2015.11.41
Cited by ~1

Author(s): Włodzimierz Bielecki

Keyword(s): Transitive Closure, Data Dependence, Dependence Graph, Stencil Computations

ON THE ALIGNMENT PROBLEM

Parallel Processing Letters, Vol 04 (03), 1994, pp. 259-270
DOI: 10.1142/s0129626494000259
Cited by ~12

Author(s): Alain Darte, Yves Robert

Keyword(s): Distributed Memory, Parallel Computers, Dependence Graph, Communication Graph, Mapping Problem, Loop Nests, Alignment Problem, NP-Complete, Alignment Heuristic

This paper deals with the problem of aligning data and computations when mapping uniform or affine loop nests onto SPMD distributed-memory parallel computers. For affine loop nests, we formulate the problem by introducing the communication graph, which plays the same role for the mapping problem that the dependence graph plays for scheduling. We illustrate the approach with several examples that show the difficulty of the problem. In the simplest case, that of perfect loop nests with uniform dependences, we show that minimizing the number of communications is NP-complete, although we are able to derive an alignment heuristic that gives good results in most practical cases.
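A toy version of the alignment problem can be written down directly. The model is an assumption for illustration: a one-dimensional processor line, a single loop index i, and accesses of the form `A[i + c]`. Iteration i of a statement S runs on processor i + a_S and element j of array A lives on processor j + a_A, so an access is local exactly when a_S = c + a_A. The exhaustive search below is exponential in the number of statements and arrays and is only usable for tiny instances; it is not the heuristic from the paper.

```python
from itertools import product


def nonlocal_count(offsets, accesses):
    """accesses: (stmt, array, c) meaning stmt reads array[i + c] at iteration i.
    An access is local iff offsets[stmt] == c + offsets[array]."""
    return sum(offsets[s] != c + offsets[a] for s, a, c in accesses)


def best_alignment(names, accesses, span=2):
    """Try every offset in [-span, span] for each object, with names[0] fixed at
    offset 0 (shifting all offsets together changes nothing). Returns
    (number of non-local accesses, offsets) for the first optimum found."""
    best = None
    for vals in product(range(-span, span + 1), repeat=len(names) - 1):
        offsets = dict(zip(names, (0,) + vals))
        cost = nonlocal_count(offsets, accesses)
        if best is None or cost < best[0]:
            best = (cost, offsets)
    return best


# S reads A[i] and A[i + 1]: both accesses cannot be local at once.
print(best_alignment(["S", "A"], [("S", "A", 0), ("S", "A", 1)])[0])  # 1
# S reads A[i + 1] and B[i]: shifting A by -1 makes both accesses local.
print(best_alignment(["S", "A", "B"], [("S", "A", 1), ("S", "B", 0)]))
```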

Obtaining Affine Transformations to Improve Locality of Loop Nests

Programming and Computer Software, Vol 31 (5), 2005, pp. 270-281
DOI: 10.1007/s11086-005-0036-2

Author(s): N. A. Likhoded, S. V. Bakhanovich, A. V. Zherelo

Keyword(s): Affine Transformations, Loop Nests

Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators

ACM Transactions on Architecture and Code Optimization, Vol 19 (1), 2022, pp. 1-26
DOI: 10.1145/3485137

Author(s): Prasanth Chatarasi, Hyoukjun Kwon, Angshuman Parashar, Michael Pellauer, Tushar Krishna, ...

Keyword(s): Deep Learning, Cost Model, Cost Models, Mapping Space, Loop Nest, Loop Nests, Higher Dimensional, On Chip, The Cost, Dimensional Mapping

A spatial accelerator’s efficiency depends heavily on both its mapper and its cost models to generate optimized mappings for the various operators of DNN models. However, existing cost models lack a formal boundary over their input programs (operators) for accurate and tractable cost analysis of the mappings, which makes it difficult to adapt these cost models to new operators. We consider the recently introduced Maestro Data-Centric (MDC) notation and its analytical cost model to address this challenge, because any mapping expressed in the notation can be analyzed precisely with the MDC cost model. In this article, we characterize the set of input operators and their mappings expressed in the MDC notation by introducing a set of conformability rules. The outcome of these rules is that any perfectly nested loop nest with affine tensor subscripts and without conditionals is conformable to the MDC notation. A majority of the primitive operators in deep learning are such loop nests. In addition, our rules enable us to automatically translate a mapping expressed in loop nest form to MDC notation and use the MDC cost model to guide upstream mappers. Our conformability rules over the input operators result in a structured mapping space of the operators, which enables us to introduce a mapper based on a decoupled off-chip/on-chip approach to accelerate mapping space exploration. Our mapper decomposes the original higher-dimensional mapping space of operators into two lower-dimensional off-chip and on-chip subspaces and then optimizes the off-chip subspace followed by the on-chip subspace. We implemented our overall approach in a tool called Marvel; a benefit of the approach is that it applies to any operator conformable with the MDC notation. We evaluated Marvel on major DNN operators and compared it with past optimizers.
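The three conformability conditions stated above (perfect nesting, affine tensor subscripts, no conditionals) are simple enough to check mechanically. The sketch below is hypothetical: Marvel works on its own loop-nest representation, whereas this checker parses Python-syntax loop nests with the standard `ast` module, and the name `is_mdc_conformable` is invented here. It treats any plain name as an affine term and allows multiplication only by an integer constant.

```python
import ast


def _is_int_const(node):
    return isinstance(node, ast.Constant) and isinstance(node.value, int)


def _is_affine(node):
    """Names, integer constants, +, -, unary minus, multiplication by an
    integer constant, and tuples of such expressions."""
    if isinstance(node, ast.Name) or _is_int_const(node):
        return True
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.USub, ast.UAdd)):
        return _is_affine(node.operand)
    if isinstance(node, ast.BinOp):
        if isinstance(node.op, (ast.Add, ast.Sub)):
            return _is_affine(node.left) and _is_affine(node.right)
        if isinstance(node.op, ast.Mult):
            return ((_is_int_const(node.left) and _is_affine(node.right))
                    or (_is_int_const(node.right) and _is_affine(node.left)))
    if isinstance(node, ast.Tuple):
        return all(_is_affine(e) for e in node.elts)
    return False  # e.g. %, //, calls, or indirect subscripts such as A[B[i]]


def is_mdc_conformable(src):
    """True if `src` is a perfect `for` nest whose innermost body holds only
    assignments, with no conditionals and only affine subscripts."""
    tree = ast.parse(src)
    if any(isinstance(n, (ast.If, ast.IfExp)) for n in ast.walk(tree)):
        return False
    body, depth = tree.body, 0
    while len(body) == 1 and isinstance(body[0], ast.For):
        body, depth = body[0].body, depth + 1
    if depth == 0 or not all(isinstance(s, (ast.Assign, ast.AugAssign)) for s in body):
        return False  # not a loop nest, or statements between loops (imperfect)
    for n in ast.walk(tree):
        if isinstance(n, ast.Subscript):
            s = n.slice
            if type(s).__name__ == "Index":  # Python 3.8 wraps subscripts in ast.Index
                s = s.value
            if not _is_affine(s):
                return False
    return True


gemm = ("for i in range(M):\n    for j in range(N):\n        for k in range(K):\n"
        "            C[i][j] += A[i][k] * B[k][j]\n")
print(is_mdc_conformable(gemm))                                       # True
print(is_mdc_conformable("for i in range(N):\n    y[i % 4] += x[i]\n"))  # False
```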