Tiling arbitrarily nested loops by means of the transitive

2016 ◽  
Vol 26 (4) ◽  
pp. 919-939 ◽  
Author(s):  
Włodzimierz Bielecki ◽  
Marek Pałkowski

A novel approach to generating tiled code for arbitrarily nested loops is presented. It is derived by combining the polyhedral and iteration space slicing frameworks. Instead of program transformations represented by a set of affine functions, one for each statement, it uses the transitive closure of a loop nest dependence graph to correct the original rectangular tiles so that all dependences of the original loop nest are preserved under the lexicographic order of target tiles. Parallel tiled code can be generated from valid serial tiled code by applying affine transformations or transitive closure to an inter-tile dependence graph, whose vertices are target tiles and whose edges connect dependent target tiles. We demonstrate how a relation describing such a graph can be formed. The main merit of the presented approach compared with well-known ones is that it does not require full permutability of loops to generate both serial and parallel tiled code; this widens the class of loop nests that can be tiled.
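To make the validity condition above concrete, the following minimal Python sketch checks whether plain rectangular tiles, executed in lexicographic order, preserve a uniform dependence. It is an illustration only and not the authors' algorithm: it detects the dependences that would need correction but does not compute the corrected tiles via transitive closure. The function names, the square iteration space of size `n`, and the single distance vector are assumptions made for the example.

```python
from itertools import product


def tile_of(point, tile_size):
    """Map an iteration point to the identifier of its rectangular tile."""
    return tuple(x // tile_size for x in point)


def violating_dependences(n, tile_size, distance):
    """List dependences (source, target) whose target tile would run
    before the source tile when tiles execute in lexicographic order.

    Assumes an n x ... x n iteration space and one uniform dependence
    distance vector (both hypothetical, for illustration only).
    """
    violations = []
    for source in product(range(n), repeat=len(distance)):
        target = tuple(s + d for s, d in zip(source, distance))
        if not all(0 <= t < n for t in target):
            continue  # the dependence leaves the iteration space
        # Python compares tuples lexicographically.
        if tile_of(target, tile_size) < tile_of(source, tile_size):
            violations.append((source, target))
    return violations


# Distance (1, 1): every target tile follows or equals its source tile.
print(violating_dependences(4, 2, (1, 1)))
# Distance (1, -1): some targets fall into a lexicographically earlier tile.
print(violating_dependences(4, 2, (1, -1)))
```

With distance `(1, -1)` the loops are not fully permutable, so some targets land in an earlier tile; this is the kind of loop nest for which the abstract says the original rectangular tiles have to be corrected.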

1996 ◽  
Vol 06 (03) ◽  
pp. 401-414
Author(s):  
Jingling Xue

Although overcoming some limitations of the generate-and-test approach, unimodular transformations are limited to perfect loop nests only. Extending the unimodular approach, this paper describes a framework that enables the use of unimodular transformations to restructure imperfect loop nests. The concepts used previously for perfect loop nests, such as iteration vector, iteration space and lexicographic order, are generalised and some new concepts like preorder tree are introduced. Multiple unimodular transformations are allowed, one for each statement in the loop nest. A code generation algorithm is provided that produces a possibly imperfect loop nest to scan an iteration space that is given as a union of sets of affine constraints.


1994 ◽  
Vol 04 (03) ◽  
pp. 271-280 ◽  
Author(s):  
FLORIN BALASA ◽  
FRANK H.M. FRANSSEN ◽  
FRANCKY V.M. CATTHOOR ◽  
HUGO J. DE MAN

For multi-dimensional (M-D) signal and data processing systems, transformation of algorithmic specifications is a major instrument both in code optimization and code generation for parallelizing compilers and in control flow optimization as a preprocessor for architecture synthesis. State-of-the-art transformation techniques are limited to affine index expressions. This is however not sufficient for many important applications in image, speech and numerical processing. In this paper, a novel transformation method is introduced, oriented to the subclass of algorithm specifications that contains modulo expressions of affine functions to index M-D signals. The method employs extensively the concept of Hermite normal form. The transformation method can be carried out in polynomial time, applying only integer arithmetic.


1997 ◽  
Vol 07 (04) ◽  
pp. 379-392 ◽  
Author(s):  
Alain Darte ◽  
Georges-André Silber ◽  
Frédéric Vivien

Tiling is a technique used for exploiting medium-grain parallelism in nested loops. It relies on a first step that detects sets of permutable nested loops. All algorithms developed so far consider the statements of the loop body as a single block; in other words, they are not able to take advantage of the structure of dependences between different statements. In this paper, we overcome this limitation by showing how the structure of the reduced dependence graph can be taken into account for detecting more permutable loops. Our method combines graph retiming techniques and graph scheduling techniques. It can be viewed as an extension of Wolf and Lam's algorithm to the case of loops with multiple statements. Loop-independent dependences play a particular role in our study, and we show how the way we handle them can be useful for fine-grain loop parallelization as well.
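As background for the permutability detection mentioned above, here is a minimal Python sketch of the standard single-block test on dependence distance vectors: a band of loops is fully permutable if every dependence is either already carried by a loop outside the band or has only non-negative components inside it. This is the textbook condition, not the retiming-and-scheduling method of the paper, and it treats the loop body as a single block, which is exactly the limitation the abstract describes. The function names and the example vectors are assumptions.

```python
def lex_positive(vector):
    """True if the first nonzero component of `vector` is positive."""
    for component in vector:
        if component != 0:
            return component > 0
    return False  # an all-zero (or empty) vector is not lexicographically positive


def is_fully_permutable(distance_vectors, first, last):
    """Check whether loops first..last (0-based, inclusive) form a
    fully permutable band for the given dependence distance vectors."""
    return all(
        # Either the dependence is carried by a loop outside the band ...
        lex_positive(d[:first])
        # ... or all of its components inside the band are non-negative.
        or all(c >= 0 for c in d[first:last + 1])
        for d in distance_vectors
    )


deps = [(1, -1), (0, 1)]               # hypothetical distance vectors
print(is_fully_permutable(deps, 0, 1))  # False: (1, -1) has a negative component
print(is_fully_permutable(deps, 1, 1))  # True: (1, -1) is carried by the outer loop
```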


1994 ◽  
Vol 04 (03) ◽  
pp. 259-270 ◽  
Author(s):  
ALAIN DARTE ◽  
YVES ROBERT

This paper deals with the problem of aligning data and computations when mapping uniform or affine loop nests onto SPMD distributed memory parallel computers. For affine loop nests we formulate the problem by introducing the communication graph, which can be viewed as the counterpart for the mapping problem of the dependence graph for scheduling. We illustrate the approach with several examples to show the difficulty of the problem. In the simplest case, that of perfect loop nests with uniform dependences, we show that minimizing the number of communications is NP-complete, although we are able to derive a good alignment heuristic in most practical cases.


2005 ◽  
Vol 31 (5) ◽  
pp. 270-281
Author(s):  
N. A. Likhoded ◽  
S. V. Bakhanovich ◽  
A. V. Zherelo

2022 ◽  
Vol 19 (1) ◽  
pp. 1-26
Author(s):  
Prasanth Chatarasi ◽  
Hyoukjun Kwon ◽  
Angshuman Parashar ◽  
Michael Pellauer ◽  
Tushar Krishna ◽  
...  

A spatial accelerator’s efficiency depends heavily on both its mapper and cost models to generate optimized mappings for various operators of DNN models. However, existing cost models lack a formal boundary over their input programs (operators) for accurate and tractable cost analysis of the mappings, which makes the cost models hard to adapt to new operators. We consider the recently introduced Maestro Data-Centric (MDC) notation and its analytical cost model to address this challenge because any mapping expressed in the notation is precisely analyzable using the MDC’s cost model. In this article, we characterize the set of input operators and their mappings expressed in the MDC notation by introducing a set of conformability rules. The outcome of these rules is that any loop nest that is perfectly nested with affine tensor subscripts and without conditionals is conformable to the MDC notation. A majority of the primitive operators in deep learning are such loop nests. In addition, our rules enable us to automatically translate a mapping expressed in the loop nest form to MDC notation and use the MDC’s cost model to guide upstream mappers. Our conformability rules over the input operators result in a structured mapping space of the operators, which enables us to introduce a mapper based on our decoupled off-chip/on-chip approach to accelerate mapping space exploration. Our mapper decomposes the original higher-dimensional mapping space of operators into two lower-dimensional off-chip and on-chip subspaces and then optimizes the off-chip subspace followed by the on-chip subspace. We implemented our overall approach in a tool called Marvel, and a benefit of our approach is that it applies to any operator conformable with the MDC notation. We evaluated Marvel over major DNN operators and compared it with past optimizers.

