loop fusion Latest Research Papers

On the Transformation Optimization for Stencil Computation

Electronics ◽

10.3390/electronics11010038 ◽

2021 ◽

Vol 11 (1) ◽

pp. 38

Author(s):

Huayou Su ◽

Kaifang Zhang ◽

Songzhu Mei

Keyword(s):

Load Balance ◽

Loop Transformation ◽

Redundancy Elimination ◽

Stencil Computation ◽

Loop Unrolling ◽

Loop Fusion ◽

Potential Benefits ◽

Successful Employment ◽

2D And 3D

Stencil computation optimizations have been studied extensively, and various approaches have been proposed. Loop transformation is a vital class of optimization in modern production compilers and has been employed successfully within them. In this paper, we combine the two aspects to study the potential benefits that some common transformation recipes may have for stencils. The recipes consist of loop unrolling, loop fusion, address precalculation, redundancy elimination, instruction reordering, load balance, and a forward and backward update algorithm named semi-stencil. Experimental evaluations of diverse stencil kernels, including 1D, 2D, and 3D computation patterns, on two typical ARM and Intel platforms, demonstrate the respective effects of the transformation recipes. An average speedup of 1.65× is obtained, and the best is 1.88× for the single transformation recipes we analyze. The compound recipes demonstrate a maximum speedup of 1.92×.
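Two of the recipes named in this abstract, loop unrolling and redundancy elimination, can be illustrated on a 1D 3-point stencil. The sketch below is only an illustration under stated assumptions: the function names and the stencil itself are hypothetical and are not taken from the paper, and integer data is used because reassociating the additions could change floating-point rounding.

```python
def stencil_baseline(a):
    """Reference 3-point stencil: out[i] = a[i-1] + a[i] + a[i+1]."""
    n = len(a)
    out = [0] * n
    for i in range(1, n - 1):
        out[i] = a[i - 1] + a[i] + a[i + 1]
    return out


def stencil_unrolled(a):
    """Same stencil, unrolled by 2, reusing the partial sum both points share."""
    n = len(a)
    out = [0] * n
    i = 1
    # Unrolled main loop: two interior points per iteration.
    while i + 1 < n - 1:
        shared = a[i] + a[i + 1]        # needed by out[i] and out[i + 1]
        out[i] = a[i - 1] + shared
        out[i + 1] = shared + a[i + 2]
        i += 2
    # Remainder loop for a leftover interior point, if any.
    while i < n - 1:
        out[i] = a[i - 1] + a[i] + a[i + 1]
        i += 1
    return out
```

Each pair of outputs costs three additions instead of four, which is the kind of saving redundancy elimination targets; whether it pays off on real hardware depends on the platform, as the abstract's measurements suggest.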

Effective Loop Fusion in Polyhedral Compilation Using Fusion Conflict Graphs

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3416510 ◽

2020 ◽

Vol 17 (4) ◽

pp. 1-26

Author(s):

Aravind Acharya ◽

Uday Bondhugula ◽

Albert Cohen

Keyword(s):

Loop Fusion ◽

Conflict Graphs

The loop fusion for data localization

Program systems theory and applications ◽

10.25209/2079-3316-2020-11-3-17-31 ◽

2020 ◽

Vol 11 (3) ◽

pp. 17-31

Author(s):

Борис Яковлевич Штейнберг ◽

Олег Борисович Штейнберг ◽

Александр Александрович Василенко

Keyword(s):

Loop Fusion

Loop fusion is used to improve data locality. Fusing loops that share variables can speed up execution by reducing the number of cache misses. This transformation has long been known, but compilers apply it only in the simplest cases. Our improved algorithms use preliminary transformations to correctly fuse loops that have different iteration counts and data dependences.
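As a minimal sketch of fusing loops with different iteration counts, the example below fuses two loops over their common range and peels the leftover iterations into a remainder loop. Peeling is one common preliminary transformation and is an assumption here; the abstract does not say which transformations the authors use. The function names and the assumption `n <= m` are hypothetical.

```python
def separate_loops(a, b, n, m):
    """Two loops with different trip counts that both read a (assumes n <= m)."""
    x = [0] * n
    y = [0] * m
    for i in range(n):
        x[i] = a[i] * 2
    for i in range(m):
        y[i] = a[i] + b[i]
    return x, y


def fused_loops(a, b, n, m):
    """Fuse over the common range [0, n), then run the peeled remainder [n, m)."""
    x = [0] * n
    y = [0] * m
    for i in range(n):
        x[i] = a[i] * 2        # a[i] is loaded once and used by both bodies
        y[i] = a[i] + b[i]
    for i in range(n, m):      # iterations that only the second loop had
        y[i] = a[i] + b[i]
    return x, y
```

Fusion is legal here because neither loop writes anything the other reads; loops with cross-iteration dependences need extra analysis before they can be combined this way.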

Refactoring the Memory Access Pattern to Improve Computational Performance in NEMO

10.5194/egusphere-egu2020-9732 ◽

2020 ◽

Author(s):

Italo Epicoco ◽

Francesca Mele ◽

Silvia Mocavero ◽

Marco Chiarelli ◽

Alessandro D'Anca ◽

...

Keyword(s):

Memory Performance ◽

Optimization Technique ◽

Main Memory ◽

The Body ◽

Cache Memory ◽

Loop Transformation ◽

Data Dependencies ◽

The Third ◽

Computational Performance ◽

Loop Fusion

On the roadmap of modern parallel architectures, the computing power of a node grows much more quickly than main memory performance (capacity, bandwidth). This leads to an ever-widening gap between computing and memory resources, so efficient use of cache memory is becoming ever more essential as an optimization technique. The NEMO model uses a finite difference integration method and a regular Cartesian grid for space discretization. The NEMO code reflects this choice: a generic field is represented in memory as a 3D array, and the code is mainly composed of three-level nested loops. These loops often include only a few operations in the body; the results are stored in a temporary 3D array and then used in subsequent loops until the final calculation. The aim of this work is to make better use of the cache memory by fusing DO loops together. Loop fusion is a transformation that takes two or more adjacent loops with the same iteration space traversal and combines their bodies into a single loop. Fusing the loops is not trivial, and it may require introducing additional redundant operations to resolve data dependencies, which unfortunately degrades overall performance. To avoid the redundant operations, we can adopt pointers to arrays and implement a pointer rotation at each loop iteration. We have implemented the loop fusion transformation in an advection kernel extracted from the NEMO oceanic model and compared three versions of the optimized kernel, with three different levels of loop fusion. The first prototype applies extreme fusion: all loops in the routine are fused, and operations are replicated up to 3 times. In the second prototype, buffer rotation is applied only in the outermost loop. In the third prototype, buffer rotation is also implemented for the second dimension, and this version introduces only a limited amount of redundant operations. The tests were performed on the Athena cluster located at the CMCC supercomputing center, which is based on Intel Xeon E5-2670 processors. The memory hierarchy is composed of 32 KB of L1 cache, 256 KB of L2 cache, and 20 MB of L3 cache shared among the cores. The results show the effectiveness of the loop fusion approach, which reaches a speedup of 2x with a high number of cores. The third prototype proved to be the most promising solution. Prototypes 1 and 2 provide a good improvement up to 256 cores, after which the redundant operations lead to a loss of performance. A deeper analysis measuring last-level cache misses also showed that the loop transformation significantly reduced the number of cache misses. Despite the good results achieved with loop fusion, this optimization is strictly linked to the computing architecture. A fully portable performance improvement can be ensured by adopting a DSL (Domain Specific Language).
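The trade-off described in this abstract, between a temporary array, redundant recomputation, and buffer rotation, can be sketched on a small 2D example. This is an illustration only: `flux` is a hypothetical stand-in for the real per-point operation, the grid is assumed to be non-empty and rectangular, and none of the names come from the NEMO code.

```python
def flux(v):
    """Hypothetical per-point operation standing in for the real kernel."""
    return 3 * v + 1


def unfused(a):
    """Two loop nests that communicate through a full temporary 2D array."""
    nj, ni = len(a), len(a[0])
    t = [[0] * ni for _ in range(nj)]
    out = [[0] * ni for _ in range(nj)]
    for j in range(nj):
        for i in range(ni):
            t[j][i] = flux(a[j][i])
    for j in range(1, nj):
        for i in range(ni):
            out[j][i] = t[j][i] - t[j - 1][i]
    return out


def fused_redundant(a):
    """Fused nest without the temporary: flux of row j-1 is computed again."""
    nj, ni = len(a), len(a[0])
    out = [[0] * ni for _ in range(nj)]
    for j in range(1, nj):
        for i in range(ni):
            out[j][i] = flux(a[j][i]) - flux(a[j - 1][i])
    return out


def fused_rotating(a):
    """Fused nest with two row buffers rotated once per outer iteration."""
    nj, ni = len(a), len(a[0])
    out = [[0] * ni for _ in range(nj)]
    prev = [flux(v) for v in a[0]]   # prologue: flux of row 0
    cur = [0] * ni
    for j in range(1, nj):
        for i in range(ni):
            cur[i] = flux(a[j][i])
            out[j][i] = cur[i] - prev[i]
        prev, cur = cur, prev        # swap references instead of copying rows
    return out
```

The rotating version evaluates `flux` once per grid point, like the unfused code, while keeping only two rows of temporaries live, which is the property the buffer-rotation prototypes aim for.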

Using an evolutionary approach based on shortest common supersequence problem for loop fusion

Soft Computing ◽

10.1007/s00500-019-04338-z ◽

2019 ◽

Vol 24 (10) ◽

pp. 7231-7252

Author(s):

Mahsa Ziraksima ◽

Shahriar Lotfi ◽

Habib Izadkhah

Keyword(s):

Evolutionary Approach ◽

Loop Fusion ◽

Shortest Common Supersequence

From Loop Fusion to Kernel Fusion: A Domain-Specific Approach to Locality Optimization

2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) ◽

10.1109/cgo.2019.8661176 ◽

2019 ◽

Cited By ~ 8

Author(s):

Bo Qiao ◽

Oliver Reiche ◽

Frank Hannig ◽

Jürgen Teich

Keyword(s):

Specific Approach ◽

Domain Specific ◽

Kernel Fusion ◽

Loop Fusion

Optimal Design Based on Closed-Loop Fusion for Velocity Bandwidth Expansion of Optical Target Tracking System

Sensors ◽

10.3390/s19010133 ◽

2019 ◽

Vol 19 (1) ◽

pp. 133 ◽

Cited By ~ 2

Author(s):

Yao Mao ◽

Wei Ren ◽

Yong Luo ◽

Zhijun Li

Keyword(s):

Optimal Design ◽

Target Tracking ◽

Sensor Fusion ◽

Closed Loop ◽

Tracking System ◽

Micro Electro Mechanical System ◽

Measurement Range ◽

Improve Measurement ◽

Loop Fusion ◽

Bandwidth Expansion

The micro-electro-mechanical system (MEMS) gyro is one of the most widely used inertial sensors in the field of optical target tracking (OTT). However, the velocity closed-loop bandwidth of the OTT system is limited by the resonance and measurement range issues of the MEMS gyro. In this paper, the generalized sensor fusion framework named closed-loop fusion (CLF) is analyzed, and the optimal design principle of the filter is proposed in detail in order to improve the measurement bandwidth of the MEMS gyro by integrating information from MEMS accelerometers. The fusion error optimization problem, which is the core issue of fusion design, can be solved better through the feedback compensation law of the CLF framework and the optimal design of the fusion filter. Unlike conventional methods, the fusion filter of CLF can be designed simply and accurately, and the determination of the superposition of fusion information can be avoided. Both sensor fusion simulations and closed-loop experiments on an optical target tracking system yielded excellent results, supporting the validity of the proposed method.

Optimal Design of Closed-loop Fusion for Sensor Signal Expansion

10.20944/preprints201811.0497.v1 ◽

2018 ◽

Author(s):

Yao Mao ◽

Wei Ren ◽

Yong Luo ◽

ZhiJun Li

Keyword(s):

Optimal Design ◽

Sensor Fusion ◽

Closed Loop ◽

Feedback Compensation ◽

Loop Fusion ◽

Core Issue ◽

Fusion Framework ◽

Compensation Law ◽

Error Optimization

Sensor fusion is one of the most widely used techniques in robotics, aerospace, and target tracking control. In this paper, the generalized sensor fusion framework named closed-loop fusion (CLF) is analyzed, and the optimal design principle of the filter is proposed in detail. The fusion error optimization problem, which is the core issue of fusion design, is also solved better through the feedback compensation law of the CLF framework. Unlike conventional methods, the fusion filter of CLF can be optimally designed, and the determination of the superposition of fusion information is avoided. Simulation and experimental results demonstrating validity are to be submitted.

Improving efficency of mathematical functions in image processing by loop fusion

2018 5th International Conference on Electrical and Electronic Engineering (ICEEE) ◽

10.1109/iceee2.2018.8391357 ◽

2018 ◽

Cited By ~ 1

Author(s):

Shapour Joudi Bigdello ◽

Manoochehr Joodi Bigdello ◽

Hekmat Mohammadzadeh

Keyword(s):

Image Processing ◽

Mathematical Functions ◽

Loop Fusion

A guess-and-assume approach to loop fusion for program verification

Proceedings of the ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation - PEPM '18 ◽

10.1145/3175493.3162070 ◽

2018 ◽

Author(s):

Akifumi Imanishi ◽

Kohei Suenaga ◽

Atsushi Igarashi

Keyword(s):

Program Verification ◽

Loop Fusion
