Loop fusion
Recently Published Documents


Total documents: 64 (five years: 7)
H-index: 13 (five years: 2)

Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 38
Author(s):  
Huayou Su ◽  
Kaifang Zhang ◽  
Songzhu Mei

Stencil computation optimizations have been studied extensively, and various approaches have been proposed. Loop transformation is a vital class of optimization in modern production compilers and has been employed successfully within them. In this paper, we combine the two lines of work to study the potential benefits that some common transformation recipes may have for stencils. The recipes consist of loop unrolling, loop fusion, address precalculation, redundancy elimination, instruction reordering, load balancing, and a forward and backward update algorithm named semi-stencil. Experimental evaluations of diverse stencil kernels, including 1D, 2D, and 3D computation patterns, on two typical platforms, ARM and Intel, demonstrate the respective effects of the transformation recipes. For the single transformation recipes we analyze, we obtain an average speedup of 1.65× and a best case of 1.88×. The compound recipes achieve a maximum speedup of 1.92×.
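As a minimal sketch of the loop fusion recipe applied to a stencil, the Python below (chosen for readability, not performance) fuses a hypothetical 1D 3-point averaging sweep with a second loop that consumes its result at the same index. The function names and the kernel itself are illustrative assumptions, not the kernels evaluated in the paper. Because the second loop reads only `b[i]`, fusion is legal and the temporary array can shrink to a scalar.

```python
def stencil_unfused(a):
    """Two separate loops over the same interior iteration space (hypothetical kernel)."""
    n = len(a)
    b = [0.0] * n
    c = [0.0] * n
    # Loop 1: 3-point averaging stencil writes a full temporary array b.
    for i in range(1, n - 1):
        b[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    # Loop 2: same iteration space; reads only b[i], so no fusion-preventing dependence.
    for i in range(1, n - 1):
        c[i] = a[i] - b[i]
    return c


def stencil_fused(a):
    """The two loop bodies combined into a single loop."""
    n = len(a)
    c = [0.0] * n
    for i in range(1, n - 1):
        # b[i] is consumed in the same iteration, so it becomes a scalar temporary.
        b_i = (a[i - 1] + a[i] + a[i + 1]) / 3.0
        c[i] = a[i] - b_i
    return c
```

Both versions perform the same arithmetic in the same order, so they return identical results; the fused one traverses `a` once and never materializes `b`.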


2020 ◽  
Vol 17 (4) ◽  
pp. 1-26
Author(s):  
Aravind Acharya ◽  
Uday Bondhugula ◽  
Albert Cohen

2020 ◽  
Vol 11 (3) ◽  
pp. 17-31
Author(s):  
Борис Яковлевич Штейнберг ◽  
Олег Борисович Штейнберг ◽  
Александр Александрович Василенко

Loop fusion is used to improve data locality. Fusing loops that share variables can speed up execution by reducing the number of cache misses. This transformation has long been known, but compilers apply it only in the simplest cases. Our improved algorithms use preliminary transformations to correctly fuse loops that have different numbers of iterations and data dependences.
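The preliminary transformations are not spelled out here, so the following is only one generic illustration, not necessarily the authors' algorithm. It fuses two hypothetical loops with different trip counts, where the second loop reads `b[i + 1]`. Fusing them naively would read `b[i + 1]` before it is written, so the second loop is first shifted by one iteration (loop alignment) and the first iteration of the first loop is peeled off.

```python
def shift_unfused(a):
    """Loop 1 runs n iterations; loop 2 runs n - 1 and reads one element ahead."""
    n = len(a)
    b = [0] * n
    c = [0] * (n - 1) if n > 0 else []
    for i in range(n):          # loop 1: n iterations
        b[i] = 2 * a[i]
    for i in range(n - 1):      # loop 2: n - 1 iterations, reads b[i + 1]
        c[i] = b[i] + b[i + 1]
    return c


def shift_fused(a):
    """Fused after peeling loop 1's first iteration and shifting loop 2 by one."""
    n = len(a)
    if n == 0:
        return []
    b = [0] * n
    c = [0] * (n - 1)
    # Peeled iteration: only loop 1's body executes for index 0.
    b[0] = 2 * a[0]
    for j in range(1, n):
        b[j] = 2 * a[j]             # loop 1 body at index j
        c[j - 1] = b[j - 1] + b[j]  # loop 2 body shifted: b[j] is now already written
    return c
```

After alignment every value of `b` is produced before it is consumed, and the mismatched trip counts (n versus n - 1) are absorbed by the peeled iteration.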


2020 ◽  
Author(s):  
Italo Epicoco ◽  
Francesca Mele ◽  
Silvia Mocavero ◽  
Marco Chiarelli ◽  
Alessandro D'Anca ◽  
...  

In the roadmap of modern parallel architectures, the computing power of a node grows much more quickly than main-memory performance (capacity and bandwidth). This widens the gap between computing and memory resources even further, so efficient use of the cache memory is becoming an ever more essential optimization technique.

The NEMO model uses a finite-difference integration method and a regular Cartesian grid for space discretization. The NEMO code reflects this choice: a generic field is represented in memory as a 3D array, and the code is mainly composed of triply nested loops. These loops often contain only a few operations in the body; the results are stored in a temporary 3D array and then used in subsequent loops until the final calculation.

The aim of this work is to make better use of the cache memory by fusing DO loops. Loop fusion is a transformation that takes two or more adjacent loops with the same iteration-space traversal and combines their bodies into a single loop.

Fusing the loops is not trivial: it may require introducing redundant operations to resolve data dependencies, which degrades overall performance. To avoid the redundant operations, we can use pointers to arrays and rotate the pointers at each loop iteration.

We have implemented the loop fusion transformation in an advection kernel extracted from the NEMO ocean model and compared three versions of the optimized kernel, each with a different level of loop fusion.

The first prototype applies extreme fusion: all loops in the routine are fused, and operations are replicated up to three times. In the second prototype, buffer rotation is applied only in the outermost loop. In the third prototype, buffer rotation is also implemented for the second dimension, and this version introduces only a limited amount of redundant operations.

The tests were performed on the Athena cluster at the CMCC supercomputing center. The infrastructure is based on Intel Xeon E5-2670 processors, with a memory hierarchy of 32 KB L1 cache, 256 KB L2 cache, and 20 MB L3 cache shared among the cores. The results demonstrate the effectiveness of the loop fusion approach, which reaches a speedup of 2× with a high number of cores. The third prototype proved to be the most promising solution. Prototypes 1 and 2 provide good improvements up to 256 cores; beyond that, the redundant operations lead to a loss of performance. A deeper analysis of Last Level Cache misses also showed that the loop transformation significantly reduced the number of cache misses.

Despite the good results achieved with loop fusion, we note that this optimization is strictly tied to the computing architecture. A fully portable performance improvement can be ensured by adopting a DSL (Domain Specific Language).
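To make the trade-off between redundant operations and buffer rotation concrete, here is a minimal 2D sketch in Python. It is not NEMO's advection kernel: the two-stage kernel (a difference along `j` stored in a temporary, then a difference of that temporary) is a hypothetical stand-in, and swapping two Python list references plays the role of pointer rotation. The recompute version evaluates every first-stage difference twice, while the rotating version computes each one once and keeps only two rows of the temporary alive.

```python
def two_stage_unfused(a):
    """Original style: stage 1 fills a full temporary, stage 2 reads it in a later loop nest."""
    ny, nx = len(a), len(a[0])
    f = [[0.0] * nx for _ in range(ny - 1)]
    for j in range(ny - 1):
        for i in range(nx):
            f[j][i] = a[j + 1][i] - a[j][i]
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(nx):
            out[j][i] = f[j][i] - f[j - 1][i]
    return out


def two_stage_recompute(a):
    """Fully fused: no temporary, but f[j - 1] is recomputed instead of reused."""
    ny, nx = len(a), len(a[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(1, ny - 1):
        for i in range(nx):
            # The second difference was already computed as f[j] at step j - 1 (redundant work).
            out[j][i] = (a[j + 1][i] - a[j][i]) - (a[j][i] - a[j - 1][i])
    return out


def two_stage_rotate(a):
    """Fused with two rotating row buffers: each stage-1 value is computed once."""
    ny, nx = len(a), len(a[0])
    out = [[0.0] * nx for _ in range(ny)]
    prev = [a[1][i] - a[0][i] for i in range(nx)]  # row 0 of the temporary
    cur = [0.0] * nx
    for j in range(1, ny - 1):
        for i in range(nx):
            cur[i] = a[j + 1][i] - a[j][i]  # row j of the temporary
            out[j][i] = cur[i] - prev[i]
        # Rotate buffer references (no copying); old prev is fully overwritten next step.
        prev, cur = cur, prev
    return out
```

All three return the same array; in a compiled language the rotation would be a swap of array pointers, which is what lets the fused loop avoid both the full temporary and the duplicated operations.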


2019 ◽  
Vol 24 (10) ◽  
pp. 7231-7252
Author(s):  
Mahsa Ziraksima ◽  
Shahriar Lotfi ◽  
Habib Izadkhah

Sensors ◽  
2019 ◽  
Vol 19 (1) ◽  
pp. 133 ◽  
Author(s):  
Yao Mao ◽  
Wei Ren ◽  
Yong Luo ◽  
Zhijun Li

The micro-electro-mechanical system (MEMS) gyro is one of the most widely used inertial sensors in optical target tracking (OTT). However, the velocity closed-loop bandwidth of an OTT system is limited by the resonance and measurement-range issues of the MEMS gyro. In this paper, a generalized sensor fusion framework named closed-loop fusion (CLF) is analyzed, and an optimal filter design principle is proposed in detail to improve the measurement bandwidth of the MEMS gyro by integrating information from MEMS accelerometers. The fusion-error optimization problem, which is the core issue of fusion design, is better solved through the feedback compensation law of the CLF framework and the optimal design of the fusion filter. Unlike conventional methods, the CLF fusion filter can be designed simply and accurately, and determining the superposition of the fusion information is also effectively avoided. To show the validity of the proposed method, sensor fusion simulations and closed-loop experiments on an optical target tracking system were carried out, both yielding excellent results.
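The abstract does not give the CLF feedback law or its optimal filter design, so the sketch below shows only a generic first-order complementary filter written in feedback form, a textbook way of blending a gyro (trusted at low frequency) with an accelerometer-derived signal (trusted at high frequency). It is not the authors' CLF method. It assumes the accelerometer data have already been converted to angular acceleration; the names `gyro`, `accel`, `dt`, and the crossover gain `k` are hypothetical.

```python
def complementary_velocity(gyro, accel, dt, k):
    """Generic first-order complementary filter in feedback form (NOT the paper's CLF design).

    gyro  -- angular-velocity samples, trusted at low frequency
    accel -- angular-acceleration samples (assumed), integrated and trusted at high frequency
    dt    -- sample period in seconds
    k     -- crossover gain in rad/s (hypothetical tuning parameter)
    """
    if not gyro:
        return []
    v_hat = gyro[0]  # initialise the estimate from the first gyro sample
    estimates = [v_hat]
    for n in range(1, len(gyro)):
        # Forward-Euler step: integrate the acceleration, then feed back the
        # gyro error so low-frequency drift of the integral is corrected.
        v_hat += dt * (accel[n - 1] + k * (gyro[n - 1] - v_hat))
        estimates.append(v_hat)
    return estimates
```

In continuous time this loop gives v̂ = k/(s+k)·ω_gyro + s/(s+k)·(α/s), so the low-pass weight on the gyro and the high-pass weight on the integrated acceleration sum to one; the forward-Euler update used here stays stable only while dt·k < 2.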


Author(s):  
Yao Mao ◽  
Wei Ren ◽  
Yong Luo ◽  
ZhiJun Li

Sensor fusion is one of the most widely used techniques in robotics, aerospace, and target tracking control. In this paper, a generalized sensor fusion framework named closed-loop fusion (CLF) is analyzed, and an optimal filter design principle is proposed in detail. The fusion-error optimization problem, which is the core issue of fusion design, is also better solved through the feedback compensation law of the CLF framework. Unlike conventional methods, the CLF fusion filter can be designed optimally, and determining the superposition of the fusion information is avoided. Simulation and experimental results demonstrating its validity are to be submitted.

