Augmenting loop tiling with data alignment for improved cache performance

1999 ◽  
Vol 48 (2) ◽  
pp. 142-149 ◽  
Author(s):  
Preeti Ranjan Panda ◽  
Hiroshi Nakamura ◽  
Nikil D. Dutt ◽  
Alexandru Nicolau

Author(s):  
Shuo Wang ◽  
Zhongtao Shen ◽  
Shuwen Wang ◽  
Changqing Feng ◽  
Shubin Liu

Author(s):  
B. Shameedha Begum ◽  
N. Ramasubramanian

Embedded systems are designed for applications ranging from hard real-time systems to mobile computing, and these applications demand different cache designs for good performance. Because real-time applications place stringent requirements on performance, the cache subsystem plays a significant role. Reconfigurable caches can meet these performance requirements. Existing reconfigurable caches tend to adjust associativity and size to maximize cache performance. This article proposes a novel reconfigurable, intelligent L1 data cache based on replacement algorithms. An intelligent embedded data cache and a dynamically reconfigurable intelligent embedded data cache have been implemented in Verilog-2001 and evaluated for cache performance. Data collected with the cache running two different replacement strategies show that, for sequential applications, the hit rate improves by 40% compared with LRU and by 21% compared with MRU, which will significantly improve the performance of embedded real-time applications.
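
As a rough illustration of why the replacement policy matters for sequential access, the Python sketch below models a single fully associative cache in software and compares LRU and MRU eviction on a loop that cycles over one more block than the cache can hold. It is not the authors' intelligent cache, which is a Verilog hardware design; the function name `hit_rate`, the 4-block capacity, and the looping trace are hypothetical choices, and the numbers it produces are not the 40% and 21% figures reported in the abstract.

```python
from typing import List, Sequence


def hit_rate(trace: Sequence[int], capacity: int, policy: str) -> float:
    """Simulate a fully associative cache (toy model) and return the hit fraction."""
    if policy not in ("LRU", "MRU"):
        raise ValueError(f"unknown policy: {policy}")
    cache: List[int] = []  # ordered from least to most recently used
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.remove(block)  # move the block to the most-recently-used end
            cache.append(block)
            continue
        if len(cache) == capacity:
            # LRU evicts the oldest entry; MRU evicts the newest one.
            cache.pop(0 if policy == "LRU" else -1)
        cache.append(block)
    return hits / len(trace)


# Hypothetical "sequential" workload: a loop over 5 blocks, repeated 10 times,
# running on a cache that holds only 4 blocks.
trace = list(range(5)) * 10
print("LRU:", hit_rate(trace, 4, "LRU"))
print("MRU:", hit_rate(trace, 4, "MRU"))
```

On this trace LRU always evicts exactly the block the loop needs next, so every access misses, whereas MRU keeps most of the loop resident in the cache.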


2013 ◽  
Vol 74 (2) ◽  
pp. 137-150
Author(s):  
Yi Wang ◽  
Linfeng Pan ◽  
Zili Shao ◽  
Yong Guan ◽  
Minyi Guo

1997 ◽  
Vol 07 (04) ◽  
pp. 379-392 ◽  
Author(s):  
Alain Darte ◽  
Georges-André Silber ◽  
Frédéric Vivien

Tiling is a technique used for exploiting medium-grain parallelism in nested loops. It relies on a first step that detects sets of permutable nested loops. All algorithms developed so far consider the statements of the loop body as a single block; in other words, they cannot take advantage of the structure of dependences between different statements. In this paper, we overcome this limitation by showing how the structure of the reduced dependence graph can be taken into account to detect more permutable loops. Our method combines graph retiming techniques and graph scheduling techniques. It can be viewed as an extension of Wolf and Lam's algorithm to the case of loops with multiple statements. Loop-independent dependences play a particular role in our study, and we show how the way we handle them can also be useful for fine-grain loop parallelization.
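
To make the retiming idea concrete, here is a minimal Python sketch, not the authors' algorithm. It assumes dependences are given as constant distance vectors on the edges of a reduced dependence graph, applies a hand-chosen shift vector to each statement (an edge S → T with distance d becomes d + r_T - r_S), and then applies the distance-vector test for full permutability: every component of every distance vector must be nonnegative. The statement names `S1` and `S2`, the edges, and the shift are hypothetical; the paper's method computes suitable shifts automatically and handles more general dependence information.

```python
from typing import Dict, List, Tuple

Vector = Tuple[int, ...]
# One edge of a (hypothetical) reduced dependence graph:
# (source statement, target statement, constant distance vector).
Edge = Tuple[str, str, Vector]


def retime(edges: List[Edge], shift: Dict[str, Vector]) -> List[Edge]:
    """Shift each statement's iterations; distance d on S->T becomes d + r_T - r_S."""
    result: List[Edge] = []
    for src, dst, d in edges:
        zero = (0,) * len(d)
        r_src = shift.get(src, zero)
        r_dst = shift.get(dst, zero)
        new_d = tuple(di + rt - rs for di, rt, rs in zip(d, r_dst, r_src))
        result.append((src, dst, new_d))
    return result


def fully_permutable(edges: List[Edge]) -> bool:
    """Distance-vector test: the loops can be permuted (and tiled) if all components are >= 0."""
    return all(component >= 0 for _, _, d in edges for component in d)


edges: List[Edge] = [
    ("S1", "S1", (1, 0)),
    ("S1", "S2", (1, -1)),  # negative inner component blocks permutation
    ("S2", "S2", (0, 1)),
]
print(fully_permutable(edges))
shifted = retime(edges, {"S2": (0, 1)})  # hand-picked shift for S2
print(shifted)
print(fully_permutable(shifted))
```

Treating the loop body as one block would leave the `(1, -1)` dependence in place and reject the nest, while shifting only `S2` turns it into `(1, 0)`; self-dependences are unaffected because both endpoints move by the same vector.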


2007 ◽  
Vol 42 (7) ◽  
pp. 227-236
Author(s):  
Qin Wang ◽  
Junpu Chen ◽  
Weihua Zhang ◽  
Min Yang ◽  
Binyu Zang
