Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling

Coarse-Grained Reconfigurable Architectures (CGRAs) have gained currency in recent years due to their abundant parallelism and flexibility. To utilize the parallelism found in CGRAs, this paper proposes a fast and efficient Modulo-Constrained Hybrid Particle Swarm Optimization (MCHPSO) scheduling algorithm to exploit loop-level parallelism in applications. This paper shows that Particle Swarm Optimization (PSO) is capable of software pipelining loops by overlapping placement, scheduling and routing of successive loop iterations and executing them in parallel. The proposed algorithm has been experimentally validated on various DSP benchmarks under two different architecture configurations. These experiments indicate that the proposed MCHPSO algorithm can find schedules with small initiation intervals within a reasonable amount of time. The MCHPSO scheduling algorithm was analyzed with different topologies and Functional Unit (FU) configurations. The authors have tested the parallelizability of the algorithm and found that it exhibits a nearly linear speedup on a multi-core CPU.
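As background for the "small initiation intervals" mentioned in the abstract, the following is a minimal sketch of the standard lower bound on the initiation interval (II) used in modulo scheduling generally. It is not the MCHPSO algorithm. The function names and the representation of recurrence cycles as `(total_latency, iteration_distance)` pairs are assumptions made for illustration.

```python
from math import ceil


def res_mii(num_ops: int, num_fus: int) -> int:
    """Resource-constrained bound: each FU can issue one operation per cycle
    (an assumption of this sketch), so II cannot be below ops / FUs."""
    return ceil(num_ops / num_fus)


def rec_mii(cycles: list[tuple[int, int]]) -> int:
    """Recurrence-constrained bound over dependence cycles.

    Each cycle is (total_latency, iteration_distance); a loop without
    recurrences yields a bound of 1.
    """
    return max((ceil(lat / dist) for lat, dist in cycles), default=1)


def min_ii(num_ops: int, num_fus: int, cycles: list[tuple[int, int]]) -> int:
    """Lower bound on II: the larger of the two bounds above."""
    return max(res_mii(num_ops, num_fus), rec_mii(cycles))


def mrt_slot(time: int, fu: int, ii: int) -> tuple[int, int]:
    """Modulo reservation table slot occupied by an op issued at `time` on `fu`."""
    return (time % ii, fu)
```

A scheduler typically starts its search at `min_ii(...)` and increases II until a placement is found in which no two operations map to the same `mrt_slot`.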


A predicate-aware modulo scheduling for improving resource efficiency of coarse grained reconfigurable architectures

7th IEEE International Symposium on Industrial Embedded Systems (SIES'12) ◽

10.1109/sies.2012.6356604 ◽

2012 ◽

Author(s):

Jhin-Bin Jiang ◽

Kuen-Cheng Chiang ◽

Jean Jyh-Jiun Shann

Keyword(s):

Resource Efficiency ◽

Coarse Grained ◽

Reconfigurable Architectures ◽

Modulo Scheduling


Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures

Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems - LCTES '09 ◽

10.1145/1542452.1542456 ◽

2009 ◽

Cited By ~ 20

Author(s):

Taewook Oh ◽

Bernhard Egger ◽

Hyunchul Park ◽

Scott Mahlke

Keyword(s):

Coarse Grained ◽

Reconfigurable Architectures ◽

Modulo Scheduling


Similarity-Aware Architecture/Compiler Co-Designed Context-Reduction Framework for Modulo-Scheduled CGRA

Electronics ◽

10.3390/electronics10182210 ◽

2021 ◽

Vol 10 (18) ◽

pp. 2210

Author(s):

Zhongyuan Zhao ◽

Weiguang Sheng ◽

Jinchao Li ◽

Pengfei Ye ◽

Qin Wang ◽

...

Keyword(s):

Energy Efficiency ◽

High Energy ◽

Coarse Grained ◽

Context Word ◽

Large Area ◽

Modulo Scheduling ◽

Area Efficiency ◽

Reconfigurable Array ◽

Index Value ◽

Level Parallelism

Modulo-scheduled coarse-grained reconfigurable array (CGRA) processors have shown their potential for exploiting loop-level parallelism at high energy efficiency. However, these CGRAs need frequent reconfiguration during execution, which causes large area and power overhead for context memory and context fetching. To tackle this challenge, this paper uses an architecture/compiler co-designed method for context reduction. From the architecture side, we carefully partition the context into several subsections and, when a new context is fetched, fetch only the subsections that differ from the previous context word. We package each differing subsection with an opcode and an index value to form a context-fetching primitive (CFP), and we explore the hardware design space by providing centralized and distributed CFP-fetching CGRAs that support this CFP-based context-fetching scheme. From the software side, we develop a similarity-aware tuning algorithm and integrate it into state-of-the-art modulo scheduling and memory access conflict optimization algorithms. The whole compilation flow efficiently improves the similarity between contexts in each processing element (PE) in order to reduce both context-fetching latency and context footprint. Experimental results show that our HW/SW co-designed framework improves area efficiency and energy efficiency by up to 34% and 21%, respectively, with only 2% performance overhead.
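The context-fetching idea in this abstract can be sketched roughly as follows. This is an illustrative model only, not the paper's encoding: a context word is assumed to be a list of equally indexed subsections, the opcode name `"UPDATE"` is hypothetical, and the function names are invented for the example.

```python
# Hypothetical opcode; the abstract does not specify the real opcode set.
UPDATE = "UPDATE"


def encode_cfps(prev_ctx: list[int], new_ctx: list[int]) -> list[tuple[str, int, int]]:
    """Emit one (opcode, index, value) primitive per subsection that changed."""
    if len(prev_ctx) != len(new_ctx):
        raise ValueError("contexts must have the same number of subsections")
    return [
        (UPDATE, idx, new)
        for idx, (old, new) in enumerate(zip(prev_ctx, new_ctx))
        if old != new
    ]


def apply_cfps(prev_ctx: list[int], cfps: list[tuple[str, int, int]]) -> list[int]:
    """Rebuild the new context from the previous one plus the fetched primitives."""
    ctx = list(prev_ctx)
    for opcode, idx, value in cfps:
        if opcode != UPDATE:
            raise ValueError(f"unknown opcode: {opcode}")
        ctx[idx] = value
    return ctx


prev_ctx = [1, 2, 3, 4]
new_ctx = [1, 5, 3, 6]
cfps = encode_cfps(prev_ctx, new_ctx)
restored = apply_cfps(prev_ctx, cfps)
```

In this model, the more similar two consecutive contexts are, the fewer primitives need to be fetched, which is the property the similarity-aware tuning described in the abstract aims to increase.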
