Design Space Exploration of Deeply Nested Loop 2D Filtering and 6 Level FSBM Algorithm Mapped onto Systolic Array

VLSI Design ◽  
2012 ◽  
Vol 2012 ◽  
pp. 1-15 ◽  
Author(s):  
B. Bala Tripura Sundari

The high integration density of today's VLSI chips offers enormous computing power that can be exploited by designing parallel computing hardware. Implementing computationally intensive algorithms, represented as n-dimensional (n-D) nested loop algorithms, on a parallel array architecture is termed mapping. The methodologies adopted for mapping these algorithms onto parallel hardware often use heuristic search, which requires considerable computational effort to obtain near-optimal solutions. We propose a new mapping procedure in which a lower-dimensional subspace of the inner loops (within the n-D problem space) is identified; this subspace contains the computational expression that generates the output or outputs of the n-D problem. The processing element (PE) array is assigned to the identified subspace, and the PE array is reused by assigning it to successive subspaces in consecutive clock cycles/periods (CPs) until the computational tasks of the n-D problem are complete. This procedure is used to develop our proposed modified heuristic search to arrive at an optimal design, and complexity comparisons are given. The MATLAB results of the new search and the design space trade-off analysis using a high-level synthesis tool are presented for two typical computationally intensive nested loop algorithms: the 6D FSBM and the 4D edge detection, alternatively known as the 2D filtering algorithm.
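To make the "6D nested loop" structure concrete, the following is a minimal sketch, assuming FSBM denotes full-search block matching with a sum-of-absolute-differences (SAD) criterion. The function name `fsbm`, its parameters, and the tiny frame are hypothetical illustrations, not the paper's code or its mapping procedure. The two innermost loops form one plausible example of a lower-dimensional subspace that holds the computational expression.

```python
def fsbm(cur, ref, block=2, search=1):
    """Full-search block matching written as six nested loops.

    cur, ref: 2D lists of pixel values with identical dimensions.
    Returns {(block_row, block_col): (dy, dx)}, the displacement that
    minimises the SAD for each block (hypothetical interface).
    """
    height, width = len(cur), len(cur[0])
    vectors = {}
    for by in range(0, height - block + 1, block):        # loop 1: block row
        for bx in range(0, width - block + 1, block):     # loop 2: block column
            best = None
            for dy in range(-search, search + 1):         # loop 3: vertical shift
                for dx in range(-search, search + 1):     # loop 4: horizontal shift
                    # Skip candidates that fall outside the reference frame.
                    if not (0 <= by + dy and by + dy + block <= height
                            and 0 <= bx + dx and bx + dx + block <= width):
                        continue
                    sad = 0
                    for i in range(block):                # loop 5: pixel row
                        for j in range(block):            # loop 6: pixel column
                            sad += abs(cur[by + i][bx + j]
                                       - ref[by + dy + i][bx + dx + j])
                    # Strict comparison keeps the first minimum encountered.
                    if best is None or sad < best[0]:
                        best = (sad, dy, dx)
            vectors[(by // block, bx // block)] = (best[1], best[2])
    return vectors


# Illustrative 4x4 frame with distinct pixel values (made-up data).
reference = [[r * 4 + c for c in range(4)] for r in range(4)]
print(fsbm(reference, reference))
```

In a hardware mapping of the kind the abstract describes, the innermost accumulation would be the work assigned to the PE array, with the outer loops sequenced over successive clock periods; the sketch above only shows the sequential loop nest.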

Author(s):  
Umar Ibrahim Minhas ◽  
Roger Woods ◽  
Georgios Karakonstantis

Whilst FPGAs have been used in cloud ecosystems, it is still extremely challenging to achieve high compute density when mapping heterogeneous multi-tasks onto shared resources at runtime. This work addresses this by treating the FPGA resource as a service and employing multi-task processing at the high level, design space exploration, and static off-line partitioning in order to allow more efficient mapping of heterogeneous tasks onto the FPGA. In addition, a new, comprehensive runtime functional simulator is used to evaluate the effect of various spatial and temporal constraints on both the existing and new approaches when varying system design parameters. A comprehensive suite of real high performance computing tasks was implemented on a Nallatech 385 FPGA card, and the results show that our approach can provide on average 2.9× and 2.3× higher system throughput for compute and mixed intensity tasks, while 0.2× lower for memory intensive tasks due to external memory access latency and bandwidth limitations. The work has been extended by introducing a novel scheduling scheme to enhance temporal utilization of resources when using the proposed approach. Additional results for large queues of mixed intensity tasks (compute and memory) show that the proposed partitioning and scheduling approach can provide more than 3× system speedup over previous schemes.


2021 ◽  
Author(s):  
Aakriti Tarun Sharma

The process of converting a behavioral specification of an application to its equivalent system architecture is referred to as High-Level Synthesis (HLS). A crucial stage in embedded systems design involves finding the trade-off between resource utilization and performance. An exhaustive search would yield the required results, but its time complexity is high: it would take a very long time to arrive at a solution even for smaller designs. We employ Design Space Exploration (DSE) in order to reduce the complexity of the design space and to reach the desired results in less time. In practice, multiple user-defined constraints need to be satisfied simultaneously, so the task is one of Multi-Objective Optimization. In this thesis, the design process of DSP benchmarks was analyzed based on user-defined constraints such as power and execution time. The analyzed outcome was compared with the existing approaches in DSE, and an optimal design solution was derived in a shorter time period.
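As an illustration of the multi-objective trade-off described above, the following minimal sketch filters a set of candidate designs down to its Pareto front, assuming both objectives (here power and execution time) are to be minimised. The function name `pareto_front` and the design points are hypothetical; this is a generic dominance filter, not the thesis's exploration method.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    Each point is a (power, execution_time) tuple and both objectives are
    minimised. A point q dominates p when q is no worse in both objectives
    and differs from p, hence strictly better in at least one.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Made-up candidate designs as (power, execution_time) pairs.
designs = [(100, 50), (120, 40), (150, 45), (200, 20), (110, 60)]
print(pareto_front(designs))
```

This quadratic filter is only practical for small candidate sets; it is meant to show what "optimal" means when two constraints compete, since each surviving design can only improve one objective by worsening the other.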


2014 ◽  
Vol 27 (2) ◽  
pp. 235-249 ◽  
Author(s):  
Anirban Sengupta ◽  
Reza Sedaghat ◽  
Vipul Mishra

Design space exploration is an indispensable part of High Level Synthesis (HLS) design of hardware accelerators. This paper presents a novel technique for the area-execution time trade-off using residual load decoding heuristics in genetic algorithms (GA) for integrated design space exploration (DSE) of scheduling and allocation. The approach also resolves issues encountered during DSE of data paths for hardware accelerators, such as the accuracy of the solution found and the total exploration time. The integrated solution found by the proposed approach satisfies the user-specified constraints of hardware area and total execution time (not just latency), while at the same time offering a twofold unified solution of chaining-based schedule and allocation. The cost function proposed in the genetic algorithm approach takes into account the functional units, multiplexers, and demultiplexers needed during implementation. The proposed exploration system (ExpSys) was tested on a large number of benchmarks drawn from the literature to assess its efficiency. Results indicate an average improvement in Quality of Results (QoR) greater than 26% when compared to a recent well-known GA-based exploration method.


Author(s):  
Lorenzo Ferretti ◽  
Jihye Kwon ◽  
Giovanni Ansaloni ◽  
Giuseppe Di Guglielmo ◽  
Luca P. Carloni ◽  
...  
