Measuring the dynamic energy efficiency of FPGAs over processors

10.32920/ryerson.14653593.v1

2021

Author(s): Muhammad Umair Zafar

Keyword(s): Energy Efficiency, Functional Unit, Parallel Execution, Data Path, Sequential Execution, Memory Instruction, Data Memory, Execution Model, Functional Units, Latency Insensitive

This work compares the dynamic energy efficiency of the parallel execution model of an FPGA with that of the sequential execution model of a processor, for latency-insensitive applications. We create temporal implementations (sequential instructions) of the MCNC benchmarks and execute them on a processor that employs a 4LUT as its functional unit. This processor is ~716 times less efficient in dynamic energy than a 4LUT FPGA, mainly because of the large amount of memory (instruction/data) required to encode the 4LUT-based instructions. The size of the memory (instruction/data) can be reduced by increasing the data-path width and the logic complexity of the ASIC-based functional units of the processor. In particular, at a 64-bit data-path width, and when the instruction and data memory sizes are reduced to less than ~9% of those required by the corresponding 4LUT-based instructions, the processor with ASIC-based complex functional units can achieve higher dynamic energy efficiency than the FPGA for the MCNC benchmarks.
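As a rough illustration of what a "temporal implementation" on a 4LUT-based processor could look like, the sketch below executes a list of LUT instructions one after another against a data memory. The instruction layout (a 16-bit truth table, four source addresses, one destination address) and the names `lut4`, `run`, `AND4` and `XOR2` are assumptions made for this example, not the encoding used in the work above. It does hint at why such instructions are memory-hungry: every 1-bit result costs a full truth table plus five addresses.

```python
from typing import List, Tuple

# Hypothetical "4-LUT instruction": a 16-bit truth table, four source
# addresses in data memory, and one destination address.
Instr = Tuple[int, Tuple[int, int, int, int], int]


def lut4(table: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate a 4-input LUT: the inputs select one bit of the 16-bit table."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (table >> index) & 1


def run(program: List[Instr], data: List[int]) -> List[int]:
    """Execute the instructions sequentially, updating data memory in place."""
    for table, (s0, s1, s2, s3), dst in program:
        data[dst] = lut4(table, data[s0], data[s1], data[s2], data[s3])
    return data


AND4 = 0x8000  # only index 15 (all four inputs set) yields 1
XOR2 = 0x6666  # XOR of inputs a and b; inputs c and d are ignored

# data[0..3] hold primary inputs; data[4] and data[5] hold results.
program = [
    (AND4, (0, 1, 2, 3), 4),  # data[4] = data[0] & data[1] & data[2] & data[3]
    (XOR2, (0, 4, 0, 0), 5),  # data[5] = data[0] ^ data[4]
]
result = run(program, [1, 1, 1, 1, 0, 0])
```

On an FPGA the same two LUTs would be configured once and evaluated in place, whereas this sequential model fetches each instruction and its operands from memory on every step.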


A Paradigm of Extensions of Parallel Execution Model

2012 International Conference on Computer Science and Service System

10.1109/csss.2012.553

2012

Author(s): Zhao Jun, Zhang Li-Lun, Song Jun-Qiang

Keyword(s): Parallel Execution, Execution Model


A parallel execution model of logic programs

ACM SIGARCH Computer Architecture News

10.1145/1067651.801673

1983

Vol 11 (3)

pp. 349-355

Cited By ~ 3

Author(s): Shinji Umeyama, Koichiro Tamura

Keyword(s): Parallel Execution, Logic Programs, Execution Model


Energy Efficiency Evaluation of Parallel Execution of DEVS Models in Multicore Architectures

2020 Winter Simulation Conference (WSC)

10.1109/wsc48552.2020.9384117

2020

Author(s): Guillermo G. Trabes, Veronica Gil Costa, Gabriel A. Wainer

Keyword(s): Energy Efficiency, Parallel Execution, Efficiency Evaluation, Multicore Architectures, Energy Efficiency Evaluation


General Parallel Execution Model for Large Matrix Workloads

Advances in Intelligent Systems and Computing - The 8th International Conference on Computer Engineering and Networks (CENet2018)

10.1007/978-3-030-14680-1_3

2019

pp. 22-28

Author(s): Song Deng, Xueke Xu, Fan Zhou, Haojing Weng, Wen Luo

Keyword(s): Parallel Execution, Large Matrix, Execution Model


Balancing resiliency and energy efficiency of functional units in ultra-low power systems

2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC)

10.1109/aspdac.2018.8297394

2018

Author(s): Mohammad Saber Golanbari, Anteneh Gebregiorgis, Elyas Moradi, Saman Kiamehr, Mehdi B. Tahoori

Keyword(s): Energy Efficiency, Power Systems, Low Power, Ultra Low Power, Functional Units


Shared memory multiprocessor support for functional array processing in SAC

Journal of Functional Programming

10.1017/s0956796805005538

2005

Vol 15 (3)

pp. 353-401

Cited By ~ 29

Author(s): Clemens Grelck

Keyword(s): Shared Memory, Array Processing, Numerical Data, Parallel Execution, Real Performance, Execution Model, Series Of Experiments, High Level, Performance Gains, The Impact

Classical application domains of parallel computing are dominated by processing large arrays of numerical data. Whereas most functional languages focus on lists and trees rather than on arrays, SAC is tailor-made in design and in implementation for efficient high-level array processing. Advanced compiler optimizations yield performance levels that are often competitive with low-level imperative implementations. Based on SAC, we develop compilation techniques and runtime system support for the compiler-directed parallel execution of high-level functional array processing code on shared memory architectures. Competitive sequential performance gives us the opportunity to exploit the conceptual advantages of the functional paradigm for achieving real performance gains with respect to existing imperative implementations, not only in comparison with uniprocessor runtimes. While the design of SAC facilitates parallelization, the particular challenge of high sequential performance is that realization of satisfying speedups through parallelization becomes substantially more difficult. We present an initial compilation scheme and multi-threaded execution model, which we step-wise refine to reduce organizational overhead and to improve parallel performance. We close with a detailed analysis of the impact of certain design decisions on runtime performance, based on a series of experiments.
