HP-DAEMON: High Performance Distributed Adaptive Energy-efficient Matrix-multiplicatiON

2014 ◽  
Vol 29 ◽  
pp. 599-613 ◽  
Author(s):  
Li Tan ◽  
Longxiang Chen ◽  
Zizhong Chen ◽  
Ziliang Zong ◽  
Rong Ge ◽  
...  
2021 ◽  
Vol 21 (2) ◽  
pp. e09
Author(s):  
Federico Favaro ◽  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Juan Pablo Oliver

The spread of multi-core architectures and the later arrival of massively parallel devices led to a revolution in High-Performance Computing (HPC) platforms over the last decades. As a result, Field-Programmable Gate Arrays (FPGAs) are re-emerging as a versatile and more energy-efficient alternative to other platforms. Traditional FPGA design relies on low-level Hardware Description Languages (HDL) such as VHDL or Verilog, which follow an entirely different programming model from standard software languages and require specialized knowledge of the underlying hardware. In recent years, manufacturers have made considerable efforts to provide High-Level Synthesis (HLS) tools in order to allow a greater adoption of FPGAs in the HPC community. Our work studies the use of multi-core hardware and different FPGAs to address Numerical Linear Algebra (NLA) kernels such as the general matrix multiplication (GEMM) and the sparse matrix-vector multiplication (SpMV). Specifically, we compare the behavior of fine-tuned kernels on a multi-core CPU and HLS implementations on FPGAs. We evaluate our implementations experimentally on a low-end and a cutting-edge FPGA platform, in terms of runtime and energy consumption, and compare the results against the Intel MKL library on the CPU.
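For readers unfamiliar with the two kernels named in this abstract, the following is a minimal reference sketch in plain Python. It is not the authors' tuned CPU or HLS code. The function names `gemm` and `spmv_csr` are hypothetical, and the choice of the CSR (compressed sparse row) storage format for SpMV is an assumption, since the abstract does not say which sparse format is used.

```python
def gemm(alpha, A, B, beta, C):
    """Return alpha * (A @ B) + beta * C for dense matrices given as lists of rows."""
    n, k, m = len(A), len(B), len(B[0])
    return [
        [alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
         for j in range(m)]
        for i in range(n)
    ]


def spmv_csr(values, col_idx, row_ptr, x):
    """Return y = A @ x for a sparse matrix A stored in CSR form (assumed format).

    values  : nonzero entries, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[r]..row_ptr[r + 1] delimits the entries of row r
    """
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

GEMM has a regular, dense access pattern, while SpMV reads `x` through the indirection `col_idx[k]`. This irregular access is one reason the two kernels tend to behave differently across hardware platforms.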


2015 ◽  
Vol 1 (4) ◽  
pp. 1-12
Author(s):  
Chidadala Janardhan ◽  
Bhagath Pyda ◽  
J. Manohar ◽  
K. V. Ramanaiah ◽  
...  

2019 ◽  
Vol 15 (4) ◽  
pp. 1-21
Author(s):  
Bing Li ◽  
Mengjie Mao ◽  
Xiaoxiao Liu ◽  
Tao Liu ◽  
Zihao Liu ◽  
...  

Nano Energy ◽  
2021 ◽  
Vol 82 ◽  
pp. 105717
Author(s):  
Min-Ci Wu ◽  
Jui-Yuan Chen ◽  
Yi-Hsin Ting ◽  
Chih-Yang Huang ◽  
Wen-Wei Wu

Author(s):  
Wei-Song Hung ◽  
Subrahmanya T M ◽  
Po Ting Lin ◽  
Yu-Hsuan Chiao ◽  
Januar Widakdo ◽  
...  

Membrane distillation (MD) based desalination is considered a promising strategy to address global challenges such as the safe water-energy crisis and environmental pollution. Here, we demonstrate a novel...


2021 ◽  
Vol 151 ◽  
pp. 70-85
Author(s):  
Cody Rivera ◽  
Jieyang Chen ◽  
Nan Xiong ◽  
Jing Zhang ◽  
Shuaiwen Leon Song ◽  
...  

2014 ◽  
Vol 626 ◽  
pp. 127-135 ◽  
Author(s):  
D. Jessintha ◽  
M. Kannan ◽  
P.L. Srinivasan

The Discrete Cosine Transform (DCT) is commonly used in image compression. A milestone in the history of the DCT was the Distributed Arithmetic (DA) technique. Because of technology constraints, DA was used to build a multiplier-less computation; it occupied less area, but its throughput was lower. Later, with technology scaling, multiplier-based architectures became easy to adapt for low-power, high-performance designs. Fixed-width multipliers [1]-[7] reduce hardware and time complexity. In this work, a radix-4 fixed-width multiplier is adapted to the DCT architecture for its low power consumption, saving 30% power. To reduce the truncation errors introduced during fixed-width multiplication, an estimation circuit is designed based on conditional probability theory.
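To illustrate where the truncation error in a fixed-width multiplier comes from, here is a small sketch under simplifying assumptions. It models a plain unsigned array multiplier that drops every partial-product bit below column n (direct truncation). It does not model the radix-4 scheme or the probability-based estimation circuit described in the abstract. The function names are hypothetical.

```python
def truncated_product(a, b, n):
    """n-bit result of an n x n unsigned multiply that keeps only the
    partial-product bits in columns n and above (direct truncation)."""
    kept = 0
    for i in range(n):
        for j in range(n):
            # Partial-product bit a_i * b_j has weight 2**(i + j).
            if i + j >= n and (a >> i) & 1 and (b >> j) & 1:
                kept += 1 << (i + j)
    return kept >> n


def truncation_error(a, b, n):
    """Difference between the exact upper n bits and the truncated result."""
    return ((a * b) >> n) - truncated_product(a, b, n)


# Worst case for n = 4: every partial-product bit is set.
print(truncated_product(15, 15, 4), (15 * 15) >> 4, truncation_error(15, 15, 4))
```

Omitting the low columns saves hardware, but the carries they would have produced are lost. A compensation or estimation circuit adds back an approximation of those carries.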

