Power efficient high performance modular hardware accelerator architecture

Ultracompact and low-power-consumption optical switches are desired for high-performance telecommunication networks and data centers. Here, we demonstrate an on-chip power-efficient 2 × 2 thermo-optic switch unit by using a suspended photonic crystal nanobeam structure. A submilliwatt switching power of 0.15 mW is obtained with a tuning efficiency of 7.71 nm/mW in a compact footprint of 60 μm × 16 μm. The bandwidth of the switch is properly designed for a four-level pulse amplitude modulation signal with a 124 Gb/s raw data rate. To the best of our knowledge, the proposed switch is the most power-efficient resonator-based thermo-optic switch unit, with the highest tuning efficiency and data rate ever reported.
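As a rough consistency check on the figures quoted above, and assuming the resonance wavelength shifts linearly with heater power (an assumption made here, not something the abstract states), the quoted tuning efficiency and switching power together imply a resonance shift of a little over 1 nm. The function name is hypothetical.

```python
def resonance_shift_nm(tuning_efficiency_nm_per_mw: float, power_mw: float) -> float:
    """Resonance shift under an ASSUMED linear thermo-optic response."""
    return tuning_efficiency_nm_per_mw * power_mw


# Figures taken from the abstract: 7.71 nm/mW and 0.15 mW.
shift = resonance_shift_nm(7.71, 0.15)
print(f"Implied resonance shift: {shift:.2f} nm")
```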

Hardware Accelerator Integration Tradeoffs for High-Performance Computing: A Case Study of GEMM Acceleration in N-Body Methods

IEEE Transactions on Parallel and Distributed Systems; 10.1109/tpds.2021.3056045; 2021; Vol 32 (8); pp. 2035-2048

Author(s): Mochamad Asri, Dhairya Malhotra, Jiajun Wang, George Biros, Lizy K. John, ...

Keyword(s): High Performance Computing, High Performance, Hardware Accelerator, Performance Computing

Environmental Sound Recognition on Embedded Systems: From FPGAs to TPUs

Electronics; 10.3390/electronics10212622; 2021; Vol 10 (21); pp. 2622

Author(s): Jurgen Vandendriessche, Nick Wouters, Bruno da Silva, Mimoun Lamrini, Mohamed Yassin Chkouri, ...

Keyword(s): Machine Learning, High Performance, Machine Learning Techniques, Sound Recognition, Learning Approaches, Environmental Sound, Embedded Devices, Power Efficient, Computationally Intensive, Environmental Sound Recognition

In recent years, Environmental Sound Recognition (ESR) has become a relevant capability for urban monitoring applications. The techniques for automated sound recognition often rely on machine learning approaches, which have increased in complexity in order to achieve higher accuracy. Nonetheless, such machine learning techniques often have to be deployed on resource- and power-constrained embedded devices, which has become a challenge with the adoption of deep learning approaches based on Convolutional Neural Networks (CNNs). Field-Programmable Gate Arrays (FPGAs) are power efficient and highly suitable for computationally intensive algorithms like CNNs. By fully exploiting their parallel nature, they have the potential to reduce inference time compared to other embedded devices. Similarly, dedicated architectures to accelerate Artificial Intelligence (AI) such as Tensor Processing Units (TPUs) promise to deliver high accuracy while achieving high performance. In this work, we evaluate existing tool flows to deploy CNN models on FPGAs as well as on TPU platforms. We propose and adjust several CNN-based sound classifiers to be embedded on such hardware accelerators. The results demonstrate the maturity of the existing tools and how FPGAs can be exploited to outperform TPUs.

Power-efficient and high-performance block I/O framework for mobile virtualization systems

The Journal of Supercomputing; 10.1007/s11227-016-1810-z; 2016; Vol 73 (4); pp. 1307-1321; Cited By ~ 1

Author(s): Kihong Lee, DongWoo Lee, Sungkil Lee, Young Ik Eom

Keyword(s): High Performance, Power Efficient

Design of high performance power efficient flip flops using transmission gates

2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT); 10.1109/iccpct.2016.7530270; 2016; Cited By ~ 2

Author(s): V. K. Aravind Lakshman, R. Sakthivel

Keyword(s): High Performance, Power Efficient, Transmission Gates

A Power-Efficient Specific Emitter Identification Hardware Accelerator With SNR-Aware Adaptive Precision Reconfiguration

10.1109/icta53157.2021.9661751; 2021

Author(s): Jiayan Gan, Shafei Wang, Zhipeng Qu, Ang Hu, Zhanxiang Yang, ...

Keyword(s): Hardware Accelerator, Power Efficient, Specific Emitter Identification

Design of Power Efficient and High-Performance Architecture to Spectrum Sensing Applications Using Cyclostationary Feature Detection

Cognitive Informatics and Soft Computing - Advances in Intelligent Systems and Computing; 10.1007/978-981-15-1451-7_1; 2020; pp. 1-11

Author(s): Kadavergu Aishwarya, T. Jagannadha Swamy

Keyword(s): Spectrum Sensing, High Performance, Feature Detection, Power Efficient, Sensing Applications, Cyclostationary Feature Detection

Analyzing the Robustness of HPC Applications Using a Fine-Grained Soft Error Fault Injection Tool

Innovative Research and Applications in Next-Generation High Performance Computing - Advances in Systems Analysis, Software Engineering, and High Performance Computing; 10.4018/978-1-5225-0287-6.ch011; 2016; pp. 277-305

Author(s): Qiang Guan, Nathan DeBardeleben, Sean Blanchard, Song Fu, Claude H. Davis IV, ...

Keyword(s): High Performance, Fault Injection, Soft Errors, Small Degree, Soft Error, Power Efficient, Fine Grained, Different Characteristics, The Impact, Performance Computing

As the high performance computing (HPC) community continues to push towards exascale computing, HPC applications of today are only affected by soft errors to a small degree but we expect that this will become a more serious issue as HPC systems grow. We propose F-SEFI, a Fine-grained Soft Error Fault Injector, as a tool for profiling software robustness against soft errors. We utilize soft error injection to mimic the impact of errors on logic circuit behavior. Leveraging the open source virtual machine hypervisor QEMU, F-SEFI enables users to modify emulated machine instructions to introduce soft errors. F-SEFI can control what application, which sub-function, when and how to inject soft errors with different granularities, without interference to other applications that share the same environment. We demonstrate use cases of F-SEFI on several benchmark applications with different characteristics to show how data corruption can propagate to incorrect results. The findings from the fault injection campaign can be used for designing robust software and power-efficient hardware.
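To make the idea of a soft error concrete, the following is a minimal sketch of a single-bit flip applied to a floating-point operand, followed by a small computation that shows how the corruption reaches the result. It is not F-SEFI's interface: F-SEFI operates on emulated machine instructions inside QEMU, whereas this illustration flips bits at the data level in plain Python. The names `flip_bit` and `dot` are hypothetical.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 double encoding inverted.

    Bit 63 is the sign, bits 52-62 the exponent, bits 0-51 the mantissa.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    (corrupted,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return corrupted


def dot(xs, ys):
    """Plain dot product, standing in for an application kernel."""
    return sum(x * y for x, y in zip(xs, ys))


xs = [1.0, 2.0, 3.0]
ys = [4.0, 5.0, 6.0]
clean = dot(xs, ys)

# Inject a fault into one operand: flip the sign bit of xs[1].
faulty_xs = list(xs)
faulty_xs[1] = flip_bit(xs[1], 63)
faulty = dot(faulty_xs, ys)

print("clean result: ", clean)
print("faulty result:", faulty)
```

Which bit is flipped matters: a sign or exponent flip changes the value drastically, while a flip in the low mantissa bits may be absorbed by rounding and leave the final result unchanged.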

High Level Design of a Flexible PCA Hardware Accelerator Using a New Block-Streaming Method

Electronics; 10.3390/electronics9030449; 2020; Vol 9 (3); pp. 449

Author(s): Mohammad Amir Mansoori, Mario R. Casu

Keyword(s): High Performance, Principal Component, Hardware Acceleration, Design Flow, Hardware Accelerator, Field Programmable, Point Solution, Active Research, High Level, Many Core

Principal Component Analysis (PCA) is a technique for dimensionality reduction that is useful in removing redundant information in data for various applications such as Microwave Imaging (MI) and Hyperspectral Imaging (HI). The computational complexity of PCA has made the hardware acceleration of PCA an active research topic in recent years. Although the hardware design flow can be optimized using High Level Synthesis (HLS) tools, efficient high-performance solutions for complex embedded systems still require careful design. In this paper we propose a flexible PCA hardware accelerator in Field-Programmable Gate Arrays (FPGA) that we designed entirely in HLS. In order to make the internal PCA computations more efficient, a new block-streaming method is also introduced. Several HLS optimization strategies are adopted to create efficient hardware. The flexibility of our design allows us to use it for different FPGA targets, with flexible input data dimensions, and it also lets us easily switch from a more accurate floating-point implementation to a higher-speed fixed-point solution. The results show the efficiency of our design compared to state-of-the-art implementations on GPUs, many-core CPUs, and other FPGA approaches in terms of resource usage, execution time and power consumption.
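The abstract does not describe the block-streaming method in detail, so the sketch below is not the authors' design. It only illustrates the general idea of computing PCA from data that arrives in blocks: running sums are accumulated per block, so the full data matrix never needs to be held at once, and the covariance matrix is formed at the end. It assumes NumPy is available; the function names are hypothetical.

```python
import numpy as np


def streamed_covariance(blocks):
    """Sample covariance from an iterable of (rows x features) blocks.

    Only a running sum and a running sum of outer products are kept,
    so memory use is independent of the total number of rows.
    """
    n = 0
    total = None
    outer = None
    for block in blocks:
        block = np.asarray(block, dtype=np.float64)
        if total is None:
            d = block.shape[1]
            total = np.zeros(d)
            outer = np.zeros((d, d))
        n += block.shape[0]
        total += block.sum(axis=0)
        outer += block.T @ block
    mean = total / n
    return (outer - n * np.outer(mean, mean)) / (n - 1)


def principal_components(cov, k):
    """Return the k largest eigenvalues and their eigenvectors."""
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:k]
    return eigvals[order], eigvecs[:, order]


rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 4))
blocks = np.array_split(data, 10)  # pretend the data arrives in 10 blocks

cov = streamed_covariance(blocks)
vals, vecs = principal_components(cov, 2)
print("top eigenvalues:", vals)
```

Accumulating raw sums this way is simple but can lose precision when the mean is large relative to the variance; a hardware or fixed-point implementation would need to weigh that against the cost of a second pass or an incremental mean update.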
