Argus CNN Accelerator Based on Kernel Clustering and Resource-Aware Pruning

This paper proposes a two-step Convolutional Neural Network (CNN) pruning algorithm and a resource-efficient Field-Programmable Gate Array (FPGA) CNN accelerator named “Argus”. The pruning algorithm first groups similar kernels into clusters and then prunes all kernels in a cluster with the same regular pruning pattern. The algorithm is tailored to FPGAs and takes their resource characteristics into account. The resulting regular sparsity yields high Multiply-Accumulate (MAC) efficiency and reduces the logic needed to balance workloads among the MAC units. As a result, the Argus accelerator requires about 170 Look-Up Tables (LUTs) per Digital Signal Processor (DSP) block. This is close to the average LUT/DSP ratio of various FPGA families, so Argus uses both resource types in a balanced way. Benchmarks on a Xilinx Zynq UltraScale+ Multi-Processor System-on-Chip (MPSoC) show that Argus achieves up to 25 times more frames per second than NullHop, 2 and 2.5 times more than NEURAghe and Snowflake, respectively, and 2 times more than NVDLA. Argus performs comparably to MIT’s Eyeriss v2 and to Caffeine while requiring up to 3 times less memory bandwidth and 4 times fewer DSP blocks, respectively. Beyond absolute performance, Argus achieves at least 1.3 and 2 times better GOP/s/DSP and GOP/s/Block-RAM (BRAM) ratios than some state-of-the-art solutions, while remaining competitive in GOP/s/LUT.
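
The two-step idea in the abstract (cluster similar kernels, then prune each cluster with one shared regular pattern) can be illustrated with a short NumPy sketch. This is not the authors' algorithm: the abstract does not say how kernel similarity is measured or how a cluster's pattern is chosen, so the sketch assumes k-means on the absolute values of 3×3 kernels and keeps, for each cluster, the `keep` positions with the largest summed magnitude. The function `cluster_and_prune` and its parameters are hypothetical.

```python
import numpy as np


def cluster_and_prune(kernels, n_clusters, keep, n_iter=20, seed=0):
    """Cluster similar 3x3 kernels, then prune each cluster with one shared mask.

    kernels: float array of shape (N, 3, 3).
    keep: number of the 9 weight positions each shared mask retains.
    Returns (pruned kernels, per-cluster masks of shape (n_clusters, 3, 3), labels).
    """
    if not 1 <= keep <= 9:
        raise ValueError("keep must be between 1 and 9")
    n = kernels.shape[0]
    # Cluster on |w|: the shared mask depends on where large weights sit,
    # not on their signs (an assumption of this sketch).
    feats = np.abs(kernels.reshape(n, -1))
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(n, n_clusters, replace=False)]
    for _ in range(n_iter):  # plain k-means (Lloyd iterations)
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = feats[labels == c]
            if len(members):  # leave an empty cluster's center unchanged
                centers[c] = members.mean(axis=0)
    masks = np.zeros((n_clusters, 9), dtype=bool)
    for c in range(n_clusters):
        # Keep the `keep` positions with the largest total magnitude in the cluster.
        score = feats[labels == c].sum(axis=0)
        masks[c, np.argsort(score)[-keep:]] = True
    pruned = kernels.reshape(n, -1) * masks[labels]
    return pruned.reshape(kernels.shape), masks.reshape(-1, 3, 3), labels
```

Because every kernel in a cluster ends up with the same nonzero positions, MAC units working on one cluster see identical sparsity and need no per-kernel load balancing, which is the kind of regularity the abstract links to Argus's low LUT cost.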

A four-channel digital signal processor in 1.2-µm CMOS with on-chip D/A and A/D conversion serving four speech channels in a new-generation subscriber line circuit

IEEE Journal of Solid-State Circuits, 1991, Vol 26 (7), pp. 1038-1046. DOI: 10.1109/4.92024. Cited by ~3

Author(s): D. Haspeslagh, J. Sevenhans, A. Delarbre, L. Kiss, E. Moerman

Keyword(s): Digital Signal Processor, Digital Signal, Subscriber Line, On Chip, New Generation, Signal Processor

An Improved Method Based on FPGA and DSP for Designing FBG Sensor Analyzer

Advanced Materials Research, 2013, Vol 760-762, pp. 70-75. DOI: 10.4028/www.scientific.net/amr.760-762.70

Author(s): Xiao Qing Luo, Rong Hu, Bing Hui Zheng

Keyword(s): Digital Signal Processor, Digital Signal, External Parameter, Wavelength Shift, Sensing Technology, Field Programmable, Signal Peak, Key Steps, Bragg Grating Sensor, Temperature Strain

Fiber Bragg grating (FBG) sensors have become a research focus in sensing technology and are widely used in many applications. This paper proposes a novel FBG sensor analyzer based on an FPGA (Field Programmable Gate Array) and DSP (Digital Signal Processor) platform; the fiber Bragg gratings convert changes in external parameters into wavelength shifts. The system can measure temperature, strain, pressure, displacement, and other quantities in real time, performing wavelength demodulation through several key steps: data acquisition, clutter filtering, signal peak detection, Gaussian curve fitting, and weighted wavelength calculation. It can also diagnose and locate faults in the fiber link. Experimental results show that the system offers low power consumption, good linearity, strong robustness, and high precision and resolution in wavelength demodulation. The system also remained stable and reliable during long-term tests under different conditions.
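
The peak-detection, Gaussian-fitting, and weighted-wavelength steps listed in the abstract can be illustrated with a small NumPy sketch. The paper's exact procedures are not given, so this assumes a uniformly sampled, positive, already filtered reflection spectrum, a three-point Gaussian fit (a Gaussian is a parabola in log space) around the strongest sample, and a power-weighted centroid over the samples above a fraction of the peak. The names `gaussian_peak`, `weighted_wavelength`, and `threshold` are hypothetical.

```python
import numpy as np


def gaussian_peak(wl, power):
    """Sub-sample Bragg wavelength from a three-point Gaussian fit.

    Fits a parabola to ln(power) at the strongest sample and its two
    neighbours; the parabola's vertex is the Gaussian's centre.
    Assumes uniform wavelength spacing and strictly positive power.
    """
    i = int(np.argmax(power[1:-1])) + 1  # keep both neighbours in range
    y0, y1, y2 = np.log(power[i - 1 : i + 2])
    step = wl[1] - wl[0]
    offset = 0.5 * (y0 - y2) / (y0 - 2 * y1 + y2)  # vertex, in samples
    return float(wl[i] + offset * step)


def weighted_wavelength(wl, power, threshold=0.5):
    """Power-weighted centroid of the samples above threshold * peak power."""
    mask = power >= threshold * power.max()
    return float((wl[mask] * power[mask]).sum() / power[mask].sum())
```

The Gaussian fit recovers the centre exactly for an ideal Gaussian peak, while the centroid is simpler but picks up a small bias from where the threshold cuts the sampled peak; a real analyzer would choose between them based on noise and spectral shape.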

xDNN: Inference for Deep Convolutional Neural Networks

ACM Transactions on Reconfigurable Technology and Systems, 2022, Vol 15 (2), pp. 1-29. DOI: 10.1145/3473334

Author(s): Paolo D'Alberto, Victor Wu, Aaron Ng, Rahul Nimaiyar, Elliott Delaye, ...

Keyword(s): Neural Networks, Power Efficiency, Digital Signal, FPGA Design, Deep Convolutional Neural Networks, Parametric Function, Field Programmable, Scale Down, On Chip, Numerical Precision

We present xDNN, an end-to-end system for deep-learning inference of Convolutional Neural Networks (CNNs), based on a family of specialized hardware processors synthesized on Field-Programmable Gate Arrays (FPGAs). The design is optimized for low latency, high throughput, and high compute efficiency without batching. It is scalable and is a parametric function of the number of multiply-accumulate units, the on-chip memory hierarchy, and the numerical precision. The design can be scaled down to a processor for embedded devices, replicated into more cores for larger devices, or resized to optimize efficiency. On a Xilinx Virtex UltraScale+ VU13P FPGA, we achieve 800 MHz, close to the maximum frequency of the Digital Signal Processing (DSP) blocks, and above 80% efficiency of the on-chip compute resources. On top of our processor family, we present a runtime system that executes different networks for different input sizes (e.g., from 224×224 to 2048×1024). We present a compiler that reads CNNs from native frameworks (MXNet, Caffe, Keras, and TensorFlow), optimizes them, generates code, and provides performance estimates. The compiler combines quantization information from the native environment with optimizations to feed the runtime with code as efficient as any hardware expert could write. We also present tools that partition a CNN into subgraphs to divide work between CPU cores and FPGAs. Notably, the software does not change when or if the FPGA design becomes an ASIC, which makes our work vertical rather than just a proof-of-concept FPGA project. We report experimental results for accuracy, latency, and power on several networks: we achieve up to 4 times higher throughput and 3 times better power efficiency than GPUs, and up to 20 times higher throughput than the latest CPUs. To our knowledge, our solutions are faster than any previous FPGA-based solution and comparable to other top off-the-shelf solutions.
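
The abstract mentions tools that partition a CNN into subgraphs to divide work between CPU cores and FPGAs, but does not describe how. One simple way to do it, shown here as a sketch under stated assumptions, is to group consecutive layers the FPGA processor supports into FPGA subgraphs and everything else into CPU subgraphs. The set `FPGA_OPS`, the `(name, op_type)` layer format, and the function `partition` are hypothetical, and the sketch only handles a linear chain of layers, whereas real networks are directed acyclic graphs.

```python
from itertools import groupby

# Hypothetical set of layer types assumed to run on the FPGA processor;
# any other layer type falls back to the CPU.
FPGA_OPS = {"conv", "relu", "maxpool", "eltwise_add"}


def partition(layers, fpga_ops=FPGA_OPS):
    """Split a linear chain of (name, op_type) layers into maximal
    contiguous subgraphs, each tagged "fpga" or "cpu"."""
    parts = []
    for on_fpga, group in groupby(layers, key=lambda layer: layer[1] in fpga_ops):
        # Consume the group immediately; groupby invalidates it on the next step.
        parts.append(("fpga" if on_fpga else "cpu", [name for name, _ in group]))
    return parts
```

Keeping each subgraph maximal minimizes the number of CPU/FPGA hand-offs, since every boundary between subgraphs implies moving activations between the two devices.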

ADI's revolutionary BF60x vision focused digital signal processor system on chip: 25 billion operations/sec @ 80 mW and zero bandwidth

2012 IEEE Hot Chips 24 Symposium (HCS), 2012. DOI: 10.1109/hotchips.2012.7476488. Cited by ~1

Author(s): Robert Bushey

Keyword(s): Digital Signal Processor, Digital Signal, System On Chip, On Chip, Signal Processor

Convolution Accelerator Designs Using Fast Algorithms

Algorithms, 2019, Vol 12 (5), pp. 112. DOI: 10.3390/a12050112. Cited by ~5

Author(s): Yulin Zhao, Donghui Wang, Leiou Wang

Keyword(s): Power Consumption, Digital Signal Processor, Fast Algorithms, Digital Signal, Practical Implementation, Great Success, Field Programmable, Limited Power, And Performance, Logic Resource

Convolutional neural networks (CNNs) have achieved great success in image processing. However, their heavy computational burden makes them difficult to use in embedded applications with limited power and performance budgets. Although many fast convolution algorithms can reduce computational complexity, they make practical implementation more difficult. To overcome these difficulties, this paper proposes several convolution accelerator designs based on fast algorithms. The designs target field programmable gate arrays (FPGAs), achieve a better balance between digital signal processor (DSP) blocks and logic resources, and consume less power. Implementation results show that the accelerator design based on the Strassen–Winograd algorithm consumes 21.3% less power than conventional accelerators.
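
For reference, the textbook Strassen–Winograd step named in the abstract multiplies 2×2 block matrices with 7 block multiplications and 15 block additions instead of 8 multiplications. The NumPy sketch below shows one recursion level on a square matrix of even size; it is the standard algebraic identity, not the paper's hardware design. Trading multiplications for additions presumably helps on FPGAs because multipliers typically map to DSP blocks while adders map to logic, which fits the DSP/logic balance the abstract describes. The function name `strassen_winograd` is hypothetical.

```python
import numpy as np


def strassen_winograd(A, B):
    """One level of the Strassen-Winograd variant: C = A @ B for square
    matrices of even size using 7 block products and 15 block additions."""
    if A.shape != B.shape or A.shape[0] != A.shape[1] or A.shape[0] % 2:
        raise ValueError("expects two square matrices of the same even size")
    n = A.shape[0] // 2
    A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
    B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]
    # Pre-additions (8 of the 15 additions)
    S1 = A21 + A22
    S2 = S1 - A11
    S3 = A11 - A21
    S4 = A12 - S2
    T1 = B12 - B11
    T2 = B22 - T1
    T3 = B22 - B12
    T4 = T2 - B21
    # The seven block multiplications
    P1 = A11 @ B11
    P2 = A12 @ B21
    P3 = S4 @ B22
    P4 = A22 @ T4
    P5 = S1 @ T1
    P6 = S2 @ T2
    P7 = S3 @ T3
    # Post-additions (the remaining 7 additions, counting the four below)
    U2 = P1 + P6
    U3 = U2 + P7
    U4 = U2 + P5
    C = np.empty((2 * n, 2 * n), dtype=np.result_type(A, B))
    C[:n, :n] = P1 + P2
    C[:n, n:] = U4 + P3
    C[n:, :n] = U3 - P4
    C[n:, n:] = U3 + P5
    return C
```

A convolution layer can be lowered to matrix multiplication (for example via im2col), after which this identity applies to the resulting matrix product.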

A digital magnetic resonance imaging spectrometer using digital signal processor and field programmable gate array

Review of Scientific Instruments, 2013, Vol 84 (5), pp. 054702. DOI: 10.1063/1.4803007. Cited by ~9

Author(s): Xiao Liang, Sun Binghe, Ma Yueping, Zhao Ruyan

Keyword(s): Magnetic Resonance Imaging, Magnetic Resonance, Digital Signal Processor, Field Programmable Gate Array, Digital Signal, Resonance Imaging, Imaging Spectrometer, Field Programmable, Gate Array, Signal Processor

Research of a Mixed-Signal Programmable SoC Based on FPAA

Applied Mechanics and Materials, 2014, Vol 556-562, pp. 1741-1744. DOI: 10.4028/www.scientific.net/amm.556-562.1741

Author(s): Jun Deng, Hua Yong Tan, Lun Cai Liu, Lin Tao Liu

Keyword(s): Digital Signal, Intermediate Frequency, Mixed Signal, Programmable Analog, Baseband Signal, Field Programmable, Baseband Signal Processing, Good Potential, On Chip, Field Programmable Analog Array

This paper presents a novel architecture for a mixed-signal SoC that integrates a Field Programmable Analog Array (FPAA) into a SoC based on a 32-bit RISC CPU. The FPAA unit can be configured as a filter, comparator, gain amplifier, and so on. The proposed mixed-signal SoC can convert an intermediate-frequency (IF) analog signal into a baseband digital signal and perform real-time baseband signal processing; in addition, it can transmit modulated IF signals converted from baseband signals by digital up-conversion (DUC). Thanks to its integrated IP blocks, such as the ADC, DAC, DDC, and DUC, the proposed mixed-signal SoC is effectively a transceiver on a chip, offering smaller board area, lower power consumption, and lower system cost for transceiver product development. The design has good potential for wireless communication applications.
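
The receive path described above (IF signal to baseband digital signal, with a DDC block on chip) follows the usual digital down-conversion pattern: mix with a numerically controlled oscillator, low-pass filter, then decimate. The abstract does not describe the DDC internals, so the NumPy sketch below is a generic textbook version; the function `ddc`, its parameters, and the moving-average filter (chosen only to keep the sketch short, where real DDCs typically use CIC and FIR stages) are assumptions.

```python
import numpy as np


def ddc(x, fs, f_if, decim):
    """Digital down-conversion of a real, sampled IF signal.

    x: real samples at rate fs; f_if: IF carrier frequency in Hz;
    decim: decimation factor (also the moving-average length here).
    Returns complex baseband samples at rate fs / decim.
    """
    n = np.arange(len(x))
    nco = np.exp(-2j * np.pi * f_if * n / fs)  # complex local oscillator
    mixed = x * nco  # shifts the +f_if component to 0 Hz
    taps = np.ones(decim) / decim  # crude low-pass: removes the 2*f_if image
    filtered = np.convolve(mixed, taps, mode="valid")
    return filtered[::decim]  # keep every decim-th filtered sample
```

For example, an unmodulated carrier `np.cos(2 * np.pi * 100 * n / 1000)` sampled at 1 kHz and down-converted with `f_if=100` and `decim=10` becomes a constant baseband value of 0.5, because the mixer splits the real cosine into a 0 Hz term and a 200 Hz image that the 10-tap average cancels.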

Design and Implementation of a Direct Torque Control of Induction Machine utilizing a Digital Signal Processor and the Field Programmable Gate Arrays

2005 International Conference on Power Electronics and Drives Systems, 2006. DOI: 10.1109/peds.2005.1619663

Author(s): C.L. Toh, N.R.N. Idris, A.H.M. Yatim, F. Patkar

Keyword(s): Digital Signal Processor, Direct Torque Control, Induction Machine, Digital Signal, Field Programmable Gate Arrays, Torque Control, Gate Arrays, Design And Implementation, Field Programmable, Programmable Gate Arrays

Hardware Implementations of Radiogoniometry Techniques (Implementaciones en Hardware de técnicas de Radiogoniometría)

Ingeniería y Región, 2016, Vol 14 (2), pp. 23. DOI: 10.25054/22161325.690

Author(s): Tibisay Sánchez, Alfredo David Redondo, Andrés Felipe García, Cristina Gómez, Leonardo Betancur, ...

Keyword(s): Digital Signal Processor, Field Programmable Gate Array, Software Defined Radio, Digital Signal, Direction Finding, Field Programmable, Gate Array, Signal Processor

This article presents a literature review and a comparative analysis of hardware implementations of radiogoniometry techniques, also known as Radio Direction Finding (RDF), in order to identify the best option for implementing this functionality in spectrum-management activities in developing countries. The implementations covered include classical techniques such as Pseudo-Doppler and advanced high-resolution techniques such as MUSIC. Different hardware alternatives for these implementations are presented, including SDR (Software Defined Radio), FPGA (Field Programmable Gate Array), and DSP (Digital Signal Processor), as well as some hybrid configurations that combine software and hardware to optimize time and cost. In addition, some commercial applications are shown that use geolocation techniques based on angles of arrival, times of arrival, or other parameters that allow triangulation or trilateration to be performed, as appropriate.
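
Of the techniques the review covers, MUSIC is the high-resolution one. A minimal NumPy sketch follows, under assumptions that are not taken from the article: a uniform linear array with half-wavelength element spacing, narrowband signals, and a known number of sources. The covariance matrix of the array snapshots is split into signal and noise subspaces, and the bearing is where the steering vector is most nearly orthogonal to the noise subspace. The names `steering` and `music_doa` are hypothetical.

```python
import numpy as np


def steering(m_elems, theta_deg):
    """Steering vectors of a half-wavelength-spaced uniform linear array,
    one column per angle in theta_deg (degrees from broadside)."""
    theta = np.deg2rad(np.atleast_1d(theta_deg))
    m = np.arange(m_elems)[:, None]
    return np.exp(-1j * np.pi * m * np.sin(theta)[None, :])


def music_doa(snapshots, n_sources, grid_deg):
    """MUSIC pseudo-spectrum over grid_deg.

    snapshots: complex array of shape (elements, snapshots).
    """
    m_elems, n_snap = snapshots.shape
    R = snapshots @ snapshots.conj().T / n_snap  # sample covariance
    _, vecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    En = vecs[:, : m_elems - n_sources]  # noise subspace (smallest eigenvalues)
    A = steering(m_elems, grid_deg)
    denom = np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    return 1.0 / denom  # peaks where steering vectors are orthogonal to En
```

The bearing estimate is the grid angle at the largest peak of the returned pseudo-spectrum; with several sources, the `n_sources` largest local maxima would be taken instead.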

High-Performance Computing Using FPGAs for Improving the DTC Performances of Induction Motors

Advances in Systems Analysis, Software Engineering, and High Performance Computing - FPGA Algorithms and Applications for the Internet of Things, 2020, pp. 133-153. DOI: 10.4018/978-1-5225-9806-0.ch007

Author(s): Saber Krim, Mohamed Faouzi Mimouni

Keyword(s): Digital Signal Processor, High Performance, Induction Motors, Control Method, Direct Torque Control, Digital Signal, Sampling Frequency, Torque Control, Field Programmable, Torque Ripples

Conventional direct torque control (DTC) of induction motors has become the most widely used control strategy. It is known for its simplicity, fast torque response, and independence from machine parameters. Despite these advantages, conventional DTC suffers from several limitations, such as torque ripple. These ripples depend on the hysteresis bandwidth of the torque controller and on the sampling frequency, so the limitations of conventional DTC can be mitigated by increasing the sampling frequency. However, operation at higher sampling frequencies is not possible with software solutions such as a digital signal processor (DSP), because the implemented algorithm is processed serially. To overcome the DSP limitations, a field programmable gate array (FPGA) can be used instead to implement the DTC algorithm with a shorter execution time. This chapter aims to improve the performance of conventional DTC while keeping its advantages, and chooses the FPGA for its parallel processing capability.
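
The argument that torque ripple depends on both the hysteresis band and the sampling frequency can be illustrated with a toy model of a sampled torque hysteresis comparator. The model is an assumption, not the chapter's motor model: torque is assumed to rise at a constant `slope_up` while a torque-increasing voltage vector is applied and fall at `slope_down` otherwise, and the comparator is two-level (conventional DTC usually uses a three-level torque comparator). Because the comparator only acts at sampling instants, the torque overshoots the band by up to one sampling period's worth of change, so peak-to-peak ripple lies between `band` and roughly `band + (slope_up + slope_down) / fs`. The function `torque_ripple` and all its parameters are hypothetical.

```python
def torque_ripple(band, fs, slope_up, slope_down, t_end=0.01):
    """Peak-to-peak ripple (N*m) of a sampled two-level torque hysteresis loop.

    band: hysteresis band width in N*m, centred on the torque reference.
    fs: controller sampling frequency in Hz.
    slope_up, slope_down: assumed constant torque slopes in N*m/s.
    """
    ts = 1.0 / fs
    torque, increasing = 0.0, True  # deviation from the torque reference
    history = []
    for _ in range(int(t_end * fs)):
        # The comparator only sees the torque at sampling instants.
        if torque > band / 2:
            increasing = False
        elif torque < -band / 2:
            increasing = True
        torque += (slope_up if increasing else -slope_down) * ts
        history.append(torque)
    steady = history[len(history) // 2 :]  # skip the start-up transient
    return max(steady) - min(steady)
```

With the same 0.1 N·m band, raising the sampling frequency from 10 kHz to 100 kHz in this model shrinks the ripple toward the band itself, which is the kind of high-rate control loop the chapter argues an FPGA makes feasible.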
