Custom Bus-Based On-Chip Communication Architecture Design

Author(s): Sudeep Pasricha, Nikil Dutt
2009, Vol 60 (3), pp. 315-331
Author(s): N. Wang, A. Sanusi, P. Y. Zhao, M. Elgamel, M. A. Bayoumi

Author(s): Liang Guang, Ethiopia Nigussie, Juha Plosila, Hannu Tenhunen

A self-aware and adaptive Network-on-Chip (NoC) with dual monitoring networks is presented. A proper monitoring interface is an essential prerequisite for adaptive system reconfiguration in parallel on-chip computing. This work proposes a DMC (dual monitoring communication) architecture to support self-awareness on the NoC platform. One type of monitoring communication is integrated with the data channel to trace the run-time profile of data communication in high-speed on-chip networking. The other type is separate from the data communication and reports the run-time profile to the supervising monitor. Direct latency monitoring on a mesochronous NoC is presented as a case study: latency is traced in the integrated monitoring communication using a novel latency monitoring table in each router. The latency information is reported over the separate monitoring communication to the supervising monitor, which reconfigures the system to adjust the latency, for instance by dynamic voltage and frequency scaling (DVFS). Quantitative evaluation with synthetic traces and real applications demonstrates the effectiveness and efficiency of direct latency monitoring with the DMC architecture. The area overhead of the DMC architecture is estimated to be small in 65 nm CMOS technology.
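
To make the per-router latency monitoring table more concrete, the following Python sketch models one plausible organization: a table that accumulates per-flow latency samples on the integrated (data-channel) path, and a supervising monitor that reads the reports arriving over the separate monitoring network and picks a DVFS action. The class names, table layout, and threshold rule are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of a per-router latency monitoring table (hypothetical layout).
# The DMC architecture traces latency on the integrated data-channel path and
# reports it over a separate monitoring network; the names and the threshold-based
# DVFS decision below are illustrative assumptions, not the paper's mechanism.
from collections import defaultdict


class LatencyMonitoringTable:
    """Tracks per-flow packet latency observed at one NoC router."""

    def __init__(self, router_id):
        self.router_id = router_id
        # (src, dst) -> list of observed latencies in cycles
        self.samples = defaultdict(list)

    def record(self, src, dst, inject_cycle, arrive_cycle):
        """Called on the integrated (data-channel) monitoring path."""
        self.samples[(src, dst)].append(arrive_cycle - inject_cycle)

    def report(self):
        """Summary sent to the supervising monitor over the separate network."""
        return {flow: sum(lat) / len(lat) for flow, lat in self.samples.items() if lat}


class SupervisingMonitor:
    """Collects reports and decides on reconfiguration (e.g., DVFS)."""

    def __init__(self, latency_bound_cycles):
        self.bound = latency_bound_cycles

    def evaluate(self, reports):
        # Raise the voltage/frequency level of routers whose worst average
        # flow latency violates the bound; lower it when there is ample slack.
        actions = {}
        for router_id, flows in reports.items():
            worst = max(flows.values(), default=0)
            if worst > self.bound:
                actions[router_id] = "raise_vf_level"
            elif worst < 0.5 * self.bound:
                actions[router_id] = "lower_vf_level"
        return actions


# Example: one router observes two packets of a flow, the monitor reacts.
table = LatencyMonitoringTable(router_id=3)
table.record(src=0, dst=7, inject_cycle=100, arrive_cycle=130)
table.record(src=0, dst=7, inject_cycle=200, arrive_cycle=260)
monitor = SupervisingMonitor(latency_bound_cycles=40)
print(monitor.evaluate({3: table.report()}))  # {3: 'raise_vf_level'}
```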


Electronics, 2019, Vol 8 (1), pp. 65
Author(s): Zhiqiang Liu, Paul Chow, Jinwei Xu, Jingfei Jiang, Yong Dou, ...

Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complex computer vision applications. Many customized FPGA-based accelerators have been proposed for 2D CNNs, but very few target 3D CNNs. 3D CNNs are far more computationally intensive, and the extra dimension further expands the design space, making 3D CNN acceleration on FPGAs a significant challenge. Motivated by the observation that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module generates the feature-matrix tiles without storing the entire enlarged feature matrix on-chip or off-chip, a splitting strategy reconstructs a convolutional layer to fit the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array computes the matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test it on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU.
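
The convolution-to-matrix-multiplication mapping underlying the uniform architecture can be illustrated in plain software. The sketch below performs a 3D im2col-style lowering followed by a single matrix multiply (the workload a 2D MAC array would execute). The function names and layouts are assumptions, and materializing the full feature matrix here is a simplification: the accelerator's mapping module instead generates the tiles on the fly.

```python
# Illustrative software model of mapping a 3D convolution to a matrix multiply,
# the idea behind the uniform 2D/3D accelerator. Names, layouts, and the full
# feature-matrix materialization are assumptions for clarity; the FPGA design
# streams feature-matrix tiles instead of storing the whole enlarged matrix.
import numpy as np


def im2col_3d(x, kd, kh, kw):
    """Lower a (C, D, H, W) input into a (C*kd*kh*kw, Do*Ho*Wo) feature matrix."""
    c, d, h, w = x.shape
    do, ho, wo = d - kd + 1, h - kh + 1, w - kw + 1
    cols = np.empty((c * kd * kh * kw, do * ho * wo), dtype=x.dtype)
    idx = 0
    for ci in range(c):
        for zd in range(kd):
            for zh in range(kh):
                for zw in range(kw):
                    patch = x[ci, zd:zd + do, zh:zh + ho, zw:zw + wo]
                    cols[idx] = patch.reshape(-1)
                    idx += 1
    return cols, (do, ho, wo)


def conv3d_as_matmul(x, weights):
    """x: (C, D, H, W), weights: (K, C, kd, kh, kw) -> output (K, Do, Ho, Wo)."""
    k, c, kd, kh, kw = weights.shape
    cols, out_shape = im2col_3d(x, kd, kh, kw)
    wmat = weights.reshape(k, -1)   # (K, C*kd*kh*kw) weight matrix
    out = wmat @ cols               # the matrix multiply a MAC array would run
    return out.reshape(k, *out_shape)


# A 2D convolution is the same mapping with depth kd = 1, which is why a single
# matrix-multiply datapath can serve both 2D and 3D CNN layers.
x = np.random.rand(2, 5, 8, 8).astype(np.float32)     # (C, D, H, W)
w = np.random.rand(4, 2, 3, 3, 3).astype(np.float32)  # (K, C, kd, kh, kw)
print(conv3d_as_matmul(x, w).shape)                   # (4, 3, 6, 6)
```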

