Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip

Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracy they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfortunately, the high computational complexity and the huge memory demand still make it quite challenging to design efficient hardware architectures and to deploy them in resource- and power-constrained embedded systems. This paper presents a novel purpose-designed hardware accelerator to perform 2D deconvolutions. The proposed structure applies a hardware-oriented computational approach that overcomes the issues of traditional deconvolution methods, and it is suitable for implementation within virtually any system-on-chip based on field-programmable gate array devices. In fact, the novel accelerator is easily scalable to comply with the resources available within both high- and low-end devices by adequately scaling the adopted parallelism. As an example, when exploited to accelerate the Deep Convolutional Generative Adversarial Network model, the novel accelerator, running as a standalone unit implemented within the Xilinx Zynq XC7Z020 System-on-Chip (SoC) device, performs up to 72 GOPs. Moreover, it dissipates less than 500 mW at 200 MHz and occupies 5.6%, 4.1%, 17%, and 96%, respectively, of the look-up tables, flip-flops, random access memory, and digital signal processors available on-chip. When accommodated within the same device, the whole embedded system equipped with the novel accelerator performs up to 54 GOPs and dissipates less than 1.8 W at 150 MHz. Thanks to the increased exploitable parallelism, more than 900 GOPs can be executed when the high-end Virtex-7 XC7VX690T device is used as the implementation platform. Moreover, in comparison with state-of-the-art competitors implemented within the Zynq XC7Z045 device, the system proposed here reaches a computational capability up to 20% higher and saves more than 60% of the power consumption and more than 80% of the logic resources, while using 5.7× fewer on-chip memory resources.
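
For readers unfamiliar with the operation being accelerated, the following is a minimal Python sketch of a textbook 2D deconvolution (transposed convolution) for a single channel without padding. It is not the hardware-oriented approach of the paper, which the abstract does not detail; the function name `deconv2d` and its parameters are illustrative assumptions.

```python
def deconv2d(inp, kernel, stride=1):
    """Naive single-channel 2D transposed convolution, no padding.

    Each input pixel scatters a scaled copy of the kernel into the
    output, so overlapping contributions must be accumulated.
    """
    h, w = len(inp), len(inp[0])
    k = len(kernel)  # square k x k kernel assumed
    out_h = (h - 1) * stride + k
    out_w = (w - 1) * stride + k
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            for u in range(k):
                for v in range(k):
                    out[i * stride + u][j * stride + v] += inp[i][j] * kernel[u][v]
    return out


ones = [[1, 1], [1, 1]]
overlapping = deconv2d([[1, 2], [3, 4]], ones, stride=1)  # 3 x 3 output
disjoint = deconv2d([[1, 2], [3, 4]], ones, stride=2)     # 4 x 4 output
```

With `stride=1` neighbouring kernel copies overlap and are summed, whereas with `stride=2` and a 2 × 2 kernel they tile the output without overlap. The accumulation of overlapping writes is one common reason this input-oriented formulation is awkward to map directly onto hardware.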

A Novel Design of Software System on Chip for Embedded System

Journal of Signal Processing Systems ◽

10.1007/s11265-015-1099-9 ◽

2016 ◽

Vol 86 (2-3) ◽

pp. 135-147 ◽

Cited By ~ 3

Author(s):

Wei Hu ◽

Hong Guo ◽

Hongna Geng ◽

Kai Zhang ◽

Jun Liu ◽

...

Keyword(s):

Embedded System ◽

System On Chip ◽

Software System ◽

On Chip ◽

Novel Design

Embedded system-on-chip design of atrial fibrillation classifier

2017 International SoC Design Conference (ISOCC) ◽

10.1109/isocc.2017.8368784 ◽

2017 ◽

Author(s):

Huey Woan Lim ◽

Yuan Wen Hau ◽

Mohd Afzan Othman ◽

Chiao Wen Lim

Keyword(s):

Atrial Fibrillation ◽

Embedded System ◽

System On Chip ◽

Chip Design ◽

On Chip

A Parallel Connected Component Labeling Architecture for Heterogeneous Systems-on-Chip

Electronics ◽

10.3390/electronics9020292 ◽

2020 ◽

Vol 9 (2) ◽

pp. 292

Author(s):

Stefania Perri ◽

Fanny Spagnolo ◽

Pasquale Corsonello

Keyword(s):

Embedded System ◽

Clock Cycle ◽

Heterogeneous Systems ◽

Image Understanding ◽

Input Image ◽

Fourth Generation ◽

Connected Component ◽

Connected Component Labeling ◽

Labeling Approach ◽

On Chip

Connected component labeling is one of the most important processes for image analysis, image understanding, pattern recognition, and computer vision. It performs inherently sequential operations to scan a binary input image and to assign a unique label to all pixels of each object. This paper presents a novel hardware-oriented labeling approach able to process input pixels in parallel, thus speeding up the labeling task with respect to state-of-the-art competitors. For purposes of comparison with existing designs, several hardware implementations are characterized for different image sizes and realization platforms. The obtained results demonstrate that the proposed design achieves significantly higher frame rates and resource efficiency than existing counterparts. The proposed hardware architecture is purposely designed to comply with the fourth generation of the advanced extensible interface (AXI4) protocol and to store intermediate and final outputs within an off-chip memory. Therefore, it can be directly integrated as a custom accelerator in virtually any modern heterogeneous embedded system-on-chip (SoC). As an example, when integrated within the Xilinx Zynq-7000 XC7Z020 SoC, the novel design processes more than 1.9 pixels per clock cycle, thus furnishing more than 30 2k × 2k labeled frames per second by using 3688 Look-Up Tables (LUTs), 1415 Flip Flops (FFs), and 10 kb of on-chip memory.
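
As a point of reference for the sequential scan the abstract mentions, here is a minimal Python sketch of the classic two-pass labeling algorithm with a union-find equivalence table. It is the conventional software baseline, not the parallel architecture proposed in the paper; 4-connectivity and the name `label_components` are assumptions made for the example.

```python
def label_components(image):
    """Two-pass 4-connected labeling of a binary image (rows of 0/1)."""
    parent = [0]  # index 0 is reserved for the background

    def find(x):
        # Follow parent links to the representative label (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]

    # First pass: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                ru, rl = find(up), find(left)
                root = min(ru, rl)
                parent[max(ru, rl)] = root  # merge the two label sets
                labels[r][c] = root
            elif up or left:
                labels[r][c] = find(up or left)
            else:
                parent.append(len(parent))  # new label, its own root
                labels[r][c] = len(parent) - 1

    # Second pass: replace provisional labels by consecutive final ones.
    final = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels, len(final)


image = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
labeled, count = label_components(image)
```

In this example the U-shaped object initially receives two provisional labels that are merged when the scan reaches its bottom row, which illustrates why the label of a pixel depends on previously scanned pixels and why the task is described as inherently sequential.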

LiveCheckHSI: A hardware/software co-verification tool for hyperspectral imaging systems with embedded system-on-chip instrument avionics

2018 IEEE Aerospace Conference ◽

10.1109/aero.2018.8396667 ◽

2018 ◽

Cited By ~ 2

Author(s):

Irene Wang ◽

Didier Keymeulen ◽

Danny Tran ◽

Elliott Liggett ◽

Matthew Klimesh ◽

...

Keyword(s):

Embedded System ◽

Hyperspectral Imaging ◽

System On Chip ◽

Imaging Systems ◽

Verification Tool ◽

On Chip

A Novel Operating System on Chip with Information Security Support for Embedded System

Journal of Software ◽

10.4304/jsw.4.10.1053-1060 ◽

2009 ◽

Vol 4 (10) ◽

Author(s):

Wei Hu ◽

Tianzhou Chen ◽

Qingsong Shi ◽

Gang Wang ◽

Nan Zhang ◽

...

Keyword(s):

Operating System ◽

Information Security ◽

Embedded System ◽

System On Chip ◽

On Chip

System on Chip Design for Multi-Principle of Relay Protection in the FPGA

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.668-669.857 ◽

2014 ◽

Vol 668-669 ◽

pp. 857-861

Author(s):

Peng Fei Hu ◽

Yu Xiang Yuan ◽

Zhi Juan Qu ◽

Xue Ping Jiang

Keyword(s):

Signal Processing ◽

Digital Signal Processing ◽

Relay Protection ◽

Digital Signal ◽

System On Chip ◽

Process Scheduling ◽

Chip Design ◽

Protection Devices ◽

On Chip ◽

Set Up

To improve the reliability and integration of relay protection devices in power systems, a system-on-chip design for multi-principle relay protection on an FPGA is proposed. The data acquisition, digital signal processing, hardware protection algorithm, FPGA and MCU process scheduling, and communication between the MCU and peripheral devices are designed; the hardware compilation model is set up on the FPGA using Quartus II; and simulation and experimental verification are performed. The results show that the proposed system can improve the speed of hardware protection, reduce the volume of the device, and support reconfiguration of the architecture.

A novel PUF-based encryption protocol for embedded System on Chip

2016 International Conference on Development and Application Systems (DAS) ◽

10.1109/daas.2016.7492566 ◽

2016 ◽

Author(s):

Alexandra Stanciu ◽

Florin Dumitru Moldoveanu ◽

Marcian Cirstea

Keyword(s):

Embedded System ◽

System On Chip ◽

On Chip

ARM-Cortex M3-Based Two-Wheel Robot for Assessing Grid Cell Model of Medial Entorhinal Cortex: Progress towards Building Robots with Biologically Inspired Navigation-Cognitive Maps

Journal of Robotics ◽

10.1155/2017/8069654 ◽

2017 ◽

Vol 2017 ◽

pp. 1-9

Author(s):

J. Cuneo ◽

L. Barboni ◽

N. Blanco ◽

M. del Castillo ◽

J. Quagliotti

Keyword(s):

Embedded System ◽

Entorhinal Cortex ◽

Cell Model ◽

Systems Design ◽

Grid Cell ◽

Network Models ◽

Time Algorithm ◽

Neural Network Models ◽

Biologically Inspired ◽

Medial Entorhinal Cortex

This article presents the implementation and use of a two-wheel autonomous robot and its effectiveness as a tool for studying the recently discovered role of grid cells in the space-mapping circuitry of the mammalian brain (specifically the medial entorhinal cortex). A proposed discrete-time algorithm that emulates the medial entorhinal cortex is programmed into the robot. The robot freely explores a limited laboratory area in the manner of a rat or mouse and reports information to a PC, thus enabling research without the use of live individuals. Position-coordinate neural maps are obtained as mathematically predicted, although for a reduced number of implemented neurons (i.e., 200 neurons). However, this type of computational embedded system (the robot's microcontroller) is found to be insufficient for simulating huge numbers of neurons in real time (as in the medial entorhinal cortex). The results of this work are considered to provide insight into achieving an enhanced embedded systems design for emulating and understanding mathematical neural network models to be used as biologically inspired navigation systems for robots.

Hardware-software partitioning using three-level hybrid algorithm for system-on-chip platform

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v10i1.2201 ◽

2021 ◽

Vol 10 (1) ◽

pp. 466-473

Author(s):

Tiong Reng Xian ◽

Zaini Abdul Halim ◽

Ching Chia Leong ◽

Tan Jiunn Gim

Keyword(s):

Embedded System ◽

Hybrid Algorithm ◽

High Capacity ◽

Minimum Cost ◽

System On Chip ◽

Swarm Optimization ◽

Low Diversity ◽

Field Programmable ◽

On Chip ◽

Software Partitioning

This study discusses hardware-software partitioning, which is useful for system-on-chip (SoC) applications. Hardware-software partitioning attempts to obtain the lowest execution time by combining a hardware processor system and a field programmable gate array on the SoC platform in embedded system applications. A three-level hybrid algorithm called GAGAPSO is proposed in this study. The algorithm consists of two successive genetic algorithms (GAs) and one particle swarm optimization (PSO). Each algorithm has a drawback when used alone: GA has a low convergence speed, and PSO converges prematurely because of low diversity. These algorithms are combined in this study to achieve high-capacity global convergence and enhanced search efficiency. Three algorithms, namely GA, GAPSO, and GAGAPSO, are developed using MATLAB. These algorithms are evaluated on the basis of the number of nodes and the minimum cost that can be achieved. The number of nodes varies from 10 to 1000. The minimum cost and the number of iterations needed to achieve it are recorded. Results show that GAGAPSO can converge faster than GA and GAPSO. Furthermore, GAGAPSO can achieve the lowest cost for all node counts.
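
The three-stage structure described above can be illustrated with a small Python sketch: two GA runs followed by a binary PSO seeded with the final GA population. The abstract does not specify how the stages are chained, the operators used, or the cost model, so everything here (the toy cost with an area budget, the parameter values, and the names `make_cost`, `ga`, and `binary_pso`) is an assumption for illustration and not the authors' MATLAB implementation.

```python
import math
import random


def make_cost(sw, hw, area, budget):
    """Hypothetical cost: total run time plus a penalty for exceeding the area budget."""
    def cost(x):  # x[i] == 1 maps node i to hardware, 0 to software
        time = sum(h if bit else s for s, h, bit in zip(sw, hw, x))
        used = sum(a for a, bit in zip(area, x) if bit)
        return time + 10 * max(0, used - budget)
    return cost


def ga(pop, cost, rng, gens=30, pm=0.05):
    """Generational GA: 2-elitism, 3-way tournaments, one-point crossover, bit-flip mutation."""
    for _ in range(gens):
        pop.sort(key=cost)
        nxt = [pop[0][:], pop[1][:]]
        while len(nxt) < len(pop):
            a = min(rng.sample(pop, 3), key=cost)
            b = min(rng.sample(pop, 3), key=cost)
            cut = rng.randrange(1, len(a))
            child = a[:cut] + b[cut:]
            nxt.append([bit ^ (rng.random() < pm) for bit in child])
        pop = nxt
    return sorted(pop, key=cost)


def binary_pso(pop, cost, rng, iters=30, w=0.7, c1=1.5, c2=1.5):
    """Binary PSO: velocities pass through a sigmoid to give bit probabilities."""
    n = len(pop[0])
    pos = [p[:] for p in pop]
    vel = [[0.0] * n for _ in pop]
    pbest = [p[:] for p in pop]
    gbest = min(pbest, key=cost)[:]
    for _ in range(iters):
        for i, x in enumerate(pos):
            for d in range(n):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - x[d])
                     + c2 * rng.random() * (gbest[d] - x[d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp the velocity
                x[d] = int(rng.random() < 1 / (1 + math.exp(-vel[i][d])))
            if cost(x) < cost(pbest[i]):
                pbest[i] = x[:]
                if cost(x) < cost(gbest):
                    gbest = x[:]
    return gbest


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 12
sw = [rng.randint(5, 20) for _ in range(n)]   # software run times (made up)
hw = [rng.randint(1, 5) for _ in range(n)]    # hardware run times (made up)
area = [rng.randint(1, 10) for _ in range(n)]
cost = make_cost(sw, hw, area, budget=30)

pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(20)]
start = min(map(cost, pop))
pop = ga(pop, cost, rng)            # first GA stage
pop = ga(pop, cost, rng)            # second GA stage
best = binary_pso(pop, cost, rng)   # PSO stage seeded by the GA result
```

Because the GA keeps its two best individuals and the PSO tracks a global best initialized from the GA population, the best cost can only stay the same or decrease from one stage to the next in this sketch.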

System-Level Modelling and Design Space Exploration for Multiprocessor Embedded System-on-Chip Architectures

10.5117/9789056294557 ◽

2006 ◽

Cited By ~ 17

Author(s):

Cagkan Erbas

Keyword(s):

Embedded System ◽

Design Space Exploration ◽

Design Space ◽

Space Exploration ◽

System On Chip ◽

System Level ◽

On Chip
