Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip

2020 ◽  
Vol 6 (9) ◽  
pp. 85
Author(s):  
Stefania Perri ◽  
Cristian Sestito ◽  
Fanny Spagnolo ◽  
Pasquale Corsonello

Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracies they have demonstrated in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfortunately, the high computational complexity and the huge memory demand still make the design of efficient hardware architectures, as well as their deployment in resource- and power-constrained embedded systems, quite challenging. This paper presents a novel purpose-designed hardware accelerator to perform 2D deconvolutions. The proposed structure applies a hardware-oriented computational approach that overcomes the issues of traditional deconvolution methods, and it is suitable for implementation within virtually any system-on-chip based on field-programmable gate array devices. In fact, the novel accelerator is easily scalable to comply with the resources available within both high- and low-end devices by adequately scaling the adopted parallelism. As an example, when exploited to accelerate the Deep Convolutional Generative Adversarial Network model, the novel accelerator, running as a standalone unit implemented within the Xilinx Zynq XC7Z020 System-on-Chip (SoC) device, performs up to 72 GOPs. Moreover, it dissipates less than 500 mW at 200 MHz and occupies 5.6%, 4.1%, 17%, and 96%, respectively, of the look-up tables, flip-flops, random access memory, and digital signal processors available on-chip. When accommodated within the same device, the whole embedded system equipped with the novel accelerator performs up to 54 GOPs and dissipates less than 1.8 W at 150 MHz. Thanks to the greater exploitable parallelism, more than 900 GOPs can be executed when the high-end Virtex-7 XC7VX690T device is used as the implementation platform. Moreover, in comparison with state-of-the-art competitors implemented within the Zynq XC7Z045 device, the system proposed here reaches a computational capability up to 20% higher and saves more than 60% and 80% of power consumption and logic resources, respectively, while using 5.7× fewer on-chip memory resources.
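
For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of a 2D deconvolution (transposed convolution) in its textbook "scatter" form. It is not the hardware-oriented approach of the paper, which the abstract does not detail; the function name `deconv2d`, the single-channel list-of-lists layout, and the absence of padding are assumptions made for illustration only.

```python
def deconv2d(x, k, stride=2):
    """Naive single-channel 2D transposed convolution, no padding.

    Each input pixel x[i][j] scatters a scaled copy of the kernel k into
    the output, with its top-left corner at (i*stride, j*stride).
    Overlapping contributions are accumulated.
    """
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    out_h = (h - 1) * stride + kh
    out_w = (w - 1) * stride + kw
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            for m in range(kh):
                for n in range(kw):
                    out[i * stride + m][j * stride + n] += x[i][j] * k[m][n]
    return out


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    k = [[1, 1], [1, 1]]
    for row in deconv2d(x, k, stride=2):
        print(row)
```

With a stride equal to the kernel size the scattered kernels do not overlap; with a smaller stride they do, and the accumulation of those overlaps is the part that a straightforward implementation has to serialize or buffer.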

2016 ◽  
Vol 86 (2-3) ◽  
pp. 135-147 ◽  
Author(s):  
Wei Hu ◽  
Hong Guo ◽  
Hongna Geng ◽  
Kai Zhang ◽  
Jun Liu ◽  
...  

Electronics ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 292
Author(s):  
Stefania Perri ◽  
Fanny Spagnolo ◽  
Pasquale Corsonello

Connected component labeling is one of the most important processes for image analysis, image understanding, pattern recognition, and computer vision. It performs inherently sequential operations to scan a binary input image and to assign a unique label to all pixels of each object. This paper presents a novel hardware-oriented labeling approach able to process input pixels in parallel, thus speeding up the labeling task with respect to state-of-the-art competitors. For purposes of comparison with existing designs, several hardware implementations are characterized for different image sizes and realization platforms. The obtained results demonstrate frame rates and resource efficiency significantly higher than those of existing counterparts. The proposed hardware architecture is purposely designed to comply with the fourth generation of the advanced extensible interface (AXI4) protocol and to store intermediate and final outputs within an off-chip memory. Therefore, it can be directly integrated as a custom accelerator in virtually any modern heterogeneous embedded system-on-chip (SoC). As an example, when integrated within the Xilinx Zynq-7000 XC7Z020 SoC, the novel design processes more than 1.9 pixels per clock cycle, thus delivering more than 30 2k × 2k labeled frames per second by using 3688 Look-Up Tables (LUTs), 1415 Flip Flops (FFs), and 10 kb of on-chip memory.
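
The sequential scan described above can be sketched with the classic two-pass algorithm and a union-find table. This is the conventional software baseline, not the parallel hardware approach of the paper; the function name `label_components`, the 4-connectivity choice, and the list-of-lists image layout are assumptions for illustration.

```python
def label_components(img):
    """Two-pass connected component labeling with 4-connectivity.

    img is a list of rows of 0/1 values. Returns a same-shaped grid in
    which background stays 0 and each object gets a label 1, 2, 3, ...
    """
    h = len(img)
    w = len(img[0]) if h else 0
    parent = [0]  # index 0 is reserved for the background

    def find(a):
        # Follow parent links to the root, halving the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    labels = [[0] * w for _ in range(h)]
    # First pass: provisional labels from the upper and left neighbors.
    for y in range(h):
        for x in range(w):
            if not img[y][x]:
                continue
            up = labels[y - 1][x] if y > 0 else 0
            left = labels[y][x - 1] if x > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))  # new label is its own root
                labels[y][x] = len(parent) - 1
            else:
                labels[y][x] = min(l for l in (up, left) if l)
                if up and left:
                    union(up, left)  # record that both labels are equivalent
    # Second pass: replace each label by its root, renumbered consecutively.
    remap = {}
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = remap.setdefault(root, len(remap) + 1)
    return labels
```

The dependency of each pixel on its already-labeled upper and left neighbors is what makes the scan inherently sequential, and it is this dependency that a parallel design has to work around.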


2009 ◽  
Vol 4 (10) ◽  
Author(s):  
Wei Hu ◽  
Tianzhou Chen ◽  
Qingsong Shi ◽  
Gang Wang ◽  
Nan Zhang ◽  
...  

2014 ◽  
Vol 668-669 ◽  
pp. 857-861
Author(s):  
Peng Fei Hu ◽  
Yu Xiang Yuan ◽  
Zhi Juan Qu ◽  
Xue Ping Jiang

To improve the reliability and integration of relay protection devices in power systems, a system-on-chip design for multi-principle relay protection on an FPGA is proposed. The data acquisition, digital signal processing, hardware protection algorithm, FPGA and MCU process scheduling, and communication between the MCU and peripheral devices are designed; the hardware compilation model is set up on the FPGA with Quartus II; and simulation and experimental verification are performed. The results show that the proposed system can improve the speed of hardware protection, reduce the volume of the device, and offer a reconfigurable architecture.


2017 ◽  
Vol 2017 ◽  
pp. 1-9
Author(s):  
J. Cuneo ◽  
L. Barboni ◽  
N. Blanco ◽  
M. del Castillo ◽  
J. Quagliotti

This article presents the implementation and use of a two-wheel autonomous robot and its effectiveness as a tool for studying the recently discovered use of grid cells as part of the mammalian brain's space-mapping circuitry (specifically the medial entorhinal cortex). A proposed discrete-time algorithm that emulates the medial entorhinal cortex is programmed into the robot. The robot freely explores a limited laboratory area in the manner of a rat or mouse and reports information to a PC, thus enabling research without the use of live individuals. Position coordinate neural maps are achieved as mathematically predicted, although for a reduced number of implemented neurons (i.e., 200 neurons). However, this type of computational embedded system (the robot's microcontroller) is found to be insufficient for simulating huge numbers of neurons in real time (as in the medial entorhinal cortex). It is considered that the results of this work provide an insight into achieving an enhanced embedded systems design for emulating and understanding mathematical neural network models to be used as biologically inspired navigation systems for robots.


2021 ◽  
Vol 10 (1) ◽  
pp. 466-473
Author(s):  
Tiong Reng Xian ◽  
Zaini Abdul Halim ◽  
Ching Chia Leong ◽  
Tan Jiunn Gim

This study discusses hardware-software partitioning, which is useful for system-on-chip (SoC) applications. Hardware-software partitioning attempts to obtain the lowest execution time by combining a hardware processor system and a field programmable gate array on the SoC platform in embedded system applications. A three-level hybrid algorithm called GAGAPSO is proposed in this study. The algorithm consists of two successive genetic algorithms (GAs) and one particle swarm optimization (PSO). Each algorithm has a drawback when used alone: GA has a low convergence speed, and PSO converges prematurely because of low diversity. These algorithms are combined in this study to achieve high-capacity global convergence and enhanced search efficiency. Three algorithms, namely GA, GAPSO, and GAGAPSO, are developed using MATLAB. These algorithms are evaluated on the basis of the number of nodes and the minimum cost that can be achieved. The number of nodes varies from 10 to 1000. The minimum cost and the number of iterations needed to achieve it are recorded. Results show that GAGAPSO can converge faster than GA and GAPSO. Furthermore, GAGAPSO can achieve the lowest cost for all node counts.
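
The GA, GA, PSO pipeline described above can be sketched as follows. The abstract does not give the cost model, the operators, or how the two GA stages differ, so everything here is an assumption for illustration: the cost is execution time plus a penalty for exceeding a hardware area budget, both GA stages are identical, the PSO is a standard binary PSO, and all names and parameter values (`cost`, `ga_stage`, `pso_stage`, `gagapso`, the penalty weight of 1000) are hypothetical. The original study used MATLAB; Python is used here only for compactness.

```python
import math
import random


def cost(bits, sw, hw, area, budget):
    # Assumed model: bit 1 maps a node to hardware, bit 0 to software.
    time = sum(h if b else s for b, s, h in zip(bits, sw, hw))
    used = sum(a for b, a in zip(bits, area) if b)
    return time + 1000 * max(0, used - budget)  # assumed penalty weight


def ga_stage(pop, fit, rng, gens=30, pmut=0.05):
    # Needs at least two nodes and a population of at least two.
    for _ in range(gens):
        pop.sort(key=fit)
        nxt = pop[:2]  # elitism: the two best survive unchanged
        while len(nxt) < len(pop):
            a = min(rng.sample(pop, 2), key=fit)  # binary tournament
            b = min(rng.sample(pop, 2), key=fit)
            cut = rng.randrange(1, len(a))  # one-point crossover
            child = a[:cut] + b[cut:]
            nxt.append([bit ^ (rng.random() < pmut) for bit in child])
        pop = nxt
    return pop


def pso_stage(pop, fit, rng, iters=30, w=0.7, c1=1.5, c2=1.5):
    pop = [p[:] for p in pop]  # particles are updated in place
    n = len(pop[0])
    vel = [[0.0] * n for _ in pop]
    pbest = [p[:] for p in pop]
    gbest = min(pbest, key=fit)[:]
    for _ in range(iters):
        for i, p in enumerate(pop):
            for d in range(n):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - p[d])
                     + c2 * rng.random() * (gbest[d] - p[d]))
                vel[i][d] = max(-6.0, min(6.0, v))  # clamp velocity
                # Binary PSO: the sigmoid of the velocity is P(bit = 1).
                p[d] = int(rng.random() < 1 / (1 + math.exp(-vel[i][d])))
            if fit(p) < fit(pbest[i]):
                pbest[i] = p[:]
                if fit(p) < fit(gbest):
                    gbest = p[:]
    return gbest


def gagapso(sw, hw, area, budget, pop_size=20, seed=0):
    rng = random.Random(seed)
    fit = lambda bits: cost(bits, sw, hw, area, budget)
    pop = [[rng.randint(0, 1) for _ in sw] for _ in range(pop_size)]
    pop = ga_stage(pop, fit, rng)  # first GA
    pop = ga_stage(pop, fit, rng)  # second GA
    best = pso_stage(pop, fit, rng)  # PSO seeded with the GA population
    return best, fit(best)
```

Because the GA stages keep their elites and the PSO stage only ever replaces its global best with a cheaper particle, the returned cost is never worse than the best member of the initial random population. Whether the result is close to optimal depends on the parameters, which are not tuned here.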

