A Parallel Connected Component Labeling Architecture for Heterogeneous Systems-on-Chip

Connected component labeling is one of the most important processes in image analysis, image understanding, pattern recognition, and computer vision. It performs inherently sequential operations to scan a binary input image and assign a unique label to all pixels of each object. This paper presents a novel hardware-oriented labeling approach that processes input pixels in parallel, thus speeding up the labeling task with respect to state-of-the-art competitors. For comparison with existing designs, several hardware implementations are characterized for different image sizes and realization platforms. The results demonstrate that the proposed approach achieves significantly higher frame rates and resource efficiency than existing counterparts. The proposed hardware architecture is purposely designed to comply with the fourth generation of the Advanced eXtensible Interface (AXI4) protocol and to store intermediate and final outputs in off-chip memory. Therefore, it can be directly integrated as a custom accelerator in virtually any modern heterogeneous embedded system-on-chip (SoC). As an example, when integrated within the Xilinx Zynq-7000 XC7Z020 SoC, the novel design processes more than 1.9 pixels per clock cycle, thus furnishing more than 30 2k × 2k labeled frames per second by using 3688 Look-Up Tables (LUTs), 1415 Flip-Flops (FFs), and 10 kb of on-chip memory.
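For readers unfamiliar with the task, the sequential scan that the abstract describes can be sketched in software as the textbook two-pass algorithm with a union-find equivalence table. This is only an illustrative baseline under assumed conventions (8-connectivity, image given as a list of 0/1 rows, the function name `label_components` is hypothetical); it is not the parallel hardware architecture proposed in the paper.

```python
def label_components(image):
    """Two-pass connected component labeling with 8-connectivity.

    image: list of rows, each a list of 0/1 values.
    Returns a same-shaped grid where each foreground pixel holds the
    label (1, 2, ...) of its component and background pixels hold 0.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[i] is the union-find parent of provisional label i

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First pass: assign provisional labels and record equivalences.
    # Only the already-visited neighbors (row above, pixel to the left)
    # need to be inspected during a raster scan.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            neighbors = []
            for dr, dc in ((-1, -1), (-1, 0), (-1, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc]:
                    neighbors.append(labels[nr][nc])
            if not neighbors:
                parent.append(len(parent))  # new provisional label
                labels[r][c] = len(parent) - 1
            else:
                smallest = min(neighbors)
                labels[r][c] = smallest
                for n in neighbors:
                    union(smallest, n)

    # Second pass: replace provisional labels by their representative,
    # renumbered consecutively in order of first appearance.
    final = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels
```

The dependence of each pixel on its previously scanned neighbors is what makes the scan inherently sequential; a U-shaped object, for example, first receives two provisional labels that are merged only when the scan reaches the bottom row.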


A Memory-Efficient Hardware Architecture for Connected Component Labeling in Embedded System

IEEE Transactions on Circuits and Systems for Video Technology ◽

10.1109/tcsvt.2019.2937189 ◽

2020 ◽

Vol 30 (9) ◽

pp. 3238-3252 ◽

Cited By ~ 1

Author(s):

Chen Zhao ◽

Wu Gao ◽

Feiping Nie

Keyword(s):

Embedded System ◽

Hardware Architecture ◽

Connected Component ◽

Connected Component Labeling ◽

Memory Efficient


FPGA-Based Object Detection and Motion Tracking in Micro- and Nanorobotics

Nanotechnology ◽

10.4018/978-1-4666-5125-8.ch010 ◽

2014 ◽

pp. 251-261

Author(s):

Claas Diederichs ◽

Sergej Fatikow

Keyword(s):

Image Processing ◽

Object Detection ◽

High Speed ◽

Motion Tracking ◽

Principal Component ◽

Fpga Implementation ◽

Connected Components ◽

Connected Component ◽

Connected Component Labeling ◽

Labeling Approach

Object detection and classification are key tasks in micro- and nanohandling. Microscopic imaging is often the only available sensing technique for obtaining the positions and orientations of objects. FPGA-based image processing is superior to state-of-the-art PC-based image processing in terms of achievable update rate, latency, and jitter. A connected component labeling algorithm is presented and analyzed for its suitability for high-speed object detection and classification. The features of connected components are discussed and analyzed for their feasibility with a single-pass connected component labeling approach, with a focus on principal component analysis-based features. It is shown that an FPGA implementation of the algorithm can be used for high-speed tool tracking as well as object classification inside optical microscopes. Furthermore, it is shown that an FPGA implementation of the algorithm can be used to detect and classify carbon nanotubes (CNTs) during image acquisition in a scanning electron microscope, allowing fast object detection before the whole image is captured.
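One reason principal component analysis-based features pair well with single-pass labeling is that they can be derived from a handful of running sums per component. The following is a minimal software sketch of that idea, assuming the pixel coordinates of one component arrive as a stream; the function name `pca_features` and the returned dictionary layout are hypothetical and are not taken from the chapter.

```python
import math


def pca_features(pixels):
    """Compute PCA-based shape features of one component in a single pass.

    pixels: iterable of (x, y) coordinates belonging to the component.
    Only six accumulators are kept, so the pixels never need to be stored.
    """
    n = sx = sy = sxx = syy = sxy = 0
    for x, y in pixels:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    if n == 0:
        raise ValueError("component has no pixels")

    cx, cy = sx / n, sy / n
    # Entries of the 2x2 covariance matrix of the pixel coordinates.
    cxx = sxx / n - cx * cx
    cyy = syy / n - cy * cy
    cxy = sxy / n - cx * cy

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (cxx + cyy) / 2
    spread = math.hypot((cxx - cyy) / 2, cxy)
    major, minor = half_trace + spread, half_trace - spread
    # Orientation of the principal axis, in radians from the x axis.
    angle = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return {
        "area": n,
        "centroid": (cx, cy),
        "variances": (major, minor),
        "angle": angle,
    }
```

For a diagonal line of pixels such as (0, 0), (1, 1), (2, 2), the principal axis comes out at 45 degrees and the minor variance is zero, which is the kind of orientation cue the abstract mentions for tool tracking.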


FPGA-Based Object Detection and Motion Tracking in Micro- and Nanorobotics

International Journal of Intelligent Mechatronics and Robotics ◽

10.4018/ijimr.2013010103 ◽

2013 ◽

Vol 3 (1) ◽

pp. 27-37 ◽

Cited By ~ 3

Author(s):

Claas Diederichs ◽

Sergej Fatikow

Keyword(s):

Image Processing ◽

Object Detection ◽

High Speed ◽

Motion Tracking ◽

Principal Component ◽

Fpga Implementation ◽

Connected Components ◽

Connected Component ◽

Connected Component Labeling ◽

Labeling Approach


Connected Component Labeling Using Components Neighbors-Scan Labeling Approach

Journal of Computer Science ◽

10.3844/jcssp.2010.1099.1107 ◽

2010 ◽

Vol 6 (10) ◽

pp. 1099-1107 ◽

Cited By ~ 6

Author(s):

Rakhmadi

Keyword(s):

Connected Component ◽

Connected Component Labeling ◽

Labeling Approach


An Efficient Hardware-Oriented Single-Pass Approach for Connected Component Analysis

Sensors ◽

10.3390/s19143055 ◽

2019 ◽

Vol 19 (14) ◽

pp. 3055 ◽

Cited By ~ 3

Author(s):

Fanny Spagnolo ◽

Stefania Perri ◽

Pasquale Corsonello

Keyword(s):

Component Analysis ◽

Input Image ◽

Image Resolution ◽

Streaming Data ◽

Frame Rate ◽

Connected Component ◽

Connected Component Analysis ◽

Single Pass ◽

Field Programmable ◽

On Chip

Connected Component Analysis (CCA) plays an important role in several image analysis and pattern recognition algorithms. Because it is one of the most time-consuming tasks in such applications, specific hardware accelerators for CCA are highly desirable. As its main characteristic, such an accelerator must be able to process the input image frame at run time without suspending the input streaming data flow, while using a reasonable amount of hardware resources. This paper presents a new approach that allows virtually any feature of interest to be extracted in a single pass from the input image frames. The proposed method has been validated by a hardware system implemented in a complete heterogeneous design, within a Xilinx Zynq-7000 Field Programmable Gate Array (FPGA) System on Chip (SoC) device. For processing a 640 × 480 input image resolution, only 760 LUTs and 787 FFs were required. Moreover, a frame rate of ~325 fps and a throughput of 95.37 Mp/s were achieved. When compared to several recent competitors, the proposed design exhibits the most favorable performance-resources trade-off.
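The relationship between the quoted throughput and frame rate can be checked with simple arithmetic. The sketch below assumes that frame rate equals pixel throughput divided by pixels per frame and ignores any blanking or setup overhead; the abstract does not say whether a megapixel means 10^6 or 2^20 pixels, so both conventions are evaluated, and the helper name `frames_per_second` is hypothetical.

```python
def frames_per_second(throughput_mp_s, width, height, pixels_per_mp):
    """Frame rate implied by a pixel throughput, ignoring per-frame overhead."""
    pixels_per_second = throughput_mp_s * pixels_per_mp
    return pixels_per_second / (width * height)


# Figures quoted in the abstract: 95.37 Mp/s at 640 x 480.
fps_decimal = frames_per_second(95.37, 640, 480, 10**6)  # 1 Mp = 1,000,000 pixels
fps_binary = frames_per_second(95.37, 640, 480, 2**20)   # 1 Mp = 1,048,576 pixels

print(f"decimal megapixels: {fps_decimal:.1f} fps")
print(f"binary megapixels:  {fps_binary:.1f} fps")
```

Under these assumptions the binary convention gives roughly 325.5 fps, which lines up with the quoted ~325 fps, while the decimal convention gives roughly 310 fps. This is only a consistency check on the published numbers, not a statement about how the authors measured them.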


Efficient Deconvolution Architecture for Heterogeneous Systems-on-Chip

Journal of Imaging ◽

10.3390/jimaging6090085 ◽

2020 ◽

Vol 6 (9) ◽

pp. 85

Author(s):

Stefania Perri ◽

Cristian Sestito ◽

Fanny Spagnolo ◽

Pasquale Corsonello

Keyword(s):

Embedded System ◽

Heterogeneous Systems ◽

Random Access ◽

Network Models ◽

Digital Signal ◽

System On Chip ◽

The Novel ◽

Neural Network Models ◽

Generative Adversarial Network ◽

On Chip

Today, convolutional and deconvolutional neural network models are exceptionally popular thanks to the impressive accuracies they have achieved in several computer-vision applications. To speed up the overall tasks of these neural networks, purpose-designed accelerators are highly desirable. Unfortunately, the high computational complexity and the huge memory demand make the design of efficient hardware architectures, as well as their deployment in resource- and power-constrained embedded systems, still quite challenging. This paper presents a novel purpose-designed hardware accelerator to perform 2D deconvolutions. The proposed structure applies a hardware-oriented computational approach that overcomes the issues of traditional deconvolution methods, and it is suitable for implementation within virtually any system-on-chip based on field-programmable gate array devices. In fact, the novel accelerator is easily scalable to comply with the resources available within both high- and low-end devices by adequately scaling the adopted parallelism. As an example, when exploited to accelerate the Deep Convolutional Generative Adversarial Network model, the novel accelerator, running as a standalone unit implemented within the Xilinx Zynq XC7Z020 System-on-Chip (SoC) device, performs up to 72 GOPs. Moreover, it dissipates less than 500 mW at 200 MHz and occupies 5.6%, 4.1%, 17%, and 96%, respectively, of the look-up tables, flip-flops, random access memory, and digital signal processors available on-chip. When accommodated within the same device, the whole embedded system equipped with the novel accelerator performs up to 54 GOPs and dissipates less than 1.8 W at 150 MHz. Thanks to the increased parallelism exploitable, more than 900 GOPs can be executed when the high-end Virtex-7 XC7VX690T device is used as the implementation platform. Moreover, in comparison with state-of-the-art competitors implemented within the Zynq XC7Z045 device, the system proposed here reaches a computational capability up to 20% higher, and saves more than 60% and 80% of power consumption and logic resource requirements, respectively, using 5.7× fewer on-chip memory resources.
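As background, the operation being accelerated, a 2D deconvolution (transposed convolution), can be written as a reference computation in a few lines. The sketch below uses the textbook scatter formulation with no padding and a single channel; it is one of the traditional methods, not the hardware-oriented approach of the paper, and the function name `deconv2d` and its parameters are hypothetical.

```python
def deconv2d(inp, kernel, stride=1):
    """Reference 2D transposed convolution (single channel, no padding).

    inp:    list of rows of numbers, size H x W
    kernel: list of rows of numbers, size KH x KW
    Returns an output of size ((H-1)*stride + KH) x ((W-1)*stride + KW).
    """
    h, w = len(inp), len(inp[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h = (h - 1) * stride + kh
    out_w = (w - 1) * stride + kw
    out = [[0.0] * out_w for _ in range(out_h)]

    # Each input pixel scatters a scaled copy of the kernel into the output.
    # Copies overlap whenever stride < kernel size, so contributions from
    # different input pixels accumulate into the same output location.
    for i in range(h):
        for j in range(w):
            for u in range(kh):
                for v in range(kw):
                    out[i * stride + u][j * stride + v] += inp[i][j] * kernel[u][v]
    return out
```

The overlapping accumulations in the inner loops are a commonly cited difficulty of this formulation for parallel hardware, since several input pixels contend for the same output location.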
