A Manycore Vision Processor for Real-Time Smart Cameras

Real-time image processing and computer vision systems are now mainstream technologies enabling applications for cyber-physical systems, the Internet of Things, augmented reality, and Industry 4.0. These applications create a need for Smart Cameras that process images and videos locally in real time. However, most commercial cameras cannot handle the massive amount of data to be processed within short deadlines. In this work, we present the design and implementation of a manycore vision processor architecture for Smart Cameras. Exploiting massive parallelism and application-specific characteristics, our architecture is composed of distributed processing elements and memories connected through a Network-on-Chip. The architecture was implemented as an FPGA overlay, focusing on optimized hardware utilization. The parameterized architecture was characterized by its hardware occupation, maximum operating frequency, and processing frame rate. Different configurations ranging from one to eighty-one processing elements were implemented and compared to several works from the literature. Using a System-on-Chip composed of an FPGA integrated with a general-purpose processor, we showcase the flexibility and efficiency of the hardware/software architecture. The results show that the proposed architecture successfully combines programmability and performance, making it a suitable alternative for future Smart Cameras.

SoC-FPGA systems for the acquisition and processing of electroencephalographic signals

International Journal of Reconfigurable and Embedded Systems (IJRES) ◽

10.11591/ijres.v10.i3.pp237-248 ◽

2021 ◽

Vol 10 (3) ◽

pp. 237-248

Author(s):

Matias Javier Oliva ◽

Pablo Andrés García ◽

Enrique Mario Spinelli ◽

Alejandro Luis Veiga

Keyword(s):

Embedded System ◽

Real Time ◽

General Purpose ◽

System Response ◽

Single Chip ◽

Real Time Processing ◽

General Purpose Processor ◽

Time Operation ◽

Electroencephalographic Signals ◽

High Level

Real-time acquisition and processing of electroencephalographic signals have promising applications in the implementation of brain-computer interfaces. These devices allow the user to control a device without performing motor actions, and are usually made up of a biopotential acquisition stage and a personal computer (PC). This structure is very flexible and appropriate for research, but for final users it is necessary to migrate to an embedded system, eliminating the PC from the scheme. The strict real-time processing requirements of such systems justify the choice of a system-on-chip field-programmable gate array (SoC-FPGA) for their implementation. This article proposes a platform for the acquisition and processing of electroencephalographic signals using this type of device, which combines the parallelism and speed of an FPGA with the simplicity of a general-purpose processor on a single chip. In this scheme, the FPGA is in charge of the real-time operation, acquiring and processing the signals, while the processor handles the high-level tasks; the processing elements are interconnected by buses integrated into the chip. The proposed scheme was used to implement a brain-computer interface based on steady-state visual evoked potentials, which was used to command a speller. The first tests of the system show that a selection time of 5 seconds per command can be achieved. The time delay between the user’s selection and the system response has been estimated at 343 µs.

Digital Vein Mapping Using Augmented Reality

International Journal of Intelligent Engineering and Systems ◽

10.22266/ijies2020.1231.45 ◽

2020 ◽

Vol 13 (6) ◽

pp. 512-521

Author(s):

Mohamed Taha ◽

Mohamed Ibrahim ◽

Hala Zayed ◽

...

Keyword(s):

Augmented Reality ◽

Real Time ◽

Infrared Radiation ◽

Low Cost ◽

Economic Cost ◽

Frame Rate ◽

Infrared Light ◽

Real Time Processing ◽

And Gender ◽

Time Systems

Vein detection is an important issue for the medical field. There are some commercial devices for detecting veins using infrared radiation. However, most of these commercial solutions are cost-prohibitive. Recently, vein detection has attracted much attention from research teams. The main focus is on developing real-time systems with low-cost hardware. Systems developed to reduce costs suffer from low frame rates, which makes them unsuitable for real-world applications. On the other hand, systems that use powerful processors to produce high frame rates suffer from high costs and a lack of mobility. In this paper, a real-time vein mapping prototype using augmented reality is proposed. The proposed prototype provides a compromise solution that produces high frame rates with a low-cost system. It consists of a USB camera attached to an Android smartphone used for real-time detection. Infrared radiation from 20 infrared light-emitting diodes (LEDs) is employed to differentiate the veins. The captured frames are processed to enhance vein detection using computationally light algorithms that improve real-time processing and increase the frame rate. Finally, the enhanced view of the veins appears on the smartphone screen. Portability and economic cost are taken into consideration in developing the proposed prototype. The proposed prototype is tested with people of different ages and genders, as well as with mobile devices of different specifications. The results show a high vein detection rate and a high frame rate compared to other existing systems.

Stick Based Speckle Reduction for Real-Time Processing of OCT Images on an FPGA

Acta Polytechnica ◽

10.14311/984 ◽

2007 ◽

Vol 47 (4-5) ◽

Author(s):

H. Luecken ◽

G. Tech ◽

R. Schwann ◽

G. Kappen

Keyword(s):

Real Time ◽

Frame Rate ◽

Speckle Reduction ◽

Image Artifacts ◽

Single Chip ◽

Throughput Rate ◽

Real Time Processing ◽

Time Processing ◽

Envelope Detection ◽

Processing Steps

This paper presents an FPGA based real-time implementation of an adaptive speckle reduction algorithm. Applied to the log-compressed image of a high-resolution optical coherence tomography (OCT) system, all related signal processing steps from envelope detection to VGA video signal generation are executed on a single chip. Images from measured OCT data show that the chosen algorithm produces a smooth, detailed image with fewer image artifacts than comparable approaches. An estimation of the hardware effort, the possible throughput rate and the resulting image frame rate is given for different window sizes used here in speckle reduction.

A Big Data Platform for Real Time Analysis of Signs of Depression in Social Media

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17134752 ◽

2020 ◽

Vol 17 (13) ◽

pp. 4752 ◽

Cited By ~ 1

Author(s):

Rodrigo Martínez-Castaño ◽

Juan C. Pichel ◽

David E. Losada

Keyword(s):

Social Media ◽

Real Time ◽

Public Health Surveillance ◽

Time Analysis ◽

Social Media Data ◽

Real Time Processing ◽

Processing Elements ◽

Real Time Analysis ◽

Data Platform ◽

Media Data

In this paper we propose a scalable platform for real-time processing of Social Media data. The platform ingests huge amounts of content, such as Social Media posts or comments, and can support Public Health surveillance tasks. The processing and analytical needs of multiple screening tasks can easily be handled by incorporating user-defined execution graphs. The design is modular and supports different processing elements, such as crawlers to extract relevant content or classifiers to categorise Social Media. We describe here an implementation of a use case built on the platform that monitors Social Media users and detects early signs of depression.

Stereo Vision VLSI Processor Based on Pixel-Serial and Window-Parallel Architecture

Journal of Robotics and Mechatronics ◽

10.20965/jrm.2000.p0521 ◽

2000 ◽

Vol 12 (5) ◽

pp. 521-526

Author(s):

Masanori Hariyama ◽

Michitaka Kameyama

Keyword(s):

Stereo Vision ◽

Stereo Matching ◽

Parallel Architecture ◽

Window Size ◽

Input Image ◽

General Purpose ◽

Image Size ◽

Matching Algorithm ◽

Processing Elements ◽

General Purpose Processor

This article presents a stereo-matching algorithm to establish reliable correspondence between images by selecting a desirable window size for SAD (Sum of Absolute Differences) computation. In SAD computation, parallelism between pixels in a window changes depending on its window size, while parallelism between windows is predetermined by the input-image size. Based on this consideration, a window-parallel and pixel-serial architecture is proposed to achieve 100% utilization of processing elements. Performance of the VLSI processor is evaluated to be more than 10,000 times higher than that of a general-purpose processor.
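
The SAD matching step mentioned in this abstract can be illustrated with a minimal software sketch. This is not the authors' VLSI design: the function names `sad` and `best_disparity`, the list-of-lists image layout, and the fixed `half` window parameter are all assumptions made for illustration. It only shows how a window in the left image is compared against shifted windows in the right image and how the lowest-cost disparity is chosen.

```python
def sad(left, right, row, col, disparity, half):
    """Sum of absolute differences between the window centred at (row, col)
    in `left` and the window shifted left by `disparity` in `right`.
    `half` is the window half-width, so the window is (2*half+1) square."""
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            total += abs(left[row + dr][col + dc]
                         - right[row + dr][col + dc - disparity])
    return total


def best_disparity(left, right, row, col, max_disparity, half):
    """Return the disparity in [0, max_disparity] with the lowest SAD.
    Disparities that would push the window off the image are skipped."""
    candidates = [d for d in range(max_disparity + 1)
                  if col - half - d >= 0]
    return min(candidates, key=lambda d: sad(left, right, row, col, d, half))
```

Each pixel term inside `sad` is independent of the others, and each call to `sad` for a different window is independent as well; these are the two kinds of parallelism (between pixels and between windows) that the abstract contrasts.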

Design Concept of Responsive Multithreaded Processor for Distributed Real-Time Control

Journal of Robotics and Mechatronics ◽

10.20965/jrm.2004.p0194 ◽

2004 ◽

Vol 16 (2) ◽

pp. 194-199 ◽

Cited By ~ 4

Author(s):

Nobuyuki Yamasaki

Keyword(s):

Real Time ◽

Design Concept ◽

Processing Unit ◽

Home Automation ◽

Real Time Control ◽

Real Time Processing ◽

Time Control ◽

Multithreaded Processor ◽

On Chip ◽

Asic Chip

This paper describes the design concept of the Responsive MultiThreaded (RMT) Processor for distributed real-time control of various embedded systems, including robots, home automation, and factory automation. The RMT processor integrates an 8-way multithreaded processor (RMT processing unit) for real-time processing, four sets of Responsive Link II for real-time communication, and I/O peripherals including DDR SDRAM I/Fs, DMAC, PCI64, USB2.0, IEEE1394, PWM generators, pulse counters, etc., into an ASIC chip. System designers can use various on-chip functions easily by connecting required I/Os to this chip directly. The designers can also realize distributed control systems by connecting several RMT processors with their own functions via Responsive Link II.

General purpose electronics for real-time processing and encoding of non-MR data in MR acquisitions

Concepts in Magnetic Resonance Part B Magnetic Resonance Engineering ◽

10.1002/cmr.b.21385 ◽

2018 ◽

Vol 48B (2) ◽

pp. e21385 ◽

Cited By ~ 2

Author(s):

Jan Ole Pedersen ◽

Christian G. Hanson ◽

Rong Xue ◽

Lars G. Hanson

Keyword(s):

Real Time ◽

General Purpose ◽

Real Time Processing ◽

Time Processing

On-chip data organization and access strategy for spaceborne SAR real-time imaging processor

Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University ◽

10.1051/jnwpu/20213910126 ◽

2021 ◽

Vol 39 (1) ◽

pp. 126-134

Author(s):

Shiyu Wang ◽

Shengbing Zhang ◽

Xiaoping Huang ◽

Hao Lyu

Keyword(s):

Power Consumption ◽

Real Time ◽

Imaging System ◽

Data Organization ◽

Real Time Processing ◽

Time Processing ◽

Chip Data ◽

Application Systems ◽

Sar Imaging ◽

On Chip

Spaceborne SAR (synthetic aperture radar) imaging requires real-time processing of an enormous amount of input data with limited power consumption. Designing advanced heterogeneous array processors is an effective way to meet the power constraints and real-time processing requirements of application systems. To design an efficient SAR imaging processor, the on-chip data organization structure and access strategy are of critical importance. Taking a typical SAR imaging algorithm, the chirp scaling algorithm, as the target, this paper analyzes the characteristics of each calculation stage in the SAR imaging process, extracts the data flow model of SAR imaging, and proposes a storage strategy of cross-region cross-placement and data sorting synchronization execution to ensure pipelined, parallel FFT/IFFT calculation. The memory wall problem can be alleviated through an on-chip multi-level data buffer structure, which keeps the imaging calculation pipeline supplied with data. Based on this memory organization and access strategy, the SAR imaging pipeline, which effectively supports FFT/IFFT and phase compensation operations, is optimized. A processor based on this storage strategy can reach a throughput of up to 115.2 GOPS and an energy efficiency of up to 254 GOPS/W when implemented in 65 nm technology. Compared with conventional CPU+GPU acceleration solutions, the performance-to-power ratio is increased by 63.4 times. The proposed architecture not only improves real-time performance but also reduces the design complexity of the SAR imaging system, which facilitates tailoring and scalability and satisfies the practical needs of different SAR imaging platforms.

Implementation of FFT on General-Purpose Architectures for FPGA

Computer Engineering ◽

10.4018/978-1-61350-456-7.ch310 ◽

2012 ◽

pp. 658-676

Author(s):

Fabio Garzia ◽

Roberto Airoldi ◽

Jari Nurmi

Keyword(s):

General Purpose ◽

Reference Architecture ◽

Processor Core ◽

General Purpose Processor ◽

Programmable Architecture ◽

Reconfigurable Array ◽

Field Programmable ◽

Speed Up ◽

On Chip ◽

High Level

This paper describes two general-purpose architectures targeted to Field Programmable Gate Array (FPGA) implementation. The first architecture is based on the coupling of a coarse-grain reconfigurable array with a general-purpose processor core. The second architecture is a homogeneous multi-processor system-on-chip (MP-SoC). Both architectures have been mapped onto two different Altera FPGA devices, a StratixII and a StratixIV. Although mapping onto the StratixIV results in higher operating frequencies, the capabilities of the device are not fully exploited. The implementation of an FFT on the two platforms shows a considerable speed-up in comparison with a single-processor reference architecture. The speed-up is higher in the reconfigurable solution, but the MP-SoC provides an easier programming interface that is based entirely on the C language. The authors’ approach shows that implementing a programmable architecture on an FPGA and then programming it in a high-level software language is a viable alternative to designing a dedicated hardware block in a hardware description language (HDL) and mapping it onto an FPGA.
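
For readers unfamiliar with the FFT benchmark used in this abstract, the following is a minimal radix-2 Cooley-Tukey sketch in plain Python. It is an illustration only, not the authors' implementation on either platform; the function name `fft` and the recursive formulation are assumptions chosen for brevity.

```python
import cmath


def fft(samples):
    """Recursive radix-2 decimation-in-time FFT.
    `samples` must have a power-of-two length."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return list(samples)
    # The two half-size transforms are independent of each other.
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine one even-half and one odd-half output.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out
```

The even and odd half-transforms do not depend on one another, and neither do the butterflies within a stage, which is one reason the FFT is a common benchmark for multi-processor and reconfigurable architectures.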

FPGA Implementations of Algorithms for Preprocessing of High Frame Rate and High Resolution Image Streams in Real Time

Annals of Emerging Technologies in Computing ◽

10.33166/aetic.2021.02.005 ◽

2021 ◽

Vol 5 (2) ◽

pp. 50-61

Author(s):

Uroš Hudomalj ◽

Christopher Mandla ◽

Markus Plattner

Keyword(s):

High Resolution ◽

Real Time ◽

Image Filtering ◽

Performance Comparison ◽

Frame Rate ◽

Real Time Processing ◽

High Frame Rate ◽

Resolution Image ◽

High Resolution Image ◽

Image Streams

This paper presents FPGA implementations of image filtering and image averaging, two widely applied image preprocessing algorithms. The implementations are targeted at real-time processing of high frame rate, high resolution image streams. The developed implementations are evaluated in terms of resource usage, power consumption, and achievable frame rates. For the evaluation, Microsemi’s SmartFusion2 Advanced Development Kit is used. It includes a SmartFusion2 M2S150 SoC FPGA. The performance of the developed implementation of the image filtering algorithm is compared to a solution provided by MATLAB’s Vision HDL Toolbox, which is evaluated on the same platform. The performance of the developed implementations is also compared with FPGA implementations found in existing publications, although those are evaluated on different FPGA platforms. Difficulties with performance comparison between implementations on different platforms are addressed, and the limitations of processing image streams with FPGA platforms are discussed.
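
As a software reference for the image averaging step, here is a minimal sketch that assumes "image averaging" means a pixel-wise mean over a sequence of equally sized frames; the abstract does not spell out the exact definition. The function name `average_frames`, the list-of-lists frame layout, and the use of integer division are assumptions for illustration and do not come from the paper.

```python
def average_frames(frames):
    """Pixel-wise integer mean of a list of equally sized greyscale frames.
    Each frame is a list of rows; each row is a list of integer pixels."""
    if not frames:
        raise ValueError("need at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    # Accumulate per-pixel sums across all frames.
    sums = [[0] * width for _ in range(height)]
    for frame in frames:
        for r in range(height):
            for c in range(width):
                sums[r][c] += frame[r][c]
    count = len(frames)
    # Integer division mirrors fixed-point hardware; it truncates.
    return [[value // count for value in row] for row in sums]
```

Every output pixel depends only on the same pixel position in each input frame, so the accumulation can be done as the stream arrives, one pixel at a time.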
