Exploring RGB+Depth Fusion for Real-Time Object Detection

Sensors ◽  
2019 ◽  
Vol 19 (4) ◽  
pp. 866 ◽  
Author(s):  
Tanguy Ophoff ◽  
Kristof Van Beeck ◽  
Toon Goedemé

In this paper, we investigate whether fusing depth information on top of normal RGB data for camera-based object detection can increase the performance of current state-of-the-art single-shot detection networks. Indeed, depth information is easily acquired using depth cameras such as the Kinect or stereo setups. We investigate the optimal manner of performing this sensor fusion, with a special focus on lightweight single-pass convolutional neural network (CNN) architectures that enable real-time processing on limited hardware. For this, we implement a network architecture that lets us parameterize at which network layer the two information sources are fused. We performed exhaustive experiments to determine the optimal fusion point in the network, from which we conclude that fusing in the mid-to-late layers provides the best results. Our best fusion models significantly outperform the baseline RGB network in both accuracy and localization of the detections.
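
To make the parameterized fusion point concrete, below is a minimal PyTorch sketch of a two-stream backbone whose fusion layer is a constructor argument. The layer count, channel widths, and input sizes are illustrative assumptions, not the authors' exact network.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # 3x3 strided conv + BN + LeakyReLU: a common single-shot detector block
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

class FusionBackbone(nn.Module):
    """Two-stream RGB + depth backbone; `fuse_at` picks the fusion layer."""
    def __init__(self, fuse_at=3, widths=(32, 64, 128, 256, 512)):
        super().__init__()
        self.rgb, self.dep = nn.ModuleList(), nn.ModuleList()
        c_rgb, c_dep = 3, 1
        for w in widths[:fuse_at]:
            self.rgb.append(conv_block(c_rgb, w))
            self.dep.append(conv_block(c_dep, w))
            c_rgb = c_dep = w
        # After channel-wise concatenation, the two streams share layers
        shared, c = [], 2 * widths[fuse_at - 1]
        for w in widths[fuse_at:]:
            shared.append(conv_block(c, w))
            c = w
        self.shared = nn.Sequential(*shared)

    def forward(self, rgb, depth):
        for lr, ld in zip(self.rgb, self.dep):
            rgb, depth = lr(rgb), ld(depth)
        return self.shared(torch.cat([rgb, depth], dim=1))

# Mid fusion (after block 3), the region the paper found to work best
net = FusionBackbone(fuse_at=3)
features = net(torch.randn(1, 3, 416, 416), torch.randn(1, 1, 416, 416))
```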

Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 546 ◽
Author(s):  
Zhenni Li ◽  
Haoyi Sun ◽  
Yuliang Gao ◽  
Jiao Wang

Depth maps obtained through sensors are often unsatisfactory because of their low resolution and noise. In this paper, we propose a real-time depth map enhancement system based on a residual network that uses dual channels to process depth maps and intensity maps respectively and eliminates the need for preprocessing; the proposed algorithm achieves real-time processing at more than 30 fps. Furthermore, an FPGA design and implementation for depth sensing is introduced. In this design, the intensity image and depth image are captured by a dual-camera synchronous acquisition system and fed to the neural network as input. Experiments on various depth map restoration tasks show that our algorithm outperforms the existing LRMC, DE-CNN, and DDTF algorithms on standard datasets and achieves better depth map super-resolution. System tests confirm that the data throughput of the acquisition system's USB 3.0 interface is stable at 226 Mbps and supports both cameras working at full speed, i.e., 54 fps @ (1280 × 960 + 328 × 248 × 3).
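
As a rough illustration of the dual-channel residual idea, here is a minimal PyTorch sketch: separate depth and intensity branches whose features are fused to predict an additive correction to the raw depth map. Layer counts and channel widths are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class DualChannelDepthEnhancer(nn.Module):
    """Guided depth enhancement, loosely following the dual-channel idea:
    one branch for the noisy depth map, one for the intensity image,
    with a residual (additive) correction at the output."""
    def __init__(self, feats=32):
        super().__init__()
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, feats, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(),
        )
        self.intensity_branch = nn.Sequential(
            nn.Conv2d(1, feats, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(),
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * feats, feats, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feats, 1, 3, padding=1),
        )

    def forward(self, depth, intensity):
        f = torch.cat([self.depth_branch(depth),
                       self.intensity_branch(intensity)], dim=1)
        # Predict a correction and add it back: the residual formulation
        return depth + self.fuse(f)

restored = DualChannelDepthEnhancer()(torch.randn(1, 1, 240, 320),
                                      torch.randn(1, 1, 240, 320))
```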


Micromachines ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1453 ◽
Author(s):  
Hyun Myung Kim ◽  
Min Seok Kim ◽  
Sehui Chang ◽  
Jiseong Jeong ◽  
Hae-Gon Jeon ◽  
...  

The light field camera provides a robust way to capture both spatial and angular information within a single shot. One of its important applications is 3D depth sensing, which extracts depth information from the acquired scene. However, conventional light field cameras suffer from a shallow depth of field (DoF). Here, a vari-focal light field camera (VF-LFC) with an extended DoF is proposed for mid-range 3D depth sensing applications. As the main lens of the system, a vari-focal lens with four different focal lengths is adopted to extend the DoF up to ~15 m. The focal length of the micro-lens array (MLA) is optimized by considering the DoF in both the image plane and the object plane for each focal length. By dividing the measurement range among the focal lengths, reliable depth estimation is available across the entire DoF. The proposed VF-LFC is evaluated using disparity data extracted from images at different distances. Moreover, depth measurements in an outdoor environment demonstrate that our VF-LFC could be applied in various fields such as delivery robots, autonomous vehicles, and remote sensing drones.
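
Depth evaluation from disparity typically rests on the standard triangulation relation Z = f·b/d. The sketch below applies it with placeholder focal length and baseline values; a light field camera's effective baseline actually depends on its micro-lens geometry, so the numbers are purely illustrative.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Standard triangulation: depth Z = f * b / d. The focal length
    (pixels) and baseline (metres) here are placeholders; a light field
    camera's effective baseline depends on its MLA geometry."""
    d = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full_like(d, np.inf)        # zero disparity -> at infinity
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

# Example: large disparities map to near depths, small ones to far depths
print(disparity_to_depth([20.0, 2.0], focal_px=1000.0, baseline_m=0.03))
# -> [ 1.5  15. ]  (metres)
```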


2020 ◽  
Vol 226 ◽  
pp. 02020 ◽
Author(s):  
Alexey V. Stadnik ◽  
Pavel S. Sazhin ◽  
Slavomir Hnatic

The performance of neural networks is one of the most important topics in the field of computer vision. In this work, we analyze the speed of object detection using the well-known YOLOv3 neural network architecture in different frameworks and on different hardware configurations. We obtain results that allow us to formulate preliminary qualitative conclusions about the feasibility of various hardware scenarios for solving tasks in real-time environments.
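
The abstract does not name the specific frameworks or hardware, so the following is a generic, framework-agnostic Python timing harness of the kind such a benchmark might use; `infer` is any callable wrapping a detector's forward pass.

```python
import time
import numpy as np

def measure_fps(infer, input_shape=(1, 3, 416, 416), warmup=10, runs=100):
    """Time a detector's forward pass. Warm-up iterations exclude one-off
    initialization cost; for GPU frameworks, an explicit device
    synchronization is needed before stopping the clock."""
    x = np.random.rand(*input_shape).astype(np.float32)
    for _ in range(warmup):
        infer(x)
    start = time.perf_counter()
    for _ in range(runs):
        infer(x)
    elapsed = time.perf_counter() - start
    return runs / elapsed

# Hypothetical usage with an ONNX Runtime session named `session`:
# fps = measure_fps(lambda x: session.run(None, {"input": x}))
```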


2019 ◽  
Vol 107 (1) ◽  
pp. 651-661 ◽  
Author(s):  
Adwitiya Arora ◽  
Atul Grover ◽  
Raksha Chugh ◽  
S. Sofana Reka

Sensors ◽  
2020 ◽  
Vol 20 (12) ◽  
pp. 3591 ◽  
Author(s):  
Haidi Zhu ◽  
Haoran Wei ◽  
Baoqing Li ◽  
Xiaobing Yuan ◽  
Nasser Kehtarnavaz

This paper addresses real-time moving object detection with high accuracy in high-resolution video frames. A previously developed framework for moving object detection is modified to enable real-time processing of high-resolution images. First, a computationally efficient method is employed that detects moving regions on a resized image and maps them back to the original image via coordinate mapping. Second, a light backbone deep neural network is utilized in place of a more complex one. Third, the focal loss function is employed to alleviate the imbalance between positive and negative samples. The results of extensive experiments indicate that the modified framework achieves a processing rate of 21 frames per second with 86.15% accuracy on the SimitMovingDataset, which contains high-resolution images of size 1920 × 1080.
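
Two of the three modifications are easy to make concrete. Below is a hedged PyTorch sketch of (a) mapping boxes detected on the resized frame back to original 1920 × 1080 coordinates and (b) the focal loss of Lin et al. (2017); both are generic implementations, not the authors' code.

```python
import torch
import torch.nn.functional as F

def map_boxes_to_original(boxes, scale_x, scale_y):
    """Map [x1, y1, x2, y2] boxes detected on a resized frame back to
    original resolution, e.g. scale_x = 1920 / resized_width."""
    return boxes * torch.tensor([scale_x, scale_y, scale_x, scale_y])

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    It down-weights the abundant easy negatives, addressing the
    positive/negative imbalance the paper mentions."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```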


Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1932 ◽
Author(s):  
Malik Haris ◽  
Adam Glowacz

Automated driving and vehicle safety systems need object detection that is accurate, robust to weather and environmental conditions, and runs in real time. Consequently, they require image processing algorithms to inspect the contents of images. This article compares the accuracy of five major image processing algorithms: Region-based Fully Convolutional Network (R-FCN), Mask Region-based Convolutional Neural Network (Mask R-CNN), Single Shot Multi-Box Detector (SSD), RetinaNet, and You Only Look Once v4 (YOLOv4). For this comparative analysis, we used the large-scale Berkeley Deep Drive (BDD100K) dataset. The strengths and limitations of each algorithm are analyzed based on parameters such as accuracy (with and without occlusion and truncation), computation time, and the precision-recall curve. The comparison given in this article is helpful for understanding the pros and cons of standard deep learning-based algorithms operating under real-time deployment restrictions. We conclude that YOLOv4 is the most accurate at detecting difficult road targets under complex road scenarios and weather conditions in an identical testing environment.
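
One of the comparison metrics, the precision-recall curve, can be computed from ranked detections as in the following NumPy sketch. This is a generic implementation (with a simple trapezoidal average precision), not the BDD100K evaluation protocol itself.

```python
import numpy as np

def precision_recall(scores, is_true_positive, num_gt):
    """Precision-recall curve from ranked detections. is_true_positive[i]
    marks whether detection i matched a ground-truth box (e.g. IoU >= 0.5);
    num_gt is the total number of ground-truth boxes."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    precision = cum_tp / (cum_tp + cum_fp)
    recall = cum_tp / num_gt
    return precision, recall

def average_precision(precision, recall):
    # Area under the PR curve, simple trapezoidal form (no interpolation)
    return float(np.trapz(precision, recall))
```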


Author(s):  
Ashwani Kumar ◽  
Zuopeng Justin Zhang ◽  
Hongbo Lyu

Currently, the fastest algorithm that uses a single convolutional network to detect objects in an image is the single shot multi-box detector (SSD). This paper studies object detection techniques for detecting objects in real time on any device running the proposed model in any environment. We increase the classification accuracy of the SSD algorithm while keeping its speed constant. The improvements are made in its convolutional layers by using depth-wise separable convolutions along with spatially separable convolutions, together forming a multilayer convolutional neural network. The proposed method uses this multilayer convolutional neural network to build a system model that classifies given objects into any of the defined classes. The scheme then takes multiple images, detects the objects in them, and labels each with its respective class. To speed up computation, the proposed algorithm is applied along with a multilayer convolutional neural network that uses a larger number of default boxes, resulting in more accurate detection. Detection accuracy is assessed with parameters such as the loss function, frames per second (FPS), mean average precision (mAP), and aspect ratio. Experimental results confirm that our improved SSD algorithm achieves high accuracy.
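
The two factorized convolutions named above are standard building blocks; the PyTorch sketch below shows both. A depthwise separable convolution replaces a K × K convolution with a per-channel spatial filter plus a 1 × 1 pointwise mix, and a spatially separable convolution factors K × K into K × 1 followed by 1 × K, in both cases cutting parameters and multiply-adds.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Per-channel KxK spatial filter (groups=c_in), then 1x1 pointwise mix."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class SpatialSeparableConv(nn.Module):
    """KxK kernel factored into Kx1 followed by 1xK."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.vertical = nn.Conv2d(c_in, c_out, (k, 1), padding=(k // 2, 0))
        self.horizontal = nn.Conv2d(c_out, c_out, (1, k), padding=(0, k // 2))

    def forward(self, x):
        return self.horizontal(self.vertical(x))
```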


2019 ◽  
Vol 11 (7) ◽  
pp. 786 ◽  
Author(s):  
Yang-Lang Chang ◽  
Amare Anagaw ◽  
Lena Chang ◽  
Yi Wang ◽  
Chih-Yu Hsiao ◽  
...  

Synthetic aperture radar (SAR) imagery has been a promising data source for monitoring maritime activities, and its application to oil and ship detection has been the focus of many previous studies. Many object detection methods, from traditional to deep learning approaches, have been proposed; however, the majority are computationally intensive and have accuracy problems. The huge volume of remote sensing data also poses a challenge for real-time object detection. To mitigate this problem, high performance computing (HPC) methods that accelerate SAR imagery analysis with GPU-based computing have been proposed. In this paper, we propose an enhanced GPU-based deep learning method to detect ships in SAR images. The You Only Look Once version 2 (YOLOv2) deep learning framework is used to model the architecture and train the model. YOLOv2 is a state-of-the-art real-time object detection system that outperforms the Faster Region-Based Convolutional Network (Faster R-CNN) and Single Shot Multibox Detector (SSD) methods. Additionally, to reduce computational time while maintaining competitive detection accuracy, we develop a new architecture with fewer layers, called YOLOv2-reduced. In the experiments, we use two datasets for training and testing: the SAR ship detection dataset (SSDD) and the Diversified SAR Ship Detection Dataset (DSSDD). YOLOv2 test results showed increased ship detection accuracy as well as a noticeable reduction in computational time compared with Faster R-CNN. The proposed YOLOv2 architecture achieves accuracies of 90.05% and 89.13% on the SSDD and DSSDD datasets, respectively. The proposed YOLOv2-reduced architecture delivers detection performance similar to YOLOv2 but with less computational time on an NVIDIA TITAN X GPU. These results show that deep learning can make a big leap forward in improving the performance of SAR image ship detection.
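
The abstract does not give the layer counts of YOLOv2-reduced, so the following PyTorch sketch only illustrates the general idea: a Darknet-style stack of Conv-BN-LeakyReLU blocks with fewer stages than Darknet-19, trading a little accuracy for lower latency. All widths and depths here are assumptions.

```python
import torch.nn as nn

def darknet_block(c_in, c_out):
    # Conv-BN-LeakyReLU, the basic YOLOv2 (Darknet-19) building block
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

# Illustrative "reduced" backbone with fewer stages than Darknet-19;
# the exact layer counts of YOLOv2-reduced are not given in the abstract.
reduced_backbone = nn.Sequential(
    darknet_block(3, 16), nn.MaxPool2d(2),
    darknet_block(16, 32), nn.MaxPool2d(2),
    darknet_block(32, 64), nn.MaxPool2d(2),
    darknet_block(64, 128), nn.MaxPool2d(2),
    darknet_block(128, 256), nn.MaxPool2d(2),
    darknet_block(256, 512),
)
```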

