Exploring RGB+Depth Fusion for Real-Time Object Detection

Sensors ◽  
2019 ◽  
Vol 19 (4) ◽  
pp. 866 ◽  
Author(s):  
Tanguy Ophoff ◽  
Kristof Van Beeck ◽  
Toon Goedemé

In this paper, we investigate whether fusing depth information on top of normal RGB data for camera-based object detection can increase the performance of current state-of-the-art single-shot detection networks. Indeed, depth information is easily acquired using depth cameras such as the Kinect or stereo setups. We investigate the optimal manner of performing this sensor fusion, with a special focus on lightweight single-pass convolutional neural network (CNN) architectures that enable real-time processing on limited hardware. For this, we implement a network architecture that lets us parameterize at which network layer the two information sources are fused. We performed exhaustive experiments to determine the optimal fusion point in the network, from which we conclude that fusing in the mid-to-late layers provides the best results. Our best fusion models significantly outperform the baseline RGB network in both accuracy and localization of the detections.
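
To make the parameterized fusion point concrete, below is a minimal PyTorch sketch of a two-stream backbone whose fusion layer is a constructor argument. The layer count, channel widths, and input sizes are illustrative assumptions, not the authors' exact network.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # 3x3 strided conv + BN + LeakyReLU: a common single-shot detector block
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

class FusionBackbone(nn.Module):
    """Two-stream RGB + depth backbone; `fuse_at` picks the fusion layer."""
    def __init__(self, fuse_at=3, widths=(32, 64, 128, 256, 512)):
        super().__init__()
        self.rgb, self.dep = nn.ModuleList(), nn.ModuleList()
        c_rgb, c_dep = 3, 1
        for w in widths[:fuse_at]:
            self.rgb.append(conv_block(c_rgb, w))
            self.dep.append(conv_block(c_dep, w))
            c_rgb = c_dep = w
        # After channel-wise concatenation, the two streams share layers
        shared, c = [], 2 * widths[fuse_at - 1]
        for w in widths[fuse_at:]:
            shared.append(conv_block(c, w))
            c = w
        self.shared = nn.Sequential(*shared)

    def forward(self, rgb, depth):
        for lr, ld in zip(self.rgb, self.dep):
            rgb, depth = lr(rgb), ld(depth)
        return self.shared(torch.cat([rgb, depth], dim=1))

# Mid fusion (after block 3), the region the paper found to work best
net = FusionBackbone(fuse_at=3)
features = net(torch.randn(1, 3, 416, 416), torch.randn(1, 1, 416, 416))
```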

Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 546 ◽
Author(s):  
Zhenni Li ◽  
Haoyi Sun ◽  
Yuliang Gao ◽  
Jiao Wang

Depth maps obtained through sensors are often unsatisfactory because of their low resolution and noise. In this paper, we propose a real-time depth map enhancement system based on a residual network that uses dual channels to process depth maps and intensity maps respectively and eliminates the need for preprocessing; the proposed algorithm achieves real-time processing at more than 30 fps. Furthermore, an FPGA design and implementation for depth sensing is introduced. In this design, the intensity image and depth image are captured by a dual-camera synchronous acquisition system and fed to the neural network as input. Experiments on various depth map restoration tasks show that our algorithm outperforms the existing LRMC, DE-CNN, and DDTF algorithms on standard datasets and achieves better depth map super-resolution. System tests confirm that the data throughput of the acquisition system's USB 3.0 interface is stable at 226 Mbps and supports both cameras working at full speed, i.e., 54 fps @ (1280 × 960 + 328 × 248 × 3).
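
As a rough illustration of the dual-channel residual idea, here is a minimal PyTorch sketch: separate depth and intensity branches whose features are fused to predict an additive correction to the raw depth map. Layer counts and channel widths are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class DualChannelDepthEnhancer(nn.Module):
    """Guided depth enhancement, loosely following the dual-channel idea:
    one branch for the noisy depth map, one for the intensity image,
    with a residual (additive) correction at the output."""
    def __init__(self, feats=32):
        super().__init__()
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, feats, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(),
        )
        self.intensity_branch = nn.Sequential(
            nn.Conv2d(1, feats, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(),
        )
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * feats, feats, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feats, 1, 3, padding=1),
        )

    def forward(self, depth, intensity):
        f = torch.cat([self.depth_branch(depth),
                       self.intensity_branch(intensity)], dim=1)
        # Predict a correction and add it back: the residual formulation
        return depth + self.fuse(f)

restored = DualChannelDepthEnhancer()(torch.randn(1, 1, 240, 320),
                                      torch.randn(1, 1, 240, 320))
```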


Micromachines ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1453 ◽
Author(s):  
Hyun Myung Kim ◽  
Min Seok Kim ◽  
Sehui Chang ◽  
Jiseong Jeong ◽  
Hae-Gon Jeon ◽  
...  

The light field camera provides a robust way to capture both spatial and angular information within a single shot. One of its important applications is 3D depth sensing, which extracts depth information from the acquired scene. However, conventional light field cameras suffer from a shallow depth of field (DoF). Here, a vari-focal light field camera (VF-LFC) with an extended DoF is proposed for mid-range 3D depth sensing applications. As the main lens of the system, a vari-focal lens with four different focal lengths is adopted to extend the DoF up to ~15 m. The focal length of the micro-lens array (MLA) is optimized by considering the DoF in both the image plane and the object plane for each focal length. By dividing the measurement range among the focal lengths, reliable depth estimation is available across the entire DoF. The proposed VF-LFC is evaluated using disparity data extracted from images at different distances. Moreover, depth measurements in an outdoor environment demonstrate that our VF-LFC could be applied in various fields such as delivery robots, autonomous vehicles, and remote sensing drones.
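
Depth evaluation from disparity typically rests on the standard triangulation relation Z = f·b/d. The sketch below applies it with placeholder focal length and baseline values; a light field camera's effective baseline actually depends on its micro-lens geometry, so the numbers are purely illustrative.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Standard triangulation: depth Z = f * b / d. The focal length
    (pixels) and baseline (metres) here are placeholders; a light field
    camera's effective baseline depends on its MLA geometry."""
    d = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full_like(d, np.inf)        # zero disparity -> at infinity
    valid = d > 0
    depth[valid] = focal_px * baseline_m / d[valid]
    return depth

# Example: large disparities map to near depths, small ones to far depths
print(disparity_to_depth([20.0, 2.0], focal_px=1000.0, baseline_m=0.03))
# -> [ 1.5  15. ]  (metres)
```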


2020 ◽  
Vol 226 ◽  
pp. 02020 ◽
Author(s):  
Alexey V. Stadnik ◽  
Pavel S. Sazhin ◽  
Slavomir Hnatic

The performance of neural networks is one of the most important topics in the field of computer vision. In this work, we analyze the speed of object detection using the well-known YOLOv3 neural network architecture in different frameworks and on different hardware configurations. We obtain results that allow us to formulate preliminary qualitative conclusions about the feasibility of various hardware scenarios for solving tasks in real-time environments.
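
The abstract does not name the specific frameworks or hardware, so the following is a generic, framework-agnostic Python timing harness of the kind such a benchmark might use; `infer` is any callable wrapping a detector's forward pass.

```python
import time
import numpy as np

def measure_fps(infer, input_shape=(1, 3, 416, 416), warmup=10, runs=100):
    """Time a detector's forward pass. Warm-up iterations exclude one-off
    initialization cost; for GPU frameworks, an explicit device
    synchronization is needed before stopping the clock."""
    x = np.random.rand(*input_shape).astype(np.float32)
    for _ in range(warmup):
        infer(x)
    start = time.perf_counter()
    for _ in range(runs):
        infer(x)
    elapsed = time.perf_counter() - start
    return runs / elapsed

# Hypothetical usage with an ONNX Runtime session named `session`:
# fps = measure_fps(lambda x: session.run(None, {"input": x}))
```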


2019 ◽  
Vol 107 (1) ◽  
pp. 651-661 ◽  
Author(s):  
Adwitiya Arora ◽  
Atul Grover ◽  
Raksha Chugh ◽  
S. Sofana Reka

Sensors ◽  
2020 ◽  
Vol 20 (12) ◽  
pp. 3591 ◽  
Author(s):  
Haidi Zhu ◽  
Haoran Wei ◽  
Baoqing Li ◽  
Xiaobing Yuan ◽  
Nasser Kehtarnavaz

This paper addresses real-time moving object detection with high accuracy in high-resolution video frames. A previously developed framework for moving object detection is modified to enable real-time processing of high-resolution images. First, a computationally efficient method is employed that detects moving regions on a resized image and maps them back to the original image via coordinate mapping. Second, a light backbone deep neural network is utilized in place of a more complex one. Third, the focal loss function is employed to alleviate the imbalance between positive and negative samples. The results of extensive experiments indicate that the modified framework achieves a processing rate of 21 frames per second with 86.15% accuracy on the SimitMovingDataset, which contains high-resolution images of size 1920 × 1080.
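
Two of the three modifications are easy to make concrete. Below is a hedged PyTorch sketch of (a) mapping boxes detected on the resized frame back to original 1920 × 1080 coordinates and (b) the focal loss of Lin et al. (2017); both are generic implementations, not the authors' code.

```python
import torch
import torch.nn.functional as F

def map_boxes_to_original(boxes, scale_x, scale_y):
    """Map [x1, y1, x2, y2] boxes detected on a resized frame back to
    original resolution, e.g. scale_x = 1920 / resized_width."""
    return boxes * torch.tensor([scale_x, scale_y, scale_x, scale_y])

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    It down-weights the abundant easy negatives, addressing the
    positive/negative imbalance the paper mentions."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```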


Electronics ◽  
2021 ◽  
Vol 10 (16) ◽  
pp. 1932 ◽
Author(s):  
Malik Haris ◽  
Adam Glowacz

Automated driving and vehicle safety systems need object detection that is accurate, robust to weather and environmental conditions, and runs in real time. Consequently, they require image processing algorithms to inspect the contents of images. This article compares the accuracy of five major image processing algorithms: Region-based Fully Convolutional Network (R-FCN), Mask Region-based Convolutional Neural Network (Mask R-CNN), Single Shot Multi-Box Detector (SSD), RetinaNet, and You Only Look Once v4 (YOLOv4). For this comparative analysis, we used the large-scale Berkeley Deep Drive (BDD100K) dataset. The strengths and limitations of each algorithm are analyzed based on parameters such as accuracy (with and without occlusion and truncation), computation time, and the precision-recall curve. The comparison given in this article is helpful for understanding the pros and cons of standard deep learning-based algorithms operating under real-time deployment restrictions. We conclude that YOLOv4 is the most accurate at detecting difficult road targets under complex road scenarios and weather conditions in an identical testing environment.
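
One of the comparison metrics, the precision-recall curve, can be computed from ranked detections as in the following NumPy sketch. This is a generic implementation (with a simple trapezoidal average precision), not the BDD100K evaluation protocol itself.

```python
import numpy as np

def precision_recall(scores, is_true_positive, num_gt):
    """Precision-recall curve from ranked detections. is_true_positive[i]
    marks whether detection i matched a ground-truth box (e.g. IoU >= 0.5);
    num_gt is the total number of ground-truth boxes."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    precision = cum_tp / (cum_tp + cum_fp)
    recall = cum_tp / num_gt
    return precision, recall

def average_precision(precision, recall):
    # Area under the PR curve, simple trapezoidal form (no interpolation)
    return float(np.trapz(precision, recall))
```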


Author(s):  
Ashwani Kumar ◽  
Zuopeng Justin Zhang ◽  
Hongbo Lyu

Currently, the fastest algorithm that uses a single convolutional network to detect objects in an image is the single shot multi-box detector (SSD). This paper studies object detection techniques for detecting objects in real time on any device running the proposed model in any environment. We increase the classification accuracy of the SSD algorithm while keeping its speed constant. The improvements are made in its convolutional layers by using depth-wise separable convolutions along with spatially separable convolutions, together forming a multilayer convolutional neural network. The proposed method uses this multilayer convolutional neural network to build a system model that classifies given objects into any of the defined classes. The scheme then takes multiple images, detects the objects in them, and labels each with its respective class. To speed up computation, the proposed algorithm is applied along with a multilayer convolutional neural network that uses a larger number of default boxes, resulting in more accurate detection. Detection accuracy is assessed with parameters such as the loss function, frames per second (FPS), mean average precision (mAP), and aspect ratio. Experimental results confirm that our improved SSD algorithm achieves high accuracy.
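
The two factorized convolutions named above are standard building blocks; the PyTorch sketch below shows both. A depthwise separable convolution replaces a K × K convolution with a per-channel spatial filter plus a 1 × 1 pointwise mix, and a spatially separable convolution factors K × K into K × 1 followed by 1 × K, in both cases cutting parameters and multiply-adds.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Per-channel KxK spatial filter (groups=c_in), then 1x1 pointwise mix."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class SpatialSeparableConv(nn.Module):
    """KxK kernel factored into Kx1 followed by 1xK."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.vertical = nn.Conv2d(c_in, c_out, (k, 1), padding=(k // 2, 0))
        self.horizontal = nn.Conv2d(c_out, c_out, (1, k), padding=(0, k // 2))

    def forward(self, x):
        return self.horizontal(self.vertical(x))
```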


2019 ◽  
Vol 11 (7) ◽  
pp. 786 ◽  
Author(s):  
Yang-Lang Chang ◽  
Amare Anagaw ◽  
Lena Chang ◽  
Yi Wang ◽  
Chih-Yu Hsiao ◽  
...  

Synthetic aperture radar (SAR) imagery has been a promising data source for monitoring maritime activities, and its application to oil and ship detection has been the focus of many previous studies. Many object detection methods, from traditional to deep learning approaches, have been proposed; however, the majority are computationally intensive and have accuracy problems. The huge volume of remote sensing data also poses a challenge for real-time object detection. To mitigate this problem, high performance computing (HPC) methods that accelerate SAR imagery analysis with GPU-based computing have been proposed. In this paper, we propose an enhanced GPU-based deep learning method to detect ships in SAR images. The You Only Look Once version 2 (YOLOv2) deep learning framework is used to model the architecture and train the model. YOLOv2 is a state-of-the-art real-time object detection system that outperforms the Faster Region-Based Convolutional Network (Faster R-CNN) and Single Shot Multibox Detector (SSD) methods. Additionally, to reduce computational time while maintaining competitive detection accuracy, we develop a new architecture with fewer layers, called YOLOv2-reduced. In the experiments, we use two datasets for training and testing: the SAR ship detection dataset (SSDD) and the Diversified SAR Ship Detection Dataset (DSSDD). YOLOv2 test results showed increased ship detection accuracy as well as a noticeable reduction in computational time compared with Faster R-CNN. The proposed YOLOv2 architecture achieves accuracies of 90.05% and 89.13% on the SSDD and DSSDD datasets, respectively. The proposed YOLOv2-reduced architecture delivers detection performance similar to YOLOv2 but with less computational time on an NVIDIA TITAN X GPU. These results show that deep learning can make a big leap forward in improving the performance of SAR image ship detection.
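
The abstract does not give the layer counts of YOLOv2-reduced, so the following PyTorch sketch only illustrates the general idea: a Darknet-style stack of Conv-BN-LeakyReLU blocks with fewer stages than Darknet-19, trading a little accuracy for lower latency. All widths and depths here are assumptions.

```python
import torch.nn as nn

def darknet_block(c_in, c_out):
    # Conv-BN-LeakyReLU, the basic YOLOv2 (Darknet-19) building block
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

# Illustrative "reduced" backbone with fewer stages than Darknet-19;
# the exact layer counts of YOLOv2-reduced are not given in the abstract.
reduced_backbone = nn.Sequential(
    darknet_block(3, 16), nn.MaxPool2d(2),
    darknet_block(16, 32), nn.MaxPool2d(2),
    darknet_block(32, 64), nn.MaxPool2d(2),
    darknet_block(64, 128), nn.MaxPool2d(2),
    darknet_block(128, 256), nn.MaxPool2d(2),
    darknet_block(256, 512),
)
```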

