Multi-scale ResNet for real-time underwater object detection

Abstract: Feature pyramids have become an essential component of most modern object detectors, such as Mask R-CNN, YOLOv3, and RetinaNet. These detectors commonly use pyramidal feature representations, which represent an image with multi-scale feature layers. However, such detectors cannot be used in many real-world applications that require real-time performance under limited computational resources. In this paper, we study the network architecture of YOLOv3 and modify its classical backbone, darknet53, using a group of convolutions and dilated convolutions (DC). We then propose a novel one-stage object detection framework called DC-YOLOv3. Extensive experiments on the Pascal 2017 benchmark demonstrate the effectiveness of our framework. The results show that DC-YOLOv3 achieves results comparable to YOLOv3 while being about 1.32× faster in training and 1.38× faster in inference.
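As a rough illustration of why dilated convolutions are attractive for a lightweight backbone, the sketch below computes the receptive field of a stack of convolution layers. The layer configurations are hypothetical and are not taken from DC-YOLOv3; the function names are ours. The point is only that dilation enlarges the receptive field without adding weights.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers):
    """Receptive field of a conv stack.

    `layers` is a list of (kernel_size, stride, dilation) tuples,
    ordered from the input side to the output side.
    """
    rf = 1    # receptive field of one output unit, in input pixels
    jump = 1  # spacing between adjacent units, in input pixels
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel_size(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical stacks: three plain 3x3 layers vs. three 3x3 layers
# with dilation rates 1, 2 and 4 (same number of weights per layer).
plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

With the same parameter count, the dilated stack in this example sees a 15-pixel window instead of 7, which is the kind of trade-off a speed-oriented backbone can exploit.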

A Real-time Underwater Object Detection Algorithm for Multi-beam Forward Looking Sonar

IFAC Proceedings Volumes ◽ 10.3182/20120410-3-pt-4028.00051 ◽ 2012 ◽ Vol 45 (5) ◽ pp. 306-311 ◽ Cited By ~ 22

Author(s): Enric Galceran ◽ Vladimir Djapic ◽ Marc Carreras ◽ David P Williams

Keyword(s): Object Detection ◽ Real Time ◽ Detection Algorithm ◽ Underwater Object ◽ Forward Looking

Convolutional Neural Networks-Based Object Detection Algorithm by Jointing Semantic Segmentation for Images

Sensors ◽ 10.3390/s20185080 ◽ 2020 ◽ Vol 20 (18) ◽ pp. 5080

Author(s): Baohua Qiang ◽ Ruidong Chen ◽ Mingliang Zhou ◽ Yuanchao Pang ◽ Yijie Zhai ◽ ...

Keyword(s): Object Detection ◽ Real Time ◽ Image Data ◽ Semantic Segmentation ◽ Image Understanding ◽ Detection Algorithm ◽ Vital Role ◽ Multi Scale ◽ Segmentation Task ◽ High Level

In recent years, a growing volume of image data has come from various sensors, and object detection plays a vital role in image understanding. For object detection in complex scenes, more detailed information must be extracted from the image to improve detection accuracy. In this paper, we propose an object detection algorithm for images that incorporates semantic segmentation (SSOD). First, we construct a feature extraction network that integrates an hourglass network structure with an attention mechanism layer to extract and fuse multi-scale features, generating high-level features with rich semantic information. Second, the semantic segmentation task is used as an auxiliary task so that the algorithm performs multi-task learning. Finally, multi-scale features are used to predict the location and category of each object. The experimental results show that our algorithm substantially enhances object detection performance and consistently outperforms the three comparison algorithms, while running at real-time detection speed.

DNS: A multi-scale deconvolution semantic segmentation network for joint detection and segmentation

MATEC Web of Conferences ◽ 10.1051/matecconf/201927702005 ◽ 2019 ◽ Vol 277 ◽ pp. 02005

Author(s): Ning Feng ◽ Le Dong ◽ Qianni Zhang ◽ Ning Zhang ◽ Xi Wu ◽ ...

Keyword(s): Image Analysis ◽ Object Detection ◽ Real Time ◽ Medical Image ◽ Medical Image Analysis ◽ Semantic Segmentation ◽ Autonomous Driving ◽ Joint Detection ◽ Multi Scale ◽ Segmentation Task

Real-time semantic segmentation has become crucial in many applications, such as medical image analysis and autonomous driving. In this paper, we introduce a single semantic segmentation network, called DNS, for the joint object detection and segmentation task. We take advantage of a multi-scale deconvolution mechanism to perform real-time computation. To this end, down-scale and up-scale streams are used to combine multi-scale features for the final detection and segmentation task. The proposed DNS addresses both the trade-off between accuracy and cost and the balance between detection and segmentation performance. Experimental results on the PASCAL VOC datasets show competitive performance on the joint object detection and segmentation task.

An Efficient Deep Convolutional Neural Network Approach for Object Detection and Recognition Using a Multi-Scale Anchor Box in Real-Time

Future Internet ◽ 10.3390/fi13120307 ◽ 2021 ◽ Vol 13 (12) ◽ pp. 307

Author(s): Vijayakumar Varadarajan ◽ Dweepna Garg ◽ Ketan Kotecha

Keyword(s): Neural Network ◽ Object Detection ◽ Real Time ◽ Semantic Segmentation ◽ Object Identification ◽ Detection Accuracy ◽ Neural Network Approach ◽ Multi Scale ◽ Scale Anchor ◽ Detection And Recognition

Deep learning is a relatively new branch of machine learning in which computers are taught to recognize patterns in massive volumes of data. It primarily describes learning at various levels of representation, which aids in understanding data such as text, voice, and visuals. Convolutional neural networks have been used to solve challenges in computer vision, including object identification, image classification, and semantic segmentation. Object detection in videos involves confirming the presence of an object in the image or video and then locating it accurately for recognition. In video, modelling techniques suffer from high computation and memory costs, which can reduce the accuracy and efficiency with which objects are identified in real time. Current object detection techniques based on deep convolutional neural networks require multilevel convolution and pooling operations over the entire image to extract deep semantic features. Such models can give superior results for large objects; however, they fail to detect objects of varying size that have low resolution and are strongly affected by noise, because the features remaining after the repeated convolution operations of existing models do not fully represent the essential characteristics of the objects. With the help of a multi-scale anchor box, the approach proposed in this paper enhances detection accuracy by extracting features of the object at multiple convolution levels. The major contribution of this paper is a model designed to better understand the parameters and hyper-parameters that affect the detection and recognition of objects of varying sizes and shapes, and to achieve real-time object detection and recognition speeds while improving accuracy. The proposed model achieves 84.49 mAP on the test set of the Pascal VOC-2007 dataset at 11 FPS, which is comparatively better than other real-time object detection models.
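The abstract does not say how its multi-scale anchor boxes are built, so the following is only a generic sketch of the idea in the style of common single-shot detectors: each feature-map level gets anchors of a different size, tiled over the image at that level's stride. The pyramid sizes, scales, aspect ratios, and the 300-pixel input are all assumptions chosen for illustration, and `make_anchors` is a hypothetical helper.

```python
import math


def make_anchors(feature_size, image_size, scales, aspect_ratios):
    """Return (cx, cy, w, h) anchors in pixels for one square feature map."""
    stride = image_size / feature_size  # input pixels per feature-map cell
    anchors = []
    for row in range(feature_size):
        for col in range(feature_size):
            # Anchor centre sits in the middle of the cell.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in aspect_ratios:  # ratio = width / height
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    anchors.append((cx, cy, w, h))
    return anchors


# Hypothetical pyramid: (feature-map size, anchor scales in pixels).
# Finer maps carry smaller anchors, coarser maps carry larger ones.
pyramid = [(38, (30,)), (19, (60,)), (10, (111,))]

all_anchors = []
for feature_size, scales in pyramid:
    all_anchors += make_anchors(feature_size, 300, scales, (0.5, 1.0, 2.0))

print(len(all_anchors))  # (38*38 + 19*19 + 10*10) * 3 = 5715
```

Each anchor keeps an area of `scale ** 2` regardless of aspect ratio, so the level determines object size and the ratio determines shape.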

Real-time underwater object detection based on an electrically scanned high-resolution sonar

Proceedings of IEEE Symposium on Autonomous Underwater Vehicle Technology (AUV'94) ◽ 10.1109/auv.1994.518613 ◽ 2002 ◽ Cited By ~ 26

Author(s): L. Henriksen

Keyword(s): High Resolution ◽ Object Detection ◽ Real Time ◽ Underwater Object

TF-YOLO: An Improved Incremental Network for Real-Time Object Detection

Applied Sciences ◽ 10.3390/app9163225 ◽ 2019 ◽ Vol 9 (16) ◽ pp. 3225 ◽ Cited By ~ 8

Author(s): He ◽ Huang ◽ Wei ◽ Li ◽ Guo

Keyword(s): Embedded System ◽ Object Detection ◽ Real Time ◽ Visual Detection ◽ Portable Devices ◽ Detection Model ◽ Multi Scale ◽ Art Object ◽ Feature Pyramid ◽ Small Targets

In recent years, significant advances have been made in visual detection, and many outstanding models have been proposed. However, state-of-the-art object detection networks are inefficient at detecting small targets, and they commonly fail to run on portable devices or embedded systems because of their high complexity. In this paper, a real-time object detection model, termed Tiny Fast You Only Look Once (TF-YOLO), is developed for deployment in an embedded system. First, the k-means++ algorithm is applied to cluster the dataset, which yields better prior boxes for the targets. Second, inspired by the multi-scale prediction idea in the Feature Pyramid Networks (FPN) algorithm, the YOLOv3 framework is improved and optimized, using three scales for detection on the previously extracted features. In this way, the modified network becomes sensitive to small targets. Experimental results demonstrate that the proposed TF-YOLO method is a smaller, faster, and more efficient network model that improves the performance of end-to-end training and real-time object detection on a variety of devices.
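To make the first step concrete, here is a minimal sketch of clustering ground-truth box sizes into prior (anchor) boxes with k-means++ seeding. The abstract does not give the distance measure; this sketch assumes the `1 - IoU` distance commonly used for YOLO-style anchors, and the sample boxes, function names, and parameters are all hypothetical.

```python
import random


def iou_wh(box, centroid):
    """IoU of two (w, h) boxes that share the same top-left corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeanspp_init(boxes, k, rng):
    """k-means++ seeding: later seeds are drawn with probability
    proportional to the squared distance to the nearest chosen seed.
    Assumes `boxes` holds at least k distinct sizes."""
    centroids = [rng.choice(boxes)]
    while len(centroids) < k:
        d2 = [min(1.0 - iou_wh(b, c) for c in centroids) ** 2 for b in boxes]
        centroids.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centroids


def cluster_anchors(boxes, k, iterations=50, seed=0):
    """Lloyd iterations with 1 - IoU distance; returns k (w, h) priors
    sorted by area."""
    rng = random.Random(seed)
    centroids = kmeanspp_init(boxes, k, rng)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for b in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: iou_wh(b, centroids[i]))
            groups[best].append(b)
        updated = []
        for group, old in zip(groups, centroids):
            if group:
                mean_w = sum(b[0] for b in group) / len(group)
                mean_h = sum(b[1] for b in group) / len(group)
                updated.append((mean_w, mean_h))
            else:
                updated.append(old)  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids, key=lambda c: c[0] * c[1])


# Hypothetical (width, height) pairs in pixels from a labelled dataset.
boxes = [(10, 12), (12, 10), (11, 11),
         (50, 60), (60, 50), (55, 55),
         (200, 220), (220, 200), (210, 210)]
anchors = cluster_anchors(boxes, k=3)
print(anchors)
```

The seeding is randomized, so the result can depend on `seed`; k-means++ makes well-spread initial centroids likely but does not guarantee them.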

MS-Faster R-CNN: Multi-Stream Backbone for Improved Faster R-CNN Object Detection and Aerial Tracking from UAV Images

Remote Sensing ◽ 10.3390/rs13091670 ◽ 2021 ◽ Vol 13 (9) ◽ pp. 1670

Author(s): Danilo Avola ◽ Luigi Cinque ◽ Anxhelo Diko ◽ Alessio Fagioli ◽ Gian Luca Foresti ◽ ...

Keyword(s): Image Analysis ◽ Object Detection ◽ Real Time ◽ Video Sequences ◽ Multi Scale ◽ Video Frames ◽ Common Strategy ◽ Real Time Tracking ◽ Uav Images ◽ Sort Algorithm

Tracking objects across multiple video frames is a challenging task because of issues such as occlusions, background clutter, lighting, and variations in object and camera viewpoint, all of which directly affect object detection. These issues are even more pronounced when analyzing images from unmanned aerial vehicles (UAVs), where the movement of the vehicle can also degrade image quality. A common strategy for addressing them is to analyze the input images at different scales, so as to obtain as much information as possible for correctly detecting and tracking objects across video sequences. Following this rationale, in this paper we introduce a simple yet effective multi-stream (MS) architecture, in which a different kernel size is applied to each stream to simulate multi-scale image analysis. The proposed architecture is then used as the backbone of the well-known Faster R-CNN pipeline, defining an MS-Faster R-CNN object detector that consistently detects objects in video sequences. This detector is then used jointly with the Simple Online and Real-time Tracking with a Deep Association Metric (Deep SORT) algorithm to achieve real-time tracking on UAV images. To assess the presented architecture, extensive experiments were performed on the UMCD, UAVDT, UAV20L, and UAV123 datasets. The presented pipeline achieved state-of-the-art performance, confirming that the proposed multi-stream method can correctly emulate the robust multi-scale image analysis paradigm.