Convolutional Neural Networks-Based Object Detection Algorithm by Jointing Semantic Segmentation for Images

In recent years, increasing image data comes from various sensors, and object detection plays a vital role in image understanding. For object detection in complex scenes, more detailed information in the image should be obtained to improve the accuracy of detection task. In this paper, we propose an object detection algorithm by jointing semantic segmentation (SSOD) for images. First, we construct a feature extraction network that integrates the hourglass structure network with the attention mechanism layer to extract and fuse multi-scale features to generate high-level features with rich semantic information. Second, the semantic segmentation task is used as an auxiliary task to allow the algorithm to perform multi-task learning. Finally, multi-scale features are used to predict the location and category of the object. The experimental results show that our algorithm substantially enhances object detection performance and consistently outperforms other three comparison algorithms, and the detection speed can reach real-time, which can be used for real-time detection.

Download Full-text

DNS: A multi-scale deconvolution semantic segmentation network for joint detection and segmentation

MATEC Web of Conferences ◽

10.1051/matecconf/201927702005 ◽

2019 ◽

Vol 277 ◽

pp. 02005

Author(s):

Ning Feng ◽

Le Dong ◽

Qianni Zhang ◽

Ning Zhang ◽

Xi Wu ◽

...

Keyword(s):

Image Analysis ◽

Object Detection ◽

Real Time ◽

Medical Image ◽

Medical Image Analysis ◽

Semantic Segmentation ◽

Autonomous Driving ◽

Joint Detection ◽

Multi Scale ◽

Segmentation Task

Real-time semantic segmentation has become crucial in many applications such as medical image analysis and autonomous driving. In this paper, we introduce a single semantic segmentation network, called DNS, for joint object detection and segmentation task. We take advantage of multi-scale deconvolution mechanism to perform real time computations. To this goal, down-scale and up-scale streams are utilized to combine the multi-scale features for the final detection and segmentation task. By using the proposed DNS, not only the tradeoff between accuracy and cost but also the balance of detection and segmentation performance are settled. Experimental results for PASCAL VOC datasets show competitive performance for joint object detection and segmentation task.

Download Full-text

DC-YOLOv3: A novel efficient object detection algorithm

Journal of Physics Conference Series ◽

10.1088/1742-6596/2082/1/012012 ◽

2021 ◽

Vol 2082 (1) ◽

pp. 012012

Author(s):

Xu Zhang ◽

Fang Han ◽

Ping Wang ◽

Wei Jiang ◽

Chen Wang

Keyword(s):

Object Detection ◽

Real Time ◽

Real World ◽

Network Architecture ◽

Detection Algorithm ◽

Training Time ◽

Feature Representations ◽

Multi Scale ◽

Time Performance ◽

Real World Applications

Abstract Feature pyramids have become an essential component in most modern object detectors, such as Mask RCNN, YOLOv3, RetinaNet. In these detectors, the pyramidal feature representations are commonly used which represent an image with multi-scale feature layers. However, the detectors can’t be used in many real world applications which require real time performance under a computationally limited circumstance. In the paper, we study network architecture in YOLOv3 and modify the classical backbone--darknet53 of YOLOv3 by using a group of convolutions and dilated convolutions (DC). Then, a novel one-stage object detection network framework called DC-YOLOv3 is proposed. A lot of experiments on the Pascal 2017 benchmark prove the effectiveness of our framework. The results illustrate that DC-YOLOv3 achieves comparable results with YOLOv3 while being about 1.32× faster in training time and 1.38× faster in inference time.

Download Full-text

An Efficient Deep Convolutional Neural Network Approach for Object Detection and Recognition Using a Multi-Scale Anchor Box in Real-Time

Future Internet ◽

10.3390/fi13120307 ◽

2021 ◽

Vol 13 (12) ◽

pp. 307

Author(s):

Vijayakumar Varadarajan ◽

Dweepna Garg ◽

Ketan Kotecha

Keyword(s):

Neural Network ◽

Object Detection ◽

Real Time ◽

Semantic Segmentation ◽

Object Identification ◽

Detection Accuracy ◽

Neural Network Approach ◽

Multi Scale ◽

Scale Anchor ◽

Detection And Recognition

Deep learning is a relatively new branch of machine learning in which computers are taught to recognize patterns in massive volumes of data. It primarily describes learning at various levels of representation, which aids in understanding data that includes text, voice, and visuals. Convolutional neural networks have been used to solve challenges in computer vision, including object identification, image classification, semantic segmentation and a lot more. Object detection in videos involves confirming the presence of the object in the image or video and then locating it accurately for recognition. In the video, modelling techniques suffer from high computation and memory costs, which may decrease performance measures such as accuracy and efficiency to identify the object accurately in real-time. The current object detection technique based on a deep convolution neural network requires executing multilevel convolution and pooling operations on the entire image to extract deep semantic properties from it. For large objects, detection models can provide superior results; however, those models fail to detect the varying size of the objects that have low resolution and are greatly influenced by noise because the features after the repeated convolution operations of existing models do not fully represent the essential characteristics of the objects in real-time. With the help of a multi-scale anchor box, the proposed approach reported in this paper enhances the detection accuracy by extracting features at multiple convolution levels of the object. The major contribution of this paper is to design a model to understand better the parameters and the hyper-parameters which affect the detection and the recognition of objects of varying sizes and shapes, and to achieve real-time object detection and recognition speeds by improving accuracy. The proposed model has achieved 84.49 mAP on the test set of the Pascal VOC-2007 dataset at 11 FPS, which is comparatively better than other real-time object detection models.

Download Full-text

Real-Time Conveyor Belt Deviation Detection Algorithm Based on Multi-Scale Feature Fusion Network

Algorithms ◽

10.3390/a12100205 ◽

2019 ◽

Vol 12 (10) ◽

pp. 205 ◽

Cited By ~ 1

Author(s):

Chan Zeng ◽

Junfeng Zheng ◽

Jiangyun Li

Keyword(s):

Feature Extraction ◽

Real Time ◽

Load Distribution ◽

Feature Fusion ◽

Conveyor Belt ◽

Detection Algorithm ◽

Scale Feature ◽

Multi Scale ◽

High Level ◽

Detection Effect

The conveyor belt is an indispensable piece of conveying equipment for a mine whose deviation caused by roller sticky material and uneven load distribution is the most common failure during operation. In this paper, a real-time conveyor belt detection algorithm based on a multi-scale feature fusion network is proposed, which mainly includes two parts: the feature extraction module and the deviation detection module. The feature extraction module uses a multi-scale feature fusion network structure to fuse low-level features with rich position and detail information and high-level features with stronger semantic information to improve network detection performance. Depthwise separable convolutions are used to achieve real-time detection. The deviation detection module identifies and monitors the deviation fault by calculating the offset of conveyor belt. In particular, a new weighted loss function is designed to optimize the network and to improve the detection effect of the conveyor belt edge. In order to evaluate the effectiveness of the proposed method, the Canny algorithm, FCNs, UNet and Deeplab v3 networks are selected for comparison. The experimental results show that the proposed algorithm achieves 78.92% in terms of pixel accuracy (PA), and reaches 13.4 FPS (Frames per Second) with the error of less than 3.2 mm, which outperforms the other four algorithms.

Download Full-text

Salient Object Detection Combining a Self-Attention Module and a Feature Pyramid Network

Electronics ◽

10.3390/electronics9101702 ◽

2020 ◽

Vol 9 (10) ◽

pp. 1702

Author(s):

Guangyu Ren ◽

Tianhong Dai ◽

Panagiotis Barmpoutis ◽

Tania Stathaki

Keyword(s):

Object Detection ◽

Receptive Fields ◽

Semantic Segmentation ◽

Salient Object Detection ◽

Object Localization ◽

Salient Object ◽

Convolutional Networks ◽

Multi Scale ◽

Fully Convolutional Networks ◽

High Level

Salient object detection has achieved great improvements by using the Fully Convolutional Networks (FCNs). However, the FCN-based U-shape architecture may cause dilution problems in the high-level semantic information during the up-sample operations in the top-down pathway. Thus, it can weaken the ability of salient object localization and produce degraded boundaries. To this end, in order to overcome this limitation, we propose a novel pyramid self-attention module (PSAM) and the adoption of an independent feature-complementing strategy. In PSAM, self-attention layers are equipped after multi-scale pyramid features to capture richer high-level features and bring larger receptive fields to the model. In addition, a channel-wise attention module is also employed to reduce the redundant features of the FPN and provide refined results. Experimental analysis demonstrates that the proposed PSAM effectively contributes to the whole model so that it outperforms state-of-the-art results over five challenging datasets. Finally, quantitative results show that PSAM generates accurate predictions and integral salient maps, which can provide further help to other computer vision tasks, such as object detection and semantic segmentation.

Download Full-text

Real-time 2D–3D door detection and state classification on a low-power device

SN Applied Sciences ◽

10.1007/s42452-021-04588-3 ◽

2021 ◽

Vol 3 (5) ◽

Author(s):

João Gaspar Ramôa ◽

Vasco Lopes ◽

Luís A. Alexandre ◽

S. Mogo

Keyword(s):

Low Power ◽

Real Time ◽

Object Classification ◽

Semantic Segmentation ◽

Detection Algorithm ◽

Power Device ◽

Indoor Environments ◽

State Classification ◽

Segmentation Algorithms ◽

Indoor Spaces

AbstractIn this paper, we propose three methods for door state classification with the goal to improve robot navigation in indoor spaces. These methods were also developed to be used in other areas and applications since they are not limited to door detection as other related works are. Our methods work offline, in low-powered computers as the Jetson Nano, in real-time with the ability to differentiate between open, closed and semi-open doors. We use the 3D object classification, PointNet, real-time semantic segmentation algorithms such as, FastFCN, FC-HarDNet, SegNet and BiSeNet, the object detection algorithm, DetectNet and 2D object classification networks, AlexNet and GoogleNet. We built a 3D and RGB door dataset with images from several indoor environments using a 3D Realsense camera D435. This dataset is freely available online. All methods are analysed taking into account their accuracy and the speed of the algorithm in a low powered computer. We conclude that it is possible to have a door classification algorithm running in real-time on a low-power device.

Download Full-text

RSNet: Rail semantic segmentation network for extracting aerial railroad images

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210349 ◽

2021 ◽

pp. 1-18

Author(s):

R.S. Rampriya ◽

Sabarinathan ◽

R. Suganya

Keyword(s):

Real Time ◽

Visual Processing ◽

Feature Fusion ◽

Semantic Segmentation ◽

Vital Role ◽

Obstacle Detection ◽

Aerial Images ◽

Computationally Efficient ◽

Fusion Algorithm ◽

Uav Images

In the near future, combo of UAV (Unmanned Aerial Vehicle) and computer vision will play a vital role in monitoring the condition of the railroad periodically to ensure passenger safety. The most significant module involved in railroad visual processing is obstacle detection, in which caution is obstacle fallen near track gage inside or outside. This leads to the importance of detecting and segment the railroad as three key regions, such as gage inside, rails, and background. Traditional railroad segmentation methods depend on either manual feature selection or expensive dedicated devices such as Lidar, which is typically less reliable in railroad semantic segmentation. Also, cameras mounted on moving vehicles like a drone can produce high-resolution images, so segmenting precise pixel information from those aerial images has been challenging due to the railroad surroundings chaos. RSNet is a multi-level feature fusion algorithm for segmenting railroad aerial images captured by UAV and proposes an attention-based efficient convolutional encoder for feature extraction, which is robust and computationally efficient and modified residual decoder for segmentation which considers only essential features and produces less overhead with higher performance even in real-time railroad drone imagery. The network is trained and tested on a railroad scenic view segmentation dataset (RSSD), which we have built from real-time UAV images and achieves 0.973 dice coefficient and 0.94 jaccard on test data that exhibits better results compared to the existing approaches like a residual unit and residual squeeze net.

Download Full-text