Convolutional Neural Networks-Based Object Detection Algorithm by Jointing Semantic Segmentation for Images

Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5080
Author(s):  
Baohua Qiang ◽  
Ruidong Chen ◽  
Mingliang Zhou ◽  
Yuanchao Pang ◽  
Yijie Zhai ◽  
...  

In recent years, a growing volume of image data has come from various sensors, and object detection plays a vital role in image understanding. For object detection in complex scenes, more detailed information must be extracted from the image to improve detection accuracy. In this paper, we propose an object detection algorithm that is trained jointly with semantic segmentation (SSOD) for images. First, we construct a feature extraction network that integrates the hourglass structure network with the attention mechanism layer to extract and fuse multi-scale features, generating high-level features with rich semantic information. Second, the semantic segmentation task is used as an auxiliary task to allow the algorithm to perform multi-task learning. Finally, multi-scale features are used to predict the location and category of each object. The experimental results show that our algorithm substantially enhances object detection performance, consistently outperforms the three comparison algorithms, and runs fast enough for real-time detection.
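The multi-task idea in the abstract above (a detection objective with segmentation as an auxiliary task) can be illustrated with a minimal sketch. The abstract does not give the loss, so everything here is an assumption: the function names, the use of per-pixel cross-entropy for the segmentation term, and the weighting factor `seg_weight` are hypothetical and are not taken from the paper.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target])

def joint_loss(det_loss, seg_probs, seg_labels, seg_weight=0.5):
    """Combine a detection loss with an auxiliary segmentation loss.

    det_loss   : scalar detection loss, assumed to be computed elsewhere
    seg_probs  : per-pixel class probability vectors
    seg_labels : per-pixel ground-truth class indices
    seg_weight : hypothetical weight balancing the auxiliary task
    """
    seg_loss = sum(
        cross_entropy(p, y) for p, y in zip(seg_probs, seg_labels)
    ) / len(seg_labels)
    return det_loss + seg_weight * seg_loss

# Two pixels, two classes, uniform predictions.
total = joint_loss(1.0, [[0.5, 0.5], [0.5, 0.5]], [0, 1], seg_weight=0.5)
print(round(total, 4))
```

In a sketch like this the auxiliary term only shapes the shared features during training; whether the segmentation branch is kept at inference time is a design choice the abstract does not specify.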

2019 ◽  
Vol 277 ◽  
pp. 02005
Author(s):  
Ning Feng ◽  
Le Dong ◽  
Qianni Zhang ◽  
Ning Zhang ◽  
Xi Wu ◽  
...  

Real-time semantic segmentation has become crucial in many applications such as medical image analysis and autonomous driving. In this paper, we introduce a single semantic segmentation network, called DNS, for the joint object detection and segmentation task. We take advantage of a multi-scale deconvolution mechanism to perform real-time computation. To this end, down-scale and up-scale streams are used to combine multi-scale features for the final detection and segmentation task. The proposed DNS addresses both the trade-off between accuracy and cost and the balance between detection and segmentation performance. Experimental results on the PASCAL VOC datasets show competitive performance for the joint object detection and segmentation task.


2021 ◽  
Vol 2082 (1) ◽  
pp. 012012
Author(s):  
Xu Zhang ◽  
Fang Han ◽  
Ping Wang ◽  
Wei Jiang ◽  
Chen Wang

Feature pyramids have become an essential component in most modern object detectors, such as Mask R-CNN, YOLOv3, and RetinaNet. In these detectors, pyramidal feature representations, which represent an image with multi-scale feature layers, are commonly used. However, the detectors cannot be used in many real-world applications that require real-time performance under limited computational resources. In this paper, we study the network architecture of YOLOv3 and modify its classical backbone, Darknet-53, by using a group of convolutions and dilated convolutions (DC). We then propose a novel one-stage object detection framework called DC-YOLOv3. Extensive experiments on the Pascal 2017 benchmark demonstrate the effectiveness of our framework. The results show that DC-YOLOv3 achieves results comparable to YOLOv3 while being about 1.32× faster in training time and 1.38× faster in inference time.
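As a rough illustration of what a dilated convolution does, the sketch below implements a one-dimensional version in plain Python. It is not the DC-YOLOv3 backbone; the function names are hypothetical, and the 1D, no-padding setting is chosen only to keep the example short. A kernel of size k with dilation d covers k + (k - 1)(d - 1) input positions without adding weights.

```python
def effective_kernel_size(kernel_size, dilation):
    """Number of input positions spanned by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1D dilated cross-correlation."""
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        out.append(
            sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        )
    return out

signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1]
print(effective_kernel_size(3, 2))                 # 5
print(dilated_conv1d(signal, kernel, dilation=1))  # [-2, -2, -2, -2]
print(dilated_conv1d(signal, kernel, dilation=2))  # [-4, -4]
```

The same three weights see a wider window at dilation 2, which is why dilation is a common way to enlarge the receptive field at fixed parameter count.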


2021 ◽  
Vol 13 (12) ◽  
pp. 307
Author(s):  
Vijayakumar Varadarajan ◽  
Dweepna Garg ◽  
Ketan Kotecha

Deep learning is a relatively new branch of machine learning in which computers are taught to recognize patterns in massive volumes of data. It primarily describes learning at various levels of representation, which aids in understanding data that includes text, voice, and visuals. Convolutional neural networks have been used to solve challenges in computer vision, including object identification, image classification, semantic segmentation, and many more. Object detection in videos involves confirming the presence of the object in the image or video and then locating it accurately for recognition. In video, modelling techniques suffer from high computation and memory costs, which can degrade the accuracy and efficiency of identifying objects in real time. Current object detection techniques based on deep convolutional neural networks require multilevel convolution and pooling operations on the entire image to extract deep semantic properties from it. For large objects, detection models can provide superior results; however, those models fail to detect objects of varying size that have low resolution and are strongly affected by noise, because the features remaining after the repeated convolution operations of existing models do not fully represent the essential characteristics of the objects. With the help of a multi-scale anchor box, the approach proposed in this paper enhances detection accuracy by extracting features of the object at multiple convolution levels. The major contribution of this paper is a model designed to better understand the parameters and hyper-parameters that affect the detection and recognition of objects of varying sizes and shapes, and to achieve real-time object detection and recognition speeds while improving accuracy. The proposed model achieves 84.49 mAP on the test set of the Pascal VOC-2007 dataset at 11 FPS, which is comparatively better than other real-time object detection models.
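A multi-scale anchor box scheme of the kind mentioned above is commonly built by placing boxes of several scales and aspect ratios at each feature-map location. The sketch below shows that general convention only; the scales, ratios, and the function name `make_anchors` are hypothetical and are not the settings used in the paper.

```python
import math

def make_anchors(cx, cy, scales, ratios):
    """Return (xmin, ymin, xmax, ymax) anchor boxes centred on (cx, cy).

    For scale s and aspect ratio r (width / height), the box has
    width s * sqrt(r) and height s / sqrt(r), so its area stays s * s.
    """
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * math.sqrt(r)
            h = s / math.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

anchors = make_anchors(50, 50, scales=[32, 64], ratios=[0.5, 1.0, 2.0])
print(len(anchors))  # 6
print(anchors[1])    # (34.0, 34.0, 66.0, 66.0)
```

Repeating this at every location of several feature maps gives the detector candidate boxes for both small and large objects.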


Algorithms ◽  
2019 ◽  
Vol 12 (10) ◽  
pp. 205
Author(s):  
Chan Zeng ◽  
Junfeng Zheng ◽  
Jiangyun Li

The conveyor belt is an indispensable piece of conveying equipment in a mine, and belt deviation caused by material sticking to the rollers and uneven load distribution is its most common failure during operation. In this paper, a real-time conveyor belt detection algorithm based on a multi-scale feature fusion network is proposed, which mainly includes two parts: the feature extraction module and the deviation detection module. The feature extraction module uses a multi-scale feature fusion network structure to fuse low-level features, which carry rich position and detail information, with high-level features, which carry stronger semantic information, to improve network detection performance. Depthwise separable convolutions are used to achieve real-time detection. The deviation detection module identifies and monitors the deviation fault by calculating the offset of the conveyor belt. In particular, a new weighted loss function is designed to optimize the network and to improve detection of the conveyor belt edge. To evaluate the effectiveness of the proposed method, the Canny algorithm, FCNs, UNet and Deeplab v3 networks are selected for comparison. The experimental results show that the proposed algorithm achieves 78.92% pixel accuracy (PA) and reaches 13.4 FPS (frames per second) with an error of less than 3.2 mm, outperforming the other four algorithms.
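To give a sense of why depthwise separable convolutions help with speed, the sketch below compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1×1 pointwise convolution. The layer sizes are illustrative assumptions, biases are ignored, and the function names are hypothetical; none of the numbers come from the paper.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard k x k convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels

def separable_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a depthwise k x k conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise

std = standard_conv_params(3, 32, 64)
sep = separable_conv_params(3, 32, 64)
print(std, sep)              # 18432 2336
print(round(std / sep, 2))   # 7.89
```

For this hypothetical 3×3 layer with 32 input and 64 output channels, the separable form needs roughly one eighth of the weights, with a matching reduction in multiply-accumulate operations per output position.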


Electronics ◽  
2020 ◽  
Vol 9 (10) ◽  
pp. 1702
Author(s):  
Guangyu Ren ◽  
Tianhong Dai ◽  
Panagiotis Barmpoutis ◽  
Tania Stathaki

Salient object detection has improved greatly through the use of Fully Convolutional Networks (FCNs). However, the FCN-based U-shape architecture may dilute high-level semantic information during the up-sampling operations in the top-down pathway. This can weaken salient object localization and produce degraded boundaries. To overcome this limitation, we propose a novel pyramid self-attention module (PSAM) and adopt an independent feature-complementing strategy. In PSAM, self-attention layers are placed after multi-scale pyramid features to capture richer high-level features and bring larger receptive fields to the model. In addition, a channel-wise attention module is employed to reduce the redundant features of the FPN and provide refined results. Experimental analysis demonstrates that the proposed PSAM contributes effectively to the whole model, which outperforms state-of-the-art results on five challenging datasets. Finally, quantitative results show that PSAM generates accurate predictions and integral salient maps, which can provide further help to other computer vision tasks, such as object detection and semantic segmentation.
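The self-attention layers mentioned above can be illustrated with a generic scaled dot-product self-attention over a small set of feature vectors. This is a simplified sketch, not the PSAM design: it assumes queries, keys, and values are the input features themselves (no learned projections), and the function names are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Scaled dot-product self-attention with queries = keys = values = features.

    features: list of n vectors, each of dimension d.
    Returns n vectors, each a weighted average of all input vectors.
    """
    d = len(features[0])
    out = []
    for q in features:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, features)) for j in range(d)]
        )
    return out

attended = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(len(attended), len(attended[0]))  # 3 2
```

Because every output position mixes information from all positions, a layer like this sees the whole feature map at once, which matches the abstract's point about larger receptive fields.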


2021 ◽  
Vol 3 (5) ◽  
Author(s):  
João Gaspar Ramôa ◽  
Vasco Lopes ◽  
Luís A. Alexandre ◽  
S. Mogo

In this paper, we propose three methods for door state classification with the goal of improving robot navigation in indoor spaces. These methods were also developed for use in other areas and applications, since they are not limited to door detection as other related works are. Our methods work offline, on low-powered computers such as the Jetson Nano, in real time, with the ability to differentiate between open, closed and semi-open doors. We use the 3D object classification network PointNet; real-time semantic segmentation algorithms such as FastFCN, FC-HarDNet, SegNet and BiSeNet; the object detection algorithm DetectNet; and the 2D object classification networks AlexNet and GoogleNet. We built a 3D and RGB door dataset with images from several indoor environments using a RealSense D435 3D camera. This dataset is freely available online. All methods are analysed in terms of their accuracy and speed on a low-powered computer. We conclude that it is possible to have a door classification algorithm running in real time on a low-power device.


2021 ◽  
pp. 1-18
Author(s):  
R.S. Rampriya ◽  
Sabarinathan ◽  
R. Suganya

In the near future, the combination of UAVs (Unmanned Aerial Vehicles) and computer vision will play a vital role in periodically monitoring the condition of railroads to ensure passenger safety. The most significant module in railroad visual processing is obstacle detection, where the concern is obstacles that have fallen near the track, inside or outside the gage. This makes it important to detect and segment the railroad into three key regions: gage inside, rails, and background. Traditional railroad segmentation methods depend on either manual feature selection or expensive dedicated devices such as Lidar, which are typically less reliable for railroad semantic segmentation. Also, cameras mounted on moving vehicles such as drones can produce high-resolution images, but segmenting precise pixel information from those aerial images is challenging because of the cluttered railroad surroundings. RSNet is a multi-level feature fusion algorithm for segmenting railroad aerial images captured by UAV. It uses an attention-based efficient convolutional encoder for feature extraction, which is robust and computationally efficient, and a modified residual decoder for segmentation, which considers only essential features and adds less overhead while delivering higher performance, even on real-time railroad drone imagery. The network is trained and tested on a railroad scenic view segmentation dataset (RSSD), which we built from real-time UAV images, and achieves a Dice coefficient of 0.973 and a Jaccard index of 0.94 on test data, better than existing approaches such as the residual unit and residual squeeze net.
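The two segmentation metrics reported above have standard definitions: Dice = 2|A ∩ B| / (|A| + |B|) and Jaccard = |A ∩ B| / |A ∪ B|. The sketch below computes both for a pair of flattened binary masks. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for this example, not details from the paper.

```python
def dice_and_jaccard(pred, target):
    """Dice coefficient and Jaccard index for two flattened binary masks."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - inter
    # Convention assumed here: two empty masks count as a perfect match.
    dice = 2 * inter / (pred_sum + target_sum) if (pred_sum + target_sum) else 1.0
    jaccard = inter / union if union else 1.0
    return dice, jaccard

dice, jaccard = dice_and_jaccard([1, 1, 0, 0], [1, 0, 1, 0])
print(dice)                # 0.5
print(round(jaccard, 4))   # 0.3333
```

For a single pair of masks the two metrics are tied together by Jaccard = Dice / (2 - Dice); values averaged over many images need not satisfy this relation exactly.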


2021 ◽  
Author(s):  
Kangning Yin ◽  
Jie Liang ◽  
Shaoqi Hou ◽  
Rui Zhu ◽  
Guangqiang Yin ◽  
...  
