Real-Time Semantic Segmentation Algorithm Based on Feature Fusion Technology

2020 ◽  
Vol 57 (2) ◽  
pp. 021011
Author(s):  
蔡雨 Cai Yu ◽  
黄学功 Huang Xuegong ◽  
张志安 Zhang Zhian ◽  
朱新年 Zhu Xinnian ◽  
马祥 Ma Xiang
2021 ◽  
pp. 1-18
Author(s):  
R.S. Rampriya ◽  
Sabarinathan ◽  
R. Suganya

In the near future, the combination of unmanned aerial vehicles (UAVs) and computer vision will play a vital role in periodically monitoring railroad condition to ensure passenger safety. The most significant module in railroad visual processing is obstacle detection, where the concern is an obstacle that has fallen near the track, inside or outside the gage. This makes it important to detect and segment the railroad into three key regions: gage inside, rails, and background. Traditional railroad segmentation methods depend on either manual feature selection or expensive dedicated devices such as Lidar, which is typically less reliable in railroad semantic segmentation. Also, cameras mounted on moving vehicles such as drones can produce high-resolution images, but segmenting precise pixel information from those aerial images is challenging because of the cluttered railroad surroundings. RSNet is a multi-level feature fusion algorithm for segmenting railroad aerial images captured by UAV. It proposes an attention-based efficient convolutional encoder for feature extraction, which is robust and computationally efficient, and a modified residual decoder for segmentation, which considers only essential features and produces less overhead with higher performance, even on real-time railroad drone imagery. The network is trained and tested on a railroad scenic view segmentation dataset (RSSD), which we built from real-time UAV images, and achieves a Dice coefficient of 0.973 and a Jaccard index of 0.94 on test data, better than existing approaches such as a residual unit and residual squeeze net.
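For readers unfamiliar with the two scores quoted above, the following is a minimal sketch of how the Dice coefficient and the Jaccard index are commonly computed for a pair of binary masks. The function name and the toy masks are illustrative assumptions and are not taken from the RSNet paper.

```python
import numpy as np

def dice_and_jaccard(pred, target):
    """Return (Dice, Jaccard) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Both masks empty: report perfect agreement (a common convention,
        # assumed here rather than taken from the paper).
        return 1.0, 1.0
    dice = 2.0 * intersection / total
    jaccard = intersection / union
    return float(dice), float(jaccard)

# Toy 2x3 masks (hypothetical data).
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
dice, jaccard = dice_and_jaccard(pred, target)
```

For a single pair of masks the two scores are tied by J = D / (2 - D), so they always rank predictions the same way. Scores averaged over a dataset need not satisfy that identity exactly.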


2021 ◽  
Author(s):  
Wei Bai

Image semantic segmentation is one of the core tasks of computer vision. It is widely used in fields such as unmanned driving, medical image processing, geographic information systems and intelligent robots. Existing semantic segmentation algorithms ignore the differing channel and location features of the feature map and fuse feature maps with overly simple methods; to address this, this paper designs a semantic segmentation algorithm that incorporates the attention mechanism. Firstly, dilated convolution is used together with a smaller downsampling factor to maintain the resolution of the image and retain its detailed information. Secondly, an attention mechanism module is introduced to assign weights to different parts of the feature map, which reduces the accuracy loss. Thirdly, the designed feature fusion module assigns weights to the feature maps of different receptive fields obtained by the two paths and merges them to obtain the final segmentation result. Finally, the method was verified experimentally on the CamVid, Cityscapes and PASCAL VOC2012 datasets, with mean intersection over union (MIoU) and mean pixel accuracy (MPA) as metrics. The method can make up for the loss of accuracy caused by downsampling while preserving the receptive field and improving the resolution, which better guides model learning, and the proposed feature fusion module better integrates the features of different receptive fields. Therefore, the proposed method can significantly improve the segmentation performance compared to the traditional method.
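The two metrics named in this abstract are usually derived from a class confusion matrix. The sketch below shows one common way to do that; the helper names and the tiny label arrays are assumptions for illustration, and the paper's own evaluation code may differ (for example in how absent classes are handled).

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Rows index the ground-truth class, columns the predicted class."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    idx = target * num_classes + pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def miou_and_mpa(cm):
    """Mean IoU and mean pixel accuracy over classes present in the ground truth."""
    tp = np.diag(cm).astype(float)
    gt_pixels = cm.sum(axis=1)        # pixels per ground-truth class
    pred_pixels = cm.sum(axis=0)      # pixels per predicted class
    union = gt_pixels + pred_pixels - tp
    present = gt_pixels > 0           # skip classes absent from the labels (assumption)
    iou = tp[present] / union[present]
    acc = tp[present] / gt_pixels[present]
    return float(iou.mean()), float(acc.mean())

# Hypothetical flattened label maps with three classes.
target = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
cm = confusion_matrix(pred, target, num_classes=3)
miou, mpa = miou_and_mpa(cm)
```

MPA only looks at how much of each ground-truth class is recovered, while MIoU also penalizes false positives, which is why the two are typically reported together.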


2020 ◽  
Vol 6 (6) ◽  
pp. 50
Author(s):  
Anthony Cioppa ◽  
Marc Braham ◽  
Marc Van Droogenbroeck

The method of Semantic Background Subtraction (SBS), which combines semantic segmentation and background subtraction, has recently emerged for the task of segmenting moving objects in video sequences. While SBS has been shown to improve background subtraction, a major difficulty is that it combines two streams generated at different frame rates. As a result, SBS operates at the slower frame rate of the two streams, usually that of the semantic segmentation algorithm. We present a method, referred to as “Asynchronous Semantic Background Subtraction” (ASBS), able to combine a semantic segmentation algorithm with any background subtraction algorithm asynchronously. It achieves performance close to that of SBS while operating at the fastest possible frame rate, that of the background subtraction algorithm. Our method consists in analyzing the temporal evolution of pixel features to replicate, where possible, the decisions previously enforced by semantics when no semantic information is computed. We showcase ASBS with several background subtraction algorithms and also add a feedback mechanism that feeds the background model of the background subtraction algorithm to upgrade its updating strategy and, consequently, enhance the decision. Experiments show that we systematically improve the performance, even when the semantic stream has a much slower frame rate than the background subtraction algorithm. In addition, we establish that, with the help of ASBS, a real-time background subtraction algorithm, such as ViBe, stays real time and competes with some of the best non-real-time unsupervised background subtraction algorithms such as SuBSENSE.
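The idea of replicating earlier semantic decisions on frames without semantics can be illustrated with a loose sketch. Everything specific here is an assumption made for illustration: the class and method names, the use of raw color as the pixel feature, the Manhattan color distance, the threshold `tau`, and the precedence of the background rule. The feedback mechanism mentioned in the abstract is not modeled.

```python
import numpy as np

NO_RULE, RULE_BG, RULE_FG = 0, 1, 2

class AsyncSemanticFallback:
    """Remembers per-pixel semantic decisions and reuses them while the pixel looks unchanged."""

    def __init__(self, shape, tau=10.0):
        self.rule = np.full(shape, NO_RULE, dtype=np.uint8)      # last semantic decision
        self.color = np.zeros(shape + (3,), dtype=np.float32)    # color seen at that time
        self.tau = tau                                           # hypothetical similarity threshold

    @staticmethod
    def _apply(rule, bgs_mask):
        out = bgs_mask.copy()
        out[rule == RULE_BG] = False   # semantics says background
        out[rule == RULE_FG] = True    # semantics says foreground
        return out                     # elsewhere, keep the background subtraction output

    def with_semantics(self, frame, bgs_mask, sem_bg, sem_fg):
        """Frame for which semantic masks are available: decide and memorize."""
        rule = np.full(bgs_mask.shape, NO_RULE, dtype=np.uint8)
        rule[sem_fg] = RULE_FG
        rule[sem_bg] = RULE_BG         # background rule wins on conflict (assumption)
        self.rule = rule
        self.color = frame.astype(np.float32)
        return self._apply(rule, bgs_mask)

    def without_semantics(self, frame, bgs_mask):
        """Frame without semantics: replicate a stored decision only if the color is similar."""
        dist = np.abs(frame.astype(np.float32) - self.color).sum(axis=-1)
        rule = np.where(dist <= self.tau, self.rule, NO_RULE)
        return self._apply(rule, bgs_mask)
```

Because `without_semantics` only compares colors and indexes arrays, it adds little cost per frame, which is consistent with the claim that the combined system can run at the frame rate of the background subtraction algorithm.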


Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7089
Author(s):  
Bushi Liu ◽  
Yongbo Lv ◽  
Yang Gu ◽  
Wanjun Lv

Owing to deep learning's accurate perception of the street environment, convolutional neural networks have developed dramatically in street-scene applications. To meet the needs of autonomous and assisted driving, computer vision is generally used to find obstacles and avoid collisions, which has made semantic segmentation a research priority in recent years. However, semantic segmentation continues to face new challenges. Complex network depth information, large datasets, and real-time requirements are typical problems that urgently need to be solved to realize autonomous driving technology. To address these problems, we propose an improved lightweight real-time semantic segmentation network, SP-ICNet, which is based on an efficient image cascading network (ICNet) architecture and uses multi-scale branches and a cascaded feature fusion unit to extract rich multi-level features. A spatial information network is designed to transmit more prior knowledge of spatial location and edge information. During training, we also append an external loss function to enhance the learning process of the network. This lightweight network can quickly perceive obstacles and detect roads in the drivable area from images, as autonomous driving requires. The proposed model performs strongly on the Cityscapes dataset. While maintaining real-time performance, several sets of experimental comparisons show that SP-ICNet enhances the accuracy of road obstacle detection and provides nearly ideal prediction outputs. Comparison with currently popular semantic segmentation networks also demonstrates the effectiveness of our lightweight network for road obstacle detection in autonomous driving.


Author(s):  
Houcheng Su ◽  
Bin Lin ◽  
Xiaoshuang Huang ◽  
Jiao Li ◽  
Kailin Jiang ◽  
...  

Colonoscopy is currently one of the main methods for the detection of rectal polyps, rectal cancer, and other diseases. With the rapid development of computer vision, deep learning-based semantic segmentation methods can be applied to the detection of medical lesions. However, it is challenging for current methods to detect polyps with high accuracy and real-time performance. To solve this problem, we propose a multi-branch feature fusion network (MBFFNet), which is an accurate real-time segmentation method for polyp detection in colonoscopy. First, we use UNet as the basis of our model architecture and adopt stepwise sampling with channel multiplication to integrate features, which decreases the number of FLOPs caused by stacking channels in UNet. Second, to improve model accuracy, we extract features from multiple layers and resize feature maps to the same size in different ways, such as up-sampling and pooling, to supplement information lost in multiplication-based up-sampling. Based on mIOU and Dice loss with cross entropy (CE), we conduct experiments in both CPU and GPU environments to verify the effectiveness of our model. The experimental results show that our proposed MBFFNet is superior to the selected baselines in terms of accuracy, model size, and FLOPs. mIOU, F score, and Dice loss with CE reached 0.8952, 0.9450, and 0.1602, respectively, which were better than those of UNet, UNet++, and other networks. Compared with UNet, the FLOP count decreased by 73.2%, and the number of parameters also decreased. The segmentation quality of MBFFNet is lower only than that of PraNet, while its number of parameters is 78.27% of that of PraNet and its FLOP count is 0.23% of that of PraNet. In addition, experiments on other types of medical tasks show that MBFFNet has good potential for general application in medical image segmentation.


2021 ◽  
pp. 193-205
Author(s):  
Tanmay Singha ◽  
Duc-Son Pham ◽  
Aneesh Krishna ◽  
Tom Gedeon

Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6264
Author(s):  
Xinyuan Tu ◽  
Jian Zhang ◽  
Runhao Luo ◽  
Kai Wang ◽  
Qingji Zeng ◽  
...  

We present a real-time Truncated Signed Distance Field (TSDF)-based three-dimensional (3D) semantic reconstruction method for LiDAR point clouds, which achieves incremental surface reconstruction and highly accurate semantic segmentation. High-precision 3D semantic reconstruction in real time on LiDAR data is important but challenging. Light Detection and Ranging (LiDAR) data are highly accurate but massive for 3D reconstruction. We therefore propose a line-of-sight algorithm to update the implicit surface incrementally. Meanwhile, in order to use more semantic information effectively, an online attention-based spatial and temporal feature fusion method is proposed, which is well integrated into the reconstruction system. We implement parallel computation in the reconstruction and semantic fusion process, which achieves real-time performance. We demonstrate our approach on the CARLA dataset, the Apollo dataset, and our own dataset. Compared with state-of-the-art mapping methods, our method has a clear advantage in both quality and speed, which meets the needs of robotic mapping and navigation.
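As background for the incremental surface update mentioned above, here is a minimal sketch of the standard weighted running-average TSDF integration step. It is not the paper's line-of-sight algorithm; the function name, the per-voxel signed distances passed in as `sdf_obs`, and the rule for skipping voxels behind the surface are assumptions chosen for illustration.

```python
import numpy as np

def integrate_tsdf(tsdf, weight, sdf_obs, trunc, obs_weight=1.0):
    """Fuse one observation into a TSDF volume with a weighted running average.

    tsdf, weight : current per-voxel values (same shape)
    sdf_obs      : signed distance of each voxel to the observed surface along the ray
    trunc        : truncation distance; stored values are normalized to [-1, 1]
    """
    d = np.clip(sdf_obs / trunc, -1.0, 1.0)
    # Voxels far behind the observed surface carry no information and are skipped.
    valid = sdf_obs > -trunc
    new_weight = weight + obs_weight * valid
    averaged = (weight * tsdf + obs_weight * d) / np.maximum(new_weight, 1e-12)
    fused = np.where(valid, averaged, tsdf)
    return fused, new_weight

# Three hypothetical voxels, updated with two observations.
tsdf = np.zeros(3)
weight = np.zeros(3)
tsdf, weight = integrate_tsdf(tsdf, weight, np.array([0.2, -0.05, -1.0]), trunc=0.1)
tsdf, weight = integrate_tsdf(tsdf, weight, np.array([0.0, 0.05, 0.0]), trunc=0.1)
```

The update touches each voxel independently, which is one reason TSDF fusion lends itself to the kind of parallel computation the abstract describes.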

