A Visual SLAM Robust against Dynamic Objects Based on Hybrid Semantic-Geometry Information

2021 ◽  
Vol 10 (10) ◽  
pp. 673
Author(s):  
Sheng Miao ◽  
Xiaoxiong Liu ◽  
Dazheng Wei ◽  
Changze Li

A visual localization approach that is robust to dynamic objects, based on hybrid semantic-geometry information, is presented. Traditional simultaneous localization and mapping (SLAM) systems can be corrupted by the interference of moving objects in real environments. To address this problem, we propose a static/dynamic image segmentation method that leverages semantic and geometric modules, including optical flow residual clustering, epipolar constraint checks, semantic segmentation, and outlier elimination. We integrated the proposed approach into the state-of-the-art ORB-SLAM2 system and evaluated its performance on both public datasets and a quadcopter platform. Experimental results demonstrate that the root-mean-square error of the absolute trajectory error improved, on average, by 93.63% on highly dynamic benchmarks compared with ORB-SLAM2. Thus, the proposed method can improve the performance of state-of-the-art SLAM systems in challenging scenarios.
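One of the geometric cues listed above, the epipolar constraint check, can be sketched as follows: a match in a static scene should lie close to the epipolar line induced by the fundamental matrix, so a large point-to-line distance suggests a moving object. The fundamental matrix `F`, the threshold, and the function names below are illustrative assumptions, not the paper's implementation.

```python
# Epipolar constraint check: a matched point p2 should lie near the
# epipolar line l = F @ x1 of its counterpart p1 (homogeneous coords).
# Static-scene matches satisfy this; points on moving objects usually
# violate it. F here is a hypothetical fundamental matrix.

def epipolar_distance(F, p1, p2):
    """Distance of p2 from the epipolar line l = F @ (p1, 1)."""
    x1 = (p1[0], p1[1], 1.0)
    l = [sum(F[i][j] * x1[j] for j in range(3)) for i in range(3)]
    a, b, c = l
    return abs(a * p2[0] + b * p2[1] + c) / max((a * a + b * b) ** 0.5, 1e-12)

def is_dynamic(F, p1, p2, thresh=1.0):
    """Flag a match as dynamic when it violates the epipolar constraint."""
    return epipolar_distance(F, p1, p2) > thresh
```

In a real pipeline the fundamental matrix would come from a robust estimator (e.g., RANSAC over putative matches), and the threshold would be tuned in pixels.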

Author(s):  
Yizhen Chen ◽  
Haifeng Hu

Most existing segmentation networks are built upon a “U-shaped” encoder–decoder structure, in which the multi-level features extracted by the encoder are gradually aggregated by the decoder. Although this structure has proven effective for improving segmentation performance, it has two main drawbacks. On the one hand, introducing low-level features significantly increases computation without an obvious performance gain. On the other hand, common feature-aggregation strategies such as addition and concatenation fuse features without considering the usefulness of each feature vector, mixing useful information with massive noise. In this article, we abandon the traditional “U-shaped” architecture and propose Y-Net, a dual-branch joint network for accurate semantic segmentation. Specifically, it aggregates only the low-resolution, high-level features and uses the global context guidance generated by the first branch to refine the second branch. The dual branches are effectively connected through a Semantic Enhancing Module, which can be regarded as a combination of spatial attention and channel attention. We also design a novel Channel-Selective Decoder (CSD) that adaptively integrates features from different receptive fields by assigning input-dependent channelwise weights. Y-Net breaks through the limits of single-branch networks and attains higher performance at lower computational cost than “U-shaped” structures. The proposed CSD better integrates useful information and suppresses interference noise. Comprehensive experiments on three public datasets evaluate the effectiveness of our method. Y-Net achieves state-of-the-art performance on the PASCAL VOC 2012, PASCAL Person-Part, and ADE20K datasets without pre-training on extra datasets.
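The input-dependent channelwise weighting attributed to the CSD can be illustrated with a minimal sketch: pool each channel globally and gate it through a sigmoid. The real CSD learns its weights; here the gate is derived directly from the pooled statistic, purely for illustration, and all names are hypothetical.

```python
import math

# Hypothetical sketch of input-dependent channel weighting in the spirit
# of the Channel-Selective Decoder: global-average-pool each channel,
# squash through a sigmoid gate, and rescale the channel by that weight.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_select(features):
    """features: list of channels, each an H x W grid. Returns a reweighted copy."""
    out = []
    for ch in features:
        pooled = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        w = sigmoid(pooled)  # input-dependent channel weight in (0, 1)
        out.append([[w * v for v in row] for row in ch])
    return out
```

A learned version would replace the identity mapping from pooled statistic to gate with a small fully connected layer, as in squeeze-and-excitation-style attention.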


Author(s):  
Douglas Coelho Braga de Oliveira ◽  
Rodrigo Luis de Souza da Silva

Augmented Reality (AR) systems based on the Simultaneous Localization and Mapping (SLAM) problem have received much attention in the last few years. SLAM enables AR applications in unprepared, i.e., markerless, environments. However, by eliminating the marker object, we lose the reference for virtual object projection and the main source of interaction between real and virtual elements. Recent literature includes works that integrate an object recognition system into SLAM so that recognized objects are incorporated into the map. In this work, we propose a novel optimization framework for an object-aware SLAM system capable of simultaneously estimating the positions of the camera and of moving objects in the map. In this way, we combine the advantages of both marker-based and SLAM-based methods. We implement the proposed framework on top of state-of-the-art SLAM software and demonstrate potential AR applications such as total occlusion of the marker object.


2021 ◽  
Vol 13 (9) ◽  
pp. 1610
Author(s):  
Dong Fu ◽  
Hao Xia ◽  
Yanyou Qiao

Simultaneous localization and mapping (SLAM) systems have generally been limited to static environments. Moving objects considerably reduce the localization accuracy of SLAM systems, rendering them unsuitable for several applications. Using a combined vision camera and inertial measurement unit (IMU) to separate moving and static objects in dynamic scenes, we improve the localization accuracy and adaptability of SLAM systems in these scenes. We develop a feature-point elimination algorithm that uses IMU data to discard matches on moving objects while retaining those on stationary objects. We also develop a second algorithm that validates the IMU data to prevent erroneous measurements from influencing image feature-point matching. We test the new algorithms on public datasets and in a real-world experiment. In terms of the root-mean-square error of the absolute pose error, the proposed method exhibited higher positioning accuracy on the public datasets than traditional algorithms. In the practical experiment, the closed-loop errors were lower than those of OKVIS-mono and VINS-mono by 50.17% and 56.91%, respectively. Thus, the proposed method effectively eliminates matched points on moving objects and achieves reliable feature-point matching.
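The IMU-based elimination idea can be sketched as follows: predict each feature's displacement from IMU-propagated camera motion and reject matches that disagree with the prediction. For simplicity, this hypothetical sketch reduces the IMU prediction to a single global flow vector, which is not the paper's formulation.

```python
# Hedged sketch: matches whose observed displacement deviates from the
# IMU-predicted flow likely sit on independently moving objects and are
# discarded; the rest are retained for pose estimation.

def filter_moving_matches(matches, imu_flow, tol=2.0):
    """matches: list of ((x1, y1), (x2, y2)) pairs.
    imu_flow: predicted global (dx, dy) displacement from IMU propagation."""
    static = []
    for (x1, y1), (x2, y2) in matches:
        dx, dy = x2 - x1, y2 - y1
        err = ((dx - imu_flow[0]) ** 2 + (dy - imu_flow[1]) ** 2) ** 0.5
        if err <= tol:
            static.append(((x1, y1), (x2, y2)))
    return static
```

A full system would predict a per-feature displacement (depending on depth and full rotation/translation) rather than one shared vector, and would validate the IMU data first, as the abstract describes.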


2020 ◽  
Author(s):  
Guoliang Liu

Visual simultaneous localization and mapping (SLAM) is the core of intelligent robot navigation systems. Many traditional SLAM algorithms assume that the scene is static. When a dynamic object appears in the environment, the accuracy of visual SLAM can degrade due to the interference of dynamic features on moving objects. This strong assumption limits SLAM applications for service robots and driverless cars in real dynamic environments. In this paper, a dynamic-object removal algorithm that combines object recognition and optical flow techniques is proposed within a visual SLAM framework for dynamic scenes. Experimental results show that our new method can detect moving objects effectively and improve SLAM performance compared to state-of-the-art methods.
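A hedged sketch of combining recognition with optical flow: drop a feature when it lies inside a recognized movable-object box and its flow residual (relative to the median background flow) is large. The thresholds and names below are illustrative, not taken from the paper.

```python
# Combine two cues per feature: (1) does it fall inside a detected
# movable-object bounding box, and (2) does its optical flow deviate
# from the dominant (median) background flow? Only features failing
# both tests are removed before SLAM tracking.

def median(vals):
    s = sorted(vals)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])

def remove_dynamic_features(feats, flows, boxes, resid_thresh=1.0):
    """feats: [(x, y)]; flows: [(dx, dy)] per feature;
    boxes: [(x0, y0, x1, y1)] for recognized movable objects."""
    med = (median([f[0] for f in flows]), median([f[1] for f in flows]))
    kept = []
    for (x, y), (dx, dy) in zip(feats, flows):
        in_box = any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in boxes)
        resid = ((dx - med[0]) ** 2 + (dy - med[1]) ** 2) ** 0.5
        if not (in_box and resid > resid_thresh):
            kept.append((x, y))
    return kept
```

Requiring both cues avoids discarding features on recognized but currently stationary objects (e.g., a parked car), which pure recognition-based masking would lose.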


Sensors ◽  
2021 ◽  
Vol 21 (21) ◽  
pp. 7316
Author(s):  
Bo Zhong ◽  
Jiang Du ◽  
Minghao Liu ◽  
Aixia Yang ◽  
Junjun Wu

Semantic segmentation for high-resolution remote-sensing imagery (HRRSI) has become increasingly popular in machine vision in recent years. Most state-of-the-art methods for semantic segmentation of HRRSI emphasize the strong learning ability of deep convolutional neural networks to model contextual relationships in the image; they attend to every individual pixel and can consequently overlearn. Annotation errors and easily confused features can also cause class confusion in such pixel-based methods. We therefore propose a new semantic segmentation network, the region-enhancing network (RE-Net), which emphasizes regional information instead of pixels to address these problems. RE-Net introduces regional information into the base network to enhance the regional integrity of images and thus reduce misclassification. Specifically, the regional context learning procedure (RCLP) learns context relationships from the perspective of regions, and the region correcting procedure (RCP) uses region-aggregated pixel features to recalibrate the pixel features in each region. In addition, a simple intra-network multi-scale attention module selects features at different scales according to region size. Extensive comparative experiments on four public datasets demonstrate that the proposed RE-Net outperforms most state-of-the-art methods.
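The recalibration idea behind the region correcting procedure can be illustrated with a minimal sketch: pool a feature over each region and pull every pixel toward its region mean, enhancing regional integrity. Scalar per-pixel features, the blend factor `alpha`, and the function name are simplifying assumptions, not the paper's exact RCP.

```python
# Region-based recalibration sketch: compute the mean feature of each
# region (given a region-id map), then blend every pixel's feature toward
# its region mean, smoothing within-region outliers.

def region_correct(feat, regions, alpha=0.5):
    """feat, regions: H x W grids (features and region ids).
    Returns features blended toward their region means by factor alpha."""
    sums, counts = {}, {}
    for frow, rrow in zip(feat, regions):
        for f, r in zip(frow, rrow):
            sums[r] = sums.get(r, 0.0) + f
            counts[r] = counts.get(r, 0) + 1
    means = {r: sums[r] / counts[r] for r in sums}
    return [[(1 - alpha) * f + alpha * means[r] for f, r in zip(frow, rrow)]
            for frow, rrow in zip(feat, regions)]
```

In the actual network the features are multi-channel and the aggregation/recalibration is learned, but the effect, suppressing pixel-level noise in favor of region-level consistency, is the same.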


2021 ◽  
Vol 17 (4) ◽  
pp. 155014772110141
Author(s):  
Xuerong Cui ◽  
Shengjie Xue ◽  
Juan Li ◽  
Shibao Li ◽  
Jianhang Liu ◽  
...  

In the past decades, emerging technologies such as unmanned driving and indoor navigation have developed rapidly, with simultaneous localization and mapping (SLAM) playing an essential role as a core technology. However, dynamic objects in complex environments degrade positioning accuracy. To reduce the influence of dynamic objects, this article proposes an improved SLAM algorithm combined with a semantic segmentation model. First, in the pre-processing stage, a fully convolutional network (FCN) model locates dynamic objects; the output is then masked and fused with the input to obtain a final image free of dynamic-object features. Second, in the feature-processing stage, three steps are improved to reduce computational complexity: extracting, matching, and eliminating mismatched feature points. Experiments show that absolute trajectory accuracy in highly dynamic scenes improves by 48.58% on average, while average processing time is reduced by 21.84%.
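The mask-and-fuse pre-processing step can be sketched minimally: given a segmentation mask of dynamic objects, zero out the corresponding pixels so that later feature extraction finds no texture there. The grid representation and function name are illustrative assumptions.

```python
# Mask fusion sketch: pixels flagged as dynamic by the segmentation model
# (mask value 1) are zeroed in the image, so downstream feature detectors
# extract no keypoints from dynamic regions.

def mask_dynamic(image, mask):
    """image, mask: H x W grids; returns the image with dynamic pixels zeroed."""
    return [[0 if m else px for px, m in zip(irow, mrow)]
            for irow, mrow in zip(image, mask)]
```

In practice the mask is usually dilated first, so that feature points on the boundary of a moving object are also suppressed.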


Sensors ◽  
2019 ◽  
Vol 19 (20) ◽  
pp. 4545
Author(s):  
Ning Zhang ◽  
Yongjia Zhao

When the camera moves quickly and the image is blurred, or when texture in the scene is missing, point-feature-based Simultaneous Localization and Mapping (SLAM) algorithms struggle to track enough effective feature points; positioning accuracy and robustness suffer, and the system may even fail entirely. To address this problem, we propose a monocular visual odometry algorithm based on point and line features that incorporates IMU measurement data. An environmental feature map with geometric information is constructed, and the IMU measurements provide prior and scale information for the visual localization algorithm. The initial pose estimate is obtained from the motion estimation of sparse image alignment, and feature alignment is then performed to obtain sub-pixel-level feature correspondences. Finally, more accurate poses and 3D landmarks are obtained by minimizing the re-projection errors of local map points and lines. Experimental results on the EuRoC public datasets show that the proposed algorithm outperforms the Open Keyframe-based Visual-Inertial SLAM (OKVIS-mono) and Oriented FAST and Rotated BRIEF SLAM (ORB-SLAM) algorithms in both accuracy and speed.
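The line part of the point-line re-projection error can be illustrated generically: project a 3D line's endpoints with a pinhole model and measure their distances to the observed 2D line. This is a common line-residual formulation in normalized coordinates, not necessarily the paper's exact parameterization.

```python
# Line re-projection residual sketch: a 3D line segment is projected via a
# unit-focal pinhole model; the residual is the distance of each projected
# endpoint to the observed image line a*x + b*y + c = 0. Minimizing these
# residuals over poses/landmarks is the optimization the abstract describes.

def project(pt3d, f=1.0):
    """Pinhole projection of a 3D point in camera coordinates (z > 0)."""
    x, y, z = pt3d
    return (f * x / z, f * y / z)

def line_residual(endpoints3d, obs_line):
    """Distances of the projected endpoints to the observed line (a, b, c)."""
    a, b, c = obs_line
    norm = (a * a + b * b) ** 0.5
    return [abs(a * u + b * v + c) / norm
            for u, v in (project(p) for p in endpoints3d)]
```

A solver such as Gauss-Newton or Levenberg-Marquardt would stack these residuals with the point re-projection errors and minimize them jointly.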


2019 ◽  
Vol 9 (4) ◽  
pp. 662
Author(s):  
Weinan Chen ◽  
Lei Zhu ◽  
Li He ◽  
Yisheng Guan ◽  
Hong Zhang

The reliability of visual tracking and mapping is a challenging problem in robotics research, and it greatly limits the adoption of vision-based mobile robot applications. In this paper, we propose to improve the reliability of visual exploration in terms of its fault tolerance. Our visual exploration system comprises three modules: visual localization and mapping, an active controller, and a termination condition. High maintainability of mapping is obtained through a submap-based visual mapping module, persistent driving is achieved by a semantic-segmentation-based active controller, and robust re-localization is guaranteed by a novel completeness evaluation method in the termination condition. All modules are tightly integrated to maintain mapping and improve visual tracking. The system is verified in simulations and real-world experiments, and each fault-tolerance solution is shown to overcome failure conditions of visual tracking and mapping.


2021 ◽  
Vol 40 (3) ◽  
pp. 1-13
Author(s):  
Lumin Yang ◽  
Jiajie Zhuang ◽  
Hongbo Fu ◽  
Xiangzhi Wei ◽  
Kun Zhou ◽  
...  

We introduce SketchGNN, a convolutional graph neural network for semantic segmentation and labeling of freehand vector sketches. We treat an input stroke-based sketch as a graph, with nodes representing points sampled along the input strokes and edges encoding the stroke structure. To predict per-node labels, SketchGNN uses graph convolution and a static-dynamic branching network architecture to extract features at three levels: point-level, stroke-level, and sketch-level. SketchGNN significantly improves on the accuracy of state-of-the-art methods for semantic sketch segmentation (by 11.2% in the pixel-based metric and 18.2% in the component-based metric on the large-scale, challenging SPG dataset) and has orders of magnitude fewer parameters than both image-based and sequence-based methods.
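The stroke-to-graph construction described above can be sketched simply: sampled points become nodes and consecutive points on each stroke become edges. This minimal version is an assumption about the construction and ignores any additional edge types or node features SketchGNN may use.

```python
# Build a graph from a stroke-based sketch: each stroke contributes its
# sampled points as nodes, and consecutive points on the same stroke are
# connected by an edge, encoding the stroke structure. Node ids are global
# indices into the flat node list.

def sketch_to_graph(strokes):
    """strokes: list of point lists, one per stroke. Returns (nodes, edges)."""
    nodes, edges = [], []
    for stroke in strokes:
        start = len(nodes)
        nodes.extend(stroke)
        edges.extend((start + i, start + i + 1) for i in range(len(stroke) - 1))
    return nodes, edges
```

Per-node labels predicted by a graph network over this structure can then be aggregated per stroke, e.g., by majority vote, to obtain stroke-level segmentation.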

