Elimination of Unnecessary Feature Points by Semantic Segmentation Aimed at Improving the Accuracy of Feature-Based SLAM

Author(s):  
Tomoki HAKOTANI ◽  
Ryota SAKUMA ◽  
Masanao KOEDA ◽  
Akihiro HAMADA ◽  
Atsuro SAWADA ◽  
...

2020 ◽
Vol 9 (4) ◽  
pp. 202
Author(s):  
Junhao Cheng ◽  
Zhi Wang ◽  
Hongyan Zhou ◽  
Li Li ◽  
Jian Yao

Most Simultaneous Localization and Mapping (SLAM) methods assume that environments are static. Such a strong assumption limits the application of most visual SLAM systems, because dynamic objects cause many wrong data associations during the SLAM process. To address this problem, this paper proposes DM-SLAM, a novel visual SLAM method that follows the pipeline of feature-based methods. DM-SLAM combines an instance segmentation network with optical flow information to improve localization accuracy in dynamic environments, and it supports monocular, stereo, and RGB-D sensors. It consists of four modules: semantic segmentation, ego-motion estimation, dynamic point detection, and a feature-based SLAM framework. The semantic segmentation module obtains pixel-wise segmentation results for potentially dynamic objects, and the ego-motion estimation module calculates the initial pose. In the third module, two different strategies are used to detect dynamic feature points, one for the RGB-D/stereo case and one for the monocular case. In the first case, feature points with depth information are reprojected to the current frame, and the reprojection offset vectors are used to distinguish dynamic points. In the monocular case, the epipolar constraint is used instead. The remaining static feature points are then fed into the fourth module. Experimental results on the public TUM and KITTI datasets demonstrate that DM-SLAM outperforms standard visual SLAM baselines in terms of accuracy in highly dynamic environments.
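As a concrete illustration of the monocular strategy, the sketch below flags matched feature points that stray from their epipolar lines; the function and parameter names are illustrative, not taken from the DM-SLAM code.

```python
import numpy as np
import cv2

def flag_dynamic_points(pts_prev, pts_curr, thresh_px=1.0):
    """Flag matched feature points that violate the epipolar constraint
    between the previous and current frame (monocular case)."""
    pts_prev = np.float32(pts_prev)
    pts_curr = np.float32(pts_curr)
    # Fundamental matrix from all matches; RANSAC tolerates the dynamic ones.
    F, _ = cv2.findFundamentalMat(pts_prev, pts_curr, cv2.FM_RANSAC, 1.0, 0.99)
    # Epipolar line in the current image for each previous-frame point.
    lines = cv2.computeCorrespondEpilines(pts_prev.reshape(-1, 1, 2), 1, F)
    lines = lines.reshape(-1, 3)
    # Point-to-line distance |ax + by + c| / sqrt(a^2 + b^2).
    ones = np.ones((len(pts_curr), 1), dtype=np.float32)
    num = np.abs(np.sum(lines * np.hstack([pts_curr, ones]), axis=1))
    dist = num / np.linalg.norm(lines[:, :2], axis=1)
    return dist > thresh_px  # True -> candidate dynamic point
```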


2018 ◽  
Vol 10 (6) ◽  
pp. 964 ◽  
Author(s):  
Zhenfeng Shao ◽  
Ke Yang ◽  
Weixun Zhou

Benchmark datasets are essential for developing and evaluating remote sensing image retrieval (RSIR) approaches. However, most existing datasets are single-labeled: each image is annotated with a single label representing its most significant semantic content. This is sufficient for simple problems, such as distinguishing between a building and a beach, but multiple labels, and sometimes even dense (pixel-level) labels, are required for more complex problems such as RSIR and semantic segmentation. We therefore extended an existing multi-labeled dataset collected for multi-label RSIR and present a dense labeling remote sensing dataset termed "DLRSD". DLRSD contains a total of 17 classes, and the pixels of each image are assigned one of the 17 predefined labels. We used DLRSD to evaluate the performance of RSIR methods ranging from traditional handcrafted-feature-based methods to deep learning-based ones. More specifically, we evaluated RSIR methods from both single-label and multi-label perspectives. The results demonstrate the advantages of multiple labels over single labels for interpreting complex remote sensing images. DLRSD provides the literature with a benchmark for RSIR and other pixel-based problems such as semantic segmentation.
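For illustration, one simple way to grade retrieval relevance on a multi-labeled dataset like DLRSD is the Jaccard similarity between binary label vectors; this is a hedged sketch, not necessarily the exact metric used in the paper.

```python
import numpy as np

def label_similarity(labels_q, labels_r):
    """Jaccard similarity between two 17-dim binary label vectors,
    a common way to grade retrieval relevance on a multi-labeled
    dataset (illustrative, not the paper's exact metric)."""
    inter = np.logical_and(labels_q, labels_r).sum()
    union = np.logical_or(labels_q, labels_r).sum()
    return inter / union if union else 0.0

# Example: query shares two of the candidate's three labels -> 2/3.
query = np.zeros(17, dtype=bool); query[[2, 9]] = True
cand  = np.zeros(17, dtype=bool); cand[[2, 9, 14]] = True
print(label_similarity(query, cand))
```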


2008 ◽  
Vol 38 (9) ◽  
pp. 945-961 ◽  
Author(s):  
Ho Lee ◽  
Jeongjin Lee ◽  
Namkug Kim ◽  
Sang Joon Kim ◽  
Yeong Gil Shin

Author(s):  
D. Tosic ◽  
S. Tuttas ◽  
L. Hoegner ◽  
U. Stilla

Abstract. This work proposes an approach for semantic classification of an outdoor-scene point cloud acquired with a high-precision Mobile Mapping System (MMS), with the major goal of contributing to the automatic creation of High Definition (HD) Maps. Automatic point labeling is achieved by combining a feature-based approach for semantic classification of point clouds with a deep learning approach for semantic segmentation of images. Both point cloud data and data from a multi-camera system are used to gain spatial information about an urban scene. Two types of classification are applied: 1) a feature-based approach, in which the point cloud is organized into a supervoxel structure to capture the geometric characteristics of points. Several geometric features are extracted to appropriately represent the local geometry, the effect of local tendency is removed for each supervoxel to enhance the distinction between similar structures, and lastly the Random Forests (RF) algorithm is applied in the classification phase to assign labels to supervoxels and therefore to the points within them. 2) A deep learning approach, employed for semantic segmentation of MMS images of the same scene using an implementation of the Pyramid Scene Parsing Network. The resulting segmented images, in which each pixel carries a class label, are projected onto the point cloud, enabling label assignment for each point. Finally, experimental results are presented for a complex urban scene, and the performance of the method is evaluated on a manually labeled dataset, for the deep learning and feature-based classification individually as well as for the fusion of the labels. The achieved overall accuracy with the fused output is 0.87 on the final test set, which significantly outperforms the results of the individual methods on the same point cloud. The labeled data is published on the TUM-PF Semantic-Labeling-Benchmark.
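A minimal sketch of the feature-based branch follows, assuming supervoxels are already extracted: eigenvalue-based shape descriptors per supervoxel feed a Random Forests classifier. The feature set here is deliberately reduced, and the toy data are synthetic; the paper uses a richer feature set plus local-tendency removal.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def geometric_features(points):
    """Eigenvalue-based shape descriptors (linearity, planarity, sphericity)
    for one supervoxel given as an Nx3 array of points."""
    cov = np.cov((points - points.mean(axis=0)).T)
    l1, l2, l3 = sorted(np.linalg.eigvalsh(cov), reverse=True)
    return np.array([(l1 - l2) / l1, (l2 - l3) / l1, l3 / l1])

# Toy supervoxels: a pole-like (linear) and a facade-like (planar) point set.
rng = np.random.default_rng(0)
pole = rng.normal(size=(200, 1)) * np.array([0.02, 0.02, 1.0])
facade = rng.normal(size=(200, 2)) @ np.array([[1., 0., 0.], [0., 0., 1.]]) \
         + rng.normal(scale=0.02, size=(200, 3))
X = np.vstack([geometric_features(pole), geometric_features(facade)])
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X, ["pole", "facade"])   # one label per supervoxel
print(rf.predict(X))
```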


2021 ◽  
pp. 335-344
Author(s):  
Yusong Chen ◽  
Changxing Geng ◽  
Yong Wang ◽  
Guofeng Zhu ◽  
Renyuan Shen

For the extraction of the paddy rice seedling row centerline, this study proposed a method based on the Fast-SCNN (Fast Segmentation Convolutional Neural Network) semantic segmentation network. By training the Fast-SCNN network, the optimal model was selected to separate the seedlings from the image. Feature points were extracted using the FAST (Features from Accelerated Segment Test) corner detection algorithm after pre-processing of the original images. All outer contours of the segmentation results were extracted, and feature point classification was carried out based on the extracted outer contours. For each class of points, a Hough transform based on known points was used to fit the seedling row centerline. Experiments verified that this algorithm is highly robust at each growth stage within three weeks after transplanting. On a 1280×1024-pixel PNG-format color image, the accuracy of the algorithm is 95.9% and the average time per frame is 158 ms, which meets the real-time requirement of visual navigation in paddy fields.
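A hedged sketch of the corner-detection and contour steps, assuming the binary seedling mask from the trained Fast-SCNN model is available; cv2.fitLine stands in for the paper's point-constrained Hough transform.

```python
import cv2
import numpy as np

# `seg` stands in for the binary seedling mask produced by the trained
# Fast-SCNN model (a placeholder array here; in practice, the network output).
seg = np.zeros((1024, 1280), dtype=np.uint8)

# FAST corner detection on the segmented image, as in the pipeline above.
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
pts = np.float32([kp.pt for kp in fast.detect(seg, None)])

# Outer contours of the segmentation, used to classify feature points by row.
contours, _ = cv2.findContours(seg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# The paper fits each row's centerline with a Hough transform constrained by
# known points; cv2.fitLine is a simpler stand-in for one class of points.
if len(pts) >= 2:
    vx, vy, x0, y0 = cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01).ravel()
```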


Robotica ◽  
2014 ◽  
Vol 34 (9) ◽  
pp. 1923-1947 ◽  
Author(s):  
Salam Dhou ◽  
Yuichi Motai

SUMMARY
An efficient method for tracking a target using a single Pan-Tilt-Zoom (PTZ) camera is proposed. The proposed Scale-Invariant Optical Flow (SIOF) method estimates the motion of the target and rotates the camera accordingly to keep the target at the center of the image. SIOF also estimates the scale of the target and adjusts the focal length accordingly to change the Field of View (FoV) and keep the target appearing at the same size in all captured frames. SIOF is a feature-based tracking method. The feature points used are extracted and tracked using Optical Flow (OF) and the Scale-Invariant Feature Transform (SIFT); they are combined in groups and used to achieve robust tracking. The feature points in these groups are used within a twist model to recover the 3D free motion of the target. The merits of the proposed method are (i) an efficient scale-invariant tracking method that tracks the target and keeps it in the FoV of the camera at the same size, and (ii) tracking with prediction and correction to speed up the PTZ control and achieve smooth camera control. Experiments were performed on online video streams and validated the efficiency of the proposed SIOF compared with OF, SIFT, and other tracking methods. The proposed SIOF has around 36% less average tracking error and around 70% less tracking overshoot than OF.
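The sketch below illustrates the optical flow half of such a tracker: pyramidal Lucas-Kanade point tracking with a crude scale estimate from the change in point spread. It is a simplified analogue of SIOF, omitting the SIFT grouping and the twist-model motion recovery.

```python
import cv2
import numpy as np

def track_with_scale(prev_gray, curr_gray, prev_pts):
    """Track points with pyramidal Lucas-Kanade optical flow and estimate
    the target's scale change from the change in point spread.
    `prev_pts` is a float32 array of shape (N, 1, 2)."""
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                   prev_pts, None)
    ok = status.ravel() == 1
    good_prev = prev_pts[ok].reshape(-1, 2)
    good_curr = curr_pts[ok].reshape(-1, 2)
    # Scale: ratio of mean point distances from the centroid in each frame.
    spread = lambda p: np.mean(np.linalg.norm(p - p.mean(axis=0), axis=1))
    scale = spread(good_curr) / spread(good_prev)
    # Centroid translation drives pan/tilt; `scale` drives zoom.
    motion = good_curr.mean(axis=0) - good_prev.mean(axis=0)
    return good_curr, scale, motion
```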


2017 ◽  
Vol 2017 ◽  
pp. 1-15 ◽  
Author(s):  
Roziana Ramli ◽  
Mohd Yamani Idna Idris ◽  
Khairunnisa Hasikin ◽  
Noor Khairiah A. Karim ◽  
Ainuddin Wahid Abdul Wahab ◽  
...  

Retinal image registration is important for assisting diagnosis and monitoring retinal diseases such as diabetic retinopathy and glaucoma. However, registering retinal images for various registration applications requires the detection and distribution of feature points on low-quality regions that consist of vessels of varying contrast and size. A recent feature detector known as Saddle detects feature points on vessels that are poorly distributed and densely positioned on strong-contrast vessels. Therefore, we propose a multiresolution difference of Gaussian pyramid with the Saddle detector (D-Saddle) to detect feature points on low-quality regions consisting of vessels of varying contrast and size. D-Saddle is tested on the Fundus Image Registration (FIRE) dataset, which consists of 134 retinal image pairs. Experimental results show that D-Saddle successfully registered 43% of the retinal image pairs with an average registration accuracy of 2.329 pixels, while lower success rates are observed for the four other state-of-the-art retinal image registration methods GDB-ICP (28%), Harris-PIIFD (4%), H-M (16%), and Saddle (16%). Furthermore, the registration accuracy of D-Saddle has the weakest Spearman correlation with the intensity uniformity metric among all methods. Finally, a paired t-test shows that D-Saddle significantly improved the overall registration accuracy over the original Saddle.
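A minimal sketch of a multiresolution difference-of-Gaussian pyramid like the one D-Saddle builds is shown below; the Saddle detector itself is not shipped with OpenCV, so its per-level invocation is left as a comment.

```python
import cv2

def dog_pyramid(image, levels=4, sigma1=1.0, sigma2=2.0):
    """Multiresolution difference-of-Gaussian pyramid: at each octave,
    subtract two Gaussian blurs, then downsample. A detector (here,
    Saddle in the paper) would then run on each DoG level."""
    pyramid = []
    current = image
    for _ in range(levels):
        g1 = cv2.GaussianBlur(current, (0, 0), sigma1)
        g2 = cv2.GaussianBlur(current, (0, 0), sigma2)
        pyramid.append(cv2.subtract(g1, g2))
        # run_saddle_detector(pyramid[-1])  # not available in OpenCV
        current = cv2.pyrDown(current)
    return pyramid
```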


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4020
Author(s):  
Keon-woo Park ◽  
Yoo-Jeong Shim ◽  
Myeong-jin Lee

In this paper, we propose a semantic segmentation-based static video stitching method to reduce parallax and misalignment distortion for sports stadium scenes with dynamic foreground objects. First, video frame pairs for stitching are divided into segments of different classes through semantic segmentation. Region-based stitching is performed on matched segment pairs, assuming that segments of the same semantic class lie on the same plane. Second, to prevent degradation of the stitching quality for plain or noisy videos, the homography for each matched segment pair is estimated using temporally consistent feature points. Finally, the stitched video frame is synthesized by stacking the stitched matched segment pairs and the foreground segments onto the reference frame plane in descending order of area. The performance of the proposed method is evaluated by comparing the subjective quality, geometric distortion, and pixel distortion of video sequences stitched using the proposed and conventional methods. The proposed method is shown to reduce parallax and misalignment distortion in segments with plain texture or large parallax, and to significantly improve geometric distortion and pixel distortion compared to conventional methods.
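As a sketch of the second step, the function below estimates one homography per matched segment pair from correspondences restricted to the two segment masks; the temporal-consistency filtering of feature points is omitted, and all names are illustrative.

```python
import cv2
import numpy as np

def segment_homography(kp1, kp2, matches, seg_mask1, seg_mask2):
    """Estimate a homography for one matched segment pair using only
    correspondences whose endpoints fall inside both segment masks
    (simplified: the paper additionally keeps only temporally
    consistent feature points)."""
    src, dst = [], []
    for m in matches:
        p1 = kp1[m.queryIdx].pt
        p2 = kp2[m.trainIdx].pt
        if seg_mask1[int(p1[1]), int(p1[0])] and seg_mask2[int(p2[1]), int(p2[0])]:
            src.append(p1)
            dst.append(p2)
    if len(src) < 4:
        return None  # not enough correspondences on this plane
    H, _ = cv2.findHomography(np.float32(src), np.float32(dst), cv2.RANSAC, 3.0)
    return H
```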


2021 ◽  
Vol 11 (23) ◽  
pp. 11201
Author(s):  
Roziana Ramli ◽  
Khairunnisa Hasikin ◽  
Mohd Yamani Idna Idris ◽  
Noor Khairiah A. Karim ◽  
Ainuddin Wahid Abdul Wahab

The feature-based retinal fundus image registration (RIR) technique aligns fundus images according to geometrical transformations estimated between feature point correspondences. To ensure accurate registration, the extracted feature points must lie on the retinal vessels and be spread throughout the image. However, noise in the fundus image may resemble retinal vessels in local patches. Therefore, this paper introduces a feature extraction method based on a local feature of retinal vessels (CURVE) that incorporates retinal vessel and noise characteristics to accurately extract feature points on retinal vessels throughout the fundus image. CURVE's performance is tested on the CHASE, DRIVE, HRF and STARE datasets and compared with six feature extraction methods used in existing feature-based RIR techniques. In the experiment, the feature extraction accuracy of CURVE (86.021%) significantly outperformed the existing feature extraction methods (p ≤ 0.001*). CURVE is then paired with a scale-invariant feature transform (SIFT) descriptor to test its registration capability on the fundus image registration (FIRE) dataset. Overall, CURVE-SIFT successfully registered 44.030% of the image pairs, while the existing feature-based RIR techniques (GDB-ICP, Harris-PIIFD, Ghassabi’s-SIFT, H-M 16, H-M 17 and D-Saddle-HOG) registered less than 27.612% of the image pairs. A one-way ANOVA analysis showed that CURVE-SIFT significantly outperformed GDB-ICP (p = 0.007*), Harris-PIIFD, Ghassabi’s-SIFT, H-M 16, H-M 17 and D-Saddle-HOG (p ≤ 0.001*).
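CURVE itself is a custom detector, but pairing any detector's keypoints with SIFT descriptors follows OpenCV's standard detect/compute split, sketched below with FAST as a placeholder detector and a synthetic image standing in for a fundus photograph.

```python
import cv2
import numpy as np

# Placeholder image; in practice, a grayscale fundus photograph.
img = np.zeros((512, 512), dtype=np.uint8)

# CURVE is a custom detector, so any cv2.KeyPoint list can stand in; here
# FAST plays that role, and SIFT computes descriptors at the detected points,
# mirroring the detector/descriptor pairing of CURVE-SIFT.
keypoints = cv2.FastFeatureDetector_create().detect(img, None)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.compute(img, keypoints)

# Descriptor sets from an image pair are then matched (e.g., BFMatcher with
# Lowe's ratio test) to obtain the correspondences used to estimate the
# geometric transformation for registration.
```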


Author(s):  
A. Hanel ◽  
U. Stilla

Abstract. Environment-observing vehicle camera self-calibration using a structure from motion (SfM) algorithm allows calibration over the vehicle lifetime without special calibration objects needing to be present in the calibration images. Scene-specific problems with feature-based correspondence search and reconstruction during the SfM pipeline might be caused by critical objects such as moving objects, poorly textured objects, or reflective objects, and might negatively influence camera calibration. In this contribution, a method that uses semantic road scene knowledge by means of semantic masks in a semantic-guided SfM algorithm is proposed to make the calibration more robust. Semantic masks are used to exclude image parts showing critical objects from feature extraction, where the semantic knowledge is obtained by semantic segmentation of the road scene images. The proposed method is tested with an image sequence recorded in a suburban road scene. It is shown that semantic guidance leads to smaller deviations of the estimated interior orientation and distortion parameters from reference values obtained by test field calibration, compared to a standard SfM algorithm.
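A hedged sketch of the semantic guidance step: features are detected only outside image regions labeled as critical classes. The class IDs and the use of ORB are assumptions for illustration; the paper's pipeline feeds masked feature extraction into a full SfM reconstruction.

```python
import cv2
import numpy as np

def masked_features(image, semantic_labels, critical_ids=(10, 11, 13)):
    """Detect features only outside critical objects (e.g., moving vehicles
    or reflective windows). `semantic_labels` is the per-pixel class map
    from a segmentation network; the class IDs here are placeholders."""
    keep = ~np.isin(semantic_labels, critical_ids)
    mask = keep.astype(np.uint8) * 255
    orb = cv2.ORB_create(nfeatures=2000)
    # OpenCV detectors accept a mask: features land only where mask != 0.
    keypoints, descriptors = orb.detectAndCompute(image, mask)
    return keypoints, descriptors
```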

