Optimal Target Placement Strategy for Improved Pose Estimation Accuracy

2006 ◽  
Vol 52 (2) ◽  
pp. 69-74
Author(s):  
G. Okouneva ◽  
K. Behdinan ◽  
K. Shahid
Author(s):  
CHENGGUANG ZHU ◽  
zhongpai Gao ◽  
Jiankang Zhao ◽  
Haihui Long ◽  
Chuanqi Liu

Abstract: Relative pose estimation of a space noncooperative target is an attractive yet challenging task owing to the complexity of the target background and illumination and the lack of a priori knowledge, all of which severely degrade the estimation accuracy and the robustness of filter algorithms. In response, this paper proposes a novel filter algorithm, based on a stereovision system, that estimates the relative pose with improved robustness. First, to obtain a coarse relative pose, the weighted total least squares (WTLS) algorithm is adopted to estimate the relative pose from several feature points; the result is fed into the subsequent filter scheme as the observation. Second, the classic Bayes filter is exploited to estimate the relative state except for the moment-of-inertia ratios, and the one-step prediction results are fed back to initialize WTLS. The proposed algorithm eliminates the dependency on continuous tracking of several fixed points. Finally, comparison experiments demonstrate that the proposed algorithm performs better in terms of robustness and convergence time.
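The abstract does not give the WTLS formulation, but the idea of fitting a pose to weighted feature points can be illustrated with a minimal 2D analogue: a closed-form weighted least-squares rigid fit (rotation plus translation), where per-point weights down-weight unreliable detections. This is a sketch of the general technique, not the paper's algorithm.

```python
import math

def weighted_rigid_fit_2d(src, dst, w):
    """Closed-form weighted least-squares 2D rigid fit (rotation + translation).

    src, dst: lists of (x, y) feature points; w: per-point weights.
    Illustrative stand-in for a weighted pose-from-points step.
    """
    W = sum(w)
    # Weighted centroids of both point sets.
    cs = (sum(wi * p[0] for wi, p in zip(w, src)) / W,
          sum(wi * p[1] for wi, p in zip(w, src)) / W)
    cd = (sum(wi * p[0] for wi, p in zip(w, dst)) / W,
          sum(wi * p[1] for wi, p in zip(w, dst)) / W)
    # Optimal rotation angle from weighted dot/cross correlations.
    sxx = sxy = 0.0
    for wi, p, q in zip(w, src, dst):
        ax, ay = p[0] - cs[0], p[1] - cs[1]
        bx, by = q[0] - cd[0], q[1] - cd[1]
        sxx += wi * (ax * bx + ay * by)
        sxy += wi * (ax * by - ay * bx)
    theta = math.atan2(sxy, sxx)
    # Translation maps the rotated source centroid onto the destination centroid.
    c, s = math.cos(theta), math.sin(theta)
    tx = cd[0] - (c * cs[0] - s * cs[1])
    ty = cd[1] - (s * cs[0] + c * cs[1])
    return theta, (tx, ty)
```

With equal weights this reduces to the ordinary least-squares rigid fit; weights let noisy feature points contribute less.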


Author(s):  
Zhihui Yang ◽  
Xiangyu Tang ◽  
Lijuan Zhang ◽  
Zhiling Yang

Human pose estimation can be used in action recognition, video surveillance, and other fields, and has received considerable attention. Since the flexibility of human joints and environmental factors greatly influence pose estimation accuracy, related research faces many challenges. In this paper, we incorporate pyramid convolution and an attention mechanism into the residual block and introduce a hybrid structure model that jointly applies the local and global information of the image to keypoint detection. In addition, the improved model adopts grouped convolution, and the attention module used is lightweight, which reduces the computational cost of the network. Experiments on the MS COCO human-body keypoint detection dataset show that, compared with the Simple Baseline model, our model is similar in parameters and GFLOPs (giga floating-point operations) but achieves better detection accuracy in multi-person scenes.
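A lightweight attention module of the kind mentioned above typically follows the squeeze-and-excitation pattern: pool each channel to a scalar, pass the result through a tiny bottleneck, and rescale the channels. The toy weights and shapes below are assumptions for illustration; the paper's learned module will differ.

```python
import math

def channel_attention(feat, w1, w2):
    """Minimal squeeze-and-excitation-style channel attention.

    feat: list of C channels, each a flat list of spatial activations.
    w1 (C_red x C) and w2 (C x C_red): tiny fully connected weights,
    hypothetical here; in a real network they are learned.
    """
    # Squeeze: global average pool per channel.
    z = [sum(ch) / len(ch) for ch in feat]
    # Excite: FC -> ReLU -> FC -> sigmoid.
    h = [max(0.0, sum(wij * zj for wij, zj in zip(row, z))) for row in w1]
    s = [1.0 / (1.0 + math.exp(-sum(wij * hj for wij, hj in zip(row, h))))
         for row in w2]
    # Scale: reweight each channel by its attention score.
    return [[s[c] * v for v in feat[c]] for c in range(len(feat))]
```

The module adds only two small matrix products per block, which is why such attention is considered cheap relative to the convolutions around it.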


Author(s):  
Tao Chen ◽  
Dongbing Gu

Abstract: 6D object pose estimation plays a crucial role in robotic manipulation and grasping tasks. The aim of estimating the 6D object pose from RGB or RGB-D images is to detect objects and estimate their orientations and translations relative to given canonical models. RGB-D cameras provide two sensory modalities, RGB and depth images, which can benefit estimation accuracy, but exploiting the two different modality sources remains challenging. In this paper, inspired by recent work on attention networks that focus on important regions and ignore unnecessary information, we propose a novel network, the Channel-Spatial Attention Network (CSA6D), to estimate the 6D object pose from an RGB-D camera. CSA6D uses a pre-trained 2D network to segment the objects of interest from the RGB image, then uses two separate networks to extract appearance and geometric features from the RGB and depth images for each segmented object. The two feature vectors for each pixel are stacked into a fusion vector that is refined by an attention module to generate an aggregated feature vector. The attention module includes a channel attention block and a spatial attention block, which effectively leverage the concatenated embeddings for accurate 6D pose prediction on known objects. We evaluate the proposed network on two benchmark datasets, the YCB-Video dataset and the LineMod dataset, and the results show that it outperforms previous state-of-the-art methods under the ADD and ADD-S metrics. The attention maps also demonstrate that the network seeks out distinctive geometric information as the most reliable features for pose estimation. From these experiments, we conclude that the proposed network accurately estimates object pose by effectively leveraging multi-modality features.
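The ADD metric used for evaluation has a simple definition: transform the model points by the estimated pose and by the ground-truth pose, and average the pairwise point distances. A minimal sketch (not the benchmark's reference implementation):

```python
import math

def transform(R, t, p):
    """Apply a 3x3 rotation (nested lists) and translation to a 3D point."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def add_metric(model_pts, R_est, t_est, R_gt, t_gt):
    """ADD: mean distance between model points under the estimated pose
    and under the ground-truth pose (lower is better)."""
    total = 0.0
    for p in model_pts:
        a = transform(R_est, t_est, p)
        b = transform(R_gt, t_gt, p)
        total += math.dist(a, b)
    return total / len(model_pts)
```

ADD-S, used for symmetric objects, instead matches each transformed point to its nearest neighbour under the ground-truth pose before averaging.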


2021 ◽  
Author(s):  
Dengqing Tang ◽  
Lincheng Shen ◽  
Xiaojiao Xiang ◽  
Han Zhou ◽  
Tianjiang Hu

We propose a learning-type anchors-driven real-time pose estimation method for autolanding fixed-wing unmanned aerial vehicles (UAVs). The proposed method enables online tracking of both position and attitude by a ground stereo vision system in Global Navigation Satellite System-denied environments. A pipeline of convolutional neural network (CNN)-based UAV anchor detection and anchors-driven UAV pose estimation is employed. To realize robust and accurate anchor detection, we design and implement a Block-CNN architecture to reduce the impact of outliers. On the basis of the anchors, monocular and stereo vision-based filters are established to update the UAV position and attitude. To expand the training dataset without extra outdoor experiments, we develop a parallel system containing outdoor and simulated systems with the same configuration. Simulated and outdoor experiments demonstrate a remarkable pose estimation accuracy improvement over the conventional Perspective-n-Point solution. The experiments also validate the feasibility of the proposed architecture and algorithm with respect to the accuracy and real-time requirements of fixed-wing autolanding UAVs.
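The ground stereo setup recovers 3D position from matched anchor pixels. For a rectified stereo pair, the core relation is depth = focal length x baseline / disparity; a minimal sketch under that standard pinhole assumption (the paper's filter pipeline is more elaborate):

```python
def stereo_depth(focal_px, baseline_m, x_left, x_right):
    """Depth from a rectified stereo pair: Z = f * B / d.

    focal_px: focal length in pixels; baseline_m: camera separation in
    meters; x_left, x_right: matched anchor x-coordinates in each image.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("anchor must have positive disparity")
    return focal_px * baseline_m / disparity
```

Triangulated anchor positions like this would feed the position/attitude filters as measurements; farther targets have smaller disparity and hence noisier depth, which is one motivation for filtering.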


2021 ◽  
Author(s):  
Lun H. Mark

This thesis investigates how the geometry of complex objects relates to LIDAR scanning with Iterative Closest Point (ICP) pose estimation and provides statistical means to assess pose accuracy. LIDAR scanners have become essential parts of space vision systems for autonomous docking and rendezvous. Principal Component Analysis-based geometric constraint indices are found to be strongly related to the pose error norm and to the error of each individual degree of freedom. This leads to several strategies for identifying the best view of an object and the optimal combination of localized scanned areas of the object's surface to achieve accurate pose estimation. Also investigated is the possible relation between ICP pose estimation accuracy and the distribution, or allocation, of the point cloud. The simulation results were validated using point clouds generated by scanning models of Quicksat and a cuboctahedron with Neptec's TriDAR scanner.
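The flavour of a PCA-based geometric constraint index can be shown in 2D, where the covariance eigenvalues have a closed form: a well-spread point set gives an eigenvalue ratio near 1, while near-degenerate geometry (points close to a line) gives a huge ratio, signalling a poorly constrained ICP fit. This is a rough proxy, not the thesis's actual indices.

```python
import math

def pca_condition_2d(points):
    """Ratio of largest to smallest covariance eigenvalue of a 2D point set.

    Large ratios indicate elongated, weakly constraining geometry.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form eigenvalues of the 2x2 covariance matrix.
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    root = math.sqrt(max(0.0, tr * tr / 4.0 - det))
    lam_max, lam_min = tr / 2.0 + root, tr / 2.0 - root
    return lam_max / max(lam_min, 1e-12)
```

Selecting views or surface patches that keep such an index small is the intuition behind choosing scan regions that constrain all degrees of freedom.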




2015 ◽  
Vol 27 (2) ◽  
pp. 167-173 ◽  
Author(s):  
Motomasa Tomida ◽  
◽  
Kiyoshi Hoshino

[Figure: Hand pose estimation with an ultrasmall camera] Operating a robot intentionally through various complex motions of the hands and fingers requires a system that detects hand and finger motions accurately and at high speed. This study uses an ultrasmall camera and a compact computer to develop a wearable hand pose estimation device, also called a hand-capture device. Accurate estimation, however, requires matching against a large database, while a compact computer has only limited memory and low machine power. We avoided this problem by reducing frequently used image characteristics from 1,600 dimensions to 64 dimensions of characteristic quantities, which saved memory and lowered computational cost while maintaining high accuracy and speed. To let an operator wear the device comfortably, the camera was placed as close to the back of the hand as possible, enabling hand pose estimation from hand images without fingertips. A prototype device with a compact computer used to evaluate performance achieved high-speed estimation. Estimation accuracy was 2.32°±14.61° at the PIP joint of the index finger and 3.06°±10.56° at the CM joint of the thumb, as accurate as previous methods. This indicates that dimensional compression of image-characteristic quantities is important for realizing a compact hand-capture device.


2015 ◽  
Vol 2015 ◽  
pp. 1-11 ◽  
Author(s):  
Gaoli Sang ◽  
Hu Chen ◽  
Qijun Zhao

Perception of head pose is useful for many face-related tasks such as face recognition, gaze estimation, and emotion analysis. In this paper, we propose a novel random-forest-based method for estimating head pose angles from single face images. To improve the effectiveness of the constructed head pose predictor, we introduce feature weighting and tree screening into the random forest training process. In this way, features with more discriminative power are more likely to be chosen for constructing trees, and each tree in the resulting forest usually has high pose estimation accuracy, while the diversity, or generalization ability, of the forest does not deteriorate. The proposed method has been evaluated on four public databases as well as a surveillance dataset collected by ourselves. The results show that the proposed method achieves state-of-the-art pose estimation accuracy. Moreover, we investigate the impact of pose-angle sampling intervals and heterogeneous face images on the effectiveness of the trained head pose predictors.
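Feature weighting in forest training amounts to biasing the per-split candidate-feature draw toward features with higher discriminative weight. The exact weighting scheme is not given in the abstract; a generic weighted draw looks like this:

```python
import random

def sample_candidate_features(weights, k, seed=0):
    """Draw k candidate feature indices with probability proportional to
    their discriminative weight (with replacement).

    Sketch of weighted feature sampling; the paper's actual scheme for
    computing the weights is not specified here.
    """
    rng = random.Random(seed)
    indices = list(range(len(weights)))
    return rng.choices(indices, weights=weights, k=k)
```

Uniform weights recover the standard random forest feature subsampling; concentrating weight on a few features trades diversity for per-tree accuracy, which is why the abstract pairs weighting with tree screening.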


2021 ◽  
Author(s):  
Zhimin Zhang ◽  
◽  
Jianzhong Qiao ◽  
Shukuan Lin ◽  
◽  
...  

Depth and pose information are basic issues in robotics, autonomous driving, and virtual reality, and remain focal and difficult problems in computer vision research. Supervised monocular depth and pose estimation learning is not feasible in environments where labeled data is scarce. Self-supervised monocular video methods can learn effectively by applying only photometric constraints, without expensive ground-truth depth labels, but this results in an inefficient training process and suboptimal estimation accuracy. To solve these problems, this paper proposes a monocular weakly supervised depth and pose estimation method based on multi-information fusion. First, we design a high-precision stereo matching method to generate depth and pose data as "ground truth" labels, addressing the difficulty of obtaining true labels. Then, we construct a multi-information fusion network model based on the "ground truth" labels, the video sequence, and IMU information to improve estimation accuracy. Finally, we design a loss function combining supervised cues based on the "ground truth" labels with self-supervised cues to optimize our model. In the testing phase, the network model separately outputs high-precision depth and pose data from a monocular video sequence. The resulting model outperforms mainstream monocular depth and pose estimation methods, as well as a partial stereo matching method, on the challenging KITTI dataset while using only a small amount of real training data (200 pairs).
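The combined loss described above can be sketched as a weighted sum of a self-supervised photometric term and a supervised term against the stereo-generated pseudo labels. The L1 penalties and the balance weight alpha are assumptions for illustration; the paper's loss will be more elaborate.

```python
def photometric_error(img, recon):
    """Mean absolute photometric difference between a frame and its
    view-synthesized reconstruction (flat lists of pixel intensities)."""
    return sum(abs(a - b) for a, b in zip(img, recon)) / len(img)

def weakly_supervised_loss(img, recon, pred_depth, pseudo_gt, alpha=0.5):
    """Blend the self-supervised photometric cue with a supervised cue
    against stereo-generated 'ground truth' labels.

    alpha is a hypothetical balance weight, not a value from the paper.
    """
    photo = photometric_error(img, recon)
    sup = sum(abs(p - g) for p, g in zip(pred_depth, pseudo_gt)) / len(pred_depth)
    return alpha * sup + (1.0 - alpha) * photo
```

The supervised term gives dense, direct gradients early in training, while the photometric term keeps the model anchored to the real images, which is the stated motivation for mixing the two cues.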


2021 ◽  
Vol 10 (12) ◽  
pp. 800
Author(s):  
Jinming Zhang ◽  
Lianrui Xu ◽  
Cuizhu Bao

Vision-based robot pose estimation and mapping systems suffer from low pose estimation accuracy and poor local detail in the resulting maps when the modeling environment has sparse features, high dynamics, weak light, and multiple shadows, among other conditions. To address these issues, we propose an adaptive pose fusion (APF) method to fuse the robot's pose and use the optimized pose to construct an indoor map. First, the proposed method calculates the robot's pose from the camera and the inertial measurement unit (IMU) separately. Then, the pose fusion method is selected adaptively according to the motion state of the robot. When the robot is static, the proposed method directly uses the extended Kalman filter (EKF) to fuse camera and IMU data. When the robot is moving, a weighting coefficient is determined from the matching success rate of the feature points, and the weighted pose fusion (WPF) method is used to fuse camera and IMU data. According to these states, a series of new robot poses is obtained. Second, the fusion-optimized pose is used to correct the distance and azimuth angle of the laser points obtained by LiDAR, and a Gauss–Newton iterative matching process matches the corresponding laser points to construct an indoor map. Finally, a pose fusion experiment is designed, and the EuRoC data and measured data are used to verify the effectiveness of the method. The experimental results confirm that this method provides higher pose estimation accuracy than the robust visual inertial odometry (ROVIO) and visual-inertial ORB-SLAM (VI-ORB-SLAM) algorithms. Compared with the Cartographer algorithm, it provides higher two-dimensional map modeling accuracy and better modeling performance.
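The WPF step can be pictured as a convex blend of the camera and IMU pose estimates, with the camera's weight driven by the feature-matching success rate. The linear mapping from match rate to weight below is an assumption for illustration; the paper's coefficient rule, and its handling of rotations, may differ.

```python
def fuse_pose_weighted(cam_pose, imu_pose, match_rate):
    """Weighted pose fusion (WPF) sketch: blend camera and IMU pose
    estimates component-wise.

    match_rate in [0, 1] is the feature-matching success rate; here it is
    used directly as the camera weight (hypothetical mapping). Poses are
    flat lists, e.g. [x, y, z]; rotation blending would need quaternion
    interpolation rather than this linear mix.
    """
    w = max(0.0, min(1.0, match_rate))  # trust the camera more when matching succeeds
    return [w * c + (1.0 - w) * i for c, i in zip(cam_pose, imu_pose)]
```

When matching collapses (w near 0) the fused pose falls back to the IMU, which is what makes the scheme robust in weak-light or low-texture scenes.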

