Robust Pose Estimation and Calibration of Catadioptric Cameras With Spherical Mirrors

2020 ◽  
Vol 86 (1) ◽  
pp. 33-44
Author(s):  
Sagi Filin ◽  
Grigory Ilizirov ◽  
Bashar Elnashef

Catadioptric cameras broaden the field of view and reveal otherwise occluded object parts. They differ geometrically from central-perspective cameras because of light reflection from the mirror surface. To handle these effects, we present new pose-estimation and reconstruction models for imaging through spherical mirrors. We derive a closed-form equivalent to the collinearity principle, via which three methods are established to estimate the system parameters: a resection-based one; a trilateration-based one, which introduces novel constraints that enhance accuracy; and a direct linear transformation-based one. The estimated system parameters exhibit improved accuracy compared to the state of the art, and analysis shows intrinsic robustness to a high fraction of outliers. We then show that 3-D point reconstruction can be performed at accurate levels. Thus, we provide an in-depth look into the geometric modeling of spherical catadioptric systems, along with practical means to enhance accuracy and the requirements for reaching it.
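The geometry that such models build on can be sketched in a few lines: a viewing ray is intersected with the mirror sphere, and the law of specular reflection is applied at the surface normal. The snippet below is a minimal illustration of that underlying geometry only, not the paper's closed-form formulation; all names are illustrative.

```python
import numpy as np

def reflect_off_sphere(origin, direction, center, radius):
    """Intersect a ray with a spherical mirror and return the
    reflection point and reflected direction (specular reflection)."""
    d = direction / np.linalg.norm(direction)
    oc = origin - center
    # Solve |origin + t*d - center|^2 = radius^2 for the nearest t > 0.
    b = 2.0 * np.dot(oc, d)
    c = np.dot(oc, oc) - radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0:
        return None  # ray misses the mirror
    t = (-b - np.sqrt(disc)) / 2.0
    p = origin + t * d                  # reflection point on the sphere
    n = (p - center) / radius           # outward surface normal
    r_dir = d - 2.0 * np.dot(d, n) * n  # law of reflection
    return p, r_dir
```

For a sphere centered at (0, 0, 5) with radius 1, a ray from the origin along +z hits the near surface at (0, 0, 4) and is reflected straight back, which is a quick sanity check of the sign conventions.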

Author(s):  
Grigory Ilizirov ◽  
Sagi Filin

Catadioptric cameras have the advantage of broadening the field of view and revealing otherwise occluded object parts. However, they differ geometrically from standard central-perspective cameras because of light reflection from the mirror surface, which alters the collinearity relation and introduces severe non-linear distortions of the imaged scene. To accommodate these features, we present in this paper a novel model for pose estimation and reconstruction while imaging through spherical mirrors. We derive a closed-form equivalent to the collinearity principle, via which we estimate the system’s parameters. Our model yields a resection-like solution which can be developed into a linear one. We show that accurate estimates can be derived with only a small set of control points. Analysis shows that the control configuration in the orientation scheme is rather flexible and that high levels of accuracy can be reached in both pose estimation and mapping. Clearly, the ability to model objects which fall outside the immediate camera field of view offers an appealing means to supplement 3-D reconstruction and modeling.


2021 ◽  
Vol 11 (15) ◽  
pp. 6975
Author(s):  
Tao Zhang ◽  
Lun He ◽  
Xudong Li ◽  
Guoqing Feng

Lipreading aims to recognize the sentences spoken by a talking face. In recent years, lipreading methods have achieved high accuracy on large datasets and made breakthrough progress. However, lipreading is still far from solved: existing methods tend to have high error rates on in-the-wild data and suffer from vanishing training gradients and slow convergence. To overcome these problems, we propose an efficient end-to-end sentence-level lipreading model, using an encoder based on a 3D convolutional network, ResNet50, and a Temporal Convolutional Network (TCN), with a CTC objective function as the decoder. More importantly, the proposed architecture incorporates the TCN as a feature learner to decode features. It partly eliminates the defects of RNNs (LSTM, GRU), namely gradient vanishing and insufficient performance, and this yields a notable performance improvement as well as faster convergence. Experiments show that training and convergence are 50% faster than the state-of-the-art method, and accuracy is improved by 2.4% on the GRID dataset.
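The key property a TCN relies on is causal dilated convolution: each output step depends only on current and past inputs, with a receptive field that grows with the dilation factor, avoiding the recurrence that causes RNN gradient problems. Below is a minimal NumPy sketch of one such layer, written for illustration only and not taken from the paper's architecture; the function name and tensor layout are assumptions.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """Causal dilated 1-D convolution: y[t] depends only on
    x[t], x[t-dilation], x[t-2*dilation], ...
    x: (T, C_in) input sequence, w: (K, C_in, C_out) filter taps."""
    T = x.shape[0]
    K, _, c_out = w.shape
    pad = (K - 1) * dilation
    # Left-pad with zeros so the output never looks into the future.
    xp = np.vstack([np.zeros((pad, x.shape[1])), x])
    y = np.zeros((T, c_out))
    for t in range(T):
        for k in range(K):
            y[t] += xp[t + pad - k * dilation] @ w[K - 1 - k]
    return y
```

Feeding an impulse through the layer confirms causality: the response appears at the impulse time and `dilation` steps later, never earlier. Stacking layers with dilations 1, 2, 4, ... gives the exponentially growing receptive field TCNs are known for.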


2021 ◽  
Vol 11 (9) ◽  
pp. 4241
Author(s):  
Jiahua Wu ◽  
Hyo Jong Lee

In bottom-up multi-person pose estimation, grouping joint candidates into the appropriately structured corresponding instance of a person is challenging. In this paper, a new bottom-up method, the Partitioned CenterPose (PCP) Network, is proposed to better cluster the detected joints. To achieve this goal, we propose a novel approach called Partition Pose Representation (PPR), which integrates the instance of a person and its body joints based on joint offsets. PPR leverages information about the center of the human body and the offsets between that center point and the positions of the body’s joints to encode human poses accurately. To enhance the relationships between body joints, we divide the human body into five parts and generate a sub-PPR for each part. Based on this PPR, the PCP Network can detect people and their body joints simultaneously, then group all body joints according to joint offset. Moreover, an improved L1 loss is designed to measure joint offsets more accurately. Tested on the COCO keypoints and CrowdPose datasets, the proposed method performs on par with existing state-of-the-art bottom-up methods in terms of accuracy and speed.
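The center-plus-offset idea behind such representations can be illustrated with a toy grouping step: each person center predicts where its joint should be (center + offset), and each detected joint candidate is assigned to the nearest prediction. This is a simplified sketch for a single joint type, not the PCP Network's actual grouping procedure; all names are illustrative.

```python
import numpy as np

def group_joints(centers, offsets, joint_candidates):
    """Assign each joint candidate to the person whose predicted joint
    position (center + offset) lies nearest to it.
    centers: (N, 2) person centers, offsets: (N, 2) predicted offsets
    for one joint type, joint_candidates: (M, 2) detections."""
    pred = centers + offsets  # predicted joint location per person
    assignments = []
    for j in joint_candidates:
        d = np.linalg.norm(pred - j, axis=1)
        assignments.append(int(np.argmin(d)))
    return assignments
```

With two people centered at (0, 0) and (10, 10), both predicting a joint one unit to the right, a detection near (1, 0) is grouped with the first person and one near (11, 10) with the second.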


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1091
Author(s):  
Izaak Van Crombrugge ◽  
Rudi Penne ◽  
Steve Vanlanduit

Knowledge of precise camera poses is vital for multi-camera setups. Camera intrinsics can be obtained for each camera separately in lab conditions. For fixed multi-camera setups, the extrinsic calibration can only be done in situ. Usually, markers such as checkerboards are used, requiring some level of overlap between cameras. In this work, we propose a method for cases with little or no overlap. Laser lines are projected on a plane (e.g., floor or wall) using a laser line projector. The poses of the plane and the cameras are then optimized using bundle adjustment to match the lines seen by the cameras. To find the extrinsic calibration, only a partial overlap between the laser lines and the field of view of the cameras is needed. Real-world experiments were conducted both with and without overlapping fields of view, resulting in rotation errors below 0.5°. We show that the accuracy is comparable to other state-of-the-art methods while offering a more practical procedure. The method can also be used in large-scale applications and can be fully automated.
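The core residual in such a line-based bundle adjustment is the distance between detected image points and the projected laser line. As a minimal sketch (illustrative names; the actual method jointly optimizes plane and camera poses over all views, which is omitted here), the signed point-to-line distance for a line in homogeneous form `a*x + b*y + c = 0` is:

```python
import numpy as np

def line_residuals(image_points, line):
    """Signed distances from detected image points to a projected
    laser line given as (a, b, c) with a*x + b*y + c = 0.
    image_points: (M, 2) pixel coordinates."""
    a, b, c = line
    norm = np.hypot(a, b)  # normalize so residuals are in pixels
    return (image_points @ np.array([a, b]) + c) / norm
```

Stacking these residuals for every camera and every line yields the cost vector that a nonlinear least-squares solver would minimize over the plane and camera pose parameters.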


Author(s):  
J. Li-Chee-Ming ◽  
C. Armenakis

This paper presents a novel application of the Visual Servoing Platform (ViSP) for pose estimation in indoor and GPS-denied outdoor environments. Our proposed solution integrates the trajectory solution from RGB-D SLAM into ViSP’s pose estimation process. Li-Chee-Ming and Armenakis (2015) explored the application of ViSP in mapping large outdoor environments and tracking larger objects (i.e., building models). Their experiments revealed that tracking was often lost due to a lack of model features in the camera’s field of view, and also because of rapid camera motion. Further, the pose estimate was often biased due to incorrect feature matches. This work proposes a solution to improve ViSP’s pose estimation performance, aiming specifically to reduce the frequency of tracking losses and the biases present in the pose estimate. This paper explores the integration of ViSP with RGB-D SLAM. We discuss the performance of the combined tracker in mapping indoor environments and tracking 3D wireframe indoor building models, and present preliminary results from our experiments.


2014 ◽  
Vol 2014 ◽  
pp. 1-23 ◽  
Author(s):  
Francisco Amorós ◽  
Luis Payá ◽  
Oscar Reinoso ◽  
Walterio Mayol-Cuevas ◽  
Andrew Calway

In this work we present a topological map-building and localization system for mobile robots based on the global appearance of visual information. We include a comparison and analysis of global-appearance techniques applied to wide-angle scenes in retrieval tasks. Next, we define a multiscale analysis, which improves the association between images and permits extracting topological distances. Then, a topological map-building algorithm is proposed. At first, the algorithm has information only of some isolated positions of the navigation area, in the form of nodes. Each node is composed of a collection of images that covers the complete field of view from a certain position. The algorithm solves the node retrieval and estimates their spatial arrangement. With these aims, it uses the visual information captured along some routes that cover the navigation area. As a result, the algorithm builds a graph that reflects the distribution and adjacency relations between nodes (the map). After the map building, we also propose a route path estimation system. This algorithm takes advantage of the multiscale analysis: pose estimation accuracy is not limited to the node locations but extends to intermediate positions between them. The algorithms have been tested using two different databases captured in real indoor environments under dynamic conditions.
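Global-appearance localization of this kind compares whole-image descriptors rather than local features. A minimal sketch of the idea, assuming a simple block-averaged descriptor compared at several resolutions (the paper's actual descriptors and distance measure differ; all names here are illustrative):

```python
import numpy as np

def global_descriptor(img, scale):
    """Global-appearance descriptor: block-averaged, flattened,
    L2-normalized.  img: 2-D grayscale array."""
    h, w = img.shape
    hh, ww = h - h % scale, w - w % scale          # crop to a multiple of scale
    d = (img[:hh, :ww]
         .reshape(hh // scale, scale, ww // scale, scale)
         .mean(axis=(1, 3))
         .ravel())
    return d / (np.linalg.norm(d) + 1e-12)

def multiscale_distance(img_a, img_b, scales=(2, 4, 8)):
    """Compare two views at several resolutions; the per-scale distances
    can be combined into a topological-distance estimate."""
    return [np.linalg.norm(global_descriptor(img_a, s) -
                           global_descriptor(img_b, s))
            for s in scales]
```

Identical views give zero distance at every scale, while the rate at which the distance grows across scales carries the coarse spatial-proximity information the multiscale analysis exploits.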


Sensors ◽  
2019 ◽  
Vol 19 (8) ◽  
pp. 1889 ◽  
Author(s):  
Shuang Liu ◽  
Hongli Xu ◽  
Yang Lin ◽  
Lei Gao

Autonomous underwater vehicles (AUVs) play very important roles in underwater missions. However, the reliability of the automated recovery of AUVs has still not been well addressed. We propose a vision-based framework for automatically recovering an AUV by another AUV in shallow water. The proposed framework contains a detection phase for the robust detection of underwater landmarks mounted on the docking station in shallow water and a pose-estimation phase for estimating the pose between AUVs and underwater landmarks. We propose a Laplacian-of-Gaussian-based coarse-to-fine blockwise (LCB) method for the detection of underwater landmarks to overcome ambient light and nonuniform spreading, which are the two main problems in shallow water. We propose a novel method for pose estimation in practical cases where landmarks are broken or covered by biofouling. In the experiments, we show that our proposed LCB method outperforms the state-of-the-art method in terms of remote landmark detection. We then combine our proposed vision-based framework with acoustic sensors in field experiments to demonstrate its effectiveness in the automated recovery of AUVs.
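The Laplacian-of-Gaussian operator at the heart of the detection phase is the classic blob detector: it responds strongly to compact bright or dark regions at a scale set by sigma, which makes it a natural coarse-stage filter for landmarks under nonuniform illumination. The sketch below shows only the generic LoG response and a coarse peak pick, not the paper's full coarse-to-fine blockwise pipeline; all names are illustrative.

```python
import numpy as np

def log_kernel(sigma, size):
    """Discrete Laplacian-of-Gaussian kernel (blob detector)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    g = np.exp(-r2 / (2 * sigma ** 2))
    k = (r2 - 2 * sigma ** 2) / sigma ** 4 * g
    return k - k.mean()  # zero mean: flat regions give no response

def log_response(img, kernel):
    """Correlate the image with the LoG kernel (zero-padded borders)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return out

def coarse_blob(img, sigma=1.0, size=7):
    """Coarse stage: pixel location of the strongest blob response."""
    resp = np.abs(log_response(img, log_kernel(sigma, size)))
    return np.unravel_index(np.argmax(resp), resp.shape)
```

A bright spot in an otherwise dark frame produces its strongest absolute response exactly at the spot, which is the property the coarse search stage relies on before refinement.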


2017 ◽  
Vol 11 (3) ◽  
Author(s):  
Matthias Ehrhart ◽  
Werner Lienhart

The importance of automated prism tracking is growing with the increasing automation of total station measurements in machine control, monitoring, and one-person operation. In this article we summarize and explain the different techniques that are used to coarsely search for a prism, to precisely aim at a prism, and to identify whether the correct prism is being tracked. Along with the state-of-the-art review, we discuss and experimentally evaluate possible improvements based on the image data of an additional wide-angle camera, which is available in many total stations today. In cases in which the total station’s fine-aiming module loses the prism, the tracked object may still be visible to the wide-angle camera because of its larger field of view. The theodolite angles towards the target can then be derived from its image coordinates, which facilitates a fast reacquisition of the prism. In experimental measurements we demonstrate that our image-based approach for the coarse target search is 4 to 10 times faster than conventional approaches.
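Deriving angles from image coordinates is a small amount of geometry: under a pinhole assumption with known principal point and focal length in pixels, the angular offset of a target from the optical axis is the arctangent of its pixel offset over the focal length. A minimal sketch (illustrative names; real wide-angle cameras would additionally need distortion correction, which is omitted here):

```python
import numpy as np

def pixel_to_angles(u, v, cx, cy, focal_px):
    """Convert image coordinates of a target into horizontal/vertical
    angle offsets (radians) from the camera's optical axis.
    (cx, cy): principal point, focal_px: focal length in pixels."""
    hz = np.arctan2(u - cx, focal_px)  # horizontal offset
    vt = np.arctan2(v - cy, focal_px)  # vertical offset
    return hz, vt
```

A target 400 px right of a principal point, with a 400 px focal length, sits 45° off-axis horizontally; adding these offsets to the current theodolite angles gives the direction for reacquisition.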


2020 ◽  
Vol 34 (07) ◽  
pp. 11924-11931
Author(s):  
Zhongwei Qiu ◽  
Kai Qiu ◽  
Jianlong Fu ◽  
Dongmei Fu

Multi-person pose estimation aims to detect human keypoints from images with multiple persons. Bottom-up methods for multi-person pose estimation have attracted extensive attention, owing to their good balance between efficiency and accuracy. Recent bottom-up methods usually follow the principle of keypoint localization and grouping, where the relations between keypoints are the key to grouping them. These relations spontaneously construct a graph of keypoints, where the edges represent the relations between two nodes (i.e., keypoints). Existing bottom-up methods mainly define relations by empirically picking out edges from this graph, omitting edges that may contain useful semantic relations. In this paper, we propose a novel Dynamic Graph Convolutional Module (DGCM) to model rich relations in the keypoint graph. Specifically, we take into account all relations (all edges of the graph) and construct dynamic graphs to tolerate large variations of human pose. The DGCM is quite lightweight, which allows it to be stacked like a pyramid architecture and to learn structural relations from multi-level features. Our network with a single DGCM based on ResNet-50 achieves relative gains of 3.2% and 4.8% over state-of-the-art bottom-up methods on the COCO keypoints and MPII datasets, respectively.
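A graph convolution over a keypoint graph propagates each node's features to its neighbors through the adjacency matrix, so keeping all edges (as the DGCM does, rather than a hand-picked subset) lets every keypoint inform every other. Below is a generic single-layer sketch in NumPy with row-normalized propagation; it illustrates the operation class only, not the DGCM itself, and all names are illustrative.

```python
import numpy as np

def graph_conv(X, A, W):
    """One graph-convolution layer over a keypoint graph.
    X: (N, F) node features, A: (N, N) adjacency (all edges kept),
    W: (F, F_out) weights.  Computes ReLU(D^-1 (A + I) X W)."""
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    D_inv = 1.0 / A_hat.sum(axis=1, keepdims=True)    # row normalization
    return np.maximum(D_inv * (A_hat @ X) @ W, 0.0)   # ReLU activation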


Sign in / Sign up

Export Citation Format

Share Document