Retro-Reflective-Marker-Aided Target Pose Estimation in a Safety-Critical Environment

2020 ◽  
Vol 11 (1) ◽  
pp. 3
Author(s):  
Laura Gonçalves Ribeiro ◽  
Olli J. Suominen ◽  
Ahmed Durmush ◽  
Sari Peltonen ◽  
Emilio Ruiz Morales ◽  
...  

Visual technologies have an indispensable role in safety-critical applications, where tasks must often be performed through teleoperation. Due to the lack of stereoscopic and motion parallax depth cues in conventional images, alignment tasks pose a significant challenge to remote operation. In this context, machine vision can provide mission-critical information to augment the operator’s perception. In this paper, we propose a retro-reflective marker-based teleoperation aid to be used in hostile remote handling environments. The system computes the remote manipulator’s position with respect to the target using a set of one or two low-resolution cameras attached to its wrist. We develop an end-to-end pipeline of calibration, marker detection, and pose estimation, and extensively study the performance of the overall system. The results demonstrate that we have successfully engineered a retro-reflective marker from materials that can withstand the extreme temperature and radiation levels of the environment. Furthermore, we demonstrate that the proposed marker-based approach provides robust and reliable estimates and significantly outperforms a previous stereo-matching-based approach, even with a single camera.
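The pose estimation step of such a pipeline can be illustrated with a standard technique: once the 3D positions of the retro-reflective markers have been triangulated from the camera(s), the manipulator-to-target pose follows from rigidly aligning the measured marker positions to their known layout, e.g. via the Kabsch algorithm. A minimal NumPy sketch, with all names hypothetical (the paper does not publish its implementation):

```python
import numpy as np

def rigid_align(model_pts, measured_pts):
    """Estimate rotation R and translation t such that
    measured ≈ R @ model + t, via SVD (Kabsch algorithm).
    Both inputs are (N, 3) arrays of corresponding marker points."""
    cm = model_pts.mean(axis=0)
    cs = measured_pts.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets
    H = (model_pts - cm).T @ (measured_pts - cs)
    U, _, Vt = np.linalg.svd(H)
    # guard against a reflection (det = -1) solution
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cs - R @ cm
    return R, t
```

Given at least three non-collinear markers, this closed-form SVD solution is exact for noise-free correspondences and least-squares optimal otherwise.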

2019 ◽  
Vol 39 (9) ◽  
pp. 0915004
Author(s):  
张雄锋 Xiongfeng Zhang ◽  
刘海波 Haibo Liu ◽  
尚洋 Yang Shang

2006 ◽  
Vol 03 (01) ◽  
pp. 53-60
Author(s):  
LUPING AN ◽  
YUNDE JIA ◽  
MINGTAO PEI ◽  
HONGBIN DENG

In this article, a method for precise shape measurement of dynamic surfaces using a single-camera stereo vision system is presented. A cross-curve pattern is painted on the surface of an object, and the intersections of the cross-curves, which represent the shape of the object, are measured by the stereo vision system. The single-camera system is modeled as a virtual binocular stereo pair using a strong calibration technique. Binocular epipolar rectification is used to make the stereo matching efficient, and principal-curves theory is employed to extract curves from the images for stereo matching. Under the RANSAC framework, the curves are robustly interpolated with cubic splines based on moving least squares (MLS). Experimental results on both static and dynamically deforming surfaces illustrate the effectiveness of the proposed method.
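The robust interpolation step described above can be sketched as a generic RANSAC loop: repeatedly fit a curve to a minimal sample of extracted points, score by inlier count, and refit on the consensus set. The sketch below is an illustrative stand-in only, using a cubic polynomial via `np.polyfit` rather than the paper's MLS-based cubic spline, with all names hypothetical:

```python
import numpy as np

def ransac_cubic(x, y, n_iters=200, thresh=0.05, seed=0):
    """Robustly fit y = c3*x^3 + c2*x^2 + c1*x + c0 to points
    contaminated by outliers. Returns coefficients (highest degree
    first, as np.polyfit) refit on the largest consensus set."""
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_iters):
        # minimal sample: 4 points determine a cubic
        idx = rng.choice(len(x), size=4, replace=False)
        coeffs = np.polyfit(x[idx], y[idx], deg=3)
        resid = np.abs(np.polyval(coeffs, x) - y)
        inliers = resid < thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # final least-squares refit on the consensus set
    return np.polyfit(x[best_inliers], y[best_inliers], deg=3)
```

The inlier threshold and iteration count would be tuned to the expected noise level of the extracted curve points.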


2015 ◽  
Vol 2015 (0) ◽  
pp. _2A2-E07_1-_2A2-E07_4
Author(s):  
Masanori AIZAWA ◽  
Toshihiro MAKI ◽  
Yoshiki SATO ◽  
Takashi SAKAMAKI

2018 ◽  
Author(s):  
Reuben Rideaux ◽  
William J Harrison

ABSTRACT
Discerning objects from their surrounds (i.e., figure-ground segmentation) in a way that guides adaptive behaviours is a fundamental task of the brain. Neurophysiological work has revealed a class of cells in the macaque visual cortex that may be ideally suited to support this neural computation: border-ownership cells (Zhou, Friedman, & von der Heydt, 2000). These orientation-tuned cells appear to respond conditionally to the borders of objects. A behavioural correlate supporting the existence of these cells in humans was demonstrated using two-dimensional luminance defined objects (von der Heydt, Macuda, & Qiu, 2005). However, objects in our natural visual environments are often signalled by complex cues, such as motion and depth order. Thus, for border-ownership systems to effectively support figure-ground segmentation and object depth ordering, they must have access to information from multiple depth cues with strict depth order selectivity. Here we measure in humans (of both sexes) border-ownership-dependent tilt aftereffects after adapting to figures defined by either motion parallax or binocular disparity. We find that both depth cues produce a tilt aftereffect that is selective for figure-ground depth order. Further, we find the effects of adaptation are transferable between cues, suggesting that these systems may combine depth cues to reduce uncertainty (Bülthoff & Mallot, 1988). These results suggest that border-ownership mechanisms have strict depth order selectivity and access to multiple depth cues that are jointly encoded, providing compelling psychophysical support for their role in figure-ground segmentation in natural visual environments.
SIGNIFICANCE STATEMENT
Segmenting a visual object from its surrounds is a critical function that may be supported by “border-ownership” neural systems that conditionally respond to object borders. Psychophysical work indicates these systems are sensitive to objects defined by luminance contrast.
To effectively support figure-ground segmentation, however, neural systems supporting border-ownership must have access to information from multiple depth cues and depth order selectivity. We measured border-ownership-dependent tilt aftereffects to figures defined by either motion parallax or binocular disparity and found aftereffects for both depth cues. These effects were transferable between cues, but selective for figure-ground depth order. Our results suggest that the neural systems supporting figure-ground segmentation have strict depth order selectivity and access to multiple depth cues that are jointly encoded.


2021 ◽  
Author(s):  
Zhimin Zhang ◽  
Jianzhong Qiao ◽  
Shukuan Lin ◽  
...  

Depth and pose information are basic issues in the fields of robotics, autonomous driving, and virtual reality, and are also a focus of, and a difficult problem in, computer vision research. Supervised learning of monocular depth and pose estimation is not feasible in environments where labeled data is scarce. Self-supervised monocular video methods can learn effectively by applying only photometric constraints, without expensive ground truth depth labels, but this results in an inefficient training process and suboptimal estimation accuracy. To solve these problems, a monocular weakly supervised depth and pose estimation method based on multi-information fusion is proposed in this paper. First, we design a high-precision stereo matching method to generate depth and pose data as "ground truth" labels, addressing the problem that true labels are difficult to obtain. Then, we construct a multi-information fusion network model based on these "ground truth" labels, the video sequence, and IMU information to improve estimation accuracy. Finally, we design a loss function that combines supervised cues based on the "ground truth" labels with self-supervised cues to optimize our model. In the testing phase, the network model can separately output high-precision depth and pose data from a monocular video sequence. The resulting model outperforms mainstream monocular depth and pose estimation methods, as well as a partial stereo matching method, on the challenging KITTI dataset while using only a small number of real training samples (200 pairs).
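The loss design described above can be sketched as a weighted combination of a supervised term against the stereo-generated pseudo ground truth and a self-supervised photometric term against the view synthesized from the predicted depth and pose. A minimal NumPy illustration only; the L1 penalties and equal weighting below are assumptions, not the paper's exact formulation:

```python
import numpy as np

def weak_supervision_loss(depth_pred, depth_pseudo_gt, frame, frame_warped,
                          w_sup=0.5, w_photo=0.5):
    """Combine a supervised depth term (L1 against stereo-generated
    pseudo ground truth) with a self-supervised photometric term
    (L1 between the target frame and the view synthesized from the
    predicted depth and pose). All terms are per-pixel means."""
    sup_term = np.mean(np.abs(depth_pred - depth_pseudo_gt))
    photo_term = np.mean(np.abs(frame - frame_warped))
    return w_sup * sup_term + w_photo * photo_term
```

In practice the weights would be tuned so that neither the noisy pseudo labels nor the photometric constraint dominates training.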


2021 ◽  
Vol 11 (17) ◽  
pp. 8047
Author(s):  
Dongkyu Lee ◽  
Wee Peng Tay ◽  
Seok-Cheol Kee

In this work, a study was carried out to estimate a look-up table (LUT) that converts the camera image plane to a bird's-eye view (BEV) plane using a single camera. Traditional camera pose estimation approaches incur high research and manufacturing costs for future autonomous vehicles and may require pre-configured infrastructure. This paper proposes a camera calibration system for autonomous driving that is low cost and requires little infrastructure. We study a network that outputs an LUT converting the image into a BEV by estimating the camera pose under urban road driving conditions from a single camera; the network predicts human-like pose estimates from a single image. We collected synthetic data using a simulator, generated BEV images and LUTs as ground truth, and used the proposed network and ground truth to train the pose estimation function. In the process, the network predicts the pose by interpreting semantic segmentation features, and its performance is increased by attaching a layer that handles the overall direction of the network. The network outputs the camera angles (roll/pitch/yaw) in a 3D coordinate system so that the user can monitor learning. Since the network's output is an LUT, no additional calculation is needed, and real-time performance is improved.
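The real-time claim rests on the fact that applying an LUT is a pure gather operation: for each BEV pixel, the table stores the source image coordinates to sample, so no per-frame geometry is computed. A minimal sketch with NumPy fancy indexing, assuming a nearest-neighbour table (names hypothetical; the paper's LUT format is not specified here):

```python
import numpy as np

def apply_lut(image, lut_rows, lut_cols):
    """Warp `image` (H, W[, C]) into a BEV whose shape matches
    lut_rows/lut_cols, which hold, per BEV pixel, the integer
    source row/column to sample (nearest-neighbour lookup,
    no interpolation)."""
    return image[lut_rows, lut_cols]
```

A production system would typically add bilinear interpolation and bounds handling, but the per-frame cost remains a fixed memory gather, independent of the pose estimation network.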


Author(s):  
Martin Böcker ◽  
Detlef Runde ◽  
Lothar Mühlbach

The paper addresses the question of whether reproducing motion parallax increases the extent of telepresence in video communications. Motion parallax is defined as the change of the view due to the observer's movements. It was hypothesized that reproducing motion parallax (a) leads to more precise depth judgments by providing further depth cues, (b) allows ‘interactive viewing’, i.e. the observer can actively explore the visual scene by changing his/her position, and (c) compensates for stereoscopic “apparent movements”. In a human factors study, two videoconferencing set-ups providing motion parallax (one stereoscopic and one monoscopic) were compared with two set-ups (monoscopic and stereoscopic) without motion parallax. Each set-up was used and rated by 32 subjects. The results supported the hypotheses only in part: even though there was some evidence for more “spatial presence” and for greater explorability of the scene through motion parallax, the compensation of apparent movements could not be achieved.

