scholarly journals Visual Attention and Color Cues for 6D Pose Estimation on Occluded Scenarios Using RGB-D Data

Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 8090
Author(s):  
Joel Vidal ◽  
Chyi-Yeu Lin ◽  
Robert Martí

Recently, 6D pose estimation methods have shown robust performance on highly cluttered scenes and different illumination conditions. However, occlusions are still challenging, with recognition rates decreasing to less than 10% for half-visible objects in some datasets. In this paper, we propose to use top-down visual attention and color cues to boost performance of a state-of-the-art method on occluded scenarios. More specifically, color information is employed to detect potential points in the scene, improve feature-matching, and compute more precise fitting scores. The proposed method is evaluated on the Linemod occluded (LM-O), TUD light (TUD-L), Tejani (IC-MI) and Doumanoglou (IC-BIN) datasets, as part of the SiSo BOP benchmark, which includes challenging highly occluded cases, illumination changing scenarios, and multiple instances. The method is analyzed and discussed for different parameters, color spaces and metrics. The presented results show the validity of the proposed approach and their robustness against illumination changes and multiple instance scenarios, specially boosting the performance on relatively high occluded cases. The proposed solution provides an absolute improvement of up to 30% for levels of occlusion between 40% to 50%, outperforming other approaches with a best overall recall of 71% for the LM-O, 92% for TUD-L, 99.3% for IC-MI and 97.5% for IC-BIN.

2016 ◽  
Vol 2016 ◽  
pp. 1-10 ◽  
Author(s):  
Mohamed Alsheakhali ◽  
Abouzar Eslami ◽  
Hessam Roodaki ◽  
Nassir Navab

Detection of instrument tip in retinal microsurgery videos is extremely challenging due to rapid motion, illumination changes, the cluttered background, and the deformable shape of the instrument. For the same reason, frequent failures in tracking add the overhead of reinitialization of the tracking. In this work, a new method is proposed to localize not only the instrument center point but also its tips and orientation without the need of manual reinitialization. Our approach models the instrument as a Conditional Random Field (CRF) where each part of the instrument is detected separately. The relations between these parts are modeled to capture the translation, rotation, and the scale changes of the instrument. The tracking is done via separate detection of instrument parts and evaluation of confidence via the modeled dependence functions. In case of low confidence feedback an automatic recovery process is performed. The algorithm is evaluated on in vivo ophthalmic surgery datasets and its performance is comparable to the state-of-the-art methods with the advantage that no manual reinitialization is needed.


2021 ◽  
Vol 13 (4) ◽  
pp. 663
Author(s):  
Runze Fan ◽  
Ting-Bing Xu ◽  
Zhenzhong Wei

This article addresses the challenge of 6D aircraft pose estimation from a single RGB image during the flight. Many recent works have shown that keypoints-based approaches, which first detect keypoints and then estimate the 6D pose, achieve remarkable performance. However, it is hard to locate the keypoints precisely in complex weather scenes. In this article, we propose a novel approach, called Pose Estimation with Keypoints and Structures (PEKS), which leverages multiple intermediate representations to estimate the 6D pose. Unlike previous works, our approach simultaneously locates keypoints and structures to recover the pose parameter of aircraft through a Perspective-n-Point Structure (PnPS) algorithm. These representations integrate the local geometric information of the object and the topological relationship between components of the target, which effectively improve the accuracy and robustness of 6D pose estimation. In addition, we contribute a dataset for aircraft pose estimation which consists of 3681 real images and 216,000 rendered images. Extensive experiments on our own aircraft pose dataset and multiple open-access pose datasets (e.g., ObjectNet3D, LineMOD) demonstrate that our proposed method can accurately estimate 6D aircraft pose in various complex weather scenes while achieving the comparative performance with the state-of-the-art pose estimation methods.


2020 ◽  
Vol 218 ◽  
pp. 03023
Author(s):  
Zhiqin Zhang ◽  
Bo Zhang ◽  
Fen Li ◽  
Dehua Kong

In This paper, we propose a hand pose estimation neural networks architecture named MSAHP which can improve PCK (percentage correct keypoints) greatly by fusing self-attention module in CNN (Convolutional Neural Networks). The proposed network is based on a ResNet (Residual Neural Network) backbone and concatenate discriminative features through multiple different scale feature maps, then multiple head self-attention module was used to focus on the salient feature map area. In recent years, self-attention mechanism was applicated widely in NLP and speech recognition, which can improve greatly key metrics. But in compute vision especially for hand pose estimation, we did not find the application. Experiments on hand pose estimation dataset demonstrate the improved PCK of our MSAHP than the existing state-of-the-art hand pose estimation methods. Specifically, the proposed method can achieve 93.68% PCK score on our mixed test dataset.


Author(s):  
Mukhiddin Toshpulatov ◽  
Wookey Lee ◽  
Suan Lee ◽  
Arousha Haghighian Roudsari

AbstractHuman pose estimation is one of the issues that have gained many benefits from using state-of-the-art deep learning-based models. Human pose, hand and mesh estimation is a significant problem that has attracted the attention of the computer vision community for the past few decades. A wide variety of solutions have been proposed to tackle the problem. Deep Learning-based approaches have been extensively studied in recent years and used to address several computer vision problems. However, it is sometimes hard to compare these methods due to their intrinsic difference. This paper extensively summarizes the current deep learning-based 2D and 3D human pose, hand and mesh estimation methods with a single or multi-person, single or double-stage methodology-based taxonomy. The authors aim to make every step in the deep learning-based human pose, hand and mesh estimation techniques interpretable by providing readers with a readily understandable explanation. The presented taxonomy has clearly illustrated current research on deep learning-based 2D and 3D human pose, hand and mesh estimation. Moreover, it also provided dataset and evaluation metrics for both 2D and 3DHPE approaches.


2021 ◽  
Vol 11 (9) ◽  
pp. 4241
Author(s):  
Jiahua Wu ◽  
Hyo Jong Lee

In bottom-up multi-person pose estimation, grouping joint candidates into the appropriately structured corresponding instance of a person is challenging. In this paper, a new bottom-up method, the Partitioned CenterPose (PCP) Network, is proposed to better cluster the detected joints. To achieve this goal, we propose a novel approach called Partition Pose Representation (PPR) which integrates the instance of a person and its body joints based on joint offset. PPR leverages information about the center of the human body and the offsets between that center point and the positions of the body’s joints to encode human poses accurately. To enhance the relationships between body joints, we divide the human body into five parts, and then, we generate a sub-PPR for each part. Based on this PPR, the PCP Network can detect people and their body joints simultaneously, then group all body joints according to joint offset. Moreover, an improved l1 loss is designed to more accurately measure joint offset. Using the COCO keypoints and CrowdPose datasets for testing, it was found that the performance of the proposed method is on par with that of existing state-of-the-art bottom-up methods in terms of accuracy and speed.


2017 ◽  
Vol 18 (2) ◽  
pp. 107-117 ◽  
Author(s):  
György Kovács

Abstract The transport activity is one of the most expensive processes in the supply chain. Forwarding and transport companies focuses on the optimization of transportation and the reduction of transport costs. The goal of this study is to develop a method which calculate the first (prime) cost of a given transport task more precisely than the state of the art practices. In practice the calculation of transport fee depends on the individual estimation methods of the transport managers, which could result losses for the company. In this study the elaborated calculation method for total first cost is detailed for three types of fulfilment of transport tasks. The most common type of achievement is, when “own vehicle is used with own driver”. A software was also developed for this case based on the elaborated method. Based on the calculations of our software, the first cost can be defined quickly and precisely to realize higher profit.


2021 ◽  
Vol 10 (11) ◽  
pp. 748
Author(s):  
Ferdinand Maiwald ◽  
Christoph Lehmann ◽  
Taras Lazariv

The idea of virtual time machines in digital environments like hand-held virtual reality or four-dimensional (4D) geographic information systems requires an accurate positioning and orientation of urban historical images. The browsing of large repositories to retrieve historical images and their subsequent precise pose estimation is still a manual and time-consuming process in the field of Cultural Heritage. This contribution presents an end-to-end pipeline from finding relevant images with utilization of content-based image retrieval to photogrammetric pose estimation of large historical terrestrial image datasets. Image retrieval as well as pose estimation are challenging tasks and are subjects of current research. Thereby, research has a strong focus on contemporary images but the methods are not considered for a use on historical image material. The first part of the pipeline comprises the precise selection of many relevant historical images based on a few example images (so called query images) by using content-based image retrieval. Therefore, two different retrieval approaches based on convolutional neural networks (CNN) are tested, evaluated, and compared with conventional metadata search in repositories. Results show that image retrieval approaches outperform the metadata search and are a valuable strategy for finding images of interest. The second part of the pipeline uses techniques of photogrammetry to derive the camera position and orientation of the historical images identified by the image retrieval. Multiple feature matching methods are used on four different datasets, the scene is reconstructed in the Structure-from-Motion software COLMAP, and all experiments are evaluated on a newly generated historical benchmark dataset. A large number of oriented images, as well as low error measures for most of the datasets, show that the workflow can be successfully applied. Finally, the combination of a CNN-based image retrieval and the feature matching methods SuperGlue and DISK show very promising results to realize a fully automated workflow. Such an automated workflow of selection and pose estimation of historical terrestrial images enables the creation of large-scale 4D models.


2022 ◽  
Vol 31 (2) ◽  
pp. 1-32
Author(s):  
Luca Ardito ◽  
Andrea Bottino ◽  
Riccardo Coppola ◽  
Fabrizio Lamberti ◽  
Francesco Manigrasso ◽  
...  

In automated Visual GUI Testing (VGT) for Android devices, the available tools often suffer from low robustness to mobile fragmentation, leading to incorrect results when running the same tests on different devices. To soften these issues, we evaluate two feature matching-based approaches for widget detection in VGT scripts, which use, respectively, the complete full-screen snapshot of the application ( Fullscreen ) and the cropped images of its widgets ( Cropped ) as visual locators to match on emulated devices. Our analysis includes validating the portability of different feature-based visual locators over various apps and devices and evaluating their robustness in terms of cross-device portability and correctly executed interactions. We assessed our results through a comparison with two state-of-the-art tools, EyeAutomate and Sikuli. Despite a limited increase in the computational burden, our Fullscreen approach outperformed state-of-the-art tools in terms of correctly identified locators across a wide range of devices and led to a 30% increase in passing tests. Our work shows that VGT tools’ dependability can be improved by bridging the testing and computer vision communities. This connection enables the design of algorithms targeted to domain-specific needs and thus inherently more usable and robust.


Sensors ◽  
2020 ◽  
Vol 20 (4) ◽  
pp. 1074 ◽  
Author(s):  
Weiya Chen ◽  
Chenchen Yu ◽  
Chenyu Tu ◽  
Zehua Lyu ◽  
Jing Tang ◽  
...  

Real-time sensing and modeling of the human body, especially the hands, is an important research endeavor for various applicative purposes such as in natural human computer interactions. Hand pose estimation is a big academic and technical challenge due to the complex structure and dexterous movement of human hands. Boosted by advancements from both hardware and artificial intelligence, various prototypes of data gloves and computer-vision-based methods have been proposed for accurate and rapid hand pose estimation in recent years. However, existing reviews either focused on data gloves or on vision methods or were even based on a particular type of camera, such as the depth camera. The purpose of this survey is to conduct a comprehensive and timely review of recent research advances in sensor-based hand pose estimation, including wearable and vision-based solutions. Hand kinematic models are firstly discussed. An in-depth review is conducted on data gloves and vision-based sensor systems with corresponding modeling methods. Particularly, this review also discusses deep-learning-based methods, which are very promising in hand pose estimation. Moreover, the advantages and drawbacks of the current hand gesture estimation methods, the applicative scope, and related challenges are also discussed.


Sign in / Sign up

Export Citation Format

Share Document