Visual Attention and Color Cues for 6D Pose Estimation on Occluded Scenarios Using RGB-D Data

Recently, 6D pose estimation methods have shown robust performance on highly cluttered scenes and under different illumination conditions. However, occlusions are still challenging, with recognition rates decreasing to less than 10% for half-visible objects in some datasets. In this paper, we propose to use top-down visual attention and color cues to boost the performance of a state-of-the-art method on occluded scenarios. More specifically, color information is employed to detect potential points in the scene, improve feature matching, and compute more precise fitting scores. The proposed method is evaluated on the Linemod occluded (LM-O), TUD light (TUD-L), Tejani (IC-MI) and Doumanoglou (IC-BIN) datasets, as part of the SiSo BOP benchmark, which includes challenging highly occluded cases, scenarios with changing illumination, and multiple instances. The method is analyzed and discussed for different parameters, color spaces and metrics. The presented results show the validity of the proposed approach and its robustness against illumination changes and multiple-instance scenarios, especially boosting the performance on relatively highly occluded cases. The proposed solution provides an absolute improvement of up to 30% for levels of occlusion between 40% and 50%, outperforming other approaches with a best overall recall of 71% for LM-O, 92% for TUD-L, 99.3% for IC-MI and 97.5% for IC-BIN.
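
The abstract does not say how color enters the fitting score. As a minimal sketch of one plausible reading, the Python function below counts a model point as supported only if some scene point is both geometrically close and similar in color. The function name, the point layout, and both thresholds are assumptions made for illustration, not the authors' formulation.

```python
import math

def color_aware_fitting_score(model_points, scene_points, max_dist, max_color_diff):
    """Fraction of model points supported by a nearby scene point of similar color.

    Each point is a pair ((x, y, z), (c1, c2, c3)). Both thresholds are
    hypothetical parameters; the brute-force search is for clarity only.
    """
    if not model_points:
        return 0.0
    supported = 0
    for m_xyz, m_color in model_points:
        for s_xyz, s_color in scene_points:
            close = math.dist(m_xyz, s_xyz) <= max_dist
            similar = math.dist(m_color, s_color) <= max_color_diff
            if close and similar:
                supported += 1
                break  # one supporting scene point is enough
    return supported / len(model_points)

# Hypothetical transformed model points and scene points (position, RGB color).
model = [
    ((0.0, 0.0, 0.0), (255, 0, 0)),
    ((1.0, 0.0, 0.0), (255, 0, 0)),
    ((0.0, 1.0, 0.0), (0, 255, 0)),
    ((5.0, 5.0, 5.0), (0, 0, 255)),
]
scene = [
    ((0.01, 0.0, 0.0), (250, 5, 0)),   # close and similar in color
    ((1.0, 0.02, 0.0), (0, 0, 255)),   # close, but the color disagrees
    ((0.0, 1.0, 0.01), (0, 250, 0)),   # close and similar in color
]

with_color = color_aware_fitting_score(model, scene, 0.05, 30.0)
geometry_only = color_aware_fitting_score(model, scene, 0.05, math.inf)
```

In this toy data the geometry-only score accepts the second model point, while the color check rejects it, which illustrates how a color term could make the score more selective.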

CRF-Based Model for Instrument Detection and Pose Estimation in Retinal Microsurgery

Computational and Mathematical Methods in Medicine ◽

10.1155/2016/1067509 ◽

2016 ◽

Vol 2016 ◽

pp. 1-10 ◽

Cited By ~ 5

Author(s):

Mohamed Alsheakhali ◽

Abouzar Eslami ◽

Hessam Roodaki ◽

Nassir Navab

Keyword(s):

Random Field ◽

Pose Estimation ◽

State Of The Art ◽

Conditional Random Field ◽

Recovery Process ◽

New Method ◽

Ophthalmic Surgery ◽

Illumination Changes ◽

Cluttered Background

Detection of the instrument tip in retinal microsurgery videos is extremely challenging due to rapid motion, illumination changes, the cluttered background, and the deformable shape of the instrument. For the same reasons, frequent tracking failures add the overhead of reinitializing the tracker. In this work, a new method is proposed to localize not only the instrument center point but also its tips and orientation without the need for manual reinitialization. Our approach models the instrument as a Conditional Random Field (CRF) where each part of the instrument is detected separately. The relations between these parts are modeled to capture the translation, rotation, and scale changes of the instrument. The tracking is done via separate detection of instrument parts and evaluation of confidence via the modeled dependence functions. In case of low-confidence feedback, an automatic recovery process is performed. The algorithm is evaluated on in vivo ophthalmic surgery datasets, and its performance is comparable to that of state-of-the-art methods, with the advantage that no manual reinitialization is needed.

Estimating 6D Aircraft Pose from Keypoints and Structures

Remote Sensing ◽

10.3390/rs13040663 ◽

2021 ◽

Vol 13 (4) ◽

pp. 663

Author(s):

Runze Fan ◽

Ting-Bing Xu ◽

Zhenzhong Wei

Keyword(s):

Open Access ◽

Pose Estimation ◽

State Of The Art ◽

The State ◽

Estimation Methods ◽

Geometric Information ◽

Comparative Performance ◽

Topological Relationship ◽

Novel Approach ◽

Rgb Image

This article addresses the challenge of 6D aircraft pose estimation from a single RGB image during flight. Many recent works have shown that keypoint-based approaches, which first detect keypoints and then estimate the 6D pose, achieve remarkable performance. However, it is hard to locate the keypoints precisely in complex weather scenes. In this article, we propose a novel approach, called Pose Estimation with Keypoints and Structures (PEKS), which leverages multiple intermediate representations to estimate the 6D pose. Unlike previous works, our approach simultaneously locates keypoints and structures to recover the pose parameters of the aircraft through a Perspective-n-Point Structure (PnPS) algorithm. These representations integrate the local geometric information of the object and the topological relationship between components of the target, which effectively improves the accuracy and robustness of 6D pose estimation. In addition, we contribute a dataset for aircraft pose estimation which consists of 3681 real images and 216,000 rendered images. Extensive experiments on our own aircraft pose dataset and multiple open-access pose datasets (e.g., ObjectNet3D, LineMOD) demonstrate that our proposed method can accurately estimate 6D aircraft pose in various complex weather scenes while achieving performance comparable to state-of-the-art pose estimation methods.

Multihead Self Attention Hand Pose Estimation

E3S Web of Conferences ◽

10.1051/e3sconf/202021803023 ◽

2020 ◽

Vol 218 ◽

pp. 03023

Author(s):

Zhiqin Zhang ◽

Bo Zhang ◽

Fen Li ◽

Dehua Kong

Keyword(s):

Neural Networks ◽

Pose Estimation ◽

State Of The Art ◽

Salient Feature ◽

Estimation Methods ◽

Feature Maps ◽

Test Dataset ◽

Hand Pose Estimation ◽

Network Backbone ◽

Hand Pose

In this paper, we propose a neural network architecture for hand pose estimation, named MSAHP, which greatly improves PCK (percentage of correct keypoints) by fusing a self-attention module into a CNN (convolutional neural network). The proposed network is based on a ResNet (residual neural network) backbone and concatenates discriminative features from feature maps at multiple scales; a multi-head self-attention module is then used to focus on the salient areas of the feature map. In recent years, the self-attention mechanism has been widely applied in NLP and speech recognition, where it greatly improves key metrics, but we did not find it applied in computer vision, particularly for hand pose estimation. Experiments on a hand pose estimation dataset demonstrate that MSAHP improves PCK over existing state-of-the-art hand pose estimation methods. Specifically, the proposed method achieves a PCK score of 93.68% on our mixed test dataset.
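
For readers unfamiliar with the metric, PCK is the fraction of predicted keypoints that fall within a distance threshold of their ground-truth positions. The sketch below uses a fixed pixel threshold, which is an assumption: the abstract does not state how the threshold is chosen or normalized, and the function name and sample coordinates are illustrative.

```python
import math

def pck(predicted, ground_truth, threshold):
    """Fraction of keypoints whose prediction lies within `threshold` of the truth."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        raise ValueError("at least one keypoint is required")
    correct = sum(
        1
        for p, g in zip(predicted, ground_truth)
        if math.dist(p, g) <= threshold
    )
    return correct / len(predicted)

# Hypothetical 2D keypoints in pixels; the errors are 1, 2, 5 and 8 pixels.
pred = [(10.0, 10.0), (20.0, 22.0), (35.0, 30.0), (50.0, 58.0)]
truth = [(10.0, 11.0), (20.0, 20.0), (30.0, 30.0), (50.0, 50.0)]

score = pck(pred, truth, threshold=5.0)  # three of four keypoints are within 5 px
```

Benchmarks often normalize the threshold by a reference length such as the bounding-box size, so scores are only comparable when the same convention is used.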

Human pose, hand and mesh estimation using deep learning: a survey

The Journal of Supercomputing ◽

10.1007/s11227-021-04184-7 ◽

2022 ◽

Author(s):

Mukhiddin Toshpulatov ◽

Wookey Lee ◽

Suan Lee ◽

Arousha Haghighian Roudsari

Keyword(s):

Computer Vision ◽

Deep Learning ◽

Pose Estimation ◽

State Of The Art ◽

Significant Problem ◽

Estimation Methods ◽

Human Pose Estimation ◽

Estimation Techniques ◽

The Past ◽

Human Pose

Human pose estimation is one of the problems that has benefited greatly from state-of-the-art deep learning-based models. Human pose, hand and mesh estimation is a significant problem that has attracted the attention of the computer vision community for the past few decades. A wide variety of solutions have been proposed to tackle the problem. Deep learning-based approaches have been extensively studied in recent years and used to address several computer vision problems. However, it is sometimes hard to compare these methods due to their intrinsic differences. This paper extensively summarizes the current deep learning-based 2D and 3D human pose, hand and mesh estimation methods with a taxonomy based on single- or multi-person and single- or double-stage methodology. The authors aim to make every step in the deep learning-based human pose, hand and mesh estimation techniques interpretable by providing readers with a readily understandable explanation. The presented taxonomy clearly illustrates current research on deep learning-based 2D and 3D human pose, hand and mesh estimation. Moreover, the paper also provides datasets and evaluation metrics for both 2D and 3D HPE approaches.

A New Multi-Person Pose Estimation Method Using the Partitioned CenterPose Network

Applied Sciences ◽

10.3390/app11094241 ◽

2021 ◽

Vol 11 (9) ◽

pp. 4241

Author(s):

Jiahua Wu ◽

Hyo Jong Lee

Keyword(s):

Pose Estimation ◽

Human Body ◽

State Of The Art ◽

Estimation Method ◽

Bottom Up ◽

Center Point ◽

Novel Approach ◽

Body Joints

In bottom-up multi-person pose estimation, grouping joint candidates into the appropriately structured corresponding instance of a person is challenging. In this paper, a new bottom-up method, the Partitioned CenterPose (PCP) Network, is proposed to better cluster the detected joints. To achieve this goal, we propose a novel approach called Partition Pose Representation (PPR) which integrates the instance of a person and its body joints based on joint offset. PPR leverages information about the center of the human body and the offsets between that center point and the positions of the body’s joints to encode human poses accurately. To enhance the relationships between body joints, we divide the human body into five parts, and then, we generate a sub-PPR for each part. Based on this PPR, the PCP Network can detect people and their body joints simultaneously, then group all body joints according to joint offset. Moreover, an improved l1 loss is designed to more accurately measure joint offset. Using the COCO keypoints and CrowdPose datasets for testing, it was found that the performance of the proposed method is on par with that of existing state-of-the-art bottom-up methods in terms of accuracy and speed.
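
The center-plus-offset idea can be illustrated with a small sketch: each person's joints are predicted as the body center plus a per-joint offset, and detected joint candidates are then assigned to the person whose predicted position is nearest. This is a simplified reading of the abstract, not the PCP Network itself. The function names and sample data are hypothetical, the five-part sub-PPR split is omitted, and the loss shown is the plain L1 loss rather than the improved variant the authors describe.

```python
import math

def decode_joints(center, offsets):
    """Predicted joint positions: the body center plus each joint's offset."""
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in offsets]

def group_joints(centers, offsets_per_person, candidates_per_joint):
    """For each person and joint type, pick the nearest detected candidate.

    Greedy and non-exclusive: two people may claim the same candidate.
    """
    people = []
    for center, offsets in zip(centers, offsets_per_person):
        predicted = decode_joints(center, offsets)
        joints = {}
        for joint_idx, candidates in enumerate(candidates_per_joint):
            if candidates:
                joints[joint_idx] = min(
                    candidates,
                    key=lambda c: math.dist(c, predicted[joint_idx]),
                )
        people.append(joints)
    return people

def l1_offset_loss(predicted_offsets, true_offsets):
    """Plain mean L1 distance between predicted and true offsets."""
    total = sum(
        abs(px - tx) + abs(py - ty)
        for (px, py), (tx, ty) in zip(predicted_offsets, true_offsets)
    )
    return total / len(predicted_offsets)

# Two hypothetical people with two joint types each (0: head, 1: foot).
centers = [(10.0, 10.0), (50.0, 10.0)]
offsets = [[(0.0, -5.0), (0.0, 8.0)], [(0.0, -5.0), (0.0, 8.0)]]
candidates = [
    [(50.5, 4.5), (9.5, 5.5)],     # detected heads
    [(10.2, 17.6), (49.0, 18.5)],  # detected feet
]

grouped = group_joints(centers, offsets, candidates)
loss = l1_offset_loss([(0.0, -5.0), (0.0, 8.0)], [(0.5, -4.5), (-1.0, 8.5)])
```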

First Cost Calculation Methods for Road Freight Transport Activity

Transport and Telecommunication Journal ◽

10.1515/ttj-2017-0010 ◽

2017 ◽

Vol 18 (2) ◽

pp. 107-117 ◽

Cited By ~ 1

Author(s):

György Kovács

Keyword(s):

State Of The Art ◽

Common Type ◽

Estimation Methods ◽

Cost Calculation ◽

Transport Activity ◽

Calculation Methods ◽

Transport Costs ◽

Art Practices ◽

Road Freight ◽

The Individual

Transport is one of the most expensive processes in the supply chain. Forwarding and transport companies focus on the optimization of transportation and the reduction of transport costs. The goal of this study is to develop a method that calculates the first (prime) cost of a given transport task more precisely than state-of-the-art practices. In practice, the calculation of the transport fee depends on the individual estimation methods of the transport managers, which can result in losses for the company. In this study, the elaborated calculation method for total first cost is detailed for three types of fulfilment of transport tasks. The most common type of fulfilment is when an “own vehicle is used with own driver”. Software was also developed for this case based on the elaborated method. Based on the calculations of our software, the first cost can be determined quickly and precisely to realize higher profit.

40 V to 100 V NLDMOS built on thin BOX SOI with high energy capability, state of the art Rdson/BVdss and robust performance

2018 IEEE 30th International Symposium on Power Semiconductor Devices and ICs (ISPSD) ◽

10.1109/ispsd.2018.8393708 ◽

2018 ◽

Cited By ~ 1

Author(s):

Yang Hao ◽

Sim Poh Ching ◽

Madelyn Liew ◽

Alexander Hoelke ◽

Uwe Eckoldt ◽

...

Keyword(s):

State Of The Art ◽

High Energy ◽

Robust Performance

Fully Automated Pose Estimation of Historical Images in the Context of 4D Geographic Information Systems Utilizing Machine Learning Methods

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10110748 ◽

2021 ◽

Vol 10 (11) ◽

pp. 748

Author(s):

Ferdinand Maiwald ◽

Christoph Lehmann ◽

Taras Lazariv

Keyword(s):

Information Systems ◽

Geographic Information Systems ◽

Image Retrieval ◽

Pose Estimation ◽

Large Scale ◽

Feature Matching ◽

Geographic Information ◽

Content Based Image Retrieval ◽

Matching Methods ◽

Historical Images

The idea of virtual time machines in digital environments like hand-held virtual reality or four-dimensional (4D) geographic information systems requires accurate positioning and orientation of urban historical images. Browsing large repositories to retrieve historical images and their subsequent precise pose estimation is still a manual and time-consuming process in the field of Cultural Heritage. This contribution presents an end-to-end pipeline, from finding relevant images using content-based image retrieval to photogrammetric pose estimation of large historical terrestrial image datasets. Image retrieval as well as pose estimation are challenging tasks and are subjects of current research. However, this research focuses strongly on contemporary images, and the methods have not been considered for use on historical image material. The first part of the pipeline comprises the precise selection of many relevant historical images based on a few example images (so-called query images) by using content-based image retrieval. To this end, two different retrieval approaches based on convolutional neural networks (CNN) are tested, evaluated, and compared with conventional metadata search in repositories. Results show that image retrieval approaches outperform the metadata search and are a valuable strategy for finding images of interest. The second part of the pipeline uses techniques of photogrammetry to derive the camera position and orientation of the historical images identified by the image retrieval. Multiple feature matching methods are used on four different datasets, the scene is reconstructed in the Structure-from-Motion software COLMAP, and all experiments are evaluated on a newly generated historical benchmark dataset. A large number of oriented images, as well as low error measures for most of the datasets, show that the workflow can be successfully applied. Finally, the combination of a CNN-based image retrieval and the feature matching methods SuperGlue and DISK shows very promising results for realizing a fully automated workflow. Such an automated workflow of selection and pose estimation of historical terrestrial images enables the creation of large-scale 4D models.
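
Content-based image retrieval of this kind typically ranks repository images by the similarity of their feature vectors to that of a query image. The sketch below assumes the descriptors have already been extracted and uses cosine similarity for ranking. Both choices, along with the image names and vector values, are assumptions for illustration; the abstract does not specify the similarity measure used by the two CNN-based approaches.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, database, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Hypothetical precomputed descriptors; real CNN features have far more dimensions.
database = {
    "img_a": [1.0, 0.0, 1.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [1.0, 1.0, 1.0],
}
query = [1.0, 0.0, 1.0]

results = rank_by_similarity(query, database, top_k=2)
```

In a pipeline like the one described, only the top-ranked images would be passed on to feature matching and Structure-from-Motion reconstruction.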

Feature Matching-based Approaches to Improve the Robustness of Android Visual GUI Testing

ACM Transactions on Software Engineering and Methodology ◽

10.1145/3477427 ◽

2022 ◽

Vol 31 (2) ◽

pp. 1-32

Author(s):

Luca Ardito ◽

Andrea Bottino ◽

Riccardo Coppola ◽

Fabrizio Lamberti ◽

Francesco Manigrasso ◽

...

Keyword(s):

Computer Vision ◽

Feature Matching ◽

State Of The Art ◽

Design Of Algorithms ◽

Computational Burden ◽

Domain Specific ◽

Gui Testing ◽

Wide Range ◽

Full Screen ◽

Feature Based

In automated Visual GUI Testing (VGT) for Android devices, the available tools often suffer from low robustness to mobile fragmentation, leading to incorrect results when running the same tests on different devices. To mitigate these issues, we evaluate two feature matching-based approaches for widget detection in VGT scripts, which use, respectively, the complete full-screen snapshot of the application (Fullscreen) and the cropped images of its widgets (Cropped) as visual locators to match on emulated devices. Our analysis includes validating the portability of different feature-based visual locators over various apps and devices and evaluating their robustness in terms of cross-device portability and correctly executed interactions. We assessed our results through a comparison with two state-of-the-art tools, EyeAutomate and Sikuli. Despite a limited increase in the computational burden, our Fullscreen approach outperformed state-of-the-art tools in terms of correctly identified locators across a wide range of devices and led to a 30% increase in passing tests. Our work shows that VGT tools’ dependability can be improved by bridging the testing and computer vision communities. This connection enables the design of algorithms targeted to domain-specific needs and thus inherently more usable and robust.

A Survey on Hand Pose Estimation with Wearable Sensors and Computer-Vision-Based Methods

Sensors ◽

10.3390/s20041074 ◽

2020 ◽

Vol 20 (4) ◽

pp. 1074 ◽

Cited By ~ 3

Author(s):

Weiya Chen ◽

Chenchen Yu ◽

Chenyu Tu ◽

Zehua Lyu ◽

Jing Tang ◽

...

Keyword(s):

Computer Vision ◽

Pose Estimation ◽

Wearable Sensors ◽

Complex Structure ◽

Estimation Methods ◽

Human Computer Interactions ◽

Hand Pose Estimation ◽

Timely Review ◽

Kinematic Models ◽

Hand Pose

Real-time sensing and modeling of the human body, especially the hands, is an important research endeavor for various applications such as natural human-computer interaction. Hand pose estimation is a major academic and technical challenge due to the complex structure and dexterous movement of human hands. Boosted by advancements in both hardware and artificial intelligence, various prototypes of data gloves and computer-vision-based methods have been proposed for accurate and rapid hand pose estimation in recent years. However, existing reviews either focused on data gloves or on vision methods or were even based on a particular type of camera, such as the depth camera. The purpose of this survey is to conduct a comprehensive and timely review of recent research advances in sensor-based hand pose estimation, including wearable and vision-based solutions. Hand kinematic models are discussed first. An in-depth review is conducted on data gloves and vision-based sensor systems with corresponding modeling methods. In particular, this review also discusses deep-learning-based methods, which are very promising in hand pose estimation. Moreover, the advantages and drawbacks of the current hand gesture estimation methods, their scope of application, and related challenges are also discussed.
