Skeleton Image Representation for 3D Action Recognition Based on Tree Structure and Reference Joints

Author(s):  
Carlos Caetano ◽  
Francois Bremond ◽  
William Robson Schwartz
2003 ◽  
Vol 03 (01) ◽  
pp. 119-143 ◽  
Author(s):  
ZHIYONG WANG ◽  
ZHERU CHI ◽  
DAGAN FENG ◽  
AH CHUNG TSOI

Content-based image retrieval has become an essential technique in multimedia data management. However, because of the difficulties involved in the underlying image processing tasks, a robust semantic representation of image content remains very difficult (if not impossible) to achieve. In this paper, we propose a novel content-based image retrieval approach with relevance feedback using adaptive processing of a tree-structured image representation. Each image is first represented with a quad-tree, which is segmentation-free. A neural network model with the Back-Propagation Through Structure (BPTS) learning algorithm is then employed to learn the tree-structured representation of the image content. This approach, which integrates image representation and similarity measure in a single framework, is applied to relevance feedback in content-based image retrieval. An initial ranking of the database images is first carried out based on the similarity between the query image and each database image according to global features. The user is then asked to categorize the top retrieved images into similar and dissimilar groups. Finally, the BPTS neural network model is used to learn the user's intention for a better retrieval result. This process continues until satisfactory retrieval results are achieved. In the refining process, a fine similarity grading scheme can also be adopted to improve retrieval performance. Simulations on texture images and scenery pictures have demonstrated promising results that compare favorably with the other relevance feedback methods tested.
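
The segmentation-free quad-tree step described in this abstract can be illustrated with a minimal sketch. It assumes a square grayscale image whose side is a power of two and stores only the mean intensity per node. The names `QuadNode`, `build_quadtree` and `count_nodes` are hypothetical and are not taken from the paper, whose node features and BPTS network are not reproduced here.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class QuadNode:
    """One quad-tree node: a square block plus one illustrative feature."""
    row: int
    col: int
    size: int
    mean_intensity: float
    children: list = field(default_factory=list)


def build_quadtree(image, row=0, col=0, size=None, min_size=1):
    """Recursively split a square image into four equal quadrants.

    The split is purely geometric (no segmentation is involved) and stops
    once a quadrant would become smaller than min_size (assumed >= 1).
    """
    if size is None:
        size = len(image)
    block = [image[r][c]
             for r in range(row, row + size)
             for c in range(col, col + size)]
    node = QuadNode(row, col, size, mean(block))
    half = size // 2
    if half >= min_size:
        for dr in (0, half):
            for dc in (0, half):
                node.children.append(
                    build_quadtree(image, row + dr, col + dc, half, min_size))
    return node


def count_nodes(node):
    """Total number of nodes in the subtree rooted at node."""
    return 1 + sum(count_nodes(child) for child in node.children)


# A 4x4 toy image made of four uniform 2x2 blocks.
image = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 2, 2],
    [4, 4, 2, 2],
]
root = build_quadtree(image, min_size=2)
print(count_nodes(root), root.mean_intensity)
```

A tree-structured learner would consume the per-node features bottom-up. The mean intensity used here is only a placeholder for whatever features the authors attach to each node.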


Sensors ◽  
2019 ◽  
Vol 19 (8) ◽  
pp. 1932 ◽  
Author(s):  
Huy Hieu Pham ◽  
Houssam Salmane ◽  
Louahdi Khoudour ◽  
Alain Crouzil ◽  
Pablo Zegers ◽  
...  

Designing motion representations for 3D human action recognition from skeleton sequences is an important yet challenging task. An effective representation should be robust to noise, invariant to viewpoint changes, and deliver good performance at low computational cost. Two main challenges in this task are how to efficiently represent spatio-temporal patterns of skeletal movements and how to learn their discriminative features for classification. This paper presents a novel skeleton-based representation and a deep learning framework for 3D action recognition using RGB-D sensors. We propose to build an action map called SPMF (Skeleton Posture-Motion Feature), which is a compact image representation built from skeleton poses and their motions. An Adaptive Histogram Equalization (AHE) algorithm is then applied to the SPMF to enhance its local patterns and form an enhanced action map, namely Enhanced-SPMF. For learning and classification, we exploit Deep Convolutional Neural Networks based on the DenseNet architecture to directly learn an end-to-end mapping between input skeleton sequences and their action labels via the Enhanced-SPMFs. The proposed method is evaluated on four challenging benchmark datasets covering individual actions, interactions, multiview and large-scale settings. The experimental results demonstrate that the proposed method outperforms previous state-of-the-art approaches on all benchmark tasks, whilst requiring low computational time for training and inference.
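
As a rough illustration of turning skeleton poses and their motions into image-like arrays, the sketch below maps each joint's (x, y, z) coordinates to an (R, G, B) pixel by min-max scaling, with one row per joint and one column per frame. This is an assumed, generic encoding and not the SPMF definition from the paper. The function names `skeleton_to_image` and `motion` are hypothetical, and the AHE and DenseNet stages are not shown.

```python
def skeleton_to_image(sequence):
    """Encode sequence[t][j] = (x, y, z) as image[j][t] = (r, g, b).

    All coordinates are min-max scaled together into the range 0..255.
    """
    values = [v for frame in sequence for joint in frame for v in joint]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input

    def to_byte(v):
        return round(255 * (v - lo) / span)

    n_frames, n_joints = len(sequence), len(sequence[0])
    return [[tuple(to_byte(v) for v in sequence[t][j])
             for t in range(n_frames)]
            for j in range(n_joints)]


def motion(sequence):
    """Frame-to-frame displacement of every joint."""
    return [[tuple(b - a for a, b in zip(p, q))
             for p, q in zip(frame0, frame1)]
            for frame0, frame1 in zip(sequence, sequence[1:])]


# Toy sequence: 3 frames, 2 joints.
sequence = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.5, 0.0, 0.0), (1.0, 1.0, 2.0)],
    [(1.0, 0.0, 0.0), (1.0, 1.0, 2.0)],
]
pose_img = skeleton_to_image(sequence)            # 2 rows x 3 columns
motion_img = skeleton_to_image(motion(sequence))  # 2 rows x 2 columns
print(pose_img[0], motion_img[1])
```

Scaling poses and motions separately keeps small displacements from being flattened by the much larger range of absolute positions.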


2020 ◽  
Author(s):  
Carlos Caetano ◽  
Jefersson Alex Dos Santos ◽  
William Robson Schwartz

This work addresses the activity recognition problem. We propose two different representations based on motion information for activity recognition. The first representation is a novel temporal stream for two-stream Convolutional Neural Networks (CNNs) that receives as input images computed from the optical flow magnitude and orientation, so that the network can learn richer motion information. The method applies simple non-linear transformations to the vertical and horizontal components of the optical flow to generate input images for the temporal stream. The second representation is a novel skeleton image representation to be used as input to CNNs. The approach encodes the temporal dynamics by explicitly computing the magnitude and orientation values of the skeleton joints. Experiments carried out on challenging, well-known activity recognition datasets (UCF101, NTU RGB+D 60 and NTU RGB+D 120) demonstrate that the proposed representations achieve state-of-the-art results, indicating the suitability of our approaches as video representations.
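
The magnitude and orientation mentioned in this abstract can be computed from horizontal and vertical components with two standard non-linear transformations. The sketch below assumes the usual Euclidean norm and `atan2` angle. The exact transformations and scaling used by the authors are not given in the abstract, and the helper name `magnitude_orientation` is hypothetical.

```python
import math


def magnitude_orientation(dx, dy):
    """Return (magnitude, orientation in radians) of a 2D displacement."""
    return math.hypot(dx, dy), math.atan2(dy, dx)


# Toy 2x2 flow field; each entry is (horizontal, vertical) displacement.
flow = [
    [(3.0, 4.0), (0.0, 1.0)],
    [(-1.0, 0.0), (0.0, 0.0)],
]

# One single-channel image per quantity, same layout as the flow field.
mag_img = [[magnitude_orientation(dx, dy)[0] for dx, dy in row] for row in flow]
ang_img = [[magnitude_orientation(dx, dy)[1] for dx, dy in row] for row in flow]
print(mag_img)
print(ang_img)
```

The same function applies to skeleton data if `dx` and `dy` are taken as a joint's displacement between consecutive frames, which is one plausible reading of the second representation.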


2020 ◽  
Vol 513 ◽  
pp. 112-126 ◽  
Author(s):  
Thien Huynh-The ◽  
Cam-Hao Hua ◽  
Trung-Thanh Ngo ◽  
Dong-Seong Kim

2021 ◽  
Vol 5 (1) ◽  
pp. 108-127
Author(s):  
Behzad Mirmahboub ◽  
Deise Santana Maia ◽  
François Merciol ◽  
Sébastien Lefèvre

Representing an image through a tree structure, as provided by a morphological hierarchy, enables efficient image analysis and processing methods that operate directly on the tree. Max-tree and min-tree can be built with efficient algorithms, but they focus only on the brighter and darker components of the image, respectively. Conversely, the Tree-of-Shapes is a self-complementary image representation that provides access to all regional extrema of the image (both brighter and darker components), but its computation is more time-consuming. In this paper, we introduce a new, simple and efficient tree structure called the median-tree. It relies on a median image that is straightforwardly constructed by subtracting the median pixel value from an image to decompose it into positive and negative parts. The median-tree can then be obtained by applying the efficient max-tree algorithms available in the literature to this median image. We show through theoretical and experimental studies that the median-tree offers similar characteristics to the Tree-of-Shapes, but comes with a considerably lower construction complexity.
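
The median-subtraction step described in this abstract can be sketched as follows, assuming a small grayscale image stored as nested lists. Only the decomposition into positive and negative parts is shown. How the paper combines the two parts into a single median image, and the max-tree construction itself, are not specified in the abstract and are left out. The function name `median_decomposition` is hypothetical.

```python
from statistics import median


def median_decomposition(image):
    """Subtract the median pixel value and split the result by sign.

    Returns (m, positive, negative), where both parts are non-negative and
    image[r][c] - m == positive[r][c] - negative[r][c] for every pixel.
    """
    m = median(p for row in image for p in row)
    diff = [[p - m for p in row] for row in image]
    positive = [[max(d, 0) for d in row] for row in diff]   # brighter than median
    negative = [[max(-d, 0) for d in row] for row in diff]  # darker than median
    return m, positive, negative


image = [
    [1, 5, 9],
    [3, 5, 7],
    [2, 5, 8],
]
m, positive, negative = median_decomposition(image)
print(m, positive, negative)
```

Both parts are non-negative, so bright and dark structures alike appear as bright components, which is the form a max-tree algorithm expects.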

