Skeleton Image Representation for 3D Action Recognition Based on Tree Structure and Reference Joints

Content-based image retrieval has become an essential technique in multimedia data management. However, because of the difficulties involved in the underlying image processing tasks, a robust semantic representation of image content is still very difficult (if not impossible) to achieve. In this paper, we propose a novel content-based image retrieval approach with relevance feedback using adaptive processing of tree-structure image representation. Each image is first represented with a quad-tree, which requires no segmentation. A neural network model with the Back-Propagation Through Structure (BPTS) learning algorithm is then employed to learn the tree-structure representation of the image content. This approach, which integrates image representation and similarity measure in a single framework, is applied to relevance feedback in content-based image retrieval. An initial ranking of the database images is first computed from the similarity between the query image and each database image according to global features. The user is then asked to categorize the top retrieved images into similar and dissimilar groups. Finally, the BPTS neural network model is used to learn the user's intention and produce a better retrieval result. This process continues until satisfactory retrieval results are achieved. In the refining process, a fine similarity grading scheme can also be adopted to improve the retrieval performance. Simulations on texture images and scenery pictures have demonstrated promising results that compare favorably with the other relevance feedback methods tested.
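The segmentation-free quad-tree mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which features are stored at each node or how deep the tree goes, so the `QuadNode` class, the mean-intensity feature, and the fixed `depth` parameter below are assumptions made only for illustration. The sketch also assumes a square image whose side is a power of two.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuadNode:
    """One node of a fixed-depth quad-tree over a grayscale image."""
    mean: float  # hypothetical node feature: mean intensity of the block
    children: List["QuadNode"] = field(default_factory=list)


def build_quadtree(img, top, left, size, depth):
    """Recursively split a square block into four quadrants, `depth` times."""
    block = [img[r][c]
             for r in range(top, top + size)
             for c in range(left, left + size)]
    node = QuadNode(mean=sum(block) / len(block))
    if depth > 0 and size > 1:
        half = size // 2
        for dr in (0, half):
            for dc in (0, half):
                node.children.append(
                    build_quadtree(img, top + dr, left + dc, half, depth - 1))
    return node


def count_nodes(node):
    """Total number of nodes in the tree rooted at `node`."""
    return 1 + sum(count_nodes(child) for child in node.children)


# 4x4 toy image with pixel values 0..15
image = [[r * 4 + c for c in range(4)] for r in range(4)]
root = build_quadtree(image, 0, 0, 4, depth=2)
```

Because the split is purely geometric, no region segmentation is needed before the tree is built. A structure-aware learner such as BPTS would then consume the node features bottom-up; that part is not shown here.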

Action Recognition with Domain Invariant Features of Skeleton Image

10.1109/avss52988.2021.9663824 ◽

2021 ◽

Author(s):

Han Chen ◽

Yifan Jiang ◽

Hanseok Ko

Keyword(s):

Action Recognition ◽

Invariant Features ◽

Skeleton Image

Spatio–Temporal Image Representation of 3D Skeletal Movements for View-Invariant Action Recognition with Deep Convolutional Neural Networks

Sensors ◽

10.3390/s19081932 ◽

2019 ◽

Vol 19 (8) ◽

pp. 1932 ◽

Cited By ~ 4

Author(s):

Huy Hieu Pham ◽

Houssam Salmane ◽

Louahdi Khoudour ◽

Alain Crouzil ◽

Pablo Zegers ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Action Recognition ◽

Large Scale ◽

Image Representation ◽

Human Action ◽

Computational Time ◽

Deep Convolutional Neural Networks ◽

Classification Tasks ◽

Spatio Temporal

Designing motion representations for 3D human action recognition from skeleton sequences is an important yet challenging task. An effective representation should be robust to noise, invariant to viewpoint changes, and perform well at low computational cost. The two main challenges are how to efficiently represent the spatio–temporal patterns of skeletal movements and how to learn their discriminative features for classification tasks. This paper presents a novel skeleton-based representation and a deep learning framework for 3D action recognition using RGB-D sensors. We propose to build an action map called SPMF (Skeleton Posture-Motion Feature), which is a compact image representation built from skeleton poses and their motions. An Adaptive Histogram Equalization (AHE) algorithm is then applied to the SPMF to enhance its local patterns and form an enhanced action map, namely Enhanced-SPMF. For learning and classification tasks, we exploit Deep Convolutional Neural Networks based on the DenseNet architecture to directly learn an end-to-end mapping between input skeleton sequences and their action labels via the Enhanced-SPMFs. The proposed method is evaluated on four challenging benchmark datasets covering individual actions, interactions, multiview and large-scale settings. The experimental results demonstrate that the proposed method outperforms previous state-of-the-art approaches on all benchmark tasks, whilst requiring low computational time for training and inference.
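As a rough illustration of how a skeleton sequence can become an image at all, the sketch below maps joint coordinates to pixel values with min-max normalization. This is not the SPMF construction, whose details the abstract does not give. The function name `skeleton_to_image`, the joints-by-frames layout, and the choice of (x, y, z) as (R, G, B) are assumptions, and the AHE enhancement step is omitted.

```python
def skeleton_to_image(frames):
    """Map a sequence of 3D poses to an RGB-like grid.

    frames: list of T poses, each a list of J (x, y, z) tuples.
    Returns a J x T grid of (r, g, b) integer tuples in 0..255,
    with rows indexing joints and columns indexing frames.
    """
    coords = [v for pose in frames for joint in pose for v in joint]
    lo, hi = min(coords), max(coords)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant sequence

    def to_pixel(joint):
        # Scale each coordinate into the 0..255 range shared by the sequence.
        return tuple(round(255 * (v - lo) / span) for v in joint)

    n_joints = len(frames[0])
    return [[to_pixel(pose[j]) for pose in frames] for j in range(n_joints)]


# Toy sequence: 3 frames, 2 joints per frame
frames = [
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.5)],
    [(0.0, 0.5, 0.0), (1.0, 2.0, 1.0)],
    [(0.0, 1.0, 0.0), (2.0, 2.0, 1.0)],
]
image = skeleton_to_image(frames)
```

An image built this way can be fed to a standard CNN. A motion channel could be formed the same way from frame-to-frame differences, though that is again an assumption rather than something the abstract specifies.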

Adaptive Processing of Tree-Structure Image Representation

Advances in Multimedia Information Processing — PCM 2001 - Lecture Notes in Computer Science ◽

10.1007/3-540-45453-5_133 ◽

2001 ◽

pp. 989-995 ◽

Cited By ~ 3

Author(s):

Zhiyong Wang ◽

Zheru Chi ◽

Dagan Feng ◽

S. Y. Cho

Keyword(s):

Image Representation ◽

Tree Structure ◽

Adaptive Processing

Motion-Based Representations For Activity Recognition

10.5753/sibgrapi.est.2020.12988 ◽

2020 ◽

Author(s):

Carlos Caetano ◽

Jefersson Alex Dos Santos ◽

William Robson Schwartz

Keyword(s):

Neural Networks ◽

Optical Flow ◽

Convolutional Neural Networks ◽

Activity Recognition ◽

Temporal Dynamics ◽

State Of The Art ◽

Image Representation ◽

Motion Information ◽

Linear Transformations ◽

Skeleton Image

This work addresses the activity recognition problem. We propose two different representations based on motion information. The first is a novel temporal stream for two-stream Convolutional Neural Networks (CNNs) that takes as input images computed from the optical flow magnitude and orientation, so that the network can learn richer motion information. The method applies simple non-linear transformations to the vertical and horizontal components of the optical flow to generate input images for the temporal stream. The second is a novel skeleton image representation to be used as input to CNNs. The approach encodes the temporal dynamics by explicitly computing the magnitude and orientation values of the skeleton joints. Experiments carried out on challenging, well-known activity recognition datasets (UCF101, NTU RGB+D 60 and NTU RGB+D 120) demonstrate that the proposed representations achieve state-of-the-art results, indicating the suitability of our approaches as video representations.
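The magnitude and orientation of a joint's motion can be sketched as follows. The abstract does not define either quantity precisely, so this example assumes the motion is the displacement of one joint between two consecutive frames, the magnitude is its Euclidean norm, and the orientation is the angle of the displacement projected onto the x-y plane. The function name `joint_motion` is hypothetical.

```python
import math


def joint_motion(p_prev, p_next):
    """Return (magnitude, xy-plane orientation in radians) of a joint's motion.

    p_prev, p_next: (x, y, z) positions of the same joint in two
    consecutive frames.
    """
    dx, dy, dz = (b - a for a, b in zip(p_prev, p_next))
    magnitude = math.sqrt(dx * dx + dy * dy + dz * dz)
    orientation_xy = math.atan2(dy, dx)  # assumption: angle in the x-y plane
    return magnitude, orientation_xy


mag, ang = joint_motion((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
still = joint_motion((1.0, 1.0, 1.0), (1.0, 1.0, 1.0))
```

Computing these two values for every joint and every frame pair gives two arrays that can be stacked as image channels. The same magnitude and angle computation applies to the horizontal and vertical components of an optical flow field.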

Spatio-Temporal Image Representation of 3D Skeletal Movements for View-Invariant Action Recognition with Deep Convolutional Neural Networks

10.20944/preprints201903.0086.v1 ◽

2019 ◽

Author(s):

Huy Hieu Pham ◽

Houssam Salmane ◽

Louahdi Khoudour ◽

Alain Crouzil ◽

Pablo Zegers ◽

...

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Action Recognition ◽

Large Scale ◽

Image Representation ◽

Human Action ◽

Computational Time ◽

Deep Convolutional Neural Networks ◽

Motion Feature ◽

Spatio Temporal

Designing motion representations for 3D human action recognition from skeleton sequences is an important yet challenging task. An effective representation should be robust to noise, invariant to viewpoint changes, and perform well at low computational cost. The two main challenges are how to efficiently represent the spatio-temporal patterns of skeletal movements and how to learn their discriminative features for classification tasks. This paper presents a novel skeleton-based representation and a deep learning framework for 3D action recognition using RGB-D sensors. We propose to build an action map called SPMF (Skeleton Posture-Motion Feature), which is a compact image representation built from skeleton poses and their motions. An Adaptive Histogram Equalization (AHE) algorithm is then applied to the SPMF to enhance its local patterns and form an enhanced action map, namely Enhanced-SPMF. For learning and classification tasks, we exploit Deep Convolutional Neural Networks based on the DenseNet architecture to directly learn an end-to-end mapping between input skeleton sequences and their action labels via the Enhanced-SPMFs. The proposed method is evaluated on four challenging benchmark datasets covering individual actions, interactions, multiview and large-scale settings. The experimental results demonstrate that the proposed method outperforms previous state-of-the-art approaches on all benchmark tasks, whilst requiring low computational time for training and inference.

Image representation of pose-transition feature for 3D skeleton-based action recognition

Information Sciences ◽

10.1016/j.ins.2019.10.047 ◽

2020 ◽

Vol 513 ◽

pp. 112-126 ◽

Cited By ~ 10

Author(s):

Thien Huynh-The ◽

Cam-Hao Hua ◽

Trung-Thanh Ngo ◽

Dong-Seong Kim

Keyword(s):

Action Recognition ◽

Image Representation ◽

3D Skeleton

Action Recognition With Spatio–Temporal Visual Attention on Skeleton Image Sequences

IEEE Transactions on Circuits and Systems for Video Technology ◽

10.1109/tcsvt.2018.2864148 ◽

2019 ◽

Vol 29 (8) ◽

pp. 2405-2415 ◽

Cited By ~ 15

Author(s):

Zhengyuan Yang ◽

Yuncheng Li ◽

Jianchao Yang ◽

Jiebo Luo

Keyword(s):

Visual Attention ◽

Action Recognition ◽

Image Sequences ◽

Spatio Temporal ◽

Skeleton Image

Median-Tree: An Efficient Counterpart of Tree-of-Shapes

Mathematical Morphology - Theory and Applications ◽

10.1515/mathm-2020-0110 ◽

2021 ◽

Vol 5 (1) ◽

pp. 108-127

Author(s):

Behzad Mirmahboub ◽

Deise Santana Maia ◽

François Merciol ◽

Sébastien Lefèvre

Keyword(s):

Image Analysis ◽

Image Representation ◽

Experimental Studies ◽

Efficient Algorithms ◽

Tree Structure ◽

Processing Methods ◽

Tree Algorithms ◽

Pixel Value ◽

Complementary Image ◽

Tree Of Shapes

Representing an image through a tree structure, as provided by a morphological hierarchy, enables efficient image analysis and processing methods that operate directly on the tree structure. Max-tree and min-tree can be built with efficient algorithms, but they focus only on the brighter and darker components of the image, respectively. Conversely, the Tree-of-Shapes is a self-complementary image representation that provides access to all regional extrema of the image (both brighter and darker components), but its computation is more time-consuming. In this paper, we introduce a new, simple and efficient tree structure called median-tree. It relies on a median image that is straightforwardly constructed by subtracting the median pixel value from an image to decompose it into positive and negative parts. The median-tree can then be obtained by applying the efficient max-tree algorithms available in the literature to this median image. We show through theoretical and experimental studies that the median-tree offers similar characteristics to the Tree-of-Shapes, but comes with a considerably lower construction complexity.
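The median subtraction and the split into positive and negative parts can be sketched in a few lines. The abstract does not say how the two parts are combined into the final median image, and the max-tree construction itself is out of scope here, so the sketch stops at the decomposition. The function name `median_decompose` is hypothetical.

```python
from statistics import median


def median_decompose(img):
    """Subtract the median pixel value and split the result by sign.

    img: 2D list of numeric pixel values.
    Returns (m, positive, negative), where both parts are non-negative and
    img[r][c] == m + positive[r][c] - negative[r][c] for every pixel.
    """
    m = median(v for row in img for v in row)
    diff = [[v - m for v in row] for row in img]
    positive = [[max(d, 0) for d in row] for row in diff]   # brighter than median
    negative = [[max(-d, 0) for d in row] for row in diff]  # darker than median
    return m, positive, negative


img = [[1, 5, 9],
       [2, 5, 8],
       [3, 5, 7]]
m, pos, neg = median_decompose(img)
```

Both parts are non-negative, so bright and dark structures alike appear as peaks, which is the kind of input a max-tree algorithm handles.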
