A spatio-temporal deep learning approach for human action recognition in infrared videos

With the evolution of computing technology in applications such as human-robot interaction, human-computer interaction, and health-care systems, 3D human body models and their dynamic motions have gained popularity. Human performance involves human body shapes and their relative motions. Research on human activity recognition is structured around how the complex movements of a human body are identified and analyzed. Vision-based action recognition from video is one such task, in which actions are inferred by observing the complete action sequence performed by a human. Many techniques have been revised over recent decades in order to develop a robust and effective framework for action recognition. In this survey, we summarize recent advances in human action recognition, namely machine learning approaches, deep learning approaches, and the evaluation of these approaches.

Sequential deep learning approach for human action recognition in infrared videos (Conference Presentation)

Automatic Target Recognition XXX ◽ 10.1117/12.2565856 ◽ 2020

Author(s): Shreya Dey ◽ Ripul Ghosh ◽ Naga Vara Aparna Akula

Keyword(s): Deep Learning ◽ Action Recognition ◽ Human Action Recognition ◽ Human Action ◽ Learning Approach

A Deep Learning Approach for Real-Time 3D Human Action Recognition from Skeletal Data

Lecture Notes in Computer Science - Image Analysis and Recognition ◽ 10.1007/978-3-030-27202-9_2 ◽ 2019 ◽ pp. 18-32

Author(s): Huy Hieu Pham ◽ Houssam Salmane ◽ Louahdi Khoudour ◽ Alain Crouzil ◽ Pablo Zegers ◽ ...

Keyword(s): Deep Learning ◽ Real Time ◽ Action Recognition ◽ Human Action Recognition ◽ Human Action ◽ Learning Approach

Deep Learning Approach for Human Action Recognition Using Gated Recurrent Unit Neural Networks and Motion Analysis

Journal of Computer Science ◽ 10.3844/jcssp.2019.1040.1049 ◽ 2019 ◽ Vol 15 (7) ◽ pp. 1040-1049

Author(s): Neziha Jaouedi ◽ Noureddine Boujnah ◽ Med Salim Bouhlel

Keyword(s): Neural Networks ◽ Deep Learning ◽ Motion Analysis ◽ Action Recognition ◽ Human Action Recognition ◽ Human Action ◽ Learning Approach ◽ Gated Recurrent Unit

Deep Learning Based Human Activity Recognition Using Spatio-Temporal Image Formation of Skeleton Joints

Applied Sciences ◽ 10.3390/app11062675 ◽ 2021 ◽ Vol 11 (6) ◽ pp. 2675

Author(s): Nusrat Tasnim ◽ Mohammad Khairul Islam ◽ Joong-Hwan Baek

Keyword(s): Deep Learning ◽ Activity Recognition ◽ Action Recognition ◽ Human Activity ◽ Image Formation ◽ Human Action Recognition ◽ Human Action ◽ Human Activity Recognition ◽ 3D Skeleton ◽ Spatio Temporal

Human activity recognition has become a significant research trend in computer vision, image processing, and human–machine or human–object interaction, driven by cost-effectiveness, time management, rehabilitation, and disease pandemics. Over the past years, several methods have been published for human action recognition using RGB (red, green, and blue), depth, and skeleton datasets. Most of the methods introduced for action classification using skeleton datasets are constrained in some respects, including feature representation, complexity, and performance. Providing an effective and efficient method for human action discrimination using a 3D skeleton dataset therefore remains a challenging problem. There is considerable room to map the 3D skeleton joint coordinates into spatio-temporal formats to reduce the complexity of the system, to recognize human behaviors more accurately, and to improve overall performance. In this paper, we propose a spatio-temporal image formation (STIF) technique for 3D skeleton joints that captures spatial information and temporal changes for action discrimination. We apply transfer learning (pretrained models MobileNetV2, DenseNet121, and ResNet18, trained on the ImageNet dataset) to extract discriminative features and evaluate the proposed method with several fusion techniques. We mainly investigate the effect of three fusion methods (element-wise average, multiplication, and maximization) on human action recognition performance. Our deep learning-based method with the STIF representation outperforms prior works on UTD-MHAD (University of Texas at Dallas multi-modal human action dataset) and MSR-Action3D (Microsoft action 3D), two publicly available benchmark 3D skeleton datasets. We attain accuracies of approximately 98.93%, 99.65%, and 98.80% on UTD-MHAD and 96.00%, 98.75%, and 97.08% on MSR-Action3D using MobileNetV2, DenseNet121, and ResNet18, respectively.
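
The three fusion methods named in the abstract (element-wise average, multiplication, and maximization) can be illustrated with a minimal NumPy sketch. The abstract does not state what is fused or at which stage, so this example assumes the inputs are class-score vectors of identical shape produced by several streams. The function name `fuse_scores` and the example values are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_scores(scores, method="average"):
    """Fuse several same-shaped score arrays element-wise.

    Hypothetical helper: the paper's actual fusion inputs and stage
    are not given in the abstract.
    """
    stacked = np.stack(scores, axis=0)  # shape: (n_streams, n_classes)
    if method == "average":
        return stacked.mean(axis=0)
    if method == "multiplication":
        return stacked.prod(axis=0)
    if method == "maximization":
        return stacked.max(axis=0)
    raise ValueError(f"unknown fusion method: {method}")


# Made-up class scores from three streams over three action classes.
streams = [
    np.array([0.6, 0.3, 0.1]),
    np.array([0.5, 0.4, 0.1]),
    np.array([0.2, 0.7, 0.1]),
]

for name in ("average", "multiplication", "maximization"):
    fused = fuse_scores(streams, name)
    print(name, fused, "predicted class:", int(np.argmax(fused)))
```

Averaging smooths disagreement between streams, multiplication rewards classes that every stream supports, and maximization follows the single most confident stream. Which of these works best is an empirical question that the paper evaluates per dataset and backbone.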
