Real-Time Human Action Recognition Using Deep Learning Architecture

Author(s):  
Souhila Kahlouche ◽  
Mahmoud Belhocine ◽  
Abdallah Menouar

In this work, an efficient human activity recognition (HAR) algorithm based on a deep learning architecture is proposed to classify activities into seven classes. To learn spatial and temporal features from only the 3D skeleton data captured by a Microsoft Kinect camera, the proposed algorithm combines convolutional neural network (CNN) and long short-term memory (LSTM) architectures, taking advantage of the LSTM for modeling temporal data and of the CNN for modeling spatial data. The captured skeleton sequences are used to create a specific dataset of interactive activities; these data are then transformed according to a view-invariance and a symmetry criterion. To demonstrate its effectiveness, the developed algorithm was tested on several public datasets, where it matched, and sometimes surpassed, state-of-the-art performance. Finally, tools for assessing the uncertainty of the proposed algorithm are provided and discussed to ensure its efficiency for continuous human action recognition in real time.
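
A minimal sketch of the CNN-plus-LSTM pattern this abstract describes, not the authors' implementation: a small per-frame convolutional encoder over the 3D joint coordinates feeds an LSTM that models the sequence. The joint count (25, as on a Kinect v2), layer sizes, and the seven-class output head are illustrative assumptions.

import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, num_joints=25, num_classes=7, hidden=128):
        super().__init__()
        # Per-frame spatial encoder: treats one frame's (joints x 3)
        # coordinates as a 1-D signal with 3 coordinate channels.
        self.cnn = nn.Sequential(
            nn.Conv1d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # -> (batch*time, 64, 1)
        )
        # Temporal model over the sequence of per-frame features.
        self.lstm = nn.LSTM(64, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):
        # x: (batch, time, joints, 3) skeleton sequences
        b, t, j, c = x.shape
        x = x.view(b * t, j, c).transpose(1, 2)      # (b*t, 3, joints)
        f = self.cnn(x).squeeze(-1).view(b, t, -1)   # (b, t, 64)
        out, _ = self.lstm(f)
        return self.fc(out[:, -1])                   # classify from last step

model = CNNLSTM()
clip = torch.randn(2, 30, 25, 3)   # 2 clips, 30 frames, 25 joints, xyz
print(model(clip).shape)           # torch.Size([2, 7])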

Sensors ◽  
2020 ◽  
Vol 20 (10) ◽  
pp. 2886 ◽  
Author(s):  
Junwoo Lee ◽  
Bummo Ahn

Human action recognition is an important research area in the field of computer vision that can be applied in surveillance, assisted living, and robotic systems interacting with people. Although various approaches have been widely used, recent studies have mainly focused on deep-learning networks using the Kinect camera, which can easily generate skeleton-joint data from depth data, and have achieved satisfactory performance. However, these models are deep and complex in order to achieve higher recognition scores and therefore cannot be deployed on a mobile robot platform with a Kinect camera. To overcome these limitations, we suggest a method to classify human actions in real time using a single RGB camera, which can also be applied to a mobile robot platform. We integrated two open-source libraries, OpenPose and 3D-baseline, to extract skeleton joints from RGB images, and classified the actions using convolutional neural networks. Finally, we set up a mobile robot platform, including an NVIDIA JETSON XAVIER embedded board and a tracking algorithm, to monitor a person continuously. We achieved an accuracy of 70% on the NTU-RGBD training dataset, and the whole process ran at an average of 15 frames per second (FPS) on the embedded board system.
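
A minimal sketch of the real-time pipeline described above: 2D pose estimation per RGB frame, lifting to 3D, and classification over a sliding window of skeletons. The functions detect_2d_pose, lift_to_3d, and classify are hypothetical stand-ins for the OpenPose, 3D-baseline, and CNN-classifier stages; the 30-frame window and 17-joint layout are assumptions.

from collections import deque
import numpy as np
import cv2

WINDOW = 30                       # frames per classification window
buffer = deque(maxlen=WINDOW)

def detect_2d_pose(frame):        # hypothetical stand-in for OpenPose
    return np.zeros((17, 2), dtype=np.float32)

def lift_to_3d(pose_2d):          # hypothetical stand-in for 3D-baseline
    return np.concatenate([pose_2d, np.zeros((17, 1), np.float32)], axis=1)

def classify(window):             # hypothetical stand-in for the trained CNN
    return "standing"

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    buffer.append(lift_to_3d(detect_2d_pose(frame)))
    if len(buffer) == WINDOW:
        action = classify(np.stack(buffer))    # (30, 17, 3) skeleton clip
        cv2.putText(frame, action, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    cv2.imshow("HAR", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()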


Author(s):  
Huy Hieu Pham ◽  
Houssam Salmane ◽  
Louahdi Khoudour ◽  
Alain Crouzil ◽  
Pablo Zegers ◽  
...  

2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Jinyue Zhang ◽  
Lijun Zi ◽  
Yuexian Hou ◽  
Mingen Wang ◽  
Wenting Jiang ◽  
...  

In order to support smart construction, the digital twin has become a well-recognized concept for virtually representing physical facilities. It is equally important to recognize human actions and the movements of construction equipment in virtual construction scenes. Compared to the extensive research on human action recognition (HAR) that can be applied to identify construction workers, research in the field of construction equipment action recognition (CEAR) is very limited, mainly due to the lack of available datasets with videos showing the actions of construction equipment. The contributions of this research are as follows: (1) the development of a comprehensive video dataset of 2,064 clips covering five action types for excavators and dump trucks; (2) a new deep learning-based CEAR approach (known as a simplified temporal convolutional network, or STCN) that combines a convolutional neural network (CNN) with long short-term memory (LSTM, an artificial recurrent neural network), where the CNN extracts image features and the LSTM extracts temporal features from video frame sequences; and (3) a comparison of this proposed approach with a similar CEAR method and two of the best-performing HAR approaches, namely, three-dimensional (3D) convolutional networks (ConvNets) and two-stream ConvNets, to evaluate the performance of STCN and investigate the possibility of directly transferring HAR approaches to the field of CEAR.
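
A minimal sketch, not the paper's STCN, of the CNN-then-LSTM design described in contribution (2): an image backbone extracts per-frame features and an LSTM models their temporal order. The ResNet-18 backbone, hidden size, and five-class head (matching the five equipment action types) are assumptions.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class FrameSeqClassifier(nn.Module):
    def __init__(self, num_classes=5, hidden=256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()           # expose 512-d frame features
        self.cnn = backbone
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, clips):
        # clips: (batch, time, 3, H, W) video frame sequences
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))     # (b*t, 512)
        out, _ = self.lstm(feats.view(b, t, -1))  # (b, t, hidden)
        return self.fc(out[:, -1])                # classify from last step

model = FrameSeqClassifier()
print(model(torch.randn(2, 8, 3, 224, 224)).shape)  # torch.Size([2, 5])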


2018 ◽  
Vol 6 (10) ◽  
pp. 323-328
Author(s):  
K. Kiruba ◽  
D. Shiloah Elizabeth ◽  
C. Sunil Retmin Raj

2021 ◽  
Vol 11 (11) ◽  
pp. 4940
Author(s):  
Jinsoo Kim ◽  
Jeongho Cho

Research on video data faces the difficulty of extracting not only spatial but also temporal features, and human action recognition (HAR) is a representative field that applies convolutional neural networks (CNNs) to video data. Action recognition performance has improved, but owing to model complexity, some limitations on real-time operation persist. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real time is proposed. The proposed model extracts spatial feature maps by applying a CNN to the images that compose the video and uses the frame change rate of sequential images as time information. The spatial feature maps are weighted-averaged by frame change, transformed into spatiotemporal features, and input into a multilayer perceptron, which has relatively lower complexity than other HAR models; thus, the method is well suited to a single embedded system connected to CCTV. Evaluation of action recognition accuracy and data-processing speed on the challenging UCF-101 action recognition benchmark showed higher accuracy than an LSTM-based HAR model using a small number of video frames and confirmed the possibility of real-time operation through fast data-processing speed. In addition, the proposed weighted-mean-based HAR model was verified on a Jetson NANO to confirm its usability in low-cost GPU-based embedded systems.
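
A minimal sketch of the weighted-mean idea described above: per-frame CNN features are averaged with weights proportional to the frame-to-frame change, and the pooled spatiotemporal feature feeds a small multilayer perceptron. The change measure (mean absolute pixel difference), layer sizes, and the 101-class head (matching UCF-101) are assumptions, not the paper's exact design.

import torch
import torch.nn as nn

def frame_change_weights(frames, eps=1e-6):
    # frames: (batch, time, 3, H, W); weight = mean |pixel difference|
    diff = (frames[:, 1:] - frames[:, :-1]).abs().mean(dim=(2, 3, 4))
    diff = torch.cat([diff[:, :1], diff], dim=1)     # reuse first diff
    return diff / (diff.sum(dim=1, keepdim=True) + eps)

class WeightedMeanHAR(nn.Module):
    def __init__(self, feat_dim=64, num_classes=101):
        super().__init__()
        # Lightweight per-frame spatial feature extractor.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),   # -> (N, feat_dim)
        )
        # Low-complexity classifier over the pooled feature.
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, frames):
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        w = frame_change_weights(frames).unsqueeze(-1)  # (b, t, 1)
        pooled = (w * feats).sum(dim=1)   # weighted mean over time
        return self.mlp(pooled)

model = WeightedMeanHAR()
print(model(torch.randn(2, 16, 3, 112, 112)).shape)  # torch.Size([2, 101])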

