Unsupervised Learning Spatio-temporal Features for Human Activity Recognition from RGB-D Video Data

Author(s):  
Guang Chen ◽  
Feihu Zhang ◽  
Manuel Giuliani ◽  
Christian Buckl ◽  
Alois Knoll
Technologies ◽  
2020 ◽  
Vol 8 (4) ◽  
pp. 55
Author(s):  
Evaggelos Spyrou ◽  
Eirini Mathe ◽  
Georgios Pikramenos ◽  
Konstantinos Kechagias ◽  
Phivos Mylonas

Recent advances in big data systems and databases have made it possible to gather raw unlabeled data at unprecedented rates. However, labeling such data is a costly and time-consuming process. This is especially true for video data, and in particular for human activity recognition (HAR) tasks. For this reason, methods for reducing the need for labeled data in HAR applications have drawn significant attention from the research community. Two popular approaches developed to address this issue are data augmentation and domain adaptation. The former attempts to leverage problem-specific, hand-crafted data synthesizers to augment the training dataset with artificial labeled data instances. The latter attempts to extract knowledge from distinct but related supervised learning tasks for which labeled data is more abundant than for the problem at hand. Both methods have been extensively studied and used successfully on various tasks, but a comprehensive comparison of the two has not been carried out in the context of video data HAR. In this work, we fill this gap by providing ample experimental results comparing data augmentation and domain adaptation techniques on a cross-viewpoint human activity recognition task from pose information.
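To make the idea of a hand-crafted data synthesizer concrete, the following is a minimal sketch of one possible augmentation for pose data: rotating 3D joint coordinates about the vertical axis to imitate a change of camera viewpoint. The abstract does not say which synthesizers the authors used, so the rotation itself, the function names `rotate_pose_y` and `augment`, and the default angles are all assumptions made for illustration.

```python
import math

def rotate_pose_y(joints, angle_deg):
    """Rotate a list of (x, y, z) joints about the vertical (y) axis.

    Assumption: y is the vertical axis and the pose is roughly centered
    on the origin, so a rotation approximates a new camera viewpoint.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in joints]

def augment(sequence, label, angles=(-30, 30)):
    """Return the original labeled sequence plus one rotated copy per angle.

    `sequence` is a list of frames; each frame is a list of (x, y, z) joints.
    The label is copied unchanged, since a rotation does not alter the activity.
    """
    out = [(sequence, label)]
    for ang in angles:
        rotated = [rotate_pose_y(frame, ang) for frame in sequence]
        out.append((rotated, label))
    return out

# Hypothetical two-frame, two-joint sequence labeled "wave".
seq = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
       [(1.0, 0.5, 0.0), (0.0, 1.0, 0.2)]]
augmented = augment(seq, "wave")
```

Each call turns one labeled sequence into several, which is the sense in which augmentation reduces the amount of manual labeling needed.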


Sensors ◽  
2021 ◽  
Vol 21 (18) ◽  
pp. 6309
Author(s):  
Elena-Alexandra Budisteanu ◽  
Irina Georgiana Mocanu

Human activity recognition has been an extensively researched topic over the last decade. Recent methods employ supervised and unsupervised deep learning techniques in which spatial and temporal dependencies are modeled. This paper proposes a novel approach for human activity recognition using skeleton data. The method combines supervised and unsupervised learning algorithms in order to provide high-quality results and real-time performance. The proposed method involves a two-stage framework: the first stage applies an unsupervised clustering technique to group activities based on their similarity, while the second stage classifies the data assigned to each group using graph convolutional networks. Different clustering techniques and data augmentation strategies are explored for improving the training process. The results were compared against state-of-the-art methods, and the proposed model achieved 90.22% Top-1 accuracy on the NTU-RGB+D dataset (an increase of approximately 9% over the baseline graph convolutional method). Moreover, inference time and the total number of parameters remain within the same order of magnitude. Extending the initial set of activities with additional classes is fast and robust, since only the cluster to which the new activity is assigned needs to be retrained, not the entire architecture.
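The two-stage structure described above can be sketched roughly as follows. This is not the authors' implementation: a plain k-means over per-class mean feature vectors stands in for their clustering step, and a nearest-class-mean rule stands in for the graph convolutional classifier of each group. All names (`TwoStageModel`, `kmeans`, and so on) and the toy data are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers, allowed=None):
    """Index of the closest center, optionally restricted to `allowed` indices."""
    idx = range(len(centers)) if allowed is None else allowed
    return min(idx, key=lambda i: sq_dist(point, centers[i]))

def mean(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def kmeans(points, k, iters=10):
    centers = [list(p) for p in points[:k]]  # naive initialization
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers

class TwoStageModel:
    def fit(self, X, y, k):
        labels = sorted(set(y))
        class_means = [mean([x for x, t in zip(X, y) if t == lab]) for lab in labels]
        # Stage 1: group similar activity classes (unsupervised).
        self.centers = kmeans(class_means, k)
        # Stage 2: one small classifier per group (stand-in for a GCN).
        self.experts = [{} for _ in range(k)]
        for lab, m in zip(labels, class_means):
            self.experts[nearest(m, self.centers)][lab] = m
        return self

    def predict(self, x):
        # Route to the closest non-empty group, then classify within it.
        usable = [i for i, e in enumerate(self.experts) if e]
        expert = self.experts[nearest(x, self.centers, usable)]
        return min(expert, key=lambda lab: sq_dist(x, expert[lab]))

# Hypothetical 2D features for four activities.
X = [[0, 0], [0, 1], [10, 10], [10, 11]]
y = ["sit", "stand", "run", "jump"]
model = TwoStageModel().fit(X, y, k=2)
```

Because each group has its own classifier, adding a new activity in this sketch only touches the entry of the group it falls into, which mirrors the retraining property the abstract highlights.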


Sensors ◽  
2021 ◽  
Vol 21 (6) ◽  
pp. 2141
Author(s):  
Ohoud Nafea ◽  
Wadood Abdul ◽  
Ghulam Muhammad ◽  
Mansour Alsulaiman

Human activity recognition (HAR) remains a challenging yet crucial problem to address in computer vision. HAR is primarily intended to be used with other technologies, such as the Internet of Things, to assist in healthcare and eldercare. With the development of deep learning, automatic high-level feature extraction has become possible and has been used to optimize HAR performance. Furthermore, deep-learning techniques have been applied in various fields for sensor-based HAR. This study introduces a new methodology using convolutional neural networks (CNNs) with varying kernel dimensions along with bi-directional long short-term memory (BiLSTM) to capture features at various resolutions. The novelty of this research lies in the effective selection of the optimal video representation and in the effective extraction of spatial and temporal features from sensor data using traditional CNN and BiLSTM. The proposed methodology is evaluated on the Wireless Sensor Data Mining (WISDM) and UCI datasets, whose data are collected through diverse methods, including accelerometers, sensors, and gyroscopes. The results indicate that the proposed scheme is efficient in improving HAR. It was thus found that, unlike other available methods, the proposed method improved accuracy, attaining a higher score on the WISDM dataset than on the UCI dataset (98.53% vs. 97.05%).
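As a rough illustration of what "varying kernel dimensions" means for a 1D sensor signal, the sketch below runs several convolution branches with different kernel widths over the same signal and pools each branch into one feature. It covers only the multi-resolution convolution idea and omits the BiLSTM stage entirely. The kernel sizes, the fixed averaging filters (a trained CNN would learn its weights), and the function names are assumptions for illustration, not details taken from the paper.

```python
def conv1d_valid(signal, kernel):
    """Slide `kernel` over `signal` without padding (cross-correlation,
    which is what deep learning libraries call convolution)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def multi_scale_features(signal, kernel_sizes=(3, 5, 7)):
    """One pooled feature per kernel width.

    Assumes the signal is at least as long as the largest kernel.
    """
    feats = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k          # placeholder weights, not learned
        response = conv1d_valid(signal, kernel)
        feats.append(max(response))     # global max pooling for this branch
    return feats

# Hypothetical accelerometer trace with a single spike.
trace = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0]
features = multi_scale_features(trace)
```

Narrow kernels respond strongly to short spikes while wide kernels smooth them out, so the branches together describe the signal at several temporal resolutions.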


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 130340-130352
Author(s):  
Zhenyu Liu ◽  
Yaqiang Yao ◽  
Yan Liu ◽  
Yuening Zhu ◽  
Zhenchao Tao ◽  
...  
