Human action recognition based on deep network and feature fusion

Filomat ◽  
2020 ◽  
Vol 34 (15) ◽  
pp. 4967-4974
Author(s):  
Dongli Wang ◽  
Jun Yang ◽  
Yan Zhou ◽  
Zhen Zhou

Feature representation is of vital importance for human action recognition. In recent years, the application of deep learning to action recognition has become popular. However, for action recognition in videos, the advantage of a single convolutional feature over traditional methods is not so evident. In this paper, a novel feature representation that combines spatial and temporal features with global motion information is proposed. Specifically, spatial and temporal features are extracted from RGB images by a convolutional neural network (CNN) and a long short-term memory (LSTM) network. Global motion information, in turn, is extracted from motion difference images using a separate CNN. Here, the motion difference images are binary video frames processed by an exclusive-or (XOR) operation. Finally, a support vector machine (SVM) is adopted as the classifier. Experimental results on YouTube Action and UCF-50 show the superiority of the proposed method.
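The XOR step described in this abstract can be illustrated with a minimal sketch. It assumes grayscale frames given as nested lists of 0-255 integers, a fixed binarization threshold of 128, and XOR between two consecutive frames; the function names and the threshold are hypothetical choices for illustration, not details taken from the paper.

```python
def binarize(frame, threshold=128):
    """Turn a grayscale frame (rows of 0-255 ints) into a 0/1 image.

    The threshold value is an assumption made for this sketch.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in frame]


def motion_difference(prev_frame, next_frame, threshold=128):
    """XOR two binarized frames; a 1 marks a pixel whose binary state changed."""
    a = binarize(prev_frame, threshold)
    b = binarize(next_frame, threshold)
    return [[pa ^ pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Toy 3x3 frames: a bright vertical bar moves one pixel to the right.
frame_t0 = [[0, 200, 0],
            [0, 200, 0],
            [0, 0, 0]]
frame_t1 = [[0, 0, 200],
            [0, 0, 200],
            [0, 0, 0]]

diff = motion_difference(frame_t0, frame_t1)
for row in diff:
    print(row)
```

Pixels that keep the same binary value in both frames map to 0, so a static background drops out and only the positions the bar left or entered remain set.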

Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-20 ◽  
Author(s):  
Yang Liu ◽  
Zhaoyang Lu ◽  
Jing Li ◽  
Chao Yao ◽  
Yanzi Deng

Recently, infrared human action recognition has attracted increasing attention because it has advantages over visible light, namely robustness to illumination changes and shadows. However, infrared action data remains limited, which degrades the performance of infrared action recognition. Motivated by the idea of transfer learning, an infrared human action recognition framework using auxiliary data from visible light is proposed to solve the problem of limited infrared action data. In the proposed framework, we first construct a novel Cross-Dataset Feature Alignment and Generalization (CDFAG) framework to map the infrared data and visible light data into a common feature space, where Kernel Manifold Alignment (KEMA) and a dual aligned-to-generalized encoders (AGE) model are employed to represent the features. Then, a support vector machine (SVM) is trained on both the infrared data and the visible light data and is used to classify the features derived from infrared data. The proposed method is evaluated on InfAR, a publicly available infrared human action dataset. To build up auxiliary data, we set up a novel visible light action dataset, XD145. Experimental results show that the proposed method can achieve state-of-the-art performance compared with several transfer learning and domain adaptation methods.


2018 ◽  
Vol 7 (2.20) ◽  
pp. 207 ◽  
Author(s):  
K Rajendra Prasad ◽  
P Srinivasa Rao

Human action recognition from 2D videos is a demanding area due to its broad applications. Many methods have been proposed by researchers for recognizing human actions, and improved accuracy in identifying human actions is desirable. This paper presents an improved method of human action recognition using a support vector machine (SVM) classifier. It proposes a novel feature descriptor constructed by fusing the various investigated features. Handcrafted features such as scale invariant feature transform (SIFT) features, speeded up robust features (SURF), histogram of oriented gradients (HOG) features and local binary pattern (LBP) features are obtained on online 2D action videos. The proposed method is tested on different action datasets having both static and dynamically varying backgrounds, and it achieves the best recognition rates on both. The datasets considered for the experimentation are KTH, Weizmann, UCF101, UCF sports actions, MSR action and HMDB51. The performance of the proposed feature fusion model with an SVM classifier is compared with that of the individual features with SVM. The fusion method showed the best results. The efficiency of the classifier is also tested by comparing it with other state-of-the-art classifiers such as k-nearest neighbors (KNN), artificial neural network (ANN) and AdaBoost. The method achieved an average recognition rate of 94.41%.
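The abstract does not state how the individual descriptors are fused. One common scheme, shown here purely as an assumption, is to L2-normalize each descriptor and concatenate the results into a single vector before it is passed to a classifier. The toy vectors below are placeholders and are not real HOG or LBP outputs.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(descriptors):
    """Concatenate per-descriptor normalized vectors into one fused feature vector.

    Concatenation is an assumed fusion rule for this sketch, not the paper's.
    """
    fused = []
    for vec in descriptors:
        fused.extend(l2_normalize(vec))
    return fused


# Placeholder descriptors standing in for, e.g., HOG and LBP outputs.
hog_like = [3.0, 4.0]
lbp_like = [0.0, 2.0, 0.0]

fused = fuse([hog_like, lbp_like])
print(len(fused), fused)
```

Normalizing before concatenation keeps a descriptor with large raw magnitudes from dominating the fused vector; other fusion rules (weighting, kernel combination) are equally plausible.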


2014 ◽  
Vol 889-890 ◽  
pp. 1057-1064
Author(s):  
Rui Feng Li ◽  
Liang Liang Wang ◽  
Teng Fei Zhang

Because of the high complexity and variability of actions and scenarios, human action often requires a large and computationally expensive representation for accurate recognition with good diversity. In this paper, an efficient combined action representation approach is proposed to deal with the dilemma between accuracy and diversity. Two action features are extracted from a Kinect sensor for combination: silhouette and 3D information. An improved histogram of oriented gradients descriptor named Interest-HOG is proposed for silhouette representation, while the feature angles between skeleton points are calculated as the 3D representation. Kernel Principal Component Analysis (KPCA) is also applied bidirectionally in our work to process the Interest-HOG descriptor, yielding a concise and normalized vector whose size matches that of the 3D representation so that the two can be combined. A depth dataset named DS&SP, including 10 kinds of actions performed by 12 persons in 4 scenarios, is built as the benchmark for our approach, on which a Support Vector Machine (SVM) is employed for training and testing. Experimental results show that our approach performs well in accuracy, efficiency and robustness to self-occlusion.
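The skeleton-based part of this representation relies on angles between skeleton points. A minimal sketch of one such angle follows, assuming each joint is a 3D coordinate tuple and that the angle is measured at a middle joint between the two segments that meet there. The joint names and coordinates are hypothetical, and the paper's exact choice of joint triples is not given in the abstract.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0 or n2 == 0:
        raise ValueError("joints must not coincide")
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_theta))


# Hypothetical joint positions: an arm bent at a right angle at the elbow.
shoulder = (0.0, 0.0, 0.0)
elbow = (1.0, 0.0, 0.0)
wrist = (1.0, 1.0, 0.0)

elbow_angle = joint_angle(shoulder, elbow, wrist)
print(round(elbow_angle, 2))
```

Angles of this kind do not change when the whole skeleton is translated or rotated, which is one reason they are a compact choice for a 3D feature vector.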


Author(s):  
L. Nirmala Devi ◽  
A.Nageswar Rao

Human action recognition (HAR) is one of the most significant research topics, and it has attracted the attention of many researchers. Automatic HAR systems are applied in several fields such as visual surveillance, data retrieval, and healthcare. With this motivation, in this chapter, the authors propose a new HAR model that takes an image as input and analyses and identifies the action present in it. In the analysis phase, they implement two different feature extraction methods using a rotation-invariant Gabor filter and an edge-adaptive wavelet filter. For every action image, a new vector called the composite feature vector is formulated and then subjected to dimensionality reduction through principal component analysis (PCA). Finally, the authors employ a popular supervised machine learning algorithm, the support vector machine (SVM), for classification. Simulation is done over two standard datasets, KTH and Weizmann, and the performance is measured through an accuracy metric.


2020 ◽  
Vol 2020 ◽  
pp. 1-18
Author(s):  
Chao Tang ◽  
Huosheng Hu ◽  
Wenjian Wang ◽  
Wei Li ◽  
Hua Peng ◽  
...  

The representation and selection of action features directly affect the performance of human action recognition methods. A single feature is often affected by human appearance, environment, camera settings, and other factors. To address the problem that existing multimodal feature fusion methods cannot effectively measure the contribution of different features, this paper proposes a human action recognition method based on RGB-D image features, which makes full use of the multimodal information provided by RGB-D sensors to extract effective human action features. Three kinds of human action features with different modal information are proposed: the RGB-HOG feature based on RGB image information, which has good geometric scale invariance; the D-STIP feature based on depth images, which maintains the dynamic characteristics of human motion and has local invariance; and the S-JRPF feature based on skeleton information, which describes the spatial structure of motion well. In addition, multiple K-nearest neighbor classifiers with better generalization ability are integrated to make the classification decision. The experimental results show that the algorithm achieves ideal recognition results on the public G3D and CAD60 datasets.
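The idea of integrating several K-nearest neighbor classifiers can be sketched as one classifier per modality followed by a vote over their predictions. The version below assumes an equal-weight majority vote, which is a simplification: the abstract does not give the integration rule, and the paper may weight modalities by their contribution. The 2D vectors are toy stand-ins, not real RGB-HOG, D-STIP, or S-JRPF features.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """train: list of (vector, label). Return the majority label of the k nearest samples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fused_predict(train_by_modality, query_by_modality, k=3):
    """Run one KNN per modality, then take an equal-weight majority vote (an assumption)."""
    votes = [knn_predict(train_by_modality[m], query_by_modality[m], k)
             for m in train_by_modality]
    return Counter(votes).most_common(1)[0][0]


# Toy training data, reused for each hypothetical modality.
samples = [((0.0, 0.0), "wave"), ((0.0, 1.0), "wave"),
           ((5.0, 5.0), "sit"), ((5.0, 6.0), "sit")]
train_by_modality = {"rgb": samples, "depth": samples, "skeleton": samples}

# The depth query is deliberately misleading; the other two modalities outvote it.
query_by_modality = {"rgb": (0.2, 0.3), "depth": (4.8, 5.2), "skeleton": (0.1, 0.9)}

prediction = fused_predict(train_by_modality, query_by_modality)
print(prediction)
```

Voting at the decision level lets a modality that fails in a given scene, for example depth under occlusion, be overruled by the others.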

