Human Action Recognition Using CNN-SVM Model

2021 ◽  
Vol 105 ◽  
pp. 282-290
Author(s):  
Vijay Anant Athavale ◽  
Suresh Chand Gupta ◽  
Deepak Kumar ◽  
Savita

In this paper, a pre-trained CNN model, VGG16, combined with an SVM classifier is presented for the HAR task. Deep features are learned via the VGG16 pre-trained CNN model, a network previously used for image classification. Here, VGG16 is applied to classifying human-activity signals recorded by a smartphone's accelerometer sensor. The UniMiB dataset contains 11,771 samples of daily human activities, recorded by a smartphone through its accelerometer. Features are taken from the fifth max-pooling layer of the VGG16 model and fed to the SVM classifier, which replaces VGG16's fully connected layers. The proposed VGG16-SVM model achieves effective and efficient results and is compared with previously used schemes. With classification accuracy and F-score as evaluation parameters, the proposed method achieves 79.55% accuracy and a 71.63% F-score.
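The pipeline described above (deep features from a frozen CNN, with an SVM replacing the fully connected head) can be sketched as follows. This is a minimal illustration, not the authors' code: the VGG16 pool5 activations are simulated with synthetic two-class feature vectors, since running the real network requires the pre-trained weights and the UniMiB recordings.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for deep features from VGG16's fifth max-pooling layer
# (512 x 7 x 7 = 25088-dim when flattened); here we simulate two
# separable activity classes instead of running the real network.
n_per_class, dim = 100, 256          # reduced dim for the sketch
X0 = rng.normal(0.0, 1.0, (n_per_class, dim))
X1 = rng.normal(1.0, 1.0, (n_per_class, dim))
X = np.vstack([X0, X1])
y = np.array([0] * n_per_class + [1] * n_per_class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

# The SVM acts as the classifier head in place of the FC layers.
clf = SVC(kernel="linear").fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

On real pool5 features the same two lines at the end are all that changes: fit the SVC on extracted training features and score it on held-out ones.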

Action recognition (AR) plays a fundamental role in computer vision and video analysis. We are witnessing an astronomical increase in video data on the web, and recognizing actions in video is difficult due to differing camera viewpoints. AR in video sequences depends on per-frame appearance and on optical flow across frames; the spatial and temporal components of video frames play an integral role in accurate action classification. In the proposed system, RGB frames and optical-flow frames are used for AR, with features extracted from the fc7 layer of the pre-trained Convolutional Neural Network (CNN) model AlexNet. A support vector machine (SVM) classifier is then used for classification. Experiments use the HMDB51 dataset, which comprises 51 human-action classes. Using the SVM classifier on the extracted features, the method achieves 95.6% accuracy, the best result compared with other state-of-the-art techniques.
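The two-stream idea above (appearance from RGB frames plus motion from optical-flow frames, both described by fc7 activations) reduces to building one video-level descriptor before the SVM. A minimal sketch, with hypothetical random activations standing in for real AlexNet fc7 outputs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-frame fc7 activations (AlexNet fc7 is 4096-dim);
# one video = T frames for each stream.
T, fc7_dim = 30, 4096
rgb_feats = rng.normal(size=(T, fc7_dim))    # appearance stream
flow_feats = rng.normal(size=(T, fc7_dim))   # optical-flow stream

# Average-pool each stream over time, then concatenate the two
# streams into a single fixed-length descriptor for the SVM.
video_descriptor = np.concatenate([rgb_feats.mean(axis=0),
                                   flow_feats.mean(axis=0)])
```

Whether the streams are averaged, max-pooled, or fused later is a design choice the abstract leaves open; concatenation after temporal pooling is one common variant.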


Algorithms ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 301
Author(s):  
Guocheng Liu ◽  
Caixia Zhang ◽  
Qingyang Xu ◽  
Ruoshi Cheng ◽  
Yong Song ◽  
...  

Optical-flow-based human action recognition is difficult to apply in practice because of its large computational cost. To address this, a human action recognition algorithm, the I3D-shufflenet model, is proposed, combining the advantages of the I3D neural network and the lightweight ShuffleNet model. The 5 × 5 convolution kernels of I3D are replaced by two stacked 3 × 3 convolution kernels, which reduces the amount of computation, and a shuffle layer is adopted to exchange features across channels. Recognition and classification of human actions is performed with the trained I3D-shufflenet model. The experimental results show that the shuffle layer improves the composition of features in each channel, promoting the utilization of useful information. Histogram of Oriented Gradients (HOG) spatial-temporal features of the object are extracted for training, which significantly improves the expressiveness of human actions and reduces the cost of feature extraction. I3D-shufflenet is evaluated on the UCF101 dataset and compared with other models; the final result shows that I3D-shufflenet achieves higher accuracy than the original I3D, reaching 96.4%.
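The two design moves in the abstract are easy to quantify and sketch: replacing a 5 × 5 kernel with two stacked 3 × 3 kernels keeps the same receptive field while cutting parameters, and the ShuffleNet-style shuffle layer interleaves channels across groups so grouped convolutions can still mix information. A small illustration (the shapes and group count are illustrative, not taken from the paper):

```python
import numpy as np

# Parameter count for a conv layer with C input and C output channels:
# one 5x5 kernel vs two stacked 3x3 kernels (same 5x5 receptive field).
C = 64
params_5x5 = 5 * 5 * C * C            # 102400
params_two_3x3 = 2 * (3 * 3 * C * C)  # 73728, ~28% fewer

# ShuffleNet-style channel shuffle: reshape channels into groups,
# transpose, and flatten back, so channels interleave across groups.
def channel_shuffle(x, groups):
    n, c, h, w = x.shape
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(0, 2, 1, 3, 4)
             .reshape(n, c, h, w))

x = np.arange(8).reshape(1, 8, 1, 1)       # channels labelled 0..7
shuffled = channel_shuffle(x, groups=2)
# channel order becomes [0, 4, 1, 5, 2, 6, 3, 7]
```

The same reshape-transpose-reshape trick is how the shuffle layer is implemented in most frameworks; it has no learnable parameters.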


2018 ◽  
Vol 2018 ◽  
pp. 1-10 ◽  
Author(s):  
P. V. V. Kishore ◽  
K. V. V. Kumar ◽  
E. Kiran Kumar ◽  
A. S. C. S. Sastry ◽  
M. Teja Kiran ◽  
...  

Extracting and recognizing complex human movements from unconstrained online/offline video sequences is a challenging task in computer vision. This paper proposes the classification of Indian classical dance actions using a powerful artificial intelligence tool: convolutional neural networks (CNNs). In this work, human action recognition is performed on Indian classical dance videos from both offline (controlled recording) and online (live performances, YouTube) sources. The offline data is created with ten different subjects performing 200 familiar dance mudras/poses from different Indian classical dance forms under various background environments. The online dance data is collected from YouTube for ten different subjects. Each dance pose occupies 60 frames of a video in both cases. CNN training is performed with 8 different sample sets, each consisting of multiple subjects, and the remaining 2 sample sets are used for testing the trained CNN. Different CNN architectures were designed and tested on our data to obtain better recognition accuracy. We achieved a 93.33% recognition rate, which compares favorably with other classifier models reported on the same dataset.


Author(s):  
L V Shiripova ◽  
E V Myasnikov

The paper is devoted to the problem of recognizing human actions in videos recorded in the optical wavelength range. The approach proposed in this paper consists of detecting a moving person in a video sequence, followed by size normalization, generation of subsequences, and dimensionality reduction using principal component analysis. The classification of human actions is carried out using a support vector machine classifier. Experimental studies performed on the Weizmann dataset allowed us to determine the best values of the method's parameters. The results showed that, with a small number of action classes, high classification accuracy can be achieved.
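The PCA-then-SVM stage of this pipeline can be sketched directly with scikit-learn. The feature vectors below are synthetic stand-ins for the flattened, size-normalized person subsequences (the Weizmann frames themselves are not bundled here), so only the pipeline shape is shown, not the reported accuracy:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Stand-in for flattened, size-normalized subsequences: two
# synthetic action classes separated by a mean shift.
n, dim = 120, 400
X = rng.normal(size=(n, dim))
X[:60] += 1.5
y = np.array([0] * 60 + [1] * 60)

# Reduce dimensionality with PCA, then classify with an SVM,
# mirroring the pipeline described in the abstract.
Z = PCA(n_components=20, random_state=0).fit_transform(X)
clf = SVC(kernel="rbf").fit(Z, y)
train_accuracy = clf.score(Z, y)
```

The number of principal components (20 here) is exactly the kind of parameter the authors tuned experimentally on Weizmann.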


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Li Li

Following the development trend of routine composition in competitive aerobics, this paper studies a multimedia-based method for the online arrangement of difficult actions in competitive aerobics, aiming to improve the arrangement effect. RGB images, optical-flow images, and corrected optical-flow images are taken as the input modes of a difficult-action recognition network for competitive aerobics video based on top-down feature fusion. Key frames of the input modes are extracted using a key-frame extraction method based on subshot segmentation with a double-threshold sliding window and a fully connected graph. Through forward propagation, a score vector of the video over all categories is obtained, and normalization yields a probability distribution over those categories. Human action recognition in competitive aerobics video is thus completed, and the online arrangement of difficult actions is realized on this basis. The experimental results show that this method identifies difficult actions in competitive aerobics video with high accuracy; the online arrangement of difficult actions has obvious advantages, meets users' needs, and is highly practical.
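The normalization step above, turning a raw class-score vector from the forward pass into a probability distribution, is typically a softmax. A minimal sketch with hypothetical scores (the three categories are illustrative):

```python
import numpy as np

def softmax(scores):
    """Normalize a raw class-score vector into a probability
    distribution; shifting by the max keeps exp() stable."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

# Hypothetical forward-pass scores for three action categories.
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)   # sums to 1; highest score -> highest prob
```

The predicted category is then simply the argmax of the normalized vector.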


Human Action Recognition (HAR) is an interesting and useful topic in various real-life applications such as surveillance-based security systems, computer vision, and robotics. The selected features, feature representation methods, and classification algorithms determine the accuracy of a HAR system. A new feature, the Skeletonized STIP (Spatio-Temporal Interest Points), is introduced and used in this work. Skeletonization is performed on the action video's foreground frames, and the new feature is generated as the STIP values of the skeleton frame sequence. The feature set is then used for initial dictionary construction in sparse coding. Because the data for action recognition is huge, the feature set is represented using sparse representation. The sparse representation is refined with max pooling, and action recognition is performed using an SVM classifier. The proposed approach outperforms existing methods on the benchmark datasets.
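The max-pooling refinement described above collapses a variable number of sparse codes per video into one fixed-length vector for the SVM. A minimal sketch with hypothetical sparse codes (the dictionary size K and descriptor count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sparse codes: one K-dim code per skeleton-STIP
# descriptor extracted from a video (K = dictionary size).
K, n_descriptors = 50, 200
codes = rng.random((n_descriptors, K))
codes[codes < 0.9] = 0.0           # enforce sparsity in the codes

# Max pooling over all descriptors yields a single fixed-length
# video descriptor, the input to the SVM classifier.
pooled = codes.max(axis=0)
```

Max pooling keeps, for each dictionary atom, the strongest response anywhere in the video, which is what makes the pooled vector robust to the number and order of interest points.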


2018 ◽  
Vol 7 (2.20) ◽  
pp. 207 ◽  
Author(s):  
K Rajendra Prasad ◽  
P Srinivasa Rao

Human action recognition from 2D videos is a demanding area due to its broad applications. Many methods have been proposed by researchers for recognizing human actions, and improved accuracy in identifying them is desirable. This paper presents an improved method of human action recognition using a support vector machine (SVM) classifier, built on a novel feature descriptor constructed by fusing several investigated features. Handcrafted features such as scale-invariant feature transform (SIFT) features, speeded-up robust features (SURF), histogram of oriented gradients (HOG) features, and local binary pattern (LBP) features are extracted from online 2D action videos. The proposed method is tested on action datasets with both static and dynamically varying backgrounds and achieves the best recognition rates in both settings. The datasets considered for the experimentation are KTH, Weizmann, UCF101, UCF Sports Actions, MSR Action, and HMDB51. The performance of the proposed feature-fusion model with an SVM classifier is compared with that of the individual features with an SVM; the fusion method gives the best results. The efficiency of the classifier is also tested by comparison with other state-of-the-art classifiers such as k-nearest neighbors (KNN), artificial neural networks (ANN), and AdaBoost. The method achieves an average recognition rate of 94.41%.
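Fusion of heterogeneous handcrafted descriptors, as described above, usually means concatenating them into one joint vector before the SVM. A minimal sketch with hypothetical per-video descriptors; the dimensionalities below are typical defaults (128-dim SIFT, 64-dim SURF, 3780-dim HOG for a 64×128 window, 59-bin uniform LBP histogram) and may differ from the authors' extraction settings:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical per-video descriptors with typical dimensionalities.
sift = rng.random(128)   # SIFT descriptor
surf = rng.random(64)    # SURF descriptor
hog = rng.random(3780)   # HOG descriptor (64x128 window default)
lbp = rng.random(59)     # uniform LBP histogram

# Fusion by concatenation yields one joint descriptor per video,
# which is then fed to the SVM classifier.
fused = np.concatenate([sift, surf, hog, lbp])
```

Because the component descriptors live on different scales, per-block normalization before concatenation is a common refinement, though the abstract does not specify one.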

