Sample weights determination based on cosine similarity method as an extension to infrared action recognition

2021 ◽  
pp. 1-12
Author(s):  
Hongzhong Hei ◽  
Xianzhong Jian ◽  
Erliang Xiao

The widespread application of infrared human action recognition in intelligent surveillance has attracted significant attention. However, infrared action recognition datasets remain scarce, which constrains the development of the field. Existing methods for infrared action recognition rely on features within individual samples and pay no attention to within-class differences. Motivated by the idea of weighting video information, this paper proposes a novel infrared action recognition framework, named REWS, that reweights training-set samples to address both the scarcity of infrared action data and the large within-class differences in infrared action datasets. In the proposed framework, we first map the infrared action videos into a low-dimensional feature space and use the cosine similarity between training-set and testing-set features to determine the weight of each training sample; every training sample receives an independent weight. A support vector machine (SVM) is then trained on the weighted training set to recognize infrared actions. Experimental results demonstrate that our approach achieves state-of-the-art performance compared with hand-crafted feature-based methods on the benchmark InfAR dataset.
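The weighting step can be sketched as follows, assuming (as a plain reading of the abstract) that each training sample's weight is its average cosine similarity to the test-set features in the low-dimensional space; the function and variable names are illustrative, not from the paper:

```python
import numpy as np

def cosine_weights(train_feats, test_feats):
    """One weight per training sample: mean cosine similarity to the
    test-set features (a sketch of the REWS weighting idea)."""
    # L2-normalize rows so the dot product equals cosine similarity
    tr = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    te = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sim = tr @ te.T                       # (n_train, n_test) similarities
    return np.clip(sim.mean(axis=1), 0.0, None)  # keep weights non-negative

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 8))         # toy low-dimensional features
X_test = rng.normal(size=(3, 8))
w = cosine_weights(X_train, X_test)       # independent weight per sample
```

These per-sample weights would then be passed to a weighted SVM trainer (e.g. via a `sample_weight` argument) rather than used to resample the data.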

2021 ◽  
Vol 11 (11) ◽  
pp. 4940
Author(s):  
Jinsoo Kim ◽  
Jeongho Cho

Research on video data faces the difficulty of extracting not only spatial but also temporal features, and human action recognition (HAR) is a representative field that applies convolutional neural networks (CNNs) to video data. Recognition performance has improved, but owing to model complexity, some limitations on real-time operation persist. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real time is proposed. The proposed model extracts spatial feature maps by applying a CNN to the images that compose the video and uses the frame change rate of sequential images as temporal information. The spatial feature maps are weighted-averaged by frame change rate, transformed into spatiotemporal features, and fed into a multilayer perceptron, which has lower complexity than other HAR models; our method is therefore well suited to a single embedded system connected to CCTV. Evaluations of recognition accuracy and data processing speed on the challenging UCF-101 action recognition benchmark showed higher accuracy than an HAR model using long short-term memory with a small number of video frames, and the fast data processing speed confirmed the possibility of real-time operation. In addition, the proposed weighted-mean-based HAR model was tested on a Jetson Nano to verify its suitability for low-cost GPU-based embedded systems.
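A minimal sketch of the weighted-average step, assuming the frame change rate is the mean absolute pixel difference between consecutive frames and that each frame already has a CNN feature vector; the names, the choice of zero weight for the first frame, and the normalization are assumptions:

```python
import numpy as np

def spatiotemporal_feature(feature_maps, frames):
    """Weighted average of per-frame features, weighted by each frame's
    change rate mean(|I_t - I_{t-1}|); frame 0 gets weight 0 (assumed)."""
    rates = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2))
    w = np.concatenate([[0.0], rates])
    w = w / (w.sum() + 1e-8)              # normalize to a convex combination
    return w @ feature_maps               # (T, D) -> single (D,) feature

T, D = 4, 16
rng = np.random.default_rng(1)
frames = rng.integers(0, 256, size=(T, 8, 8)).astype(np.float32)
feats = rng.normal(size=(T, D))           # one CNN feature vector per frame
v = spatiotemporal_feature(feats, frames)
```

The resulting single vector is what would be fed to the multilayer perceptron, which is what keeps the classifier head cheap enough for an embedded device.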


Author(s):  
L. Nirmala Devi ◽  
A. Nageswar Rao

Human action recognition (HAR) is one of the most significant research topics and has attracted the attention of many researchers. Automatic HAR systems are applied in several fields such as visual surveillance, data retrieval, and healthcare. Based on this inspiration, in this chapter, the authors propose a new HAR model that takes an image as input, analyzes it, and exposes the action present in it. In the analysis phase, they implement two different feature extraction methods based on a rotation-invariant Gabor filter and an edge-adaptive wavelet filter. For every action image, a new vector called the composite feature vector is formed and then subjected to dimensionality reduction through principal component analysis (PCA). Finally, the authors employ the most popular supervised machine learning algorithm, the support vector machine (SVM), for classification. Simulations are performed over two standard datasets, KTH and Weizmann, and performance is measured with an accuracy metric.
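The composite-vector-plus-PCA stage can be illustrated with a small numpy sketch; random arrays stand in for the actual Gabor and wavelet filter responses, and `pca_reduce` is a generic SVD-based PCA, not the authors' implementation:

```python
import numpy as np

def pca_reduce(X, k):
    """Project row vectors onto the top-k principal components via SVD."""
    Xc = X - X.mean(axis=0)               # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                  # (n, k) reduced representation

rng = np.random.default_rng(2)
gabor_feats = rng.normal(size=(10, 32))   # placeholder Gabor responses
wavelet_feats = rng.normal(size=(10, 16)) # placeholder wavelet responses
# composite feature vector: concatenation of both feature sets per image
composite = np.concatenate([gabor_feats, wavelet_feats], axis=1)  # (10, 48)
Z = pca_reduce(composite, k=5)
```

The reduced vectors `Z` are what would be handed to the SVM classifier in the final stage.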


Sensors ◽  
2019 ◽  
Vol 19 (7) ◽  
pp. 1599 ◽  
Author(s):  
Md Uddin ◽  
Young-Koo Lee

Human action recognition plays a significant role in the research community due to its emerging applications. A variety of approaches have been proposed to solve this problem; however, several issues still need to be addressed. In action recognition, effectively extracting and aggregating spatiotemporal information plays a vital role in describing a video. In this research, we propose a novel approach to recognizing human actions that considers both deep spatial features and handcrafted spatiotemporal features. First, we extract the deep spatial features with a state-of-the-art deep convolutional network, Inception-ResNet-v2. Second, we introduce a novel handcrafted feature descriptor, Weber's law based Volume Local Gradient Ternary Pattern (WVLGTP), which captures the spatiotemporal features and also considers shape information through a gradient operation. Furthermore, a Weber's law based threshold and a ternary pattern based on an adaptive local threshold are presented to effectively handle noisy center pixel values, along with a multi-resolution version of WVLGTP based on an averaging scheme. Both extracted feature sets are then concatenated and fed to a support vector machine for classification. Lastly, extensive experimental analysis shows that our proposed method outperforms state-of-the-art approaches in terms of accuracy.
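A simplified 2-D sketch of ternary coding with a Weber's-law threshold may help here; the paper's WVLGTP operates on 3-D volumes of gradients, so this only shows the thresholding idea, and `alpha` is an assumed parameter:

```python
import numpy as np

def weber_ltp(patch, alpha=0.1):
    """Ternary code of a 3x3 patch. The threshold scales with the center
    intensity (Weber's law), so brighter regions tolerate larger noise."""
    c = patch[1, 1]                       # center pixel
    t = alpha * c                         # Weber-law adaptive threshold
    nb = np.delete(patch.flatten(), 4)    # the 8 neighbours, row-major order
    code = np.zeros(8, dtype=int)
    code[nb > c + t] = 1                  # clearly brighter than center
    code[nb < c - t] = -1                 # clearly darker than center
    return code                           # everything near the center -> 0

patch = np.array([[5., 9., 5.],
                  [5., 5., 5.],
                  [1., 5., 5.]])
code = weber_ltp(patch)                   # center 5, threshold 0.5
```

The zero band around the center value is what makes the ternary pattern robust to small perturbations of a noisy center pixel, compared with a strict binary comparison.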


Author(s):  
Xueping Liu ◽  
Xingzuo Yue

The kernel function has been successfully utilized in the extreme learning machine (ELM), where it provides stable and generalized performance and greatly reduces computational complexity. However, selecting and optimizing the parameters of the most common kernel functions is tedious and time-consuming. In this study, a set of new Hermite kernel functions derived from the generalized Hermite polynomials is proposed. A significant advantage of the proposed kernel is that it has only one parameter, selected from a small set of natural numbers; parameter optimization is thus greatly facilitated, and the structural information of the sample data is retained. Consequently, the new kernel functions can serve as optimal alternatives to other common kernel functions for ELM at a rapid learning speed. Experimental results showed that the proposed kernel ELM method tends to have similar or better robustness and generalization performance at a faster learning speed than other common kernel ELM and support vector machine methods. When applied to human action recognition from depth video sequences, the method also achieves excellent performance, demonstrating its speed advantage on video image data.
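The kernel ELM mechanics can be sketched as below; the abstract does not give the Hermite kernel's closed form, so a standard RBF kernel stands in as a placeholder, and `C` plays the usual role of a regularization parameter:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Placeholder kernel; the paper's Hermite kernel would replace this."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_elm_fit(X, T, C=10.0, kernel=rbf_kernel):
    """Closed-form output weights: beta = (K + I/C)^-1 T."""
    K = kernel(X, X)
    return np.linalg.solve(K + np.eye(len(X)) / C, T)

def kernel_elm_predict(X_train, beta, X_new, kernel=rbf_kernel):
    return kernel(X_new, X_train) @ beta

# Toy XOR problem with one-hot targets
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])
beta = kernel_elm_fit(X, T, C=1e6)
pred = kernel_elm_predict(X, beta, X)
```

The single closed-form solve is the source of the rapid learning speed the abstract refers to: unlike SVM training, there is no iterative optimization, only one linear system per fit.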


2019 ◽  
Vol 9 (10) ◽  
pp. 2126 ◽  
Author(s):  
Suge Dong ◽  
Daidi Hu ◽  
Ruijun Li ◽  
Mingtao Ge

To address the high trajectory redundancy and susceptibility to background interference of traditional dense-trajectory action recognition methods, a human action recognition method based on foreground trajectories and motion difference descriptors is proposed. First, the motion magnitude of each frame is estimated by optical flow, the foreground region is determined from the per-pixel motion magnitudes, and trajectories are extracted only from behavior-related foreground regions. Second, to better describe the relative temporal information between different actions, a motion difference descriptor is introduced for the foreground trajectories: a direction histogram of the motion difference is constructed by computing the direction of the motion difference per unit time at each trajectory point. Finally, a Fisher vector (FV) encodes the histogram features into video-level action features, and a support vector machine (SVM) classifies the action category. Experimental results show that this method extracts action-related trajectories more effectively and improves recognition accuracy by 7% over the traditional dense trajectory method.
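The direction histogram of the motion difference can be sketched for a single trajectory; taking the motion difference as the second-order difference of trajectory positions is an assumption, as are the bin count and the magnitude weighting:

```python
import numpy as np

def motion_difference_histogram(traj, n_bins=8):
    """Direction histogram of the motion difference along one (T, 2)
    trajectory, weighted by difference magnitude and L1-normalized."""
    disp = np.diff(traj, axis=0)            # per-step motion vectors
    mdiff = np.diff(disp, axis=0)           # motion difference per unit time
    ang = np.arctan2(mdiff[:, 1], mdiff[:, 0])          # direction angles
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins, weights=np.linalg.norm(mdiff, axis=1),
                       minlength=n_bins)
    s = hist.sum()
    return hist / s if s > 0 else hist      # all-zero for uniform motion

# constant-velocity trajectory: motion difference is zero everywhere
straight = np.stack([np.arange(6.0), np.zeros(6)], axis=1)
h0 = motion_difference_histogram(straight)

# accelerating/turning trajectory: histogram mass appears
turn = np.array([[0., 0.], [1., 0.], [2., 1.], [3., 3.]])
h1 = motion_difference_histogram(turn)
```

Note that a constant-velocity trajectory yields an all-zero histogram, which is exactly why the descriptor emphasizes changes in motion rather than motion itself; per-trajectory histograms would then be FV-encoded into the video-level feature.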

