Fights Behavior Detection Based on Space-Time Interest Points

Human action recognition belongs to the senior visual analysis of computer vision, which involves image processing, artificial intelligence, pattern recognition and so on, is becoming one of the most hot research topic in recent years. In this paper, on the basis of comparative analysis and study towards current methods related to human action recognition, we propose a novel fights behavior detection method which is based on spatial-temporal interest point. Since most information of human action in video are indicated by the space-time interest points of video, we combine spatial-temporal features with motion energy image to describe information of video, and local spatial-temporal features are applied to extract fights behavior model by bags of words. Experimental results show that this method can achieve high accuracy and certain practical value.

Download Full-text

A Hierarchical Bag-of-Words Model Based on Local Space-Time Features for Human Action Recognition

2013 International Conference on IT Convergence and Security (ICITCS) ◽

10.1109/icitcs.2013.6717776 ◽

2013 ◽

Cited By ~ 1

Author(s):

Jiangwei Wu ◽

Daobing Zhou ◽

Guoqiang Xiao

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Space Time ◽

Bag Of Words ◽

Model Based ◽

Local Space

Download Full-text

I3D-Shufflenet Based Human Action Recognition

Algorithms ◽

10.3390/a13110301 ◽

2020 ◽

Vol 13 (11) ◽

pp. 301

Author(s):

Guocheng Liu ◽

Caixia Zhang ◽

Qingyang Xu ◽

Ruoshi Cheng ◽

Yong Song ◽

...

Keyword(s):

Neural Network ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Recognition Algorithm ◽

Convolution Kernel ◽

Histogram Of Oriented Gradients ◽

Temporal Features ◽

Convolution Kernels

In view of difficulty in application of optical flow based human action recognition due to large amount of calculation, a human action recognition algorithm I3D-shufflenet model is proposed combining the advantages of I3D neural network and lightweight model shufflenet. The 5 × 5 convolution kernel of I3D is replaced by a double 3 × 3 convolution kernels, which reduces the amount of calculations. The shuffle layer is adopted to achieve feature exchange. The recognition and classification of human action is performed based on trained I3D-shufflenet model. The experimental results show that the shuffle layer improves the composition of features in each channel which can promote the utilization of useful information. The Histogram of Oriented Gradients (HOG) spatial-temporal features of the object are extracted for training, which can significantly improve the ability of human action expression and reduce the calculation of feature extraction. The I3D-shufflenet is testified on the UCF101 dataset, and compared with other models. The final result shows that the I3D-shufflenet has higher accuracy than the original I3D with an accuracy of 96.4%.

Download Full-text

Distinct Two-Stream Convolutional Networks for Human Action Recognition in Videos Using Segment-Based Temporal Modeling

Data ◽

10.3390/data5040104 ◽

2020 ◽

Vol 5 (4) ◽

pp. 104

Author(s):

Ashok Sarabu ◽

Ajit Kumar Santra

Keyword(s):

Action Recognition ◽

Data Augmentation ◽

Main Idea ◽

Human Action Recognition ◽

Human Action ◽

Great Success ◽

Temporal Modeling ◽

Convolutional Networks ◽

Temporal Features ◽

Augmentation Techniques

The Two-stream convolution neural network (CNN) has proven a great success in action recognition in videos. The main idea is to train the two CNNs in order to learn spatial and temporal features separately, and two scores are combined to obtain final scores. In the literature, we observed that most of the methods use similar CNNs for two streams. In this paper, we design a two-stream CNN architecture with different CNNs for the two streams to learn spatial and temporal features. Temporal Segment Networks (TSN) is applied in order to retrieve long-range temporal features, and to differentiate the similar type of sub-action in videos. Data augmentation techniques are employed to prevent over-fitting. Advanced cross-modal pre-training is discussed and introduced to the proposed architecture in order to enhance the accuracy of action recognition. The proposed two-stream model is evaluated on two challenging action recognition datasets: HMDB-51 and UCF-101. The findings of the proposed architecture shows the significant performance increase and it outperforms the existing methods.

Download Full-text

Human action recognition using short-time motion energy template images and PCANet features

Neural Computing and Applications ◽

10.1007/s00521-020-04712-1 ◽

2020 ◽

Vol 32 (16) ◽

pp. 12561-12574 ◽

Cited By ~ 1

Author(s):

Amany Abdelbaky ◽

Saleh Aly

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Time Motion ◽

Motion Energy ◽

Short Time

Download Full-text

Study of Human Action Recognition Based on Improved Spatio-temporal Features

International Journal of Automation and Computing ◽

10.1007/s11633-014-0831-4 ◽

2014 ◽

Vol 11 (5) ◽

pp. 500-509 ◽

Cited By ~ 12

Author(s):

Xiao-Fei Ji ◽

Qian-Qian Wu ◽

Zhao-Jie Ju ◽

Yang-Yang Wang

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal

Download Full-text

Human Action Recognition from Motion Trajectory using Fourier Temporal Features of Skeleton Joints

2018 International Conference on Advances in Computing and Communication Engineering (ICACCE) ◽

10.1109/icacce.2018.8441712 ◽

2018 ◽

Author(s):

Naresh Kumar ◽

Nagarajan Sukavanam

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Motion Trajectory ◽

Temporal Features

Download Full-text

Human Action Recognition Using Improved Salient Dense Trajectories

Computational Intelligence and Neuroscience ◽

10.1155/2016/6750459 ◽

2016 ◽

Vol 2016 ◽

pp. 1-11 ◽

Cited By ~ 3

Author(s):

Qingwu Li ◽

Haisu Cheng ◽

Yan Zhou ◽

Guanying Huo

Keyword(s):

Action Recognition ◽

State Of The Art ◽

Human Action Recognition ◽

Human Action ◽

Interest Points ◽

Dense Trajectories ◽

Dense Trajectory ◽

Sparse Coefficient ◽

Active Research ◽

Motion Saliency

Human action recognition in videos is a topic of active research in computer vision. Dense trajectory (DT) features were shown to be efficient for representing videos in state-of-the-art approaches. In this paper, we present a more effective approach of video representation using improved salient dense trajectories: first, detecting the motion salient region and extracting the dense trajectories by tracking interest points in each spatial scale separately and then refining the dense trajectories via the analysis of the motion saliency. Then, we compute several descriptors (i.e., trajectory displacement, HOG, HOF, and MBH) in the spatiotemporal volume aligned with the trajectories. Finally, in order to represent the videos better, we optimize the framework of bag-of-words according to the motion salient intensity distribution and the idea of sparse coefficient reconstruction. Our architecture is trained and evaluated on the four standard video actions datasets of KTH, UCF sports, HMDB51, and UCF50, and the experimental results show that our approach performs competitively comparing with the state-of-the-art results.

Download Full-text

Human Action Recognition Based on Normalized Interest Points and Super-Interest Points

International Journal of Humanoid Robotics ◽

10.1142/s0219843614500054 ◽

2014 ◽

Vol 11 (01) ◽

pp. 1450005

Author(s):

Yangyang Wang ◽

Yibo Li ◽

Xiaofei Ji

Keyword(s):

Action Recognition ◽

Clustering Algorithm ◽

Three Dimensional ◽

Temporal Correlation ◽

Human Action Recognition ◽

Human Action ◽

Feature Representation ◽

Interest Point ◽

Interest Points ◽

Active Research

Visual-based human action recognition is currently one of the most active research topics in computer vision. The feature representation directly has a crucial impact on the performance of the recognition. Feature representation based on bag-of-words is popular in current research, but the spatial and temporal relationship among these features is usually discarded. In order to solve this issue, a novel feature representation based on normalized interest points is proposed and utilized to recognize the human actions. The novel representation is called super-interest point. The novelty of the proposed feature is that the spatial-temporal correlation between the interest points and human body can be directly added to the representation without considering scale and location variance of the points by introducing normalized points clustering. The novelty concerns three tasks. First, to solve the diversity of human location and scale, interest points are normalized based on the normalization of the human region. Second, to obtain the spatial-temporal correlation among the interest points, the normalized points with similar spatial and temporal distance are constructed to a super-interest point by using three-dimensional clustering algorithm. Finally, by describing the appearance characteristic of the super-interest points and location relationship among the super-interest points, a new feature representation is gained. The proposed representation formation sets up the relationship among local features and human figure. Experiments on Weizmann, KTH, and UCF sports dataset demonstrate that the proposed feature is effective for human action recognition.

Download Full-text