A Spatiotemporal Heterogeneous Two-Stream Network for Action Recognition

IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 57267-57275 ◽  
Author(s):  
Enqing Chen ◽  
Xue Bai ◽  
Lei Gao ◽  
Haron Chweya Tinega ◽  
Yingqiang Ding

2020 ◽
Vol 27 ◽  
pp. 2188-2188
Author(s):  
Didik Purwanto ◽  
Rizard Renanda Adhi Pramono ◽  
Yie-Tarng Chen ◽  
Wen-Hsien Fang

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Fengqing Jiang ◽  
Xiao Chen

The advancements of modern science and technology have greatly promoted the progress of sports science. Advanced technological methods are now widely used in sports training, where they have not only raised the scientific level of training but also driven continuous improvement in sports technique and competition results. As sports science develops and sporting practice deepens, scientific training methods and monitoring approaches have improved both the effect of training and athletes' performance. This paper takes the sprint as its research problem and builds a machine-learning-based action recognition model for sprinters. To address the shortcomings of the traditional two-stream convolutional neural network in processing long-duration video, a temporally segmented two-stream network based on sparse sampling is used to better capture long-term motion characteristics. First, the continuous video frames are divided into multiple segments, and a short sequence containing the athlete's actions is formed by randomly sampling from each segment. These sampled sequences are then fed into the two-stream network for feature extraction. The optical flow images required by the temporal stream are computed with the Lucas–Kanade algorithm. The system was tested in real scenarios, and the results show that the design meets the sprinters' expected requirements.
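The segment-and-sample procedure and the Lucas–Kanade optical flow step can be sketched as follows. This is a minimal illustration in Python with OpenCV, not the authors' implementation; the segment count, snippet length, and corner-tracking parameters are assumed values.

```python
import random

import cv2
import numpy as np

def sample_snippets(frames, num_segments=3, snippet_len=5):
    """Split a frame list into equal segments and randomly sample
    one short snippet per segment (sparse sampling)."""
    seg_size = len(frames) // num_segments
    snippets = []
    for s in range(num_segments):
        offset = random.randint(0, max(seg_size - snippet_len, 0))
        start = s * seg_size + offset
        snippets.append(frames[start:start + snippet_len])
    return snippets

def lucas_kanade_flow(prev_gray, next_gray):
    """Track corner points between two grayscale frames with
    pyramidal Lucas-Kanade; returns matched (old, new) points."""
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                  qualityLevel=0.01, minDistance=7)
    if pts is None:
        return np.empty((0, 2)), np.empty((0, 2))
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray,
                                                 pts, None)
    ok = status.ravel() == 1
    return pts[ok].reshape(-1, 2), nxt[ok].reshape(-1, 2)
```

In practice each sampled snippet would feed the spatial stream as RGB frames and the temporal stream as flow computed between consecutive frames within the snippet.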


2020 ◽  
Vol 69 (7) ◽  
pp. 7930-7939 ◽  
Author(s):  
Biyun Sheng ◽  
Yuanrun Fang ◽  
Fu Xiao ◽  
Lijuan Sun

Algorithms ◽  
2020 ◽  
Vol 13 (7) ◽  
pp. 169
Author(s):  
Xiao Wu ◽  
Qingge Ji

Modeling spatiotemporal representations is one of the most essential yet challenging issues in video action recognition. Existing methods lack the capacity to accurately model either the correlations between spatial and temporal features or the global temporal dependencies. Inspired by the two-stream network for video action recognition, we propose an encoder–decoder framework named the Two-Stream Bidirectional Long Short-Term Memory (LSTM) Residual Network (TBRNet), which exploits the interaction between spatiotemporal representations and global temporal dependencies. In the encoding phase, a two-stream architecture based on the proposed Residual Convolutional 3D (Res-C3D) network extracts features with residual connections inserted between the two pathways, and these features are fused into the encoder's short-term spatiotemporal features. In the decoding phase, the short-term spatiotemporal features are first fed into a temporal attention-based bidirectional LSTM (BiLSTM) network to obtain long-term bidirectional attention-pooled dependencies. These temporal dependencies are then integrated with the short-term spatiotemporal features to obtain global spatiotemporal relationships. In a series of experiments on two benchmark datasets, UCF101 and HMDB51, the proposed TBRNet achieved results competitive with, and in some cases better than, existing state-of-the-art approaches.
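A minimal PyTorch sketch of the decoding step described above, assuming the encoder has already produced a sequence of fused short-term features. The layer sizes and the additive attention form are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class AttnBiLSTMPool(nn.Module):
    """Bidirectional LSTM over short-term features, followed by
    additive temporal attention pooling (illustrative sketch)."""
    def __init__(self, feat_dim=512, hidden=256):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)  # one score per time step

    def forward(self, x):                 # x: (batch, time, feat_dim)
        h, _ = self.bilstm(x)             # (batch, time, 2*hidden)
        scores = self.attn(h)             # (batch, time, 1)
        weights = torch.softmax(scores, dim=1)
        pooled = (weights * h).sum(dim=1)  # attention-pooled summary
        return pooled, weights

# Example: 4 clips, 8 time steps, 512-d fused features per step
feats = torch.randn(4, 8, 512)
pooled, w = AttnBiLSTMPool()(feats)
```

The pooled vector would then be combined with the short-term features (e.g., by concatenation or residual addition) before classification, mirroring the integration step the abstract describes.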


Sensors ◽  
2019 ◽  
Vol 19 (6) ◽  
pp. 1382 ◽  
Author(s):  
Jongkwang Hong ◽  
Bora Cho ◽  
Yong Hong ◽  
Hyeran Byun

In action recognition research, the two primary types of information are appearance and motion, learned from RGB images captured by visual sensors. Depending on the action, however, contextual information, such as the presence of specific objects or globally shared cues in the image, becomes vital to defining it. For example, the presence of a ball is the key information that distinguishes "kicking" from "running". Furthermore, some actions share typical global abstract poses, which can serve as a cue for classification. Based on these observations, we propose a multi-stream network model that incorporates spatial, temporal, and contextual cues in the image for action recognition. We evaluated the proposed method with C3D or Inflated 3D ConvNet (I3D) as the backbone network on two different action recognition datasets. The results show an overall improvement in accuracy, demonstrating the effectiveness of the proposed method.
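The abstract does not specify how the stream outputs are combined; a common choice is late fusion of per-stream class scores, sketched below. The learnable stream weights and the stream/class counts are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Weighted late fusion of per-stream class logits
    (e.g., spatial / temporal / contextual); illustrative sketch."""
    def __init__(self, num_streams=3, num_classes=101):
        super().__init__()
        # one learnable fusion weight per stream
        self.w = nn.Parameter(torch.ones(num_streams))

    def forward(self, stream_logits):   # list of (batch, num_classes)
        stacked = torch.stack(stream_logits, dim=0)        # (S, B, C)
        weights = torch.softmax(self.w, dim=0).view(-1, 1, 1)
        return (weights * stacked).sum(dim=0)              # (B, C)

# Example: three streams over 101 classes, batch of 2
logits = [torch.randn(2, 101) for _ in range(3)]
fused = LateFusion()(logits)
```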


2019 ◽  
Vol 26 (8) ◽  
pp. 1187-1191 ◽  
Author(s):  
Didik Purwanto ◽  
Rizard Renanda Adhi Pramono ◽  
Yie-Tarng Chen ◽  
Wen-Hsien Fang
