Temporal Segment Connection Network for Action Recognition

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 179118-179127
Author(s):  
Qian Li ◽  
Wenzhu Yang ◽  
Xiangyang Chen ◽  
Tongtong Yuan ◽  
Yuxia Wang
2019 ◽  
Vol 41 (11) ◽  
pp. 2740-2755 ◽  
Author(s):  
Limin Wang ◽  
Yuanjun Xiong ◽  
Zhe Wang ◽  
Yu Qiao ◽  
Dahua Lin ◽  
...  

Author(s):  
David Ivorra-Piqueres ◽  
John Alejandro Castro Vargas ◽  
Pablo Martinez-Gonzalez

In this work, the authors propose several techniques for accelerating a modern action recognition pipeline. The article reviews several recent and popular action recognition works and selects two of them as building blocks for the proposed acceleration: temporal segment networks (TSN), a convolutional neural network (CNN) framework that obtains robust predictions from a small number of video frames and won first place in the 2016 ActivityNet challenge, and MotionNet, a convolutional-transposed CNN capable of inferring optical flow from RGB frames. In addition, the authors integrate new software for decoding videos that takes advantage of NVIDIA GPUs. The article shows a proof of concept for this approach by training the RGB stream of the TSN network on videos loaded with the NVIDIA Video Loader (NVVL), using a subset of daily actions from the University of Central Florida 101 (UCF101) dataset.
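As a rough illustration of the sparse temporal sampling idea behind TSN (a minimal sketch, not the authors' actual implementation, and independent of the NVVL decoding step), the following Python snippet divides a video into equal temporal segments, samples one frame index per segment, and averages per-snippet class scores as a segmental consensus. Function names such as sample_segment_indices are illustrative assumptions.

```python
import random

def sample_segment_indices(num_frames, num_segments=3, training=True):
    """Split a video into equal temporal segments and pick one frame index
    per segment: random within the segment during training, the segment
    centre during testing. This mirrors the sparse-sampling idea of TSN."""
    seg_len = num_frames / num_segments
    indices = []
    for k in range(num_segments):
        start = int(round(k * seg_len))
        end = max(start, int(round((k + 1) * seg_len)) - 1)
        if training:
            indices.append(random.randint(start, end))
        else:
            indices.append((start + end) // 2)
    return indices

def segmental_consensus(snippet_scores):
    """Average per-snippet class scores into a video-level prediction
    (the 'average' consensus variant described for TSN)."""
    num_classes = len(snippet_scores[0])
    return [sum(scores[c] for scores in snippet_scores) / len(snippet_scores)
            for c in range(num_classes)]

if __name__ == "__main__":
    # Example: a 300-frame clip sampled with 3 segments at test time.
    idx = sample_segment_indices(num_frames=300, num_segments=3, training=False)
    print("sampled frame indices:", idx)  # e.g. [49, 149, 249]
```

Only the sampled frames would then be decoded and passed through the CNN, which is what keeps the per-video cost low regardless of clip length.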


1970 ◽  
Vol 13 (4) ◽  
pp. 715-724 ◽  
Author(s):  
Richard L. Powell ◽  
Oscar Tosi

Vowels were segmented into 15 different temporal segments, taken from the middle of the vowel and ranging from 4 to 60 msec, and presented to 6 subjects with normal hearing. The mean temporal-segment recognition threshold was 15 msec, with a range from 9.3 msec for /u/ to 27.2 msec for /a/. Misidentified vowels were most often confused with the vowel adjacent to them on the vowel-hump diagram. There was no significant difference between the cardinal and noncardinal vowels.


2013 ◽  
Vol 18 (2-3) ◽  
pp. 49-60 ◽  
Author(s):  
Damian Dudziński ◽  
Tomasz Kryjak ◽  
Zbigniew Mikrut

Abstract In this paper a human action recognition algorithm is described, which uses background generation with shadow elimination, silhouette description based on simple geometrical features, and a finite state machine for recognizing particular actions. The performed tests indicate that this approach achieves an 81% correct recognition rate while allowing real-time processing of a 360 × 288 video stream.
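To illustrate how a finite state machine can turn per-frame silhouette features into an action decision, the sketch below maps a bounding-box aspect ratio to a coarse posture and advances through a hypothetical "fall" sequence. The states, thresholds, and the action itself are assumptions chosen for demonstration, not the ones used in the cited algorithm.

```python
def classify_posture(bbox_width, bbox_height):
    """Map a simple geometrical feature of the silhouette (bounding-box
    aspect ratio) to a coarse posture label. Thresholds are illustrative."""
    ratio = bbox_height / max(bbox_width, 1e-6)
    if ratio > 1.5:
        return "standing"
    if ratio > 0.8:
        return "bending"
    return "lying"

class ActionFSM:
    """Recognise an action when a specific sequence of postures is observed."""
    def __init__(self):
        self.state = "standing"
        self.recognised = None

    def step(self, posture):
        if self.state == "standing" and posture == "bending":
            self.state = "bending"
        elif self.state == "bending" and posture == "lying":
            self.state = "lying"
            self.recognised = "fall"      # action detected
        elif posture == "standing":
            self.state = "standing"       # back to the rest state
            self.recognised = None
        return self.recognised

if __name__ == "__main__":
    fsm = ActionFSM()
    # Silhouette bounding boxes (width, height) from consecutive frames.
    for w, h in [(40, 100), (60, 70), (100, 40)]:
        result = fsm.step(classify_posture(w, h))
    print(result)  # "fall"
```

The appeal of this kind of pipeline is its low computational cost: background subtraction plus a handful of geometric features and state transitions per frame, which is what makes real-time processing of a low-resolution stream feasible.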


2018 ◽  
Vol 6 (10) ◽  
pp. 323-328
Author(s):  
K. Kiruba ◽  
D. Shiloah Elizabeth ◽  
C. Sunil Retmin Raj

2019 ◽  
Author(s):  
Giacomo De Rossi ◽  
Nicola Piccinelli ◽  
Francesco Setti ◽  
Riccardo Muradore ◽  
...  
