video descriptor
Recently Published Documents


TOTAL DOCUMENTS: 10 (FIVE YEARS: 4)
H-INDEX: 3 (FIVE YEARS: 2)

2019 · Vol 79 (9-10) · pp. 6025-6043
Author(s): Mehrin Saremi, Farzin Yaghmaee

2019
Author(s): Matheus Vieira Lessa Ribeiro, Jorge Leonid Aching Samatelo

Traffic congestion is a significant problem in urban areas, with economic, health, and social consequences. Although many works on video-based traffic applications have been published in recent years, different computer vision techniques remain to be explored in this area. In this work, we propose a method for traffic flow classification using StarRGB and Convolutional Neural Networks (CNNs). StarRGB encodes a global representation of a traffic video as a single colored image based on the motion elements in the scene. The generated image is then passed as input to a pre-trained CNN, which extracts features and classifies the traffic video activity into three classes: LIGHT, MEDIUM, and HEAVY. In our experiments on a traffic video database, the proposed method reached an accuracy of 96.47%. The results also suggest that StarRGB is a good descriptor for traffic video applications.
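
As a rough illustration of the pipeline's shape (not the authors' actual StarRGB construction, which the abstract does not detail), the sketch below collapses a video into one motion-encoding image by accumulating frame differences into the three color channels, then hands it to a pre-trained CNN whose head is replaced for the three traffic classes. The function and variable names are hypothetical.

    import cv2
    import numpy as np
    import torch
    import torchvision

    def video_to_motion_image(path, size=(224, 224)):
        # Collapse a video into a single RGB image that encodes motion:
        # successive frame differences are rotated through the R, G, B channels.
        cap = cv2.VideoCapture(path)
        prev, idx = None, 0
        acc = np.zeros((size[1], size[0], 3), dtype=np.float32)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(cv2.resize(frame, size), cv2.COLOR_BGR2GRAY)
            if prev is not None:
                acc[..., idx % 3] += cv2.absdiff(gray, prev)
            prev, idx = gray, idx + 1
        cap.release()
        return (255 * acc / (acc.max() + 1e-6)).astype(np.uint8)

    # Pre-trained CNN with its head replaced for LIGHT, MEDIUM, HEAVY.
    model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
    model.fc = torch.nn.Linear(model.fc.in_features, 3)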


2019 · Vol 15 (29) · pp. 82-94
Author(s): Fabio Martínez Carrillo, Fabián Castillo, Lola Bautista

RGB-D sensors have made it possible to attack many classical problems in computer vision, such as segmentation, scene representation, and human interaction, among many others. Regarding motion characterization, typical RGB-D strategies are limited to analyzing global shape changes and capturing scene flow fields to describe local motions in depth sequences. However, such strategies only recover motion information between a couple of frames, limiting the analysis of coherent large displacements over time. This work presents a novel strategy to compute dense, long-term 3D+t motion trajectories as fundamental kinematic primitives for representing video sequences. Each motion trajectory models a kinematic-word primitive; together, these primitives can describe complex gestures developed throughout a video. The kinematic words are processed in a bag-of-kinematic-words framework to obtain an occurrence video descriptor. The novel video descriptor based on 3D+t motion trajectories achieved an average accuracy of 80% on a dataset of 5 gestures and 100 videos.
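
The bag-of-kinematic-words step can be sketched independently of how the 3D+t trajectories are extracted, assuming each trajectory is already encoded as a fixed-length descriptor: cluster the training descriptors into a vocabulary, then represent each video as a normalized histogram of word occurrences. This is a generic bag-of-words sketch under that assumption, not the authors' exact formulation.

    import numpy as np
    from sklearn.cluster import KMeans

    def build_vocabulary(train_descriptors, n_words=64):
        # train_descriptors: (N, D) trajectory descriptors pooled from all
        # training videos; each cluster center is one "kinematic word".
        return KMeans(n_clusters=n_words, n_init=10).fit(train_descriptors)

    def video_descriptor(vocabulary, trajectories):
        # Assign each of the video's trajectories to its nearest word and
        # return the normalized occurrence histogram as the video descriptor.
        words = vocabulary.predict(trajectories)
        hist = np.bincount(words, minlength=vocabulary.n_clusters)
        return hist / max(hist.sum(), 1)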


Symmetry · 2019 · Vol 11 (1) · pp. 52
Author(s): Xianzhang Pan, Wenping Guo, Xiaoying Guo, Wenshu Li, Junjie Xu, ...

The proposed method has 30 streams, i.e., 15 spatial streams and 15 temporal streams, with each spatial stream paired with a temporal stream; this pairing relates the work to the concept of symmetry. Classifying video-based facial expressions is difficult owing to the gap between visual descriptors and emotions. To bridge this gap, a new video descriptor for facial expression recognition is presented that aggregates spatial and temporal convolutional features across the entire extent of a video. The designed framework integrates the 30 state-of-the-art streams with a trainable spatial-temporal feature aggregation layer and is end-to-end trainable for video-based facial expression recognition. It can therefore effectively avoid overfitting to the limited emotional video datasets, and the trainable strategy learns to better represent an entire video. Different schemas for pooling spatial-temporal features are investigated, and the spatial and temporal streams are best aggregated by the proposed method. Extensive experiments on two public databases, BAUM-1s and eNTERFACE05, show that the framework has promising performance and outperforms state-of-the-art strategies.
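
The abstract does not specify the form of the trainable aggregation layer; one plausible reading is attention-weighted pooling over the 30 per-stream feature vectors, sketched below in PyTorch. The feature dimension and the number of emotion classes are assumptions.

    import torch
    import torch.nn as nn

    class StreamAggregation(nn.Module):
        # Fuses per-stream features (15 spatial + 15 temporal) into one
        # video-level descriptor using learned softmax attention weights.
        def __init__(self, n_streams=30, feat_dim=512, n_classes=7):
            super().__init__()
            self.score = nn.Linear(feat_dim, 1)        # attention score per stream
            self.classifier = nn.Linear(feat_dim, n_classes)

        def forward(self, streams):                    # (batch, n_streams, feat_dim)
            w = torch.softmax(self.score(streams), dim=1)
            fused = (w * streams).sum(dim=1)           # weighted sum over streams
            return self.classifier(fused)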


2016 · Vol 16 (04) · pp. 1650017
Author(s): Felipe Andrade Caetano, Marcelo Bernardes Vieira, Rodrigo Luis de Souza da Silva

Dense trajectories have been shown to be a very promising method in the field of human action recognition. In this paper, we propose a new kind of video descriptor generated from the relationship between a trajectory's optical flow and the gradient field in its neighborhood. Orientation tensors are used to accumulate relevant information over the video, representing the tendency of direction in the descriptor space for each kind of movement. Furthermore, a method to cluster trajectories by their shape is proposed. This method allows us to accumulate different motion patterns in different tensors and to more easily distinguish trajectories created by real movements from those created by camera movement. The proposed method achieves the best known recognition rates for methods based on the self-descriptor constraint on popular datasets: Hollywood2 (up to 46%) and KTH (up to 94%).
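
The orientation-tensor accumulation can be sketched as follows: each motion vector v contributes its outer product v vᵀ, so the accumulated tensor summarizes the dominant motion directions over the video. The flow-gradient interaction and the shape-based trajectory clustering from the paper are omitted; this shows only the accumulation step.

    import numpy as np

    def orientation_tensor(motion_vectors):
        # motion_vectors: (N, d) array, e.g., optical-flow displacements
        # along a trajectory; the outer products accumulate the tendency
        # of direction over the whole video.
        d = motion_vectors.shape[1]
        T = np.zeros((d, d))
        for v in motion_vectors:
            T += np.outer(v, v)
        n = np.linalg.norm(T)
        return (T / n if n > 0 else T).ravel()   # flattened, normalized descriptor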


2012 · Vol 24 (7) · pp. 1473-1485
Author(s): Berkan Solmaz, Shayan Modiri Assari, Mubarak Shah
