Temporal Context Aggregation for Video Retrieval with Contrastive Learning

Most of the video retrieval systems work with a single shot without considering the temporal context in which the shot appears. However, the meaning of a shot depends on the context in which it is situated and a change in the order of the shots within a scene changes the meaning of the shot. Recently, it has been shown that to find higher-level interpretations of a collection of shots (i.e., a sequence), intershot analysis is at least as important as intrashot analysis. Several such interpretations would be impossible without a context. Contextual characterization of video data involves extracting patterns in the temporal behavior of features of video and mapping these patterns to a high-level interpretation. A Dynamic Bayesian Network (DBN) framework is designed with the temporal context of a segment of a video considered at different granularity depending on the desired application. The novel applications of the system include classifying a group of shots called sequence and parsing a video program into individual segments by building a model of the video program.

Download Full-text

Exploiting spatial-temporal context for trajectory based action video retrieval

Multimedia Tools and Applications ◽

10.1007/s11042-017-4353-2 ◽

2017 ◽

Vol 77 (2) ◽

pp. 2057-2081 ◽

Cited By ~ 2

Author(s):

Lelin Zhang ◽

Zhiyong Wang ◽

Tingting Yao ◽

Shin’ichi Staoh ◽

Tao Mei ◽

...

Keyword(s):

Video Retrieval ◽

Temporal Context

Download Full-text

Context-Based Interpretation and Indexing of Video Data

Managing Multimedia Semantics ◽

10.4018/978-1-59140-569-6.ch004 ◽

2011 ◽

pp. 77-98

Author(s):

Ankush Mittal ◽

Cheong Loong Fah ◽

Ashraf Kassim ◽

Krishnan V. Pagalthivarthi

Keyword(s):

Video Retrieval ◽

Dynamic Bayesian Network ◽

Video Data ◽

Temporal Context ◽

Single Shot ◽

Retrieval Systems ◽

High Level ◽

Novel Applications ◽

Video Retrieval Systems ◽

Video Program

Most of the video retrieval systems work with a single shot without considering the temporal context in which the shot appears. However, the meaning of a shot depends on the context in which it is situated and a change in the order of the shots within a scene changes the meaning of the shot. Recently, it has been shown that to find higher-level interpretations of a collection of shots (i.e., a sequence), intershot analysis is at least as important as intrashot analysis. Several such interpretations would be impossible without a context. Contextual characterization of video data involves extracting patterns in the temporal behavior of features of video and mapping these patterns to a high-level interpretation. A Dynamic Bayesian Network (DBN) framework is designed with the temporal context of a segment of a video considered at different granularity depending on the desired application. The novel applications of the system include classifying a group of shots called sequence and parsing a video program into individual segments by building a model of the video program.

Download Full-text