Context-Based Interpretation and Indexing of Video Data

Author(s):  
Ankush Mittal ◽  
Cheong Loong Fah ◽  
Ashraf Kassim ◽  
Krishnan V. Pagalthivarthi

Most video retrieval systems work with a single shot, without considering the temporal context in which the shot appears. However, the meaning of a shot depends on the context in which it is situated, and a change in the order of the shots within a scene changes the meaning of a shot. Recently, it has been shown that to find higher-level interpretations of a collection of shots (i.e., a sequence), intershot analysis is at least as important as intrashot analysis; several such interpretations would be impossible without context. Contextual characterization of video data involves extracting patterns in the temporal behavior of video features and mapping these patterns to a high-level interpretation. A Dynamic Bayesian Network (DBN) framework is designed in which the temporal context of a video segment is considered at different granularities depending on the desired application. The novel applications of the system include classifying a group of shots, called a sequence, and parsing a video program into individual segments by building a model of the video program.
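The abstract does not give the chapter's DBN models; as a minimal illustrative sketch, a discrete hidden Markov model (the simplest Dynamic Bayesian Network) can score a sequence of per-shot observation labels, showing how a shot's contribution depends on the shots around it. The states, labels, and all probabilities below are invented for illustration and are not taken from the chapter.

```python
# Illustrative sketch only: a discrete HMM (the simplest DBN) scoring a
# sequence of per-shot observation labels. States ("dialogue", "action")
# and all probabilities are invented for illustration.

def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Return P(observations) under the HMM via the forward algorithm."""
    # alpha[s] = P(obs[0..t], state at time t = s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states)
               * emit_p[s][obs]
            for s in states
        }
    return sum(alpha.values())

states = ("dialogue", "action")
start_p = {"dialogue": 0.6, "action": 0.4}
trans_p = {"dialogue": {"dialogue": 0.7, "action": 0.3},
           "action":   {"dialogue": 0.4, "action": 0.6}}
# Per-shot observation: a coarse motion label extracted from the shot.
emit_p = {"dialogue": {"low_motion": 0.8, "high_motion": 0.2},
          "action":   {"low_motion": 0.1, "high_motion": 0.9}}

# Reordering the same shots changes the sequence likelihood -- the
# temporal-context effect the chapter exploits.
seq = ["low_motion", "low_motion", "high_motion"]
likelihood = forward_likelihood(seq, states, start_p, trans_p, emit_p)
```

Because the transition probabilities couple adjacent shots, the same multiset of shot labels in a different order yields a different likelihood, which is exactly why intershot analysis matters.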

2008 ◽  
pp. 527-546


Author(s):  
Jianping Fan ◽  
Xingquan Zhu ◽  
Jing Xiao

Recent advances in digital video compression and networks have made videos more accessible than ever, and several content-based video retrieval systems have been proposed in the past. In this chapter, we first review these existing content-based video retrieval systems and then propose a new framework, called ClassView, to make some advances towards more efficient content-based video retrieval. This framework includes: (a) an efficient video content analysis and representation scheme to support high-level visual concept characterization; (b) a hierarchical video classification technique to bridge the semantic gap between low-level visual features and high-level semantic visual concepts; and (c) a hierarchical video database indexing structure to enable video access over large-scale databases. Integrating video access with efficient database indexing tree structures provides a great opportunity for supporting more powerful video search engines.
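The abstract only names the hierarchical indexing structure; a minimal sketch of what such a structure could look like is a tree of semantic concepts, where each clip is indexed under a concept path and a query at any concept returns the whole subtree. The concept names, clip names, and API below are hypothetical, not ClassView's actual design.

```python
# Illustrative sketch only: a hierarchical index over semantic concepts,
# in the spirit of a video database indexing tree. The concept hierarchy
# and clip names are invented for illustration.

class ConceptNode:
    def __init__(self, name):
        self.name = name
        self.children = {}   # concept name -> ConceptNode
        self.clips = []      # clips indexed directly at this concept

    def insert(self, concept_path, clip):
        """Index a clip under a path such as ('sports', 'soccer')."""
        node = self
        for concept in concept_path:
            node = node.children.setdefault(concept, ConceptNode(concept))
        node.clips.append(clip)

    def search(self, concept_path):
        """Return all clips at or below the given concept (subtree query)."""
        node = self
        for concept in concept_path:
            if concept not in node.children:
                return []
            node = node.children[concept]
        found, stack = [], [node]
        while stack:
            n = stack.pop()
            found.extend(n.clips)
            stack.extend(n.children.values())
        return found

root = ConceptNode("video")
root.insert(("sports", "soccer"), "match01.mpg")
root.insert(("sports", "tennis"), "final.mpg")
root.insert(("news",), "evening_news.mpg")
sports_clips = root.search(("sports",))   # both soccer and tennis clips
```

A query at an interior concept narrows the search to one subtree instead of scanning the whole database, which is the access-efficiency argument behind hierarchical indexing.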


2021 ◽  
Author(s):  
Jun Gao

Detection of human faces has many practical and important applications, such as human-computer interfaces, face recognition, face image database management, security access control systems, and content-based video indexing and retrieval systems. In this report, a face detection scheme is presented. The scheme is designed to operate on color images. In the first stage of the algorithm, skin color regions are detected based on chrominance information. A color segmentation stage is then employed to divide the skin color regions into smaller regions of homogeneous color. Then, iterative luminance segmentation is used to further separate the detected skin regions from other skin-colored objects such as hair, clothes, and wood, based on the high variance of the luminance component in the neighborhood of object edges. Post-processing is applied to determine whether the skin color regions satisfy face constraints on skin density, size, shape, and symmetry, and contain facial features such as eyes and mouths. Experimental results show that the algorithm is robust and is capable of detecting multiple faces in the presence of a complex background containing colors similar to skin tones.
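The first stage described above can be sketched as a per-pixel chrominance test: convert RGB to YCbCr and keep pixels whose Cb/Cr values fall in a skin range, independent of luminance. The conversion below is the standard ITU-R BT.601 form; the Cb/Cr skin ranges are common rule-of-thumb values, not the thresholds used in this report.

```python
# Illustrative sketch only: candidate skin-pixel detection from
# chrominance. RGB -> YCbCr follows ITU-R BT.601; the Cb/Cr skin
# ranges are rule-of-thumb values, not the report's thresholds.

def rgb_to_ycbcr(r, g, b):
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Classify a pixel as skin-colored using chrominance only, so the
    decision is largely insensitive to luminance (lighting) changes."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]

# A toy 2x2 image: two skin-tone pixels, one blue, one green.
image = [[(224, 172, 105), (20, 40, 200)],
         [(255, 219, 172), (30, 120, 30)]]
skin_mask = [[is_skin(*px) for px in row] for row in image]
```

On real images this mask is noisy, which is why the report's later stages (color and luminance segmentation, then shape and facial-feature checks) are needed to reject skin-colored non-face objects.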


Author(s):  
Min Chen

The fast proliferation of video data archives has increased the need for automatic video content analysis and semantic video retrieval. Since temporal information is critical in conveying video content, in this chapter an effective temporal-based event detection framework is proposed to support high-level video indexing and retrieval. The core is a temporal association mining process that systematically captures characteristic temporal patterns to help identify and define interesting events. This framework effectively tackles the challenges caused by loose video structure and class imbalance issues. One of the unique characteristics of this framework is that it offers strong generality and extensibility, with the capability of exploring representative event patterns with little human interference. The temporal information and event detection results can then be fed into our proposed distributed video retrieval system to support high-level semantic querying, selective video browsing, and event-based video retrieval.




2018 ◽  
Vol 7 (S1) ◽  
pp. 58-62
Author(s):  
Gowrisankar Kalakoti ◽  
G. Prabhakaran ◽  
P. Sudhakar

With the growth of multimedia data types and available bandwidth, there is a huge demand for video retrieval systems, as users move from text-based retrieval systems to content-based retrieval systems. The selection of extracted features plays an important role in content-based video retrieval, regardless of the video characteristics under consideration. This work helps upcoming researchers in the field of video retrieval get an idea of the different techniques and methods available for video retrieval. These features are proposed for selecting, indexing, and ranking according to their potential interest to the user. Good feature selection also allows the time and space costs of the retrieval process to be reduced. This survey reviews the interesting features that can be extracted from video data for indexing and retrieval, along with similarity measurement methods. We also identify current research issues in the area of content-based video retrieval systems.
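Among the similarity measurement methods such surveys cover, one of the simplest is histogram intersection between normalized color histograms, which can then rank candidate shots against a query. The histograms below are invented for illustration; they are not data from the survey.

```python
# Illustrative sketch only: histogram intersection, a classic similarity
# measure for content-based retrieval. Histograms are invented toy data.

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two histograms with equal total mass."""
    return sum(min(a, b) for a, b in zip(h1, h2))

query   = [0.5, 0.3, 0.2, 0.0]   # normalized 4-bin color histogram
frame_a = [0.4, 0.4, 0.1, 0.1]   # visually similar shot
frame_b = [0.0, 0.1, 0.1, 0.8]   # visually dissimilar shot

sim_a = histogram_intersection(query, frame_a)
sim_b = histogram_intersection(query, frame_b)
# Candidates are ranked by decreasing similarity, so frame_a is
# returned before frame_b for this query.
```

Compact features with cheap similarity measures like this one are what keep the time and space costs of retrieval low, which is the feature-selection trade-off the survey discusses.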

