Temporal-Based Video Event Detection and Retrieval

Author(s):  
Min Chen

The fast proliferation of video data archives has increased the need for automatic video content analysis and semantic video retrieval. Since temporal information is critical in conveying video content, this chapter proposes an effective temporal-based event detection framework to support high-level video indexing and retrieval. Its core is a temporal association mining process that systematically captures characteristic temporal patterns to help identify and define interesting events. The framework effectively tackles the challenges posed by loose video structure and class imbalance. One of its unique characteristics is strong generality and extensibility, with the capability of discovering representative event patterns with little human intervention. The temporal information and event detection results can then be fed into our proposed distributed video retrieval system to support high-level semantic querying, selective video browsing, and event-based video retrieval.
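To make the mining step concrete, below is a minimal sketch of windowed temporal pattern mining over a shot-label sequence. The sliding window, the min_support threshold, and the soccer-style labels are illustrative assumptions, not the chapter's actual association mining algorithm.

```python
from collections import Counter
from itertools import combinations

def mine_temporal_patterns(shot_labels, window=3, min_support=0.05):
    """Count ordered label patterns inside a sliding temporal window
    and keep those whose relative frequency exceeds min_support."""
    counts = Counter()
    n = len(shot_labels)
    for start in range(n - window + 1):
        segment = tuple(shot_labels[start:start + window])
        # every ordered sub-pattern of length >= 2 within the window
        for size in range(2, window + 1):
            for idx in combinations(range(window), size):
                counts[tuple(segment[i] for i in idx)] += 1
    total = max(sum(counts.values()), 1)
    return {p: c / total for p, c in counts.items() if c / total >= min_support}

# Hypothetical mid-level labels for a sports video's shot sequence
shots = ["crowd", "goal_area", "close_up", "replay", "goal_area",
         "close_up", "replay", "wide_view"]
patterns = mine_temporal_patterns(shots, window=3, min_support=0.05)
for pattern, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(pattern, round(support, 3))
```

Recurring ordered patterns such as ("close_up", "replay") would surface with high support here, which is the kind of characteristic temporal pattern an event definition could be built on.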



2015, Vol 2015, pp. 1-10
Author(s):  
Shao-nian Huang ◽  
Dong-jun Huang ◽  
Mansoor Ahmed Khuhro

Video event detection is a challenging problem in many applications, such as video surveillance and video content analysis. In this paper, we propose a new framework that perceives high-level codewords by analyzing the temporal relationships between different channels of video features. Low-level vocabulary words are first generated from audio and visual feature extraction. A weighted undirected graph is then constructed by exploring the Granger causality between low-level words, and a greedy agglomerative graph-partitioning method is used to discover groups of low-level words that share similar temporal patterns. The high-level codebook representation is obtained by quantizing these low-level word groups. Finally, multiple kernel learning, combined with our high-level codewords, is used to detect video events. Extensive experimental results show that the proposed method achieves favorable results in video event detection.
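A minimal sketch of the graph-construction step follows, assuming each low-level word has a per-frame activation history. It uses statsmodels' grangercausalitytests for the pairwise causality weights, and networkx's greedy_modularity_communities (a greedy agglomerative method) stands in for the paper's specific graph-partitioning step; the 0.95 edge threshold and the toy signals are assumptions.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
from statsmodels.tsa.stattools import grangercausalitytests

def causality_weight(x, y, maxlag=2):
    """Return 1 - min p-value of the Granger test 'y causes x'."""
    data = np.column_stack([x, y])
    res = grangercausalitytests(data, maxlag=maxlag, verbose=False)
    p = min(res[lag][0]["ssr_ftest"][1] for lag in res)
    return 1.0 - p

rng = np.random.default_rng(0)
# toy per-frame activation histories of 6 low-level words
words = rng.standard_normal((6, 200))
words[1, 1:] += 0.8 * words[0, :-1]   # word 0 drives word 1 with lag 1

G = nx.Graph()
for i in range(len(words)):
    for j in range(i + 1, len(words)):
        w = max(causality_weight(words[i], words[j]),
                causality_weight(words[j], words[i]))
        if w > 0.95:                   # keep only strong causal links
            G.add_edge(i, j, weight=w)
G.add_nodes_from(range(len(words)))    # isolated words stay as singletons

# greedy agglomerative grouping; each community becomes a high-level codeword
for k, group in enumerate(greedy_modularity_communities(G, weight="weight")):
    print(f"high-level codeword {k}: low-level words {sorted(group)}")
```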


2001, Vol 01 (01), pp. 63-81
Author(s):  
ALAN HANJALIC ◽  
REGINALD L. LAGENDIJK ◽  
JAN BIEMOND

This paper addresses the problem of automatically partitioning a video into semantic segments using visual low-level features only. Semantic segments may be understood as the content building blocks of a video with a clear sequential content structure. Examples are reports in a news program, episodes in a movie, scenes of a situation comedy, or topic segments of a documentary. In some video genres, like news programs or documentaries, the use of different media (visual, audio, speech, text) may be beneficial or even unavoidable for reliably detecting the boundaries between semantic segments. In many other genres, however, the pay-off of using different media for high-level segmentation is low. On the one hand, relating audio, speech, or text to the semantic temporal structure of video content is generally very difficult, especially in "acting" video genres like movies and situation comedies. On the other hand, the information contained in the visual stream of these genres often provides the major clue about the positions of semantic segment boundaries. Partitioning a video into semantic segments can be performed by measuring the coherence of the content along neighboring video shots of a sequence. Segment boundaries are then found at places (e.g., shot boundaries) where the content coherence values are sufficiently low. On the basis of two state-of-the-art techniques for content coherence modeling, we illustrate in this paper the current possibilities for detecting the boundaries of semantic segments using visual low-level features only.
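As a concrete illustration of this idea, here is a minimal coherence-and-threshold sketch on grayscale shot histograms. The histogram-intersection coherence measure, the window span, and the 0.5 threshold are assumptions for illustration, not the two modeling techniques the paper evaluates.

```python
import numpy as np

def shot_histogram(frames, bins=16):
    """Average grayscale histogram over a shot's frames (L1-normalized)."""
    hist = np.zeros(bins)
    for f in frames:
        h, _ = np.histogram(f, bins=bins, range=(0, 256))
        hist += h
    return hist / max(hist.sum(), 1)

def coherence_curve(shot_hists, span=2):
    """Content coherence at each shot boundary: average histogram
    intersection between the `span` shots before and after it."""
    curve = []
    for b in range(1, len(shot_hists)):
        left = shot_hists[max(0, b - span):b]
        right = shot_hists[b:b + span]
        sims = [np.minimum(l, r).sum() for l in left for r in right]
        curve.append(float(np.mean(sims)))
    return curve

def segment_boundaries(curve, threshold=0.5):
    """Semantic segment boundaries where coherence dips below threshold
    and is a local minimum."""
    c = [1.0] + curve + [1.0]
    return [i for i in range(1, len(c) - 1)
            if c[i] < threshold and c[i] <= c[i - 1] and c[i] <= c[i + 1]]

# toy: 8 shots with a content change after shot 4
rng = np.random.default_rng(1)
shots = [rng.integers(0, 128, (4, 32, 32)) for _ in range(4)] + \
        [rng.integers(128, 256, (4, 32, 32)) for _ in range(4)]
hists = [shot_histogram(s) for s in shots]
print(segment_boundaries(coherence_curve(hists)))   # boundary before shot 4
```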


In the recent past, video content-based communication has increased significantly, with substantial consumption of space and time. Video data is information-rich because it combines visual and audio streams, and this combination in a single representation is highly effective, since audiovisual content makes a stronger impression on the human brain. Consequently, much of the content in the educational, business, and medical domains is video-based. This growth in video data has led many professionals to build and populate video content libraries for their own use. Hence, accurate retrieval of video data is the prime task for all video content management frameworks. A good number of studies have been carried out in the field of video retrieval using various methods. Most parallel research has focused on content retrieval based on object classification of video frames, followed by matching the object information against other video content. This approach is widely criticized and continuously being improved, as it relies solely on basic object detection and classification using preliminary characteristics such as shape, color, or object area, which are not accurate enough for similarity detection. Hence, this work proposes a novel method for similarity-based retrieval of video content using deep characteristics. The work focuses mainly on extraction of moving objects, separation of static objects, motion vector analysis of the moving objects, and traditional parameters such as area, and then performs matching for retrieval of the video data. The proposed algorithm for content retrieval demonstrates 98% accuracy with a 90% reduction in time complexity.
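A hedged sketch of the motion-vector matching idea is given below, using plain frame differencing and centroid displacement as a crude stand-in for the proposed deep-characteristic extraction; the synthetic clips, clip names, and thresholds are made up.

```python
import numpy as np

def make_clip(dx, dy, n=6):
    """Synthetic clip: a bright square moving by (dx, dy) per frame."""
    frames = []
    for t in range(n):
        f = np.zeros((32, 32), dtype=np.uint8)
        y, x = 4 + dy * t, 4 + dx * t
        f[y:y + 4, x:x + 4] = 255
        frames.append(f)
    return frames

def motion_signature(frames, diff_thresh=20):
    """Centroid displacement of moving pixels (frame differencing): a crude
    stand-in for moving-object extraction plus motion-vector analysis."""
    centroids = []
    for prev, curr in zip(frames, frames[1:]):
        moving = np.abs(curr.astype(int) - prev.astype(int)) > diff_thresh
        ys, xs = np.nonzero(moving)
        centroids.append((ys.mean(), xs.mean()) if len(xs) else (0.0, 0.0))
    return np.diff(np.array(centroids), axis=0).ravel()

def similarity(a, b):
    """Cosine similarity between two motion signatures."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

query = motion_signature(make_clip(dx=2, dy=0))
library = {"pan_right": make_clip(dx=3, dy=0),
           "tilt_down": make_clip(dx=0, dy=2),
           "static": make_clip(dx=0, dy=0)}
ranked = sorted(library, key=lambda name: -similarity(
    query, motion_signature(library[name])))
print("most similar first:", ranked)
```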


Author(s):  
Lin Lin ◽  
Mei-Ling Shyu

Motivated by the growing use of multimedia services and the explosion of multimedia collections, efficient retrieval from large-scale multimedia data has become very important in multimedia content analysis and management. In this paper, a novel ranking algorithm is proposed for video retrieval. First, video content is represented by global and local features; second, multiple correspondence analysis (MCA) is applied to capture the correlation between video content and semantic concepts. Next, video segments are scored by considering the features with high correlations and the transaction weights converted from those correlations. Finally, a user interface is implemented in a video retrieval system that allows the user to enter a concept of interest, searches videos based on the target concept, ranks the retrieved video segments using the proposed ranking algorithm, and then displays the top-ranked video segments to the user. Experimental results on 30 concepts from the TRECVID high-level feature extraction task demonstrate that the presented video retrieval system, assisted by the proposed ranking algorithm, is able to retrieve more video segments belonging to the target concepts and to display more relevant results to the users.
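The scoring idea can be illustrated as follows. Full MCA is replaced here by simple per-item correlation with the concept label, which plays the role of the transaction weights; the toy feature matrix and the hypothetical "sports" concept are assumptions, not the paper's data or exact method.

```python
import numpy as np

# toy training data: rows = video segments, cols = quantized feature items
# (1 = feature item present in the segment), plus a binary concept label
features = np.array([[1, 0, 1, 1],
                     [1, 1, 0, 1],
                     [0, 1, 0, 0],
                     [1, 0, 1, 0],
                     [0, 0, 1, 1]])
concept = np.array([1, 1, 0, 1, 0])   # hypothetical "sports" concept

# stand-in for MCA: per-feature-item correlation with the concept label
corr = np.array([np.corrcoef(features[:, j], concept)[0, 1]
                 for j in range(features.shape[1])])
weights = np.nan_to_num(np.clip(corr, 0, None))   # keep positive correlations

def score(segment_items):
    """Rank score: sum of transaction weights of the items a segment has."""
    return float(segment_items @ weights)

test_segments = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 0, 1]])
order = np.argsort([-score(s) for s in test_segments])
print("ranking (best first):", order.tolist())
```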


2008, pp. 527-546
Author(s):  
A. Mittal ◽  
Cheong Loong Fah ◽  
Ashraf Kassim ◽  
Krishnan V. Pagalthivarthi

Most video retrieval systems work with a single shot, without considering the temporal context in which the shot appears. However, the meaning of a shot depends on the context in which it is situated, and a change in the order of shots within a scene changes the meaning of the shot. Recently, it has been shown that for finding higher-level interpretations of a collection of shots (i.e., a sequence), intershot analysis is at least as important as intrashot analysis. Several such interpretations would be impossible without context. Contextual characterization of video data involves extracting patterns in the temporal behavior of video features and mapping these patterns to a high-level interpretation. A Dynamic Bayesian Network (DBN) framework is designed in which the temporal context of a video segment is considered at different granularities depending on the desired application. Novel applications of the system include classifying a group of shots called a sequence and parsing a video program into individual segments by building a model of the video program.
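As a minimal instance of this idea, the sketch below reduces the DBN to a two-state hidden Markov model over a shot sequence and decodes the most likely sequence type with the Viterbi algorithm; the states, observation symbols, and all probabilities are illustrative, not the paper's trained model.

```python
import numpy as np

# A 2-state dynamic Bayesian network (here reduced to an HMM) over a shot
# sequence: hidden states are hypothetical sequence types, observations are
# quantized shot features.
states = ["dialogue", "action"]
obs_symbols = ["close_up", "wide_view", "fast_motion"]
start = np.array([0.6, 0.4])
trans = np.array([[0.8, 0.2],          # dialogue tends to persist
                  [0.3, 0.7]])
emit = np.array([[0.7, 0.25, 0.05],    # P(observation | dialogue)
                 [0.1, 0.30, 0.60]])   # P(observation | action)

def viterbi(observations):
    """Most likely hidden state sequence given per-shot observations."""
    idx = [obs_symbols.index(o) for o in observations]
    logp = np.log(start) + np.log(emit[:, idx[0]])
    back = []
    for o in idx[1:]:
        cand = logp[:, None] + np.log(trans)   # cand[i, j]: i -> j
        back.append(cand.argmax(axis=0))
        logp = cand.max(axis=0) + np.log(emit[:, o])
    path = [int(logp.argmax())]
    for b in reversed(back):                   # follow back-pointers
        path.append(int(b[path[-1]]))
    return [states[s] for s in reversed(path)]

shots = ["close_up", "close_up", "fast_motion", "fast_motion", "wide_view"]
print(viterbi(shots))
```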


2001, Vol 01 (03), pp. 445-468
Author(s):  
CHONG-WAH NGO ◽  
TING-CHUEN PONG ◽  
HONG-JIANG ZHANG

In this paper, we present major issues in video parsing, abstraction, retrieval, and semantic analysis. We discuss the successes, the difficulties, and the expectations in these areas. In addition, we identify important open problems that can lead to more sophisticated approaches to video content analysis. For video parsing, we discuss topics in video partitioning, motion characterization, and object segmentation. Success in video parsing, in general, will have a great impact on video representation and retrieval. We present three levels of abstracting video content: scene, keyframe, and key-object representations. Together, these representation schemes serve as a good starting point for video retrieval. We then describe visual features, in particular motion, and the similarity measures adopted for retrieval. Finally, we discuss recent computational approaches to bridging the semantic gap for video content understanding.
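To ground the parsing and abstraction levels the survey describes, here is a minimal sketch of histogram-based video partitioning followed by keyframe selection; the cut threshold, histogram distance, and synthetic frames are assumptions, not techniques endorsed by the paper.

```python
import numpy as np

def frame_hist(frame, bins=32):
    """L1-normalized grayscale histogram of one frame."""
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / max(h.sum(), 1)

def partition_shots(frames, cut_thresh=0.6):
    """Video partitioning: declare a cut where consecutive-frame histogram
    intersection drops below cut_thresh."""
    hists = [frame_hist(f) for f in frames]
    cuts = [0] + [i for i in range(1, len(frames))
                  if np.minimum(hists[i - 1], hists[i]).sum() < cut_thresh]
    return [range(s, e) for s, e in zip(cuts, cuts[1:] + [len(frames)])], hists

def keyframes(shots, hists):
    """Keyframe abstraction: for each shot, the frame whose histogram is
    closest (L1) to the shot's mean histogram."""
    keys = []
    for shot in shots:
        mean = np.mean([hists[i] for i in shot], axis=0)
        keys.append(min(shot, key=lambda i: np.abs(hists[i] - mean).sum()))
    return keys

rng = np.random.default_rng(2)
video = [rng.integers(0, 100, (24, 24)) for _ in range(5)] + \
        [rng.integers(150, 256, (24, 24)) for _ in range(5)]
shots, hists = partition_shots(video)
print("keyframes at frame indices:", keyframes(shots, hists))
```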


2011, Vol 204-210, pp. 814-817
Author(s):  
Jie Xin Zhang

Nowadays, video media data is generated, transmitted, stored, and circulated on a global scale. Video data grows at a geometric rate, while video data processing and analysis have lagged behind this growth, so large amounts of data go unused. Efficient content-based retrieval of video data has therefore become an urgent need. In this paper, starting from color features, digital color space mapping and semantic color space conversion techniques are proposed to address the mismatch between semantic concepts and the perceived characteristics of video. We then realize the mapping from low-level features to high-level semantics. Finally, semantic rules for uncertainty reasoning based on the cloud model are established to complete video content retrieval.
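A minimal sketch of the low-level-to-semantic color mapping with cloud-model membership follows. The per-concept parameters (expectation Ex, entropy En, hyper-entropy He) and the number of sampling trials are illustrative assumptions, not the paper's established rules.

```python
import math
import random

# Normal cloud model for a semantic color concept on the hue circle:
# Ex = expected hue, En = entropy (fuzziness), He = hyper-entropy.
CONCEPTS = {"red": (0, 15, 2), "green": (120, 20, 3), "blue": (240, 20, 3)}

def hue_distance(a, b):
    """Shortest angular distance on the 0-360 hue circle."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def cloud_membership(hue, concept, trials=500):
    """Forward normal cloud generator: average membership of `hue`
    in the semantic concept over randomly drawn entropies."""
    ex, en, he = CONCEPTS[concept]
    total = 0.0
    for _ in range(trials):
        en_prime = random.gauss(en, he) or 1e-6   # En' ~ N(En, He^2)
        total += math.exp(-hue_distance(hue, ex) ** 2 / (2 * en_prime ** 2))
    return total / trials

def semantic_color(hue):
    """Map a low-level hue value to the best-matching semantic color."""
    return max(CONCEPTS, key=lambda c: cloud_membership(hue, c))

for hue in (5, 100, 250):
    print(hue, "->", semantic_color(hue))
```

The hyper-entropy He is what carries the uncertainty: repeated evaluations of the same hue yield slightly different membership degrees, which is the cloud model's way of softening hard semantic boundaries.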

