Efficient video summarization based on a fuzzy video content representation

Author(s):  
A.D. Doulamis ◽  
N.D. Doulamis ◽  
S.D. Kollias
2000 ◽ Vol 80 (6) ◽ pp. 1049-1067
Author(s):  
Anastasios D. Doulamis ◽  
Nikolaos D. Doulamis ◽  
Stefanos D. Kollias

2001 ◽ Vol 01 (03) ◽ pp. 507-526
Author(s):  
TONG LIN ◽  
HONG-JIANG ZHANG ◽  
QING-YUN SHI

In this paper, we present a novel scheme for video content representation that exploits spatio-temporal information. A pseudo-object-based shot representation carrying more semantics is proposed to measure shot similarity, and a force competition approach is proposed to group shots into scenes based on content coherence between shots. Two content descriptors for color objects are introduced: Dominant Color Histograms (DCH) and Spatial Structure Histograms (SSH). To represent temporal content variations, a shot can be segmented into several subshots of coherent content, and the shot similarity measure is formulated in terms of a subshot similarity measure that serves shot retrieval. With this shot representation, scene structure can be extracted by analyzing the splitting and merging force competitions at each shot boundary. Experimental results on real-world sports video show that the proposed approach to video shot retrieval achieves the best performance in terms of average recall (AR) and average normalized modified retrieval rank (ANMRR), and experiments on MPEG-7 test videos show promising results for the proposed scene extraction algorithm.
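
The idea of reducing shot similarity to subshot similarity can be illustrated with a minimal sketch. It is not the authors' method: it assumes each subshot is summarized by one normalized color histogram, uses histogram intersection as the subshot similarity, and takes the best-matching subshot pair as the shot similarity. All three choices, and the function names `histogram_intersection` and `shot_similarity`, are assumptions made for illustration only.

```python
from itertools import product


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two histograms that each sum to 1."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def shot_similarity(shot_a, shot_b):
    """Score two shots by their best-matching pair of subshots.

    Each shot is a non-empty list of subshot histograms. Taking the
    maximum over all pairs is an assumption of this sketch, not a rule
    taken from the abstract.
    """
    return max(histogram_intersection(a, b) for a, b in product(shot_a, shot_b))


# Toy data: shot_a has two subshots, shot_b has one.
shot_a = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
shot_b = [[0.0, 0.25, 0.75]]
print(shot_similarity(shot_a, shot_b))  # 0.75
```

The second subshot of `shot_a` overlaps most with `shot_b`, so it alone determines the score; a shot whose content changes over time can therefore still match a query that resembles only one of its segments.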


Author(s):  
Hehe Fan ◽  
Zhongwen Xu ◽  
Linchao Zhu ◽  
Chenggang Yan ◽  
Jianjun Ge ◽  
...  

We aim to significantly reduce the computational cost of classifying temporally untrimmed videos while retaining similar accuracy. Existing video classification methods sample frames at a predefined frequency over the entire video. In contrast, we propose an end-to-end deep reinforcement learning approach that enables an agent to classify videos by watching only a very small portion of the frames, much as humans do. We make two main contributions. First, information is not equally distributed across video frames over time: an agent needs to watch more carefully when a clip is informative and skip frames that are redundant or irrelevant. The proposed approach enables the agent to adapt its sampling rate to the video content and skip most of the frames without loss of information. Second, the number of frames an agent must watch to reach a confident decision varies greatly from one video to another. We incorporate an adaptive stop network that measures a confidence score and generates a timely trigger to stop the agent from watching further, which improves efficiency without loss of accuracy. Our approach significantly reduces the computational cost on the large-scale YouTube-8M dataset, while the accuracy remains the same.
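
The control flow described above (adaptive skipping plus a confidence-triggered stop) can be sketched as a simple loop. This is a hedged illustration rather than the paper's model: the learned agent and stop network are replaced by a hand-written `toy_policy`, and the names `watch_video`, `policy`, and `stop_threshold` are hypothetical.

```python
def watch_video(frames, policy, stop_threshold=0.9):
    """Visit frames until the policy is confident enough to stop.

    `policy` maps a frame to (label, confidence, skip). In the paper these
    decisions come from learned networks; here it is any callable.
    Returns the last predicted label and the number of frames watched.
    """
    position, watched, label = 0, 0, None
    while position < len(frames):
        label, confidence, skip = policy(frames[position])
        watched += 1
        if confidence >= stop_threshold:
            break  # confident enough: stop watching early
        position += max(1, skip)  # always move forward by at least one frame
    return label, watched


def toy_policy(frame):
    """Stand-in policy: each frame is a number in [0, 1] used as confidence."""
    skip = 1 if frame > 0.5 else 4  # step slowly through informative frames
    return "cat", frame, skip


frames = [0.1, 0.1, 0.1, 0.1, 0.6, 0.95, 0.2]
print(watch_video(frames, toy_policy))  # ('cat', 3)
```

In this toy run only three of the seven frames are visited: the first uninformative frame triggers a jump of four, the next frame is inspected closely, and the third exceeds the stop threshold.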


Author(s):  
Jun Wang ◽  
M.J.T. Reinders ◽  
R.L. Lagendijk ◽  
J. Lindenberg ◽  
M.S. Kankanhalli
