A FRAMEWORK FOR VIDEO SEGMENTATION USING GLOBAL AND LOCAL FEATURES

Author(s):  
JUNAID BABER ◽  
NITIN AFZULPURKAR ◽  
SHIN'ICHI SATOH

The rapid growth of video databases has created a need for efficient and effective frameworks for video retrieval and indexing. Video segmentation into scenes is widely used for video summarization, partitioning, indexing, and retrieval. In this paper, we propose a framework for scene detection based mainly on entropy and Speeded Up Robust Features (SURF). First, we detect fade and abrupt boundaries using frame entropy analysis and SURF feature matching. Fade boundaries, which in many videos and dramas strongly indicate the beginning or end of a scene, are detected by frame entropy analysis. Before abrupt boundary detection, frames that are obviously not abrupt boundaries, such as blank screens, frames dominated by high intensity, and sliding credits, are removed. Candidate boundaries are then detected to make SURF matching efficient, and SURF features between each candidate boundary and its adjacent frames are used to confirm abrupt boundaries. Second, key frames are extracted from the detected shots. We evaluate our key frame extraction against other well-known algorithms and show the effectiveness of the extracted key frames. Finally, scene boundaries are detected using a sliding window of size K over the key frames in temporal order. In an experimental evaluation on the TRECVID-2007 shot boundary test set, the shot boundary algorithm achieves substantial improvements over state-of-the-art methods, with a precision of 99% and a recall of 97.8%. Experimental results for video segmentation into scenes are also promising compared with well-known state-of-the-art techniques.
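A minimal sketch of the two signals this abstract describes, assuming OpenCV: frame entropy for fade detection (fades toward blank screens collapse the intensity histogram, so entropy drops) and a SURF match count between adjacent frames for abrupt-boundary confirmation. The function names and thresholds are illustrative assumptions, not the authors' implementation; SURF lives in opencv-contrib (cv2.xfeatures2d) and is unavailable in some builds for patent reasons.

```python
import cv2
import numpy as np

def frame_entropy(gray):
    """Shannon entropy of an 8-bit grayscale frame."""
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def detect_fades(video_path, entropy_thresh=1.0):
    """Flag frames whose entropy falls below a threshold (fade candidates)."""
    cap = cv2.VideoCapture(video_path)
    fades, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if frame_entropy(gray) < entropy_thresh:
            fades.append(idx)
        idx += 1
    cap.release()
    return fades

def surf_match_count(img1, img2, hessian=400, ratio=0.75):
    """Ratio-test SURF matches between two frames; a sharp drop between
    a candidate boundary and its neighbor suggests an abrupt cut."""
    surf = cv2.xfeatures2d.SURF_create(hessian)  # requires opencv-contrib
    _, d1 = surf.detectAndCompute(img1, None)
    _, d2 = surf.detectAndCompute(img2, None)
    if d1 is None or d2 is None:
        return 0
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(d1, d2, k=2)
    return sum(1 for pair in matches
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)
```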

Temporal video segmentation is the primary step of content-based video retrieval, which encompasses the whole process of video management, including video indexing, video retrieval, and video summarization. In this paper, we propose a computationally efficient and discriminating shot boundary detection method that uses a local feature descriptor named Local Contrast and Ordering (LCO) for feature extraction. The results of experiments conducted on the TRECVid video dataset are analyzed and compared with existing shot boundary detection methods. The proposed method gives promising results, even in the presence of illumination changes and image rotation.
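The general pattern of descriptor-based cut detection can be sketched as below. The LCO descriptor is not available in standard libraries, so ORB is used here purely as a stand-in local descriptor; the match-count threshold is likewise an assumption, not the paper's criterion.

```python
import cv2

def shot_boundaries(frames, min_matches=20):
    """Return indices where consecutive frames share few local-feature
    matches, i.e. likely cut points (ORB as a stand-in for LCO)."""
    orb = cv2.ORB_create()
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    boundaries = []
    _, prev_des = orb.detectAndCompute(frames[0], None)
    for i in range(1, len(frames)):
        _, des = orb.detectAndCompute(frames[i], None)
        n = 0
        if des is not None and prev_des is not None:
            n = len(bf.match(prev_des, des))
        if n < min_matches:
            boundaries.append(i)
        prev_des = des
    return boundaries
```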


Entropy ◽  
2020 ◽  
Vol 22 (11) ◽  
pp. 1285
Author(s):  
WenLin Li ◽  
DeYu Qi ◽  
ChangJian Zhang ◽  
Jing Guo ◽  
JiaJun Yao

This paper proposes a video summarization algorithm called the Mutual Information and Entropy based adaptive Sliding Window (MIESW) method, designed specifically for the static summarization of gesture videos. Considering that gesture videos usually contain uncertain transition postures, unclear movement boundaries, and inexplicable frames, we propose a three-step method: the first step browses the video, the second step applies the MIESW method to select candidate key frames, and the third step removes redundant key frames. In detail, the first step converts the video into a sequence of frames and adjusts the frame size. In the second step, the MIESW key frame extraction algorithm is executed: the inter-frame mutual information value is used as a metric to adaptively adjust the size of the sliding window so that similar video content is grouped together. Then, based on the entropy value of each frame and the average mutual information of each frame group, a threshold method is applied to optimize the grouping, and key frames are extracted. In the third step, speeded up robust features (SURF) analysis is performed to eliminate redundant frames among the candidate key frames. The calculation of precision, recall, and F-measure is optimized from the perspective of practicality and feasibility. Experiments demonstrate that key frames extracted using our method provide high-quality video summaries and essentially cover the main content of the gesture video.
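A minimal sketch of the inter-frame mutual information that drives the adaptive window, assuming grayscale frames as NumPy arrays. The bin count, the MI threshold, and the simple grow-or-split rule are illustrative assumptions, not the paper's exact MIESW procedure.

```python
import numpy as np

def mutual_information(a, b, bins=64):
    """Mutual information between two grayscale frames, computed
    from their joint intensity histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def group_frames(frames, mi_thresh=1.0):
    """Grow a window while adjacent frames stay mutually informative;
    start a new group when MI falls below the threshold."""
    groups, current = [], [0]
    for i in range(1, len(frames)):
        if mutual_information(frames[i - 1], frames[i]) >= mi_thresh:
            current.append(i)
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    return groups
```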


2017 ◽  
Vol 9 (4) ◽  
pp. 15-29
Author(s):  
Lingchen Gu ◽  
Ju Liu ◽  
Aixi Qu

The advancement of multimedia technology has produced a large number of videos, so it is important to know how to retrieve information from video, especially for crime prevention and forensics. To make retrieving video data convenient, content-based video retrieval (CBVR) has received considerable attention. Aiming to improve retrieval performance, we focus on two key technologies: shot boundary detection and keyframe extraction. After comparison with pixel analysis and the chi-square histogram, a histogram-based method is chosen in this paper; we combine it with an adaptive threshold method and compute the histograms in HSV color space. For keyframe extraction, four methods are analyzed and four evaluation criteria, both objective and subjective, are summarized, leading to the conclusion that different types of keyframe extraction methods suit different types of videos. Retrieval can then be based on keyframes, simplifying the process of video investigation and helping criminal investigation personnel improve their work efficiency.
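A hedged sketch of HSV-histogram comparison with an adaptive threshold, assuming OpenCV. The mean-plus-a-multiple-of-standard-deviation rule is one common adaptive-threshold choice and is an assumption here, as are the bin counts and the Bhattacharyya distance.

```python
import cv2
import numpy as np

def hsv_hist(frame, bins=(16, 4, 4)):
    """Normalized 3-D histogram of a BGR frame in HSV color space."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    h = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                     [0, 180, 0, 256, 0, 256])
    return cv2.normalize(h, h).flatten()

def detect_cuts(frames, alpha=3.0):
    """Declare a cut wherever the inter-frame histogram distance
    exceeds mean + alpha * std of all distances (adaptive threshold)."""
    dists = np.array([
        cv2.compareHist(hsv_hist(frames[i - 1]), hsv_hist(frames[i]),
                        cv2.HISTCMP_BHATTACHARYYA)
        for i in range(1, len(frames))
    ])
    thresh = dists.mean() + alpha * dists.std()
    return [i + 1 for i, d in enumerate(dists) if d > thresh]
```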


2013 ◽  
Vol 13 (01) ◽  
pp. 1350001 ◽  
Author(s):  
PARTHA PRATIM MOHANTA ◽  
SANJOY KUMAR SAHA ◽  
BHABATOSH CHANDA

A storyboard consisting of key-frames is a popular format of video summarization, as it helps in efficient indexing, browsing, and partial or complete retrieval of video. In this paper, we present a size-constrained storyboard generation scheme. Given the shots, i.e. the output of the video segmentation process, the method has two major steps: extraction of appropriate key-frame(s) from each shot and, finally, selection of a specified number of key-frames from the set thus obtained. The set of selected key-frames should retain the variation in visual content originally possessed by the video. The number of key-frames or representative frames in a shot may vary depending on the variation in its visual content; thus, automatic selection of a suitable number of representative frames from a shot remains a challenge. In this work, we propose a novel scheme for detecting sub-shots of consistent visual content within a shot using the Wald–Wolfowitz runs test. From each sub-shot, the frame rendering the highest fidelity is extracted as a key-frame. Finally, a novel spanning-tree-based method is proposed to select a subset of key-frames of a specified cardinality. Chronological arrangement of these frames generates the size-constrained storyboard. Experimental results and a comparative study show that the scheme works satisfactorily for a wide variety of shots. Moreover, the proposed technique rectifies mis-detection errors, if any, incurred in the video segmentation process. Similarly, though not implemented, the proposed hypothesis test can rectify false alarms in shot detection if applied to pairs of adjacent shots.
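A minimal sketch of the Wald–Wolfowitz runs test on a binarized frame-feature sequence. Binarizing inter-frame distances at their median and the 1.96 significance cutoff are illustrative assumptions; the paper's exact test statistic over its feature sequence may differ.

```python
import numpy as np

def runs_test_z(x):
    """Z statistic of the Wald-Wolfowitz runs test on a binary sequence."""
    x = np.asarray(x, dtype=int)
    n1 = int(x.sum())
    n2 = len(x) - n1
    n = n1 + n2
    runs = 1 + int((x[1:] != x[:-1]).sum())
    mu = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    return (runs - mu) / np.sqrt(var)

# Usage: binarize per-frame feature distances at their median and test
# for randomness; |z| > 1.96 suggests non-random structure, i.e. the
# frames may span more than one visually consistent sub-shot.
# z = runs_test_z(dists > np.median(dists))
```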


2020 ◽  
Vol 8 (5) ◽  
pp. 4763-4769

Nowadays, with the progress of digital image technology, the number of video files is growing rapidly, and there is great demand for automatic video semantic analysis in many scenarios, such as video semantic understanding, content-based analysis, and video retrieval. Shot boundary detection is an elementary step in video analysis. However, recent methods are time-consuming and perform poorly on gradual transition detection. In this paper we propose a novel approach for video shot boundary detection (VSBD) using a CNN based on feature extraction. The method has two main steps. First, features are extracted from the H, S, and V channels based on the mean log difference, together with a histogram distribution function. These features are fed to a CNN, which detects shot boundaries based on a probability function. The CNN applies convolution with zero padding followed by rectified linear unit (ReLU) activation; after downsizing, the resulting matrix is passed to a fully connected layer that indicates shot boundaries. Comparing the proposed method with a GPU-based CNN method, the results are encouraging, with substantially high precision, recall, and F1 measures. The method performs moderately well on animated videos and excels on complex videos, as observed in the results.
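A hedged PyTorch sketch of the convolution, zero-padding, ReLU, downsizing, fully-connected pipeline the abstract describes. Layer sizes, the 1-D treatment of the HSV histogram-difference feature, and the sigmoid probability output are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ShotBoundaryCNN(nn.Module):
    """Toy 1-D CNN mapping a per-frame-pair feature vector to a
    boundary probability."""
    def __init__(self, feat_len=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1),  # zero padding
            nn.ReLU(),                                   # ReLU activation
            nn.MaxPool1d(2),                             # downsize
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Flatten(),
            nn.Linear(32 * (feat_len // 4), 1),          # fully connected
            nn.Sigmoid(),                                # boundary probability
        )

    def forward(self, x):  # x: (batch, 1, feat_len)
        return self.net(x)

# Usage: probabilities above 0.5 mark predicted shot boundaries.
model = ShotBoundaryCNN()
probs = model(torch.randn(8, 1, 64))
```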

