An Efficient Method for Video Shot Transition Detection Using Probability Binary Weight Approach

Author(s):  
Nandini H. M. ◽  
Chethan H. K. ◽  
Rashmi B. S.

Shot boundary detection in videos is one of the most fundamental tasks in content-based video retrieval and analysis. In this context, an efficient approach to detect abrupt and gradual transitions in videos is presented. The proposed method detects shot boundaries by extracting a block-based mean probability binary weight (MPBW) histogram from normalized Kirsch magnitude frames as an amalgamation of local and global features. Abrupt transitions are detected by computing the distance between consecutive MPBW histograms and applying an adaptive threshold. In the subsequent step, the coefficient of mean deviation and variance, a statistical measure, is applied to the MPBW histograms to detect gradual transitions. Experiments were conducted on the TRECVID 2001 and 2007 datasets to analyse and validate the proposed method. Experimental results show significant improvement of the proposed SBD approach over some state-of-the-art algorithms in terms of recall, precision, and F1-score.
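The abrupt-transition step described above can be sketched in a few lines. This is an illustration only: the MPBW histogram extraction itself is not reproduced, and both the city-block distance and the μ + ασ form of the adaptive threshold are assumptions, since the abstract does not give the exact formulas.

```python
def hist_distance(h1, h2):
    """City-block (L1) distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_abrupt(histograms, alpha=1.0):
    """Flag frame indices whose distance to the previous frame's
    histogram exceeds an adaptive threshold mu + alpha * sigma,
    where mu and sigma are computed over all consecutive distances."""
    dists = [hist_distance(histograms[i], histograms[i + 1])
             for i in range(len(histograms) - 1)]
    mu = sum(dists) / len(dists)
    sigma = (sum((d - mu) ** 2 for d in dists) / len(dists)) ** 0.5
    thresh = mu + alpha * sigma
    # distance i separates frame i from frame i + 1
    return [i + 1 for i, d in enumerate(dists) if d > thresh]
```

Because the threshold adapts to the statistics of the whole sequence, a single large histogram jump stands out even when the baseline inter-frame variation differs from video to video.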

2020 ◽  
Vol 8 (5) ◽  
pp. 4763-4769

Nowadays, with the progress of digital image technology, the number of video files is growing rapidly, and there is great demand for automatic video semantic analysis in many settings, such as video semantic understanding, content-based analysis, and video retrieval. Shot boundary detection is an elementary step in video analysis. However, recent methods are time-consuming and perform poorly on gradual transition detection. In this paper we propose a novel approach for automatic video shot boundary detection (VSBD) using a CNN based on feature extraction, implemented in two steps. First, features are extracted from the H, V and S channels using the mean log difference together with a histogram distribution function. These features are given as input to a CNN that detects shots based on a probability function. The CNN is implemented with convolution and rectified linear unit (ReLU) activations, applied after filtering and zero padding. After downsizing, the resulting matrix is passed to a fully connected layer that indicates shot boundaries. Comparing the proposed method with a GPU-based CNN method, the results are encouraging, with substantially high precision, recall and F1 measures. The CNN method performs moderately better for animated videos, while it excels for complex videos, as observed in the results.
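The per-channel "mean log difference" feature mentioned above can be sketched as follows. The exact formulation is an assumption (the abstract does not define it); here each frame is a list of per-pixel (H, V, S) triples and the feature compresses large inter-frame differences with a logarithm:

```python
import math

def mean_log_difference(frame_a, frame_b):
    """Return one feature per channel: the mean of log(1 + |a - b|)
    over all pixels, which damps the influence of outlier pixels."""
    feats = []
    for c in range(3):  # H, V, S channels
        diffs = [abs(pa[c] - pb[c]) for pa, pb in zip(frame_a, frame_b)]
        feats.append(sum(math.log1p(d) for d in diffs) / len(diffs))
    return feats
```

A feature vector like this, computed for every consecutive frame pair, is the kind of low-dimensional input a small CNN can classify into "boundary" versus "non-boundary".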


2011 ◽  
Vol 10 (03) ◽  
pp. 247-259 ◽  
Author(s):  
Dianting Liu ◽  
Mei-Ling Shyu ◽  
Chao Chen ◽  
Shu-Ching Chen

As a consequence of the popularity of family video recorders and the surge of Web 2.0, the growing amount of video has made managing and integrating the information in videos an urgent and important issue in video retrieval. Key frames, as a high-quality summary of videos, play an important role in video browsing, searching, categorisation, and indexing. An effective set of key frames should include the major objects and events of the video sequence and contain minimal content redundancy. In this paper, an innovative key frame extraction method is proposed to select representative key frames for a video. By analysing the differences between frames and utilising a clustering technique, a set of key frame candidates (KFCs) is first selected at the shot level, and then the information within a video shot and between video shots is used to filter the candidate set and generate the final set of key frames. Experimental results on the TRECVID 2007 video dataset have demonstrated the effectiveness of our proposed key frame extraction method in terms of the percentage of extracted key frames and the retrieval precision.
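The shot-level candidate selection (frame differences plus clustering) can be sketched as below. The greedy, in-order grouping and the choice of the middle frame as each cluster's representative are assumptions; the abstract names the clustering step but not the specific algorithm.

```python
def extract_key_frame_candidates(frames, dist, threshold):
    """Greedy shot-level clustering sketch: frames are scanned in
    order and grouped while their distance to the current cluster's
    first frame stays below `threshold`; the middle frame of each
    cluster is kept as a key-frame candidate (KFC)."""
    clusters, current = [], [0]
    for i in range(1, len(frames)):
        if dist(frames[current[0]], frames[i]) < threshold:
            current.append(i)
        else:
            clusters.append(current)
            current = [i]
    clusters.append(current)
    # one representative (middle index) per cluster
    return [c[len(c) // 2] for c in clusters]
```

In the full method the candidate set would then be filtered using intra-shot and inter-shot information, which this sketch omits.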


2021 ◽  
Author(s):  
Tahir Amin

In this study we present a new approach to feature extraction for image and video retrieval. A Laplacian mixture model is proposed to model the peaky distributions of wavelet coefficients. The proposed method extracts a low-dimensional feature vector, which is very important for the retrieval efficiency of the system in terms of response time. Although the importance of an effective feature set cannot be overemphasized, it is very hard to describe image similarity with only low-level features. Learning from user feedback may enhance system performance significantly. This approach, known as relevance feedback, is adopted to further improve the efficiency of the system. The system learns from user input in the form of positive and negative examples, and its parameters are modified by the user's behaviour. The parameters of the Laplacian mixture model are used to represent the texture information of the images. The experimental evaluation indicates the high discriminatory power of the proposed features. Traditional measures of distance between two vectors, such as city-block or Euclidean, are linear in nature, but the human visual system does not follow this simple linear model. Therefore, a non-linear approach to the distance measure defining the similarity between two images is also explored in this work. It is observed that non-linear modelling of similarity yields more satisfactory performance and increases retrieval performance by 7.5 per cent. Video is primarily multi-modal, i.e., it contains different media components such as audio, speech, visual information (frames) and captions (text). Traditionally, visual information is used for video indexing and retrieval. The visual contents of videos are very important; however, in some cases visual information is not very helpful for finding clues to events.
For example, certain action sequences, such as goal events in a soccer game or an explosion in a news video, are easier to identify in the audio domain than in the visual domain. Since the proposed feature extraction scheme is based on the shape of the wavelet coefficient distribution, it can also be applied to analyze the embedded audio content of the video. We use audio information for indexing video clips. A feedback mechanism is also studied to improve the performance of the system.
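Fitting a Laplacian mixture to a set of wavelet coefficients is typically done with EM. The following is a minimal sketch for a two-component, zero-mean mixture, where the fitted (w, b1, b2) would serve as the texture feature vector; the two-component choice, the initial values, and the iteration count are all illustrative assumptions, not the paper's exact procedure.

```python
import math

def laplacian_mixture_em(coeffs, b1=0.5, b2=5.0, w=0.5, iters=50):
    """EM sketch for p(x) = w * Lap(x; b1) + (1 - w) * Lap(x; b2),
    where Lap(x; b) = exp(-|x| / b) / (2 * b) is a zero-mean
    Laplacian density with scale b."""
    lap = lambda x, b: math.exp(-abs(x) / b) / (2 * b)
    for _ in range(iters):
        # E-step: responsibility of component 1 for each coefficient
        r = [w * lap(x, b1) / (w * lap(x, b1) + (1 - w) * lap(x, b2))
             for x in coeffs]
        # M-step: update the mixing weight and the two scales
        w = sum(r) / len(r)
        b1 = sum(ri * abs(x) for ri, x in zip(r, coeffs)) / max(sum(r), 1e-12)
        b2 = (sum((1 - ri) * abs(x) for ri, x in zip(r, coeffs))
              / max(len(r) - sum(r), 1e-12))
    return w, b1, b2
```

The appeal of this representation is its compactness: three parameters per wavelet subband instead of a full histogram, which is what keeps the feature vector low-dimensional.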


2020 ◽  
Vol 10 (10) ◽  
pp. 2490-2500
Author(s):  
Sadaf Zahid Mahmood ◽  
Humaira Afzal ◽  
Muhammad Rafiq Mufti ◽  
Nadeem Akhtar ◽  
Asad Habib ◽  
...  

The demand for accurate and visually faithful images is increasing with the passage of time and the explosion in the number of digital images, especially in the domain of medical and healthcare systems. The visual quality of modern camera images is affected by noise around edges, textures and sharp structures. The research community has introduced several denoising techniques, such as BM3D (Block Matching and 3D filtering). However, edge and texture preservation remain issues due to hard threshold values and the diversity of captured images. To address these issues, we propose a new variant of BM3D, namely BM3DMA (Block Matching and 3D with Mahalanobis and Adaptive filter), which employs the Mahalanobis distance measure (to cover image diversity) and an adaptive filter (for soft thresholds). We used two widely known datasets consisting of standard and medical images. We observe a 5% to 10% improvement of BM3DMA over BM3D in terms of PSNR (Peak Signal to Noise Ratio). These promising experimental results indicate the effectiveness of BM3DMA in preserving edges and texture while removing noise.
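The Mahalanobis-based block-matching stage can be sketched as below. For simplicity this uses a diagonal covariance approximation (per-pixel variances) rather than a full covariance matrix, which is an assumption; the abstract does not specify how the covariance is estimated.

```python
def mahalanobis_block_distance(block_a, block_b, variances):
    """Mahalanobis-style distance between two flattened image blocks
    under a diagonal covariance: sqrt(sum((a - b)^2 / var))."""
    return sum((a - b) ** 2 / v
               for a, b, v in zip(block_a, block_b, variances)) ** 0.5

def match_blocks(ref, candidates, variances, max_dist):
    """Collect candidate blocks close enough to the reference block,
    as in the grouping stage of BM3D-style denoising; the matched
    group would then be stacked and filtered jointly in 3D."""
    return [i for i, c in enumerate(candidates)
            if mahalanobis_block_distance(ref, c, variances) <= max_dist]
```

Weighting each pixel difference by its variance is what lets the matching adapt to image diversity: differences in high-variance (textured) regions are penalized less than the same differences in flat regions.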


2011 ◽  
Vol 2011 ◽  
pp. 1-7 ◽  
Author(s):  
Xian-Hua Han ◽  
Yen-Wei Chen

We describe an approach for automatic modality classification in the medical image retrieval task of the 2010 CLEF cross-language image retrieval campaign (ImageCLEF). This paper focuses on feature extraction from medical images and fuses the extracted visual features with a textual feature for modality classification. To extract visual features from the images, we used histogram descriptors of edge, gray, or color intensity and block-based variation as global features, and a SIFT histogram as a local feature. As the textual feature of the image representation, a binary histogram of predefined vocabulary words from the image captions is used. We then combine the different features using normalized kernel functions for SVM classification. Furthermore, for some easily misclassified modality pairs, such as CT and MR or PET and NM, a local classifier is used to distinguish samples within the pair and improve performance. The proposed strategy is evaluated on the modality dataset provided by ImageCLEF 2010.
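The kernel-fusion step can be sketched as follows: each feature's Gram matrix is normalized so its diagonal is 1, and the normalized matrices are combined as a weighted sum that can be fed to an SVM with a precomputed kernel. The equal-weight combination shown in the test is an assumption; the paper does not state its weighting scheme here.

```python
def normalize_kernel(K):
    """Normalize a Gram matrix so that K'[i][i] == 1:
    K'[i][j] = K[i][j] / sqrt(K[i][i] * K[j][j])."""
    n = len(K)
    return [[K[i][j] / (K[i][i] * K[j][j]) ** 0.5 for j in range(n)]
            for i in range(n)]

def fuse_kernels(kernels, weights):
    """Weighted sum of normalized per-feature Gram matrices; the
    result is again a valid kernel (a conic combination of kernels)."""
    n = len(kernels[0])
    fused = [[0.0] * n for _ in range(n)]
    for K, w in zip(kernels, weights):
        Kn = normalize_kernel(K)
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * Kn[i][j]
    return fused
```

Normalizing before summing keeps any one feature's kernel from dominating the fusion purely because its raw values are on a larger scale.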


Author(s):  
LIANG-HUA CHEN ◽  
KUO-HAO CHIN ◽  
HONG-YUAN MARK LIAO

The usefulness of a video database depends on whether the video of interest can be easily located. In this paper, we propose a video retrieval algorithm based on the integration of several visual cues. In contrast to key-frame-based shot representation, our approach analyzes all frames within a shot to construct a compact representation of the video shot. In the video matching step, by integrating color and motion features, a similarity measure is defined to locate occurrences of similar video clips in the database. Our approach is therefore able to fully exploit the spatio-temporal information contained in video. Experimental results indicate that the proposed approach is effective and outperforms some existing techniques.
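One simple way to integrate a color cue and a motion cue into a single similarity measure is a convex combination, sketched below. The combination form, the weight `alpha`, and the retrieval threshold are all assumptions for illustration; the paper's actual measure is not given in the abstract.

```python
def clip_similarity(color_sim, motion_sim, alpha=0.6):
    """Convex combination of color and motion similarity scores in
    [0, 1]; alpha weights the color cue."""
    return alpha * color_sim + (1 - alpha) * motion_sim

def locate_similar_clips(score_pairs, threshold=0.8):
    """Return indices of database clips whose fused similarity
    (color, motion) to the query exceeds the threshold."""
    return [i for i, (c, m) in enumerate(score_pairs)
            if clip_similarity(c, m) > threshold]
```

A fused score like this rewards clips that agree with the query on both cues, which is how spatial (color) and temporal (motion) information jointly drive the match.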

