Comparative Analysis of Key-frame Extraction Techniques For Video Summarization

Author(s):  
Vishal Parikh ◽  
Jay Mehta ◽  
Saumyaa Shah ◽  
Priyanka Sharma

Background: Technological advancement has improved the quality of human life, but it has also led to the production of enormous amounts of data in the form of text, images and videos. Significant effort is therefore needed to devise methodologies for analyzing and summarizing this data so as to cope with storage constraints. Video summaries can be generated either from keyframes or from skims/shots. Here, keyframe extraction is based on deep-learning object detection techniques: various object detection algorithms are reviewed for generating and selecting the best possible frames as keyframes. A set of frames is extracted from the original video sequence and, depending on the technique used, one or more frames of that set are chosen as keyframes, which then become part of the summarized video. The following paper discusses the selection among various keyframe extraction techniques in detail. Methods: The research is focused on summary generation for office surveillance videos, with the main emphasis on keyframe extraction techniques. Training models such as MobileNet, SSD, and YOLO were used; a comparative analysis of their efficiency showed YOLO giving better performance than the others. Keyframe selection techniques based on sufficient content change, maximum frame coverage, minimum correlation, curve simplification, and clustering based on human presence in the frame have been implemented. Results: Variable- and fixed-length video summaries were generated and analyzed for each keyframe selection technique on office surveillance videos. The analysis shows that the output video obtained with the clustering and curve simplification approaches is compressed to about half the size of the original video and therefore requires considerably less storage space. The technique that selects keyframes based on the change of frame content between consecutive frames produces the best output for office surveillance scenarios. Conclusion: In this paper, we discussed the process of generating a synopsis of a video that highlights the important portions and discards the trivial and redundant parts. First, we described object detection algorithms such as YOLO and SSD, used in conjunction with neural networks such as MobileNet, to obtain the probabilistic score of an object present in the video; these algorithms estimate, for every frame of the input video, the probability of a person being part of the image. The results of object detection are then passed to keyframe extraction algorithms to obtain the summarized video. Our comparative analysis of keyframe selection techniques for office videos helps determine which keyframe selection technique is preferable.
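
A minimal sketch of the "sufficient content change" idea described above, assuming OpenCV and NumPy; the function name and the difference threshold are illustrative choices, not the paper's implementation. A frame is kept as a keyframe when its pixel-level difference from the last selected keyframe exceeds a threshold:

```python
import cv2
import numpy as np

def extract_keyframes(video_path, diff_threshold=30.0):
    """Return indices of frames whose mean absolute difference from the
    previously selected keyframe exceeds diff_threshold (illustrative only)."""
    cap = cv2.VideoCapture(video_path)
    keyframes = []
    last_key = None  # grayscale version of the last selected keyframe
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if last_key is None or np.mean(cv2.absdiff(gray, last_key)) > diff_threshold:
            keyframes.append(idx)
            last_key = gray
        idx += 1
    cap.release()
    return keyframes

# Example usage (hypothetical file name):
# keyframes = extract_keyframes("office_surveillance.mp4")
```

In the paper's pipeline, the per-frame object detection scores (e.g., person presence from YOLO) would be combined with a selection rule like the one sketched here; the threshold-based rule above is only one of the techniques compared.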

Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 279
Author(s):  
Rafael Padilla ◽  
Wesley L. Passos ◽  
Thadeu L. B. Dias ◽  
Sergio L. Netto ◽  
Eduardo A. B. da Silva

Recent outstanding results of supervised object detection in competitions and challenges are often associated with specific metrics and datasets. The evaluation of such methods applied in different contexts has increased the demand for annotated datasets. Annotation tools represent the location and size of objects in distinct formats, leading to a lack of consensus on the representation. Such a scenario often complicates the comparison of object detection methods. This work alleviates this problem along the following lines: (i) it provides an overview of the most relevant evaluation methods used in object detection competitions, highlighting their peculiarities, differences, and advantages; (ii) it examines the most used annotation formats, showing how different implementations may influence the assessment results; and (iii) it provides a novel open-source toolkit supporting different annotation formats and 15 performance metrics, making it easy for researchers to evaluate the performance of their detection algorithms on most known datasets. In addition, this work proposes a new metric, also included in the toolkit, for evaluating object detection in videos that is based on the spatio-temporal overlap between the ground-truth and detected bounding boxes.
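
For readers unfamiliar with the building block behind such a metric, the sketch below shows only the standard per-frame intersection-over-union (IoU) between two boxes and a simple per-track average; the exact spatio-temporal metric is defined in the paper, and the aggregation here is an assumption for illustration:

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes in (x_min, y_min, x_max, y_max) format."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def mean_temporal_iou(gt_track, det_track):
    """Average IoU over frames where both a ground-truth and a detected box exist
    (a simplified stand-in for a spatio-temporal overlap score)."""
    overlaps = [iou(g, d) for g, d in zip(gt_track, det_track) if g and d]
    return sum(overlaps) / len(overlaps) if overlaps else 0.0
```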


Author(s):  
Rohan Kanotra ◽  
Akash Analyst ◽  
Neelendu Wadhwa ◽  
N. Jeyanthi*

COVID-19 has confronted mankind with unprecedented and unbelievable times, with millions of people affected by the disease. Multiple countries have started vaccinating their populations in the hope that this will end the pandemic. Given the inequitable access to vaccines across the world and the highly mutating coronavirus, it remains to be seen when everyone will get access to vaccines and how effective the vaccines will prove against the virus variants. Therefore, standard COVID behaviour is here to stay for some time. Wearing face masks is one such practice that greatly reduces the risk of getting infected, and employing public face mask detection systems has helped multiple countries bring the pandemic under control. In this paper, we present a quantitative analysis of different object detection algorithms, namely ResNet, MobileNetV2 and CNN, for face mask detection on accuracy and recall, using an unbiased, large and diverse dataset, in order to determine which algorithm can be applied on a mass scale.
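
A hedged sketch of the accuracy/recall comparison described above, using scikit-learn metrics. It assumes each candidate model exposes a predict() method returning 0 (no mask) / 1 (mask); the evaluate() helper and the model names in the usage line are illustrative, not taken from the paper:

```python
from sklearn.metrics import accuracy_score, recall_score

def evaluate(models, x_test, y_test):
    """Report accuracy and recall for each face-mask detector on the same test set."""
    for name, model in models.items():
        y_pred = model.predict(x_test)  # assumed to return hard 0/1 labels
        print(f"{name}: accuracy={accuracy_score(y_test, y_pred):.3f}, "
              f"recall={recall_score(y_test, y_pred):.3f}")

# Example (hypothetical model objects):
# evaluate({"ResNet": resnet, "MobileNetV2": mobilenet, "CNN": cnn}, x_test, y_test)
```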


Author(s):  
Carolina Toledo Ferraz ◽  
William Barcellos ◽  
Osmando Pereira Junior ◽  
Tamiris Trevisan Negri Borges ◽  
Marcelo Garcia Manzato ◽  
...  

Author(s):  
Samuel Humphries ◽  
Trevor Parker ◽  
Bryan Jonas ◽  
Bryan Adams ◽  
Nicholas J Clark

Quick identification of buildings and roads is critical for the execution of tactical US military operations in an urban environment. To this end, a gridded, referenced satellite image of an objective, often referred to as a gridded reference graphic or GRG, has become a standard product developed during intelligence preparation of the environment. At present, operational units identify key infrastructure by hand through the work of individual intelligence officers. Recent advances in Convolutional Neural Networks, however, allow this process to be streamlined through the use of object detection algorithms. In this paper, we describe an object detection algorithm designed to quickly identify and label both buildings and road intersections present in an image. Our work leverages both the U-Net architecture and the SpaceNet data corpus to produce an algorithm that accurately identifies a large breadth of buildings and different types of roads. In addition to predicting buildings and roads, our model numerically labels each building by means of a contour-finding algorithm. Most importantly, the dual U-Net model is capable of predicting buildings and roads on a diverse set of test images and of using these predictions to produce clean GRGs.
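
A minimal sketch of the contour-finding step mentioned above, assuming OpenCV 4 and a binary building mask produced by the segmentation model; the function name and the minimum-area filter are illustrative assumptions, not the authors' code:

```python
import cv2

def label_buildings(mask, image, min_area=50):
    """Draw a sequential number at the centroid of each building contour in the mask."""
    contours, _ = cv2.findContours(mask.astype("uint8"), cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    label = 1
    for cnt in contours:
        if cv2.contourArea(cnt) < min_area:
            continue  # skip small segmentation artifacts (assumed threshold)
        m = cv2.moments(cnt)
        if m["m00"] == 0:
            continue
        cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
        cv2.putText(image, str(label), (cx, cy),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
        label += 1
    return image
```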

