Video Summaries
Recently Published Documents


TOTAL DOCUMENTS: 74 (five years: 11)
H-INDEX: 10 (five years: 2)

2021 ◽  
pp. 648-654
Author(s):  
Rubel Biswas ◽  
Deisy Chaves ◽  
Laura Fernández-Robles ◽  
Eduardo Fidalgo ◽  
Enrique Alegre

Identifying key content in a video is essential for many security applications, such as motion/action detection, person re-identification, and recognition. Moreover, summarizing the key information in Child Sexual Exploitation Materials, especially videos, which mainly contain distinctive scenes including people’s faces, is crucial to speed up investigations by Law Enforcement Agencies. In this paper, we present a video summarization strategy that combines perceptual hashing and face detection algorithms to keep the most relevant frames of a video containing people’s faces that may correspond to victims or offenders. Due to legal constraints on access to Child Sexual Abuse datasets, we evaluated the performance of the proposed strategy on the detection of adult pornographic content with the NDPI-800 dataset. We also assessed the capability of our strategy to create video summaries that preserve frames with distinctive faces from the original video, using ten additional manually labeled short videos. Results show that our approach can detect pornographic content with an accuracy of 84.15% at a speed of 8.05 ms/frame, making it suitable for real-time applications.
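The perceptual-hashing half of a strategy like this can be sketched as follows. This is a minimal, illustrative example with hypothetical helper names; it uses a simple 64-bit average hash over pre-downscaled 8x8 frames and keeps a frame only when its hash diverges from the last kept frame (the paper's actual pipeline additionally runs face detection on candidate frames).

```python
def average_hash(frame):
    """Compute a 64-bit average hash from an 8x8 grayscale frame.

    `frame` is an 8x8 list of lists of pixel intensities (0-255);
    real systems downscale full video frames to this size first.
    """
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    # One bit per pixel: set if the pixel is brighter than the mean.
    return sum(1 << i for i, p in enumerate(pixels) if p > mean)

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")

def select_keyframes(frames, threshold=10):
    """Keep a frame only if its hash differs enough from the last keyframe."""
    keyframes, last_hash = [], None
    for idx, frame in enumerate(frames):
        h = average_hash(frame)
        if last_hash is None or hamming(h, last_hash) > threshold:
            keyframes.append(idx)
            last_hash = h
    return keyframes
```

The Hamming threshold trades summary length against redundancy: a higher value discards more near-duplicate frames.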


Author(s):  
Yongbiao Gao ◽  
Ning Xu ◽  
Xin Geng

Reinforcement learning maps a perceived state representation to actions and has been adopted to solve the video summarization problem. The reward is crucial when tackling video summarization via reinforcement learning, since the reward signal defines the goal of summarization. However, existing reward mechanisms in reinforcement learning cannot handle the ambiguity that appears frequently in video summarization, i.e., the divergent perceptions different people have of the same video. To solve this problem, in this paper label distributions are mapped from the CNN- and LSTM-based state representation to capture the subjectiveness of video summaries. A dual reward is designed by measuring the similarity between the user score distributions and the generated label distributions, so that not only the average score but also the variance of the subjective opinions is considered during summary generation. Experimental results on several benchmark datasets show that our proposed method outperforms other approaches under various settings.
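One loose way to read the dual-reward idea is a reward with one term for agreement on the mean score and one for agreement on the variance of the score distribution. The sketch below is an illustrative interpretation, not the paper's exact formula; the weights and the exponential similarity are assumptions.

```python
import math

def dual_reward(user_dist, pred_dist, w_mean=0.5, w_var=0.5):
    """Toy dual reward over discrete score levels (1..N).

    `user_dist` and `pred_dist` are probability vectors over the
    score levels. Rewards agreement on both the mean score and the
    variance; weights and similarity kernel are illustrative.
    """
    levels = range(1, len(user_dist) + 1)

    def mean(d):
        return sum(level * p for level, p in zip(levels, d))

    def var(d):
        m = mean(d)
        return sum(p * (level - m) ** 2 for level, p in zip(levels, d))

    # Similarity decays exponentially with the mean / variance gap.
    mean_sim = math.exp(-abs(mean(user_dist) - mean(pred_dist)))
    var_sim = math.exp(-abs(var(user_dist) - var(pred_dist)))
    return w_mean * mean_sim + w_var * var_sim
```

Identical distributions yield the maximum reward of 1.0, and two distributions with the same variance but different means are still penalized through the mean term, which captures the "not only the average score but also the variance" intent.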


2021 ◽  
Vol 11 (11) ◽  
pp. 5260
Author(s):  
Theodoros Psallidas ◽  
Panagiotis Koromilas ◽  
Theodoros Giannakopoulos ◽  
Evaggelos Spyrou

The exponential growth of user-generated content has increased the need for efficient video summarization schemes. However, most approaches underestimate the power of aural features and are designed to work mainly on commercial/professional videos. In this work, we present an approach that uses both aural and visual features to create video summaries from user-generated videos. Our approach produces dynamic video summaries, i.e., summaries comprising the most “important” parts of the original video, arranged so as to preserve their temporal order. We use supervised knowledge from both of the aforementioned modalities to train a binary classifier that learns to recognize the important parts of videos. Moreover, we present a novel user-generated dataset containing videos from several categories; every one-second segment of each video has been annotated by more than three annotators as important or not. We evaluate our approach using several classification strategies based on audio, video, and fused features. Our experimental results illustrate the potential of our approach.
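Once a classifier has scored each one-second segment, building a dynamic summary that preserves temporal order reduces to selecting the segments above a threshold and merging consecutive ones into spans. A minimal sketch, assuming per-segment importance scores in [0, 1] and an illustrative threshold:

```python
def build_summary(segment_scores, threshold=0.5):
    """Return (start, end) spans of consecutive important one-second
    segments, in original temporal order.

    `segment_scores` holds one importance score per segment; the
    threshold and span merging are illustrative choices, not the
    paper's exact procedure.
    """
    spans, start = [], None
    for i, score in enumerate(segment_scores):
        if score >= threshold and start is None:
            start = i                      # span opens here
        elif score < threshold and start is not None:
            spans.append((start, i))       # span closes before frame i
            start = None
    if start is not None:                  # close a span running to the end
        spans.append((start, len(segment_scores)))
    return spans
```

Because the input is scanned left to right and spans are appended in order, the output summary automatically preserves the temporal order of the original video.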


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1035
Author(s):  
Hugo Meyer ◽  
Peter Wei ◽  
Xiaofan Jiang

In this paper, we present HOMER, a cloud-based system for video highlight generation that enables automated, relevant, and flexible segmentation of videos. Our system outperforms state-of-the-art solutions by fusing internal, video-content-based features with the user’s emotion data. While current research mainly focuses on creating video summaries without the use of affective data, our solution achieves the subjective task of detecting highlights by leveraging human emotions. In two separate experiments, one on videos filmed with a dual-camera setup and one on home videos randomly picked from Microsoft’s Video Titles in the Wild (VTW) dataset, HOMER demonstrates an improvement of up to 38% in F1-score over the baseline, while not requiring any external hardware. We demonstrate both the portability and scalability of HOMER through the implementation of two smartphone applications.
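The fusion of content-based features with emotion data can be sketched as a simple late fusion of two per-segment score streams. The weighted-sum form and the `alpha` weight below are assumptions for illustration; HOMER's actual fusion mechanism may differ.

```python
def fuse_highlight_scores(content_scores, emotion_scores, alpha=0.6):
    """Late-fusion sketch: weighted sum of per-segment content-based
    and emotion-based scores. `alpha` is an assumed fusion weight."""
    return [alpha * c + (1 - alpha) * e
            for c, e in zip(content_scores, emotion_scores)]

def top_highlight(scores):
    """Index of the highest-scoring segment."""
    return max(range(len(scores)), key=scores.__getitem__)
```

With this formulation, a segment that is unremarkable in content but emotionally salient (or vice versa) can still surface as the highlight, which is the point of combining the two modalities.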


Author(s):  
Boyuan Tang ◽  
Weiting Chen

With the rapid growth of online video, it is crucial to generate overviews of videos that help audiences make viewing decisions and save time. Video summarization and video captioning are two of the most common solutions. In this paper, we propose a new solution: a series of scene-person pairs generated from our proposed video description scheme. This new format takes substantially less time to consume than watching video summaries and is more acceptable to audiences than video captions. In addition, our method generalizes to different types of videos. We also propose a face clustering method and a scene detection method. The experimental results indicate that our methods outperform other state-of-the-art methods and are highly generalizable. As an example, a demo application was developed to demonstrate the proposed description scheme.
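The scene-person pairing can be sketched in two steps: cut the video into scenes, then attach to each scene the person identities seen in it. The sketch below is a hypothetical simplification: it detects scene boundaries with a basic histogram-difference threshold (a stand-in for the paper's scene detection method) and assumes per-frame person-cluster IDs are already available from face clustering.

```python
def detect_scene_boundaries(hist_seq, threshold=0.4):
    """Mark a scene boundary where the L1 distance between consecutive
    frame color histograms exceeds a threshold (illustrative method)."""
    boundaries = [0]
    for i in range(1, len(hist_seq)):
        d = sum(abs(a - b) for a, b in zip(hist_seq[i - 1], hist_seq[i]))
        if d > threshold:
            boundaries.append(i)
    return boundaries

def scene_person_pairs(boundaries, n_frames, frame_person_ids):
    """Pair each scene's start frame with the sorted set of person-cluster
    IDs appearing anywhere inside that scene."""
    edges = boundaries + [n_frames]
    return [(start, sorted({pid
                            for f in range(edges[k], edges[k + 1])
                            for pid in frame_person_ids[f]}))
            for k, start in enumerate(boundaries)]
```

The output, one (scene, people) pair per scene, is the kind of compact overview the description scheme targets: faster to scan than a skim and more structured than free-text captions.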


Author(s):  
Vishal Parikh ◽  
Jay Mehta ◽  
Saumyaa Shah ◽  
Priyanka Sharma

Background: With technological advancement, the quality of human life has improved, but large amounts of data are also being produced, in the form of text, images, and videos. Hence there is a need for methodologies that analyze and summarize this data to cope with storage constraints. Video summaries can be generated either from keyframes or from skims/shots. Keyframe extraction is performed using deep-learning-based object detection techniques, and various object detection algorithms have been reviewed for generating and selecting the best possible frames as keyframes. A set of frames is extracted from the original video sequence and, depending on the technique used, one or more frames of the set are chosen as keyframes, which then become part of the summarized video. This paper discusses the selection of various keyframe extraction techniques in detail. Methods: The research focuses on summary generation for office surveillance videos, with an emphasis on keyframe extraction techniques. Detection models such as MobileNet, SSD, and YOLO were used; a comparative analysis of their efficiency showed YOLO performing better than the others. Keyframe selection techniques based on sufficient content change, maximum frame coverage, minimum correlation, curve simplification, and clustering on human presence in the frame have been implemented. Results: Variable- and fixed-length video summaries were generated and analyzed for each keyframe selection technique on office surveillance videos. The analysis shows that the output video obtained with the clustering and curve simplification approaches is compressed to half the length of the actual video while requiring considerably less storage space. The technique based on the change of frame content between consecutive frames produces the best output for office surveillance scenarios. Conclusion: In this paper, we discussed the process of generating a synopsis of a video that highlights the important portions and discards the trivial and redundant parts. First, we described various object detection algorithms, such as YOLO and SSD, used in conjunction with networks like MobileNet to obtain the probabilistic score of an object present in the video; for every frame of the input video, these algorithms generate the probability that a person is part of the image. The results of object detection are then passed to keyframe extraction algorithms to obtain the summarized video. Our comparative analysis of keyframe selection techniques for office videos helps determine which technique is preferable.
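The sufficient-content-change technique, reported above as the best performer for office surveillance, can be sketched in both its variable- and fixed-length forms. The difference metric (mean absolute pixel difference between consecutive frames) and the threshold value are illustrative assumptions; frames are flattened intensity vectors for simplicity.

```python
def content_change_keyframes(frames, threshold=20.0):
    """Variable-length selection: keep frame i when its mean absolute
    pixel difference from frame i-1 exceeds a threshold.

    `frames` is a list of equal-length flat pixel-intensity lists;
    the metric and threshold are illustrative, not the paper's exact
    choices. Frame 0 is always kept as the starting keyframe.
    """
    keep = [0]
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i], frames[i - 1]))
        if diff / len(frames[i]) > threshold:
            keep.append(i)
    return keep

def fixed_length_keyframes(frames, k):
    """Fixed-length selection: frame 0 plus the k-1 frames with the
    largest change from their predecessor, in temporal order."""
    diffs = [(sum(abs(a - b) for a, b in zip(frames[i], frames[i - 1])), i)
             for i in range(1, len(frames))]
    chosen = sorted(i for _, i in sorted(diffs, reverse=True)[:k - 1])
    return [0] + chosen
```

The variable-length form lets the summary grow with the amount of activity in the scene, while the fixed-length form guarantees a predictable summary size, which matches the two summary types evaluated in the paper.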


Author(s):  
Huma Qayyum ◽  
Muhammad Majid ◽  
Ehatisham ul Haq ◽  
Syed Muhammad Anwar
