Multimodal Summarization of User-Generated Videos

2021 ◽  
Vol 11 (11) ◽  
pp. 5260
Author(s):  
Theodoros Psallidas ◽  
Panagiotis Koromilas ◽  
Theodoros Giannakopoulos ◽  
Evaggelos Spyrou

The exponential growth of user-generated content has increased the need for efficient video summarization schemes. However, most approaches underestimate the power of aural features and are designed to work mainly on commercial/professional videos. In this work, we present an approach that uses both aural and visual features in order to create video summaries from user-generated videos. Our approach produces dynamic video summaries, that is, summaries comprising the most “important” parts of the original video, arranged so as to preserve their temporal order. We use supervised knowledge from both of the aforementioned modalities and train a binary classifier, which learns to recognize the important parts of videos. Moreover, we present a novel user-generated dataset that contains videos from several categories. Every 1-second segment of each video in our dataset has been annotated by more than three annotators as being important or not. We evaluate our approach using several classification strategies based on audio, video and fused features. Our experimental results illustrate the potential of our approach.
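As a rough illustration of the segment-level classification idea, the sketch below fuses per-segment audio and visual descriptors by concatenation and trains a binary "importance" classifier with scikit-learn. The feature dimensions, the random placeholder data, and the SVM choice are assumptions for illustration, not the authors' exact pipeline.

```python
# Hedged sketch: early fusion of audio and visual segment features with a
# binary "importance" classifier. Feature extraction is abstracted away; the
# arrays below are placeholders for per-1-second-segment descriptors.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_segments = 1000
audio_feats = rng.normal(size=(n_segments, 34))    # e.g. MFCCs, energy, ZCR (assumed)
visual_feats = rng.normal(size=(n_segments, 128))  # e.g. CNN frame embeddings (assumed)
labels = rng.integers(0, 2, size=n_segments)       # annotator majority vote (placeholder)

# Early fusion: concatenate the two modalities per segment.
fused = np.hstack([audio_feats, visual_feats])

X_tr, X_te, y_tr, y_te = train_test_split(fused, labels, test_size=0.2, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
print("F1 on held-out segments:", f1_score(y_te, clf.predict(X_te)))

# A dynamic summary would then keep the segments predicted "important",
# concatenated in their original temporal order.
```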

Entropy ◽  
2018 ◽  
Vol 20 (10) ◽  
pp. 748 ◽  
Author(s):  
Shazia Saqib ◽  
Syed Kazmi

Multimedia information requires large repositories of audio-video data. Retrieval and delivery of video content is a very time-consuming process and a great challenge for researchers. Video summarization is an efficient approach for faster browsing of large video collections and for more efficient content indexing and access. Compression of data through the extraction of keyframes is one solution to these challenges. A keyframe is a representative frame that captures the salient features of the video, and the output frames must represent the original video in temporal order. The proposed research presents a method of keyframe extraction using the mean of k consecutive frames of video data. A sliding window of size k/2 is employed to select the frame that matches the median entropy value of the sliding window. This is called the Median of Entropy of Mean Frames (MME) method: mean-based keyframe selection using the median entropy of the sliding window. The method was tested on more than 500 videos of sign language gestures and showed satisfactory results.
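A minimal sketch of the mean/entropy selection idea described above, assuming grayscale frames are already loaded (e.g., with OpenCV): mean frames are formed over blocks of k consecutive frames, a histogram entropy is computed per mean frame, and within each window of size k/2 the frame closest to the window's median entropy is kept. The block/window handling here is a simplified reading of the method, not the authors' reference implementation.

```python
# Hedged sketch loosely following the MME idea: average consecutive blocks of
# k frames, compute each mean frame's entropy, then step a window of size k//2
# over those entropies and keep the frame nearest the window's median entropy.
# `frames` is assumed to be a list of 2-D grayscale numpy arrays.
import numpy as np

def frame_entropy(frame, bins=256):
    """Shannon entropy of a grayscale frame's intensity histogram."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mme_keyframes(frames, k=10):
    # Mean frame of every consecutive block of k frames.
    means = [np.mean(frames[i:i + k], axis=0)
             for i in range(0, len(frames) - k + 1, k)]
    entropies = np.array([frame_entropy(m) for m in means])

    keyframes = []
    w = max(k // 2, 1)
    for start in range(0, len(means), w):
        window = entropies[start:start + w]
        # Index (within the window) of the entropy closest to the median.
        pick = start + int(np.argmin(np.abs(window - np.median(window))))
        keyframes.append(means[pick])
    return keyframes
```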


Author(s):  
D. Minola Davids ◽  
C. Seldev Christopher

The visual data obtained from single-camera or multi-view surveillance camera networks is increasing exponentially every day. Identifying the important shots that faithfully represent the original video is the major task in video summarization. To perform efficient video summarization for surveillance systems, an optimization algorithm, LFOB-COA, is proposed in this paper. The proposed method consists of five steps: data collection, pre-processing, deep feature extraction (FE), shot segmentation using JSFCM, and classification using a Rectified Linear Unit-activated BLSTM with LFOB-COA. Finally, a post-processing step is applied. To demonstrate the proposed method's effectiveness, the results are compared with existing methods.
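Of the five steps, only the classification stage is concrete enough to sketch here: a hedged Keras example of a ReLU-activated bidirectional LSTM scoring shot-level deep features as summary-worthy or not. The input shapes and the sigmoid output head are assumptions; the segmentation, feature-extraction and LFOB-COA optimization stages are not reproduced.

```python
# Hedged sketch of a ReLU-activated BLSTM classifier over shot-level deep
# features. Shapes are arbitrary assumptions, not the paper's configuration.
import tensorflow as tf

n_shots, feat_dim = 32, 2048  # assumed: 32 shots per video, 2048-d deep features

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_shots, feat_dim)),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, activation="relu", return_sequences=True)),
    # Per-shot probability of belonging to the summary.
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1, activation="sigmoid")),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```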


2021 ◽  
pp. 648-654
Author(s):  
Rubel Biswas ◽  
Deisy Chaves ◽  
Laura Fernández-Robles ◽  
Eduardo Fidalgo ◽  
Enrique Alegre

Identifying key content from a video is essential for many security applications such as motion/action detection, person re-identification and recognition. Moreover, summarizing the key information from Child Sexual Exploitation Materials, especially videos, which mainly contain distinctive scenes including people’s faces, is crucial to speed up the investigations of Law Enforcement Agencies. In this paper, we present a video summarization strategy that combines perceptual hashing and face detection algorithms to keep the most relevant frames of a video containing people’s faces that may correspond to victims or offenders. Due to legal constraints on access to Child Sexual Abuse datasets, we evaluated the performance of the proposed strategy on the detection of adult pornography content with the NDPI-800 dataset. We also assessed the capability of our strategy to create video summaries preserving frames with distinctive faces from the original video using ten additional manually labeled short videos. Results showed that our approach can detect pornography content with an accuracy of 84.15% at a speed of 8.05 ms/frame, making it appropriate for real-time applications.
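A hedged sketch of how perceptual hashing and face detection could be combined to keep distinctive face-bearing frames: OpenCV's Haar cascade and the imagehash package are illustrative library choices, and the hash-distance threshold is an assumption, not the authors' exact strategy.

```python
# Hedged sketch: keep only frames that contain a detected face and are not
# perceptually near-duplicates of frames already kept.
import cv2
import imagehash
from PIL import Image

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def summarize(video_path, hash_threshold=8):
    cap = cv2.VideoCapture(video_path)
    kept, hashes = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if len(face_detector.detectMultiScale(gray, 1.1, 5)) == 0:
            continue  # keep only frames with at least one detected face
        h = imagehash.phash(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        # Skip frames perceptually similar to one already kept.
        if all(h - kept_h > hash_threshold for kept_h in hashes):
            hashes.append(h)
            kept.append(frame)
    cap.release()
    return kept
```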


A video surveillance system uses video cameras to capture images and videos that can be compressed, stored, and sent to a location with a limited set of monitors. Nowadays, public places such as banks, educational institutions, offices, and hospitals are equipped with multiple surveillance cameras with overlapping fields of view for security and environment monitoring purposes. Video summarization is a technique for generating a summary of the entire video content, either as still images or as a video skim. The summarized video should be shorter than the original video and should cover as much of its information as possible. Video summarization studies concentrating on monocular videos cannot be applied directly to multi-view videos due to the redundancy across views. Generating summaries for surveillance videos is more challenging because videos captured by surveillance cameras are long, contain uninteresting events, and record the same scene from different views, leading to inter-view dependencies and variations in illumination. In this paper, we present a survey of the research work carried out on video summarization techniques for videos captured from multiple views. The generated video summaries can be used to analyze post-accident scenarios and to identify suspicious events and thefts in public places, supporting crime departments in their investigations.


2013 ◽  
Vol 433-435 ◽  
pp. 297-300
Author(s):  
Zong Yue Wang

Video summaries provide a compact video representation preserving the essential activities of the original video, but the summaries may be confusing when different activities are mixed together. A clustered-summaries methodology, which shows similar activities simultaneously, makes viewing much easier and more efficient. However, generating such summaries is very time-consuming, especially when calculating motion distance and collision cost. To improve the efficiency of summary generation, a parallel video synopsis generation algorithm based on GPGPU is proposed. The experimental results show that generation efficiency is greatly improved through GPU parallel computing. The acceleration ratio reaches 5.75 when the data size is above 1600×960×30000.
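A hedged sketch of the kind of collision-cost computation the paper parallelizes: activity tubes are modeled here simply as per-frame bounding boxes, and the cost sums the per-frame overlap area for a candidate temporal shift. The vectorized NumPy form below is one way to map such a computation onto a GPU, by substituting the NumPy-compatible CuPy library; the exact cost terms used in the paper are assumptions here.

```python
# Hedged sketch of a pairwise collision cost between two activity tubes.
# tube_*: arrays of shape (T, 4) holding [x1, y1, x2, y2] per frame.
# Replacing `import numpy as np` with `import cupy as np` runs the same
# vectorised arithmetic on a GPU (CuPy mirrors this subset of the NumPy API).
import numpy as np

def collision_cost(tube_a, tube_b, shift):
    """Total bounding-box overlap area when tube_a is offset by `shift` frames."""
    t = min(len(tube_a) - shift, len(tube_b))
    if t <= 0:
        return 0.0
    a, b = tube_a[shift:shift + t], tube_b[:t]
    # Vectorised per-frame intersection widths and heights (clipped at zero).
    ix = np.clip(np.minimum(a[:, 2], b[:, 2]) - np.maximum(a[:, 0], b[:, 0]), 0, None)
    iy = np.clip(np.minimum(a[:, 3], b[:, 3]) - np.maximum(a[:, 1], b[:, 1]), 0, None)
    return float(np.sum(ix * iy))
```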


1998 ◽  
Author(s):  
Janne Saarela ◽  
Bernard Merialdo

Author(s):  
Senthil Kumar Narayanasamy ◽  
Dinakaran Muruganantham

The exponential growth of data emerging from social media is causing challenges for decision-making systems and poses a critical hindrance to searching for potentially useful information. The major objective of this chapter is to convert the unstructured data in social media into a meaningful structured format, which in turn brings robustness to the information extraction process. Further, the approach has the inherent capability to extract named entities from the unstructured data and store them in a knowledge base of important facts. In this chapter, the authors explain the methods used to identify named entities from Twitter streams and the techniques for linking them to appropriate knowledge sources such as DBpedia.
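A hedged sketch of the two stages described above: named-entity recognition over a tweet followed by a lookup against DBpedia to link each entity to a knowledge-base resource. spaCy's small English model and the public DBpedia Lookup endpoint, including its JSON response fields, are illustrative assumptions, not necessarily the chapter's exact techniques.

```python
# Hedged sketch: NER with spaCy, then entity linking via the DBpedia Lookup
# service. The endpoint URL, query parameters and response fields ("docs",
# "resource") are assumptions about the public Lookup API; the spaCy model
# `en_core_web_sm` is assumed to be installed.
import requests
import spacy

nlp = spacy.load("en_core_web_sm")

def link_entities(tweet_text):
    links = {}
    for ent in nlp(tweet_text).ents:
        resp = requests.get(
            "https://lookup.dbpedia.org/api/search",
            params={"query": ent.text, "maxResults": 1, "format": "json"},
            timeout=10,
        )
        docs = resp.json().get("docs", []) if resp.ok else []
        links[ent.text] = docs[0]["resource"][0] if docs else None
    return links

print(link_entities("Tim Berners-Lee spoke at MIT about the Semantic Web."))
```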

