High-Performance Video Retrieval Based on Spatio-Temporal Features

Author(s):  
G. S. N. Kumar ◽  
V. S. K. Reddy ◽  
S. Srinivas Kumar

Thousands of movies, TV shows, and documentaries are produced each year around the world, in different genres and languages. Making a movie scene both impactful and original is a challenging task for the director. Likewise, retrieving scenes similar to a user's query is challenging, because there is no properly maintained database of movie-scene videos with semantic tags associated with them. To serve these application areas, a content-based retrieval system for movie scenes is needed. Content-based video retrieval is the problem of retrieving the videos most similar to a given query video by analyzing their visual content. Traditional video-level features are built on hand-engineered key-frame-level features, which do not exploit the rich dynamics present in a video. In this paper we propose a Content-Based Movie Scene Retrieval (CB-MSR) framework using spatio-temporal features learned by deep learning. Specifically, a deep CNN combined with an LSTM is deployed to learn spatio-temporal representations of video. On the basis of these learned features, similar movie scenes can be retrieved from a collection of movies. The Hollywood2 dataset is used to test the proposed system, and two types of features, spatial and spatio-temporal, are used to evaluate the framework.
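The retrieval step described above can be sketched in a few lines. The sketch below is illustrative only: mean pooling over per-frame features stands in for the CNN+LSTM aggregation the paper learns, and the "videos" are random feature matrices rather than real CNN outputs.

```python
import numpy as np

def temporal_pool(frame_feats):
    """Collapse per-frame features (T x D) into one video descriptor.
    Mean pooling is a stand-in for the learned LSTM aggregation."""
    return frame_feats.mean(axis=0)

def cosine_rank(query, database):
    """Rank database descriptors by cosine similarity to the query."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q
    return np.argsort(-sims), sims

rng = np.random.default_rng(0)
# Toy "videos": per-frame feature matrices (20 frames x 64 dims).
videos = [rng.normal(size=(20, 64)) for _ in range(5)]
descriptors = np.stack([temporal_pool(v) for v in videos])

# Query with a noisy copy of video 2: it should rank first.
query = temporal_pool(videos[2] + 0.05 * rng.normal(size=(20, 64)))
order, sims = cosine_rank(query, descriptors)
print(order[0])  # expected: 2
```

Any learned descriptor can be dropped into `cosine_rank` unchanged; only the pooling function would differ in the real system.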


Author(s):  
Azra Nasreen ◽  
Shobha G

Video retrieval is an important technology that underpins video search engines, allowing users to browse and retrieve videos of interest from huge databases. Although many existing techniques search and retrieve videos based on spatial and temporal features, they perform poorly, ranking irrelevant videos highly and leading to poor user satisfaction. In this paper, an efficient multi-feature method for matching and extraction is proposed in a parallel paradigm to retrieve videos accurately and quickly from a collection. The proposed system is tested on datasets containing various categories of videos of varying length, such as traffic, sports, and nature. Experimental results show that around 80% accuracy is achieved in searching and retrieving videos. Through the use of high-performance computing, the parallel execution locates and retrieves videos of interest 5 times faster than the sequential execution.
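The parallel-search paradigm can be sketched as sharding the descriptor database and scoring the shards concurrently. This is a minimal illustration, not the authors' pipeline: it uses a thread pool and plain L2 distance, where the real system would use its multi-feature matcher and an HPC backend.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def score_partition(query, partition):
    """Score one shard of the database against the query (L2 distance)."""
    return np.linalg.norm(partition - query, axis=1)

def parallel_search(query, database, n_workers=4):
    """Split the database into shards, score them concurrently,
    and return the index of the best-matching descriptor."""
    shards = np.array_split(database, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        scores = list(pool.map(lambda s: score_partition(query, s), shards))
    return int(np.argmin(np.concatenate(scores)))

rng = np.random.default_rng(1)
db = rng.normal(size=(100, 32))            # 100 video descriptors
query = db[42] + 0.01 * rng.normal(size=32)
print(parallel_search(query, db))          # expected: 42
```

Because `np.array_split` preserves row order and the shard scores are concatenated back in order, the parallel result is identical to a sequential scan; only wall-clock time changes.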


Videos are recorded and uploaded daily to sites such as YouTube and Facebook from devices such as mobile phones and digital cameras, with little or no metadata (semantic tags) associated with them. This makes it extremely difficult to retrieve similar videos from metadata alone, without content-based semantic search. Content-based video retrieval is the problem of retrieving the videos most similar to a given query video, and it has a wide range of applications such as video browsing, content filtering, and video indexing. Traditional video-level features are built on hand-engineered key-frame-level features, which do not exploit the rich dynamics present in a video. In this paper we propose a fast content-based video retrieval framework using compact spatio-temporal features learned by deep learning. Specifically, a deep CNN combined with an LSTM is deployed to learn spatio-temporal representations of video. For fast retrieval, a binary code is generated by a hash-learning component in the framework. For fast and effective learning of hash codes, the proposed framework is trained in two stages: the first stage learns the video dynamics, and the second stage learns a compact code from the temporal variation learned in the first stage. The UCF101 dataset is used to test the proposed method, and the results are compared with other hashing methods, showing that our approach improves on existing methods.
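The idea of trading real-valued features for binary codes can be sketched with classical random-hyperplane hashing. Note this is a stand-in: the paper learns its hash codes end-to-end, whereas the sketch below binarizes fixed features with random projections and ranks by Hamming distance.

```python
import numpy as np

def hash_codes(feats, planes):
    """Binarize real-valued video features into compact codes:
    one bit per random hyperplane (sign of the projection)."""
    return (feats @ planes.T > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists), dists

rng = np.random.default_rng(2)
feats = rng.normal(size=(50, 128))      # video features (stand-in for learned ones)
planes = rng.normal(size=(48, 128))     # 48-bit codes
codes = hash_codes(feats, planes)

# Hash a slightly perturbed copy of item 7: it should rank first.
query = hash_codes(feats[7:8] + 0.05 * rng.normal(size=(1, 128)), planes)[0]
order, dists = hamming_rank(query, codes)
print(order[0])  # expected: 7
```

Comparing 48-bit codes by Hamming distance needs only XOR and popcount, which is why hashing makes large-scale retrieval fast.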


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4205
Author(s):  
Christian Knaak ◽  
Jakob von Eßen ◽  
Moritz Kröger ◽  
Frederic Schulze ◽  
Peter Abels ◽  
...  

In modern production environments, advanced and intelligent process monitoring strategies are required to enable an unambiguous diagnosis of the process situation and thus of the final component quality. In addition, the ability to recognize the current state of product quality in real-time is an important prerequisite for autonomous and self-improving manufacturing systems. To address these needs, this study investigates a novel ensemble deep learning architecture based on convolutional neural networks (CNN) and gated recurrent units (GRU), combined with high-performance classification algorithms such as k-nearest neighbors (kNN) and support vector machines (SVM). The architecture uses spatio-temporal features extracted from infrared image sequences to locate critical welding defects including lack of fusion (false friends), sagging, lack of penetration, and geometric deviations of the weld seam. In order to evaluate the proposed architecture, this study investigates a comprehensive scheme based on classical machine learning methods using manual feature extraction and state-of-the-art deep learning algorithms. Optimal hyperparameters for each algorithm are determined by an extensive grid search. Additional work is conducted to investigate the significance of various geometrical, statistical, and spatio-temporal features extracted from the keyhole and weld pool regions. The proposed method is finally validated on previously unknown welding trials, achieving the highest detection rates and the most robust weld defect recognition among all classification methods investigated in this work. Ultimately, the ensemble deep neural network is implemented and optimized to operate on low-power embedded computing devices with low latency (1.1 ms), demonstrating sufficient performance for real-time applications.
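The last stage of such an ensemble, classifying extracted spatio-temporal features with kNN, can be illustrated with a tiny majority-vote classifier. The features here are synthetic two-cluster data standing in for the CNN/GRU embeddings; the class labels (sound weld vs. defect) are likewise illustrative.

```python
import numpy as np

def knn_predict(train_x, train_y, x, k=3):
    """Tiny k-nearest-neighbour classifier: majority vote among the
    k closest training feature vectors (Euclidean distance)."""
    d = np.linalg.norm(train_x - x, axis=1)
    nearest = train_y[np.argsort(d)[:k]]
    return int(np.bincount(nearest).argmax())

rng = np.random.default_rng(3)
# Toy spatio-temporal features: class 0 = sound weld, class 1 = defect.
ok  = rng.normal(loc=0.0, scale=0.3, size=(30, 8))
bad = rng.normal(loc=2.0, scale=0.3, size=(30, 8))
train_x = np.vstack([ok, bad])
train_y = np.array([0] * 30 + [1] * 30)

probe = rng.normal(loc=2.0, scale=0.3, size=8)   # a defect-like sample
print(knn_predict(train_x, train_y, probe))       # expected: 1
```

kNN has no training step at all, which is one reason such classifiers pair well with a fixed deep feature extractor on embedded hardware.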


2021 ◽  
Author(s):  
Monir Torabian ◽  
Hossein Pourghassem ◽  
Homayoun Mahdavi-Nasab

2021 ◽  
pp. 115472
Author(s):  
Parameshwaran Ramalingam ◽  
Lakshminarayanan Gopalakrishnan ◽  
Manikandan Ramachandran ◽  
Rizwan Patan

2016 ◽  
Vol 12 ◽  
pp. P1115-P1115
Author(s):  
Vera Niederkofler ◽  
Christina Hoeller ◽  
Joerg Neddens ◽  
Ewald Auer ◽  
Heinrich Roemer ◽  
...  

2021 ◽  
Vol 12 (6) ◽  
pp. 1-23
Author(s):  
Shuo Tao ◽  
Jingang Jiang ◽  
Defu Lian ◽  
Kai Zheng ◽  
Enhong Chen

Mobility prediction plays an important role in a wide range of location-based applications and services. However, there are three problems in the existing literature: (1) explicit high-order interactions of spatio-temporal features are not systematically modeled; (2) most existing algorithms place attention mechanisms on top of a recurrent network, so they cannot allow for full parallelism and are inferior to self-attention for capturing long-range dependence; (3) most of the literature does not make good use of long-term historical information and does not effectively model the long-term periodicity of users. To this end, we propose MoveNet and RLMoveNet. MoveNet is a self-attention-based sequential model, predicting each user's next destination based on her most recent visits and historical trajectory. MoveNet first introduces a cross-based learning framework for modeling feature interactions. With self-attention on both the most recent visits and the historical trajectory, MoveNet can use an attention mechanism to capture the user's long-term regularity in a more efficient way. Based on MoveNet, to model long-term periodicity more effectively, we add a reinforcement learning layer and name the result RLMoveNet. RLMoveNet regards human mobility prediction as a reinforcement learning problem, using the reinforcement learning layer as a regularization component that drives the model to pay attention to behavior with periodic actions, making the algorithm more effective. We evaluate both of them on three real-world mobility datasets. MoveNet outperforms the state-of-the-art mobility predictor by around 10% in terms of accuracy, and simultaneously achieves faster convergence and over 4x training speedup. Moreover, RLMoveNet achieves higher prediction accuracy than MoveNet, which shows that modeling periodicity explicitly from the perspective of reinforcement learning is more effective.
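The parallelism argument in point (2) comes from the core self-attention operation: every visit attends to every other visit in one matrix product, with no recurrence over time steps. Below is a minimal single-head scaled dot-product attention sketch over a toy visit sequence; the dimensions and random weights are illustrative, not MoveNet's.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a visit
    sequence (T x D): all positions are processed in parallel."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(4)
T, D = 6, 16                      # 6 visits, 16-dim embeddings
x = rng.normal(size=(T, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) * 0.1 for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, np.allclose(attn.sum(axis=1), 1.0))  # (6, 16) True
```

Because the attention weights for all T positions come from one T x T matrix, long-range dependence costs the same as adjacent-step dependence, unlike a recurrent network that must propagate state step by step.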

