A Multigranularity Surveillance Video Retrieval Algorithm for Human Targets

Author(s):  
Zhenkun Wen ◽  
Jinhua Gao ◽  
Fumi Liu ◽  
Huisi Wu
2018 ◽  
Vol 10 (4) ◽  
pp. 52-61
Author(s):  
Xiaoxi Liu ◽  
Ju Liu ◽  
Lingchen Gu ◽  
Yannan Ren

This article describes how, owing to the diversification of electronic equipment in public security forensics, vehicle surveillance video has emerged as a burgeoning source of evidence that attracts our attention. Vehicle surveillance videos contain useful evidence, and video retrieval can help locate it. To obtain evidence videos accurately and efficiently, convolutional neural networks (CNNs) are widely applied to improve surveillance video retrieval performance. This article proposes a vehicle surveillance video retrieval method that combines deep features derived from a CNN with iterative quantization (ITQ) encoding: given any frame of a video, it generates a short video that can be applied to public security forensics. Experiments show that the retrieved video describes the video content before and after the keyframe directly and efficiently, and that the final short video of an accident scene in the surveillance footage can be regarded as forensic evidence.
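The abstract names CNN deep features plus iterative quantization (ITQ) encoding but gives no implementation details. The following is a minimal numpy sketch of the standard ITQ procedure (PCA projection, then an alternating rotation/binarization loop) applied to precomputed frame features, with Hamming-distance ranking for keyframe lookup; the function names and parameters are illustrative, not the authors' code.

```python
import numpy as np

def itq_codes(features, n_bits=32, n_iter=50, seed=0):
    """Learn ITQ binary codes for a matrix of frame features (n_frames x dim)."""
    rng = np.random.default_rng(seed)
    X = features - features.mean(axis=0)            # center the data
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = X @ Vt[:n_bits].T                           # PCA projection to n_bits dims
    R, _ = np.linalg.qr(rng.standard_normal((n_bits, n_bits)))  # random orthogonal init
    for _ in range(n_iter):
        B = np.sign(V @ R)                          # binarize under current rotation
        # orthogonal Procrustes step: rotation minimizing ||B - V R||_F
        U, _, Wt = np.linalg.svd(V.T @ B)
        R = U @ Wt
    codes = (V @ R) > 0                             # boolean code per frame
    return codes, Vt[:n_bits], R

def hamming_rank(query_code, codes):
    """Rank database frames by Hamming distance to a query frame's code."""
    return np.argsort((codes != query_code).sum(axis=1))
```

In a retrieval pipeline of this kind, the ranked neighbors of the query frame would locate the keyframe's position in the database, after which frames before and after it can be stitched into the short evidence clip.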


2007 ◽  
Vol 16 (4) ◽  
pp. 1168-1181 ◽  
Author(s):  
Weiming Hu ◽  
Dan Xie ◽  
Zhouyu Fu ◽  
Wenrong Zeng ◽  
Steve Maybank

Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 2125
Author(s):  
Xiaoyu Wu ◽  
Tiantian Wang ◽  
Shengjin Wang

Text-video retrieval faces a great challenge in the semantic gap between cross-modal information. Some existing methods transform the text and video into the same subspace to measure their similarity. However, these methods do not impose a semantic consistency constraint when associating the semantic encodings of the two modalities, and the association results are poor. In this paper, we propose a multi-modal retrieval algorithm based on semantic association and multi-task learning. First, multi-level features of the video and text are extracted with multiple deep learning networks, so that the information of both modalities is fully encoded. Then, in the common feature space into which the two modalities are mapped, we propose a multi-task learning framework combining semantic similarity measurement with semantic consistency classification over text-video features. The semantic consistency classification task constrains the learning of the semantic association task, so multi-task learning guides better feature mapping of the two modalities and optimizes the construction of the unified feature subspace. Finally, the experimental results of the proposed algorithm on the Microsoft Video Description (MSVD) and MSR-Video to Text (MSR-VTT) datasets surpass existing work, demonstrating that the algorithm improves cross-modal retrieval performance.
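The abstract describes a multi-task objective that pairs a semantic association (similarity) task with a semantic consistency classification task in a shared embedding space, without giving the loss formulation. Below is a hedged numpy sketch of one common way such an objective is assembled: a bidirectional triplet ranking loss over in-batch text-video pairs plus a cross-entropy consistency term, combined with a weighting factor. All names (`triplet_ranking_loss`, `lam`, etc.) are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def cosine_sim(a, b):
    """Pairwise cosine similarity between row vectors of a and b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def triplet_ranking_loss(sim, margin=0.2):
    """Bidirectional hinge loss over a batch; matched pairs lie on the diagonal."""
    pos = np.diag(sim)
    cost_t = np.maximum(0.0, margin + sim - pos[:, None])   # text -> video direction
    cost_v = np.maximum(0.0, margin + sim - pos[None, :])   # video -> text direction
    mask = 1.0 - np.eye(sim.shape[0])                       # ignore the positive pairs
    return ((cost_t + cost_v) * mask).sum() / sim.shape[0]

def consistency_ce(logits, labels):
    """Cross-entropy for the semantic consistency classification task."""
    z = logits - logits.max(axis=1, keepdims=True)          # numerically stable softmax
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def multitask_loss(text_emb, video_emb, logits, labels, lam=0.5):
    """Weighted sum: semantic association loss + consistency classification loss."""
    return triplet_ranking_loss(cosine_sim(text_emb, video_emb)) + lam * consistency_ce(logits, labels)
```

The weighting `lam` trades off how strongly the consistency classifier constrains the shared subspace; in practice such a coefficient is tuned on a validation split.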


Author(s):  
Behrang QasemiZadeh ◽  
Jiali Shen ◽  
Ian O'Neill ◽  
Paul Miller ◽  
Philip Hanna ◽  
...  
