Global-view hashing: harnessing global relations in near-duplicate video retrieval

2018, Vol 22 (2), pp. 771-789
Author(s): Weizhen Jing, Xiushan Nie, Chaoran Cui, Xiaoming Xi, Gongping Yang, ...

Author(s): Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Ioannis Patras, Yiannis Kompatsiaris

2020, Vol 14 (5)
Author(s): Ling Shen, Richang Hong, Yanbin Hao

Author(s): Wenzhe Wang, Mengdan Zhang, Runnan Chen, Guanyu Cai, Penghao Zhou, ...

Multi-modal cues present in videos are usually beneficial for the challenging video-text retrieval task on Internet-scale datasets. Recent video retrieval methods exploit multi-modal cues by aggregating them into holistic high-level semantics that are matched against text representations in a global view. In contrast to this global alignment, the local alignment between the detailed semantics encoded in multi-modal cues and those in distinct phrases remains underexplored. In this paper, we therefore leverage hierarchical video-text alignment to fully exploit the diverse, detailed characteristics of multi-modal cues for fine-grained alignment with the local semantics of phrases, while also capturing high-level semantic correspondence. Specifically, multi-step attention is learned for progressively more comprehensive local alignment, and a holistic transformer summarizes the multi-modal cues for global alignment. With this hierarchical alignment, our model outperforms state-of-the-art methods on three public video retrieval datasets.
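The global/local distinction the abstract draws can be illustrated with a minimal scoring sketch. This is not the paper's method (which uses learned multi-step attention and a holistic transformer); it is a toy stand-in under stated assumptions: video cues and text phrases are given as pre-extracted feature vectors, global alignment is cosine similarity between mean-pooled summaries, and local alignment matches each phrase to its best-fitting video cue. All function names here are hypothetical illustrations.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mean_pool(vectors):
    # Holistic summary of a set of features (crude stand-in for a
    # transformer-based aggregator).
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def global_score(video_feats, phrase_feats):
    # Global view: match holistic summaries of the two modalities.
    return cosine(mean_pool(video_feats), mean_pool(phrase_feats))

def local_score(video_feats, phrase_feats):
    # Local view: each phrase is aligned with its best-matching
    # video cue, then the per-phrase scores are averaged.
    return sum(
        max(cosine(p, v) for v in video_feats) for p in phrase_feats
    ) / len(phrase_feats)

def hierarchical_score(video_feats, phrase_feats, alpha=0.5):
    # Combine both levels; alpha balances global vs. local alignment.
    return alpha * global_score(video_feats, phrase_feats) + \
           (1 - alpha) * local_score(video_feats, phrase_feats)

# Toy example: two video cue vectors and two phrase vectors.
video = [[1.0, 0.0], [0.0, 1.0]]
matching_text = [[1.0, 0.0], [0.0, 1.0]]
mismatched_text = [[-1.0, 0.0], [0.0, -1.0]]

print(hierarchical_score(video, matching_text))    # high (1.0)
print(hierarchical_score(video, mismatched_text))  # low
```

The key design point the sketch mirrors is that max-over-cues local matching rewards fine-grained phrase-to-cue correspondence that a single pooled representation can wash out, which is the motivation the abstract gives for adding local alignment on top of global alignment.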


Author(s): Pavlos Avgoustinakis, Giorgos Kordopatis-Zilos, Symeon Papadopoulos, Andreas L. Symeonidis, Ioannis Kompatsiaris

2019, pp. 1-1
Author(s): Heng Tao Shen, Jiajun Liu, Zi Huang, Chong-Wah Ngo, Wei Wang

Author(s): John R. Zhang, Jennifer Y. Ren, Fangzhe Chang, Thomas L. Wood, John R. Kender
