Multimodal Video Description

Author(s):  
Vasili Ramanishka ◽  
Abir Das ◽  
Dong Huk Park ◽  
Subhashini Venugopalan ◽  
Lisa Anne Hendricks ◽  
...  
Author(s):  
Aditya Bodi ◽  
Pooyan Fazli ◽  
Shasta Ihorn ◽  
Yue-Ting Siu ◽  
Andrew T Scott ◽  
...  

Author(s):  
Benedetto Ielpo ◽  
Antonio Giuliani ◽  
Patricia Sanchez ◽  
Fernando Burdio ◽  
Mikel Gastaka ◽  
...  

2018 ◽  
Vol 3 (1) ◽  
pp. 74-76
Author(s):  
Serge Kobsa ◽  
Robert A. Sorabella ◽  
Kyle Eudailey ◽  
Raymond Lee ◽  
Michael Borger ◽  
...  

2012 ◽  
Vol 21 (4) ◽  
pp. 1465-1477 ◽  
Author(s):  
Guoying Zhao ◽  
T. Ahonen ◽  
J. Matas ◽  
M. Pietikainen

ASVIDE ◽  
2018 ◽  
Vol 5 ◽  
pp. 749-749
Author(s):  
Edward D. Percy ◽  
Carlyn McNeely ◽  
Tamara Coffin ◽  
Mark J. Kearns ◽  
Ajmal Hafizi ◽  
...  

Author(s):  
Asha Shetty ◽  
Bryan Abreo ◽  
Adline D’Souza ◽  
Akarsha Kondana ◽  
Kavitha Mahesh Karimbi

2020 ◽  
Vol 10 (12) ◽  
pp. 4312 ◽  
Author(s):  
Jie Xu ◽  
Haoliang Wei ◽  
Linke Li ◽  
Qiuru Fu ◽  
Jinhong Guo

Video description plays an important role in intelligent imaging technology. Attention mechanisms are widely used in deep-learning-based video description models, and most existing models use a temporal-spatial attention mechanism to improve accuracy. Temporal attention captures the global features of a video, whereas spatial attention captures local features. However, each channel of a convolutional neural network (CNN) feature map carries its own spatial semantic information, so it is not sufficient to merely divide the CNN features into regions and apply spatial attention to them. In this paper, we propose a temporal-spatial and channel attention mechanism that lets the model exploit several kinds of video features and keeps the visual features consistent across sentence descriptions, which improves model performance. To demonstrate the effectiveness of the attention mechanism, we also propose a video visualization model based on the video description. Experimental results show that our model performs well on the Microsoft Video Description (MSVD) dataset and achieves some improvement on the Microsoft Research-Video to Text (MSR-VTT) dataset.
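
The abstract does not give the formulation, so the following is only a minimal NumPy sketch of how channel, spatial, and temporal attention could be stacked over per-frame CNN feature maps. The order of the three stages, the additive (tanh) scoring functions, the function names `softmax` and `attend`, and all parameter names and shapes are assumptions made for illustration; they are not the authors' model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(feats, h, p):
    """Hypothetical channel -> spatial -> temporal attention.

    feats: (T, C, H, W) CNN feature maps for T frames.
    h:     (D,) decoder hidden state used as the attention query.
    p:     dict of illustrative projection parameters (assumed, not from the paper).
    """
    T, C, H, W = feats.shape

    # Channel attention: one weight per channel per frame, scored from
    # spatially pooled features plus a projection of the query.
    pooled = feats.mean(axis=(2, 3))                      # (T, C)
    beta = softmax(pooled * p["w_c"] + h @ p["U_c"], 1)   # (T, C)
    feats_c = feats * beta[:, :, None, None]              # reweighted maps

    # Spatial attention: one weight per region (grid cell) per frame.
    regions = feats_c.reshape(T, C, H * W).transpose(0, 2, 1)            # (T, R, C)
    s_scores = np.tanh(regions @ p["W_s"] + h @ p["U_s"]) @ p["v_s"]     # (T, R)
    alpha = softmax(s_scores, 1)
    frame_vecs = (alpha[:, :, None] * regions).sum(axis=1)               # (T, C)

    # Temporal attention: one weight per frame.
    t_scores = np.tanh(frame_vecs @ p["W_t"] + h @ p["U_t"]) @ p["v_t"]  # (T,)
    gamma = softmax(t_scores, 0)
    context = gamma @ frame_vecs                                         # (C,)
    return context, beta, alpha, gamma

# Toy dimensions and random parameters, purely for demonstration.
T, C, H, W, D, A = 4, 8, 3, 3, 5, 6
rng = np.random.default_rng(0)
params = {
    "w_c": rng.normal(size=C), "U_c": rng.normal(size=(D, C)),
    "W_s": rng.normal(size=(C, A)), "U_s": rng.normal(size=(D, A)),
    "v_s": rng.normal(size=A),
    "W_t": rng.normal(size=(C, A)), "U_t": rng.normal(size=(D, A)),
    "v_t": rng.normal(size=A),
}
feats = rng.normal(size=(T, C, H, W))
h = rng.normal(size=D)
context, beta, alpha, gamma = attend(feats, h, params)
```

In a captioning decoder, a context vector like this would typically be recomputed at every word step from the current hidden state. The `alpha` and `gamma` weights are the kind of quantity a visualization model could render to show which regions and frames were attended.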


2019 ◽  
Vol 29 (13) ◽  
pp. 773
Author(s):  
V. Misrai ◽  
E. Rijo ◽  
K. Zorn ◽  
N. Barry delongchamps ◽  
A. Descazeaud
