Graph convolutional network meta-learning with multi-granularity POS guidance for video captioning

2021
Author(s):  
Ping Li ◽  
Pan Zhang ◽  
Xianghua Xu
Author(s):  
Kuncheng Fang ◽  
Lian Zhou ◽  
Cheng Jin ◽  
Yuejie Zhang ◽  
Kangnian Weng ◽  
...  

Automatically generating natural language descriptions for videos is an extremely complicated and challenging task. To overcome the limitations of traditional LSTM-based models for video captioning, we propose a novel architecture to generate optimal descriptions for videos. It focuses on two goals: constructing a new network structure that generates better sentences than the basic LSTM model, and establishing special attention mechanisms that provide more useful visual information for caption generation. The scheme discards the traditional LSTM and instead uses a fully convolutional network with coarse-to-fine and inherited attention, designed according to the characteristics of the fully convolutional structure. Our model not only outperforms the basic LSTM-based model but also achieves performance comparable to that of state-of-the-art methods.


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 172859-172868
Author(s):  
Zhengwei Ma ◽  
Sensen Guo ◽  
Gang Xu ◽  
Saddam Aziz

Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5966
Author(s):  
Ke Wang ◽  
Gong Zhang

The challenge of small data has emerged in synthetic aperture radar automatic target recognition (SAR-ATR) problems. Most SAR-ATR methods are data-driven and require large amounts of training data, which are expensive to collect. To address this challenge, we propose a recognition model that incorporates meta-learning and amortized variational inference (AVI). Specifically, the model consists of global parameters and task-specific parameters. The global parameters, trained by meta-learning, construct a common feature extractor shared among all recognition tasks. The task-specific parameters, modeled by probability distributions, can adapt to new tasks with a small amount of training data. To reduce computation and storage costs, the task-specific parameters are inferred by AVI implemented with set-to-set functions. Extensive experiments were conducted on a real SAR dataset to evaluate the effectiveness of the model. Compared with the latest SAR-ATR methods, the proposed approach shows superior performance, especially on recognition tasks with limited data.


2019 ◽  
Vol 23 (1) ◽  
pp. 147-159
Author(s):  
Shagan Sah ◽  
Thang Nguyen ◽  
Ray Ptucha

Author(s):  
Alok Singh ◽  
Thoudam Doren Singh ◽  
Sivaji Bandyopadhyay
