Learning to Generate Grounded Visual Captions Without Localization Supervision
2020
pp. 353-370