Gated Object-Attribute Matching Network for Detailed Image Captioning
Image captioning enables computers to generate textual descriptions of images automatically. However, current generated descriptions are still limited: models can name the objects in an image but rarely provide further details about them. In this study, we present a novel image captioning approach that adds such detail when describing objects. Specifically, a visual attention-based LSTM locates the objects, while a semantic attention-based LSTM predicts semantic attributes. Finally, a gated object-attribute matching network matches each object to its semantic attributes. Experiments on the public Flickr30k and MSCOCO datasets demonstrate that the proposed approach improves caption quality compared with current state-of-the-art methods.
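The abstract does not specify the exact form of the gated matching network, but a common way to realize such gating is an elementwise sigmoid gate that blends an object feature with an attribute feature. The sketch below is a simplified, hypothetical formulation for illustration only (the function names, weight shapes, and the convex-combination form are assumptions, not the authors' definition):

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_match(obj_feat, attr_feat, W, b):
    """Fuse an object feature vector with an attribute feature vector
    through a learned gate (hypothetical formulation):

        g     = sigmoid(W [obj; attr] + b)
        fused = g * obj + (1 - g) * attr

    The gate decides, per dimension, how much object versus attribute
    information passes into the fused representation."""
    joint = obj_feat + attr_feat  # list concatenation = [obj; attr]
    fused = []
    for row, bias, o, a in zip(W, b, obj_feat, attr_feat):
        g = sigmoid(sum(w * x for w, x in zip(row, joint)) + bias)
        fused.append(g * o + (1.0 - g) * a)
    return fused

# Toy example with random weights standing in for trained parameters.
d = 4
W = [[random.uniform(-0.1, 0.1) for _ in range(2 * d)] for _ in range(d)]
b = [0.0] * d
obj = [random.uniform(-1, 1) for _ in range(d)]
attr = [random.uniform(-1, 1) for _ in range(d)]
fused = gated_match(obj, attr, W, b)
```

Because the gate lies in (0, 1), each fused dimension is a convex combination of the corresponding object and attribute values, so the matched representation never strays outside the range spanned by its two inputs.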