Image Description
Recently Published Documents


TOTAL DOCUMENTS: 302 (five years: 82)
H-INDEX: 20 (five years: 5)

2022, Vol 2022, pp. 1-9
Author(s): Junlong Feng, Jianping Zhao

Recent image captioning models based on the encoder-decoder framework have achieved remarkable success in human-like sentence generation. However, the explicit separation between encoder and decoder creates a disconnect between the image and the sentence. This often leads to a rough image description: the generated caption covers the main instances but unexpectedly neglects additional objects and scenes, reducing the caption's consistency with the image. To address this issue, we propose an image captioning system with context-fused guidance. It incorporates regional and global image representations as compositional visual features to learn the objects and attributes in images. To integrate image-level semantic information, a visual concept is employed. To avoid misleading the decoding, a context fusion gate is introduced to compute the textual context by selectively aggregating information from the visual concept and the word embedding. The context-fused image guidance is then formulated from the compositional visual features and the textual context, providing the decoder with informative semantic knowledge. Finally, a captioner with a two-layer LSTM architecture generates the captions. Moreover, to overcome exposure bias, we train the proposed model through sequential decision-making. Experiments on the MS COCO dataset show the outstanding performance of our work, and linguistic analysis demonstrates that our model improves the caption's consistency with the image.
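As a concrete illustration, here is a minimal sketch of a context fusion gate in the spirit described above: a learned sigmoid gate selectively mixes the visual-concept vector and the word embedding into the textual context. The layer sizes and the convex-combination form are assumptions, not the authors' published implementation.

```python
import torch
import torch.nn as nn

class ContextFusionGate(nn.Module):
    """Selectively aggregates a visual-concept vector and a word embedding."""

    def __init__(self, dim: int):
        super().__init__()
        # The gate is computed from the concatenation of both inputs.
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, concept: torch.Tensor, word: torch.Tensor) -> torch.Tensor:
        # Per-dimension gate in (0, 1): how much visual concept to keep.
        g = torch.sigmoid(self.gate(torch.cat([concept, word], dim=-1)))
        # Convex combination of the two information sources.
        return g * concept + (1 - g) * word

# Usage: fuse a batch of 512-d visual concepts with word embeddings.
fuse = ContextFusionGate(512)
context = fuse(torch.randn(4, 512), torch.randn(4, 512))
print(context.shape)  # torch.Size([4, 512])
```

The convex combination keeps the fused context bounded between its two sources, which is one common way to realize "selective aggregation".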


2022, Vol 183, pp. 79-93
Author(s): Lulu Chen, Yongqiang Zhao, Jonathan Cheung-Wai Chan, Seong G. Kong

Sensors, 2021, Vol 21 (23), pp. 7982
Author(s): Ziwei Tang, Yaohua Yi, Hao Sheng

Image captioning generates written descriptions of an image. In recent image captioning research, attention regions seldom cover all objects, so the generated captions can lack object details and stray far from reality. In this paper, we propose a word guided attention (WGA) method for image captioning. First, WGA extracts word information from the embedded word and the memory cell by applying transformation and multiplication. Then, WGA applies this word information to the attention results to obtain the attended feature vectors via elementwise multiplication. Finally, we apply WGA with words from different time steps to obtain previous word guided attention (PW) and current word guided attention (CW) in the decoder. Experiments on the MSCOCO dataset show that the proposed WGA achieves competitive performance against state-of-the-art methods: on the Karpathy test split, PW reaches a 39.1 Bilingual Evaluation Understudy (BLEU-4) score and a 127.6 Consensus-Based Image Description Evaluation (CIDEr-D) score, and CW reaches a 39.1 BLEU-4 score and a 127.2 CIDEr-D score.
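A rough sketch of how such a module could look follows; the projections, the sigmoid, and the dimensions are our assumptions, not the authors' exact formulation. Word information is derived from the embedded word and the LSTM memory cell by transformation and multiplication, then applied to the attended features elementwise.

```python
import torch
import torch.nn as nn

class WordGuidedAttention(nn.Module):
    """Modulates attended image features with word information."""

    def __init__(self, dim: int):
        super().__init__()
        self.word_proj = nn.Linear(dim, dim)
        self.cell_proj = nn.Linear(dim, dim)

    def forward(self, word_emb, memory_cell, attended_feats):
        # "Transformation and multiplication": combine the embedded word
        # with the LSTM memory cell to extract word information.
        word_info = torch.sigmoid(self.word_proj(word_emb) * self.cell_proj(memory_cell))
        # Apply the word information to the attention result elementwise.
        return word_info * attended_feats

# Feeding the previous word gives PW; feeding the current word gives CW.
wga = WordGuidedAttention(512)
out = wga(torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 512))
print(out.shape)  # torch.Size([4, 512])
```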


Author(s):  
Marina Di Napoli Pastore

Objective: This photo essay proposes thinking about practices with children in the most diverse territories, in constant dialogue with their realities and contexts. Image description: An image of two children in an urban community, in dialogue with the occupational therapist, shows their spaces of meaning within the territory. Together with the other images throughout the text, it prompts us to rethink territorial actions and practices with children from, and in dialogue with, their realities, and the children's own production of images as an appropriation of space. Keywords: Children. Photography. Occupational Therapy.


2021, Vol 5 (3), pp. 22-30
Author(s): Valerii Onyshchenko, Yana Korolova, Andrii Nosyk

The subject of this research is the formation of a generalized semantic network of concepts. The purpose of the article is to substantiate the composition and main types of nodes and relations characteristic of a semantic network of concepts, and to form a generalized semantic network of concepts for the structural-linguistic recognition of images in computer systems and networks. Research methods: methods of mathematical logic, mathematical linguistics, and set theory; methods of information visualization using graphs. Results: an approach is proposed to the formation of a generalized semantic network of structural-linguistic concepts for contour images of objects obtained at different shooting angles; the composition and types of nodes and relations characteristic of the semantic network of concepts are substantiated; and the basic principles of constructing a semantic network of structural-linguistic concepts of recognition objects are formulated. Conclusions: to construct an image description that corresponds to the concept of semantic information processing and can be used in systems for collecting relevant images in computer systems and networks, it is advisable to use a structural-linguistic approach to recognition based on a generalized semantic network of concepts. Using this network for the classification and identification of objects can significantly expand the range of images accepted for consideration, taking into account different shooting directions and different angles of camera deviation from the nadir position.
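As a loose illustration only, the sketch below builds a tiny semantic network in which concept nodes are joined by typed relations, and views of an object taken at different shooting angles all link to one shared concept node. The node names and relation types are invented for the example; they are not the taxonomy the article substantiates.

```python
from collections import defaultdict

class SemanticNetwork:
    """A toy semantic network: typed relations over concept nodes."""

    def __init__(self):
        # relation name -> list of (subject, object) pairs
        self.edges = defaultdict(list)

    def relate(self, subject: str, relation: str, obj: str) -> None:
        self.edges[relation].append((subject, obj))

    def query(self, relation: str):
        return self.edges[relation]

net = SemanticNetwork()
net.relate("aircraft", "has-part", "wing-contour")
net.relate("wing-contour", "is-a", "contour-primitive")
# Contour images from different shooting angles map to one concept node.
net.relate("aircraft@nadir", "view-of", "aircraft")
net.relate("aircraft@45deg", "view-of", "aircraft")

print(net.query("view-of"))
# [('aircraft@nadir', 'aircraft'), ('aircraft@45deg', 'aircraft')]
```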


2021, Vol 2021, pp. 1-8
Author(s): Beibei Sun

Image description generation often fails to fully utilize shallow-layer image features and to sufficiently capture the associations between targets in the image. To address this, this paper proposes an attention-based method for generating image descriptions. The proportions of image features at various depths are assigned autonomously based on the content of the language model, so that generation attends to image features at the appropriate depth, improving the quality of the generated descriptions. Tests on the dataset indicate that the proposed algorithm is more accurate than a top-down multimedia image algorithm driven by a single attention mechanism.
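One plausible reading of this mechanism, sketched below, is soft attention over feature vectors taken from CNN layers of different depths, with the mixing proportions conditioned on the language model's hidden state. The softmax weighting and the dimensions are assumptions, not the paper's published design.

```python
import torch
import torch.nn as nn

class DepthAttention(nn.Module):
    """Assigns proportions to features from different CNN depths."""

    def __init__(self, feat_dim: int, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(feat_dim + hidden_dim, 1)

    def forward(self, depth_feats: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # depth_feats: (batch, n_depths, feat_dim); hidden: (batch, hidden_dim)
        h = hidden.unsqueeze(1).expand(-1, depth_feats.size(1), -1)
        scores = self.score(torch.cat([depth_feats, h], dim=-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=-1)  # proportions per depth
        # Weighted sum lets shallow features contribute when they are useful.
        return (weights.unsqueeze(-1) * depth_feats).sum(dim=1)

attn = DepthAttention(feat_dim=512, hidden_dim=512)
fused = attn(torch.randn(2, 4, 512), torch.randn(2, 512))
print(fused.shape)  # torch.Size([2, 512])
```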


Author(s):  
Anish Banda

Abstract: We examine a deep neural network-based image caption generation technique. Given an image as input, the model produces output in three forms: a sentence describing the image in three different languages, an mp3 audio file, and a generated image file. The model combines techniques from computer vision and natural language processing, using a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network to generate a caption. The CNN compares the target image with the training images in a large dataset, and the model generates a fitting description from the trained data. The CNN serves as the encoder that extracts features from images, and the LSTM serves as the decoder that produces the description. The accuracy of the generated caption is evaluated with the BLEU metric, which grades the quality of the generated content; overall performance is calculated with standard evaluation metrics. Keywords: CNN, RNN, LSTM, BLEU score, encoder, decoder, captions, image description.
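Since BLEU is the evaluation metric named above, the following sketch shows how a single generated caption can be scored against reference captions using NLTK's sentence_bleu; the captions themselves are made-up examples.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "a dog runs across the grassy field".split(),
    "a brown dog is running on the grass".split(),
]
candidate = "a dog is running across the grass".split()

# BLEU-4: equal weights over 1- to 4-grams, with smoothing so that
# short captions with no 4-gram overlap do not collapse to zero.
score = sentence_bleu(
    references,
    candidate,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU-4: {score:.3f}")
```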


Author(s): Javavrinda Vrindavanam, Raghunandan Srinath, Anisa Fathima, S. Arpitha, Chaitanya S Rao, ...
