image description Latest Research Papers

Context-Fused Guidance for Image Captioning Using Sequence-Level Training

Computational Intelligence and Neuroscience ◽

10.1155/2022/9743123 ◽

2022 ◽

Vol 2022 ◽

pp. 1-9

Author(s):

Junlong Feng ◽

Jianping Zhao

Keyword(s):

Image Guidance ◽

Semantic Information ◽

Semantic Knowledge ◽

Visual Features ◽

Image Description ◽

Visual Concept ◽

Image Captioning ◽

Proposed Model ◽

Fused Image ◽

Level Training

Recent image captioning models based on the encoder-decoder framework have achieved remarkable success in generating humanlike sentences. However, the explicit separation between encoder and decoder creates a disconnect between the image and the sentence. This usually leads to a coarse image description: the generated caption contains only the main instances and neglects additional objects and scenes, which reduces the consistency between caption and image. To address this issue, we propose an image captioning system with context-fused guidance. It incorporates regional and global image representations as compositional visual features to learn the objects and attributes in images. To integrate image-level semantic information, the visual concept is employed. To avoid misleading decoding, a context fusion gate is introduced to calculate the textual context by selectively aggregating the information of the visual concept and the word embedding. Subsequently, the context-fused image guidance is formulated from the compositional visual features and the textual context; it provides the decoder with informative semantic knowledge. Finally, a captioner with a two-layer LSTM architecture is constructed to generate captions. Moreover, to overcome exposure bias, we train the proposed model through sequence decision-making. Experiments conducted on the MS COCO dataset show the outstanding performance of our work, and the linguistic analysis demonstrates that our model improves the consistency between caption and image.
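The abstract does not give the gate's equations, so the following is only a rough sketch of what "selectively aggregating" a visual concept vector and a word embedding could look like. The function name `context_fusion_gate`, the per-dimension weights, and the convex-combination form are all assumptions made for illustration, not the authors' formulation.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def context_fusion_gate(concept, word, w_concept, w_word, bias):
    """Blend a visual-concept vector and a word embedding, dimension by dimension.

    Assumed (hypothetical) form, not taken from the paper:
        g       = sigmoid(w_concept * c + w_word * e + bias)
        context = g * c + (1 - g) * e
    A gate value near 1 keeps the visual concept; near 0 keeps the word.
    """
    context = []
    for c, e, wc, ww, b in zip(concept, word, w_concept, w_word, bias):
        g = sigmoid(wc * c + ww * e + b)
        context.append(g * c + (1.0 - g) * e)
    return context


# With zero weights and bias the gate is 0.5 everywhere, so the
# textual context is simply the average of the two inputs.
print(context_fusion_gate([1.0, 0.0], [0.0, 1.0], [0, 0], [0, 0], [0, 0]))
```

In a trained model the gate parameters would be learned and would normally be full matrices rather than per-dimension scalars; the scalar form is used here only to keep the sketch dependency-free.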


Histograms of oriented mosaic gradients for snapshot spectral image description

ISPRS Journal of Photogrammetry and Remote Sensing ◽

10.1016/j.isprsjprs.2021.10.018 ◽

2022 ◽

Vol 183 ◽

pp. 79-93

Author(s):

Lulu Chen ◽

Yongqiang Zhao ◽

Jonathan Cheung-Wai Chan ◽

Seong G. Kong

Keyword(s):

Image Description ◽

Spectral Image


Grayscale-inversion and rotation invariant image description using local ternary derivative pattern with dominant structure encoding

Expert Systems with Applications ◽

10.1016/j.eswa.2021.116327 ◽

2021 ◽

pp. 116327

Author(s):

Tiecheng Song ◽

Yuanjing Han ◽

Shuang Li ◽

Chuchu Zhao

Keyword(s):

Image Description ◽

Rotation Invariant ◽

Dominant Structure ◽

Ternary Derivative


Attention-Guided Image Captioning through Word Information

Sensors ◽

10.3390/s21237982 ◽

2021 ◽

Vol 21 (23) ◽

pp. 7982

Author(s):

Ziwei Tang ◽

Yaohua Yi ◽

Hao Sheng

Keyword(s):

State Of The Art ◽

Memory Cell ◽

Competitive Performance ◽

Image Description ◽

Image Captioning ◽

Research Attention ◽

Evaluation Score ◽

Written Descriptions ◽

Word Attention ◽

Current Word

Image captioning generates written descriptions of an image. In recent image captioning research, attention regions seldom cover all objects, and generated captions may lack object details and remain far from reality. In this paper, we propose a word guided attention (WGA) method for image captioning. First, WGA extracts word information from the embedded word and the memory cell by applying transformation and multiplication. Then, WGA applies the word information to the attention results and obtains the attended feature vectors via elementwise multiplication. Finally, we apply WGA with words from different time steps to obtain previous word guided attention (PW) and current word attention (CW) in the decoder. Experiments on the MSCOCO dataset show that the proposed WGA achieves competitive performance against state-of-the-art methods on the Karpathy test split: PW reaches a Bilingual Evaluation Understudy (BLEU-4) score of 39.1 and a Consensus-Based Image Description Evaluation (CIDEr-D) score of 127.6, and CW reaches a BLEU-4 score of 39.1 and a CIDEr-D score of 127.2.
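The two WGA steps described above (derive word information, then multiply it elementwise into the attention result) can be illustrated with a small sketch. The abstract does not specify the transformation, so the `sigmoid(scale * embedding) * tanh(memory_cell)` form, the per-dimension `scale` parameter, and all function names below are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def attend(regions, scores):
    """Standard soft attention: softmax-weighted sum of region feature vectors."""
    weights = softmax(scores)
    dim = len(regions[0])
    return [sum(w * r[d] for w, r in zip(weights, regions)) for d in range(dim)]


def word_information(word_embedding, memory_cell, scale):
    """Hypothetical word-information vector: a transformed embedding
    multiplied by the squashed memory cell (form assumed, not from the paper)."""
    return [
        (1.0 / (1.0 + math.exp(-s * e))) * math.tanh(m)
        for e, m, s in zip(word_embedding, memory_cell, scale)
    ]


def word_guided_attention(regions, scores, word_embedding, memory_cell, scale):
    """Apply word information to the attention result by elementwise multiplication."""
    attended = attend(regions, scores)
    info = word_information(word_embedding, memory_cell, scale)
    return [a * i for a, i in zip(attended, info)]
```

Under this reading, PW and CW would differ only in which time step's word embedding is passed in: the previous word for PW and the current word for CW.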


“Vamos te levar no ponto mais alto daqui”: conhecendo ações da terapia ocupacional com crianças/"We'll take you to the highest point here": knowing occupational therapy actions with children

Revista Interinstitucional Brasileira de Terapia Ocupacional - REVISBRATO ◽

10.47222/2526-3544.rbto44499 ◽

2021 ◽

Vol 5 (4) ◽

pp. 475-483

Author(s):

Marina Di Napoli Pastore

Keyword(s):

Occupational Therapy ◽

Urban Community ◽

Occupational Therapist ◽

The Other ◽

Image Description ◽

Photo Essay

Objective: This photo essay invites reflection on practices with children in the most diverse territories, in constant dialogue with their realities and contexts. Image description: An image of two children in an urban community is presented, along with their dialogue with the occupational therapist, in which they show their spaces of meaning within the territory and lead us to rethink, together with the other images throughout the text, territorial actions and practices with children based on and in dialogue with their realities, as well as the production of images by the children as an appropriation of space. Keywords: Children. Photography. Occupational Therapy.


Formation of a generalized semantic network of concepts

Advanced Information Systems ◽

10.20998/2522-9052.2021.3.04 ◽

2021 ◽

Vol 5 (3) ◽

pp. 22-30

Author(s):

Valerii Onyshchenko ◽

Yana Korolova ◽

Andrii Nosyk

Keyword(s):

Set Theory ◽

Information Visualization ◽

Semantic Information ◽

Semantic Network ◽

Computer Systems ◽

Mathematical Linguistics ◽

Image Description ◽

Basic Principles ◽

The Subject ◽

Linguistic Approach

The subject of the research is the formation of a generalized semantic network of concepts. The purpose of the article is to substantiate the composition and main types of nodes and relationships characteristic of the semantic network of concepts, and to form a generalized semantic network of concepts for structural and linguistic recognition of images in computer systems and networks. Research methods: methods of mathematical logic, mathematical linguistics, and set theory; methods of information visualization using graphs. Results: an approach is proposed for forming a generalized semantic network of structural and linguistic concepts of contour images of objects obtained at different shooting angles; the composition and types of nodes and relations characteristic of the semantic network of concepts are substantiated; and the basic principles for constructing a semantic network of structural and linguistic concepts of recognition objects are formulated. Conclusions: To construct an image description that corresponds to the concept of semantic information processing and can be used in systems for collecting relevant images in computer systems and networks, it is advisable to use a structural-linguistic approach to recognition based on a generalized semantic network of concepts. Using this network in the classification and identification of objects can significantly expand the range of images accepted for consideration, taking into account different shooting directions and different angles of camera deviation from the nadir position.


Attention Feature Network Extraction Combined with the Generation Algorithm of Multimedia Image Description

Advances in Multimedia ◽

10.1155/2021/6484128 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Beibei Sun

Keyword(s):

Calculation Method ◽

Language Model ◽

Image Features ◽

Image Description ◽

Top Down ◽

Generation Algorithm ◽

Shallow Layer ◽

Multimedia Image ◽

Target Association

Shallow-layer image features cannot be fully utilized when an image description is generated, and the target associations in the image cannot be sufficiently captured. To address this, the paper puts forward an attention-based method for generating image descriptions. The proportions of the image features at various depths are assigned autonomously based on the content data of the language model, so that the features used for generation are all attention-weighted image features. In this way, the quality of the generated image descriptions is improved. Tests on the database indicate that the proposed algorithm is more accurate than the top-down multimedia image algorithm generated by a single attention.


Grayscale-inversion and rotation invariant image description with sorted LBP features

Signal Processing Image Communication ◽

10.1016/j.image.2021.116491 ◽

2021 ◽

pp. 116491

Author(s):

Yuanjing Han ◽

Tiecheng Song ◽

Jie Feng ◽

Yurui Xie

Keyword(s):

Image Description ◽

Rotation Invariant


Image Captioning using CNN and LSTM

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.37846 ◽

2021 ◽

Vol 9 (8) ◽

pp. 2666-2669

Author(s):

Anish Banda

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Language Processing ◽

Short Term Memory ◽

Image Description ◽

Image Captioning ◽

Training Images ◽

Long Short Term Memory ◽

Standard Calculation

Abstract: We examine a deep neural network-based image caption generation technique. Given an image as input, the model produces output in three forms: a sentence describing the image in three different languages, an mp3 audio file, and an image file. The model uses techniques from both computer vision and natural language processing. We aim to build a caption generator using a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). The target image is compared with the training images from a large dataset; this is done by the convolutional neural network. The model generates a decent description using the trained data. To extract features from images we need an encoder, for which we use a CNN. To decode the description of the image we use an LSTM. To evaluate the accuracy of the generated caption we use the BLEU metric, which grades the quality of the generated content. Performance is calculated with the standard calculation metrics. Keywords: CNN, RNN, LSTM, BLEU score, encoder, decoder, captions, image description.
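As an illustration of how a BLEU-style score grades a generated caption, here is a minimal single-reference sketch: clipped n-gram precisions combined by a geometric mean, multiplied by a brevity penalty. It is a simplified, unsmoothed version written for this listing (the function names and the `max_n` parameter are chosen here), and it is not the evaluation code used in the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against one reference, without smoothing.

    Returns 0.0 as soon as any n-gram order has no match.
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "a dog runs across the green field".split()
print(bleu("a dog runs across the green field".split(), reference))  # identical caption
print(bleu("a dog runs across the field".split(), reference, max_n=2))
```

Captioning benchmarks usually report corpus-level BLEU with multiple references per image, so scores from this sentence-level sketch are not directly comparable to published numbers.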


Machine Learning based approach to Image Description for the Visually Impaired

10.1109/asiancon51346.2021.9544867 ◽

2021 ◽

Author(s):

Jayavrinda Vrindavanam ◽

Raghunandan Srinath ◽

Anisa Fathima ◽

S. Arpitha ◽

Chaitanya S Rao ◽

...

Keyword(s):

Machine Learning ◽

Visually Impaired ◽

Image Description

