A thorough review of models, evaluation metrics, and datasets on image captioning

2021 ◽  Author(s): Gaifang Luo, Lijun Cheng, Chao Jing, Can Zhao, Guozhu Song
2019 ◽  Vol 8 (2S11) ◽  pp. 3290-3293

Image description involves generating a textual description of an image, which is essential for the problem of image understanding. The variable and ambiguous nature of possible image descriptions makes this task challenging. Different approaches to automated image captioning explain the image contents with a complete understanding of the image, rather than simply classifying it into a particular object type. However, learning image context from text and generating descriptions similar to those written by humans requires focusing on the important features of the image using an attention mechanism. We provide an outline of recent work on image description models employing various attention mechanisms. We present an analysis of the approaches, datasets, and evaluation metrics used for image description. We showcase a model using the encoder-decoder attention mechanism, trained on the Flickr dataset, and evaluate its performance using the BLEU metric.
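The BLEU evaluation mentioned above scores a generated caption by its n-gram overlap with reference captions. The sketch below shows the core computation (clipped n-gram precision plus a brevity penalty); it is an illustrative, unsmoothed version, not the exact NLTK or COCO-caption implementation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU for one tokenized caption against reference
    tokenizations: geometric mean of clipped n-gram precisions times a
    brevity penalty. Short captions with no matching 4-gram score 0."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty against the reference length closest to the candidate.
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(log_avg)
```

An identical candidate and reference score 1.0; partial n-gram overlap yields a score strictly between 0 and 1.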


2020 ◽  Vol 68 ◽  pp. 661-689
Author(s): Omid Mohamad Nezami, Mark Dras, Stephen Wan, Cecile Paris

Benefiting from advances in machine vision and natural language processing techniques, current image captioning systems are able to generate detailed visual descriptions. For the most part, these descriptions represent an objective characterisation of the image, although some models do incorporate subjective aspects related to the observer’s view of the image, such as sentiment; current models, however, usually do not consider the emotional content of images during the caption generation process. This paper addresses this issue by proposing novel image captioning models that use facial expression features to generate image captions. The models generate captions using long short-term memory networks, applying facial features in addition to other visual features at different time steps. We compare a comprehensive collection of image captioning models, with and without facial features, using all standard evaluation metrics on an image caption dataset extracted from the standard Flickr 30K dataset and consisting of around 11K images containing faces. The evaluation metrics indicate that applying facial features with an attention mechanism achieves the best performance, yielding more expressive and more correlated image captions. An analysis of the generated captions finds that, perhaps unexpectedly, the improvement in caption quality appears to come not from the addition of adjectives linked to emotional aspects of the images, but from more variety in the actions described in the captions.
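The finding about variety in described actions can be quantified with a simple diversity statistic. The distinct-n measure below (unique n-grams divided by total n-grams across a set of generated captions) is an illustrative stand-in for that kind of analysis, not the authors' exact procedure:

```python
def distinct_n(captions, n=1):
    """Distinct-n over a set of generated captions: the fraction of
    n-grams that are unique. Higher values indicate more varied wording
    (e.g. a wider range of action verbs across captions)."""
    all_ngrams = []
    for cap in captions:
        toks = cap.lower().split()
        all_ngrams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)
```

For example, the pair "a man smiles" / "a man laughs" repeats "a" and "man" but varies the verb, giving distinct-1 of 4/6.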


Author(s): Chen Chen, Shuai Mu, Wanpeng Xiao, Zexiong Ye, Liesi Wu, ...

In this paper, we propose a novel conditional-generative-adversarial-nets-based image captioning framework as an extension of the traditional reinforcement-learning (RL)-based encoder-decoder architecture. To deal with the inconsistent evaluation problem among different objective language metrics, we design “discriminator” networks that automatically and progressively determine whether a generated caption is human-described or machine-generated. Two kinds of discriminator architectures (CNN- and RNN-based structures) are introduced, since each has its own advantages. The proposed algorithm is generic, so it can enhance any existing RL-based image captioning framework, and we show that the conventional RL training method is a special case of our approach. Empirically, we show consistent improvements over all language evaluation metrics for different state-of-the-art image captioning models. In addition, the well-trained discriminators can also be viewed as objective image captioning evaluators.
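The training signal described above can be sketched as a REINFORCE-style objective whose sequence-level reward blends the discriminator's "human-likeness" probability with an objective language metric score. The mixing weight `lam`, the baseline, and the function name are illustrative assumptions, not the paper's exact formulation:

```python
def pg_loss(token_logprobs, disc_score, metric_score, baseline, lam=0.5):
    """Policy-gradient loss for one sampled caption.

    token_logprobs: log pi(w_t | w_<t, image) for each generated token.
    disc_score:     discriminator's probability the caption is human-written.
    metric_score:   score from a language metric such as CIDEr or BLEU.
    baseline:       variance-reduction baseline (e.g. greedy-decode reward,
                    as in self-critical sequence training).
    """
    # Sequence reward mixes the adversarial and metric signals.
    reward = lam * disc_score + (1 - lam) * metric_score
    advantage = reward - baseline
    # REINFORCE: minimize -advantage * sum of token log-probabilities.
    return -advantage * sum(token_logprobs)
```

With `lam = 0` this collapses to metric-only RL training, which matches the paper's observation that conventional RL is a special case of the adversarial setup.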


2019 ◽  Vol 31 (7) ◽  pp. 1122
Author(s): Fan Lyu, Fuyuan Hu, Yanning Zhang, Zhenping Xia, Victor S. Sheng

2020 ◽  Author(s): Abdulrahman Takiddin, Jens Schneider, Yin Yang, Alaa Abd-Alrazaq, Mowafa Househ

BACKGROUND Skin cancer is the most common cancer type affecting humans. Traditional skin cancer diagnosis methods are costly, require a professional physician, and take time. Hence, to aid in diagnosing skin cancer, artificial intelligence (AI) tools are being used, including shallow and deep machine-learning-based techniques trained to detect and classify skin cancer using computer algorithms and deep neural networks. OBJECTIVE The aim of this study is to identify and group the different types of AI-based technologies used to detect and classify skin cancer. The study also examines the reliability of the selected papers by studying the correlation of dataset size and number of diagnostic classes with the performance metrics used to evaluate the models. METHODS We conducted a systematic search for articles in the IEEE Xplore, ACM DL, and Ovid MEDLINE databases, following the PRISMA Extension for Scoping Reviews (PRISMA-ScR) guidelines. Studies included in this scoping review had to fulfill several selection criteria: be specifically about skin cancer, detect or classify skin cancer, and use AI technologies. Study selection and data extraction were conducted by two reviewers independently. Extracted data were synthesized narratively, with studies grouped by diagnostic AI technique and evaluation metrics. RESULTS We retrieved 906 papers from the 3 databases, of which 53 studies were eligible for this review. Shallow techniques were used in 14 studies, and deep techniques in 39 studies. The studies used accuracy (n=43/53), the area under the receiver operating characteristic curve (n=5/53), sensitivity (n=3/53), and F1-score (n=2/53) to assess the proposed models. Studies that used smaller datasets and fewer diagnostic classes tended to report higher accuracy scores. CONCLUSIONS The adoption of AI in the medical field facilitates the diagnosis of skin cancer. However, the reliability of most AI tools is questionable, since small datasets or low numbers of diagnostic classes are used. In addition, direct comparison between methods is hindered by the varied use of different evaluation metrics and image types.
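The performance metrics tallied in the results section all derive from a classifier's confusion matrix. A minimal sketch of accuracy, sensitivity, and F1 for the binary case (AUC is omitted because it requires ranked prediction scores rather than counts):

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity (recall), and F1-score from a binary
    confusion matrix: true/false positives and false/true negatives."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall on positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}
```

Note that with a small test set or few classes, all three numbers can look high even for a weak model, which is the reliability concern the review raises.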

