Image Captioning Using Deep Learning

Author(s): Bhavana D., K. Chaitanya Krishna, Tejaswini K., N. Venkata Vikas, A. N. V. Sahithya

The task of an image caption generator is to extract the features and activities of an image and to generate human-readable captions that describe the objects in the image. Describing the contents of an image requires knowledge of both natural language processing and computer vision. The features are extracted using convolutional neural networks, applying transfer learning with the Xception model. Xception stands for "extreme Inception"; its feature-extraction base consists of 36 convolutional layers and gives more accurate results than comparable CNNs. Recurrent neural networks are used to describe the image and generate accurate sentences: the feature vector extracted by the CNN is fed to an LSTM. The network is trained on the Flickr 8k dataset, in which the data is properly labeled. Given an input image, the model generates captions that closely describe the activities in the image. Further, the authors use BLEU scores to validate the model.
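The abstract does not include an implementation, but the pipeline it describes (a pretrained Xception feature extractor whose 2048-dimensional output feeds an LSTM caption decoder) can be sketched roughly as follows. This is a hedged illustration, not the authors' code; the vocabulary size, maximum caption length, and layer widths are placeholder assumptions.

```python
# Sketch of the CNN-encoder / LSTM-decoder captioning pipeline described above.
# Vocabulary size, caption length, and layer widths are illustrative assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications.xception import Xception, preprocess_input

VOCAB_SIZE = 8000   # assumed size of the Flickr 8k caption vocabulary
MAX_LEN = 34        # assumed maximum caption length (in tokens)

# 1) Feature extractor: Xception without its classification head.
#    Global average pooling turns the convolutional base into a 2048-d vector.
cnn = Xception(weights="imagenet", include_top=False, pooling="avg")

def extract_features(image_path):
    img = tf.keras.utils.load_img(image_path, target_size=(299, 299))
    x = preprocess_input(np.expand_dims(tf.keras.utils.img_to_array(img), 0))
    return cnn.predict(x, verbose=0)          # shape (1, 2048)

# 2) Decoder: image feature + partial caption -> next word (merge architecture).
img_in = layers.Input(shape=(2048,))
img_dense = layers.Dense(256, activation="relu")(layers.Dropout(0.5)(img_in))

seq_in = layers.Input(shape=(MAX_LEN,))
seq_emb = layers.Embedding(VOCAB_SIZE, 256, mask_zero=True)(seq_in)
seq_lstm = layers.LSTM(256)(layers.Dropout(0.5)(seq_emb))

merged = layers.add([img_dense, seq_lstm])
out = layers.Dense(VOCAB_SIZE, activation="softmax")(
    layers.Dense(256, activation="relu")(merged))

caption_model = Model([img_in, seq_in], out)
caption_model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```

At inference time, a caption would be generated token by token, feeding each predicted word back into the sequence input until an end token or MAX_LEN is reached; the generated captions can then be scored against the reference captions with BLEU.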

2019, Vol. 3 (2), pp. 31-40
Author(s): Ahmed Shamsaldin, Polla Fattah, Tarik Rashid, Nawzad Al-Salihi

At present, deep learning is widely used in a broad range of areas. Convolutional neural networks (CNNs) have become the star of deep learning, as they give the best and most precise results when solving real-world problems. In this work, a brief description of the applications of CNNs in two areas is presented: first, computer vision in general, that is, scene labeling, face recognition, action recognition, and image classification; second, natural language processing, that is, the fields of speech recognition and text classification.


2019
Author(s): Antônio Franco, Leonardo Oliveira

Currently, there are several approaches to providing anonymity on the Internet. However, anonymous users can still be identified through their writing style. With advances in neural network and natural language processing research, classifiers are becoming increasingly successful at accurately identifying the author of a text. On the other hand, new approaches that use recurrent neural networks to automatically generate obfuscated texts have also arisen to fight anonymity adversaries. In this work, we evaluate two approaches that use neural networks to generate obfuscated texts. In our experiments, we compared how effectively each technique removes the stylistic attributes of a text while preserving its original semantics. Our results show a trade-off between the obfuscation level and the text semantics.


2019, Vol. 27 (3), pp. 457-470
Author(s): Stephen Wu, Kirk Roberts, Surabhi Datta, Jingcheng Du, Zongcheng Ji, et al.

Abstract. Objective: This article methodically reviews the literature on deep learning (DL) for natural language processing (NLP) in the clinical domain, providing quantitative analysis to answer 3 research questions concerning methods, scope, and context of current research. Materials and Methods: We searched MEDLINE, EMBASE, Scopus, the Association for Computing Machinery Digital Library, and the Association for Computational Linguistics Anthology for articles using DL-based approaches to NLP problems in electronic health records. After screening 1,737 articles, we collected data on 25 variables across 212 papers. Results: DL in clinical NLP publications more than doubled each year, through 2018. Recurrent neural networks (60.8%) and word2vec embeddings (74.1%) were the most popular methods; the information extraction tasks of text classification, named entity recognition, and relation extraction were dominant (89.2%). However, there was a “long tail” of other methods and specific tasks. Most contributions were methodological variants or applications, but 20.8% were new methods of some kind. The earliest adopters were in the NLP community, but the medical informatics community was the most prolific. Discussion: Our analysis shows growing acceptance of deep learning as a baseline for NLP research, and of DL-based NLP in the medical community. A number of common associations were substantiated (eg, the preference of recurrent neural networks for sequence-labeling named entity recognition), while others were surprisingly nuanced (eg, the scarcity of French language clinical NLP with deep learning). Conclusion: Deep learning has not yet fully penetrated clinical NLP and is growing rapidly. This review highlighted both the popular and unique trends in this active field.


It is always beneficial to reassess previous work in order to create interest in and develop understanding of a subject. In computer vision, the tasks of feature extraction, classification, and segmentation, as well as the measurement and assessment of image structures (medical images, natural images, etc.), must be performed very efficiently. Numerous image processing techniques are available, but noise and other variable artifacts make these tasks difficult. Various deep machine learning algorithms are used to perform the complex tasks of recognition and computer vision. Recently, convolutional neural networks (CNNs), the backbone of numerous deep learning algorithms, have shown state-of-the-art performance in high-level computer vision tasks such as object detection, object recognition, classification, machine translation, semantic segmentation, speech recognition, scene labeling, medical imaging, robotics and control, natural language processing (NLP), bioinformatics, cybersecurity, and many others. Convolutional neural networks combine mathematics and computer science, with an icing of biology on top. CNNs work in two parts: the first is the mathematics that supports feature extraction, and the second is classification and prediction at the pixel level (a minimal sketch of this two-part structure is given below). This review is intended for those who want to gain complete knowledge of CNNs and their development from early systems to the modern state of the art in deep learning. The review is organized in three steps: the first step introduces the concept along with the necessary background information; the second step explains other highlights and related work proposed by various authors; the third step presents the complete layer-wise architecture of convolutional networks. The last section provides a detailed discussion of improvements to and challenges of these deep learning techniques. Most papers considered for this review were published after 2012, when the modern history of convolutional neural networks and deep learning begins.
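As a minimal sketch of the two-part structure mentioned above (a convolutional feature-extraction base followed by a classification head), consider the following Keras example. The input size, filter counts, and number of classes are arbitrary assumptions chosen purely for illustration.

```python
# Minimal CNN illustrating the two-part structure: a convolutional
# feature-extraction base followed by a classification head.
# Input size, filter counts, and class count are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    # Part 1: feature extraction (convolution + pooling)
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu", padding="same"),

    # Part 2: classification / prediction
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),   # 10 classes assumed
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Replacing the pooling-and-dense head with a pixel-wise decoder would turn the same convolutional base into a segmentation-style model, which is the "prediction at pixel level" the review refers to.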


2020
Author(s): Kyle Mahowald, George Kachergis, Michael C. Frank

Ambridge (2019) calls for exemplar-based accounts of language acquisition. Do modern neural networks such as transformers or word2vec – which have been extremely successful in modern natural language processing (NLP) applications – count? Although these models often have ample parametric complexity to store exemplars from their training data, they also go far beyond simple storage by processing and compressing the input via their architectural constraints. The resulting representations have been shown to encode emergent abstractions. If these models are exemplar-based then Ambridge’s theory only weakly constrains future work. On the other hand, if these systems are not exemplar models, why is it that true exemplar models are not contenders in modern NLP?


2021, Vol. 17 (9), e1009345
Author(s): Zhengqiao Zhao, Stephen Woloszynek, Felix Agbavor, Joshua Chang Mell, Bahrad A. Sokhansanj, et al.

Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short- and long-term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, at the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which characterize the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. A trained Read2Pheno model encodes sequences (reads) into dense, meaningful representations: learned embedding vectors output from an intermediate layer of the network, which can provide biological insight when visualized. The attention layer of a Read2Pheno model can also automatically identify nucleotide regions in reads/sequences that are particularly informative for classification. As such, this novel approach can avoid the pre/post-processing and manual interpretation required by conventional approaches to microbiome sequence classification. We further show, as proof of concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a Python package) and https://github.com/EESI/seq2att (a command line tool).
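The authors' implementation lives in the linked repositories; purely as a hedged sketch (not the Read2Pheno code), a convolution + recurrent + attention classifier over one-hot-encoded reads of the kind described could look like the following. The read length, layer sizes, and number of output classes are placeholder assumptions.

```python
# Rough sketch of a CNN + BiLSTM + attention classifier for one-hot DNA reads.
# This is NOT the authors' Read2Pheno code; read length, layer sizes, and the
# number of output classes are placeholder assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

READ_LEN = 150      # assumed read length (bp)
NUM_CLASSES = 20    # assumed number of taxa or phenotype classes

inputs = layers.Input(shape=(READ_LEN, 4))            # one-hot A/C/G/T
x = layers.Conv1D(64, 7, padding="same", activation="relu")(inputs)
x = layers.MaxPooling1D(2)(x)                          # local k-mer-like features
h = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)

# Additive-style attention: one score per position, softmax-normalized,
# then a weighted sum of the recurrent states gives a read-level embedding.
scores = layers.Dense(1)(h)                            # (batch, T, 1)
weights = layers.Softmax(axis=1)(scores)               # attention over positions
context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([weights, h])

embedding = layers.Dense(64, activation="relu")(context)   # visualizable vector
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(embedding)

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Inspecting the per-position attention weights after training shows which nucleotide regions drove a given prediction, mirroring the interpretability the abstract describes, while the intermediate embedding vector can be visualized (eg, with t-SNE) for biological insight.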


2021
Author(s): Weihao Zhuang, Tristan Hascoet, Xunquan Chen, Ryoichi Takashima, Tetsuya Takiguchi, et al.

Abstract. Currently, deep learning plays an indispensable role in many fields, including computer vision, natural language processing, and speech recognition. Convolutional neural networks (CNNs) have demonstrated excellent performance in computer vision tasks thanks to their powerful feature extraction capability. However, because larger models tend to show higher accuracy, recent developments have led to state-of-the-art CNN models with ever-increasing resource consumption. This paper investigates a conceptual approach to reducing the memory consumption of CNN inference. Our method processes the input image as a sequence of carefully designed tiles within the lower subnetwork of the CNN, so as to minimize its peak memory consumption while keeping the end-to-end computation unchanged. This method introduces a trade-off between memory consumption and computation, which is particularly suitable for high-resolution inputs. Our experimental results show that the memory consumption of MobileNetV2 can be reduced by up to 5.3 times with the proposed method. For ResNet50, one of the most commonly used CNN models in computer vision tasks, memory can be reduced by up to 2.3 times.
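As a hedged illustration of the general idea (running the lower convolutional layers tile by tile so that only a slice of the large early activations is alive at any moment), the sketch below splits the input along the height axis with just enough overlap to keep the result identical for a small stack of stride-1, valid-padding convolutions. This is not the paper's implementation; the two-layer lower subnetwork, tile count, and input size are arbitrary, and real architectures need their own halo/padding handling.

```python
# Illustration of tiled inference through the lower layers of a CNN to cut
# peak activation memory. The two-layer "lower subnetwork", tile count, and
# input size are arbitrary assumptions; this is not the paper's implementation.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Lower subnetwork: two 3x3 stride-1 convolutions with 'valid' padding,
# so each output row depends on exactly 5 consecutive input rows (halo = 4).
lower = models.Sequential([
    layers.Input(shape=(None, None, 3)),
    layers.Conv2D(16, 3, padding="valid", activation="relu"),
    layers.Conv2D(32, 3, padding="valid", activation="relu"),
])
HALO = 4  # total reduction in height across the two valid convolutions

def tiled_lower(x, num_tiles=4):
    """Run `lower` on horizontal strips of x and stitch the outputs back.

    Only one strip's activations are held at a time, which lowers peak memory
    at the cost of recomputing a few boundary rows per tile.
    """
    out_h = x.shape[1] - HALO
    bounds = np.linspace(0, out_h, num_tiles + 1, dtype=int)
    pieces = []
    for a, b in zip(bounds[:-1], bounds[1:]):
        # Output rows [a, b) depend on input rows [a, b + HALO).
        pieces.append(lower(x[:, a:b + HALO, :, :]))
    return tf.concat(pieces, axis=1)

x = tf.random.normal([1, 512, 512, 3])
full = lower(x)
tiled = tiled_lower(x, num_tiles=4)
print(np.allclose(full.numpy(), tiled.numpy(), atol=1e-5))  # True (up to numerics)
```

The memory saving comes from never materializing the full-height activation maps of the lower layers at once; the price is recomputing a few halo rows at each tile boundary, which matches the memory/computation trade-off the abstract describes.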


2020, Vol. 2020, pp. 1-13
Author(s): Haoran Wang, Yue Zhang, Xiaosheng Yu

In recent years, with the rapid development of artificial intelligence, image captioning has gradually attracted the attention of many researchers in the field and has become an interesting and arduous task. Image captioning, the automatic generation of natural language descriptions of the content observed in an image, is an important part of scene understanding and combines knowledge of computer vision and natural language processing. Its applications are extensive and significant, for example in human-computer interaction. This paper summarizes the related methods and focuses on the attention mechanism, which plays an important role in computer vision and has recently been widely used in image caption generation tasks. Furthermore, the advantages and shortcomings of these methods are discussed, and the commonly used datasets and evaluation criteria in this field are provided. Finally, the paper highlights some open challenges in the image caption task.

