A Multi-instance Multi-label Dual Learning Approach for Video Captioning

Author(s):  
Wanting Ji ◽  
Ruili Wang

Video captioning is a challenging task in multimedia processing that aims to generate informative natural language descriptions (captions) of video content. Previous video captioning approaches mainly focused on capturing the visual information in videos, using an encoder-decoder structure to generate captions. Recently, a new encoder-decoder-reconstructor structure was proposed for video captioning that captures the information in both videos and captions. Building on this structure, this article proposes a novel multi-instance multi-label dual learning approach (MIMLDL) for generating video captions. Specifically, MIMLDL contains two modules: a caption generation module and a video reconstruction module. The caption generation module uses a lexical fully convolutional neural network (Lexical FCN) with a weakly supervised multi-instance multi-label learning mechanism to learn a translatable mapping between video regions and lexical labels, from which it generates video captions. The video reconstruction module then synthesizes visual sequences from the outputs of the caption generation module to reproduce the raw videos. A dual learning mechanism fine-tunes the two modules according to the gap between the raw and the reproduced videos. Our approach thus reduces the semantic gap between raw videos and the generated captions by minimizing the differences between the reproduced and the raw visual sequences. Experimental results on a benchmark dataset demonstrate that MIMLDL can improve the accuracy of video captioning.
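The dual learning objective described in this abstract can be pictured as a captioning loss plus a weighted penalty on the gap between raw and reproduced visual sequences. The sketch below is only an illustration under stated assumptions: the abstract does not give the loss, so the mean-squared-error reconstruction term, the weight `lam`, and both function names are hypothetical and not the authors' implementation.

```python
def reconstruction_loss(raw_frames, reproduced_frames):
    """Mean squared error between raw and reproduced per-frame feature vectors.

    Assumption: each video is a list of equal-length feature vectors.
    """
    if len(raw_frames) != len(reproduced_frames):
        raise ValueError("sequences must have the same number of frames")
    total, count = 0.0, 0
    for raw, rep in zip(raw_frames, reproduced_frames):
        for r, p in zip(raw, rep):
            total += (r - p) ** 2
            count += 1
    if count == 0:
        raise ValueError("sequences must not be empty")
    return total / count


def dual_learning_loss(caption_loss, raw_frames, reproduced_frames, lam=0.5):
    """Combine the captioning loss with the weighted reconstruction gap.

    `lam` is a hypothetical trade-off weight, not a value from the paper.
    """
    return caption_loss + lam * reconstruction_loss(raw_frames, reproduced_frames)
```

With this form, a caption that allows the reconstructor to reproduce the raw sequence closely adds little to the total, while a caption that loses visual information is penalized through the second term.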


2021 ◽  
Vol 30 ◽  
pp. 2826-2836 ◽  
Author(s):  
Yifeng Ding ◽  
Zhanyu Ma ◽  
Shaoguo Wen ◽  
Jiyang Xie ◽  
Dongliang Chang ◽  
...  


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Siyuan Zhao ◽  
Zhiwei Xu ◽  
Limin Liu ◽  
Mengjie Guo ◽  
Jing Yun

Convolutional neural networks (CNNs) have revolutionized the field of natural language processing; they are highly efficient at the semantic analysis that underlies difficult natural language processing problems in a variety of domains. Deceptive opinion detection is an important application of existing CNN models. Detection mechanisms based on CNN models have better self-adaptability and can effectively identify all kinds of deceptive opinions. Online opinions are quite short and vary in type and content. To identify deceptive opinions effectively, we need to study their characteristics comprehensively and explore novel characteristics beyond the textual semantics and emotional polarity that have been widely used in text analysis. In this paper, we optimize the convolutional neural network model by embedding word order characteristics in its convolution layer and pooling layer, which makes the convolutional neural network more suitable for short text classification and deceptive opinion detection. TensorFlow-based experiments demonstrate that the proposed detection mechanism achieves more accurate deceptive opinion detection results.
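To make the role of word order concrete, the toy sketch below shows a one-dimensional convolution over word vectors followed by max-over-time pooling. It is not the model from this paper: the two-dimensional embeddings in `EMBED`, the hand-set filter `BIGRAM_NOT_GOOD`, and all function names are hypothetical and chosen only so the arithmetic is easy to follow.

```python
# Hypothetical 2-dimensional word vectors, for illustration only.
EMBED = {"not": [1.0, 0.0], "good": [0.0, 1.0], "really": [0.5, 0.5]}

# Hand-set bigram filter that responds most strongly to "not" followed by "good".
BIGRAM_NOT_GOOD = [[1.0, 0.0], [0.0, 1.0]]


def encode(tokens):
    """Map tokens to their (hypothetical) embedding vectors."""
    return [EMBED[t] for t in tokens]


def conv1d_text(embeddings, kernel):
    """Slide a kernel spanning len(kernel) words over the sequence."""
    width = len(kernel)
    outputs = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        score = sum(
            w * x
            for k_vec, e_vec in zip(kernel, window)
            for w, x in zip(k_vec, e_vec)
        )
        outputs.append(score)
    return outputs


def max_over_time(feature_map):
    """Keep only the strongest filter response, discarding its position."""
    return max(feature_map)


original = conv1d_text(encode(["really", "not", "good"]), BIGRAM_NOT_GOOD)
reordered = conv1d_text(encode(["good", "not", "really"]), BIGRAM_NOT_GOOD)
```

The two inputs contain the same words, yet the feature maps differ because the filter sees the words in a different order. Max-over-time pooling then reduces each map to a single value and drops where the response occurred, which is one plausible motivation for making the pooling layer order-aware as well.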



2017 ◽  
Vol 2017 (13) ◽  
pp. 1847-1850 ◽  
Author(s):  
Bendong Tan ◽  
Jun Yang ◽  
Xueli Pan ◽  
Jun Li ◽  
Peiyuan Xie ◽  
...  




2021 ◽  
pp. 20201263
Author(s):  
Mohammad Salehi ◽  
Reza Mohammadi ◽  
Hamed Ghaffari ◽  
Nahid Sadighi ◽  
Reza Reiazi

Objective: Pneumonia is a lung infection that causes inflammation of the small air sacs (alveoli) in one or both lungs. Accurate and rapid diagnosis of pneumonia at an early stage is imperative for optimal patient care. Currently, chest X-ray is considered the best imaging modality for diagnosing pneumonia. However, the interpretation of chest X-ray images is challenging. To this end, we aimed to use an automated convolutional neural network-based transfer-learning approach to detect pneumonia in paediatric chest radiographs. Methods: An automated convolutional neural network-based transfer-learning approach using four different pre-trained models (i.e. VGG19, DenseNet121, Xception, and ResNet50) was applied to detect pneumonia in chest X-ray images of children (1–5 years). The performance of the proposed models on the testing data set was evaluated using five performance metrics: accuracy, sensitivity/recall, precision, area under curve, and F1 score. Results: All proposed models provide accuracy greater than 83.0% for binary classification. The pre-trained DenseNet121 model provides the highest performance for automated pneumonia classification, with 86.8% accuracy, followed by the Xception model with an accuracy of 86.0%. The sensitivity of the proposed models was greater than 91.0%. The Xception and DenseNet121 models achieve the highest classification performance, with F1 scores greater than 89.0%. The areas under the receiver operating characteristic curve for the VGG19, Xception, ResNet50, and DenseNet121 models are 0.78, 0.81, 0.81, and 0.86, respectively. Conclusion: Our data showed that the proposed models achieve high accuracy for binary classification. Transfer learning was used to accelerate training of the proposed models and to resolve the problem associated with insufficient data. We hope that these proposed models can help radiologists diagnose pneumonia quickly in radiology departments. Moreover, our proposed models may be useful for detecting other chest-related diseases such as novel Coronavirus 2019. Advances in knowledge: We used transfer learning as a machine learning approach to accelerate training of the proposed models and to resolve the problem associated with insufficient data. Our proposed models achieved accuracy greater than 83.0% for binary classification.
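Four of the metrics named in this abstract (accuracy, sensitivity/recall, precision, and F1 score) follow directly from a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the counts in the usage line are hypothetical and are not data from the study.

```python
def _safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts.

    tp/fn: pneumonia cases detected / missed.
    tn/fp: normal cases correctly cleared / wrongly flagged.
    """
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    sensitivity = _safe_div(tp, tp + fn)   # also called recall
    precision = _safe_div(tp, tp + fp)
    f1 = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "precision": precision,
        "f1": f1,
    }


# Illustrative counts only, not taken from the paper.
example = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

Because F1 is the harmonic mean of precision and sensitivity, a model can report high sensitivity together with lower accuracy when it flags many normal images as pneumonia, which is consistent with the pattern of figures reported above.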




