A Multi-task Learning Approach for Image Captioning

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/168 ◽

2018 ◽

Cited By ~ 10

Author(s):

Wei Zhao ◽

Benyou Wang ◽

Jianbo Ye ◽

Min Yang ◽

Zhou Zhao ◽

...

Keyword(s):

Object Classification ◽

Classification Model ◽

Learning Approach ◽

Generation Model ◽

Image Captioning ◽

Generation Task ◽

Task Learning ◽

Image Representations ◽

Multiple Domains ◽

Image Descriptions

In this paper, we propose a Multi-task Learning Approach for Image Captioning (MLAIC), motivated by the fact that humans have no difficulty performing such a task because they possess capabilities across multiple domains. Specifically, MLAIC consists of three key components: (i) a multi-object classification model that learns rich category-aware image representations using a CNN image encoder; (ii) a syntax generation model that learns a better syntax-aware LSTM-based decoder; (iii) an image captioning model that generates image descriptions in text, sharing its CNN encoder and LSTM decoder with the object classification task and the syntax generation task, respectively. In particular, the image captioning model can benefit from the additional object categorization and syntax knowledge. To verify the effectiveness of our approach, we conduct extensive experiments on the MS-COCO dataset. The experimental results demonstrate that our model achieves impressive results compared to other strong competitors.
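A common way to train tasks that share an encoder and a decoder is to minimize a weighted sum of the per-task losses. The sketch below illustrates that idea only; the abstract does not state the actual objective, so the function names, the weights `w_object` and `w_syntax`, and the toy probabilities are all assumptions rather than the authors' method.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target class under a probability vector."""
    return -math.log(probs[target_index])


def multi_label_bce(probs, labels):
    """Mean binary cross-entropy over independent object categories."""
    terms = [
        -(y * math.log(p) + (1 - y) * math.log(1 - p))
        for p, y in zip(probs, labels)
    ]
    return sum(terms) / len(terms)


def joint_loss(caption_loss, object_loss, syntax_loss, w_object=0.5, w_syntax=0.5):
    """Weighted sum of the three task losses (weights are hypothetical)."""
    return caption_loss + w_object * object_loss + w_syntax * syntax_loss


# Toy values standing in for model outputs (hypothetical numbers).
caption_loss = cross_entropy([0.1, 0.7, 0.2], 1)           # next-word prediction
object_loss = multi_label_bce([0.9, 0.2, 0.6], [1, 0, 1])  # multi-object labels
syntax_loss = cross_entropy([0.6, 0.4], 0)                 # syntax-tag prediction
total = joint_loss(caption_loss, object_loss, syntax_loss)
```

Because all three terms would depend on shared parameters in a real model, gradients from the auxiliary tasks would also update the encoder and decoder used for captioning.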

A Multi-Task Learning Approach for Answer Selection: A Study and a Chinese Law Dataset

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019935 ◽

2019 ◽

Vol 33 ◽

pp. 9935-9936

Author(s):

Wenyu Du ◽

Baocheng Li ◽

Min Yang ◽

Qiang Qu ◽

Ying Shen

Keyword(s):

Question Answering ◽

Selection Model ◽

Classification Model ◽

Learning Approach ◽

Selection System ◽

Document Representation ◽

High Quality ◽

Chinese Law ◽

Task Learning ◽

Multiple Domains

In this paper, we propose a Multi-Task learning approach for Answer Selection (MTAS), motivated by the fact that humans have no difficulty performing such a task because they possess capabilities across multiple domains (tasks). Specifically, MTAS consists of two key components: (i) a category classification model that learns rich category-aware document representations; (ii) an answer selection model that provides the matching scores of question-answer pairs. These two tasks work on a shared document encoding layer, and they cooperate to learn a high-quality answer selection system. In addition, a multi-head attention mechanism is proposed to learn important information from different representation subspaces at different positions. We manually annotate the first Chinese question answering dataset in the law domain (denoted as LawQA) to evaluate the effectiveness of our model. The experimental results show that our model MTAS consistently outperforms the compared methods.

A Multi-Task Learning Framework for Abstractive Text Summarization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019987 ◽

2019 ◽

Vol 33 ◽

pp. 9987-9988 ◽

Cited By ~ 1

Author(s):

Yao Lu ◽

Linqing Liu ◽

Zhile Jiang ◽

Min Yang ◽

Randy Goebel

Keyword(s):

Text Categorization ◽

Text Summarization ◽

Significant Benefit ◽

Experimental Results ◽

Learning Approach ◽

Learning Framework ◽

Task Learning ◽

Multiple Domains ◽

Categorization Model ◽

Three Components

We propose a Multi-task learning approach for Abstractive Text Summarization (MATS), motivated by the fact that humans have no difficulty performing such a task because they have capabilities across multiple domains. Specifically, MATS consists of three components: (i) a text categorization model that learns rich category-specific text representations using a bi-LSTM encoder; (ii) a syntax labeling model that learns to improve the syntax-aware LSTM decoder; and (iii) an abstractive text summarization model that shares its encoder and decoder with the text categorization and the syntax labeling tasks, respectively. In particular, the abstractive text summarization model benefits significantly from the additional text categorization and syntax knowledge. Our experimental results show that MATS outperforms the competitors.

Understanding Natural Disaster Scenes from Mobile Images Using Deep Learning

Applied Sciences ◽

10.3390/app11093952 ◽

2021 ◽

Vol 11 (9) ◽

pp. 3952

Author(s):

Shimin Tang ◽

Zhiqiang Chen

Keyword(s):

Deep Learning ◽

Natural Disaster ◽

Scene Understanding ◽

Computing Methods ◽

Classification Model ◽

Learning Approach ◽

Learning Models ◽

Damage Level ◽

Feature Extractor ◽

Mobile Imaging

With the ubiquitous use of mobile imaging devices, collecting perishable disaster-scene data has become unprecedentedly easy. However, computing methods are unable to understand these images because of their significant complexity and uncertainties. In this paper, the authors investigate the problem of disaster-scene understanding through a deep-learning approach. Two image attributes are considered: hazard type and damage level. Three deep-learning models are trained, and their performance is assessed. Specifically, the best model for hazard-type prediction has an overall accuracy (OA) of 90.1%, and the best damage-level classification model has an explainable OA of 62.6%; both models adopt the Faster R-CNN architecture with a ResNet50 network as the feature extractor. It is concluded that hazard types are more identifiable than damage levels in disaster-scene images. Further insights include that damage-level recognition suffers more from inter- and intra-class variations, and that the treatment of hazard-agnostic damage leveling further contributes to the underlying uncertainties.
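Overall accuracy is conventionally the fraction of samples whose predicted label matches the true label. The sketch below computes it alongside a per-class breakdown, which is one simple way to see where inter-class differences arise. It assumes this standard definition (the abstract does not spell out how its "explainable OA" is computed), and the function names are hypothetical.

```python
from collections import Counter


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted label equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_class_accuracy(true_labels, predicted_labels):
    """For each true class, the fraction of its samples predicted correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / count for label, count in totals.items()}
```

For example, `overall_accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"])` counts three matches out of four samples.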

A Multi-Task Learning Approach to Personalized Progression Modeling

2020 IEEE International Conference on Healthcare Informatics (ICHI) ◽

10.1109/ichi48887.2020.9374391 ◽

2020 ◽

Author(s):

Mohamed Ghalwash ◽

Daby Dow

Keyword(s):

Learning Approach ◽

Task Learning

Robust License Plate Signatures Matching Based on Multi-Task Learning Approach

Neurocomputing ◽

10.1016/j.neucom.2020.12.102 ◽

2021 ◽

Author(s):

Abul Hasnat ◽

Amir Nakib

Keyword(s):

License Plate ◽

Learning Approach ◽

Task Learning

MulCode: A Multi-task Learning Approach for Source Code Understanding

2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) ◽

10.1109/saner50967.2021.00014 ◽

2021 ◽

Author(s):

Deze Wang ◽

Yue Yu ◽

Shanshan Li ◽

Wei Dong ◽

Ji Wang ◽

...

Keyword(s):

Source Code ◽

Learning Approach ◽

Task Learning ◽

Code Understanding

SuperClass: A Deep Duo-Task Learning Approach to Improving QoS in Image-driven Smart Urban Sensing Applications

10.1109/iwqos52092.2021.9521332 ◽

2021 ◽

Author(s):

Yang Zhang ◽

Ruohan Zong ◽

Lanyu Shang ◽

Md Tahmid Rashid ◽

Dong Wang

Keyword(s):

Learning Approach ◽

Task Learning ◽

Sensing Applications ◽

Urban Sensing

A Deep Learning Approach For Bangla Image Captioning System

10.1109/icievicivpr52578.2021.9564129 ◽

2021 ◽

Author(s):

Toshiba Kamruzzaman ◽

Soomanib Kamruzzaman ◽

Abir Zaman

Keyword(s):

Deep Learning ◽

Learning Approach ◽

Image Captioning

Learning Long- and Short-Term User Literal-Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6503 ◽

2020 ◽

Vol 34 (05) ◽

pp. 9571-9578 ◽

Cited By ~ 1

Author(s):

Wei Zhang ◽

Yue Ying ◽

Pan Lu ◽

Hongyuan Zha

Keyword(s):

State Of The Art ◽

Natural Extension ◽

Target Image ◽

Short Term ◽

Image Representations ◽

High Level ◽

Image Descriptions ◽

Shed Light ◽

Image Caption

Personalized image caption, a natural extension of the standard image caption task, requires generating brief image descriptions tailored to users' writing styles and traits, and is more practical for meeting users' real demands. Only a few recent studies shed light on this crucial task, and they learn static user representations to capture long-term literal-preference. However, this is insufficient for satisfactory performance, because users have not only long-term literal-preference but also short-term literal-preference associated with their recent states. To bridge this gap, we develop a novel multimodal hierarchical transformer network (MHTN) for personalized image caption. At the low level, a short-term user encoder learns short-term user literal-preference from users' recent captions. At the high level, the multimodal encoder integrates target image representations with short-term literal-preference, as well as long-term literal-preference learned from user IDs. These two encoders enjoy the advantages of powerful transformer networks. Extensive experiments on two real datasets show the effectiveness of considering both types of user literal-preference simultaneously, as well as better performance than state-of-the-art models.

A Mask-guided Attention Deep Learning Model for COVID-19 Diagnosis based on an Integrated CT Scan Images Database

10.36227/techrxiv.18166667.v1 ◽

2022 ◽

Author(s):

Maede Maftouni ◽

Bo Shen ◽

Andrew Chung Chee Law ◽

Niloofar Ayoobi Yazdi ◽

Zhenyu Kong

Keyword(s):

Deep Learning ◽

Ct Scan ◽

Imaging Modality ◽

Learning Model ◽

Classification Performance ◽

Computer Assisted ◽

Learning Approach ◽

Learning Models ◽

Task Learning ◽

Data Efficiency

The global extent of COVID-19 mutations and the consequent depletion of hospital resources highlighted the necessity of effective computer-assisted medical diagnosis. COVID-19 detection mediated by deep learning models can help diagnose this highly contagious disease and lower infectivity and mortality rates. Computed tomography (CT) is the preferred imaging modality for building automatic COVID-19 screening and diagnosis models. It is well known that the training set size significantly impacts the performance and generalization of deep learning models. However, accessing a large dataset of CT scan images for an emerging disease like COVID-19 is challenging. Therefore, data efficiency becomes a significant factor in choosing a learning model. To this end, we present a multi-task learning approach, namely a mask-guided attention (MGA) classifier, to improve the generalization and data efficiency of COVID-19 classification on lung CT scan images. The novelty of this method lies in compensating for the scarcity of data by employing more supervision with lesion masks, increasing the sensitivity of the model to COVID-19 manifestations, and helping both generalization and classification performance. Our proposed model achieves better overall performance than the single-task baseline and state-of-the-art models, as measured by various popular metrics. In our experiment with different percentages of data from our curated dataset, the classification performance gain from this multi-task learning approach is more significant for the smaller training sizes. Furthermore, experimental results demonstrate that our method enhances the focus on the lesions, as witnessed by both attention and attribution maps, resulting in a more interpretable model.
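One plausible reading of "more supervision with lesion masks" is a second loss term that penalizes disagreement between the model's attention map and the annotated lesion mask. The sketch below illustrates that reading only. The mean-squared-error term, the `mask_weight` parameter, and the function names are assumptions, not the formulation used in the paper.

```python
import math


def attention_mask_loss(attention_map, lesion_mask):
    """Mean squared error between a 2D attention map and a binary lesion mask."""
    diffs = [
        (a - m) ** 2
        for att_row, mask_row in zip(attention_map, lesion_mask)
        for a, m in zip(att_row, mask_row)
    ]
    return sum(diffs) / len(diffs)


def mga_loss(class_probs, true_class, attention_map, lesion_mask, mask_weight=1.0):
    """Classification cross-entropy plus a weighted attention-vs-mask penalty."""
    classification = -math.log(class_probs[true_class])
    return classification + mask_weight * attention_mask_loss(attention_map, lesion_mask)
```

Under this sketch, an attention map that already matches the mask contributes no extra penalty, so the total reduces to the ordinary classification loss.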
