Federated Learning for Vision-and-Language Grounding Problems

2020 ◽  
Vol 34 (07) ◽  
pp. 11572-11579 ◽  
Author(s):  
Fenglin Liu ◽  
Xian Wu ◽  
Shen Ge ◽  
Wei Fan ◽  
Yuexian Zou

Recently, vision-and-language grounding problems, e.g., image captioning and visual question answering (VQA), have attracted extensive interest from both academia and industry. However, given the similarity of these tasks, efforts to obtain better results by combining the merits of their algorithms have not been well studied. Inspired by the recent success of federated learning, we propose a federated learning framework that obtains various types of image representations from different tasks and fuses them into fine-grained image representations. These representations merge useful features from different vision-and-language grounding problems and are thus much more powerful than the original representations used alone in individual tasks. To learn such image representations, we propose the Aligning, Integrating and Mapping Network (aimNet). aimNet is validated in three federated learning settings: horizontal federated learning, vertical federated learning, and federated transfer learning. Experiments with the aimNet-based federated learning framework on two representative tasks, i.e., image captioning and VQA, demonstrate effective and consistent improvements on all metrics over the baselines. In image captioning, we obtain 14% and 13% relative gains on the task-specific metrics CIDEr and SPICE, respectively; in VQA, we boost the performance of strong baselines by up to 3%.
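To make the fusion idea concrete, below is a minimal, hypothetical PyTorch sketch of aligning and integrating image features produced by two task-specific encoders (say, a captioning encoder and a VQA encoder) into one fine-grained representation. The module name CrossTaskFusion, the dimensions, and the gated fusion rule are illustrative assumptions, not the authors' aimNet.

# Hypothetical sketch: align features from two task-specific encoders, then fuse them.
import torch
import torch.nn as nn


class CrossTaskFusion(nn.Module):
    """Project two sets of region features into a shared space, align, and gate-fuse them."""

    def __init__(self, dim_a: int, dim_b: int, dim_out: int):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, dim_out)   # map task-A features to the shared space
        self.proj_b = nn.Linear(dim_b, dim_out)   # map task-B features to the shared space
        self.gate = nn.Linear(2 * dim_out, dim_out)

    def forward(self, feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
        # feats_a: (batch, regions, dim_a), feats_b: (batch, regions, dim_b)
        a = self.proj_a(feats_a)
        b = self.proj_b(feats_b)
        # Soft alignment: attend from A's regions over B's regions.
        attn = torch.softmax(a @ b.transpose(1, 2) / a.size(-1) ** 0.5, dim=-1)
        b_aligned = attn @ b
        # Gated integration of the two views into one fine-grained representation.
        g = torch.sigmoid(self.gate(torch.cat([a, b_aligned], dim=-1)))
        return g * a + (1 - g) * b_aligned


# Toy usage with random tensors standing in for task-specific encoder outputs.
fusion = CrossTaskFusion(dim_a=2048, dim_b=512, dim_out=512)
fused = fusion(torch.randn(2, 36, 2048), torch.randn(2, 36, 512))
print(fused.shape)  # torch.Size([2, 36, 512])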

Author(s):  
Fenglin Liu ◽  
Xuancheng Ren ◽  
Yuanxin Liu ◽  
Kai Lei ◽  
Xu Sun

Recently, attention-based encoder-decoder models have been used extensively in image captioning. Yet current methods still struggle to achieve deep image understanding. In this work, we argue that such understanding requires visual attention to correlated image regions and semantic attention to coherent attributes of interest. To perform effective attention, we explore image captioning from a cross-modal perspective and propose the Global-and-Local Information Exploring-and-Distilling approach, which explores and distills the source information in vision and language. Globally, it provides the aspect vector, a spatial and relational representation of images based on caption contexts, through the extraction of salient region groupings and attribute collocations; locally, it extracts the fine-grained regions and attributes referenced by the aspect vector for word selection. Our fully attentive model achieves a CIDEr score of 129.3 in offline COCO evaluation while remaining efficient in terms of accuracy, speed, and parameter budget.
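The global-then-local attention pattern described above can be sketched as follows: a global aspect vector summarizes all image regions given the decoder context, and that vector then guides fine-grained per-step attention over regions for word selection. This is a hypothetical PyTorch illustration; the shapes and module names are assumptions, not the authors' exact architecture.

# Hypothetical sketch: global aspect vector followed by aspect-guided local attention.
import torch
import torch.nn as nn


class GlobalLocalAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.global_score = nn.Linear(dim, 1)       # scores regions for the global summary
        self.local_query = nn.Linear(2 * dim, dim)  # builds a query from decoder state + aspect

    def forward(self, regions: torch.Tensor, decoder_state: torch.Tensor) -> torch.Tensor:
        # regions: (batch, n_regions, dim), decoder_state: (batch, dim)
        # Global step: a single aspect vector summarizing salient region groupings.
        g_weights = torch.softmax(self.global_score(regions), dim=1)
        aspect = (g_weights * regions).sum(dim=1)                      # (batch, dim)
        # Local step: attend to fine-grained regions in reference to the aspect vector.
        query = self.local_query(torch.cat([decoder_state, aspect], dim=-1))
        scores = torch.softmax((regions @ query.unsqueeze(-1)).squeeze(-1), dim=-1)
        context = (scores.unsqueeze(-1) * regions).sum(dim=1)          # (batch, dim)
        return context


attn = GlobalLocalAttention(dim=512)
out = attn(torch.randn(2, 36, 512), torch.randn(2, 512))
print(out.shape)  # torch.Size([2, 512])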


Author(s):  
Yin Zhang ◽  
Derek Zhiyuan Cheng ◽  
Tiansheng Yao ◽  
Xinyang Yi ◽  
Lichan Hong ◽  
...  

2019 ◽  
Vol 2019 ◽  
pp. 1-14 ◽  
Author(s):  
Yikui Zhai ◽  
He Cao ◽  
Wenbo Deng ◽  
Junying Gan ◽  
Vincenzo Piuri ◽  
...  

Because of the lack of discriminative face representations and the scarcity of labeled training data, facial beauty prediction (FBP), which aims to assess facial attractiveness automatically, has become a challenging pattern recognition problem. Inspired by recent promising work on fine-grained image classification that uses multiscale architectures to extend the diversity of deep features, we propose BeautyNet for unconstrained facial beauty prediction. First, a multiscale network is adopted to improve the discriminative power of face features. Second, to alleviate the computational burden of the multiscale architecture, the max-feature-map (MFM) is utilized as an activation function, which not only lightens the network and speeds up convergence but also benefits performance. Finally, a transfer learning strategy is introduced to mitigate the overfitting caused by the scarcity of labeled facial beauty samples and to further improve BeautyNet's performance. Extensive experiments on LSFBD demonstrate that the proposed scheme outperforms state-of-the-art methods, achieving 67.48% classification accuracy.
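The max-feature-map (MFM) activation mentioned above is commonly implemented by splitting the channel dimension in half and taking the element-wise maximum, which halves the number of feature maps and acts as a built-in feature selector. The following is a generic PyTorch sketch of that operation, not the exact BeautyNet layer.

# Generic max-feature-map (MFM) activation: out_channels = in_channels // 2.
import torch
import torch.nn as nn


class MFM(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) with an even number of channels
        a, b = torch.chunk(x, 2, dim=1)   # split channels into two halves
        return torch.max(a, b)            # element-wise maximum of the halves


# Toy usage: a conv layer followed by MFM halves the channel count.
layer = nn.Sequential(nn.Conv2d(3, 64, kernel_size=3, padding=1), MFM())
print(layer(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 32, 128, 128])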


Author(s):  
Peilian Zhao ◽  
Cunli Mao ◽  
Zhengtao Yu

Aspect-Based Sentiment Analysis (ABSA), a fine-grained opinion-mining task that aims to extract the sentiment toward specific targets in text, is important in many real-world applications, especially in the legal field. In this paper, we therefore address the limited availability of labeled training data and the lack of in-domain knowledge representation for End-to-End Aspect-Based Sentiment Analysis (E2E-ABSA) in the legal field. We propose a new deep learning method, named Semi-ETEKGs, which applies an E2E framework with knowledge graph (KG) embeddings in the legal field after data augmentation (DA). Specifically, we pre-train BERT embeddings and in-domain KG embeddings on unlabeled data and on labeled data with case elements after DA, and then feed both embeddings into the E2E framework to classify the polarity of each target entity. Finally, we build a case-related dataset based on a popular ABSA benchmark to evaluate Semi-ETEKGs, and experiments on this dataset of microblog comments show that our proposed model significantly outperforms the compared methods.
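As a rough illustration of the embedding-fusion idea, the sketch below concatenates token-level BERT embeddings with in-domain KG embeddings and passes them to a unified tagger that predicts joint aspect/polarity tags, a common E2E-ABSA formulation. The tag set, dimensions, and the stand-in random embeddings are illustrative assumptions, not the authors' Semi-ETEKGs.

# Hypothetical sketch: fuse BERT and KG embeddings for unified E2E-ABSA tagging.
import torch
import torch.nn as nn

# A unified tagging scheme often used in E2E-ABSA: aspect boundary + polarity.
TAGS = ["O", "B-POS", "I-POS", "B-NEG", "I-NEG", "B-NEU", "I-NEU"]


class FusedTagger(nn.Module):
    def __init__(self, bert_dim: int = 768, kg_dim: int = 100, hidden: int = 256):
        super().__init__()
        self.encoder = nn.GRU(bert_dim + kg_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, len(TAGS))

    def forward(self, bert_emb: torch.Tensor, kg_emb: torch.Tensor) -> torch.Tensor:
        # bert_emb: (batch, seq, bert_dim), kg_emb: (batch, seq, kg_dim)
        fused = torch.cat([bert_emb, kg_emb], dim=-1)   # concatenate the two views per token
        hidden, _ = self.encoder(fused)
        return self.classifier(hidden)                  # (batch, seq, n_tags)


# Toy usage with random tensors standing in for pre-trained BERT and KG embeddings.
tagger = FusedTagger()
logits = tagger(torch.randn(2, 20, 768), torch.randn(2, 20, 100))
print(logits.argmax(-1).shape)  # torch.Size([2, 20])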


AI Magazine ◽  
2016 ◽  
Vol 37 (1) ◽  
pp. 63-72 ◽  
Author(s):  
C. Lawrence Zitnick ◽  
Aishwarya Agrawal ◽  
Stanislaw Antol ◽  
Margaret Mitchell ◽  
Dhruv Batra ◽  
...  

As machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence. A common approach is to propose tasks at which a human excels but which machines find difficult. However, an ideal task should also be easy to evaluate and not be easily gameable. We begin with a case study exploring the recently popular task of image captioning and its limitations as a task for measuring machine intelligence. An alternative and more promising task is Visual Question Answering, which tests a machine's ability to reason about language and vision. We describe a dataset, unprecedented in size, created for the task that contains over 760,000 human-generated questions about images. With around 10 million human-generated answers, machines can be evaluated easily.
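The dataset described above collects multiple human answers per question, and a commonly used consensus metric scores a machine answer by how many annotators agree with it: min(matching humans / 3, 1). The following is a short sketch of that scoring rule (answer normalization is omitted for brevity); it is a simplified illustration of the consensus idea rather than the official evaluation script.

# Simplified consensus-accuracy sketch for multiple-annotator VQA evaluation.
def vqa_accuracy(machine_answer: str, human_answers: list[str]) -> float:
    """Full credit if at least 3 of the human annotators gave the same answer."""
    matches = sum(1 for ans in human_answers if ans == machine_answer)
    return min(matches / 3.0, 1.0)


# Toy usage: 4 of 10 annotators agree with "red", so it receives full credit.
humans = ["red"] * 4 + ["maroon"] * 3 + ["dark red"] * 3
print(vqa_accuracy("red", humans))     # 1.0
print(vqa_accuracy("maroon", humans))  # 1.0
print(vqa_accuracy("blue", humans))    # 0.0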

