Learning Sparse Sharing Architectures for Multiple Tasks

2020 ◽  
Vol 34 (05) ◽  
pp. 8936-8943
Author(s):  
Tianxiang Sun ◽  
Yunfan Shao ◽  
Xiaonan Li ◽  
Pengfei Liu ◽  
Hang Yan ◽  
...  

Most existing deep multi-task learning models are based on parameter sharing, such as hard sharing, hierarchical sharing, and soft sharing. Choosing a suitable sharing mechanism depends on the relations among the tasks, which is not easy, since it is difficult to understand the underlying shared factors among these tasks. In this paper, we propose a novel parameter sharing mechanism, named Sparse Sharing. Given multiple tasks, our approach automatically finds a sparse sharing structure. We start with an over-parameterized base network, from which each task extracts a subnetwork. The subnetworks of multiple tasks are partially overlapped and trained in parallel. We show that both hard sharing and hierarchical sharing can be formulated as particular instances of the sparse sharing framework. We conduct extensive experiments on three sequence labeling tasks. Compared with single-task models and three typical multi-task learning baselines, our proposed approach achieves consistent improvement while requiring fewer parameters.
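Below is a minimal sketch of the core idea: each task applies its own binary mask to an over-parameterized shared layer, so tasks train partially overlapping subnetworks in parallel. Mask generation is simplified to random masks here; the paper derives each task's mask from the base network (the class name and shapes are illustrative).

```python
# A minimal sketch of sparse sharing (assumed detail: masks are given here;
# the paper extracts each task's mask from the over-parameterized base network).
import torch
import torch.nn as nn

class SparseSharedLinear(nn.Module):
    """One over-parameterized base layer; each task applies its own binary mask."""
    def __init__(self, in_dim, out_dim, num_tasks):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        # One 0/1 mask per task; overlapping 1-entries are the shared parameters.
        self.register_buffer(
            "masks", (torch.rand(num_tasks, out_dim, in_dim) > 0.5).float()
        )

    def forward(self, x, task_id):
        # Each task sees only its own subnetwork of the shared weight matrix.
        return x @ (self.weight * self.masks[task_id]).t()

layer = SparseSharedLinear(128, 64, num_tasks=3)
y = layer(torch.randn(8, 128), task_id=1)  # forward pass for task 1
```

Because gradients flow only through the unmasked entries, parameters used by several tasks are trained jointly while task-specific parameters are not, which is how hard sharing (identical masks) and hierarchical sharing (nested masks) arise as special cases.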

Circulation ◽  
2020 ◽  
Vol 142 (Suppl_4) ◽  
Author(s):  
ChienYu Chi ◽  
Yen-Pin Chen ◽  
Adrian Winkler ◽  
Kuan-Chun Fu ◽  
Fie Xu ◽  
...  

Introduction: Predicting rare catastrophic events is challenging due to the lack of targets. Here we employed a multi-task learning method and demonstrated that substantial gains in accuracy and generalizability were achieved by sharing representations between related tasks.
Methods: Starting from the Taiwan National Health Insurance Research Database, we selected adults (>20 years) who experienced in-hospital cardiac arrest but not out-of-hospital cardiac arrest over 8 years (2003-2010), and built a dataset using de-identified Emergency Department (ED) and hospitalization claims. The final dataset had 169,287 patients, randomly split into three sections: train 70%, validation 15%, and test 15%. Two outcomes, 30-day readmission and 30-day mortality, were chosen. We constructed the deep learning system in two steps. We first used a taxonomy mapping system, Text2Node, to generate a distributed representation for each concept. We then applied a multilevel hierarchical model based on a long short-term memory (LSTM) architecture. Multi-task models used gradient similarity to prioritize the desired task over auxiliary tasks. Single-task models were trained for each desired task. All models share the same architecture and were trained with the same input data.
Results: Each model was optimized to maximize AUROC on the validation set, with the final metrics calculated on the held-out test set. We demonstrated that multi-task deep learning models outperform single-task deep learning models on both tasks. While readmission had roughly 30% positives and showed minuscule improvements, the mortality task saw more improvement between models. We hypothesize that this is a result of the data imbalance, as mortality occurred in roughly 5% of cases; the auxiliary tasks help the model interpret the data and generalize better.
Conclusion: Multi-task deep learning models outperform single-task deep learning models in predicting 30-day readmission and mortality in in-hospital cardiac arrest patients.
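The gradient-similarity step in the Methods is not spelled out in the abstract; the sketch below shows one common way such prioritization is implemented, gating each auxiliary gradient by its cosine similarity to the main task's gradient (the function name and gating rule are assumptions, not the authors' code).

```python
# A hedged sketch of gradient-similarity gating: keep an auxiliary gradient
# only to the extent it points in the same direction as the main-task gradient.
import torch

def combine_gradients(main_grad, aux_grad):
    """Scale the auxiliary gradient by its (non-negative) cosine similarity to
    the main-task gradient, dropping it entirely when the two conflict."""
    cos = torch.nn.functional.cosine_similarity(
        main_grad.flatten(), aux_grad.flatten(), dim=0
    )
    return main_grad + torch.clamp(cos, min=0.0) * aux_grad
```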


2020 ◽  
Vol 34 (07) ◽  
pp. 10877-10884
Author(s):  
Tao Gui ◽  
Lizhi Qing ◽  
Qi Zhang ◽  
Jiacheng Ye ◽  
Hang Yan ◽  
...  

Multi-task learning (MTL) has received considerable attention, and numerous deep learning applications benefit from MTL with multiple objectives. However, constructing multiple related tasks is difficult, and sometimes only a single task is available for training in a dataset. To tackle this problem, we explored the idea of using unsupervised clustering to construct a variety of auxiliary tasks from unlabeled data or existing labeled data. We found that some of these newly constructed tasks could exhibit semantic meanings corresponding to certain human-specific attributes, but some were non-ideal. In order to effectively reduce the impact of non-ideal auxiliary tasks on the main task, we further proposed a novel meta-learning-based multi-task learning approach, which trained the shared hidden layers on auxiliary tasks, while the meta-optimization objective was to minimize the loss on the main task, ensuring that the optimizing direction led to an improvement on the main task. Experimental results across five image datasets demonstrated that the proposed method significantly outperformed existing single task learning, semi-supervised learning, and some data augmentation methods, including an improvement of more than 9% on the Omniglot dataset.
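A minimal sketch of the meta-optimization loop described above: the shared layers take a gradient step on an auxiliary task, and the meta-objective is the main-task loss after that step, so only auxiliary updates that improve the main task are encouraged (a MAML-style formulation; the functional parameter handling and names are illustrative, and the loss functions are assumed to close over data batches).

```python
# A hedged sketch of meta-learning-based MTL: inner update on an auxiliary
# task, outer (meta) objective evaluated on the main task.
import torch

def meta_step(shared_params, aux_loss_fn, main_loss_fn, inner_lr=0.01):
    # Inner step: train the shared hidden layers on an auxiliary task.
    aux_loss = aux_loss_fn(shared_params)
    grads = torch.autograd.grad(aux_loss, shared_params, create_graph=True)
    adapted = [p - inner_lr * g for p, g in zip(shared_params, grads)]
    # Meta-objective: the main-task loss after the auxiliary update, so the
    # optimizing direction must also lead to an improvement on the main task.
    return main_loss_fn(adapted)
```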


10.29007/g7bg ◽  
2019 ◽  
Author(s):  
João Ribeiro ◽  
Francisco Melo ◽  
João Dias

In this paper we investigate two hypotheses regarding the use of deep reinforcement learning in multiple tasks. The first hypothesis is driven by the question of whether a deep reinforcement learning algorithm, trained on two similar tasks, is able to outperform two single-task, individually trained algorithms by more efficiently learning a new, similar task that none of the three algorithms has encountered before. The second hypothesis is driven by the question of whether the same multi-task deep RL algorithm, trained on two similar tasks and augmented with elastic weight consolidation (EWC), is able to retain performance on the new task similar to that of an algorithm without EWC, whilst also overcoming catastrophic forgetting in the two previous tasks. We show that a multi-task Asynchronous Advantage Actor-Critic (GA3C) algorithm, trained on Space Invaders and Demon Attack, is in fact able to outperform two single-task GA3C versions, each trained individually on one task, when evaluated on a new, third task, namely Phoenix. We also show that, when the two multi-task GA3C algorithms are trained on the third task, the one augmented with EWC not only achieves similar performance on the new task, but is also capable of overcoming a substantial amount of catastrophic forgetting on the two previous tasks.
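For reference, a sketch of the standard elastic weight consolidation penalty the second hypothesis relies on: parameters that mattered for previous tasks, as measured by diagonal Fisher information, are anchored near their old values while the network trains on the new task (the integration with GA3C is not shown; `lam` is an assumed hyperparameter name).

```python
# The standard EWC penalty (Kirkpatrick et al.); added to the new task's loss
# to combat catastrophic forgetting on previously learned tasks.
import torch

def ewc_penalty(params, old_params, fisher, lam=1000.0):
    """Quadratic penalty anchoring parameters important to previous tasks,
    weighted by their (diagonal) Fisher information."""
    return (lam / 2) * sum(
        (f * (p - o).pow(2)).sum() for p, o, f in zip(params, old_params, fisher)
    )
```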


Author(s):  
Shengchao Liu ◽  
Yingyu Liang ◽  
Anthony Gitter

In settings with related prediction tasks, integrated multi-task learning models can often improve performance relative to independent single-task models. However, even when the average task performance improves, individual tasks may experience negative transfer in which the multi-task model's predictions are worse than the single-task model's. We show the prevalence of negative transfer in a computational chemistry case study with 128 tasks and introduce a framework that provides a foundation for reducing negative transfer in multi-task models. Our Loss-Balanced Task Weighting approach dynamically updates task weights during model training to control the influence of individual tasks.
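A sketch of a loss-balanced weighting rule in the spirit described above: each task's weight is its current loss relative to its initial loss, raised to a balancing power, so tasks that have already improved a lot are down-weighted (the exponent name `alpha` and its default value are assumptions; the abstract does not give the exact formula).

```python
# A hedged sketch of dynamic, loss-balanced task weighting.
def lbtw_weight(current_loss, initial_loss, alpha=0.5):
    """Tasks whose loss has already dropped far below its starting value get a
    smaller weight, limiting their influence and reducing negative transfer."""
    return (current_loss / initial_loss) ** alpha

# Combined training objective (illustrative):
# total = sum(lbtw_weight(l_t, l_0) * l_t for l_t, l_0 in zip(losses, first_losses))
```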


2022 ◽  
Author(s):  
Maede Maftouni ◽  
Bo Shen ◽  
Andrew Chung Chee Law ◽  
Niloofar Ayoobi Yazdi ◽  
Zhenyu Kong

The global extent of COVID-19 mutations and the consequent depletion of hospital resources highlighted the necessity of effective computer-assisted medical diagnosis. COVID-19 detection mediated by deep learning models can help diagnose this highly contagious disease and lower infectivity and mortality rates. Computed tomography (CT) is the preferred imaging modality for building automatic COVID-19 screening and diagnosis models. It is well known that the training set size significantly impacts the performance and generalization of deep learning models. However, accessing a large dataset of CT scan images for an emerging disease like COVID-19 is challenging. Therefore, data efficiency becomes a significant factor in choosing a learning model. To this end, we present a multi-task learning approach, namely, a mask-guided attention (MGA) classifier, to improve the generalization and data efficiency of COVID-19 classification on lung CT scan images. The novelty of this method is compensating for the scarcity of data by employing more supervision with lesion masks, increasing the sensitivity of the model to COVID-19 manifestations and helping both generalization and classification performance. Our proposed model achieves better overall performance than the single-task baseline and state-of-the-art models, as measured by various popular metrics. In our experiments with different percentages of data from our curated dataset, the classification performance gain from this multi-task learning approach is more significant for the smaller training sizes. Furthermore, experimental results demonstrate that our method enhances the focus on the lesions, as witnessed by both attention and attribution maps, resulting in a more interpretable model.
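A minimal sketch of the two-headed objective this implies: a shared encoder feeds both a classification head and a lesion-mask head, and the mask loss supplies the extra supervision (the architecture, pooling, and loss weighting below are illustrative stand-ins, not the authors' model).

```python
# A hedged sketch of classification with auxiliary lesion-mask supervision.
import torch
import torch.nn as nn

class MGASketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(1, 16, 3, padding=1)       # shared features
        self.mask_head = nn.Conv2d(16, 1, 1)                # lesion-mask task
        self.cls_head = nn.Linear(16, 2)                    # COVID / non-COVID

    def forward(self, ct_slice):
        feats = torch.relu(self.encoder(ct_slice))
        mask_logits = self.mask_head(feats)
        cls_logits = self.cls_head(feats.mean(dim=(2, 3)))  # global pooling
        return cls_logits, mask_logits

def mga_loss(cls_logits, labels, mask_logits, lesion_masks, beta=0.5):
    cls = nn.functional.cross_entropy(cls_logits, labels)
    seg = nn.functional.binary_cross_entropy_with_logits(mask_logits, lesion_masks)
    return cls + beta * seg  # mask supervision guides attention to the lesions
```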


Author(s):  
Sebastian Ruder ◽  
Joachim Bingel ◽  
Isabelle Augenstein ◽  
Anders Søgaard

Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task losses. Recent work has addressed each of the above problems in isolation. In this work we present an approach that learns a latent multi-task architecture that jointly addresses (a)–(c). We present experiments on synthetic data and data from OntoNotes 5.0, including four different tasks and seven different domains. Our extension consistently outperforms previous approaches to learning latent architectures for multi-task problems and achieves up to 15% average error reductions over common approaches to MTL.
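A hedged sketch of the "learn what to share" idea for two tasks: each task's next layer consumes a learned mixture of both tasks' previous-layer outputs, so the amount of sharing is itself trainable rather than fixed by hand (a heavily simplified variant; subspace-level mixing and the learned task-loss weights from point (c) are omitted).

```python
# A minimal sketch of learned sharing between two task networks.
import torch
import torch.nn as nn

class SharedMix(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # alpha[i, j]: how much task i's next layer draws from task j's output.
        # Initialized to the identity, i.e. no sharing; training adjusts it.
        self.alpha = nn.Parameter(torch.eye(2))
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(2)])

    def forward(self, h):  # h: list of the two tasks' current representations
        mixed = [self.alpha[i, 0] * h[0] + self.alpha[i, 1] * h[1] for i in range(2)]
        return [torch.relu(layer(x)) for layer, x in zip(self.layers, mixed)]
```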


2018 ◽  
Vol 373 (1747) ◽  
pp. 20170105 ◽  
Author(s):  
Hila Sheftel ◽  
Pablo Szekely ◽  
Avi Mayo ◽  
Guy Sella ◽  
Uri Alon

Populations of organisms show genetic differences called polymorphisms. Understanding the effects of polymorphisms is important for biology and medicine. Here, we ask which polymorphisms occur at high frequency when organisms evolve under trade-offs between multiple tasks. Multiple tasks present a problem, because it is not possible to be optimal at all tasks simultaneously and hence compromises are necessary. Recent work indicates that trade-offs lead to a simple geometry of phenotypes in the space of traits: phenotypes fall on the Pareto front, which is shaped as a polytope: a line, triangle, tetrahedron, etc. The vertices of these polytopes are the optimal phenotypes for a single task. Up to now, work on this Pareto approach has not considered its genetic underpinnings. Here, we address this by asking how the polymorphism structure of a population is affected by evolution under trade-offs. We simulate a multi-task selection scenario, in which the population evolves to the Pareto front: the line segment between two archetypes or the triangle between three archetypes. We find that polymorphisms that become prevalent in the population have pleiotropic phenotypic effects that align with the Pareto front. Similarly, epistatic effects between prevalent polymorphisms are parallel to the front. Alignment with the front occurs also for asexual mating. Alignment is reduced when drift or linkage is strong, and is replaced by a more complex structure in which many perpendicular allele effects cancel out. Aligned polymorphism structure allows mating to produce offspring that stand a good chance of being optimal multi-taskers in at least one of the locales available to the species. This article is part of the theme issue ‘Self-organization in cell biology’.
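A toy sketch of the selection scenario, under the assumption that performance at each task decays with distance from that task's archetype and fitness combines the task performances; maximizing such a fitness pushes phenotypes onto the segment between archetypes, i.e., the Pareto front (the functional forms below are illustrative; the paper's simulations are richer).

```python
# A toy fitness landscape for a two-task trade-off.
import numpy as np

archetypes = np.array([[0.0, 0.0], [1.0, 1.0]])  # optimal phenotype per task

def fitness(phenotype, weights=(0.5, 0.5)):
    # Performance at each task decays with distance to that task's archetype.
    perf = np.exp(-np.linalg.norm(phenotype - archetypes, axis=1) ** 2)
    # The weighted trade-off places optima on the line segment between the
    # two archetypes, i.e. on the Pareto front.
    return np.dot(weights, perf)
```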


Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6077
Author(s):  
Gerelmaa Byambatsogt ◽  
Lodoiravsal Choimaa ◽  
Gou Koutaki

In recent years, many researchers have shown increasing interest in music information retrieval (MIR) applications, with automatic chord recognition being one of the popular tasks. Many studies have demonstrated considerable improvement using deep learning based models in automatic chord recognition problems. However, most of the existing models have focused on simple chord recognition, which classifies the root note with the major, minor, and seventh chords. Furthermore, in learning-based recognition, it is critical to collect high-quality and large amounts of training data to achieve the desired performance. In this paper, we present a multi-task learning (MTL) model for a guitar chord recognition task, where the model is trained using a relatively large-vocabulary guitar chord dataset. To solve data scarcity issues, a physical data augmentation method that directly records the chord dataset from a robotic performer is employed. Deep learning based MTL is proposed to improve the performance of automatic chord recognition with the proposed physical data augmentation dataset. The proposed MTL model is compared with four baseline models and its corresponding single-task learning model using two types of datasets: a human dataset and a human dataset combined with the augmented dataset. The proposed methods outperform the baseline models, and most scores of the proposed multi-task learning model are better than those of the corresponding single-task learning model. The experimental results demonstrate that physical data augmentation is an effective method for increasing the dataset size for guitar chord recognition tasks.
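A minimal sketch of what such an MTL model can look like: a shared encoder over spectrogram frames with a main chord head and an auxiliary head (the choice of root-note prediction as the auxiliary task, the GRU encoder, and the 84-bin input are assumptions for illustration; the abstract does not specify them).

```python
# A hedged sketch of a shared encoder with main and auxiliary heads.
import torch
import torch.nn as nn

class ChordMTLSketch(nn.Module):
    def __init__(self, n_chords, n_roots=12, feat_dim=128):
        super().__init__()
        # Assumed input: 84-bin spectrogram frames (e.g. a constant-Q transform).
        self.encoder = nn.GRU(input_size=84, hidden_size=feat_dim, batch_first=True)
        self.chord_head = nn.Linear(feat_dim, n_chords)  # main task
        self.root_head = nn.Linear(feat_dim, n_roots)    # auxiliary task

    def forward(self, frames):           # frames: (batch, time, 84)
        _, h = self.encoder(frames)      # final hidden state: (1, batch, feat_dim)
        h = h.squeeze(0)
        return self.chord_head(h), self.root_head(h)
```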


Author(s):  
Lawrence J. Prinzel ◽  
Mark W. Scerbo ◽  
Frederick G. Freeman ◽  
Peter J. Mikulka

A bio-cybernetic, closed-loop system was validated for use in an adaptive automation environment. Subjects were asked to perform either a single task or multiple tasks from the Multi-Attribute Task Battery. EEG was continuously sampled while they performed the task(s), and an EEG index was derived (20β/(α + θ)). The system switched between manual and automatic modes according to the level of operator engagement based upon the EEG index. The NASA-TLX was administered after each trial. The results of the study demonstrated that it was possible to moderate an operator's level of engagement through a closed-loop system driven by the operator's EEG. In addition, the system was sensitive to increases in task load. These findings show promise for designing adaptive automation technology around psychophysiological input.
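A sketch of the closed-loop rule this describes: compute the engagement index from band powers over a moving EEG window and switch task mode accordingly (the fixed thresholds below are a simplified illustration of index-driven switching, not the study's exact policy).

```python
# A hedged sketch of EEG-driven mode switching.
def engagement_index(beta_power, alpha_power, theta_power):
    # The index used in the study: 20 * beta / (alpha + theta).
    return 20.0 * beta_power / (alpha_power + theta_power)

def choose_mode(index, low=4.0, high=8.0):
    # Low engagement -> hand the task back to the operator (manual);
    # high engagement -> automate to reduce load. Thresholds are illustrative.
    if index < low:
        return "manual"
    if index > high:
        return "automatic"
    return "hold"
```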

