Continuous self-adaptive optimization to learn multi-task multi-agent

Author(s):  
Wenqian Liang ◽  
Ji Wang ◽  
Weidong Bao ◽  
Xiaomin Zhu ◽  
Qingyong Wang ◽  
...  

Abstract: Multi-agent reinforcement learning (MARL) methods have shown superior performance in solving a variety of real-world problems, focusing on learning distinct policies for individual tasks. These approaches face problems when applied to the non-stationary real world: agents trained on specialized tasks cannot achieve satisfactory generalization performance across multiple tasks; agents have to learn and store specialized policies for individual tasks; and reliable task identities are hardly observable in practice. To address the challenge of continuously adapting to multiple tasks in MARL, we formalize the problem as a two-stage curriculum. Single-task policies are first learned with MARL approaches; we then develop a gradient-based Self-Adaptive Meta-Learning algorithm, SAML, that can not only distill single-task policies into a unified policy but also help the unified policy continuously adapt to new incoming tasks. In addition, to validate continuous adaptation performance on complex tasks, we extend the widely adopted StarCraft benchmark SMAC and develop a new multi-task multi-agent StarCraft environment, Meta-SMAC, for testing various aspects of continuous adaptation methods. Our experiments with a population of agents show that our method enables significantly more efficient adaptation than reactive baselines across different scenarios.
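To make "gradient-based meta-learning" concrete, the sketch below runs a generic first-order MAML-style update on toy one-dimensional tasks. It is not SAML: the abstract does not give the algorithm's details, so the quadratic task loss, the function names, and the learning rates here are all assumptions chosen for illustration only.

```python
def task_grad(theta, center):
    """Gradient of the toy task loss 0.5 * (theta - center) ** 2 (assumed loss)."""
    return theta - center


def fomaml_step(theta, centers, inner_lr=0.1, outer_lr=0.1):
    """One first-order MAML-style meta-update over a batch of toy tasks."""
    meta_grad = 0.0
    for center in centers:
        # Inner loop: one gradient step adapts the shared parameter to this task.
        adapted = theta - inner_lr * task_grad(theta, center)
        # First-order approximation: reuse the gradient at the adapted parameter
        # as the meta-gradient instead of differentiating through the inner step.
        meta_grad += task_grad(adapted, center)
    # Outer loop: move the shared parameter along the averaged meta-gradient.
    return theta - outer_lr * meta_grad / len(centers)


def meta_train(centers, steps=200, theta=0.0):
    """Repeat the meta-update; returns the learned shared initialization."""
    for _ in range(steps):
        theta = fomaml_step(theta, centers)
    return theta
```

With two toy tasks centered at 1.0 and 3.0, `meta_train([1.0, 3.0])` settles near 2.0, an initialization from which a single inner step moves toward either task.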

2019 ◽  
Vol 28 (07) ◽  
pp. 1950022 ◽  
Author(s):  
Haiou Qin ◽  
Du Zhang ◽  
Xibin Sun ◽  
Jiahua Tang ◽  
Jun Peng

One of the emerging research opportunities in machine learning is to develop computing systems that learn many tasks continuously and improve the performance of learned tasks incrementally over time. In the real world, learners have to adapt to labeled and unlabeled samples from various tasks that arrive randomly. In this paper, we propose an efficient algorithm called Efficient Perpetual Learning Algorithm (EPLA), which is suitable for learning multiple tasks in both offline and online settings. The algorithm, which is an extension of ELLA [4], is part of what we call perpetual learning, which can learn new tasks or refine knowledge of learned tasks for improved performance with newly arrived labeled samples in an incremental fashion. EPLA has several salient features. Learning episodes are triggered by either extrinsic or intrinsic stimuli. Agent systems based on the proposed algorithm can engage in an open-ended, alternating sequence of learning episodes and working episodes. Unlabeled samples can be used to self-train the learner in small-data settings. Compared with ELLA, EPLA shows almost equivalent performance without memorizing any previously learned labeled samples.


2018 ◽  
Vol 26 (1) ◽  
pp. 43-66 ◽  
Author(s):  
Uday Kamath ◽  
Carlotta Domeniconi ◽  
Kenneth De Jong

Many real-world problems involve massive amounts of data. Under these circumstances learning algorithms often become prohibitively expensive, making scalability a pressing issue to be addressed. A common approach is to perform sampling to reduce the size of the dataset and enable efficient learning. Alternatively, one customizes learning algorithms to achieve scalability. In either case, the key challenge is to obtain algorithmic efficiency without compromising the quality of the results. In this article we discuss a meta-learning algorithm (PSBML) that combines concepts from spatially structured evolutionary algorithms (SSEAs) with concepts from ensemble and boosting methodologies to achieve the desired scalability property. We present both theoretical and empirical analyses which show that PSBML preserves a critical property of boosting, specifically, convergence to a distribution centered around the margin. We then present additional empirical analyses showing that this meta-level algorithm provides a general and effective framework that can be used in combination with a variety of learning classifiers. We perform extensive experiments to investigate the trade-off achieved between scalability and accuracy, and robustness to noise, on both synthetic and real-world data. These empirical results corroborate our theoretical analysis, and demonstrate the potential of PSBML in achieving scalability without sacrificing accuracy.


Author(s):  
Zhaoyang Yang ◽  
Kathryn Merrick ◽  
Hussein Abbass ◽  
Lianwen Jin

In this paper, we propose a deep reinforcement learning algorithm to learn multiple tasks concurrently. A new network architecture is proposed in the algorithm which reduces the number of parameters needed by more than 75% per task compared to typical single-task deep reinforcement learning algorithms. The proposed algorithm and network fuse images with sensor data and were tested with up to 12 movement-based control tasks on a simulated Pioneer 3AT robot equipped with a camera and range sensors. Results show that the proposed algorithm and network can learn skills that are as good as the skills learned by a comparable single-task learning algorithm. Results also show that learning performance is consistent even when the number of tasks and the number of constraints on the tasks increased.


2020 ◽  
Vol 34 (07) ◽  
pp. 10877-10884
Author(s):  
Tao Gui ◽  
Lizhi Qing ◽  
Qi Zhang ◽  
Jiacheng Ye ◽  
Hang Yan ◽  
...  

Multi-task learning (MTL) has received considerable attention, and numerous deep learning applications benefit from MTL with multiple objectives. However, constructing multiple related tasks is difficult, and sometimes only a single task is available for training in a dataset. To tackle this problem, we explored the idea of using unsupervised clustering to construct a variety of auxiliary tasks from unlabeled data or existing labeled data. We found that some of these newly constructed tasks could exhibit semantic meanings corresponding to certain human-specific attributes, but some were non-ideal. In order to effectively reduce the impact of non-ideal auxiliary tasks on the main task, we further proposed a novel meta-learning-based multi-task learning approach, which trained the shared hidden layers on auxiliary tasks, while the meta-optimization objective was to minimize the loss on the main task, ensuring that the optimizing direction led to an improvement on the main task. Experimental results across five image datasets demonstrated that the proposed method significantly outperformed existing single task learning, semi-supervised learning, and some data augmentation methods, including an improvement of more than 9% on the Omniglot dataset.


2011 ◽  
Vol 187 ◽  
pp. 39-44
Author(s):  
Jing Li ◽  
Yue Jin Zhou

The purpose of this paper is to study conflict resolution in virtual teams. Multi-agent technology is used to simulate the virtual team. In the team, agents adopt the Q-learning algorithm to adjust their behaviors. Through the interaction of virtual members, some conflicts can be resolved by the team members themselves. Experiments are conducted to study the interaction process in the team. The experimental results show that a new rule for conflict resolution emerges from the dynamic interactions of agents. The conclusions are significant for the management of teams in the real world.
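For readers unfamiliar with the Q-learning rule the agents rely on, here is a minimal tabular sketch of the standard update. The state and action labels in the usage note are hypothetical and do not come from the paper, whose simulation details are not given in the abstract.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update and return the new Q-value."""
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


def make_q_table():
    """Q-table that returns 0.0 for state-action pairs not yet visited."""
    return defaultdict(float)
```

As a usage example with made-up labels, starting from an empty table, `q_update(q, "conflict", "concede", 1.0, "resolved", ["concede", "insist"])` raises the value of conceding in the `"conflict"` state from 0.0 to 0.1.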


2021 ◽  
Vol 12 (3) ◽  
pp. 1-23
Author(s):  
Yan Liu ◽  
Bin Guo ◽  
Daqing Zhang ◽  
Djamal Zeghlache ◽  
Jingmin Chen ◽  
...  

Optimal store placement aims to identify the optimal location for a new brick-and-mortar store that can maximize its sales by analyzing and mining users’ preferences from large-scale urban data. In recent years, the expansion of chain enterprises into new cities has posed challenges in two respects: (1) data scarcity in new cities, so most existing models tend not to work (i.e., they overfit), because the superior performance of these works is conditioned on large-scale training samples; (2) data distribution discrepancy among different cities, so knowledge learned from other cities cannot be utilized directly in new cities. In this article, we propose a task-adaptative model-agnostic meta-learning framework, namely, MetaStore, to tackle these two challenges and improve the prediction performance in new cities with insufficient data for optimal store placement, by transferring prior knowledge learned from multiple data-rich cities. Specifically, we develop a task-adaptative meta-learning algorithm to learn city-specific prior initializations from multiple cities, which is capable of handling the multimodal data distribution and accelerating the adaptation in new cities compared to other methods. In addition, we design an effective learning strategy for MetaStore to promote faster convergence and optimization by sampling high-quality data for each training batch in view of noisy data in practical applications. The extensive experimental results demonstrate that our proposed method leads to state-of-the-art performance compared with various baselines.


10.29007/g7bg ◽  
2019 ◽  
Author(s):  
João Ribeiro ◽  
Francisco Melo ◽  
João Dias

In this paper we investigate two hypotheses regarding the use of deep reinforcement learning in multiple tasks. The first hypothesis is driven by the question of whether a deep reinforcement learning algorithm, trained on two similar tasks, is able to outperform two single-task, individually trained algorithms by more efficiently learning a new, similar task that none of the three algorithms has encountered before. The second hypothesis is driven by the question of whether the same multi-task deep RL algorithm, trained on two similar tasks and augmented with elastic weight consolidation (EWC), is able to retain similar performance on the new task to that of a similar algorithm without EWC, whilst being able to overcome catastrophic forgetting in the two previous tasks. We show that a multi-task Asynchronous Advantage Actor-Critic (GA3C) algorithm, trained on Space Invaders and Demon Attack, is in fact able to outperform two single-task GA3C versions, each trained individually on one task, when evaluated on a new, third task, namely Phoenix. We also show that, when training two trained multi-task GA3C algorithms on the third task, if one is augmented with EWC, it is not only able to achieve similar performance on the new task, but also capable of overcoming a substantial amount of catastrophic forgetting on the two previous tasks.
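As a rough illustration of what EWC adds to training, the sketch below computes the usual quadratic EWC penalty, which pulls each parameter toward its value after the earlier tasks in proportion to a diagonal Fisher-information estimate. The function names, the flat-list parameter representation, and the `strength` default are assumptions for this example; the abstract does not describe the authors' implementation.

```python
def ewc_penalty(params, anchor_params, fisher_diag, strength=1.0):
    """Quadratic EWC penalty: 0.5 * strength * sum_i F_i * (theta_i - theta*_i)^2."""
    return 0.5 * strength * sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, fisher_diag)
    )


def ewc_loss(new_task_loss, params, anchor_params, fisher_diag, strength=1.0):
    """Loss on the new task plus the penalty that protects earlier tasks."""
    return new_task_loss + ewc_penalty(params, anchor_params, fisher_diag, strength)
```

Parameters with a large Fisher value are expensive to move, so the earlier tasks' performance is protected, while parameters with a small Fisher value remain free to adapt to the new task.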


Author(s):  
Sungyong Seo ◽  
Chuizheng Meng ◽  
Sirisha Rambhatla ◽  
Yan Liu

Modeling the dynamics of real-world physical systems is critical for spatiotemporal prediction tasks, but challenging when data is limited. The scarcity of real-world data and the difficulty in reproducing the data distribution hinder directly applying meta-learning techniques. Although the knowledge of governing partial differential equations (PDE) of the data can be helpful for the fast adaptation to few observations, it is mostly infeasible to exactly find the equation for observations in real-world physical systems. In this work, we propose a framework, physics-aware meta-learning with auxiliary tasks, whose spatial modules incorporate PDE-independent knowledge and temporal modules utilize the generalized features from the spatial modules to be adapted to the limited data, respectively. The framework is inspired by a local conservation law expressed mathematically as a continuity equation and does not require the exact form of governing equation to model the spatiotemporal observations. The proposed method mitigates the need for a large number of real-world tasks for meta-learning by leveraging spatial information in simulated data to meta-initialize the spatial modules. We apply the proposed framework to both synthetic and real-world spatiotemporal prediction tasks and demonstrate its superior performance with limited observations.
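For reference, the continuity equation mentioned above is commonly written in the following general form. The symbols are the conventional ones and are an assumption here, since the abstract does not give the paper's notation: \rho is the density of the conserved quantity, \mathbf{J} its flux, and \sigma a source or sink term.

```latex
\frac{\partial \rho}{\partial t} + \nabla \cdot \mathbf{J} = \sigma
```

When \sigma = 0 the quantity is locally conserved: its density at a point changes only through flux into or out of that point's neighborhood.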


Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5966
Author(s):  
Ke Wang ◽  
Gong Zhang

The challenge of small data has emerged in synthetic aperture radar automatic target recognition (SAR-ATR) problems. Most SAR-ATR methods are data-driven and require a lot of training data that are expensive to collect. To address this challenge, we propose a recognition model that incorporates meta-learning and amortized variational inference (AVI). Specifically, the model consists of global parameters and task-specific parameters. The global parameters, trained by meta-learning, construct a common feature extractor shared between all recognition tasks. The task-specific parameters, modeled by probability distributions, can adapt to new tasks with a small amount of training data. To reduce the computation and storage cost, the task-specific parameters are inferred by AVI implemented with set-to-set functions. Extensive experiments were conducted on a real SAR dataset to evaluate the effectiveness of the model. The results of the proposed approach compared with those of the latest SAR-ATR methods show the superior performance of our model, especially on recognition tasks with limited data.

