Multiagent Decision Making and Learning in Urban Environments

Author(s):  
Akshat Kumar

Our increasingly interconnected urban environments provide many opportunities to deploy intelligent agents, from self-driving cars and ships to aerial drones, that promise to radically improve productivity and safety. Achieving coordination among agents in such urban settings presents several algorithmic challenges: the ability to scale to thousands of agents, and the need to address uncertainty and partial observability in the environment. In addition, accurate domain models need to be learned from data that is often noisy and available only at an aggregate level. In this paper, I give an overview of some of our recent contributions towards developing planning and reinforcement learning strategies that address several such challenges in large-scale urban multiagent systems.

Symmetry ◽  
2020 ◽  
Vol 12 (4) ◽  
pp. 631
Author(s):  
Chunyang Hu

In this paper, deep reinforcement learning (DRL) and knowledge transfer are used to achieve effective control of the learning agent for confrontation in multi-agent systems. Firstly, a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm with parameter sharing is proposed to achieve multi-agent confrontation decision-making. During training, the information of other agents is introduced to the critic network to improve the confrontation strategy. The parameter sharing mechanism reduces the cost of experience storage. In the DDPG algorithm, we use four neural networks to generate real-time actions and Q-value functions, respectively, and use a momentum mechanism to optimize the training process and accelerate the convergence of the neural networks. Secondly, this paper introduces an auxiliary controller using a policy-based reinforcement learning (RL) method to provide assistant decision-making for the game agent. In addition, an effective reward function is used to help agents balance their own losses against those of the enemy. Furthermore, this paper also uses the knowledge transfer method to extend the learning model to more complex scenes and improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments are designed to verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent successfully learns to fight the competitors and achieves a good winning rate. For large-scale confrontation scenarios, the knowledge transfer method can gradually improve the decision-making level of the learning agent.
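
As a concrete illustration of the setup the abstract describes, here is a minimal sketch of multi-agent DDPG with parameter sharing and a centralized critic, assuming PyTorch. The network sizes, number of agents, momentum value, and soft-update rate tau are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps one agent's local observation to a continuous action."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Scores a joint state-action pair; other agents' information is
    concatenated into the critic input, as the abstract describes."""
    def __init__(self, joint_obs_dim, joint_act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

N_AGENTS, OBS_DIM, ACT_DIM = 4, 8, 2      # assumed sizes

# Parameter sharing: all agents use the same actor and critic instances,
# so every agent's experience trains one set of weights.
actor = Actor(OBS_DIM, ACT_DIM)
critic = CentralCritic(N_AGENTS * OBS_DIM, N_AGENTS * ACT_DIM)
target_actor = Actor(OBS_DIM, ACT_DIM)
target_critic = CentralCritic(N_AGENTS * OBS_DIM, N_AGENTS * ACT_DIM)
target_actor.load_state_dict(actor.state_dict())
target_critic.load_state_dict(critic.state_dict())

# The "momentum mechanism" is rendered here as plain SGD with momentum.
actor_opt = torch.optim.SGD(actor.parameters(), lr=1e-3, momentum=0.9)
critic_opt = torch.optim.SGD(critic.parameters(), lr=1e-3, momentum=0.9)

def soft_update(target, source, tau=0.01):
    """Polyak averaging of target-network weights, standard in DDPG."""
    with torch.no_grad():
        for t, s in zip(target.parameters(), source.parameters()):
            t.mul_(1 - tau).add_(tau * s)
```

The four networks here (actor, critic, and their target copies) correspond to the four networks the abstract counts for generating real-time actions and Q-values.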


Mathematics ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 298 ◽  
Author(s):  
Shenshen Gu ◽  
Yue Yang

The Max-cut problem is a well-known combinatorial optimization problem with many real-world applications. However, the problem has been proven to be NP-hard (non-deterministic polynomial-time hard), which means that exact algorithms are too time-consuming to be suitable for large-scale instances. Designing heuristic algorithms is therefore a promising but challenging direction for effectively solving large-scale Max-cut problems. For this reason, we propose in this paper a method that combines a pointer network with two deep learning strategies (supervised learning and reinforcement learning) to address this challenge. A pointer network is a sequence-to-sequence deep neural network that can extract data features in a purely data-driven way to discover the hidden laws behind the data. Based on the characteristics of the Max-cut problem, we designed the input and output mechanisms of the pointer network model, trained the model with both supervised learning and reinforcement learning, and evaluated its performance. Our experiments illustrate that the model can be applied successfully to large-scale Max-cut problems, and our results suggest that the method will further encourage broader exploration of deep neural networks for large-scale combinatorial optimization problems.
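
The following is a minimal sketch of the pointer-attention step at the heart of a pointer network, assumed here (in PyTorch) as the building block such a Max-cut model adapts; the hidden size and toy vertex embeddings are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PointerAttention(nn.Module):
    """At each decoding step, score every input vertex and 'point' at one."""
    def __init__(self, hidden):
        super().__init__()
        self.W_enc = nn.Linear(hidden, hidden, bias=False)
        self.W_dec = nn.Linear(hidden, hidden, bias=False)
        self.v = nn.Linear(hidden, 1, bias=False)

    def forward(self, enc_states, dec_state):
        # enc_states: (n_vertices, hidden); dec_state: (hidden,)
        scores = self.v(torch.tanh(self.W_enc(enc_states)
                                   + self.W_dec(dec_state))).squeeze(-1)
        return torch.softmax(scores, dim=-1)  # distribution over vertices

attn = PointerAttention(hidden=16)
enc = torch.randn(10, 16)                  # toy embeddings for 10 vertices
probs = attn(enc, torch.randn(16))         # which vertex to select next
```

Under the supervised strategy, this pointer distribution would be fit with cross-entropy against exactly solved small instances; under the RL strategy, a sampled vertex sequence would be reinforced in proportion to the cut value it achieves (roughly, loss = -(cut - baseline) * log-probability of the sequence).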


2012 ◽  
Vol 11 (05) ◽  
pp. 935-960 ◽  
Author(s):  
JAVIER GARCÍA ◽  
FERNANDO BORRAJO ◽  
FERNANDO FERNÁNDEZ

Business simulators are powerful tools both for supporting the decision-making process of business managers and for business education. An example is SIMBA (SIMulator for Business Administration), a powerful simulator currently used as a web-based platform for business education in different institutions. In this paper, we propose the application of reinforcement learning (RL) to create intelligent agents that can manage virtual companies in SIMBA. This application is not trivial, given the intrinsic characteristics of SIMBA: it is a generalized domain where hundreds of parameters modify the domain behavior; it is a multi-agent domain where cooperation and competition among different agents can coexist; and each business decision requires setting dozens of continuous decision variables, made only after the study of hundreds of continuous variables. We demonstrate empirically that all these challenges can be overcome through the use of RL, showing results for different learning scenarios.
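
To make the scale of the control problem concrete, here is a minimal sketch of the kind of continuous policy such a domain calls for: hundreds of observed indicators in, dozens of decision variables out. The Gaussian-policy/REINFORCE choice and all dimensions are illustrative assumptions, not SIMBA's actual interface or the paper's method.

```python
import torch
import torch.nn as nn

OBS_DIM, DECISION_DIM = 300, 24            # assumed sizes

policy = nn.Sequential(nn.Linear(OBS_DIM, 128), nn.Tanh(),
                       nn.Linear(128, DECISION_DIM))
log_std = nn.Parameter(torch.zeros(DECISION_DIM))
opt = torch.optim.Adam(list(policy.parameters()) + [log_std], lr=3e-4)

def decide(obs):
    """Sample one vector of continuous business decisions, with log-prob."""
    dist = torch.distributions.Normal(policy(obs), log_std.exp())
    decision = dist.sample()
    return decision, dist.log_prob(decision).sum()

# One REINFORCE update after observing a simulated business period:
obs = torch.randn(OBS_DIM)                 # hundreds of business indicators
decision, logp = decide(obs)
profit = 1.0                               # reward returned by the simulator
loss = -profit * logp
opt.zero_grad(); loss.backward(); opt.step()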


Author(s):  
Clement Leung ◽  
Nikki Lijing Kuang ◽  
Vienne W. K. Sung

Organizations need to constantly learn, develop, and evaluate new strategies and policies for their effective operation. Unsupervised reinforcement learning is becoming a highly useful tool, since rewards and punishments in different forms are pervasive in a wide variety of decision-making scenarios. By observing the outcomes of a sufficient number of repeated trials, one gradually learns the value and usefulness of a particular policy or strategy. However, in a given environment, the outcomes of different trials are subject to external chance influences and variations. Learning about the usefulness of a given policy incurs significant costs in systematically undertaking the sequential trials; in most learning episodes, therefore, one would wish to keep the cost within bounds by adopting efficient stopping rules for learning. In this chapter, we explain the deployment of different learning strategies in given environments for reinforcement learning policy evaluation and review, and we present suggestions for their practical use and applications.
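
As one concrete shape such a cost-bounded stopping rule can take, here is a minimal sketch: run sequential trials and stop once a Hoeffding confidence interval on the mean outcome is narrow enough or the trial budget is spent. The bound, epsilon, delta, and budget are illustrative assumptions, not the chapter's specific rules.

```python
import math
import random

def evaluate_policy(run_trial, epsilon=0.05, delta=0.05, budget=10_000):
    """run_trial() returns one outcome in [0, 1] for the policy under test."""
    total, n = 0.0, 0
    while n < budget:
        total += run_trial()
        n += 1
        # Hoeffding: with probability >= 1 - delta, the true mean lies
        # within half_width of the running estimate.
        half_width = math.sqrt(math.log(2 / delta) / (2 * n))
        if half_width <= epsilon:
            break              # estimate is tight enough: stop paying for trials
    return total / n, n

mean_value, trials_used = evaluate_policy(lambda: float(random.random() < 0.6))
```

The trade-off the chapter discusses is visible in the parameters: a smaller epsilon or delta buys a more reliable estimate of the policy's value at the price of more (costly) trials.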


2019 ◽  
Author(s):  
Benjamin James Dyson ◽  
Ben Albert Steward ◽  
Tea Meneghetti ◽  
Lewis Forder

To understand the boundaries we set for ourselves in terms of environmental responsibility during competition, we examined a neural index of outcome valence (feedback-related negativity; FRN) in relation to earlier indices of visual attention (N1), later indices of motivational significance (P3), and eventual behaviour. In Experiment 1 (n=36), participants either were (play) or were not (observe) responsible for action selection. In Experiment 2 (n=36), opponents additionally either could (exploitable) or could not (unexploitable) be beaten. Various failures of reinforcement learning expression were revealed, including large-scale approximations of random behaviour. Against unexploitable opponents, N1 determined the extent to which negative and positive outcomes were perceived as distinct categories by FRN. Against exploitable opponents, FRN determined the extent to which P3 generated neural gain for future events. Differential activation of the N1–FRN–P3 processing chain provides a framework for understanding the behavioural dynamism observed during competitive decision making.


2021 ◽  
Vol 6 ◽  
pp. 30
Author(s):  
Daniel Black ◽  
Sarah Ayres ◽  
Krista Bondy ◽  
Rachel Brierley ◽  
Rona Campbell ◽  
...  

Poor quality urban environments substantially increase non-communicable disease. Responsibility for the associated decision-making is dispersed across multiple agents and systems: fast-growing urban authorities are the primary gatekeepers of new development and change in the UK, yet the driving forces are remote private sector interests supported by a political economy focused on short-termism and consumption-based growth. Economic valuation of externalities is widely thought to be fundamental, yet evidence on how to value and integrate it into urban development decision-making is limited, and it forms only a part of the decision-making landscape. Researchers must find new ways of integrating socio-environmental costs at numerous key leverage points across multiple complex systems. This mixed-methods study comprises six highly integrated work packages. It aims to develop and test a multi-action intervention in two urban areas: one on large-scale mixed-use development, the other on major transport. The core intervention is the co-production with key stakeholders, through interviews, workshops, and participatory action research, of three areas of evidence: economic valuations of changed health outcomes; community-led media on health inequalities; and routes to potential impact mapped through co-production with key decision-makers, advisors, and the lay public. This will be achieved by: mapping the system of actors and processes involved in each case study; developing, testing, and refining the combined intervention; and evaluating the extent to which policy and practice change amongst our target users, and the likelihood of downstream impact on non-communicable diseases (NCDs). The integration of such diverse disciplines and sectors presents multiple practical and operational issues. The programme is testing new approaches to research, notably with regard to practitioner-researcher integration and transdisciplinary research co-leadership. Other critical risks relate to urban development timescales, uncertainties in upstream-downstream causality, and the demonstration of impact.


2017 ◽  
Vol 36 (10) ◽  
pp. 1073-1087 ◽  
Author(s):  
Markus Wulfmeier ◽  
Dushyant Rao ◽  
Dominic Zeng Wang ◽  
Peter Ondruska ◽  
Ingmar Posner

We present an approach for learning spatial traversability maps for driving in complex urban environments, based on an extensive dataset demonstrating the driving behaviour of human experts. The direct end-to-end mapping from raw input data to cost bypasses the effort of manually designing parts of the pipeline, exploits a large number of data samples, and can additionally be framed to refine handcrafted cost maps produced from hand-engineered features. To achieve this, we introduce a maximum-entropy-based, non-linear inverse reinforcement learning (IRL) framework which exploits the capacity of fully convolutional neural networks (FCNs) to represent the cost model underlying driving behaviours. The high-capacity, deep, parametric approach scales successfully to more complex environments and driving behaviours, while its run time at deployment is independent of training dataset size. After benchmarking against state-of-the-art IRL approaches, we focus on demonstrating scalability and performance on an ambitious dataset collected over the course of one year, comprising more than 25,000 demonstration trajectories extracted from over 120 km of urban driving. We evaluate the resulting cost representations by showing their advantages over a carefully hand-designed cost map, and furthermore demonstrate robustness to systematic errors by learning accurate representations even in the presence of calibration perturbations. Importantly, we demonstrate that a manually designed cost map can be refined to handle more accurately corner cases that are rarely seen in the environment, such as stairs, slopes and underpasses, by further incorporating human priors into the training framework.
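
As a rough illustration of the maximum-entropy deep IRL update this line of work builds on, here is a minimal sketch, assuming PyTorch: an FCN maps sensor grids to a per-cell cost map, and the loss gradient is the difference between expert and expected state-visitation frequencies. The FCN shape, grid sizes, and the omitted soft value-iteration planner are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

fcn = nn.Sequential(                       # toy fully convolutional cost model
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1))                   # one cost value per grid cell
opt = torch.optim.Adam(fcn.parameters(), lr=1e-4)

def irl_step(sensor_grid, expert_svf, expected_svf):
    """One MaxEnt IRL gradient step.

    expert_svf:   state-visitation counts from demonstration trajectories
    expected_svf: expected visitation under the current cost map, obtained
                  from a soft value-iteration planner (not shown here)
    """
    cost_map = fcn(sensor_grid)            # shape (1, 1, H, W)
    # The MaxEnt IRL gradient w.r.t. the per-cell cost is the difference
    # between expert and expected visitation: cost falls where experts
    # drive more often than the current model predicts, and rises where
    # they drive less. Backpropagating it trains the FCN end to end.
    grad = (expert_svf - expected_svf).detach()
    opt.zero_grad()
    cost_map.backward(grad)
    opt.step()

irl_step(torch.randn(1, 3, 32, 32),
         torch.rand(1, 1, 32, 32), torch.rand(1, 1, 32, 32))
```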


2021 ◽  
Author(s):  
Jinzhi Liao ◽  
Xiang Zhao ◽  
Jiuyang Tang ◽  
Weixin Zeng ◽  
Zhen Tan

With the proliferation of large-scale knowledge graphs (KGs), multi-hop knowledge graph reasoning has become a capstone that enables machines to handle intelligent tasks, especially where an explicit reasoning path is appreciated for decision making. To train a KG reasoner, supervised learning-based methods suffer from false-negative issues, i.e., paths unseen during training are not found at prediction time; in contrast, reinforcement learning (RL)-based methods do not require labeled paths and can explore to cover many appropriate reasoning paths. In this connection, efforts have been dedicated to investigating several RL formulations for multi-hop KG reasoning. In particular, current RL-based methods generate rewards only at the very end of the reasoning process, due to which short paths of fewer hops than a given threshold are likely to be overlooked, and the overall performance is impaired. To address the problem, we propose a revised RL formulation of multi-hop KG reasoning that is characterized by two novel designs: the stop signal and the worth-trying signal. The stop signal instructs the RL agent to stay at the entity after finding the answer, preventing it from hopping further even if the threshold has not been reached; meanwhile, the worth-trying signal encourages the agent to learn partial patterns from the paths that fail to lead to the answer. To validate the design of our model, comprehensive experiments are carried out on three benchmark knowledge graphs, and the results and analysis suggest its superiority over state-of-the-art methods.
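
The two signals can be read as a reward-shaping scheme for the KG-walking agent. Here is a minimal sketch of one plausible per-hop reward in that spirit; the action name, weights, and pattern score are illustrative assumptions, not the paper's exact formulation.

```python
def hop_reward(entity, answer, action, pattern_score=0.0):
    """Illustrative per-hop reward for an RL agent walking a KG."""
    if entity == answer:
        # Stop signal: once the answer is reached, only staying put is
        # rewarded, so the agent does not hop away before the hop
        # threshold is reached.
        return 1.0 if action == "SELF_LOOP" else -1.0
    # Worth-trying signal: a path that has not reached the answer still
    # earns a small shaped reward for the partial patterns it shares with
    # known successful paths.
    return 0.5 * pattern_score             # the weight 0.5 is an assumption
```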

