A reinforcement learning approach for sequential decision-making process of attacks in smart grid

Author(s):  
Zhen Ni ◽  
Shuva Paul ◽  
Xiangnan Zhong ◽  
Qinglai Wei
Author(s):  
Ming-Sheng Ying ◽  
Yuan Feng ◽  
Sheng-Gang Ying

A Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of the MDP, namely the quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
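For orientation, finite-horizon dynamic programming on an ordinary (classical) MDP can be sketched as below. This is not the qMDP algorithm from the paper, which the abstract does not spell out; it is only the classical backward-induction scheme that such algorithms extend. The function names and the layout of `P` (transition probabilities indexed as `P[a][s][s2]`) and `R` (rewards indexed as `R[s][a]`) are assumptions made for this illustration.

```python
def backward_induction(P, R, horizon):
    """Optimal values and policy for a classical finite-horizon MDP.

    P[a][s][s2]: probability of moving from s to s2 under action a (assumed layout).
    R[s][a]:     immediate reward for taking action a in state s (assumed layout).
    Returns (V, policy) where V[t][s] is the optimal value with horizon - t
    steps remaining and policy[t][s] is an optimal action at step t.
    """
    n_states, n_actions = len(R), len(R[0])
    V = [[0.0] * n_states for _ in range(horizon + 1)]  # V[horizon] = 0
    policy = [[0] * n_states for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):  # sweep backwards in time
        for s in range(n_states):
            q = [
                R[s][a] + sum(P[a][s][s2] * V[t + 1][s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=lambda a: q[a])
            policy[t][s] = best
            V[t][s] = q[best]
    return V, policy


def evaluate_policy(P, R, policy):
    """Value of a fixed, time-dependent policy over its finite horizon."""
    horizon, n_states = len(policy), len(R)
    V = [[0.0] * n_states for _ in range(horizon + 1)]
    for t in range(horizon - 1, -1, -1):
        for s in range(n_states):
            a = policy[t][s]
            V[t][s] = R[s][a] + sum(
                P[a][s][s2] * V[t + 1][s2] for s2 in range(n_states)
            )
    return V
```

Evaluating the policy returned by `backward_induction` with `evaluate_policy` should reproduce the same value table, which is a quick consistency check on both routines.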


Author(s):  
Rey Pocius ◽  
Lawrence Neal ◽  
Alan Fern

Commonly used sequential decision-making tasks, such as the games in the Arcade Learning Environment (ALE), provide rich observation spaces suitable for deep reinforcement learning. However, they consist mostly of low-level control tasks, which are of limited use for the development of explainable artificial intelligence (XAI) due to their fine temporal resolution. Many of these domains also lack built-in high-level abstractions and symbols. Existing tasks that provide both strategic decision making and rich observation spaces are either difficult to simulate or intractable. We provide a set of new strategic decision-making tasks specialized for the development and evaluation of explainable AI methods, built as constrained mini-games within the StarCraft II Learning Environment.


2017 ◽  
Vol 29 (12) ◽  
pp. 2103-2113 ◽  
Author(s):  
Samuel J. Gershman ◽  
Jimmy Zhou ◽  
Cody Kommers

Imagination enables us not only to transcend reality but also to learn about it. In the context of reinforcement learning, an agent can rationally update its value estimates by simulating an internal model of the environment, provided that the model is accurate. In a series of sequential decision-making experiments, we investigated the impact of imaginative simulation on subsequent decisions. We found that imagination can cause people to pursue imagined paths, even when these paths are suboptimal. This bias is systematically related to participants' optimism about how much reward they expect to receive along imagined paths; providing feedback strongly attenuates the effect. The imagination effect can be captured by a reinforcement learning model that includes a bonus added onto imagined rewards. Using fMRI, we show that a network of regions associated with valuation is predictive of the imagination effect. These results suggest that imagination, although a powerful tool for learning, is also susceptible to motivational biases.
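The idea of a bonus added onto imagined rewards can be illustrated with a simple incremental value update. The abstract does not give the authors' model equations, so everything below (the delta-rule form, the function name `update_value`, and the parameters `alpha` and `bonus`) is an assumption chosen for illustration rather than the published model.

```python
def update_value(value, reward, alpha=0.1, imagined=False, bonus=0.0):
    """One delta-rule update of a value estimate.

    value:    current value estimate for the chosen path
    reward:   reward that was experienced or imagined
    alpha:    learning rate (hypothetical parameter)
    imagined: True if the reward came from simulation rather than experience
    bonus:    extra amount added to imagined rewards (hypothetical parameter)
    """
    effective_reward = reward + (bonus if imagined else 0.0)
    return value + alpha * (effective_reward - value)
```

With a positive `bonus`, repeated imagined updates pull the estimate above what the same real rewards would support, which is one way a preference for imagined paths could arise in such a model.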


Author(s):  
Dongliang He ◽  
Xiang Zhao ◽  
Jizhou Huang ◽  
Fu Li ◽  
Xiao Liu ◽  
...  

The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos. Existing studies either slide a window over the entire video or exhaustively rank all possible clip-sentence pairs in a presegmented video, and both strategies inevitably suffer from the cost of exhaustively enumerated candidates. To alleviate this problem, we formulate the task as a sequential decision-making problem by learning an agent that progressively adjusts the temporal grounding boundaries according to its policy. Specifically, we propose a reinforcement learning based framework improved by multi-task learning, which shows steady performance gains by considering additional supervised boundary information during training. Our proposed framework achieves state-of-the-art performance on the ActivityNet’18 DenseCaption dataset (Krishna et al. 2017) and the Charades-STA dataset (Sigurdsson et al. 2016; Gao et al. 2017) while observing only 10 or fewer clips per video.


2019 ◽  
Vol 1 (2) ◽  
pp. 590-610
Author(s):  
Zohreh Akbari ◽  
Rainer Unland

Sequential Decision Making Problems (SDMPs) that can be modeled as Markov Decision Processes can be solved using methods that combine Dynamic Programming (DP) and Reinforcement Learning (RL). Depending on the problem scenarios and the available Decision Makers (DMs), such RL algorithms may be designed for single-agent systems or for multi-agent systems. The latter either consist of agents with individual goals and decision-making capabilities, which are influenced by other agents' decisions, or behave as a swarm of agents that collaboratively learn a single objective. Many studies have been conducted in this area; however, when concentrating on available swarm RL algorithms, one obtains a clear view of the areas that still require attention. Most of the studies in this area focus on homogeneous swarms. So far, systems introduced as Heterogeneous Swarms (HetSs) include very few, i.e., two or three, sub-swarms of homogeneous agents, which either, according to their capabilities, deal with a specific sub-problem of the general problem or exhibit different behaviors in order to reduce the risk of bias. This study introduces a novel approach that allows agents, which are originally designed to solve different problems and hence have higher degrees of heterogeneity, to behave as a swarm when addressing identical sub-problems. In fact, the affinity between two agents, which measures the compatibility of agents to work together towards solving a specific sub-problem, is used in designing a Heterogeneous Swarm RL (HetSRL) algorithm that allows HetSs to solve the intended SDMPs.


1997 ◽  
Vol 119 (4) ◽  
pp. 485-493 ◽  
Author(s):  
V. Krishnan ◽  
S. D. Eppinger ◽  
D. E. Whitney

In this paper, we consider the cross-functional design decision making process and discuss how sequential decision making leads to a degradation in design quality even when downstream design tasks are not rendered infeasible by preceding upstream decisions. We focus on the problem of simplifying the design iterations required to address this quality loss. Two properties, called sequence invariance and task invariance, are introduced to help reduce the complexity of subsequent design iterations. We also discuss how these properties may be used by designers in situations where mathematical descriptions of the design performance characteristics are unavailable. We illustrate the utility of these properties by showing their applicability to the design of catalytic converter diagnostic systems at a major U.S. automotive firm.


Author(s):  
Herbert C. Puscheck ◽  
James H. Greene

A two-sided wargame simulation and four decision-making models to play one side of the game were developed. The game and models were used to study the decision-making process exhibited by 64 students at the U.S. Military Academy. It was concluded that these students utilized a simple strategy; decisions were unaffected, within the range indicated, by opponent decision delays; students displayed a learning effect during the game; there existed a positive correlation between mean decision time and score; academically lower ranking students received higher scores than higher ranking players; and players received higher scores when opposing certain more sophisticated opponents than when opposing selected simpler models. The results are discussed. The wargame and associated decision-making models were run on a GE-225 computer from remote Teletype terminals. The investigation suggests a number of additional applications for the wargame and decision-making models.

