Deep Q-Network with Predictive State Models in Partially Observable Domains

2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Danning Yu ◽  
Kun Ni ◽  
Yunlong Liu

While deep reinforcement learning (DRL) has achieved great success in some large domains, most related algorithms assume that the state of the underlying system is fully observable. However, many real-world problems are actually partially observable. For systems with continuous observations, most related algorithms, e.g., the deep Q-network (DQN) and the deep recurrent Q-network (DRQN), use history observations to represent states; however, they are often computationally expensive and ignore the information carried by actions. Predictive state representations (PSRs) offer a powerful framework for modelling partially observable dynamical systems with discrete or continuous state spaces, representing the latent state using fully observable actions and observations. In this paper, we present a PSR model-based DQN approach that combines the strengths of the PSR model and DQN planning. We use a recurrent network to build a recurrent PSR model, which can fully learn the dynamics of partially observable environments with continuous observations. The model is then used for the state representation and state update of DQN, so that DQN no longer relies on a fixed number of history observations or on a recurrent neural network (RNN) to represent states in partially observable environments. The strong performance of the proposed approach is demonstrated on a set of robotic control tasks from OpenAI Gym by comparing it with the memory-based DRQN and the state-of-the-art recurrent predictive state policy (RPSP) networks. Source code is available at https://github.com/RPSR-DQN/paper-code.git.
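
To make the data flow concrete, here is a minimal Python sketch, under clearly stated assumptions, of a DQN-style agent whose state is a fixed-size vector updated recursively from each action and observation rather than from a window of past observations. The update rule b_{t+1} = tanh(W [b_t; one_hot(a_t); o_{t+1}]), the linear Q-head, and the names `RecurrentStateModel` and `LinearQHead` are hypothetical illustrations with random, untrained weights. This is not the authors' recurrent PSR model: a PSR represents state as predictions about future observations given future actions, and the paper learns its model from data (the actual code is at the repository linked above).

```python
import math
import random

# Hypothetical sketch: a fixed-size recurrent state updated from (action, observation)
# pairs, followed by a linear Q-value head. Weights are random, not learned.


def matvec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class RecurrentStateModel:
    """Assumed update: b_{t+1} = tanh(W [b_t; one_hot(a_t); o_{t+1}])."""

    def __init__(self, state_dim, n_actions, obs_dim, seed=0):
        rng = random.Random(seed)
        in_dim = state_dim + n_actions + obs_dim
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                  for _ in range(state_dim)]
        self.n_actions = n_actions
        self.state = [0.0] * state_dim  # initial state b_0

    def update(self, action, obs):
        # The action is part of the input, unlike observation-only histories.
        one_hot = [1.0 if i == action else 0.0 for i in range(self.n_actions)]
        z = matvec(self.W, self.state + one_hot + list(obs))
        self.state = [math.tanh(v) for v in z]
        return self.state


class LinearQHead:
    """Q(b, a) = w_a . b, a stand-in for the DQN's value network."""

    def __init__(self, state_dim, n_actions, seed=1):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(state_dim)]
                  for _ in range(n_actions)]

    def q_values(self, state):
        return matvec(self.W, state)

    def greedy_action(self, state):
        q = self.q_values(state)
        return max(range(len(q)), key=q.__getitem__)
```

Because the state has a fixed size, the Q-head's input dimension does not grow with episode length, and the action enters every update; these are the two properties the abstract contrasts with history-based DQN and DRQN.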

2021 ◽  
Vol 3 (6) ◽  
Author(s):  
Ogbonnaya Anicho ◽  
Philip B. Charlesworth ◽  
Gurvinder S. Baicher ◽  
Atulya K. Nagar

This work analyses the performance of Reinforcement Learning (RL) versus Swarm Intelligence (SI) for coordinating multiple unmanned High Altitude Platform Stations (HAPS) for communications area coverage. It builds upon previous work that examined various elements of both algorithms. The main aim of this paper is to address the continuous state-space challenge by partitioning the state space to manage its high dimensionality. This enabled a comparison of the classical versions of both RL and SI, establishing a baseline for future comparisons with improved versions. Previous work had observed SI performing better across various key performance indicators. However, even after tuning parameters and empirically choosing a suitable partitioning ratio for the RL state space, the SI algorithm still maintained superior coordination capability, achieving higher mean overall user coverage (about 20% better than the RL algorithm) in addition to faster convergence rates. Although the RL technique showed better average peak user coverage, its unpredictable dip in coverage was a key weakness, making SI the more suitable algorithm in the context of this work.
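
The partitioning idea can be illustrated with a minimal Python sketch: a continuous 2-D coverage area is split into an `n_bins` by `n_bins` grid so that a classical tabular Q-learning agent can index its states. The grid layout, the `n_bins` parameter standing in for the paper's partitioning ratio, the function names, and the learning constants are assumptions for illustration, not the authors' settings.

```python
import random

# Hypothetical sketch: a continuous 2-D coverage area is partitioned into an
# n_bins x n_bins grid so that a tabular Q-learning agent can index its states.


def cell_index(x, y, width, height, n_bins):
    """Map a position in [0, width] x [0, height] to a grid cell id (assumes in-range input)."""
    col = min(int(x / width * n_bins), n_bins - 1)  # clamp the upper edge
    row = min(int(y / height * n_bins), n_bins - 1)
    return row * n_bins + col


def q_update(q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: Q(s,a) += alpha * (r + gamma * max Q(s',.) - Q(s,a))."""
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def epsilon_greedy(q, s, epsilon, rng):
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[s]))
    return max(range(len(q[s])), key=q[s].__getitem__)
```

A finer grid (larger `n_bins`) resolves positions more precisely but multiplies the number of table rows, which is the trade-off a partitioning ratio has to balance.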


1996 ◽  
Vol 33 (1) ◽  
pp. 122-126
Author(s):  
Torgny Lindvall ◽  
L. C. G. Rogers

The use of Mineka coupling is extended to a case with a continuous state space: an efficient coupling of random walks S and S′ in ℝ can be made such that S′ − S is virtually a one-dimensional simple random walk. This insight settles a zero-two law of ergodicity. Another proof of Blackwell's renewal theorem is also presented.
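
For orientation, the Python sketch below shows one way to build the classical Mineka coupling for integer-valued steps, the discrete setting that the paper extends; it does not reproduce the continuous-state construction. Given a step distribution `mu`, it returns a joint law of the increments (X, X′) with both marginals equal to `mu` and X′ − X in {−1, 0, 1}, with equal probability of +1 and −1. The function name and the specific pairing of neighbouring atoms are assumptions of this sketch.

```python
from fractions import Fraction

# Sketch of a classical (integer-valued) Mineka coupling: a joint law for the
# increments (X, X') such that both marginals equal mu, X' - X is in {-1, 0, 1},
# and P(X' - X = 1) == P(X' - X = -1).


def mineka_coupling(mu):
    """mu: dict {integer step: Fraction probability}. Returns {(x, x_prime): prob}."""
    support = sorted(mu)

    def p(k):
        return mu.get(k, Fraction(0))

    # Half of the overlap between neighbouring atoms k and k + 1.
    half_min = {k: min(p(k), p(k + 1)) / 2
                for k in range(support[0] - 1, support[-1] + 1)}
    joint = {}
    for k in support:
        up = half_min[k]        # mass moved to the pairs (k, k+1) and (k+1, k)
        down = half_min[k - 1]  # mass already used by the pairs (k, k-1) and (k-1, k)
        if up > 0:
            joint[(k, k + 1)] = up
            joint[(k + 1, k)] = up
        stay = p(k) - up - down  # remaining mass: both walks take the same step
        if stay > 0:
            joint[(k, k)] = stay
    return joint
```

Driving both walks with this joint law makes S′ − S a lazy symmetric simple random walk, provided some neighbouring atoms k and k + 1 both carry positive mass. Such a walk is recurrent in one dimension, so the difference hits 0 almost surely, after which the two walks can be run together.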


NeuroImage ◽  
2017 ◽  
Vol 162 ◽  
pp. 344-352 ◽  
Author(s):  
Jacob C.W. Billings ◽  
Alessio Medda ◽  
Sadia Shakil ◽  
Xiaohong Shen ◽  
Amrit Kashyap ◽  
...  

2009 ◽  
Vol 7 (2) ◽  
pp. 387-394 ◽  
Author(s):  
Tom Mortimer

This article considers the traditional approach to the ‘state’ models of corporate governance, namely the shareholder model and the stakeholder model. It then considers the extent to which developments in Poland, a recent EU accession country, reflect either of these models or adopt a hybrid approach. Finally, it offers proposals for the future development of corporate governance in Poland.


2021 ◽  
Author(s):  
Jie Wen ◽  
Yuanhao Shi ◽  
Jianfang Jia ◽  
Jianchao Zeng

This paper considers the exponential stabilization of eigenstates of quantum spin-$\frac{1}{2}$ systems using a switching state feedback strategy. To obtain faster exponential convergence, we divide the state space into two subspaces and use a different continuous state feedback control in each subspace. Together, the two continuous state feedback controls form the switching state feedback, under which the state converges faster than under continuous state feedback alone. The exponential convergence and the superiority of the switching state feedback are proved theoretically and verified in numerical simulations. In addition, the influence of the control parameter on the state convergence rate is studied.


Author(s):  
Eric Timmons ◽  
Brian C. Williams

State estimation methods based on hybrid discrete and continuous state models have emerged as a way of precisely computing belief states for real-world systems; however, they have difficulty scaling to systems with more than a handful of components. Classical, consistency-based diagnosis methods scale to this level by combining best-first enumeration and conflict-directed search. While best-first methods have been developed for hybrid estimation, conflict-directed methods have thus far been elusive: conflicts summarize constraint violations, but probabilistic hybrid estimation is relatively unconstrained. In this paper we present an approach (A*BC) that unifies best-first enumeration and conflict-directed search in relatively unconstrained problems through the concept of "bounding" conflicts, an extension of conflicts that represents tighter bounds on the cost of regions of the search space. Experiments show that an A*BC-powered state estimator produces estimates up to an order of magnitude faster than the current state of the art, particularly on large systems.
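
As a point of reference, the Python sketch below shows plain best-first enumeration of mode assignments for independent components (cost = −log probability), with known conflicts used only as filters. It is not A*BC: bounding conflicts, and conflict-directed jumps over regions of the search space, are not modelled. The function name, the cost encoding, and the conflict format are assumptions made for illustration.

```python
import heapq

# Illustrative sketch (not A*BC): enumerate full mode assignments of independent
# components in order of increasing cost, skipping any candidate that contains a
# known conflict (a partial assignment already shown to be inconsistent).


def best_first_assignments(mode_costs, conflicts, limit):
    """mode_costs: list (one per component) of {mode: cost}, cost = -log P.
    conflicts: list of {component_index: mode} partial assignments to exclude.
    Returns up to `limit` (cost, assignment) pairs in non-decreasing cost."""
    # Sort each component's modes from cheapest to most expensive.
    ranked = [sorted(costs.items(), key=lambda kv: kv[1]) for costs in mode_costs]

    def cost_of(idx):
        return sum(ranked[c][i][1] for c, i in enumerate(idx))

    def violates(assignment):
        return any(all(assignment[c] == m for c, m in conflict.items())
                   for conflict in conflicts)

    start = tuple(0 for _ in ranked)
    frontier = [(cost_of(start), start)]
    seen = {start}
    results = []
    while frontier and len(results) < limit:
        cost, idx = heapq.heappop(frontier)
        assignment = tuple(ranked[c][i][0] for c, i in enumerate(idx))
        if not violates(assignment):
            results.append((cost, assignment))
        # Successors: demote one component to its next-cheapest mode.
        for c in range(len(idx)):
            if idx[c] + 1 < len(ranked[c]):
                nxt = idx[:c] + (idx[c] + 1,) + idx[c + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(frontier, (cost_of(nxt), nxt))
    return results
```

Because every successor is at least as costly as its parent, the heap releases assignments in non-decreasing cost. A conflict-directed method would additionally use each conflict to steer successor generation away from the excluded candidates, instead of popping and discarding them.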


2002 ◽  
Vol 12 (02) ◽  
pp. 137-148
Author(s):  
K. Gopalsamy ◽  
S. Mohamad

The convergence characteristics of a single dissipative Hopfield-type neuron with self-interaction under periodic external stimuli are considered. Sufficient conditions are established for associative encoding and recall of the periodic patterns associated with the external stimuli. Both continuous-time, continuous-state and discrete-time, continuous-state models are discussed. It is shown that when the neuronal gain is dominated on average by the neuronal dissipation, associative recall of the encoded temporal pattern is guaranteed, as a consequence of the global asymptotic stability of the encoded pattern.
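
A small numerical illustration, assuming (this form is not taken from the paper) the continuous-time model dx/dt = −a·x + b·tanh(x) + I(t) with constant dissipation a, gain b, and periodic stimulus I(t) = amp·sin(ω t). Because tanh is 1-Lipschitz, choosing a > |b| lets dissipation dominate the gain, and two trajectories started far apart should converge onto the same periodic response. The parameter values and the function name `simulate` are hypothetical.

```python
import math

# Assumed model (illustrative, not taken from the paper):
#   dx/dt = -a * x + b * tanh(x) + I(t),   I(t) = amp * sin(omega * t)
# Since tanh is 1-Lipschitz, a > |b| means dissipation dominates the gain.


def simulate(x0, a=2.0, b=1.0, amp=1.0, omega=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration from x0; returns the trajectory as a list."""
    x, t = x0, 0.0
    traj = [x]
    for _ in range(steps):
        stimulus = amp * math.sin(omega * t)
        x += dt * (-a * x + b * math.tanh(x) + stimulus)
        t += dt
        traj.append(x)
    return traj
```

With the default step size, the gap between two forward-Euler trajectories shrinks by a factor of at most 1 − dt·(a − |b|) = 0.99 per step, a discrete analogue of the global asymptotic stability that underpins associative recall in the abstract.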

