A Q-Learning Framework for User QoE Enhanced Self-Organizing Spectrally Efficient Network Using a Novel Inter-Operator Proximal Spectrum Sharing

2016 · Vol 34 (11) · pp. 2887-2901
Author(s): Manikantan Srinivasan, Vijeth J. Kotagi, C. Siva Ram Murthy

2016 · Vol 13 (1) · pp. 85-98
Author(s): Stephen S. Mwanje, Lars Christoph Schmelz, Andreas Mitschele-Thiel

2011 · Vol 181 (13) · pp. 2813-2822
Author(s): Kao-Shing Hwang, Hsin-Yi Lin, Yuan-Pao Hsu, Hung-Hsiu Yu

Mathematics · 2020 · Vol 8 (9) · pp. 1479
Author(s): Francisco Martinez-Gil, Miguel Lozano, Ignacio García-Fernández, Pau Romero, Dolors Serra, ...

Reinforcement learning is one of the most promising machine learning techniques for producing intelligent behaviors in embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms takes the form of a value function, expressed either as a numeric table or as a function approximator, and the learned behavior is then derived by acting greedily with respect to this value function. Nevertheless, the learned policy sometimes does not meet expectations, and authoring it is difficult and unsafe because modifying a single value or parameter in the learned value function has unpredictable consequences in the space of policies it represents. This rules out direct manipulation of the learned value function as a method for modifying the derived behaviors. In this paper, we propose using Inverse Reinforcement Learning to incorporate real behavior traces into the learning process to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to incorporate information from trajectories of real pedestrians into the process of learning how to navigate a virtual 3D space that represents the real environment. A comparison with behaviors learned using a classic Reinforcement Learning algorithm (Sarsa(λ)) shows that the Inverse Reinforcement Learning behaviors adjust significantly better to the real trajectories.
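The core mechanism referenced above is the soft Bellman backup of Soft Q-learning, which replaces the hard max used by classic Q-learning and Sarsa(λ) with a temperature-scaled log-sum-exp so the learned policy retains entropy, consistent with the maximum causal entropy principle. Below is a minimal tabular sketch of that update, assuming a placeholder environment interface (env.reset/env.step), state and action counts, and a generic reward signal; in the paper's Inverse Reinforcement Learning setting the reward would instead come from a model fitted to the real pedestrian trajectories, and MARL-Ped's actual API is not reproduced here.

import numpy as np

def soft_q_learning(env, n_states, n_actions, alpha=0.1, gamma=0.99,
                    temperature=1.0, episodes=500):
    # Tabular Soft Q-learning sketch; env is a hypothetical gridworld-style
    # navigation environment, not MARL-Ped.
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Sample an action from the soft (Boltzmann) policy derived from Q.
            logits = Q[s] / temperature
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            a = np.random.choice(n_actions, p=probs)

            s_next, r, done = env.step(a)

            # Soft state value: temperature-scaled log-sum-exp instead of a hard max.
            m = Q[s_next].max()
            v_soft = m + temperature * np.log(np.sum(np.exp((Q[s_next] - m) / temperature)))
            target = r + (0.0 if done else gamma * v_soft)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q

Sampling from the Boltzmann policy rather than acting greedily is what keeps exploration and the entropy term coupled to the same value function, which is the property the maximum causal entropy formulation relies on.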


2021
Author(s): Aqib A. Syed, Thanakorn Khamvilai, Yoobin Kim, Kyriakos G. Vamvoudakis

2019 · Vol 64 (9) · pp. 3756-3763
Author(s): Donghwan Lee, Jianghai Hu

Electronics · 2019 · Vol 8 (6) · pp. 615
Author(s): Ching-Chang Wong, Chih-Cheng Liu, Sheng-Ru Xiao, Hao-Yu Yang, Meng-Cheng Lau

In this paper, an oscillator-based gait pattern built from sinusoidal functions is designed and implemented on a field-programmable gate array (FPGA) chip to generate a trajectory plan and achieve bipedal locomotion for a small-sized humanoid robot. To let the robot walk straight, the turning direction is treated as a parameter of the gait pattern, and Q-learning is used to obtain a gait pattern that walks straight ahead. Moreover, an automatic training platform is designed so that the learning process is automated; in this way, the turning direction can be adjusted flexibly and efficiently under the supervision of the automatic training platform. The experimental results show that the proposed learning framework allows the humanoid robot to gradually learn to walk straight during the automated learning process.
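To make the two ingredients of this abstract concrete, the sketch below pairs a sinusoidal oscillator gait, with the turning direction exposed as one parameter, with a small tabular Q-learning update over a discretized set of turning offsets. The joint layout, amplitudes, state discretization, and drift-based reward are illustrative assumptions only; they do not reflect the paper's FPGA implementation or its automatic training platform.

import numpy as np

def gait_pattern(t, turn_offset, freq=1.0, hip_amp=0.3, ankle_amp=0.15):
    # Sinusoidal swing targets (radians) at time t; turn_offset biases the hips
    # in opposite directions to steer the walk left or right.
    phase = 2.0 * np.pi * freq * t
    return {
        "hip_left":    hip_amp * np.sin(phase) + turn_offset,
        "hip_right":   hip_amp * np.sin(phase + np.pi) - turn_offset,
        "ankle_left":  ankle_amp * np.sin(phase + np.pi / 2.0),
        "ankle_right": ankle_amp * np.sin(phase - np.pi / 2.0),
    }

# Tabular Q-learning over discretized turn offsets.
# States: 0 = drifting left, 1 = roughly straight, 2 = drifting right.
turn_actions = np.linspace(-0.05, 0.05, 5)   # candidate turn offsets (rad)
Q = np.zeros((3, len(turn_actions)))

def q_update(state, action, reward, next_state, alpha=0.2, gamma=0.9):
    # Standard Q-learning backup; here the reward would penalize lateral drift
    # measured by the (hypothetical) training platform after each walking trial.
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])

In this arrangement, each walking trial yields a drift measurement, the corresponding reward updates Q, and the next trial uses the turn offset that is currently best for the observed drift state, so the gait converges toward walking straight.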

