Modeling an Inverted Pendulum via Differential Equations and Reinforcement Learning Techniques

Author(s):  
Siddharth Sharma

The prevalence of differential equations as a mathematical technique has refined the fields of control theory and constrained optimization by enabling accurate models of chaotic, unbalanced systems. In recent research, however, systems are increasingly nonlinear and difficult to model using differential equations alone. A newer technique is therefore to use policy iteration and Reinforcement Learning (RL), techniques built around an action-and-reward sequence for a controller. RL can be applied to control theory problems because it remains robust in a dynamic environment such as the cartpole system (an inverted pendulum). This solution avoids PID controllers and other dynamics-optimization systems in favor of a more robust, reward-based control mechanism. This paper applies RL and Q-Learning to the classic cartpole problem, while also discussing the mathematical background and the differential equations used to model the system.
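For reference, a commonly used frictionless form of the cart-pole dynamics (the paper's exact formulation is not reproduced here) gives the pole's angular acceleration and the cart's linear acceleration as:

```latex
\ddot{\theta} = \frac{g\sin\theta + \cos\theta \left( \dfrac{-F - m_p l \dot{\theta}^2 \sin\theta}{m_c + m_p} \right)}
               {l \left( \dfrac{4}{3} - \dfrac{m_p \cos^2\theta}{m_c + m_p} \right)},
\qquad
\ddot{x} = \frac{F + m_p l \left( \dot{\theta}^2 \sin\theta - \ddot{\theta}\cos\theta \right)}{m_c + m_p}
```

where $\theta$ is the pole angle from vertical, $x$ the cart position, $F$ the applied force, $m_c$ and $m_p$ the cart and pole masses, $l$ the pole's half-length, and $g$ the gravitational acceleration.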

Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial Intelligence makes it possible to build engines that explore and learn their environments, and thus derive policies for controlling them in real time without human intervention. Through its Reinforcement Learning (RL) component, using frameworks such as temporal-difference learning, State-Action-Reward-State-Action (SARSA), and Q-Learning, to name a few, AI can be applied to any system that can be perceived as a Markov Decision Process. This opens the door to applying RL to Cloud Load Balancing, dispatching load dynamically to a given Cloud System. The authors describe different techniques that can be used to implement an RL-based engine in a cloud system.
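As a minimal sketch of the framework distinction mentioned above, consider a toy load balancer whose state is a discretized load profile and whose action is the server chosen to receive the next job; the state encoding, reward shaping, and all names below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Toy load balancer: N_SERVERS queues, discretized global load state.
N_SERVERS, N_STATES = 3, 64
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = np.zeros((N_STATES, N_SERVERS))

def epsilon_greedy(state):
    # Explore with probability EPS, otherwise pick the best-known server.
    if np.random.rand() < EPS:
        return np.random.randint(N_SERVERS)
    return int(np.argmax(Q[state]))

def q_learning_update(s, a, r, s_next):
    # Off-policy: bootstrap from the greedy action in the next state.
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: bootstrap from the action the policy actually takes next.
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next, a_next] - Q[s, a])
```

The only difference between the two updates is the bootstrap term: Q-learning uses the greedy next action, while SARSA uses the next action the policy actually selects.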


2017 ◽  
Vol 7 (1.5) ◽  
pp. 269
Author(s):  
D. Ganesha ◽  
Vijayakumar Maragal Venkatamuni

This research introduces a self-learning modified Q-Learning technique in EMCAP (Enhanced Mind Cognitive Architecture of Pupils). Q-learning is a model-free reinforcement learning (RL) method; in particular, it can be applied to establish an optimal action-selection strategy for any Markov decision process. The EMCAP architecture [1] presents various agent control strategies for static and dynamic environments. Experiments were conducted to evaluate the performance of each agent individually, collecting the same statistics for every agent to allow comparison across agents, and this work considered varied kinds of agents at different levels of the architecture. The Fungus World testbed, implemented in SWI-Prolog 5.4.6, was used for the experiments, with fixed obstacles placed to create locations specific to the Fungus World environment and various parameters introduced to test an agent's performance. The modified Q-learning algorithm proves well suited to the EMCAP architecture: in the experiments, it accumulates more reward than existing Q-learning.
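For context, the baseline (unmodified) Q-learning update that such variants build on is the standard temporal-difference rule; the abstract does not detail the specific modification:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

where $\alpha$ is the learning rate and $\gamma$ the discount factor.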


2021 ◽  
Vol 13 (2) ◽  
pp. 57-80
Author(s):  
Arunita Kundaliya ◽  
D.K. Lobiyal

In resource-constrained Wireless Sensor Networks (WSNs), enhancing network lifetime has been one of the most challenging issues for researchers. Researchers have been exploiting machine learning techniques, in particular reinforcement learning, to achieve efficient solutions in the WSN domain. The objective of this paper is to apply Q-learning, a reinforcement learning technique, to enhance network lifetime by developing distributed routing protocols. Q-learning is an attractive choice for routing due to its low computational and memory demands. To enable an agent running at each node to take an optimal action, the approach considers a node's residual energy, hop length to the sink, and transmission power. Residual energy and hop length are used to calculate the Q-value, which in turn is used to decide the optimal next hop for routing. The proposed protocols' performance is evaluated through NS3 simulations and compared with the AODV protocol in terms of network lifetime, throughput, and end-to-end delay.
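A minimal sketch of how a node might combine the two parameters into a Q-value when scoring candidate next hops, assuming a simple weighted sum; the weights and normalization are illustrative assumptions, not the paper's exact formula:

```python
# Illustrative weights trading residual energy against path length.
W_ENERGY, W_HOPS = 0.6, 0.4

def q_value(residual_energy, initial_energy, hops_to_sink, max_hops):
    energy_term = residual_energy / initial_energy   # normalized to [0, 1]
    hop_term = 1.0 - hops_to_sink / max_hops         # fewer hops -> higher score
    return W_ENERGY * energy_term + W_HOPS * hop_term

def choose_next_hop(neighbors):
    # neighbors: list of dicts whose keys match q_value's parameters.
    return max(neighbors, key=lambda n: q_value(**n))
```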


Author(s):  
Yufei Wei ◽  
Xiaotong Nie ◽  
Motoaki Hiraga ◽  
Kazuhiro Ohkura ◽  
Zlatan Car ◽  
...  

In this study, the use of a popular deep reinforcement learning algorithm, deep Q-learning, to develop end-to-end control policies for robotic swarms is explored. Robots have only limited local sensory capabilities; in a swarm, however, they can accomplish collective tasks beyond the capability of a single robot. Compared with most automatic design approaches proposed so far, which belong to the field of evolutionary robotics, deep reinforcement learning techniques provide two advantages: (i) they enable researchers to develop control policies in an end-to-end fashion; and (ii) they require fewer computational resources, especially when the control policy to be developed has a large parameter space. The proposed approach is evaluated in a round-trip task, where the robots are required to travel between two destinations as many times as possible. Simulation results show that the proposed approach can learn control policies directly from high-dimensional raw camera pixel inputs for robotic swarms.
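A minimal PyTorch sketch of a Q-network mapping raw pixels to action values, in the spirit of the end-to-end policies described above; the input resolution, layer sizes, and four-action output are assumptions for illustration, not the authors' architecture:

```python
import torch
import torch.nn as nn

class PixelQNet(nn.Module):
    """Maps an 84x84 grayscale frame to Q-values over discrete actions."""
    def __init__(self, n_actions=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),  # 84 -> 20 -> 9 spatially
            nn.Linear(256, n_actions),
        )

    def forward(self, x):  # x: (batch, 1, 84, 84)
        return self.head(self.conv(x))

q_net = PixelQNet()
pixels = torch.rand(1, 1, 84, 84)      # stand-in for one camera frame
action = q_net(pixels).argmax(dim=1)   # greedy action from raw pixels
```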


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1468
Author(s):  
Razin Bin Issa ◽  
Modhumonty Das ◽  
Md. Saferi Rahman ◽  
Monika Barua ◽  
Md. Khalilur Rhaman ◽  
...  

Autonomous vehicle navigation in an unknown dynamic environment is crucial for both supervised- and Reinforcement Learning-based autonomous maneuvering. The cooperative fusion of these two learning approaches has the potential to be an effective mechanism for tackling indefinite environmental dynamics. Most state-of-the-art autonomous vehicle navigation systems are trained on a specific mapped model with familiar environmental dynamics. This research, however, focuses on the cooperative fusion of supervised and Reinforcement Learning technologies for autonomous navigation of land vehicles in a dynamic, unknown environment. Faster R-CNN, a supervised learning approach, identifies ambient environmental obstacles for untroubled maneuvering of the autonomous vehicle, while the training policies of Double Deep Q-Learning, a Reinforcement Learning approach, enable the autonomous agent to learn effective navigation decisions from the dynamic environment. The proposed model is primarily tested in a gaming environment similar to the real world, where it exhibits overall efficiency and effectiveness in the maneuvering of autonomous land vehicles.
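For the Reinforcement Learning half of the fusion, here is a minimal sketch of the Double Deep Q-Learning target named above, contrasted with the vanilla DQN target; tensor shapes and names are illustrative assumptions:

```python
import torch

# q_online and q_target are two copies of the same network; the tensors
# below are their Q-value outputs for a batch of next states.

def dqn_target(r, gamma, q_target_next, done):
    # Vanilla DQN: max over the target network's own estimates.
    return r + gamma * (1 - done) * q_target_next.max(dim=1).values

def double_dqn_target(r, gamma, q_online_next, q_target_next, done):
    # Double DQN: select the action with the online network,
    # evaluate it with the target network (reduces overestimation bias).
    best = q_online_next.argmax(dim=1, keepdim=True)
    return r + gamma * (1 - done) * q_target_next.gather(1, best).squeeze(1)
```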


Reinforcement Learning is a paradigm in which an agent takes actions in an environment to maximize its aggregate reward: agents are penalized for bad actions and rewarded for good ones. Deep Reinforcement Learning overcomes some shortcomings of tabular Reinforcement Learning, such as the limited variety of open-source environments and changes in task difficulty driven by the reward structure or action set. In a large environment, inferring values for new states from already-explored states is difficult due to time and space complexity; hence, Deep Q-learning approximates the Q-value function, which represents the quality of a state-action pair, with a neural network. This paper highlights the results of implementing Deep Q-Learning (DQN), Double DQN, Dueling DQN, Noisy DQN, and DQN with Prioritized Experience Replay, and their performance in stabilizing an inverted pendulum. Deep Reinforcement Learning can be applied across many domains: recognition and perception in computer vision; simulation-to-real robot control in robotics; sequence generation and translation in natural language processing; Poker, Bridge, and StarCraft in games; pricing, trading, and risk management in finance; e-commerce and customer management in business; diagnosis using Electronic Medical Records in healthcare; and adaptive decision control.
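As one concrete example among the variants listed, a minimal PyTorch sketch of a dueling Q-network head; layer sizes are assumptions chosen for the inverted pendulum's 4-dimensional observation and 2 discrete actions:

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Splits into a state-value stream V(s) and an advantage stream A(s, a)."""
    def __init__(self, obs_dim=4, n_actions=2, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, x):
        h = self.trunk(x)
        v, a = self.value(h), self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); the mean-subtraction
        # keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```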


Author(s):  
Alla Evseenko ◽  
Dmitrii Romannikov

Today, the branch of science known as "artificial intelligence" is booming worldwide. Systems built on artificial intelligence methods can perform functions traditionally considered the prerogative of humans. Artificial intelligence spans a wide range of research areas; one such area is machine learning. This article discusses algorithms from one approach to machine learning, reinforcement learning (RL), on which a great deal of research and development has been carried out over the past seven years. Development and research on this approach is mainly conducted on Atari 2600 games or similar problems. In this article, reinforcement learning is applied to a dynamic object, an inverted pendulum. As the model of this object, we consider the inverted pendulum on a cart taken from the Gym library, which contains many models used to test and analyze reinforcement learning algorithms. The article describes the implementation and study of two algorithms of this approach, Deep Q-learning and Double Deep Q-learning. Training, testing, and training-time graphs for each algorithm are presented; on this basis, it is concluded that the Double Deep Q-learning algorithm is preferable, since it trains in approximately 2 minutes and provides the best control of the inverted pendulum on a cart.
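A minimal interaction loop on the Gym cart-pole model mentioned above, assuming the classic Gym API (pre-0.26) and a placeholder in place of the trained DQN or Double DQN network:

```python
import random
import gym  # classic Gym API assumed; newer versions return extra values

env = gym.make("CartPole-v1")
EPSILON = 0.1

def q_values(obs):
    # Placeholder: replace with the trained network's forward pass.
    return [0.0, 0.0]

for episode in range(5):
    obs, total = env.reset(), 0.0
    done = False
    while not done:
        if random.random() < EPSILON:
            action = env.action_space.sample()   # explore
        else:
            qs = q_values(obs)                   # exploit
            action = max(range(len(qs)), key=qs.__getitem__)
        obs, reward, done, info = env.step(action)
        total += reward
    print(f"episode {episode}: return {total}")
```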


Sensors ◽  
2019 ◽  
Vol 19 (13) ◽  
pp. 2875 ◽  
Author(s):  
Rojeena Bajracharya ◽  
Rakesh Shrestha ◽  
Sung Won Kim

The increased demand for spectrum resources for multimedia communications and the limited licensed spectrum have led to widespread interest in operating Long Term Evolution (LTE) in the unlicensed band (LTE-U) for Internet of Things (IoT) systems. Because Wi-Fi and LTE are diverse, with dissimilar physical- and link-layer configurations, several solutions for achieving efficient and fair coexistence have been proposed. Most facilitate fair coexistence through discontinuous transmission, using a duty-cycling or contention mechanism, and efficient coexistence through clean-channel selection; however, they address either fairness or efficiency, not both. Herein, we propose joint adaptive duty cycling (ADC) and dynamic channel switching (DCS) mechanisms. The ADC mechanism supports fair channel-access opportunity by muting a certain number of subframes for Wi-Fi users, whereas the DCS mechanism offers more access opportunities for LTE-U and Wi-Fi users by preventing LTE-U users from occupying a crowded channel for long periods. To support these mechanisms in a dynamic environment, LTE-U for IoT applications is enhanced with Q-learning techniques that automatically select the appropriate combination of muting period and channel. Simulation results show that the proposed mechanism achieves fair and efficient coexistence.
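A minimal sketch of the Q-learning selection described above, treating each (muting period, channel) pair as one action; the action values, the single-state table, and the reward trade-off are illustrative assumptions, not the paper's design:

```python
import itertools
import numpy as np

# Hypothetical action space: how many subframes to mute, and which channel.
MUTING_PERIODS = [2, 4, 6, 8]   # subframes muted per frame (assumed values)
CHANNELS = [1, 6, 11]
ACTIONS = list(itertools.product(MUTING_PERIODS, CHANNELS))

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = np.zeros(len(ACTIONS))      # single-state table for brevity

def select_action():
    # Epsilon-greedy over (muting period, channel) combinations.
    if np.random.rand() < EPS:
        return np.random.randint(len(ACTIONS))
    return int(np.argmax(Q))

def update(action_idx, reward):
    # reward would combine LTE-U throughput with a Wi-Fi fairness term.
    Q[action_idx] += ALPHA * (reward + GAMMA * Q.max() - Q[action_idx])
```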


Author(s):  
Gokhan Demirkiran ◽  
Ozcan Erdener ◽  
Onay Akpinar ◽  
Pelin Demirtas ◽  
M. Yagiz Arik ◽  
...  
