Modeling an Inverted Pendulum via Differential Equations and Reinforcement Learning Techniques

Author(s):  
Siddharth Sharma

The prevalence of differential equations as a mathematical technique has refined the fields of control theory and constrained optimization by enabling accurate models of chaotic, unbalanced systems. In recent research, however, systems are increasingly nonlinear and difficult to model using differential equations alone. A newer technique is therefore to use policy iteration and Reinforcement Learning (RL), techniques built around an action-and-reward sequence for a controller. RL can be applied to control theory problems because it remains robust in a dynamic environment such as the cartpole system (an inverted pendulum). This solution avoids PID controllers and other dynamics-optimization systems in favor of a more robust, reward-based control mechanism. This paper applies RL and Q-Learning to the classic cartpole problem, while also discussing the mathematical background and the differential equations used to model the system.
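For reference, a commonly used frictionless form of the cart-pole dynamics (the paper's exact formulation is not reproduced here) gives the pole's angular acceleration and the cart's linear acceleration as:

```latex
\ddot{\theta} = \frac{g\sin\theta + \cos\theta \left( \dfrac{-F - m_p l \dot{\theta}^2 \sin\theta}{m_c + m_p} \right)}
               {l \left( \dfrac{4}{3} - \dfrac{m_p \cos^2\theta}{m_c + m_p} \right)},
\qquad
\ddot{x} = \frac{F + m_p l \left( \dot{\theta}^2 \sin\theta - \ddot{\theta}\cos\theta \right)}{m_c + m_p}
```

where $\theta$ is the pole angle from vertical, $x$ the cart position, $F$ the applied force, $m_c$ and $m_p$ the cart and pole masses, $l$ the pole's half-length, and $g$ the gravitational acceleration.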

Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial Intelligence makes it possible to build engines that explore and learn their environments, and thus derive policies for controlling them in real time without human intervention. Through its Reinforcement Learning (RL) component, using frameworks such as temporal-difference learning, State-Action-Reward-State-Action (SARSA), and Q-Learning, to name a few, AI can be applied to any system that can be perceived as a Markov Decision Process. This opens the door to applying RL to Cloud Load Balancing, dispatching load dynamically to a given Cloud System. The authors describe different techniques that can be used to implement an RL-based engine in a cloud system.
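As a minimal sketch of the framework distinction mentioned above, consider a toy load balancer whose state is a discretized load profile and whose action is the server chosen to receive the next job; the state encoding, reward shaping, and all names below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Toy load balancer: N_SERVERS queues, discretized global load state.
N_SERVERS, N_STATES = 3, 64
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = np.zeros((N_STATES, N_SERVERS))

def epsilon_greedy(state):
    # Explore with probability EPS, otherwise pick the best-known server.
    if np.random.rand() < EPS:
        return np.random.randint(N_SERVERS)
    return int(np.argmax(Q[state]))

def q_learning_update(s, a, r, s_next):
    # Off-policy: bootstrap from the greedy action in the next state.
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: bootstrap from the action the policy actually takes next.
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next, a_next] - Q[s, a])
```

The only difference between the two updates is the bootstrap term: Q-learning uses the greedy next action, while SARSA uses the next action the policy actually selects.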


2017 ◽  
Vol 7 (1.5) ◽  
pp. 269
Author(s):  
D. Ganesha ◽  
Vijayakumar Maragal Venkatamuni

This research introduces a self-learning modified Q-Learning technique in EMCAP (Enhanced Mind Cognitive Architecture of Pupils). Q-learning is a model-free reinforcement learning (RL) method; in particular, it can be applied to establish an optimal action-selection strategy for any Markov decision process. The EMCAP architecture [1] presents various agent control strategies for static and dynamic environments. Experiments were conducted to evaluate the performance of each agent individually, collecting the same statistics for every agent to allow comparison across agents, and this work considered varied kinds of agents at different levels of the architecture. The Fungus World testbed, implemented in SWI-Prolog 5.4.6, was used for the experiments, with fixed obstacles placed to create locations specific to the Fungus World environment and various parameters introduced to test an agent's performance. The modified Q-learning algorithm proves well suited to the EMCAP architecture: in the experiments, it accumulates more reward than existing Q-learning.
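For context, the baseline (unmodified) Q-learning update that such variants build on is the standard temporal-difference rule; the abstract does not detail the specific modification:

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]
```

where $\alpha$ is the learning rate and $\gamma$ the discount factor.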


2021 ◽  
Vol 13 (2) ◽  
pp. 57-80
Author(s):  
Arunita Kundaliya ◽  
D.K. Lobiyal

In resource-constrained Wireless Sensor Networks (WSNs), enhancing network lifetime has been one of the most challenging issues for researchers. Researchers have been exploiting machine learning techniques, in particular reinforcement learning, to achieve efficient solutions in the WSN domain. The objective of this paper is to apply Q-learning, a reinforcement learning technique, to enhance network lifetime by developing distributed routing protocols. Q-learning is an attractive choice for routing due to its low computational and memory demands. To enable an agent running at each node to take an optimal action, the approach considers a node's residual energy, hop length to the sink, and transmission power. Residual energy and hop length are used to calculate the Q-value, which in turn is used to decide the optimal next hop for routing. The proposed protocols' performance is evaluated through NS3 simulations and compared with the AODV protocol in terms of network lifetime, throughput, and end-to-end delay.
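A minimal sketch of how a node might combine the two parameters into a Q-value when scoring candidate next hops, assuming a simple weighted sum; the weights and normalization are illustrative assumptions, not the paper's exact formula:

```python
# Illustrative weights trading residual energy against path length.
W_ENERGY, W_HOPS = 0.6, 0.4

def q_value(residual_energy, initial_energy, hops_to_sink, max_hops):
    energy_term = residual_energy / initial_energy   # normalized to [0, 1]
    hop_term = 1.0 - hops_to_sink / max_hops         # fewer hops -> higher score
    return W_ENERGY * energy_term + W_HOPS * hop_term

def choose_next_hop(neighbors):
    # neighbors: list of dicts whose keys match q_value's parameters.
    return max(neighbors, key=lambda n: q_value(**n))
```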


Author(s):  
Yufei Wei ◽  
Xiaotong Nie ◽  
Motoaki Hiraga ◽  
Kazuhiro Ohkura ◽  
Zlatan Car ◽  
...  

In this study, the use of a popular deep reinforcement learning algorithm, deep Q-learning, to develop end-to-end control policies for robotic swarms is explored. Robots have only limited local sensory capabilities; in a swarm, however, they can accomplish collective tasks beyond the capability of a single robot. Compared with most automatic design approaches proposed so far, which belong to the field of evolutionary robotics, deep reinforcement learning techniques provide two advantages: (i) they enable researchers to develop control policies in an end-to-end fashion; and (ii) they require fewer computational resources, especially when the control policy to be developed has a large parameter space. The proposed approach is evaluated in a round-trip task, where the robots are required to travel between two destinations as many times as possible. Simulation results show that the proposed approach can learn control policies directly from high-dimensional raw camera pixel inputs for robotic swarms.
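A minimal PyTorch sketch of a Q-network mapping raw pixels to action values, in the spirit of the end-to-end policies described above; the input resolution, layer sizes, and four-action output are assumptions for illustration, not the authors' architecture:

```python
import torch
import torch.nn as nn

class PixelQNet(nn.Module):
    """Maps an 84x84 grayscale frame to Q-values over discrete actions."""
    def __init__(self, n_actions=4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),  # 84 -> 20 -> 9 spatially
            nn.Linear(256, n_actions),
        )

    def forward(self, x):  # x: (batch, 1, 84, 84)
        return self.head(self.conv(x))

q_net = PixelQNet()
pixels = torch.rand(1, 1, 84, 84)      # stand-in for one camera frame
action = q_net(pixels).argmax(dim=1)   # greedy action from raw pixels
```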


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1468
Author(s):  
Razin Bin Issa ◽  
Modhumonty Das ◽  
Md. Saferi Rahman ◽  
Monika Barua ◽  
Md. Khalilur Rhaman ◽  
...  

Autonomous vehicle navigation in an unknown dynamic environment is crucial for both supervised- and Reinforcement Learning-based autonomous maneuvering. The cooperative fusion of these two learning approaches has the potential to be an effective mechanism for tackling indefinite environmental dynamics. Most state-of-the-art autonomous vehicle navigation systems are trained on a specific mapped model with familiar environmental dynamics. This research, however, focuses on the cooperative fusion of supervised and Reinforcement Learning technologies for autonomous navigation of land vehicles in a dynamic, unknown environment. Faster R-CNN, a supervised learning approach, identifies ambient environmental obstacles for untroubled maneuvering of the autonomous vehicle, while the training policies of Double Deep Q-Learning, a Reinforcement Learning approach, enable the autonomous agent to learn effective navigation decisions from the dynamic environment. The proposed model is primarily tested in a gaming environment similar to the real world, where it exhibits overall efficiency and effectiveness in the maneuvering of autonomous land vehicles.
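For the Reinforcement Learning half of the fusion, here is a minimal sketch of the Double Deep Q-Learning target named above, contrasted with the vanilla DQN target; tensor shapes and names are illustrative assumptions:

```python
import torch

# q_online and q_target are two copies of the same network; the tensors
# below are their Q-value outputs for a batch of next states.

def dqn_target(r, gamma, q_target_next, done):
    # Vanilla DQN: max over the target network's own estimates.
    return r + gamma * (1 - done) * q_target_next.max(dim=1).values

def double_dqn_target(r, gamma, q_online_next, q_target_next, done):
    # Double DQN: select the action with the online network,
    # evaluate it with the target network (reduces overestimation bias).
    best = q_online_next.argmax(dim=1, keepdim=True)
    return r + gamma * (1 - done) * q_target_next.gather(1, best).squeeze(1)
```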


Reinforcement Learning is a paradigm in which an agent takes actions in an environment to maximize its aggregate reward: agents are penalized for bad actions and rewarded for good ones. Deep Reinforcement Learning overcomes some shortcomings of tabular Reinforcement Learning, such as the limited variety of open-source environments and changes in task difficulty driven by the reward structure or action set. In a large environment, inferring values for new states from already-explored states is difficult due to time and space complexity; hence, Deep Q-learning approximates the Q-value function, which represents the quality of a state-action pair, with a neural network. This paper highlights the results of implementing Deep Q-Learning (DQN), Double DQN, Dueling DQN, Noisy DQN, and DQN with Prioritized Experience Replay, and their performance in stabilizing an inverted pendulum. Deep Reinforcement Learning can be applied across many domains: recognition and perception in computer vision; simulation-to-real robot control in robotics; sequence generation and translation in natural language processing; Poker, Bridge, and StarCraft in games; pricing, trading, and risk management in finance; e-commerce and customer management in business; diagnosis using Electronic Medical Records in healthcare; and adaptive decision control.
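As one concrete example among the variants listed, a minimal PyTorch sketch of a dueling Q-network head; layer sizes are assumptions chosen for the inverted pendulum's 4-dimensional observation and 2 discrete actions:

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Splits into a state-value stream V(s) and an advantage stream A(s, a)."""
    def __init__(self, obs_dim=4, n_actions=2, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, x):
        h = self.trunk(x)
        v, a = self.value(h), self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a); the mean-subtraction
        # keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```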


Author(s):  
Alla Evseenko ◽  
Dmitrii Romannikov

Today, the branch of science known as "artificial intelligence" is booming worldwide. Systems built on artificial intelligence methods can perform functions traditionally considered the prerogative of humans. Artificial intelligence spans a wide range of research areas; one such area is machine learning. This article discusses algorithms from one approach to machine learning, reinforcement learning (RL), on which a great deal of research and development has been carried out over the past seven years. Development and research on this approach is mainly conducted on Atari 2600 games or similar problems. In this article, reinforcement learning is applied to a dynamic object, an inverted pendulum. As the model of this object, we consider the inverted pendulum on a cart taken from the Gym library, which contains many models used to test and analyze reinforcement learning algorithms. The article describes the implementation and study of two algorithms of this approach, Deep Q-learning and Double Deep Q-learning. Training, testing, and training-time graphs for each algorithm are presented; on this basis, it is concluded that the Double Deep Q-learning algorithm is preferable, since it trains in approximately 2 minutes and provides the best control of the inverted pendulum on a cart.
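A minimal interaction loop on the Gym cart-pole model mentioned above, assuming the classic Gym API (pre-0.26) and a placeholder in place of the trained DQN or Double DQN network:

```python
import random
import gym  # classic Gym API assumed; newer versions return extra values

env = gym.make("CartPole-v1")
EPSILON = 0.1

def q_values(obs):
    # Placeholder: replace with the trained network's forward pass.
    return [0.0, 0.0]

for episode in range(5):
    obs, total = env.reset(), 0.0
    done = False
    while not done:
        if random.random() < EPSILON:
            action = env.action_space.sample()   # explore
        else:
            qs = q_values(obs)                   # exploit
            action = max(range(len(qs)), key=qs.__getitem__)
        obs, reward, done, info = env.step(action)
        total += reward
    print(f"episode {episode}: return {total}")
```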


Sensors ◽  
2019 ◽  
Vol 19 (13) ◽  
pp. 2875 ◽  
Author(s):  
Rojeena Bajracharya ◽  
Rakesh Shrestha ◽  
Sung Won Kim

The increased demand for spectrum resources for multimedia communications and the limited licensed spectrum have led to widespread interest in operating Long Term Evolution (LTE) in the unlicensed band (LTE-U) for Internet of Things (IoT) systems. Because Wi-Fi and LTE are diverse, with dissimilar physical- and link-layer configurations, several solutions for achieving efficient and fair coexistence have been proposed. Most facilitate fair coexistence through discontinuous transmission, using a duty-cycling or contention mechanism, and efficient coexistence through clean-channel selection; however, they address either fairness or efficiency, not both. Herein, we propose joint adaptive duty cycling (ADC) and dynamic channel switching (DCS) mechanisms. The ADC mechanism supports fair channel-access opportunity by muting a certain number of subframes for Wi-Fi users, whereas the DCS mechanism offers more access opportunities for LTE-U and Wi-Fi users by preventing LTE-U users from occupying a crowded channel for long periods. To support these mechanisms in a dynamic environment, LTE-U for IoT applications is enhanced with Q-learning techniques that automatically select the appropriate combination of muting period and channel. Simulation results show that the proposed mechanism achieves fair and efficient coexistence.
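A minimal sketch of the Q-learning selection described above, treating each (muting period, channel) pair as one action; the action values, the single-state table, and the reward trade-off are illustrative assumptions, not the paper's design:

```python
import itertools
import numpy as np

# Hypothetical action space: how many subframes to mute, and which channel.
MUTING_PERIODS = [2, 4, 6, 8]   # subframes muted per frame (assumed values)
CHANNELS = [1, 6, 11]
ACTIONS = list(itertools.product(MUTING_PERIODS, CHANNELS))

ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
Q = np.zeros(len(ACTIONS))      # single-state table for brevity

def select_action():
    # Epsilon-greedy over (muting period, channel) combinations.
    if np.random.rand() < EPS:
        return np.random.randint(len(ACTIONS))
    return int(np.argmax(Q))

def update(action_idx, reward):
    # reward would combine LTE-U throughput with a Wi-Fi fairness term.
    Q[action_idx] += ALPHA * (reward + GAMMA * Q.max() - Q[action_idx])
```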


Author(s):  
Gokhan Demirkiran ◽  
Ozcan Erdener ◽  
Onay Akpinar ◽  
Pelin Demirtas ◽  
M. Yagiz Arik ◽  
...  
