Application of Deep Q-learning and Double Deep Q-learning algorithms to the task of controlling an inverted pendulum

Author(s):
Alla Evseenko ◽
Dmitrii Romannikov

Today, the branch of science known as "artificial intelligence" is booming worldwide. Systems built on artificial intelligence methods can perform functions traditionally considered the prerogative of humans. Artificial intelligence spans a wide range of research areas, one of which is machine learning. This article discusses algorithms from one approach to machine learning, reinforcement learning (RL), on which much research and development has been carried out over the past seven years. Development and research on this approach is mainly aimed at solving problems in Atari 2600 games or similar benchmarks. In this article, reinforcement learning is applied to a dynamic object: an inverted pendulum. As a model of this object, we consider the inverted pendulum on a cart from the Gym library, which contains many environments used to test and analyze reinforcement learning algorithms. The article describes the implementation and study of two algorithms from this approach, Deep Q-learning and Double Deep Q-learning. As a result, training, testing, and training-time graphs for each algorithm are presented, on the basis of which it is concluded that the Double Deep Q-learning algorithm is preferable: its training time is approximately 2 minutes, and it provides the best control of the inverted pendulum on a cart.
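The key difference between the two algorithms compared in the article is how the bootstrap target is formed. A minimal sketch (pure Python; the dicts stand in for the online and target networks, and all names and numbers here are illustrative, not taken from the article):

```python
# Sketch of the bootstrap targets used by DQN vs. Double DQN.
# q_online / q_target stand in for the online and target networks:
# here they are just dicts mapping state -> list of action values.

GAMMA = 0.99

def dqn_target(reward, next_state, q_target):
    # DQN: the target network both selects and evaluates the next action,
    # which is known to overestimate action values.
    return reward + GAMMA * max(q_target[next_state])

def ddqn_target(reward, next_state, q_online, q_target):
    # Double DQN: the online network selects the action, the target
    # network evaluates it, which reduces the overestimation bias.
    acts = range(len(q_online[next_state]))
    best = max(acts, key=lambda a: q_online[next_state][a])
    return reward + GAMMA * q_target[next_state][best]

q_online = {"s1": [0.2, 0.9]}   # online net prefers action 1
q_target = {"s1": [1.0, 0.5]}   # target net (noisily) overrates action 0

print(dqn_target(1.0, "s1", q_target))             # max over target net
print(ddqn_target(1.0, "s1", q_online, q_target))  # evaluates online net's pick
```

On this toy example the DQN target is inflated by the target network's noisy maximum, while the Double DQN target evaluates the online network's chosen action instead.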

2021 ◽  
Vol 2131 (3) ◽  
pp. 032103
Author(s):  
A P Badetskii ◽  
O A Medved

Abstract The article discusses the choice of a route and a cargo-flow option in multimodal transportation under modern conditions. Given the active development of artificial intelligence and digital technologies in all types of production activity, the authors propose using reinforcement learning algorithms to solve this problem. An analysis of existing algorithms showed that, when choosing a route option for cargo in a multimodal connection, a qualitative assessment of terminal states would be useful. To obtain such an estimate, the Q-learning algorithm was applied, and it showed sufficient convergence and efficiency.
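Tabular Q-learning fits this route-choice setting naturally: terminals are states, outgoing legs are actions, and negative transport costs are rewards. A minimal sketch, where the toy network, costs, and hyperparameters are entirely illustrative and not from the article:

```python
import random

# Toy multimodal network: state -> {action: (next_state, reward)}.
# Rewards are negative leg costs, so higher Q means a cheaper route.
routes = {
    "origin": {"rail": ("hub", -3.0), "road": ("hub", -5.0)},
    "hub":    {"sea": ("dest", -2.0), "air": ("dest", -8.0)},
    "dest":   {},  # terminal
}
ALPHA, GAMMA, EPS = 0.5, 1.0, 0.1
Q = {s: {a: 0.0 for a in acts} for s, acts in routes.items()}

def step(state):
    acts = list(routes[state])
    if random.random() < EPS:                  # epsilon-greedy exploration
        a = random.choice(acts)
    else:
        a = max(acts, key=lambda x: Q[state][x])
    nxt, r = routes[state][a]
    nxt_best = max(Q[nxt].values(), default=0.0)  # 0 at terminal states
    Q[state][a] += ALPHA * (r + GAMMA * nxt_best - Q[state][a])
    return nxt

random.seed(0)
for _ in range(500):                           # episodes origin -> dest
    s = "origin"
    while routes[s]:
        s = step(s)

print(max(Q["origin"], key=Q["origin"].get))   # greedy first leg
```

After training, the greedy policy picks the cheaper leg at each terminal, which is the kind of terminal-state estimate the abstract describes.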


1999 ◽  
Vol 11 (8) ◽  
pp. 2017-2060 ◽  
Author(s):  
Csaba Szepesvári ◽  
Michael L. Littman

Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
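The asynchronous updates the theorem covers can be anchored by the standard Q-learning rule (a textbook formulation, not quoted from the paper), which updates a single state-action pair per step:

```latex
Q_{t+1}(s, a) = \bigl(1 - \alpha_t(s, a)\bigr)\, Q_t(s, a)
              + \alpha_t(s, a)\Bigl[r_t + \gamma \max_{a'} Q_t(s', a')\Bigr]
```

Here $\alpha_t(s,a)$ is a (possibly state-action-dependent) learning rate, $\gamma \in [0,1)$ the discount factor, and $s'$ the successor state. The theorem's strategy is to prove convergence of such an asynchronous process by verifying that its synchronous counterpart, which applies the update to all $(s,a)$ pairs at once, converges.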


2020 ◽  
Vol 23 (6) ◽  
pp. 1172-1191
Author(s):  
Artem Aleksandrovich Elizarov ◽  
Evgenii Viktorovich Razinkov

Recently, such a direction of machine learning as reinforcement learning has been actively developing. As a consequence, attempts are being made to apply reinforcement learning to computer vision problems, in particular to image classification, which is currently among the most pressing tasks of artificial intelligence. The article proposes a method for image classification in the form of a deep neural network using reinforcement learning. The idea of the developed method comes down to solving a contextual multi-armed bandit problem using various strategies for trading off exploration against exploitation, combined with reinforcement learning algorithms. Strategies such as ε-greedy, ε-softmax, and ε-decay-softmax, as well as the UCB1 method, and reinforcement learning algorithms such as DQN, REINFORCE, and A2C are considered. The influence of various parameters on the efficiency of the method is analyzed, and options for further development of the method are proposed.
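Two of the exploration strategies named above can be sketched in a few lines; in the contextual-bandit framing of classification, the "arms" are class labels and the estimated values are per-label scores. A minimal sketch with illustrative names and values:

```python
import math
import random

def epsilon_greedy(q, eps):
    """Explore a random arm with probability eps, else exploit the best."""
    if random.random() < eps:
        return random.randrange(len(q))
    return max(range(len(q)), key=q.__getitem__)

def softmax_choice(q, tau):
    """Boltzmann exploration: sample arms with probability ~ exp(q/tau)."""
    weights = [math.exp(v / tau) for v in q]
    return random.choices(range(len(q)), weights=weights)[0]

random.seed(1)
q = [0.1, 0.8, 0.3]            # estimated value of each arm (class label)
print(epsilon_greedy(q, 0.0))  # eps=0 is pure exploitation -> arm 1
print(softmax_choice(q, 1.0))  # higher-valued arms are sampled more often
```

The decay variants mentioned in the abstract simply shrink `eps` (or the temperature `tau`) over time, shifting from exploration toward exploitation as the value estimates improve.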


2020 ◽  
Vol 8 (6) ◽  
pp. 4333-4338

This paper presents a thorough comparative analysis of the reinforcement learning algorithms used by autonomous mobile robots for optimal path finding, and we propose a new algorithm, Iterative SARSA, for the same task. The main objective of the paper is to differentiate between Q-learning and SARSA and to modify the latter. These algorithms use either the on-policy or the off-policy method of reinforcement learning: for the on-policy method we have used the SARSA algorithm, and for the off-policy method the Q-learning algorithm. The choice of algorithm also affects the robot's ability to find the shortest possible path. Based on the results obtained, we conclude how our algorithm improves on the current standard reinforcement learning algorithms.
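The on-policy/off-policy distinction the paper builds on comes down to which next-step value each update bootstraps from. A pure-Python sketch (the states, actions, and hyperparameters are illustrative, not from the paper):

```python
# SARSA (on-policy) bootstraps from the action actually taken next;
# Q-learning (off-policy) bootstraps from the greedy action.

ALPHA, GAMMA = 0.5, 0.9

def sarsa_update(Q, s, a, r, s2, a2):
    # a2 is the action the behavior policy really chose in s2.
    Q[s][a] += ALPHA * (r + GAMMA * Q[s2][a2] - Q[s][a])

def q_learning_update(Q, s, a, r, s2):
    # Bootstraps from the best action in s2, regardless of what was taken.
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2].values()) - Q[s][a])

Q = {"s": {"left": 0.0, "right": 0.0},
     "s2": {"left": 1.0, "right": 4.0}}
sarsa_update(Q, "s", "left", 0.0, "s2", "left")   # uses Q(s2, left) = 1
q_learning_update(Q, "s", "right", 0.0, "s2")     # uses max value = 4
print(Q["s"])
```

Because SARSA's target follows the exploring policy while Q-learning's follows the greedy one, the two converge to different behaviors under exploration; an iterative SARSA variant like the paper's would repeat such updates over successive passes of the path-finding task.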


2012 ◽  
Vol 22 ◽  
pp. 113-118 ◽  
Author(s):  
Víctor Ricardo Cruz-Álvarez ◽  
Enrique Hidalgo-Peña ◽  
Hector-Gabriel Acosta-Mesa

A common problem when working with mobile robots is that the programming phase can be a long, expensive, and burdensome process for programmers. Reinforcement learning algorithms offer one of the most general frameworks for learning tasks. This work presents an approach that uses the Q-learning algorithm on a Lego robot so that it learns by itself how to follow a black line drawn on a white surface, using Matlab [5] as the programming environment.


2012 ◽  
pp. 695-703
Author(s):  
George Tzanis ◽  
Christos Berberidis ◽  
Ioannis Vlahavas

Machine learning is one of the oldest subfields of artificial intelligence and is concerned with the design and development of computational systems that can adapt themselves and learn. The most common machine learning algorithms can be either supervised or unsupervised. Supervised learning algorithms generate a function that maps inputs to desired outputs, based on a set of examples with known output (labeled examples). Unsupervised learning algorithms find patterns and relationships over a given set of inputs (unlabeled examples). Other categories of machine learning are semi-supervised learning, where an algorithm uses both labeled and unlabeled examples, and reinforcement learning, where an algorithm learns a policy of how to act given an observation of the world.




2019 ◽  
Vol 5 (1) ◽  
Author(s):  
Xiao-Ming Zhang ◽  
Zezhu Wei ◽  
Raza Asad ◽  
Xu-Chen Yang ◽  
Xin Wang

Abstract Reinforcement learning has been widely used in many problems, including quantum control of qubits. However, such problems can, at the same time, be solved by traditional, non-machine-learning methods, such as stochastic gradient descent and Krotov algorithms, and it remains unclear which one is most suitable when the control has specific constraints. In this work, we perform a comparative study on the efficacy of three reinforcement learning algorithms: tabular Q-learning, deep Q-learning, and policy gradient, as well as two non-machine-learning methods: stochastic gradient descent and Krotov algorithms, in the problem of preparing a desired quantum state. We found that overall, the deep Q-learning and policy gradient algorithms outperform others when the problem is discretized, e.g. allowing discrete values of control, and when the problem scales up. The reinforcement learning algorithms can also adaptively reduce the complexity of the control sequences, shortening the operation time and improving the fidelity. Our comparison provides insights into the suitability of reinforcement learning in quantum control problems.


Author(s):  
Abdulrazak Yahya Saleh ◽  
Lim Huey Chern

The goal of this paper is to evaluate a deep learning algorithm for classifying people with Autism Spectrum Disorder (ASD). ASD is a developmental disability that causes significant communication, social, and behavioural challenges: people with autism face communication problems, difficulties in social interaction, and repetitive behaviours. Several methods have been used to distinguish people with ASD from those without, but there is a need to explore algorithms that can yield better classification performance. Recently, deep learning methods have significantly sharpened the cutting edge of learning algorithms across a wide range of artificial intelligence tasks, such as object detection, speech recognition, and machine translation. In this research, a convolutional neural network (CNN) is employed to classify ASD with a higher level of accuracy. The image data is pre-processed, the CNN algorithm is applied to classify ASD and non-ASD cases, and the steps of implementing the algorithm are clearly stated. Finally, the effectiveness of the algorithm is evaluated in terms of accuracy, with a support vector machine (SVM) used for comparison. The CNN algorithm produces better results, with an accuracy of 97.07%, than the SVM algorithm. In future work, different types of deep learning algorithms should be applied, and different datasets can be tested with different hyper-parameters to produce more accurate ASD classifications.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Chunyuan Zhang ◽  
Qi Song ◽  
Zeng Meng

The deep Q-network (DQN) is one of the most successful reinforcement learning algorithms, but it has some drawbacks such as slow convergence and instability. In contrast, the traditional reinforcement learning algorithms with linear function approximation usually have faster convergence and better stability, although they easily suffer from the curse of dimensionality. In recent years, many improvements to DQN have been made, but they seldom make use of the advantage of traditional algorithms to improve DQN. In this paper, we propose a novel Q-learning algorithm with linear function approximation, called the minibatch recursive least squares Q-learning (MRLS-Q). Different from the traditional Q-learning algorithm with linear function approximation, the learning mechanism and model structure of MRLS-Q are more similar to those of DQNs with only one input layer and one linear output layer. It uses the experience replay and the minibatch training mode and uses the agent’s states rather than the agent’s state-action pairs as the inputs. As a result, it can be used alone for low-dimensional problems and can be seamlessly integrated into DQN as the last layer for high-dimensional problems as well. In addition, MRLS-Q uses our proposed average RLS optimization technique, so that it can achieve better convergence performance whether it is used alone or integrated with DQN. At the end of this paper, we demonstrate the effectiveness of MRLS-Q on the CartPole problem and four Atari games and investigate the influences of its hyperparameters experimentally.
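The model structure the abstract describes, states as inputs and one linear output per action, can be sketched briefly. This is a hedged illustration of that input/output structure trained with plain SGD on replayed transitions; it does not reproduce MRLS-Q's recursive-least-squares optimizer, which is the paper's actual contribution, and all names and values are ours:

```python
# A linear Q-model: one weight row per action over the state features,
# like a DQN reduced to a single linear output layer.
N_FEATURES, N_ACTIONS = 3, 2
GAMMA, LR = 0.9, 0.1
W = [[0.0] * N_FEATURES for _ in range(N_ACTIONS)]

def q_values(state):
    # Linear map: Q(s, a) = W[a] . s, computed for every action at once.
    return [sum(w * x for w, x in zip(row, state)) for row in W]

def minibatch_update(batch):
    # One SGD pass over a replayed minibatch of (s, a, r, s', done) tuples.
    for s, a, r, s2, done in batch:
        target = r if done else r + GAMMA * max(q_values(s2))
        td_err = target - q_values(s)[a]
        for i in range(N_FEATURES):      # gradient step on action a's row
            W[a][i] += LR * td_err * s[i]

replay = [([1.0, 0.0, 0.0], 0, 1.0, [0.0, 1.0, 0.0], True)]
minibatch_update(replay)
print(q_values([1.0, 0.0, 0.0])[0])
```

Taking states rather than state-action pairs as inputs is what lets such a layer sit on top of a DQN's feature extractor for high-dimensional problems, or run alone on low-dimensional ones.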

