Playing Atari with Six Neurons (Extended Abstract)

Author(s):  
Giuseppe Cuccu ◽  
Julian Togelius ◽  
Philippe Cudré-Mauroux

Deep reinforcement learning applied to vision-based problems like Atari games maps pixels directly to actions; internally, the deep neural network bears the responsibility of both extracting useful information and making decisions based on it. By separating image processing from decision-making, one could better understand the complexity of each task, as well as potentially find smaller policy representations that are easier for humans to understand and may generalize better. To this end, we propose a new method for learning policies and compact state representations separately but simultaneously for policy approximation in reinforcement learning. State representations are generated by an encoder based on two novel algorithms: Increasing Dictionary Vector Quantization makes the encoder capable of growing its dictionary size over time, to address new observations; and Direct Residuals Sparse Coding encodes observations by aiming for highest information inclusion. We test our system on a selection of Atari games using tiny neural networks of only 6 to 18 neurons (depending on the game's controls). These are still capable of achieving results comparable, and occasionally superior, to state-of-the-art techniques which use two orders of magnitude more neurons.

2021 ◽  
Vol 35 (2) ◽  
Author(s):  
Giuseppe Cuccu ◽  
Julian Togelius ◽  
Philippe Cudré-Mauroux

Abstract: We propose a new method for learning compact state representations and policies separately but simultaneously for policy approximation in vision-based applications such as Atari games. Approaches based on deep reinforcement learning typically map pixels directly to actions to enable end-to-end training. Internally, however, the deep neural network bears the responsibility of both extracting useful information and making decisions based on it, two objectives which can be addressed independently. Separating the image processing from the action selection allows for a better understanding of either task individually, as well as potentially finding smaller policy representations, which is inherently interesting. Our approach learns state representations using a compact encoder based on two novel algorithms: (i) Increasing Dictionary Vector Quantization builds a dictionary of state representations which grows in size over time, allowing our method to address new observations as they appear in an open-ended online-learning context; and (ii) Direct Residuals Sparse Coding encodes observations as a function of the dictionary, aiming for highest information inclusion by disregarding reconstruction error and maximizing code sparsity. As the dictionary size increases, however, the encoder produces increasingly large inputs for the neural network; this issue is addressed with a new variant of the Exponential Natural Evolution Strategies algorithm which adapts the dimensionality of its probability distribution during the run. We test our system on a selection of Atari games using tiny neural networks of only 6 to 18 neurons (depending on each game’s controls). These are still capable of achieving results that are not much worse than, and occasionally superior to, the state of the art in direct policy search, which uses two orders of magnitude more neurons.
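The idea of a dictionary that grows as new observations appear can be illustrated with a toy sketch. The code below is loosely inspired by the abstract's description and is not the authors' Increasing Dictionary Vector Quantization or Direct Residuals Sparse Coding: the function names, the dot-product similarity, the thresholds `eps` and `delta`, and the assumption of non-negative observations are all illustrative choices.

```python
def encode(observation, dictionary, eps=0.5):
    """Greedily build a binary code; return it with the unexplained residual.

    Assumes non-negative observations (e.g. pixel intensities).
    """
    residual = list(observation)
    code = [0] * len(dictionary)
    for _ in range(len(dictionary)):
        if sum(residual) <= eps:
            break  # nothing substantial left to explain
        # Pick the centroid most similar (dot product) to the current residual.
        similarities = [sum(r * c for r, c in zip(residual, centroid))
                        for centroid in dictionary]
        best = max(range(len(dictionary)), key=similarities.__getitem__)
        if code[best]:
            break  # best match is already in the code; stop
        code[best] = 1
        # Remove what this centroid explains, clipping at zero.
        residual = [max(r - c, 0.0) for r, c in zip(residual, dictionary[best])]
    return code, residual


def grow_and_encode(observation, dictionary, delta=0.5):
    """Encode an observation, then grow the dictionary if too much is unexplained."""
    code, residual = encode(observation, dictionary)
    if sum(residual) > delta:
        # Hypothetical growth rule: store the unexplained part as a new centroid.
        dictionary.append(residual)
    return code
```

In this sketch the code length tracks the dictionary size at encoding time, which is why a downstream policy would see inputs that grow over the run, the issue the abstract says is handled by adapting the search distribution's dimensionality.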


2020 ◽  
Vol 34 (10) ◽  
pp. 13905-13906
Author(s):  
Rohan Saphal ◽  
Balaraman Ravindran ◽  
Dheevatsa Mudigere ◽  
Sasikanth Avancha ◽  
Bharat Kaul

Reinforcement learning algorithms are sensitive to hyper-parameters and require tuning and tweaking for specific environments to improve performance. Ensembles of reinforcement learning models, on the other hand, are known to be much more robust and stable. However, training multiple models independently on an environment suffers from high sample complexity. We present here a methodology to create multiple models from a single training instance that can be used in an ensemble, through directed perturbation of the model parameters at regular intervals. This allows training a single model that converges to several local minima during the optimization process as a result of the perturbation. By saving the model parameters at each such instance, we obtain multiple policies during training that are ensembled during evaluation. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample efficient and computationally inexpensive, and is seen to outperform state-of-the-art (SOTA) approaches.
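How saved policies might be combined at evaluation time can be sketched with two common ensembling strategies. The abstract does not say which strategies the authors use, so majority voting (discrete actions) and averaging (continuous actions) are assumptions here, and policies are modeled as plain callables from state to action.

```python
from collections import Counter


def majority_vote(policies, state):
    """Discrete control: return the action chosen by most policy snapshots."""
    votes = [policy(state) for policy in policies]
    return Counter(votes).most_common(1)[0][0]


def average_action(policies, state):
    """Continuous control: return the mean of the snapshots' scalar actions."""
    actions = [policy(state) for policy in policies]
    return sum(actions) / len(actions)
```

Each callable stands in for one set of parameters saved during the single training run.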


2020 ◽  
Vol 8 (6) ◽  
pp. 2983-2991

Image classification is an important task in computer vision, with a large range of applications such as object detection, localization and image segmentation. For image classification, the most widely adopted methods are based on deep neural networks, especially Convolutional Neural Networks (CNNs). The selection of hyperparameters plays a crucial role in model performance and usually relies on experience. In this paper, we use a genetic algorithm (GA) to automate the construction of the CNN model for higher accuracy, running on a GPU provided by the Google Colaboratory cloud. The best CNN architecture after several generations of the genetic algorithm is then compared to the state-of-the-art CNN. We used the malaria cell images dataset to determine whether a person is healthy or suffering from malaria. We trained on two classes of cell images, uninfected and parasitized, on a Tesla P100 multi-core GPU. We obtained a training accuracy of 97% and a testing accuracy of about 95% on the multi-core GPU, which sped up both training and testing.
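A genetic search over CNN hyperparameters can be sketched in a few lines. This is a minimal illustration, not the paper's setup: the search space, the selection scheme (keep the top half), the mutation rate, and especially `toy_fitness`, which stands in for training a CNN and measuring validation accuracy, are all hypothetical.

```python
import random

# Hypothetical search space; the paper's actual genes are not given in the abstract.
SEARCH_SPACE = {
    "num_filters": [16, 32, 64],
    "kernel_size": [3, 5],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}


def toy_fitness(genome):
    """Stand-in for validation accuracy; a real run would train a CNN here."""
    score = genome["num_filters"] / 64
    score += 1.0 if genome["kernel_size"] == 3 else 0.5
    score += 1.0 if genome["learning_rate"] == 1e-3 else 0.0
    return score


def evolve(fitness, generations=10, pop_size=8, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    population = [{k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fitter half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            # Uniform crossover: each gene comes from one of the two parents.
            child = {k: rng.choice([a[k], b[k]]) for k in SEARCH_SPACE}
            # Mutation: occasionally resample a gene from the search space.
            for k in SEARCH_SPACE:
                if rng.random() < mutation_rate:
                    child[k] = rng.choice(SEARCH_SPACE[k])
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

Calling `evolve(toy_fitness)` returns the best configuration found; because the fitter half is carried over unchanged, the best fitness never decreases from one generation to the next.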


Author(s):  
Man Luo ◽  
Wenzhe Zhang ◽  
Tianyou Song ◽  
Kun Li ◽  
Hongming Zhu ◽  
...  

Electric Vehicle (EV) sharing systems have recently experienced unprecedented growth across the world. One of the key challenges in their operation is vehicle rebalancing, i.e., repositioning the EVs across stations to better satisfy future user demand. This is particularly challenging in the shared EV context, because i) the range of EVs is limited while charging time is substantial, which constrains the rebalancing options; and ii) as a new mobility trend, most of the current EV sharing systems are still continuously expanding their station networks, i.e., the targets for rebalancing can change over time. To tackle these challenges, in this paper we model the rebalancing task as a Multi-Agent Reinforcement Learning (MARL) problem, which directly takes the range and charging properties of the EVs into account. We propose a novel approach of policy optimization with action cascading, which isolates the non-stationarity locally, and use two connected networks to solve the formulated MARL. We evaluate the proposed approach using a simulator calibrated with 1-year operation data from a real EV sharing system. Results show that our approach significantly outperforms the state-of-the-art, offering up to 14% gain in order satisfied rate and 12% increase in net revenue.


Author(s):  
Xinyu Xiao ◽  
Lingfeng Wang ◽  
Shiming Xiang ◽  
Chunhong Pan

Image captioning aims to describe an image in natural language as a human would; it has benefited from advances in deep neural networks and achieved substantial progress in performance. However, the human perspective on scene description has not been fully considered in this task. In fact, human description of a scene is tightly related to both endogenous knowledge and exogenous salient objects, which implies that the content of a description is confined to the known salient objects. Inspired by this observation, this paper proposes a novel framework that explicitly applies the known salient objects in image captioning. Under this framework, the known salient objects serve as themes to guide description generation. According to the properties of a known salient object, a theme is composed of two components: its endogenous concept (what) and its exogenous spatial attention feature (where). Specifically, during caption prediction, the prediction of each word is dominated by the concept and spatial attention feature of the corresponding theme. Moreover, we introduce a novel learning method, Distinctive Learning (DL), to make generated captions more specific, like human descriptions. It formulates two constraints in the theme learning process to encourage distinctiveness between different images. In particular, reinforcement learning is introduced into the framework to address the exposure bias problem between the training and testing modes. Extensive experiments on the COCO and Flickr30K datasets achieve superior results compared with state-of-the-art methods.


2019 ◽  
Vol 34 (Supplement_1) ◽  
pp. i156-i171
Author(s):  
Cihan Sarı ◽  
Albert Ali Salah ◽  
Alkım Almıla Akdag Salah

Abstract: Paintings give us important clues about how males and females were perceived over the centuries in Western culture. In this article, we describe a system that allows scholars to automatically visualize how the clothing colors of male and female subjects changed over time. Our system analyzes a large database of paintings, locates portraits, automatically classifies each portrait’s subject as either male or female, segments the clothing areas and finds their dominant color. An interactive, web-based visualization is proposed to allow further exploration of the results. To test the accuracy of our system, we manually annotate a portion of the Rijksmuseum collection, and use state-of-the-art image processing and computer vision algorithms to process the paintings. We use a deep neural network-based style transfer approach to improve gender recognition (or more correctly, sex recognition) of the sitters of portraits. The annotations and the code of the approach are made available.


Symmetry ◽  
2019 ◽  
Vol 11 (2) ◽  
pp. 290 ◽  
Author(s):  
SeungYoon Choi ◽  
Tuyen Le ◽  
Quang Nguyen ◽  
Md Layek ◽  
SeungGwan Lee ◽  
...  

In this paper, we propose a controller for a bicycle using the DDPG (Deep Deterministic Policy Gradient) algorithm, which is a state-of-the-art deep reinforcement learning algorithm. We use a reward function and a deep neural network to build the controller. By using the proposed controller, a bicycle can not only be stably balanced but also travel to any specified location. We confirm that the controller with DDPG shows better performance than the other baselines such as Normalized Advantage Function (NAF) and Proximal Policy Optimization (PPO). For the performance evaluation, we implemented the proposed algorithm in various settings such as fixed and random speed, start location, and destination location.


2021 ◽  
Vol 18 (4) ◽  
pp. 1-22
Author(s):  
Jerzy Proficz

Two novel algorithms for the all-gather operation resilient to imbalanced process arrival patterns (PAPs) are presented. The first one, Background Disseminated Ring (BDR), is based on the regular parallel ring algorithm often supplied in MPI implementations and exploits an auxiliary background thread for early data exchange from faster processes to accelerate the performed all-gather operation. The other algorithm, Background Sorted Linear synchronized tree with Broadcast (BSLB), is built upon the already existing PAP-aware gather algorithm, that is, Background Sorted Linear Synchronized tree (BSLS), followed by a regular broadcast distributing gathered data to all participating processes. The background of the imbalanced PAP subject is described, along with the PAP monitoring and evaluation topics. An experimental evaluation of the algorithms based on a proposed mini-benchmark is presented. The mini-benchmark was performed over 2,000 times in a typical HPC cluster architecture with homogeneous compute nodes. The obtained results are analyzed according to different PAPs, data sizes, and process numbers, showing that the proposed optimization works well for various configurations, is scalable, and can significantly reduce the all-gather elapsed times, in our case by up to a factor of 1.9, or 47%, in comparison with the best state-of-the-art solution.
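For readers unfamiliar with the baseline, the regular ring all-gather that BDR builds on can be simulated without MPI. The sketch below models only the plain ring exchange pattern, not BDR or BSLB and not any background thread; the function name and the list-of-buffers representation are illustrative assumptions.

```python
def ring_allgather(local_blocks):
    """Simulate a ring all-gather among len(local_blocks) processes.

    Process `rank` starts with local_blocks[rank]; after P - 1 steps every
    process holds all P blocks, indexed by the rank that contributed them.
    """
    p = len(local_blocks)
    buffers = [[None] * p for _ in range(p)]
    for rank, block in enumerate(local_blocks):
        buffers[rank][rank] = block

    for step in range(p - 1):
        # Collect all messages first so the sends of a step act simultaneously.
        messages = []
        for rank in range(p):
            index = (rank - step) % p  # block received in the previous step
            messages.append(((rank + 1) % p, index, buffers[rank][index]))
        # Each process forwards one block to its right-hand neighbor.
        for destination, index, block in messages:
            buffers[destination][index] = block
    return buffers
```

Every process takes part in each of the P - 1 steps, so a process that arrives late stalls its neighbors; that sensitivity to arrival patterns is what the abstract's algorithms aim to reduce.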


Algorithms ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 226
Author(s):  
Wenzel Pilar von Pilchau ◽  
Anthony Stein ◽  
Jörg Hähner

State-of-the-art deep reinforcement learning algorithms such as DQN and DDPG use a replay buffer, a concept called Experience Replay. By default, the buffer contains only the experiences that have been gathered over the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach to this field, we limit ourselves to discrete and non-deterministic environments and use a simple equally weighted average of the reward in combination with observed follow-up states. We demonstrate a significantly improved overall mean average in comparison to a DQN with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.
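The interpolation step described here, an equally weighted reward average combined with observed follow-up states, might look roughly like the sketch below. The class name, its methods, and the choice to emit one synthetic transition per distinct follow-up state are assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict


class InterpolatingBuffer:
    """Stores real transitions and derives synthetic ones per (state, action)."""

    def __init__(self):
        self._rewards = defaultdict(list)     # (state, action) -> observed rewards
        self._next_states = defaultdict(set)  # (state, action) -> follow-up states

    def add(self, state, action, reward, next_state):
        """Record one real transition from the environment."""
        key = (state, action)
        self._rewards[key].append(reward)
        self._next_states[key].add(next_state)

    def synthetic(self, state, action):
        """Pair the equally weighted mean reward with each observed follow-up state."""
        key = (state, action)
        rewards = self._rewards.get(key)
        if not rewards:
            return []  # nothing observed yet for this state-action pair
        mean_reward = sum(rewards) / len(rewards)
        return [(state, action, mean_reward, next_state)
                for next_state in sorted(self._next_states[key])]
```

In a stochastic environment such as FrozenLake, the same state-action pair can lead to different follow-up states and rewards, so averaging gives the learner a smoothed reward estimate for each outcome it has actually seen.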


Electronics ◽  
2021 ◽  
Vol 10 (9) ◽  
pp. 999
Author(s):  
Ahmad Taher Azar ◽  
Anis Koubaa ◽  
Nada Ali Mohamed ◽  
Habiba A. Ibrahim ◽  
Zahra Fathy Ibrahim ◽  
...  

Unmanned Aerial Vehicles (UAVs) are increasingly being used in many challenging and diversified applications. These applications belong to the civilian and the military fields and include, to name a few, infrastructure inspection, traffic patrolling, remote sensing, mapping, surveillance, rescuing humans and animals, environment monitoring, and Intelligence, Surveillance, Target Acquisition, and Reconnaissance (ISTAR) operations. However, the use of UAVs in these applications needs a substantial level of autonomy. In other words, UAVs should have the ability to accomplish planned missions in unexpected situations without requiring human intervention. To ensure this level of autonomy, many artificial intelligence algorithms were designed. These algorithms targeted the guidance, navigation, and control (GNC) of UAVs. In this paper, we described the state of the art of one subset of these algorithms: the deep reinforcement learning (DRL) techniques. We made a detailed description of them, and we deduced the current limitations in this area. We noted that most of these DRL methods were designed to ensure stable and smooth UAV navigation by training in computer-simulated environments. We realized that further research efforts are needed to address the challenges that restrain their deployment in real-life scenarios.

