Playing Atari with Six Neurons (Extended Abstract)

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/651 ◽

2020 ◽

Author(s):

Giuseppe Cuccu ◽

Julian Togelius ◽

Philippe Cudré-Mauroux

Keyword(s):

Reinforcement Learning ◽

Vector Quantization ◽

Sparse Coding ◽

Deep Neural Network ◽

State Of The Art ◽

Compact State ◽

Learning Policies ◽

Novel Algorithms ◽

Over Time ◽

Selection Of

Deep reinforcement learning applied to vision-based problems like Atari games maps pixels directly to actions; internally, the deep neural network bears the responsibility of both extracting useful information and making decisions based on it. By separating image processing from decision-making, one could better understand the complexity of each task, as well as potentially find smaller policy representations that are easier for humans to understand and may generalize better. To this end, we propose a new method for learning policies and compact state representations separately but simultaneously for policy approximation in reinforcement learning. State representations are generated by an encoder based on two novel algorithms: Increasing Dictionary Vector Quantization makes the encoder capable of growing its dictionary size over time, to address new observations; and Direct Residuals Sparse Coding encodes observations by aiming for highest information inclusion. We test our system on a selection of Atari games using tiny neural networks of only 6 to 18 neurons (depending on the game's controls). These are still capable of achieving results comparable---and occasionally superior---to state-of-the-art techniques which use two orders of magnitude more neurons.

Download Full-text

Playing Atari with few neurons

Autonomous Agents and Multi-Agent Systems ◽

10.1007/s10458-021-09497-8 ◽

2021 ◽

Vol 35 (2) ◽

Author(s):

Giuseppe Cuccu ◽

Julian Togelius ◽

Philippe Cudré-Mauroux

Keyword(s):

Neural Network ◽

State Of The Art ◽

Reconstruction Error ◽

Learning Context ◽

Natural Evolution ◽

The Neural Network ◽

New Variant ◽

Compact State ◽

Novel Algorithms ◽

Selection Of

AbstractWe propose a new method for learning compact state representations and policies separately but simultaneously for policy approximation in vision-based applications such as Atari games. Approaches based on deep reinforcement learning typically map pixels directly to actions to enable end-to-end training. Internally, however, the deep neural network bears the responsibility of both extracting useful information and making decisions based on it, two objectives which can be addressed independently. Separating the image processing from the action selection allows for a better understanding of either task individually, as well as potentially finding smaller policy representations which is inherently interesting. Our approach learns state representations using a compact encoder based on two novel algorithms: (i) Increasing Dictionary Vector Quantization builds a dictionary of state representations which grows in size over time, allowing our method to address new observations as they appear in an open-ended online-learning context; and (ii) Direct Residuals Sparse Coding encodes observations in function of the dictionary, aiming for highest information inclusion by disregarding reconstruction error and maximizing code sparsity. As the dictionary size increases, however, the encoder produces increasingly larger inputs for the neural network; this issue is addressed with a new variant of the Exponential Natural Evolution Strategies algorithm which adapts the dimensionality of its probability distribution along the run. We test our system on a selection of Atari games using tiny neural networks of only 6 to 18 neurons (depending on each game’s controls). These are still capable of achieving results that are not much worse, and occasionally superior, to the state-of-the-art in direct policy search which uses two orders of magnitude more neurons.

Download Full-text

ERLP: Ensembles of Reinforcement Learning Policies (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i10.7225 ◽

2020 ◽

Vol 34 (10) ◽

pp. 13905-13906

Author(s):

Rohan Saphal ◽

Balaraman Ravindran ◽

Dheevatsa Mudigere ◽

Sasikanth Avancha ◽

Bharat Kaul

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Multiple Models ◽

Model Parameters ◽

Continuous Control ◽

Sample Complexity ◽

Local Minima ◽

Single Model ◽

Learning Policies ◽

Reinforcement Learning Models

Reinforcement learning algorithms are sensitive to hyper-parameters and require tuning and tweaking for specific environments for improving performance. Ensembles of reinforcement learning models on the other hand are known to be much more robust and stable. However, training multiple models independently on an environment suffers from high sample complexity. We present here a methodology to create multiple models from a single training instance that can be used in an ensemble through directed perturbation of the model parameters at regular intervals. This allows training a single model that converges to several local minima during the optimization process as a result of the perturbation. By saving the model parameters at each such instance, we obtain multiple policies during training that are ensembled during evaluation. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample efficient, computationally inexpensive and is seen to outperform state of the art (SOTA) approaches

Download Full-text

Effectively Diagnosing Malaria by Optimizing the Hyperparameters of CNN using Genetic Algorithm on the Multi core GPU

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f8448.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 2983-2991

Keyword(s):

Genetic Algorithm ◽

Image Classification ◽

Deep Neural Network ◽

State Of The Art ◽

Testing Time ◽

Large Area ◽

Training Time ◽

Time Period ◽

Testing Accuracy ◽

Selection Of

Image classification is an important task in computer vision involving a large area of applications such as object detection, localization and image segmentation. When it comes to image classification, the most adopted methods are based on deep neural network and especially convolutional Neural Networks(CNN). Selection of hyperparameters plays a crucial role in performance of model and it comes by experience. So, in this paper, we will use the genetic algorithm(GA) to automate and build the CNN model for higher accuracy on GPU which is provided by Google Collaboratory cloud. The best architecture of CNN after several generations of the genetic algorithm is then compared to the state-of-the-art CNN. We have used the malaria cell images dataset to find out whether the person is normal or if they are suffering from malaria. We trained two types of malaria cells, which are uninfected and parasitized on Tesla P100 multi core GPU. We got a high training accuracy of 97% and got a testing accuracy of about 95% on the multicore GPU that boosted the speed of execution of training time period and testing time period.

Download Full-text

Rebalancing Expanding EV Sharing Systems with Deep Reinforcement Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/186 ◽

2020 ◽

Author(s):

Man Luo ◽

Wenzhe Zhang ◽

Tianyou Song ◽

Kun Li ◽

Hongming Zhu ◽

...

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Charging Time ◽

Novel Approach ◽

The World ◽

Net Revenue ◽

Multi Agent ◽

User Demand ◽

Policy Optimization ◽

Over Time

Electric Vehicle (EV) sharing systems have recently experienced unprecedented growth across the world. One of the key challenges in their operation is vehicle rebalancing, i.e., repositioning the EVs across stations to better satisfy future user demand. This is particularly challenging in the shared EV context, because i) the range of EVs is limited while charging time is substantial, which constrains the rebalancing options; and ii) as a new mobility trend, most of the current EV sharing systems are still continuously expanding their station networks, i.e., the targets for rebalancing can change over time. To tackle these challenges, in this paper we model the rebalancing task as a Multi-Agent Reinforcement Learning (MARL) problem, which directly takes the range and charging properties of the EVs into account. We propose a novel approach of policy optimization with action cascading, which isolates the non-stationarity locally, and use two connected networks to solve the formulated MARL. We evaluate the proposed approach using a simulator calibrated with 1-year operation data from a real EV sharing system. Results show that our approach significantly outperforms the state-of-the-art, offering up to 14% gain in order satisfied rate and 12% increase in net revenue.

Download Full-text

What and Where the Themes Dominate in Image

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019021 ◽

2019 ◽

Vol 33 ◽

pp. 9021-9029

Author(s):

Xinyu Xiao ◽

Lingfeng Wang ◽

Shiming Xiang ◽

Chunhong Pan

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Spatial Attention ◽

Deep Neural Network ◽

State Of The Art ◽

Learning Method ◽

Salient Object ◽

Image Captioning ◽

Object A ◽

Substantial Progress

The image captioning is to describe an image with natural language as human, which has benefited from the advances in deep neural network and achieved substantial progress in performance. However, the perspective of human description to scene has not been fully considered in this task recently. Actually, the human description to scene is tightly related to the endogenous knowledge and the exogenous salient objects simultaneously, which implies that the content in the description is confined to the known salient objects. Inspired by this observation, this paper proposes a novel framework, which explicitly applies the known salient objects in image captioning. Under this framework, the known salient objects are served as the themes to guide the description generation. According to the property of the known salient object, a theme is composed of two components: its endogenous concept (what) and the exogenous spatial attention feature (where). Specifically, the prediction of each word is dominated by the concept and spatial attention feature of the corresponding theme in the process of caption prediction. Moreover, we introduce a novel learning method of Distinctive Learning (DL) to get more specificity of generated captions like human descriptions. It formulates two constraints in the theme learning process to encourage distinctiveness between different images. Particularly, reinforcement learning is introduced into the framework to address the exposure bias problem between the training and the testing modes. Extensive experiments on the COCO and Flickr30K datasets achieve superior results when compared with the state-of-the-art methods.

Download Full-text

Automatic detection and visualization of garment color in Western portrait paintings

Digital Scholarship in the Humanities ◽

10.1093/llc/fqz055 ◽

2019 ◽

Vol 34 (Supplement_1) ◽

pp. i156-i171

Author(s):

Cihan Sarı ◽

Albert Ali Salah ◽

Alkım Almıla Akdag Salah

Keyword(s):

Deep Neural Network ◽

State Of The Art ◽

Gender Recognition ◽

Large Database ◽

Sex Recognition ◽

Web Based ◽

Style Transfer ◽

Male And Female ◽

Males And Females ◽

Over Time

Abstract Paintings give us important clues about how males and females were perceived over centuries in the Western culture. In this article, we describe a system that allows scholars to automatically visualize how the clothing colors of male and female subjects changed over time. Our system analyzes a large database of paintings, locates portraits, automatically classifies each portrait’s subject as either male or female, segments the clothing areas and finds their dominant color. An interactive, web-based visualization is proposed to allow further exploration of the results. To test the accuracy of our system, we manually annotate a portion of the Rijksmuseum collection, and use state-of-the-art image processing and computer vision algorithms to process the paintings. We use a deep neural network-based style transfer approach to improve gender recognition (or more correctly, sex recognition) of the sitters of portraits. The annotations and the code of the approach are made available.

Download Full-text

Toward Self-Driving Bicycles Using State-of-the-Art Deep Reinforcement Learning Algorithms

Symmetry ◽

10.3390/sym11020290 ◽

2019 ◽

Vol 11 (2) ◽

pp. 290 ◽

Cited By ~ 4

Author(s):

SeungYoon Choi ◽

Tuyen Le ◽

Quang Nguyen ◽

Md Layek ◽

SeungGwan Lee ◽

...

Keyword(s):

Reinforcement Learning ◽

Deep Neural Network ◽

Learning Algorithm ◽

State Of The Art ◽

The Other ◽

Gradient Algorithm ◽

Reward Function ◽

Policy Gradient ◽

Policy Optimization ◽

Start Location

In this paper, we propose a controller for a bicycle using the DDPG (Deep Deterministic Policy Gradient) algorithm, which is a state-of-the-art deep reinforcement learning algorithm. We use a reward function and a deep neural network to build the controller. By using the proposed controller, a bicycle can not only be stably balanced but also travel to any specified location. We confirm that the controller with DDPG shows better performance than the other baselines such as Normalized Advantage Function (NAF) and Proximal Policy Optimization (PPO). For the performance evaluation, we implemented the proposed algorithm in various settings such as fixed and random speed, start location, and destination location.

Download Full-text

All-gather Algorithms Resilient to Imbalanced Process Arrival Patterns

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3460122 ◽

2021 ◽

Vol 18 (4) ◽

pp. 1-22

Author(s):

Jerzy Proficz

Keyword(s):

Experimental Evaluation ◽

Data Exchange ◽

State Of The Art ◽

Monitoring And Evaluation ◽

The Other ◽

Early Data ◽

Cluster Architecture ◽

Novel Algorithms

Two novel algorithms for the all-gather operation resilient to imbalanced process arrival patterns (PATs) are presented. The first one, Background Disseminated Ring (BDR), is based on the regular parallel ring algorithm often supplied in MPI implementations and exploits an auxiliary background thread for early data exchange from faster processes to accelerate the performed all-gather operation. The other algorithm, Background Sorted Linear synchronized tree with Broadcast (BSLB), is built upon the already existing PAP-aware gather algorithm, that is, Background Sorted Linear Synchronized tree (BSLS), followed by a regular broadcast distributing gathered data to all participating processes. The background of the imbalanced PAP subject is described, along with the PAP monitoring and evaluation topics. An experimental evaluation of the algorithms based on a proposed mini-benchmark is presented. The mini-benchmark was performed over 2,000 times in a typical HPC cluster architecture with homogeneous compute nodes. The obtained results are analyzed according to different PATs, data sizes, and process numbers, showing that the proposed optimization works well for various configurations, is scalable, and can significantly reduce the all-gather elapsed times, in our case, up to factor 1.9 or 47% in comparison with the best state-of-the-art solution.

Download Full-text

Synthetic Experiences for Accelerating DQN Performance in Discrete Non-Deterministic Environments

Algorithms ◽

10.3390/a14080226 ◽

2021 ◽

Vol 14 (8) ◽

pp. 226

Author(s):

Wenzel Pilar von Pilchau ◽

Anthony Stein ◽

Jörg Hähner

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Weighted Average ◽

Up States ◽

Experience Replay

State-of-the-art Deep Reinforcement Learning Algorithms such as DQN and DDPG use the concept of a replay buffer called Experience Replay. The default usage contains only the experiences that have been gathered over the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach to this field, we limit ourselves to discrete and non-deterministic environments and use a simple equally weighted average of the reward in combination with observed follow-up states. We could demonstrate a significantly improved overall mean average in comparison to a DQN network with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.

Download Full-text

Drone Deep Reinforcement Learning: A Review

Electronics ◽

10.3390/electronics10090999 ◽

2021 ◽

Vol 10 (9) ◽

pp. 999

Author(s):

Ahmad Taher Azar ◽

Anis Koubaa ◽

Nada Ali Mohamed ◽

Habiba A. Ibrahim ◽

Zahra Fathy Ibrahim ◽

...

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Real Life ◽

Environment Monitoring ◽

Simulated Environments ◽

Infrastructure Inspection ◽

Remote Sensing Mapping ◽

And Control ◽

The Military ◽

Uav Navigation

Unmanned Aerial Vehicles (UAVs) are increasingly being used in many challenging and diversified applications. These applications belong to the civilian and the military fields. To name a few; infrastructure inspection, traffic patrolling, remote sensing, mapping, surveillance, rescuing humans and animals, environment monitoring, and Intelligence, Surveillance, Target Acquisition, and Reconnaissance (ISTAR) operations. However, the use of UAVs in these applications needs a substantial level of autonomy. In other words, UAVs should have the ability to accomplish planned missions in unexpected situations without requiring human intervention. To ensure this level of autonomy, many artificial intelligence algorithms were designed. These algorithms targeted the guidance, navigation, and control (GNC) of UAVs. In this paper, we described the state of the art of one subset of these algorithms: the deep reinforcement learning (DRL) techniques. We made a detailed description of them, and we deduced the current limitations in this area. We noted that most of these DRL methods were designed to ensure stable and smooth UAV navigation by training computer-simulated environments. We realized that further research efforts are needed to address the challenges that restrain their deployment in real-life scenarios.

Download Full-text