Reinforcement Learning using Convolutional Neural Network for Game Prediction

This paper presents a deep learning model for playing computer games from high-level information using reinforcement learning. The games have restricted action spaces (e.g., snakes, catcher, air-bandit). The implementation progresses in three parts: the first uses a simple neural network; the second uses a Deep Q-Network; and the third, to improve the accuracy and speed of the algorithm, uses a convolutional neural network that processes the input images and estimates, from its fully connected layers, the value of each action given the extracted features, with Q-learning applied to determine the best possible move. The results are then analysed and compared to give an overview of the improvements each method brings.
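
For orientation, the Q-learning update that the staged implementation builds on can be sketched in a few lines; the environment, rewards, and state sizes below are placeholders for illustration, not the paper's actual games:

```python
# Minimal tabular Q-learning sketch; `step` is a stand-in environment,
# not the paper's game engines (snakes, catcher, air-bandit).
import numpy as np

n_states, n_actions = 64, 4            # hypothetical small game
alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = np.zeros((n_states, n_actions))

def step(state, action):
    # placeholder transition and reward; a real game engine goes here
    return (state + action + 1) % n_states, float(np.random.randn())

state = 0
for _ in range(10_000):
    # epsilon-greedy exploration
    if np.random.rand() < epsilon:
        action = np.random.randint(n_actions)
    else:
        action = int(Q[state].argmax())
    next_state, reward = step(state, action)
    # Q-learning: bootstrap from the greedy value of the next state
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
```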

2020 ◽  
Author(s):  
Ao Chen ◽  
Taresh Dewan ◽  
Manva Trivedi ◽  
Danning Jiang ◽  
Aloukik Aditya ◽  
...  

This paper provides a comparative analysis of the Deep Q-Network (DQN) and Double Deep Q-Network (DDQN) algorithms based on their hit rate, with DDQN proving better for the game Breakout. DQN is chosen over basic Q-learning because its neural network learns a policy suited to complex environments, and DDQN is chosen because it addresses the overestimation problem of basic Q-learning, in which the agent chooses a non-optimal action for a state simply because it has the maximum estimated Q-value.
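
The distinction the comparison rests on is the bootstrap target. A minimal sketch, assuming networks that map a batch of states to per-action Q-values (names and shapes are illustrative, not the authors' code):

```python
import torch

def dqn_target(reward, next_state, target_net, gamma=0.99):
    # DQN: the same network both selects and evaluates the next action,
    # which biases the estimate upward (the overestimation problem).
    return reward + gamma * target_net(next_state).max(dim=1).values

def ddqn_target(reward, next_state, online_net, target_net, gamma=0.99):
    # DDQN: the online network selects the action, the target network
    # evaluates it, decoupling selection from evaluation.
    best_action = online_net(next_state).argmax(dim=1, keepdim=True)
    return reward + gamma * target_net(next_state).gather(1, best_action).squeeze(1)
```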


2021 ◽  
Vol 7 ◽  
pp. e497
Author(s):  
Shakeel Shafiq ◽  
Tayyaba Azim

Deep neural networks have been widely explored and utilised as a tool for feature extraction in computer vision and machine learning. It is often observed that the last fully connected (FC) layers of a convolutional neural network possess higher discrimination power than the convolutional and maxpooling layers, whose goal is to preserve local, low-level information of the input image and downsample it to avoid overfitting. Inspired by the functionality of the local binary pattern (LBP) operator, this paper proposes to induce discrimination into the mid layers of a convolutional neural network by introducing a discriminatively boosted alternative to pooling (DBAP) layer, which is shown to serve as a favourable replacement for an early maxpooling layer in a convolutional neural network (CNN). A thorough review of related work shows that the proposed change to the neural architecture is novel; it has not previously been proposed as a way to obtain enhanced discrimination and feature visualisation power from mid-layer features. The empirical results reveal that introducing the DBAP layer into popular neural architectures such as AlexNet and LeNet produces classification results competitive with their baseline models, as well as with other ultra-deep models, on several benchmark data sets. In addition, better visualisation of intermediate features can help in understanding and interpreting the black-box behaviour of convolutional neural networks, which are widely used by the research community.
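
The abstract does not spell out DBAP's exact computation, but a rough, hypothetical sketch of an LBP-inspired alternative to an early maxpooling layer might threshold each window against its centre and pool the resulting comparison codes (everything here is an assumption for illustration, not the paper's layer):

```python
import torch
import torch.nn.functional as F

def lbp_like_pool(x: torch.Tensor) -> torch.Tensor:
    """LBP-style pooling sketch: binarise each 3x3 neighbourhood against
    its centre, weight the bits, then downsample by average pooling."""
    n, c, h, w = x.shape
    patches = F.unfold(x, kernel_size=3, padding=1).view(n, c, 9, h * w)
    center = patches[:, :, 4:5, :]               # centre of each 3x3 window
    codes = (patches > center).to(x.dtype)       # binary comparisons (centre bit is 0)
    weights = 2.0 ** torch.arange(9, dtype=x.dtype, device=x.device)
    code_map = (codes * weights.view(1, 1, 9, 1)).sum(dim=2).view(n, c, h, w)
    return F.avg_pool2d(code_map, kernel_size=2) # 2x spatial downsampling
```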


2020 ◽  
Vol 117 (47) ◽  
pp. 29872-29882
Author(s):  
Ben Tsuda ◽  
Kay M. Tye ◽  
Hava T. Siegelmann ◽  
Terrence J. Sejnowski

The prefrontal cortex encodes and stores numerous, often disparate, schemas and flexibly switches between them. Recent research on artificial neural networks trained by reinforcement learning has made it possible to model fundamental processes underlying schema encoding and storage. Yet how the brain is able to create new schemas while preserving and utilizing old schemas remains unclear. Here we propose a simple neural network framework that incorporates hierarchical gating to model the prefrontal cortex’s ability to flexibly encode and use multiple disparate schemas. We show how gating naturally leads to transfer learning and robust memory savings. We then show how neuropsychological impairments observed in patients with prefrontal damage are mimicked by lesions of our network. Our architecture, which we call DynaMoE, provides a fundamental framework for how the prefrontal cortex may handle the abundance of schemas necessary to navigate the real world.
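
The core gating idea can be sketched as a mixture of experts in which a gate routes inputs among expert sub-networks, and new schemas are learned by freezing old experts and adding fresh ones; this is a minimal flat sketch with assumed sizes, not DynaMoE's full hierarchical architecture:

```python
import torch
import torch.nn as nn

def make_expert(in_dim=32, hidden=64, out_dim=4):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class GatedMixture(nn.Module):
    def __init__(self, in_dim=32, n_experts=3):
        super().__init__()
        self.experts = nn.ModuleList([make_expert(in_dim) for _ in range(n_experts)])
        self.gate = nn.Linear(in_dim, n_experts)   # decides which schema applies

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)              # (B, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)        # gated combination

    def add_expert(self, in_dim=32):
        # new schema: freeze old experts (memory savings), add a fresh one,
        # and widen the gate so it can route to all experts
        for p in self.experts.parameters():
            p.requires_grad = False
        self.experts.append(make_expert(in_dim))
        self.gate = nn.Linear(in_dim, len(self.experts))
```

Freezing the old experts is what yields the robust memory savings described above: previously learned schemas cannot be overwritten, while the retrained gate lets the network reuse them in new situations.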


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Xiali Li ◽  
Zhengyu Lv ◽  
Licheng Wu ◽  
Yue Zhao ◽  
Xiaona Xu

In this study, hybrid SARSA(λ) (state–action–reward–state–action with eligibility traces) and Q-learning algorithms are applied at different stages of an upper confidence bounds applied to trees (UCT) search for Tibetan Jiu chess. Q-learning is also used to update all the nodes on the search path when each game ends. A learning strategy is proposed that combines the SARSA(λ) and Q-learning algorithms with domain knowledge to form a feedback function for the layout and battle stages. An improved deep neural network based on ResNet18 is used for self-play training. Experimental results show that hybrid online and offline reinforcement learning with a deep neural network can improve the game program's learning efficiency and its understanding of Tibetan Jiu chess.
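
The on-policy half of the hybrid can be sketched as a tabular SARSA(λ) update with eligibility traces; the sizes and feedback function here are placeholders rather than the paper's Jiu chess encoding:

```python
import numpy as np

n_states, n_actions = 100, 8          # illustrative sizes, not Jiu chess
alpha, gamma, lam = 0.1, 0.95, 0.8
Q = np.zeros((n_states, n_actions))
E = np.zeros_like(Q)                  # eligibility traces

def sarsa_lambda_update(s, a, r, s2, a2):
    """One SARSA(lambda) step: the on-policy TD error is spread over all
    recently visited state-action pairs via the trace matrix E."""
    global E
    delta = r + gamma * Q[s2, a2] - Q[s, a]   # on-policy TD error
    E[s, a] += 1.0                            # accumulating trace
    Q[...] += alpha * delta * E               # credit the whole trace
    E *= gamma * lam                          # decay traces
```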


Author(s):  
Jorai Rijsdijk ◽  
Lichao Wu ◽  
Guilherme Perin ◽  
Stjepan Picek

Deep learning represents a powerful set of techniques for profiled side-channel analysis. Results from the last few years show that neural network architectures like multilayer perceptrons and convolutional neural networks give strong attack performance, making it possible to break targets protected with various countermeasures. Considering that deep learning techniques commonly have a plethora of hyperparameters to tune, it is clear that such top attack results can come at a high price in preparing the attack. This is especially problematic as the side-channel community commonly uses random search or grid search to look for the best hyperparameters. In this paper, we propose to use reinforcement learning to tune convolutional neural network hyperparameters. In our framework, we investigate the Q-learning paradigm and develop two reward functions that use side-channel metrics. We mount an investigation on three commonly used datasets and two leakage models, where the results show that reinforcement learning can find convolutional neural networks exhibiting top performance while having a small number of trainable parameters. We note that our approach is automated and can be easily adapted to different datasets. Several of our newly developed architectures outperform the current state-of-the-art results. Finally, we make our source code publicly available: https://github.com/AISyLab/Reinforcement-Learning-for-SCA
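
A minimal sketch of how hyperparameter tuning can be cast as Q-learning: states are partial architectures, actions pick the next hyperparameter value, and the terminal reward comes from a side-channel metric. The search space, two-step episode, and stand-in metric below are assumptions, not the paper's exact MDP or reward functions:

```python
import random

filter_choices = [4, 8, 16, 32]      # hypothetical search space
kernel_choices = [3, 7, 11]
Q = {}                               # (state, action) -> estimated value
alpha, gamma, epsilon = 0.2, 0.9, 0.3

def evaluate(arch):
    # stand-in for training the CNN and scoring it with a side-channel
    # metric (e.g., one derived from guessing entropy); returns [0, 1]
    return random.random()

def choose(state, actions):
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

for episode in range(200):
    n_filters = choose((), filter_choices)           # first decision
    kernel = choose((n_filters,), kernel_choices)    # second decision
    reward = evaluate((n_filters, kernel))
    # back the terminal reward up through both decisions
    q_k = Q.get(((n_filters,), kernel), 0.0)
    Q[((n_filters,), kernel)] = q_k + alpha * (reward - q_k)
    best_next = max(Q.get(((n_filters,), k), 0.0) for k in kernel_choices)
    q_f = Q.get(((), n_filters), 0.0)
    Q[((), n_filters)] = q_f + alpha * (gamma * best_next - q_f)
```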


Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2948
Author(s):  
Corentin Rodrigo ◽  
Samuel Pierre ◽  
Ronald Beaubrun ◽  
Franjieh El Khoury

Android has become the leading operating system for mobile devices and the one most targeted by malware. Therefore, many analysis methods have been proposed for detecting Android malware, but few of them use proper datasets for evaluation. In this paper, we propose BrainShield, a hybrid malware detection model trained on the Omnidroid dataset to reduce attacks on Android devices. Omnidroid is the most diversified dataset in terms of the number of different features and, with 22,000 samples, contains the largest number of samples for model evaluation in the Android malware detection field. BrainShield's implementation is based on a client/server architecture and consists of three fully connected neural networks: (1) the first is used for static analysis and reaches an accuracy of 92.9% trained on 840 static features; (2) the second is a dynamic neural network that reaches an accuracy of 81.1% trained on 3722 dynamic features; and (3) the third is hybrid, reaching an accuracy of 91.1% trained on 7081 static and dynamic features. Simulation results show that BrainShield improves the accuracy and precision of well-known malware detection methods.
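
As a sketch, the static detector reduces to a fully connected binary classifier over the 840 static features named in the abstract; the layer widths and training step below are assumptions, not BrainShield's published configuration:

```python
import torch
import torch.nn as nn

# hypothetical widths; only the 840-feature input comes from the abstract
static_net = nn.Sequential(
    nn.Linear(840, 256), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 1),                 # single malware/benign logit
)

def train_step(batch_x, batch_y, optimizer):
    """One supervised step: batch_x is (B, 840) static features,
    batch_y is (B,) binary malware labels."""
    loss = nn.functional.binary_cross_entropy_with_logits(
        static_net(batch_x).squeeze(1), batch_y.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```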


Author(s):  
Yun Seong Nam ◽  
Jianfei Gao ◽  
Chandan Bothra ◽  
Ehab Ghabashneh ◽  
Sanjay Rao ◽  
...  

The performance of Adaptive Bitrate (ABR) algorithms for video streaming depends on accurately predicting the download time of video chunks. Existing prediction approaches (i) assume chunk download times are dominated by network throughput; and (ii) cluster sessions a priori (e.g., based on ISP and CDN) and only learn from sessions in the same cluster. We make three contributions. First, through analysis of data from real-world video streaming sessions, we show (i) a priori clustering prevents learning from related clusters; and (ii) factors such as the Time to First Byte (TTFB) are key components of chunk download times but are not easily incorporated into existing prediction approaches. Second, we propose Xatu, a new prediction approach that jointly learns a neural network sequence model with an interpretable automatic session clustering method. Xatu learns clustering rules across all sessions it deems relevant, and models sequences with multiple chunk-dependent features (e.g., TTFB) rather than just throughput. Third, evaluations using the above datasets and emulation experiments show that Xatu significantly improves prediction accuracy, by 23.8% relative to CS2P (a state-of-the-art predictor). We show Xatu provides substantial performance benefits when integrated with multiple ABR algorithms, including MPC (a well-studied ABR algorithm) and FuguABR (a recent algorithm using stochastic control), relative to their default predictors (CS2P and a fully connected neural network, respectively). Further, Xatu combined with MPC outperforms Pensieve, an ABR algorithm based on deep reinforcement learning.
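
The sequence-model half of the approach can be sketched as a recurrent network over per-chunk features that include TTFB rather than throughput alone; the feature set, sizes, and omission of Xatu's clustering component are all simplifications:

```python
import torch
import torch.nn as nn

class DownloadTimePredictor(nn.Module):
    """Sketch: LSTM over recent chunks' (throughput, TTFB, chunk size)
    predicting the next chunk's download time. Feature set is assumed."""
    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, chunk_history):
        # chunk_history: (batch, n_chunks, n_features)
        out, _ = self.lstm(chunk_history)
        return self.head(out[:, -1, :])   # predicted next download time
```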


2020 ◽  
Author(s):  
Ben Tsuda ◽  
Kay M. Tye ◽  
Hava T. Siegelmann ◽  
Terrence J. Sejnowski

The prefrontal cortex encodes and stores numerous, often disparate, schemas and flexibly switches between them. Recent research on artificial neural networks trained by reinforcement learning has made it possible to model fundamental processes underlying schema encoding and storage. Yet how the brain is able to create new schemas while preserving and utilizing old schemas remains unclear. Here we propose a simple neural network framework based on a modification of the mixture of experts architecture to model the prefrontal cortex’s ability to flexibly encode and use multiple disparate schemas. We show how incorporation of gating naturally leads to transfer learning and robust memory savings. We then show how phenotypic impairments observed in patients with prefrontal damage are mimicked by lesions of our network. Our architecture, which we call DynaMoE, provides a fundamental framework for how the prefrontal cortex may handle the abundance of schemas necessary to navigate the real world.


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Yang Zhou ◽  
Rui Fu ◽  
Chang Wang

The present study proposes a framework for learning the car-following behavior of drivers based on maximum entropy deep inverse reinforcement learning. The framework learns the reward function, represented by a fully connected neural network, from driving data including the speed of the driver's vehicle, the distance to the leading vehicle, and the relative speed. Data from two field tests with 42 drivers are used. After clustering the participants into aggressive and conservative groups, the car-following data were used to train three models: the proposed model, a fully connected neural network model, and a recurrent neural network model. Under fivefold cross-validation, the proposed model achieved the lowest root mean squared percentage error and modified Hausdorff distance among the models, exhibiting a superior ability to reproduce drivers' car-following behaviors. Moreover, the proposed model captured the characteristics of different driving styles in car-following scenarios, and the learned rewards and strategies were consistent with the demonstrations of the two groups. Inverse reinforcement learning can thus serve as a new tool to explain and model driving behavior, providing a reference for the development of human-like autonomous driving models.
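
The reward model described here reduces to a small fully connected network over the three state features, trained with the maximum entropy IRL objective; this is a schematic sketch, with the policy-sampling machinery omitted:

```python
import torch
import torch.nn as nn

reward_net = nn.Sequential(
    nn.Linear(3, 32), nn.ReLU(),    # inputs: own speed, gap, relative speed
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),               # scalar reward
)

def maxent_irl_loss(expert_states, policy_states):
    """Sample-based maximum entropy IRL loss: minimising it raises the
    reward on demonstrated (expert) states and lowers it on states
    visited by the current learned policy."""
    return reward_net(policy_states).mean() - reward_net(expert_states).mean()
```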

