Reinforcement Learning using Convolutional Neural Network for Game Prediction

This paper presents a deep learning model for playing computer games from high-level information using reinforcement learning. The games have restricted action spaces (e.g., snakes, catcher, air-bandit). The implementation progresses in three parts: the first uses a simple neural network; the second uses a Deep Q-Network; and the third, to improve the accuracy and speed of the algorithm, uses a convolutional neural network that processes the input images and estimates, from its fully connected layers, the value of each action given the extracted features, with Q-learning applied to determine the best possible move. The results are then analysed and compared to give an overview of the improvements each method brings.
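
For orientation, the Q-learning update that the staged implementation builds on can be sketched in a few lines; the environment, rewards, and state sizes below are placeholders for illustration, not the paper's actual games:

```python
# Minimal tabular Q-learning sketch; `step` is a stand-in environment,
# not the paper's game engines (snakes, catcher, air-bandit).
import numpy as np

n_states, n_actions = 64, 4            # hypothetical small game
alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = np.zeros((n_states, n_actions))

def step(state, action):
    # placeholder transition and reward; a real game engine goes here
    return (state + action + 1) % n_states, float(np.random.randn())

state = 0
for _ in range(10_000):
    # epsilon-greedy exploration
    if np.random.rand() < epsilon:
        action = np.random.randint(n_actions)
    else:
        action = int(Q[state].argmax())
    next_state, reward = step(state, action)
    # Q-learning: bootstrap from the greedy value of the next state
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state
```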

2020 ◽  
Author(s):  
Ao Chen ◽  
Taresh Dewan ◽  
Manva Trivedi ◽  
Danning Jiang ◽  
Aloukik Aditya ◽  
...  

This paper provides a comparative analysis of the Deep Q-Network (DQN) and Double Deep Q-Network (DDQN) algorithms based on their hit rate, with DDQN proving better for the game Breakout. DQN is chosen over basic Q-learning because its neural network learns a policy suited to complex environments, and DDQN is chosen because it addresses the overestimation problem of basic Q-learning, in which the agent chooses a non-optimal action for a state simply because it has the maximum estimated Q-value.
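
The distinction the comparison rests on is the bootstrap target. A minimal sketch, assuming networks that map a batch of states to per-action Q-values (names and shapes are illustrative, not the authors' code):

```python
import torch

def dqn_target(reward, next_state, target_net, gamma=0.99):
    # DQN: the same network both selects and evaluates the next action,
    # which biases the estimate upward (the overestimation problem).
    return reward + gamma * target_net(next_state).max(dim=1).values

def ddqn_target(reward, next_state, online_net, target_net, gamma=0.99):
    # DDQN: the online network selects the action, the target network
    # evaluates it, decoupling selection from evaluation.
    best_action = online_net(next_state).argmax(dim=1, keepdim=True)
    return reward + gamma * target_net(next_state).gather(1, best_action).squeeze(1)
```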


2021 ◽  
Vol 7 ◽  
pp. e497
Author(s):  
Shakeel Shafiq ◽  
Tayyaba Azim

Deep neural networks have been widely explored and utilised as a tool for feature extraction in computer vision and machine learning. It is often observed that the last fully connected (FC) layers of a convolutional neural network possess higher discrimination power than the convolutional and maxpooling layers, whose goal is to preserve local, low-level information of the input image and downsample it to avoid overfitting. Inspired by the functionality of the local binary pattern (LBP) operator, this paper proposes to induce discrimination into the mid layers of a convolutional neural network by introducing a discriminatively boosted alternative to pooling (DBAP) layer, which is shown to serve as a favourable replacement for an early maxpooling layer in a convolutional neural network (CNN). A thorough review of related work shows that the proposed change to the neural architecture is novel; it has not previously been proposed as a way to obtain enhanced discrimination and feature visualisation power from mid-layer features. The empirical results reveal that introducing the DBAP layer into popular neural architectures such as AlexNet and LeNet produces classification results competitive with their baseline models, as well as with other ultra-deep models, on several benchmark data sets. In addition, better visualisation of intermediate features can help in understanding and interpreting the black-box behaviour of convolutional neural networks, which are widely used by the research community.
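
The abstract does not spell out DBAP's exact computation, but a rough, hypothetical sketch of an LBP-inspired alternative to an early maxpooling layer might threshold each window against its centre and pool the resulting comparison codes (everything here is an assumption for illustration, not the paper's layer):

```python
import torch
import torch.nn.functional as F

def lbp_like_pool(x: torch.Tensor) -> torch.Tensor:
    """LBP-style pooling sketch: binarise each 3x3 neighbourhood against
    its centre, weight the bits, then downsample by average pooling."""
    n, c, h, w = x.shape
    patches = F.unfold(x, kernel_size=3, padding=1).view(n, c, 9, h * w)
    center = patches[:, :, 4:5, :]               # centre of each 3x3 window
    codes = (patches > center).to(x.dtype)       # binary comparisons (centre bit is 0)
    weights = 2.0 ** torch.arange(9, dtype=x.dtype, device=x.device)
    code_map = (codes * weights.view(1, 1, 9, 1)).sum(dim=2).view(n, c, h, w)
    return F.avg_pool2d(code_map, kernel_size=2) # 2x spatial downsampling
```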


2020 ◽  
Vol 117 (47) ◽  
pp. 29872-29882
Author(s):  
Ben Tsuda ◽  
Kay M. Tye ◽  
Hava T. Siegelmann ◽  
Terrence J. Sejnowski

The prefrontal cortex encodes and stores numerous, often disparate, schemas and flexibly switches between them. Recent research on artificial neural networks trained by reinforcement learning has made it possible to model fundamental processes underlying schema encoding and storage. Yet how the brain is able to create new schemas while preserving and utilizing old schemas remains unclear. Here we propose a simple neural network framework that incorporates hierarchical gating to model the prefrontal cortex’s ability to flexibly encode and use multiple disparate schemas. We show how gating naturally leads to transfer learning and robust memory savings. We then show how neuropsychological impairments observed in patients with prefrontal damage are mimicked by lesions of our network. Our architecture, which we call DynaMoE, provides a fundamental framework for how the prefrontal cortex may handle the abundance of schemas necessary to navigate the real world.
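
The core gating idea can be sketched as a mixture of experts in which a gate routes inputs among expert sub-networks, and new schemas are learned by freezing old experts and adding fresh ones; this is a minimal flat sketch with assumed sizes, not DynaMoE's full hierarchical architecture:

```python
import torch
import torch.nn as nn

def make_expert(in_dim=32, hidden=64, out_dim=4):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class GatedMixture(nn.Module):
    def __init__(self, in_dim=32, n_experts=3):
        super().__init__()
        self.experts = nn.ModuleList([make_expert(in_dim) for _ in range(n_experts)])
        self.gate = nn.Linear(in_dim, n_experts)   # decides which schema applies

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)              # (B, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)        # gated combination

    def add_expert(self, in_dim=32):
        # new schema: freeze old experts (memory savings), add a fresh one,
        # and widen the gate so it can route to all experts
        for p in self.experts.parameters():
            p.requires_grad = False
        self.experts.append(make_expert(in_dim))
        self.gate = nn.Linear(in_dim, len(self.experts))
```

Freezing the old experts is what yields the robust memory savings described above: previously learned schemas cannot be overwritten, while the retrained gate lets the network reuse them in new situations.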


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Xiali Li ◽  
Zhengyu Lv ◽  
Licheng Wu ◽  
Yue Zhao ◽  
Xiaona Xu

In this study, hybrid SARSA(λ) (state–action–reward–state–action with eligibility traces) and Q-learning algorithms are applied at different stages of an upper confidence bounds applied to trees (UCT) search for Tibetan Jiu chess. Q-learning is also used to update all the nodes on the search path when each game ends. A learning strategy is proposed that combines the SARSA(λ) and Q-learning algorithms with domain knowledge to form a feedback function for the layout and battle stages. An improved deep neural network based on ResNet18 is used for self-play training. Experimental results show that hybrid online and offline reinforcement learning with a deep neural network can improve the game program's learning efficiency and its understanding of Tibetan Jiu chess.
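
The on-policy half of the hybrid can be sketched as a tabular SARSA(λ) update with eligibility traces; the sizes and feedback function here are placeholders rather than the paper's Jiu chess encoding:

```python
import numpy as np

n_states, n_actions = 100, 8          # illustrative sizes, not Jiu chess
alpha, gamma, lam = 0.1, 0.95, 0.8
Q = np.zeros((n_states, n_actions))
E = np.zeros_like(Q)                  # eligibility traces

def sarsa_lambda_update(s, a, r, s2, a2):
    """One SARSA(lambda) step: the on-policy TD error is spread over all
    recently visited state-action pairs via the trace matrix E."""
    global E
    delta = r + gamma * Q[s2, a2] - Q[s, a]   # on-policy TD error
    E[s, a] += 1.0                            # accumulating trace
    Q[...] += alpha * delta * E               # credit the whole trace
    E *= gamma * lam                          # decay traces
```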


Author(s):  
Jorai Rijsdijk ◽  
Lichao Wu ◽  
Guilherme Perin ◽  
Stjepan Picek

Deep learning represents a powerful set of techniques for profiled side-channel analysis. Results from the last few years show that neural network architectures like multilayer perceptrons and convolutional neural networks give strong attack performance, making it possible to break targets protected with various countermeasures. Considering that deep learning techniques commonly have a plethora of hyperparameters to tune, it is clear that such top attack results can come at a high price in preparing the attack. This is especially problematic as the side-channel community commonly uses random search or grid search to look for the best hyperparameters. In this paper, we propose to use reinforcement learning to tune convolutional neural network hyperparameters. In our framework, we investigate the Q-learning paradigm and develop two reward functions that use side-channel metrics. We mount an investigation on three commonly used datasets and two leakage models, where the results show that reinforcement learning can find convolutional neural networks exhibiting top performance while having a small number of trainable parameters. We note that our approach is automated and can be easily adapted to different datasets. Several of our newly developed architectures outperform the current state-of-the-art results. Finally, we make our source code publicly available: https://github.com/AISyLab/Reinforcement-Learning-for-SCA
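
A minimal sketch of how hyperparameter tuning can be cast as Q-learning: states are partial architectures, actions pick the next hyperparameter value, and the terminal reward comes from a side-channel metric. The search space, two-step episode, and stand-in metric below are assumptions, not the paper's exact MDP or reward functions:

```python
import random

filter_choices = [4, 8, 16, 32]      # hypothetical search space
kernel_choices = [3, 7, 11]
Q = {}                               # (state, action) -> estimated value
alpha, gamma, epsilon = 0.2, 0.9, 0.3

def evaluate(arch):
    # stand-in for training the CNN and scoring it with a side-channel
    # metric (e.g., one derived from guessing entropy); returns [0, 1]
    return random.random()

def choose(state, actions):
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

for episode in range(200):
    n_filters = choose((), filter_choices)           # first decision
    kernel = choose((n_filters,), kernel_choices)    # second decision
    reward = evaluate((n_filters, kernel))
    # back the terminal reward up through both decisions
    q_k = Q.get(((n_filters,), kernel), 0.0)
    Q[((n_filters,), kernel)] = q_k + alpha * (reward - q_k)
    best_next = max(Q.get(((n_filters,), k), 0.0) for k in kernel_choices)
    q_f = Q.get(((), n_filters), 0.0)
    Q[((), n_filters)] = q_f + alpha * (gamma * best_next - q_f)
```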


Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2948
Author(s):  
Corentin Rodrigo ◽  
Samuel Pierre ◽  
Ronald Beaubrun ◽  
Franjieh El Khoury

Android has become the leading operating system for mobile devices and the one most targeted by malware. Therefore, many analysis methods have been proposed for detecting Android malware, but few of them use proper datasets for evaluation. In this paper, we propose BrainShield, a hybrid malware detection model trained on the Omnidroid dataset to reduce attacks on Android devices. Omnidroid is the most diversified dataset in terms of the number of different features and, with 22,000 samples, contains the largest number of samples for model evaluation in the Android malware detection field. BrainShield's implementation is based on a client/server architecture and consists of three fully connected neural networks: (1) the first is used for static analysis and reaches an accuracy of 92.9% trained on 840 static features; (2) the second is a dynamic neural network that reaches an accuracy of 81.1% trained on 3722 dynamic features; and (3) the third is hybrid, reaching an accuracy of 91.1% trained on 7081 static and dynamic features. Simulation results show that BrainShield improves the accuracy and precision of well-known malware detection methods.
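
As a sketch, the static detector reduces to a fully connected binary classifier over the 840 static features named in the abstract; the layer widths and training step below are assumptions, not BrainShield's published configuration:

```python
import torch
import torch.nn as nn

# hypothetical widths; only the 840-feature input comes from the abstract
static_net = nn.Sequential(
    nn.Linear(840, 256), nn.ReLU(), nn.Dropout(0.3),
    nn.Linear(256, 64), nn.ReLU(),
    nn.Linear(64, 1),                 # single malware/benign logit
)

def train_step(batch_x, batch_y, optimizer):
    """One supervised step: batch_x is (B, 840) static features,
    batch_y is (B,) binary malware labels."""
    loss = nn.functional.binary_cross_entropy_with_logits(
        static_net(batch_x).squeeze(1), batch_y.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```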


Author(s):  
Yun Seong Nam ◽  
Jianfei Gao ◽  
Chandan Bothra ◽  
Ehab Ghabashneh ◽  
Sanjay Rao ◽  
...  

The performance of Adaptive Bitrate (ABR) algorithms for video streaming depends on accurately predicting the download time of video chunks. Existing prediction approaches (i) assume chunk download times are dominated by network throughput; and (ii) cluster sessions a priori (e.g., based on ISP and CDN) and only learn from sessions in the same cluster. We make three contributions. First, through analysis of data from real-world video streaming sessions, we show (i) a priori clustering prevents learning from related clusters; and (ii) factors such as the Time to First Byte (TTFB) are key components of chunk download times but are not easily incorporated into existing prediction approaches. Second, we propose Xatu, a new prediction approach that jointly learns a neural network sequence model with an interpretable automatic session clustering method. Xatu learns clustering rules across all sessions it deems relevant, and models sequences with multiple chunk-dependent features (e.g., TTFB) rather than just throughput. Third, evaluations using the above datasets and emulation experiments show that Xatu significantly improves prediction accuracy, by 23.8% relative to CS2P (a state-of-the-art predictor). We show Xatu provides substantial performance benefits when integrated with multiple ABR algorithms, including MPC (a well-studied ABR algorithm) and FuguABR (a recent algorithm using stochastic control), relative to their default predictors (CS2P and a fully connected neural network, respectively). Further, Xatu combined with MPC outperforms Pensieve, an ABR algorithm based on deep reinforcement learning.
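
The sequence-model half of the approach can be sketched as a recurrent network over per-chunk features that include TTFB rather than throughput alone; the feature set, sizes, and omission of Xatu's clustering component are all simplifications:

```python
import torch
import torch.nn as nn

class DownloadTimePredictor(nn.Module):
    """Sketch: LSTM over recent chunks' (throughput, TTFB, chunk size)
    predicting the next chunk's download time. Feature set is assumed."""
    def __init__(self, n_features=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, chunk_history):
        # chunk_history: (batch, n_chunks, n_features)
        out, _ = self.lstm(chunk_history)
        return self.head(out[:, -1, :])   # predicted next download time
```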


2020 ◽  
Author(s):  
Ben Tsuda ◽  
Kay M. Tye ◽  
Hava T. Siegelmann ◽  
Terrence J. Sejnowski

The prefrontal cortex encodes and stores numerous, often disparate, schemas and flexibly switches between them. Recent research on artificial neural networks trained by reinforcement learning has made it possible to model fundamental processes underlying schema encoding and storage. Yet how the brain is able to create new schemas while preserving and utilizing old schemas remains unclear. Here we propose a simple neural network framework based on a modification of the mixture of experts architecture to model the prefrontal cortex’s ability to flexibly encode and use multiple disparate schemas. We show how incorporation of gating naturally leads to transfer learning and robust memory savings. We then show how phenotypic impairments observed in patients with prefrontal damage are mimicked by lesions of our network. Our architecture, which we call DynaMoE, provides a fundamental framework for how the prefrontal cortex may handle the abundance of schemas necessary to navigate the real world.


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Yang Zhou ◽  
Rui Fu ◽  
Chang Wang

The present study proposes a framework for learning the car-following behavior of drivers based on maximum entropy deep inverse reinforcement learning. The framework learns the reward function, represented by a fully connected neural network, from driving data including the speed of the driver's vehicle, the distance to the leading vehicle, and the relative speed. Data from two field tests with 42 drivers are used. After clustering the participants into aggressive and conservative groups, the car-following data were used to train three models: the proposed model, a fully connected neural network model, and a recurrent neural network model. Under fivefold cross-validation, the proposed model achieved the lowest root mean squared percentage error and modified Hausdorff distance among the models, exhibiting a superior ability to reproduce drivers' car-following behaviors. Moreover, the proposed model captured the characteristics of different driving styles in car-following scenarios, and the learned rewards and strategies were consistent with the demonstrations of the two groups. Inverse reinforcement learning can thus serve as a new tool to explain and model driving behavior, providing a reference for the development of human-like autonomous driving models.
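
The reward model described here reduces to a small fully connected network over the three state features, trained with the maximum entropy IRL objective; this is a schematic sketch, with the policy-sampling machinery omitted:

```python
import torch
import torch.nn as nn

reward_net = nn.Sequential(
    nn.Linear(3, 32), nn.ReLU(),    # inputs: own speed, gap, relative speed
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),               # scalar reward
)

def maxent_irl_loss(expert_states, policy_states):
    """Sample-based maximum entropy IRL loss: minimising it raises the
    reward on demonstrated (expert) states and lowers it on states
    visited by the current learned policy."""
    return reward_net(policy_states).mean() - reward_net(expert_states).mean()
```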

