ReLAccS: A Multi-level Approach to Accelerator Design for Reinforcement Learning on FPGA-based Systems

Author(s):  
Akhil Raj Baranwal ◽  
Salim Ullah ◽  
Siva Satyendra Sahoo ◽  
Akash Kumar
2019 ◽  
Vol 9 (3) ◽  
pp. 502 ◽  
Author(s):  
Cristyan Gil ◽  
Hiram Calvo ◽  
Humberto Sossa

Programming robots to perform different activities requires calculating sequences of joint values while taking many factors, such as stability and efficiency, into account at the same time. For walking in particular, state-of-the-art techniques for approximating these sequences are based on reinforcement learning (RL). In this work we propose a multi-level system in which the same RL method is used first to learn the configurations of robot joints (poses) that allow the robot to stand stably, and then, at the second level, to find the sequence of poses that lets it travel the farthest distance in the shortest time while avoiding falls and keeping a straight path. To evaluate this, we measure the time the robot takes to travel a given distance. To our knowledge, this is the first work to address both speed and trajectory precision at the same time. We implement our model in a simulated environment using Q-learning. Compared with the built-in walking modes of a NAO robot, our approach improves on the normal-speed mode and is more robust than the fast-speed mode. The proposed model can be extended to other tasks and is independent of any particular robot model.
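Both levels of the approach rest on the same tabular Q-learning update. As a rough illustration of that update rule (the abstract does not specify the state and action spaces for poses and pose sequences, so the toy MDP interface below is a stand-in), a minimal sketch:

```python
import random

def q_learning(states, actions, reward_fn, transition_fn,
               episodes=300, steps=50, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Minimal tabular Q-learning; reward_fn and transition_fn define
    a hypothetical MDP standing in for either level of the system."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = rng.choice(states)
        for _ in range(steps):
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s_next = transition_fn(s, a)
            r = reward_fn(s, a, s_next)
            # Bellman update toward the greedy bootstrap target
            target = r + gamma * max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q
```

In the paper's setting, the first level's reward would score pose stability, and the second level's reward would score distance covered, falls, and path straightness; chaining the two means the second level searches only over poses the first level already deemed stable.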


Algorithms ◽  
2018 ◽  
Vol 11 (9) ◽  
pp. 134 ◽  
Author(s):  
Gabriele Russo Russo ◽  
Matteo Nardelli ◽  
Valeria Cardellini ◽  
Francesco Lo Presti

The capability of efficiently processing the data streams emitted by today's ubiquitous sensing devices enables the development of new intelligent services. Data Stream Processing (DSP) applications allow huge volumes of data to be processed in near real-time. To keep up with the high volume and velocity of data, these applications can elastically scale their execution across multiple computing resources to process the incoming data flow in parallel. Since data sources and consumers are usually located at the network edge, today's geo-distributed computing resources represent an attractive environment for DSP. However, controlling the applications and the processing infrastructure in such wide-area environments is a significant challenge. In this paper, we present a hierarchical solution for the autonomous control of elastic DSP applications and infrastructures. In this two-layered solution, centralized components coordinate subordinate distributed managers, which, in turn, locally control the elastic adaptation of the application components and deployment regions. Exploiting this framework, we design several self-adaptation policies, including reinforcement learning based solutions. We show the benefits of the presented self-adaptation policies over static provisioning solutions, and discuss the strengths of the reinforcement learning based approaches, which learn from experience how to optimize application performance and resource allocation.
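A distributed manager's per-operator control loop can be viewed as a small MDP: the state pairs the operator's current parallelism with an observed load level, actions add or remove a replica, and the reward trades off load left unserved against resource cost. The sketch below illustrates that idea with plain Q-learning; the class name, state encoding, and cost model are all illustrative assumptions, not taken from the paper:

```python
import random

class ElasticityAgent:
    """Learns a scale-up/scale-down/hold policy for one DSP operator.
    State: (parallelism level, load bucket). Actions: -1, 0, +1 replicas."""

    def __init__(self, max_parallelism=5, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=1):
        self.max_p = max_parallelism
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = {}  # (state, action) -> value estimate
        self.rng = random.Random(seed)

    def _q(self, s, a):
        return self.Q.get((s, a), 0.0)

    def actions(self, s):
        # only offer feasible reconfigurations
        p, _ = s
        acts = [0]
        if p > 1:
            acts.append(-1)
        if p < self.max_p:
            acts.append(+1)
        return acts

    def choose(self, s):
        acts = self.actions(s)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(acts)
        return max(acts, key=lambda a: self._q(s, a))

    def update(self, s, a, r, s_next):
        best = max(self._q(s_next, b) for b in self.actions(s_next))
        self.Q[(s, a)] = self._q(s, a) + self.alpha * (
            r + self.gamma * best - self._q(s, a))
```

In the paper's hierarchy, many such local managers would run concurrently, with the centralized layer coordinating their reconfiguration decisions.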


2019 ◽  
Vol 10 (3) ◽  
pp. 57
Author(s):  
Kyunghan Min ◽  
Gyubin Sim ◽  
Seongju Ahn ◽  
Inseok Park ◽  
Seungjae Yoo ◽  
...  

A smart regenerative braking system, an advanced driver-assistance system for electric vehicles, automatically controls the regeneration torque of the electric motor to brake the vehicle by recognizing deceleration conditions. This autonomous braking system can thus improve driver convenience and energy efficiency by reducing the driver's frequent brake pedaling. To apply such an assistance system, a deceleration planning algorithm should guarantee safe deceleration under diverse driving situations. Furthermore, the planning algorithm should suppress any sense of unnaturalness caused by autonomous braking. To meet these requirements, this study proposes a multi-level deceleration planning algorithm consisting of two planning algorithms and a planning-management layer. The two planning algorithms, driver-model-based planning and optimization-based planning, each generate deceleration profiles. The planning management then selects the optimal result among these profiles; to obtain an optimal result, it is updated using a reinforcement learning algorithm. The proposed algorithm was trained and validated in a simulation environment using real-vehicle experimental data. As a result, the algorithm determines the optimal deceleration trajectory for autonomous regenerative braking.
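The planning-management layer can be pictured as a learned arbiter: each planner proposes a profile, and the manager learns, per driving situation, which proposal to trust. The sketch below simplifies the paper's RL update to a contextual-bandit value estimate; the class, situation labels, and reward model are hypothetical stand-ins:

```python
import random

class PlanningManager:
    """Selects among candidate deceleration profiles and learns, per
    driving situation, which planner's proposal yields the best reward."""

    def __init__(self, n_planners, epsilon=0.1, lr=0.2, seed=0):
        self.n = n_planners
        self.epsilon, self.lr = epsilon, lr
        self.values = {}  # (situation, planner index) -> reward estimate
        self.rng = random.Random(seed)

    def select(self, situation, explore=True):
        if explore and self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n)
        return max(range(self.n),
                   key=lambda i: self.values.get((situation, i), 0.0))

    def update(self, situation, planner, reward):
        # exponential moving average toward the observed reward
        v = self.values.get((situation, planner), 0.0)
        self.values[(situation, planner)] = v + self.lr * (reward - v)
```

In the paper's pipeline, the two candidates would come from the driver-model-based and optimization-based planners, and the reward would score safety margin, braking comfort, and recuperated energy.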


2020 ◽  
Vol 22 (5) ◽  
pp. 1372-1383 ◽  
Author(s):  
Ning Xu ◽  
Hanwang Zhang ◽  
An-An Liu ◽  
Weizhi Nie ◽  
Yuting Su ◽  
...  

Image captioning is one of the most challenging hallmarks of AI, due to the complexity of visual and natural language understanding involved. As it is essentially a sequential prediction task, recent advances in image captioning use Reinforcement Learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods mainly rely on a single policy network and reward function that do not fit well with the multi-level (word and sentence) and multi-modal (vision and language) nature of the task. To this end, we propose a novel multi-level policy and reward RL framework for image captioning. It contains two modules: 1) a Multi-Level Policy Network that adaptively fuses the word-level policy and the sentence-level policy for word generation; and 2) a Multi-Level Reward Function that collaboratively leverages both a vision-language reward and a language-language reward to guide the policy. Further, we propose a guidance term to bridge the policy and the reward for RL optimization. Extensive experiments and analysis on MSCOCO and Flickr30k show that the proposed framework achieves competitive performance across different evaluation metrics.
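At each decoding step, the fusion of the two policies amounts to mixing two distributions over the vocabulary. A minimal sketch of that mixing (in the paper the gate is predicted adaptively per step by the network; here it is simply a number in [0, 1], and the function names are illustrative):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_policies(word_logits, sentence_logits, gate):
    """Mix a word-level and a sentence-level policy into one distribution
    over the vocabulary: p = gate * p_word + (1 - gate) * p_sentence."""
    p_word = softmax(word_logits)
    p_sent = softmax(sentence_logits)
    return [gate * w + (1 - gate) * s for w, s in zip(p_word, p_sent)]
```

Because each component is a valid distribution and the gate is a convex weight, the fused output is itself a valid distribution, so the next word can be sampled from it directly.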


Author(s):  
WenJi Zhou ◽  
Yang Yu ◽  
Yingfeng Chen ◽  
Kai Guan ◽  
Tangjie Lv ◽  
...  

Experience reuse is key to sample-efficient reinforcement learning. One of the critical issues is how the experience is represented and stored. Previously, experience could be stored in the form of features, individual models, or an average model, each lying at a different granularity. However, new tasks may require experience across multiple granularities. In this paper, we propose the policy residual representation (PRR) network, which can extract and store multiple levels of experience. The PRR network is trained on a set of tasks with a multi-level architecture, where a module at each level corresponds to a subset of the tasks; the network therefore represents experience in a spectrum-like way. When training on a new task, the PRR network can provide different levels of experience to accelerate learning. We experiment with the PRR network on a set of grid-world navigation tasks, locomotion tasks, and fighting tasks in a video game. The results show that the PRR network leads to better reuse of experience and thus outperforms some state-of-the-art approaches.
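The spectrum-like representation can be pictured as a stack of per-level modules whose contributions are summed along a task's path through the hierarchy, with coarser levels shared by more tasks. A toy sketch with linear modules (the paper uses learned network modules; everything below is an illustrative simplification):

```python
def prr_logits(level_modules, features):
    """Sum residual contributions from one module per level.
    level_modules: for each level, a weight matrix (one row per action)
    taken from that level's module covering the current task.
    features: the state feature vector."""
    n_actions = len(level_modules[0])
    logits = [0.0] * n_actions
    for weights in level_modules:
        for a in range(n_actions):
            logits[a] += sum(w * f for w, f in zip(weights[a], features))
    return logits
```

A new task can then keep the coarse shared levels frozen and learn only its task-specific residual, which is where the sample-efficiency gain comes from.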

