ReLAccS: A Multi-level Approach to Accelerator Design for Reinforcement Learning on FPGA-based Systems

Author(s):  
Akhil Raj Baranwal ◽  
Salim Ullah ◽  
Siva Satyendra Sahoo ◽  
Akash Kumar
2019 ◽  
Vol 9 (3) ◽  
pp. 502 ◽  
Author(s):  
Cristyan Gil ◽  
Hiram Calvo ◽  
Humberto Sossa

Programming robots to perform different activities requires calculating sequences of joint values while taking many factors, such as stability and efficiency, into account at the same time. For walking in particular, state-of-the-art techniques for approximating these sequences are based on reinforcement learning (RL). In this work we propose a multi-level system in which the same RL method is used first to learn the configurations of robot joints (poses) that allow the robot to stand stably, and then, at the second level, to find the sequence of poses that lets it travel the farthest distance in the shortest time while avoiding falls and keeping a straight path. To evaluate this, we measure the time the robot takes to travel a given distance. To our knowledge, this is the first work to address both speed and trajectory precision at the same time. We implement our model in a simulated environment using Q-learning. Compared with the built-in walking modes of a NAO robot, our approach improves on the normal-speed mode and is more robust than the fast-speed mode. The proposed model can be extended to other tasks and is independent of any particular robot model.
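Both levels of the approach rest on the same tabular Q-learning update. As a rough illustration of that update rule (the abstract does not specify the state and action spaces for poses and pose sequences, so the toy MDP interface below is a stand-in), a minimal sketch:

```python
import random

def q_learning(states, actions, reward_fn, transition_fn,
               episodes=300, steps=50, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Minimal tabular Q-learning; reward_fn and transition_fn define
    a hypothetical MDP standing in for either level of the system."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = rng.choice(states)
        for _ in range(steps):
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s_next = transition_fn(s, a)
            r = reward_fn(s, a, s_next)
            # Bellman update toward the greedy bootstrap target
            target = r + gamma * max(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q
```

In the paper's setting, the first level's reward would score pose stability, and the second level's reward would score distance covered, falls, and path straightness; chaining the two means the second level searches only over poses the first level already deemed stable.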


Algorithms ◽  
2018 ◽  
Vol 11 (9) ◽  
pp. 134 ◽  
Author(s):  
Gabriele Russo Russo ◽  
Matteo Nardelli ◽  
Valeria Cardellini ◽  
Francesco Lo Presti

The capability of efficiently processing the data streams emitted by today's ubiquitous sensing devices enables the development of new intelligent services. Data Stream Processing (DSP) applications allow huge volumes of data to be processed in near real-time. To keep up with the high volume and velocity of data, these applications can elastically scale their execution across multiple computing resources to process the incoming data flow in parallel. Since data sources and consumers are usually located at the network edge, today's geo-distributed computing resources represent an attractive environment for DSP. However, controlling the applications and the processing infrastructure in such wide-area environments is a significant challenge. In this paper, we present a hierarchical solution for the autonomous control of elastic DSP applications and infrastructures. In this two-layered solution, centralized components coordinate subordinate distributed managers, which, in turn, locally control the elastic adaptation of the application components and deployment regions. Exploiting this framework, we design several self-adaptation policies, including reinforcement learning based solutions. We show the benefits of the presented self-adaptation policies over static provisioning solutions, and discuss the strengths of the reinforcement learning based approaches, which learn from experience how to optimize application performance and resource allocation.
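A distributed manager's per-operator control loop can be viewed as a small MDP: the state pairs the operator's current parallelism with an observed load level, actions add or remove a replica, and the reward trades off load left unserved against resource cost. The sketch below illustrates that idea with plain Q-learning; the class name, state encoding, and cost model are all illustrative assumptions, not taken from the paper:

```python
import random

class ElasticityAgent:
    """Learns a scale-up/scale-down/hold policy for one DSP operator.
    State: (parallelism level, load bucket). Actions: -1, 0, +1 replicas."""

    def __init__(self, max_parallelism=5, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=1):
        self.max_p = max_parallelism
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = {}  # (state, action) -> value estimate
        self.rng = random.Random(seed)

    def _q(self, s, a):
        return self.Q.get((s, a), 0.0)

    def actions(self, s):
        # only offer feasible reconfigurations
        p, _ = s
        acts = [0]
        if p > 1:
            acts.append(-1)
        if p < self.max_p:
            acts.append(+1)
        return acts

    def choose(self, s):
        acts = self.actions(s)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(acts)
        return max(acts, key=lambda a: self._q(s, a))

    def update(self, s, a, r, s_next):
        best = max(self._q(s_next, b) for b in self.actions(s_next))
        self.Q[(s, a)] = self._q(s, a) + self.alpha * (
            r + self.gamma * best - self._q(s, a))
```

In the paper's hierarchy, many such local managers would run concurrently, with the centralized layer coordinating their reconfiguration decisions.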


2019 ◽  
Vol 10 (3) ◽  
pp. 57
Author(s):  
Kyunghan Min ◽  
Gyubin Sim ◽  
Seongju Ahn ◽  
Inseok Park ◽  
Seungjae Yoo ◽  
...  

A smart regenerative braking system, an advanced driver-assistance system for electric vehicles, automatically controls the regeneration torque of the electric motor to brake the vehicle by recognizing deceleration conditions. This autonomous braking system can thus improve driver convenience and energy efficiency by reducing the driver's frequent brake pedaling. To apply such an assistance system, a deceleration planning algorithm should guarantee safe deceleration under diverse driving situations. Furthermore, the planning algorithm should suppress any sense of unnaturalness caused by autonomous braking. To meet these requirements, this study proposes a multi-level deceleration planning algorithm consisting of two planning algorithms and a planning-management layer. The two planning algorithms, driver-model-based planning and optimization-based planning, each generate deceleration profiles. The planning management then selects the optimal result among these profiles; to obtain an optimal result, it is updated using a reinforcement learning algorithm. The proposed algorithm was trained and validated in a simulation environment using real-vehicle experimental data. As a result, the algorithm determines the optimal deceleration trajectory for autonomous regenerative braking.
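The planning-management layer can be pictured as a learned arbiter: each planner proposes a profile, and the manager learns, per driving situation, which proposal to trust. The sketch below simplifies the paper's RL update to a contextual-bandit value estimate; the class, situation labels, and reward model are hypothetical stand-ins:

```python
import random

class PlanningManager:
    """Selects among candidate deceleration profiles and learns, per
    driving situation, which planner's proposal yields the best reward."""

    def __init__(self, n_planners, epsilon=0.1, lr=0.2, seed=0):
        self.n = n_planners
        self.epsilon, self.lr = epsilon, lr
        self.values = {}  # (situation, planner index) -> reward estimate
        self.rng = random.Random(seed)

    def select(self, situation, explore=True):
        if explore and self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n)
        return max(range(self.n),
                   key=lambda i: self.values.get((situation, i), 0.0))

    def update(self, situation, planner, reward):
        # exponential moving average toward the observed reward
        v = self.values.get((situation, planner), 0.0)
        self.values[(situation, planner)] = v + self.lr * (reward - v)
```

In the paper's pipeline, the two candidates would come from the driver-model-based and optimization-based planners, and the reward would score safety margin, braking comfort, and recuperated energy.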


2020 ◽  
Vol 22 (5) ◽  
pp. 1372-1383 ◽  
Author(s):  
Ning Xu ◽  
Hanwang Zhang ◽  
An-An Liu ◽  
Weizhi Nie ◽  
Yuting Su ◽  
...  

Image captioning is one of the most challenging hallmarks of AI, due to the complexity of visual and natural language understanding involved. As it is essentially a sequential prediction task, recent advances in image captioning use Reinforcement Learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods mainly rely on a single policy network and reward function that do not fit well with the multi-level (word and sentence) and multi-modal (vision and language) nature of the task. To this end, we propose a novel multi-level policy and reward RL framework for image captioning. It contains two modules: 1) a Multi-Level Policy Network that adaptively fuses the word-level policy and the sentence-level policy for word generation; and 2) a Multi-Level Reward Function that collaboratively leverages both a vision-language reward and a language-language reward to guide the policy. Further, we propose a guidance term to bridge the policy and the reward for RL optimization. Extensive experiments and analysis on MSCOCO and Flickr30k show that the proposed framework achieves competitive performance across different evaluation metrics.
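At each decoding step, the fusion of the two policies amounts to mixing two distributions over the vocabulary. A minimal sketch of that mixing (in the paper the gate is predicted adaptively per step by the network; here it is simply a number in [0, 1], and the function names are illustrative):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_policies(word_logits, sentence_logits, gate):
    """Mix a word-level and a sentence-level policy into one distribution
    over the vocabulary: p = gate * p_word + (1 - gate) * p_sentence."""
    p_word = softmax(word_logits)
    p_sent = softmax(sentence_logits)
    return [gate * w + (1 - gate) * s for w, s in zip(p_word, p_sent)]
```

Because each component is a valid distribution and the gate is a convex weight, the fused output is itself a valid distribution, so the next word can be sampled from it directly.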


Author(s):  
WenJi Zhou ◽  
Yang Yu ◽  
Yingfeng Chen ◽  
Kai Guan ◽  
Tangjie Lv ◽  
...  

Experience reuse is key to sample-efficient reinforcement learning. One of the critical issues is how the experience is represented and stored. Previously, experience could be stored in the form of features, individual models, or an average model, each lying at a different granularity. However, new tasks may require experience across multiple granularities. In this paper, we propose the policy residual representation (PRR) network, which can extract and store multiple levels of experience. The PRR network is trained on a set of tasks with a multi-level architecture, where a module at each level corresponds to a subset of the tasks; the network therefore represents experience in a spectrum-like way. When training on a new task, the PRR network can provide different levels of experience to accelerate learning. We experiment with the PRR network on a set of grid-world navigation tasks, locomotion tasks, and fighting tasks in a video game. The results show that the PRR network leads to better reuse of experience and thus outperforms some state-of-the-art approaches.
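The spectrum-like representation can be pictured as a stack of per-level modules whose contributions are summed along a task's path through the hierarchy, with coarser levels shared by more tasks. A toy sketch with linear modules (the paper uses learned network modules; everything below is an illustrative simplification):

```python
def prr_logits(level_modules, features):
    """Sum residual contributions from one module per level.
    level_modules: for each level, a weight matrix (one row per action)
    taken from that level's module covering the current task.
    features: the state feature vector."""
    n_actions = len(level_modules[0])
    logits = [0.0] * n_actions
    for weights in level_modules:
        for a in range(n_actions):
            logits[a] += sum(w * f for w, f in zip(weights[a], features))
    return logits
```

A new task can then keep the coarse shared levels frozen and learn only its task-specific residual, which is where the sample-efficiency gain comes from.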

