Medium and Long-Term Stochastic Optimization of Hybrid Pumped Storage Reservoir via Reinforcement Learning Method

Author(s):
Daniel Eliote Mbanze
Li Wenwu
Zhang Xueying

2021
Author(s):
Gabrielle Dallaire
Richard Arsenault
Pascal Côté
Kenjy Demeester

Hydropower is a renewable source of energy that relies on efficient water planning and management. Because the behavior of this natural resource is difficult to predict, water managers use methods to support the decision-making process. Reinforcement Learning (RL) has been shown to be a potentially effective approach to overcome the limitations of the Stochastic Dynamic Programming (SDP) method that is commonly used for water management. However, converging to a robust and efficient operating policy with RL methods requires large amounts of data, while long-term historical data are not always available. The objective of this study is to use tools that generate long-term hydrological series to obtain an efficient parameterization of the management policy. This presentation introduces a comparison of calibration datasets used in an RL method for the optimal control of a hydropower system. The method aims to find a feedback policy that maximizes the production of a hydropower system over a mid-term horizon. Three streamflow datasets are compared on a real hydropower system for RL calibration: 1) the historical streamflow record (35 years), 2) streamflow simulated by a hydrological model driven by high-resolution large-ensemble climate model data (3500 years) from the ClimEx project, and 3) streamflow simulated by a hydrological model driven by climate data generated with a stochastic weather generator (5000 years). The GR4J hydrological model is employed for the hydrologic modelling aspect of the work. The reinforcement learning method is applied to the Lac-Saint-Jean water resources system in Quebec (Canada), where the hydrological regime is snowmelt-dominated. A bootstrapping method, in which multiple calibration and validation sets are resampled, is used to conduct a robust statistical analysis comparing the methods' performance. The performance of the calibrated management policy is evaluated with respect to the operational constraints of the system as well as the overall energy production. Preliminary results show that it is possible to achieve effective management policies by using tools that generate long-term hydrological series to feed an RL method.
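To make the dataset-comparison workflow concrete, the sketch below is a minimal, self-contained Python illustration, not the authors' code: synthetic lognormal inflows stand in for the historical, ClimEx, and weather-generator streamflow series; a three-parameter linear feedback rule stands in for the management policy; and a simple cross-entropy search stands in for the RL algorithm (simulate, calibrate, and the head-dependent energy proxy are all hypothetical names). It calibrates one policy per dataset and then compares the policies on bootstrap-resampled validation episodes:

```python
# Hypothetical sketch: comparing calibration datasets of different lengths
# for a parameterized reservoir release policy. Synthetic inflows only.
import numpy as np

rng = np.random.default_rng(0)

def simulate(policy, inflows, s_max=100.0, s0=50.0):
    """Roll the feedback policy over an inflow series; return an energy proxy."""
    a, b, c = policy
    s, energy = s0, 0.0
    for q in inflows:
        release = float(np.clip(a * s + b * q + c, 0.0, s))  # feedback rule
        s = float(np.clip(s - release + q, 0.0, s_max))      # mass balance, spill
        energy += release * (1.0 + s / s_max)                # head-dependent proxy
    return energy

def calibrate(inflows, iters=60, pop=64, horizon=52):
    """Cross-entropy search; each iteration scores candidates on one random
    'year' (episode) sampled from the calibration dataset."""
    mu, sigma = np.zeros(3), np.ones(3)
    for _ in range(iters):
        start = rng.integers(0, len(inflows) - horizon)
        episode = inflows[start:start + horizon]
        cand = rng.normal(mu, sigma, size=(pop, 3))
        scores = np.array([simulate(p, episode) for p in cand])
        elite = cand[np.argsort(scores)[-pop // 8:]]         # keep the best 1/8
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mu

# Three calibration datasets of very different lengths (weekly inflows).
datasets = {
    "historical (35 y)":    rng.lognormal(1.0, 0.6, 35 * 52),
    "ClimEx-like (3500 y)": rng.lognormal(1.0, 0.6, 3500 * 52),
    "weather gen (5000 y)": rng.lognormal(1.0, 0.6, 5000 * 52),
}

# Bootstrap validation: resample independent test episodes, compare policies.
test_sets = [rng.lognormal(1.0, 0.6, 52) for _ in range(200)]
for name, series in datasets.items():
    policy = calibrate(series)
    scores = [simulate(policy, t) for t in test_sets]
    print(f"{name}: mean energy = {np.mean(scores):.1f}, sd = {np.std(scores):.1f}")
```

The point of the sketch is the experimental design rather than the numbers: longer synthetic series give the search more distinct episodes to sample from, which is exactly what the bootstrap comparison in the study is meant to measure.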


2009
Vol 129 (7)
pp. 1253-1263
Author(s):
Toru Eguchi
Takaaki Sekiai
Akihiro Yamada
Satoru Shimizu
Masayuki Fukai

Author(s):
Gokhan Demirkiran
Ozcan Erdener
Onay Akpinar
Pelin Demirtas
M. Yagiz Arik
...

2021
Vol 11 (3)
pp. 1291
Author(s):
Bonwoo Gu
Yunsick Sung

Gomoku is a two-player board game that originated in ancient China. Gomoku AIs have been developed with various artificial intelligence techniques, such as genetic algorithms and tree search algorithms. Alpha-Gomoku, a Gomoku AI built on AlphaGo's algorithm, enumerates the possible situations on the Gomoku board using Monte-Carlo tree search (MCTS) and minimizes the probability of learning conflicting correct answers for duplicated board situations. However, the accuracy of the tree search algorithm drops because its classification criteria are set manually. In this paper, we propose an improved reinforcement learning-based high-level decision approach using convolutional neural networks (CNN). The proposed algorithm expresses each state as a one-hot-encoded vector and determines the state of the Gomoku board by combining similar one-hot-encoded states. For cases where the stone selected by the CNN has already been placed or cannot be placed, we propose a method for selecting an alternative move. We verify the proposed Gomoku AI in GuPyEngine, a Python-based 3D simulation platform.
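As a rough illustration of the one-hot encoding and the fallback-selection idea, here is a short Python sketch under stated assumptions, not the paper's implementation: the 15x15 board, the plane layout in one_hot, and the random score_moves stand-in for the trained CNN are all hypothetical. It picks the highest-scored legal cell and falls back to the next-best cell when the network's top choice is already occupied:

```python
# Hypothetical sketch: one-hot board encoding plus fallback move selection.
import numpy as np

SIZE = 15  # standard Gomoku board

def one_hot(board):
    """(15,15) ints {0 empty, 1 black, 2 white} -> (3,15,15) one-hot planes,
    the CNN input layout assumed in this sketch."""
    return np.stack([(board == v).astype(np.float32) for v in (0, 1, 2)])

def score_moves(planes):
    """Stand-in for the paper's trained CNN: returns a (15,15) score map.
    Random scores here, just enough to exercise the selection logic."""
    rng = np.random.default_rng(0)
    return rng.random((SIZE, SIZE))

def select_move(board):
    """Take the highest-scored *legal* cell; if the network's top cell is
    already occupied (or otherwise unplayable), fall back to the next-best."""
    scores = score_moves(one_hot(board))
    for idx in np.argsort(scores, axis=None)[::-1]:  # best score first
        r, c = divmod(int(idx), SIZE)
        if board[r, c] == 0:                         # empty -> playable
            return r, c
    raise RuntimeError("no legal move: board is full")

board = np.zeros((SIZE, SIZE), dtype=int)
board[7, 7] = 1                                      # black's opening stone
print("white plays:", select_move(board))
```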


2021
Vol 11 (1)
Author(s):
A. Gorin
V. Klucharev
A. Ossadtchi
I. Zubarev
V. Moiseeva
...

People often change their beliefs by succumbing to the opinions of others. Such changes are often referred to as effects of social influence. While some previous studies have focused on the reinforcement learning mechanisms of social influence or on its internalization, others have reported evidence of changes in sensory processing evoked by the social influence of peer groups. In this study, we used magnetoencephalographic (MEG) source imaging to further investigate the long-term effects of agreement and disagreement with a peer group. The study was composed of two sessions. During the first session, participants rated the trustworthiness of faces and subsequently learned the group rating of each face. In the first session, a neural marker of an immediate mismatch between individual and group opinions was found in the posterior cingulate cortex, an area involved in conflict monitoring and reinforcement learning. To identify the neural correlates of the long-lasting effect of the group opinion, we analysed MEG activity while participants rated faces during the second session. We found MEG traces of past disagreement or agreement with the peers in the parietal cortices 230 ms after face onset. The neural activity of the superior parietal lobule, intraparietal sulcus, and precuneus was significantly stronger when the participant's rating had previously differed from the ratings of the peers. The early MEG correlates of disagreement with the majority were followed by activity in the orbitofrontal cortex 320 ms after face onset. Altogether, the results reveal the temporal dynamics of the neural mechanism underlying the long-term effects of disagreement with a peer group: early signatures of modified face processing were followed by later markers of long-term social influence on the valuation process in the ventromedial prefrontal cortex.

