Medium and Long-Term Stochastic Optimization of Hybrid Pumped Storage Reservoir via Reinforcement Learning Method

Author(s):
Daniel Eliote Mbanze
Li Wenwu
Zhang Xueying

2021
Author(s):
Gabrielle Dallaire
Richard Arsenault
Pascal Côté
Kenjy Demeester

Hydropower is a renewable source of energy that relies on efficient water planning and management. Because the behavior of this natural resource is difficult to predict, water managers use methods to support the decision-making process. Reinforcement Learning (RL) has been shown to be a potentially effective approach to overcome the limitations of the Stochastic Dynamic Programming (SDP) method that is commonly used for water management. However, converging to a robust and efficient operating policy with RL methods requires large amounts of data, while long-term historical data are not always available. The objective of this study is to use tools that generate long-term hydrological series to obtain an efficient parameterization of the management policy. This presentation introduces a comparison of calibration datasets used in an RL method for the optimal control of a hydropower system. The method aims to find a feedback policy that maximizes the production of a hydropower system over a mid-term horizon. Three streamflow datasets are compared on a real hydropower system for RL calibration: 1) the historical streamflow record (35 years), 2) streamflow simulated by a hydrological model driven by high-resolution large-ensemble climate model data (3500 years) from the ClimEx project, and 3) streamflow simulated by a hydrological model driven by climate data generated with a stochastic weather generator (5000 years). The GR4J hydrological model is employed for the hydrologic modelling aspect of the work. The reinforcement learning method is applied to the Lac-Saint-Jean water resources system in Quebec (Canada), where the hydrological regime is snowmelt-dominated. A bootstrapping method, in which multiple calibration and validation sets are resampled, is used to conduct a robust statistical analysis comparing the methods' performance. The performance of the calibrated management policy is evaluated with respect to the operational constraints of the system as well as the overall energy production. Preliminary results show that it is possible to achieve effective management policies by using tools that generate long-term hydrological series to feed an RL method.
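To make the dataset-comparison workflow concrete, the sketch below is a minimal, self-contained Python illustration, not the authors' code: synthetic lognormal inflows stand in for the historical, ClimEx, and weather-generator streamflow series; a three-parameter linear feedback rule stands in for the management policy; and a simple cross-entropy search stands in for the RL algorithm (simulate, calibrate, and the head-dependent energy proxy are all hypothetical names). It calibrates one policy per dataset and then compares the policies on bootstrap-resampled validation episodes:

```python
# Hypothetical sketch: comparing calibration datasets of different lengths
# for a parameterized reservoir release policy. Synthetic inflows only.
import numpy as np

rng = np.random.default_rng(0)

def simulate(policy, inflows, s_max=100.0, s0=50.0):
    """Roll the feedback policy over an inflow series; return an energy proxy."""
    a, b, c = policy
    s, energy = s0, 0.0
    for q in inflows:
        release = float(np.clip(a * s + b * q + c, 0.0, s))  # feedback rule
        s = float(np.clip(s - release + q, 0.0, s_max))      # mass balance, spill
        energy += release * (1.0 + s / s_max)                # head-dependent proxy
    return energy

def calibrate(inflows, iters=60, pop=64, horizon=52):
    """Cross-entropy search; each iteration scores candidates on one random
    'year' (episode) sampled from the calibration dataset."""
    mu, sigma = np.zeros(3), np.ones(3)
    for _ in range(iters):
        start = rng.integers(0, len(inflows) - horizon)
        episode = inflows[start:start + horizon]
        cand = rng.normal(mu, sigma, size=(pop, 3))
        scores = np.array([simulate(p, episode) for p in cand])
        elite = cand[np.argsort(scores)[-pop // 8:]]         # keep the best 1/8
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mu

# Three calibration datasets of very different lengths (weekly inflows).
datasets = {
    "historical (35 y)":    rng.lognormal(1.0, 0.6, 35 * 52),
    "ClimEx-like (3500 y)": rng.lognormal(1.0, 0.6, 3500 * 52),
    "weather gen (5000 y)": rng.lognormal(1.0, 0.6, 5000 * 52),
}

# Bootstrap validation: resample independent test episodes, compare policies.
test_sets = [rng.lognormal(1.0, 0.6, 52) for _ in range(200)]
for name, series in datasets.items():
    policy = calibrate(series)
    scores = [simulate(policy, t) for t in test_sets]
    print(f"{name}: mean energy = {np.mean(scores):.1f}, sd = {np.std(scores):.1f}")
```

The point of the sketch is the experimental design rather than the numbers: longer synthetic series give the search more distinct episodes to sample from, which is exactly what the bootstrap comparison in the study is meant to measure.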


2009
Vol 129 (7)
pp. 1253-1263
Author(s):
Toru Eguchi
Takaaki Sekiai
Akihiro Yamada
Satoru Shimizu
Masayuki Fukai

Author(s):
Gokhan Demirkiran
Ozcan Erdener
Onay Akpinar
Pelin Demirtas
M. Yagiz Arik
...

2021
Vol 11 (3)
pp. 1291
Author(s):
Bonwoo Gu
Yunsick Sung

Gomoku is a two-player board game that originated in ancient China. Gomoku AIs have been developed with various artificial intelligence techniques, such as genetic algorithms and tree search algorithms. Alpha-Gomoku, a Gomoku AI built on AlphaGo's algorithm, enumerates the possible situations on the Gomoku board using Monte-Carlo tree search (MCTS) and minimizes the probability of learning conflicting correct answers for duplicated board situations. However, the accuracy of the tree search algorithm drops because its classification criteria are set manually. In this paper, we propose an improved reinforcement learning-based high-level decision approach using convolutional neural networks (CNN). The proposed algorithm expresses each state as a one-hot-encoded vector and determines the state of the Gomoku board by combining similar one-hot-encoded states. For cases where the stone selected by the CNN has already been placed or cannot be placed, we propose a method for selecting an alternative move. We verify the proposed Gomoku AI in GuPyEngine, a Python-based 3D simulation platform.
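As a rough illustration of the one-hot encoding and the fallback-selection idea, here is a short Python sketch under stated assumptions, not the paper's implementation: the 15x15 board, the plane layout in one_hot, and the random score_moves stand-in for the trained CNN are all hypothetical. It picks the highest-scored legal cell and falls back to the next-best cell when the network's top choice is already occupied:

```python
# Hypothetical sketch: one-hot board encoding plus fallback move selection.
import numpy as np

SIZE = 15  # standard Gomoku board

def one_hot(board):
    """(15,15) ints {0 empty, 1 black, 2 white} -> (3,15,15) one-hot planes,
    the CNN input layout assumed in this sketch."""
    return np.stack([(board == v).astype(np.float32) for v in (0, 1, 2)])

def score_moves(planes):
    """Stand-in for the paper's trained CNN: returns a (15,15) score map.
    Random scores here, just enough to exercise the selection logic."""
    rng = np.random.default_rng(0)
    return rng.random((SIZE, SIZE))

def select_move(board):
    """Take the highest-scored *legal* cell; if the network's top cell is
    already occupied (or otherwise unplayable), fall back to the next-best."""
    scores = score_moves(one_hot(board))
    for idx in np.argsort(scores, axis=None)[::-1]:  # best score first
        r, c = divmod(int(idx), SIZE)
        if board[r, c] == 0:                         # empty -> playable
            return r, c
    raise RuntimeError("no legal move: board is full")

board = np.zeros((SIZE, SIZE), dtype=int)
board[7, 7] = 1                                      # black's opening stone
print("white plays:", select_move(board))
```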


2021
Vol 11 (1)
Author(s):
A. Gorin
V. Klucharev
A. Ossadtchi
I. Zubarev
V. Moiseeva
...

People often change their beliefs by succumbing to the opinions of others. Such changes are often referred to as effects of social influence. While some previous studies have focused on the reinforcement learning mechanisms of social influence or on its internalization, others have reported evidence of changes in sensory processing evoked by the social influence of peer groups. In this study, we used magnetoencephalographic (MEG) source imaging to further investigate the long-term effects of agreement and disagreement with a peer group. The study was composed of two sessions. During the first session, participants rated the trustworthiness of faces and subsequently learned the group rating of each face. In the first session, a neural marker of an immediate mismatch between individual and group opinions was found in the posterior cingulate cortex, an area involved in conflict monitoring and reinforcement learning. To identify the neural correlates of the long-lasting effect of the group opinion, we analysed MEG activity while participants rated faces during the second session. We found MEG traces of past disagreement or agreement with the peers in the parietal cortices 230 ms after face onset. The neural activity of the superior parietal lobule, intraparietal sulcus, and precuneus was significantly stronger when the participant's rating had previously differed from the ratings of the peers. The early MEG correlates of disagreement with the majority were followed by activity in the orbitofrontal cortex 320 ms after face onset. Altogether, the results reveal the temporal dynamics of the neural mechanism underlying the long-term effects of disagreement with a peer group: early signatures of modified face processing were followed by later markers of long-term social influence on the valuation process in the ventromedial prefrontal cortex.

