A Simulation Environment for Training a Reinforcement Learning Agent Trading a Battery Storage

Battery storage systems are an essential element of the emerging smart grid. Compared to other distributed intelligent energy resources, batteries have the advantage of being able to react rapidly to events such as renewable generation fluctuations or grid disturbances. There is a lack of research on ways to profitably exploit this ability. Any solution needs to consider rapid electrical phenomena as well as the much slower dynamics of the relevant electricity markets. Reinforcement learning is a branch of artificial intelligence that has shown promise in optimizing complex problems involving uncertainty. This article applies reinforcement learning to the problem of trading batteries. The problem involves two timescales, both of which are important for profitability. Firstly, trading the battery capacity must occur on the timescale of the chosen electricity markets. Secondly, the real-time operation of the battery must ensure that no financial penalties are incurred from failing to meet the technical specification. The trading-related decisions must be made under uncertainty, such as unknown future market prices and unpredictable power grid disturbances. In this article, a simulation model of a battery system is proposed as the environment to train a reinforcement learning agent to make such decisions. The system is demonstrated with an application of the battery to Finnish primary frequency reserve markets.
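The two timescales described in the abstract can be illustrated with a minimal sketch. It is not the authors' simulation model: the class name `BatteryReserveEnv`, the one-hour market period, the one-second control step, the random-walk disturbance, and all prices and penalties are assumptions chosen only to show how a slow bidding decision and fast battery operation can share one reward.

```python
import random


class BatteryReserveEnv:
    """Toy two-timescale battery environment (all numbers are illustrative)."""

    def __init__(self, capacity_mwh=1.0, max_power_mw=1.0,
                 penalty_per_mwh=100.0, seed=0):
        self.capacity_mwh = capacity_mwh
        self.max_power_mw = max_power_mw
        self.penalty_per_mwh = penalty_per_mwh  # hypothetical non-delivery fee
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.soc = 0.5         # state of charge as a fraction of capacity
        self.last_price = 0.0  # last cleared price seen by the agent
        return (self.soc, self.last_price)

    def step(self, bid_mw):
        """Slow timescale: one market period (assumed to be one hour)."""
        bid_mw = min(max(bid_mw, 0.0), self.max_power_mw)
        price = self.rng.uniform(5.0, 30.0)  # not known when the bid is placed
        undelivered_mwh = 0.0
        demand = 0.0  # grid disturbance as a fraction of the bid, in [-1, 1]
        for _ in range(3600):  # fast timescale: one-second control steps
            demand = min(max(demand + self.rng.gauss(0.0, 0.05), -1.0), 1.0)
            energy_mwh = bid_mw * demand / 3600.0  # positive means discharge
            new_soc = self.soc - energy_mwh / self.capacity_mwh
            if 0.0 <= new_soc <= 1.0:
                self.soc = new_soc
            else:
                undelivered_mwh += abs(energy_mwh)  # battery full or empty
        # Market income minus the penalty for reserve that was not delivered.
        reward = bid_mw * price - self.penalty_per_mwh * undelivered_mwh
        self.last_price = price
        return (self.soc, price), reward
```

An agent would call `reset()` once and then `step(bid_mw)` once per market period, observing the state of charge and the latest price. A larger bid earns more market income but drains or fills the battery faster, which is the trade-off the abstract describes.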


A Real-time Operation Scheme of Microgrids with Distributed Generation in Electricity Markets

American Journal of Electrical and Electronic Engineering ◽ 10.12691/ajeee-5-5-4 ◽ 2017 ◽ Vol 5 (5) ◽ pp. 189-194

Author(s): Nguyen Minh Y

Keyword(s): Real Time ◽ Distributed Generation ◽ Electricity Markets ◽ Time Operation ◽ Real Time Operation


Real-time operation of distribution network: A deep reinforcement learning-based reconfiguration approach

Sustainable Energy Technologies and Assessments ◽ 10.1016/j.seta.2021.101841 ◽ 2022 ◽ Vol 50 ◽ pp. 101841

Author(s): Van-Hai Bui ◽ Wencong Su

Keyword(s): Reinforcement Learning ◽ Real Time ◽ Distribution Network ◽ Time Operation ◽ Real Time Operation


Changing the Day-Ahead Gate Closure to Wind Power Integration: A Simulation-Based Study

Energies ◽ 10.3390/en12142765 ◽ 2019 ◽ Vol 12 (14) ◽ pp. 2765 ◽ Cited By ~ 3

Author(s): Hugo Algarvio ◽ António Couto ◽ Fernando Lopes ◽ Ana Estanqueiro

Keyword(s): Wind Power ◽ Electricity Markets ◽ Electricity Market ◽ Energy System ◽ Market Outcomes ◽ Time Operation ◽ Wind Power Forecast ◽ Simulation Based ◽ Real Time Operation ◽ The Impact

Currently, in most European electricity markets, power bids are based on forecasts performed 12 to 36 hours ahead. Actual wind power forecast systems still lead to large errors, which may strongly impact electricity market outcomes. Accordingly, this article analyzes the impact of the wind power forecast uncertainty and the change of the day-ahead market gate closure on both the market-clearing prices and the outcomes of the balancing market. To this end, it presents a simulation-based study conducted with the help of an agent-based tool, called MATREM. The results support the following conclusion: a change in the gate closure to a time closer to real-time operation is beneficial to market participants and the energy system generally.


An Evaluation Methodology for Interactive Reinforcement Learning with Simulated Users

Biomimetics ◽ 10.3390/biomimetics6010013 ◽ 2021 ◽ Vol 6 (1) ◽ pp. 13

Author(s): Adam Bignold ◽ Francisco Cruz ◽ Richard Dazeley ◽ Peter Vamplew ◽ Cameron Foale

Keyword(s): Reinforcement Learning ◽ Information Source ◽ Human Interaction ◽ Evaluation Methodology ◽ External Information ◽ Preliminary Evaluation ◽ Learning Agents ◽ Learning Agent ◽ Knowledge Bias ◽ The Impact

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluative assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.
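As a rough illustration of what a simulated user could look like, the sketch below models an advisor with two hypothetical knobs: `accuracy` (how often the advice is correct) and `availability` (how often advice is offered at all). These names, and the use of an oracle policy `optimal_action`, are assumptions made for this example; the abstract does not specify how the simulated users are parameterised.

```python
import random


class SimulatedUser:
    """Hypothetical stand-in for a human advisor in interactive RL."""

    def __init__(self, optimal_action, n_actions,
                 accuracy=0.9, availability=0.5, seed=0):
        self.optimal_action = optimal_action  # callable: state -> best action
        self.n_actions = n_actions
        self.accuracy = accuracy              # probability advice is correct
        self.availability = availability      # probability advice is given
        self.rng = random.Random(seed)

    def advise(self, state):
        """Return an advised action, or None when the user stays silent."""
        if self.rng.random() >= self.availability:
            return None
        best = self.optimal_action(state)
        if self.rng.random() < self.accuracy:
            return best
        # Simulated mistake: pick any action other than the best one.
        wrong = [a for a in range(self.n_actions) if a != best]
        return self.rng.choice(wrong) if wrong else best
```

Sweeping `accuracy` and `availability` across runs is one way to compare how an agent behaves under different types of trainers without recruiting people for every repetition.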


Location- and Person-Independent Activity Recognition with WiFi, Deep Neural Networks, and Reinforcement Learning

ACM Transactions on Internet of Things ◽ 10.1145/3424739 ◽ 2021 ◽ Vol 2 (1) ◽ pp. 1-25

Author(s): Yongsen Ma ◽ Sheheryar Arshad ◽ Swetha Muniraju ◽ Eric Torkildson ◽ Enrico Rantala ◽ ...

Keyword(s): Neural Network ◽ Neural Networks ◽ Reinforcement Learning ◽ Activity Recognition ◽ Deep Neural Networks ◽ State Machine ◽ Recognition Algorithm ◽ The State ◽ Neural Architecture ◽ Learning Agent

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of CSI data. The state machine learns temporal dependency information from the history of classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations/orientations, and multiple persons. The proposed design has 97% average accuracy when the testing devices and persons are not seen during training. The proposed design is also evaluated on two public datasets, with accuracies of 80% and 83%. The proposed design needs very little human effort for ground truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.


Exploring 3D Human Action Recognition Using STACOG on Multi-View Depth Motion Maps Sequences

Sensors ◽ 10.3390/s21113642 ◽ 2021 ◽ Vol 21 (11) ◽ pp. 3642

Author(s): Mohammad Farhad Bulbul ◽ Sadiya Tabussum ◽ Hazrat Ali ◽ Wenli Zheng ◽ Mi Young Lee ◽ ...

Keyword(s): Action Recognition ◽ Depth Map ◽ Human Action Recognition ◽ Human Action ◽ Collaborative Representation ◽ Auto Correlation ◽ Time Operation ◽ Real Time Operation ◽ Benchmark Datasets ◽ Depth Motion Maps

This paper proposes an action recognition framework for depth map sequences using the 3D Space-Time Auto-Correlation of Gradients (STACOG) algorithm. First, each depth map sequence is split into two sets of sub-sequences of two different frame lengths individually. Second, a number of Depth Motion Maps (DMMs) sequences from every set are generated and are fed into STACOG to find an auto-correlation feature vector. For two distinct sets of sub-sequences, two auto-correlation feature vectors are obtained and applied gradually to L2-regularized Collaborative Representation Classifier (L2-CRC) for computing a pair of sets of residual values. Next, the Logarithmic Opinion Pool (LOGP) rule is used to combine the two different outcomes of L2-CRC and to allocate an action label of the depth map sequence. Finally, our proposed framework is evaluated on three benchmark datasets named MSR-action 3D dataset, DHA dataset, and UTD-MHAD dataset. We compare the experimental results of our proposed framework with state-of-the-art approaches to prove the effectiveness of the proposed framework. The computational efficiency of the framework is also analyzed for all the datasets to check whether it is suitable for real-time operation or not.
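The fusion step can be sketched as follows. A logarithmic opinion pool combines per-classifier class probabilities through a weighted sum of their logarithms. The abstract does not say how the L2-CRC residuals are turned into probabilities, so the mapping used here (probability proportional to `exp(-residual)`) and the uniform weights are assumptions, and the function names are hypothetical.

```python
import math


def log_posteriors(residuals):
    """Map per-class residuals to log-probabilities (assumed exp(-r) mapping)."""
    m = min(residuals)
    # Stable log of the normaliser sum(exp(-r)) over all classes.
    log_z = -m + math.log(sum(math.exp(-(r - m)) for r in residuals))
    return [-r - log_z for r in residuals]


def logp_fuse(residual_sets, alphas=None):
    """Fuse several classifiers with a logarithmic opinion pool.

    residual_sets: one list of per-class residuals per classifier.
    alphas: pool weights, uniform if omitted.
    Returns the winning class index and the pooled log scores.
    """
    n = len(residual_sets)
    if alphas is None:
        alphas = [1.0 / n] * n
    n_classes = len(residual_sets[0])
    scores = [0.0] * n_classes
    for alpha, residuals in zip(alphas, residual_sets):
        for k, log_p in enumerate(log_posteriors(residuals)):
            scores[k] += alpha * log_p
    label = max(range(n_classes), key=scores.__getitem__)
    return label, scores


# Two classifiers that disagree: the first prefers class 1, the second class 2.
label, scores = logp_fuse([[0.9, 0.2, 0.7], [0.8, 0.3, 0.1]])
```

In this example the pooled decision is class 1, because its combined residual (0.2 + 0.3) is the smallest of the three classes.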


Crowd Evacuation Guidance Based on Combined Action Reinforcement Learning

Algorithms ◽ 10.3390/a14010026 ◽ 2021 ◽ Vol 14 (1) ◽ pp. 26

Author(s): Yiran Xue ◽ Rui Wu ◽ Jiafeng Liu ◽ Xianglong Tang

Keyword(s): Reinforcement Learning ◽ Guidance System ◽ Force Model ◽ Interactive Simulation ◽ Social Force ◽ Novel Approach ◽ Learning Agent ◽ Network Output ◽ Combined Action ◽ Crowd Evacuation

Existing crowd evacuation guidance systems require the manual design of models and input parameters, incurring a significant workload and a potential for errors. This paper proposes an end-to-end intelligent evacuation guidance method based on deep reinforcement learning, and designs an interactive simulation environment based on the social force model. The agent can automatically learn a scene model and path planning strategy with only scene images as input, and directly output dynamic signage information. To address the "curse of dimensionality" of the deep Q network (DQN) algorithm in crowd evacuation, this paper proposes a combined action-space DQN (CA-DQN) algorithm that groups the Q network output layer nodes according to action dimensions, which significantly reduces the network complexity and improves system practicality in complex scenes. In this paper, the evacuation guidance system is defined as a reinforcement learning agent and implemented by the CA-DQN method, which provides a novel approach to the evacuation guidance problem. The experiments demonstrate that the proposed method is superior to the static guidance method, and on par with the manually designed model method.
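The effect of grouping output nodes by action dimension can be shown with a small sketch. With a joint action space, the output layer needs one node per combination of choices; with grouping, it needs one node per choice per dimension. The abstract does not describe how CA-DQN selects or trains on the grouped values, so taking an independent argmax in each group is an assumption of this example, and the function names are hypothetical.

```python
from math import prod


def output_sizes(dim_sizes):
    """Return (joint, grouped) output-layer sizes for the given action dimensions."""
    return prod(dim_sizes), sum(dim_sizes)


def split_by_dimension(q_values, dim_sizes):
    """Slice a flat output vector into one group of Q-values per action dimension."""
    if len(q_values) != sum(dim_sizes):
        raise ValueError("output length must equal the sum of dimension sizes")
    groups, start = [], 0
    for size in dim_sizes:
        groups.append(q_values[start:start + size])
        start += size
    return groups


def greedy_combined_action(q_values, dim_sizes):
    """Pick the best choice in each group (assumed independent argmax)."""
    groups = split_by_dimension(q_values, dim_sizes)
    return tuple(max(range(len(g)), key=g.__getitem__) for g in groups)


# Example: three signs, each showing one of four directions.
dims = [4, 4, 4]
q = [0.1, 0.9, 0.2, 0.0,  0.5, 0.4, 0.3, 0.8,  0.7, 0.1, 0.2, 0.3]
joint, grouped = output_sizes(dims)       # 64 joint nodes versus 12 grouped
action = greedy_combined_action(q, dims)  # one direction per sign
```

For ten signs with four directions each, the joint layout would need 4^10 output nodes while the grouped layout needs 40, which is the kind of reduction in network complexity the abstract refers to.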


Autonomous reinforcement learning agent for chemical vapor deposition synthesis of quantum materials

npj Computational Materials ◽ 10.1038/s41524-021-00535-3 ◽ 2021 ◽ Vol 7 (1)

Author(s): Pankaj Rajak ◽ Aravind Krishnamoorthy ◽ Ankit Mishra ◽ Rajiv Kalia ◽ Aiichiro Nakano ◽ ...

Keyword(s): Chemical Vapor Deposition ◽ Reinforcement Learning ◽ Vapor Deposition ◽ Chemical Vapor ◽ Time Behavior ◽ Materials Synthesis ◽ Design Synthesis ◽ Learning Agent ◽ Threshold Temperatures ◽ Quantum Materials

Predictive materials synthesis is the primary bottleneck in realizing functional and quantum materials. Strategies for synthesis of promising materials are currently identified by time-consuming trial and error and there are no known predictive schemes to design synthesis parameters for materials. We use offline reinforcement learning (RL) to predict optimal synthesis schedules, i.e., a time-sequence of reaction conditions like temperatures and concentrations, for the synthesis of semiconducting monolayer MoS2 using chemical vapor deposition. The RL agent, trained on 10,000 computational synthesis simulations, learned threshold temperatures and chemical potentials for onset of chemical reactions and predicted previously unknown synthesis schedules that produce well-sulfidized crystalline, phase-pure MoS2. The model can be extended to multi-task objectives such as predicting profiles for synthesis of complex structures including multi-phase heterostructures and can predict long-time behavior of reacting systems, far beyond the domain of molecular dynamics simulations, making these predictions directly relevant to experimental synthesis.


A Confrontation Decision-Making Method with Deep Reinforcement Learning and Knowledge Transfer for Multi-Agent System

Symmetry ◽ 10.3390/sym12040631 ◽ 2020 ◽ Vol 12 (4) ◽ pp. 631

Author(s): Chunyang Hu

Keyword(s): Decision Making ◽ Reinforcement Learning ◽ Knowledge Transfer ◽ Large Scale ◽ Effective Control ◽ Small Scale ◽ Learning Agent ◽ Multi Agent ◽ Transfer Method ◽ Parameter Sharing

In this paper, deep reinforcement learning (DRL) and knowledge transfer are used to achieve effective control of the learning agent in multi-agent confrontation. Firstly, a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm with parameter sharing is proposed for confrontation decision-making in multi-agent systems. During training, information about the other agents is introduced to the critic network to improve the confrontation strategy. The parameter sharing mechanism can reduce the loss of experience storage. In the DDPG algorithm, we use four neural networks to generate the real-time action and the Q-value function, respectively, and use a momentum mechanism to optimize the training process and accelerate the convergence of the neural networks. Secondly, this paper introduces an auxiliary controller that uses a policy-based reinforcement learning (RL) method to assist the decision-making of the game agent. In addition, an effective reward function is used to help agents balance enemy losses against their own. Furthermore, this paper uses the knowledge transfer method to extend the learning model to more complex scenes and improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments are designed to verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent successfully learns to fight its competitors and achieves a good winning rate. For large-scale confrontation scenarios, the knowledge transfer method gradually improves the decision-making level of the learning agent.
