Novel Energy Trading System Based on Deep-Reinforcement Learning in Microgrids

Energies ◽  
2021 ◽  
Vol 14 (17) ◽  
pp. 5515
Author(s):  
Seongwoo Lee ◽  
Joonho Seon ◽  
Chanuk Kyeong ◽  
Soohyun Kim ◽  
Youngghyu Sun ◽  
...  

Inefficiencies in the energy trading systems of microgrids are mainly caused by uncertainty in non-stationary operating environments. The problem of uncertainty can be mitigated by analyzing patterns of primary operation parameters and their corresponding actions. In this paper, a novel energy trading system based on a double deep Q-network (DDQN) algorithm and a double Kelly strategy is proposed for improving profits while reducing dependence on the main grid in microgrid systems. The DDQN algorithm is used to select optimized actions for improving energy transactions. Additionally, the double Kelly strategy is employed to control the microgrid’s energy trading quantity for producing long-term profits. The simulation results confirm that the proposed strategies can achieve a significant improvement in total profits and in independence from the main grid via optimized energy transactions.
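
As a rough illustration of the DDQN idea named in the abstract above, the following minimal sketch computes double Q-learning targets: the online network chooses the next action and the target network evaluates it. The function name, the array layout and the discount factor are assumptions made for illustration; the abstract does not describe the authors' network, state or reward design.

```python
import numpy as np

def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN targets (illustrative; names and shapes are assumptions).

    rewards:        shape (batch,)
    next_q_online:  shape (batch, n_actions), online-network Q-values at s'
    next_q_target:  shape (batch, n_actions), target-network Q-values at s'
    dones:          shape (batch,), 1 where the episode ended, else 0
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    next_q_online = np.asarray(next_q_online, dtype=float)
    next_q_target = np.asarray(next_q_target, dtype=float)

    # Action selection uses the online network ...
    best_actions = np.argmax(next_q_online, axis=1)
    # ... while action evaluation uses the target network.
    evaluated = next_q_target[np.arange(len(best_actions)), best_actions]

    # Terminal transitions contribute no bootstrapped value.
    return rewards + gamma * (1.0 - dones) * evaluated
```

Separating selection from evaluation is what distinguishes this from a plain DQN target, where the same network does both.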

2019 ◽  
Vol 17 (1) ◽  
pp. 80
Author(s):  
Leandro Maciel ◽  
Rosangela Ballini

Stock exchange automation, characterized by the replacement of floor trading systems by electronic trading systems, is one of the main restructuring processes observed in global capital markets in recent decades. This paper investigates the effects of automation in the São Paulo Stock Exchange (B3), which adopted an electronic trading system in October 2005. Empirical analysis of the Bovespa index rejects the random walk hypothesis for the periods before and after B3 automation, and provides evidence of distinct volatility regimes. After automation, there is an increase in the linear dependence of IBovespa returns, suggesting a negative effect of automation on the Brazilian stock market’s efficiency. On the other hand, in the same period, there is evidence for a reduction in the long-term persistence of conditional volatility, in response to shocks to returns.


2017 ◽  
Vol 11 (3) ◽  
pp. 322-334 ◽  
Author(s):  
Se-Chang Oh ◽  
Min-Soo Kim ◽  
Yoon Park ◽  
Gyu-Tak Roh ◽  
Chin-Woo Lee

Purpose: The centralized processes of today’s power trading systems are complex and pose a risk of price tampering and hacking. The decentralized and immutable nature of blockchain technology, which has recently drawn attention, offers the potential to improve this power trading process. The purpose of this study is to implement a system that applies blockchain technology to the problem of power trading.
Design/methodology/approach: The authors modeled the power trading problem as the interaction between admin, producer and consumer nodes. A power trading scenario was created for this model using Multichain, a blockchain platform that is both fast and highly scalable. To verify this scenario, they implemented a trading system using Savoir, a Python-based JSON-RPC module.
Findings: Experimental results show that all processes, such as blockchain creation, node connectivity, asset issuance and exchange transactions, were handled correctly according to the scenario.
Originality/value: In this study, the authors have proposed and implemented a power trading method that determines price according to the pure market principle and cannot be manipulated or hacked. It relies on the decentralized, tamper-resistant nature of blockchain technology.


Author(s):  
Seiya Kuroda ◽  
Kazuteru Miyazaki ◽  
Hiroaki Kobayashi ◽  
...  

During a long-term reinforcement learning task, the efficiency of learning is heavily degraded because the probabilistic actions of an agent often cause the task to fail, which makes it difficult to reach the goal and receive a reward. To address this problem, a fixed mode state is proposed in this paper. If the agent acquires an adequate reward, the normal state is switched to a fixed mode state. In this mode, the agent selects an action using a greedy strategy, i.e., it selects the highest-weight action deterministically. First, this paper combines Online Profit Sharing reinforcement learning with the Penalty Avoiding Rational Policy Making algorithm, then introduces fixed mode states into it. The target task is then formulated, i.e., learning a modified waist trajectory for a dynamically stable walking task, based on the statically stable walking of a biped robot. Finally, we present our simulation results and discuss the effectiveness of the proposed method.
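
The switch between probabilistic and greedy action selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: weight-proportional (roulette) sampling in the normal state, a single hypothetical `reward_threshold` standing in for an "adequate reward", and the function names themselves are not taken from the paper.

```python
import random

def select_action(weights, fixed_mode, rng=random):
    """Return an action index given non-negative action weights."""
    actions = range(len(weights))
    if fixed_mode:
        # Fixed mode: deterministically take the highest-weight action.
        return max(actions, key=lambda a: weights[a])
    # Normal state (assumed): sample in proportion to the weights.
    return rng.choices(actions, weights=weights, k=1)[0]

def update_mode(fixed_mode, reward, reward_threshold):
    """Enter the fixed mode once the reward is judged adequate.

    reward_threshold is a hypothetical parameter; once the fixed mode
    is entered, this sketch never leaves it.
    """
    return fixed_mode or reward >= reward_threshold
```

For example, `select_action([0.1, 0.7, 0.2], fixed_mode=True)` always picks index 1, while the same call with `fixed_mode=False` may return any of the three indices.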


2019 ◽  
Vol 260 ◽  
pp. 01003
Author(s):  
Sang Hyeon Lee ◽  
Myeong-in Choi ◽  
SangHoon Lee ◽  
SoungHoan Park ◽  
Sehyun Park

As small-scale distributed energy gradually expands, peer-to-peer (P2P) energy trading, in which individuals freely exchange energy, is being commercialized in various countries, and microgrids (MGs) are considered to be an optimal platform for P2P energy trading. Although electricity trading among individuals without going through power companies is still in its infancy, it is expected to expand gradually as awareness of the sharing economy grows and MGs spread. Research on more efficient trading systems for MGs is therefore needed. We propose an energy trading system that minimizes the loss, which is proportional to the distance of the power line, incurred when energy is traded in the MG. We constructed a virtual MG environment and experimented with energy trading scenarios. When the algorithm is applied, the distance-proportional loss is reduced by 2.495% and energy trading becomes more active: the amount of energy traded and the number of trades increased by 1.5 times.
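
To make the notion of distance-proportional loss concrete, here is a minimal sketch that pairs sellers with buyers over the shortest lines first. The linear loss model, the `loss_per_km` parameter and the greedy ordering are assumptions for illustration only; the abstract does not specify the authors' algorithm, and a greedy pairing is not guaranteed to minimize total loss in general.

```python
def match_trades(sellers, buyers, distance, loss_per_km=0.001):
    """Greedily pair sellers and buyers, shortest line first.

    sellers / buyers: dict of name -> energy offered / requested (kWh).
    distance: dict of (seller, buyer) -> line length (km).
    Returns a list of (seller, buyer, energy_sent, energy_lost).
    """
    supply = dict(sellers)   # copies, so the inputs are not mutated
    demand = dict(buyers)
    trades = []
    for (s, b), km in sorted(distance.items(), key=lambda item: item[1]):
        amount = min(supply.get(s, 0.0), demand.get(b, 0.0))
        if amount <= 0:
            continue
        # Assumed loss model: proportional to energy sent and line length.
        lost = amount * loss_per_km * km
        trades.append((s, b, amount, lost))
        supply[s] -= amount
        demand[b] -= amount
    return trades
```

With one seller offering 5 kWh and two buyers at 1 km and 2 km, the nearer buyer is served first and only the remainder travels the longer line.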


Sensors ◽  
2021 ◽  
Vol 21 (12) ◽  
pp. 4237
Author(s):  
Hoon Ko ◽  
Kwangcheol Rim ◽  
Isabel Praça

The biggest problem with conventional feature-based anomaly signal detection is that it is difficult to use in real time and requires processing of network signals. Furthermore, analyzing network signals in real time requires vast amounts of processing for each signal, as each protocol contains various pieces of information. This paper proposes anomaly detection based on analyzing the relationship of each feature to the anomaly detection model. The model analyzes the anomaly of network signals based on anomaly feature detection. The selected features for anomaly detection do not require constant network signal updates or real-time processing of these signals. When the selected features are found in a received signal, the signal is registered as a potential anomaly signal and is then steadily monitored until it is determined to be either an anomaly or a normal signal. In terms of results, the model determined anomalies with 99.7% (0.997) accuracy in case f(4)(S0), and in case f(4)(REJ) it received 11,233 signals with a normal or 171 anomaly judgment accuracy of 98.7% (0.987).


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
A. Gorin ◽  
V. Klucharev ◽  
A. Ossadtchi ◽  
I. Zubarev ◽  
V. Moiseeva ◽  
...  

People often change their beliefs by succumbing to an opinion of others. Such changes are often referred to as effects of social influence. While some previous studies have focused on the reinforcement learning mechanisms of social influence or on its internalization, others have reported evidence of changes in sensory processing evoked by social influence of peer groups. In this study, we used magnetoencephalographic (MEG) source imaging to further investigate the long-term effects of agreement and disagreement with the peer group. The study was composed of two sessions. During the first session, participants rated the trustworthiness of faces and subsequently learned group rating of each face. In the first session, a neural marker of an immediate mismatch between individual and group opinions was found in the posterior cingulate cortex, an area involved in conflict-monitoring and reinforcement learning. To identify the neural correlates of the long-lasting effect of the group opinion, we analysed MEG activity while participants rated faces during the second session. We found MEG traces of past disagreement or agreement with the peers at the parietal cortices 230 ms after the face onset. The neural activity of the superior parietal lobule, intraparietal sulcus, and precuneus was significantly stronger when the participant’s rating had previously differed from the ratings of the peers. The early MEG correlates of disagreement with the majority were followed by activity in the orbitofrontal cortex 320 ms after the face onset. Altogether, the results reveal the temporal dynamics of the neural mechanism of long-term effects of disagreement with the peer group: early signatures of modified face processing were followed by later markers of long-term social influence on the valuation process at the ventromedial prefrontal cortex.

