reward mechanism
Recently Published Documents



2022 ◽  
pp. 1-20
Author(s):  
D. Xu ◽  
G. Chen

In this paper, we explore Multi-Agent Reinforcement Learning (MARL) methods for unmanned aerial vehicle (UAV) clusters. Current UAV clusters are still at the program-control stage, and fully autonomous, intelligent cooperative combat has not yet been realised. To enable a UAV cluster to plan autonomously in a changing environment and to cooperate in completing the combat goal, we propose a new MARL framework. It adopts centralised training with decentralised execution and uses an Actor-Critic network to select actions and then evaluate them. The new algorithm makes three key improvements to the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. The first improves the learning framework, making the calculated Q value more accurate. The second adds a collision avoidance setting, which increases the operational safety factor. The third adjusts the reward mechanism, which effectively improves the cluster’s cooperative ability. The improved MADDPG algorithm is then tested on two conventional combat missions. The simulation results show that learning efficiency is clearly improved and that the operational safety factor is further increased compared with the previous algorithm.
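
The abstract above does not give the adjusted reward itself. As a rough illustration only, one common way to combine a cooperative team term with a collision avoidance penalty might look like the sketch below. The function name `shaped_rewards`, the parameters `safe_dist` and `collision_penalty`, and the distance-based team term are all assumptions made for this example, not the authors' formulation.

```python
import math


def shaped_rewards(positions, target, safe_dist=1.0, collision_penalty=10.0):
    """Return one reward per agent (hypothetical shaping, not from the paper).

    Each agent receives a shared team term minus a penalty for every
    other agent that is closer than `safe_dist`.
    """
    # Shared cooperative term: negative mean distance from the agents to the target.
    team_term = -sum(math.dist(p, target) for p in positions) / len(positions)

    rewards = []
    for i, p in enumerate(positions):
        # Count the other agents inside this agent's safety radius.
        near = sum(
            1
            for j, q in enumerate(positions)
            if j != i and math.dist(p, q) < safe_dist
        )
        rewards.append(team_term - collision_penalty * near)
    return rewards


rewards = shaped_rewards([(0.0, 0.0), (0.0, 0.5), (3.0, 4.0)], target=(0.0, 0.0))
```

In this example the first two agents are within the safety radius of each other, so both are penalised, while the third agent receives only the shared team term.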


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Jingjing Jiang ◽  
Aobo Lyu

This study aims to solve the credit problems in the commodity and currency circulation links of the supply chain from the perspective of the ledger, using a game-model approach. The research first reviews the relationship between distributed ledger technology and the essential functions of currency. Then, by constructing two-agent single-period and multi-period game models across the entire supply chain, the researchers analyse the incentive mechanism and equilibrium solution for the distributed nodes of a Central Bank Digital Currency (CBDC). The results include the incentive mechanism and the optimization of distributed nodes based on licensed distributed ledger technology, which is an important issue that a CBDC faces when performing currency functions. The implications mainly concern the limitations of the public chain's underlying technology and its reward mechanism in supply chain management, and they support the rationality of a CBDC issuance mechanism based on state-owned commercial banks, providing a reference for CBDC practice. The research not only serves the decision-making departments responsible for CBDC issuance but also offers ideas on the operation mode of digital currency for the field of digital currency research.


2021 ◽  
Vol 20 (6) ◽  
pp. 1-33
Author(s):  
Kaustabha Ray ◽  
Ansuman Banerjee

Multi-Access Edge Computing (MEC) has emerged as a promising new paradigm allowing low latency access to services deployed on edge servers, averting the network latencies often encountered in accessing cloud services. A key component of the MEC environment is an auto-scaling policy, which decides the overall management and scaling of container instances corresponding to individual services deployed on MEC servers to cater to traffic fluctuations. In this work, we propose a Safe Reinforcement Learning (RL)-based auto-scaling policy agent that can efficiently adapt to traffic variations to ensure adherence to service-specific latency requirements. We model the MEC environment using a Markov Decision Process (MDP). We demonstrate how latency requirements can be formally expressed in Linear Temporal Logic (LTL). The LTL specification acts as a guide for the policy agent to automatically learn auto-scaling decisions that maximize the probability of satisfying the LTL formula. We introduce a quantitative reward mechanism based on the LTL formula to tailor service-specific latency requirements. We prove that our reward mechanism ensures convergence of standard Safe-RL approaches. We present experimental results in practical scenarios on a test-bed setup with real-world benchmark applications to show the effectiveness of our approach in comparison to other state-of-the-art methods in the literature. Furthermore, we perform extensive simulated experiments to demonstrate the effectiveness of our approach in large scale scenarios.
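
The abstract does not spell out the reward construction, so the following is only a minimal sketch of the general idea of turning a latency requirement into a graded signal. It assumes a simple safety property of the form "latency always stays within a bound" and scores each step by its normalised margin. The names `latency_reward` and `always_within_bound` and the clipping to [-1, 1] are hypothetical choices for this example, not the authors' mechanism.

```python
def latency_reward(latency_ms, bound_ms):
    """Graded per-step reward for the requirement 'latency <= bound'.

    Positive when the requirement holds, negative when it is violated,
    clipped to [-1, 1]. Hypothetical shaping, not taken from the paper.
    """
    margin = (bound_ms - latency_ms) / bound_ms
    return max(-1.0, min(1.0, margin))


def always_within_bound(trace_ms, bound_ms):
    """Boolean check of 'always (latency <= bound)' on a finite trace."""
    return all(latency <= bound_ms for latency in trace_ms)


trace = [40.0, 55.0, 120.0]
step_rewards = [latency_reward(latency, 100.0) for latency in trace]
satisfied = always_within_bound(trace, 100.0)
```

The Boolean check only reports whether the requirement held over the whole trace, whereas the graded reward also tells the agent how close each step came to violating it.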


2021 ◽  
Vol 2021 (1) ◽  
Author(s):  
Yu Su ◽  
Shuijie Wang ◽  
Qianqian Cheng ◽  
Yuhe Qiu

For video streaming services over wireless networks, improving the quality of experience (QoE) has always been a challenging task. Especially since the arrival of the 5G era, more attention has been paid to analyzing the experience quality of video streaming in more complex network scenarios (such as 5G-powered drone video transmission). An insufficient buffer during video stream transmission causes playback to freeze [1]. To cope with this defect, this paper proposes a buffer starvation evaluation model based on deep learning and a video stream scheduling model based on reinforcement learning. The approach uses machine learning to extract the correlation between the buffer starvation probability distribution and the traffic load, thereby obtaining explicit evaluation results for buffer starvation events and a series of resource allocation strategies that optimize long-term QoE. To deal with the noise caused by the random environment, the model introduces an internal reward mechanism in the scheduling process so that the agent can fully explore the environment. Experiments show that our framework can effectively evaluate and improve video service quality for 5G-powered UAVs.
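
The internal reward mechanism is not specified in the abstract. One widely used family of intrinsic rewards adds a bonus that shrinks as a state is visited more often, which encourages exploration. The sketch below illustrates that idea under stated assumptions: the class name `CountBonus`, the `scale` parameter, and the inverse-square-root decay are choices made for this example and are not claimed to be the authors' design.

```python
import math
from collections import Counter


class CountBonus:
    """Adds a visit-count exploration bonus to an extrinsic reward.

    Hypothetical illustration of an intrinsic reward; states must be hashable.
    """

    def __init__(self, scale=1.0):
        self.scale = scale
        self.counts = Counter()

    def reward(self, state, extrinsic):
        # Record the visit, then add a bonus that decays as 1 / sqrt(visits).
        self.counts[state] += 1
        return extrinsic + self.scale / math.sqrt(self.counts[state])


bonus = CountBonus(scale=1.0)
first_visit = bonus.reward("buffer_low", extrinsic=0.0)
second_visit = bonus.reward("buffer_low", extrinsic=0.0)
```

Here the second visit to the same state earns a smaller bonus than the first, so rarely seen states remain comparatively attractive to the agent.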


Author(s):  
Yongbiao Gao ◽  
Ning Xu ◽  
Xin Geng

Reinforcement learning, which maps perceived state representations to actions, has been adopted to solve the video summarization problem. The reward is crucial for dealing with the video summarization task via reinforcement learning, since the reward signal defines the goal of video summarization. However, existing reward mechanisms in reinforcement learning cannot handle the ambiguity that appears frequently in video summarization, i.e., the differing perceptions that different people have of the same video. To solve this problem, in this paper label distributions are mapped from the CNN and LSTM-based state representation to capture the subjectiveness of video summaries. The dual-reward is designed by measuring the similarity between user score distributions and the generated label distributions. Not only the average score but also the variance of the subjective opinions is considered in summary generation. Experimental results on several benchmark datasets show that our proposed method outperforms other approaches under various settings.
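
The abstract states that both the average and the variance of user opinions enter the reward but does not give the similarity measure. As a hedged sketch, the example below compares the mean and variance of raw user scores with those of a predicted label distribution, using negative absolute differences. The function name `dual_reward`, the argument layout, and the choice of absolute difference are assumptions for illustration only.

```python
from statistics import fmean, pvariance


def dual_reward(user_scores, predicted_dist, levels):
    """Return (mean_reward, variance_reward) for one frame or segment.

    user_scores:    raw scores given by different annotators
    predicted_dist: predicted probability for each score level
    levels:         the score value of each level, same length as predicted_dist
    Hypothetical similarity measure, not taken from the paper.
    """
    # Mean and variance implied by the predicted label distribution.
    pred_mean = sum(p * s for p, s in zip(predicted_dist, levels))
    pred_var = sum(p * (s - pred_mean) ** 2 for p, s in zip(predicted_dist, levels))

    # Both rewards are at most 0 and reach 0 on a perfect match.
    mean_reward = -abs(fmean(user_scores) - pred_mean)
    var_reward = -abs(pvariance(user_scores) - pred_var)
    return mean_reward, var_reward


agree = dual_reward([1, 3], predicted_dist=[0.5, 0.0, 0.5], levels=[1, 2, 3])
overconfident = dual_reward([1, 3], predicted_dist=[0.0, 1.0, 0.0], levels=[1, 2, 3])
```

The second call matches the average user score but ignores the disagreement between annotators, so only its variance term is penalised.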


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5146
Author(s):  
Masoud Kamali ◽  
Mohammad Reza Malek ◽  
Sara Saeedi ◽  
Steve Liang

Due to the increasing relevance of spatial information in different aspects of location-based services, various methods are used to collect this information. Crowdsourcing, owing to its plurality and distribution, is a remarkable strategy for collecting information, especially spatial information. Crowdsourcing can have a substantial effect on increasing the accuracy of data. However, many centralized crowdsourcing systems lack security and transparency because they rely on a trusted party. With the emergence of blockchain technology, there has been an increase in security, transparency, and traceability in spatial crowdsourcing systems. In this paper, we propose a blockchain-based spatial crowdsourcing system in which workers confirm or reject the accuracy of tasks. Tasks are reports submitted by requesters to the system; a report comprises a type and a location. To the best of our knowledge, the proposed system is the first in which all participants receive rewards. The system considers spatial and non-spatial reward factors to encourage users’ participation in collecting accurate spatial information. Privacy preservation and security of spatial information are considered in the system. We also evaluated the system's efficiency. According to the experimental results, using the proposed system, information accuracy increased by 40%, and the minimum time for reviewing reports by facilities was reduced by 30%. Moreover, we compared the proposed system with current centralized and distributed crowdsourcing systems. This comparison shows that, although our proposed system omits the user’s history to preserve privacy, it uses a consensus-based approach to guarantee the accuracy of submitted reports. The proposed system also has a reward mechanism to encourage more participation.


2021 ◽  
pp. 102421
Author(s):  
Enrique Fatas ◽  
Daniele Nosenzo ◽  
Martin Sefton ◽  
Daniel John Zizzo

2021 ◽  
Vol 2 (XXI) ◽  
pp. 219-229
Author(s):  
Jakub Zieliński

This article focuses on Random Reward Mechanisms, in particular the loot box mechanism, as implemented in modern video games. In an era of rapid technological progress, the border between gambling and gaming has blurred as never before. More and more countries have concluded that certain features of video games are problematically similar to gambling as classically understood. The report titled “Loot boxes in online games and their effect on consumers, in particular young consumers”, provided by the Policy Department for Economic, Scientific and Quality of Life Policies at the request of the Committee on Internal Market and Consumer Protection, accurately defines the problem as concerning both gambling regulations and consumer protection laws. The report constitutes an excellent starting point for further discussion about loot boxes in the EU and in Poland. The article provides a detailed analysis of the issue, beginning with a definition of the term “loot box”, followed by scrutiny of both domestic and Union regulations. Relying on the loot box distinction provided by R. K. L. Nielsen and P. Grabarczyk, the author analyses the regulations in order to present different approaches to the issue and their consequences. In the final part of the article, the author comments on Polish regulations and tries to predict the future development of the law concerning loot boxes. The paper refers to multiple reports, laws and rulings regarding the matter discussed.


2021 ◽  
Vol 11 (11) ◽  
pp. 5154
Author(s):  
Yeonggwang Kim ◽  
Jaehyung Park ◽  
Jinyoung Kim ◽  
Junchurl Yoon ◽  
Sangjoon Lee ◽  
...  

As resource management systems continue to grow, resource distribution systems are expected to expand steadily. A demand response system enables producers to reduce an enterprise's consumption costs during fluctuating periods in order to balance the supply grid, and to resell the product's remaining resources to generate revenue. Q-learning, a reinforcement learning algorithm based on a resource distribution compensation mechanism, is used to make optimal decisions when scheduling the operation of smart factory appliances. In this paper, we propose an effective resource management system for enterprise demand response using a Quad Q Network algorithm. The proposed algorithm is based on a Deep Q Network algorithm that directly integrates supply-demand inputs into the control logic and employs fuzzy inference as a reward mechanism. In addition to using the Compare Optimizer method to reduce the loss value of the proposed Q Network algorithm, Quad Q Network maintains high accuracy with fewer epochs. The proposed algorithm was applied to market capitalization data obtained from Google and Apple. We also verified that the Compare Optimizer used in Quad Q Network derives the minimum loss value through the double operation of the Double Q value.
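
The abstract does not describe the Quad Q Network or the Compare Optimizer in enough detail to reproduce them. For orientation only, the sketch below shows the standard Double Q-learning target that the phrase "Double Q value" most likely builds on: one value estimate selects the next action and a second estimate evaluates it. The function name `double_q_target` and its argument names are chosen for this example, and the code is not the authors' algorithm.

```python
def double_q_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Standard Double Q-learning target for a single transition.

    next_q_online: action values for the next state from the online estimator
    next_q_target: action values for the next state from the target estimator
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Select the action with the online estimate...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ...but evaluate it with the target estimate.
    return reward + gamma * next_q_target[best_action]


target = double_q_target(1.0, next_q_online=[0.2, 0.9], next_q_target=[0.5, 0.1], gamma=0.5)
```

Decoupling action selection from action evaluation in this way is intended to reduce the overestimation that arises when a single estimate is used for both.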

