A proposal of privacy preserving reinforcement learning for secure multiparty computation

2017 ◽  
Vol 6 (2) ◽  
pp. 57 ◽  
Author(s):  
Hirofumi Miyajima ◽  
Noritaka Shigei ◽  
Syunki Makino ◽  
Hiromi Miyajima ◽  
Yohtaro Miyanishi ◽  
...  

Many studies have addressed the security of cloud computing. Data encryption is a typical approach, but it incurs high computational complexity for encrypting and decrypting data. Therefore, systems that process distributed data securely have attracted attention, and many studies have been conducted on them. Secure multiparty computation (SMC) is one such method. Specifically, two learning approaches for machine learning (ML) with SMC are known: one divides the learning data into several subsets and learns on each subset; the other divides each item of the learning data into shares and learns on the shares. So far, most work on ML with SMC has targeted supervised and unsupervised learning, such as the BP and k-means methods; there appear to be no studies on reinforcement learning (RL) with SMC. This paper proposes learning methods with SMC for Q-learning, one of the typical methods for RL. The effectiveness of the proposed methods is shown by numerical simulation on the maze problem.
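To make the item-splitting idea concrete, below is a minimal sketch of additive secret sharing, a standard SMC primitive, applied to the linear part of a tabular Q-learning update. The protocol, names, and values are illustrative assumptions, not the authors' actual scheme; in particular, the max over next-state actions is nonlinear and would require an interactive comparison protocol that is not shown.

import numpy as np

def share(secret, m, rng):
    """Split a scalar into m additive shares that sum to the secret."""
    shares = rng.normal(size=m - 1)
    return np.append(shares, secret - shares.sum())

def reconstruct(shares):
    """Recover the secret by summing all shares."""
    return shares.sum()

rng = np.random.default_rng(0)
m = 3                       # number of parties (hypothetical)
alpha, gamma = 0.1, 0.9     # learning rate and discount factor

r_sh  = share(1.0, m, rng)  # observed reward, secret-shared
q_sh  = share(0.5, m, rng)  # Q(s, a), secret-shared
qn_sh = share(0.8, m, rng)  # max_a' Q(s', a'), assumed already resolved
                            # by a separate comparison protocol

# Each party applies the linear Q-update to its own shares; no party
# ever sees the plaintext values.
new_q_sh = q_sh + alpha * (r_sh + gamma * qn_sh - q_sh)
print(reconstruct(new_q_sh))  # 0.5 + 0.1 * (1.0 + 0.9 * 0.8 - 0.5) = 0.622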

Author(s):  
Fumito Uwano ◽  
Keiki Takadama

This study discusses factors important for zero-communication multi-agent cooperation by comparing two modified reinforcement learning methods. The two methods assign different goal selections to the agents in cooperative tasks. The first, Profit Minimizing Reinforcement Learning (PMRL), forces agents to learn how to reach the farthest goal, after which the agent closest to a goal is directed to it. The second, Yielding Action Reinforcement Learning (YARL), forces agents to learn through a Q-learning process; when agents conflict, the agent closest to the contested goal learns to reach the next-closest goal instead. To compare the two methods, we designed experiments that vary the following maze factors: (1) the locations of the start point and goal; (2) the number of agents; and (3) the size of the maze. Intensive simulations on the maze cooperation task revealed that both methods successfully enabled the agents to exhibit cooperative behavior, even as the size of the maze and the number of agents changed. The PMRL mechanism always enables the agents to learn cooperative behavior, whereas the YARL mechanism lets the agents learn cooperative behavior within a small number of learning iterations. In zero-communication multi-agent cooperation, it is important that only the agents involved in a conflict cooperate with each other.
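As an illustration of the yielding rule, the sketch below computes, for two agents whose nearest goal coincides, the assignment YARL is described as converging to: the agent closer to the contested goal yields and retargets its next-closest goal. The grid positions, Manhattan distance, and greedy computation are illustrative assumptions; in the paper this outcome is learned via Q-learning, not computed directly.

def manhattan(p, q):
    """Grid distance, standing in for learned path cost."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def yield_on_conflict(agent_a, agent_b, goals):
    """Resolve a two-agent conflict: the closer agent yields the shared
    nearest goal and takes its next-closest goal instead."""
    g = min(goals, key=lambda x: manhattan(agent_a, x))
    assert g == min(goals, key=lambda x: manhattan(agent_b, x)), "no conflict"
    closer, other = sorted([agent_a, agent_b], key=lambda p: manhattan(p, g))
    alt = min((x for x in goals if x != g), key=lambda x: manhattan(closer, x))
    return {closer: alt, other: g}

print(yield_on_conflict((0, 0), (4, 1), goals=[(1, 1), (5, 5)]))
# {(0, 0): (5, 5), (4, 1): (1, 1)}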


2018 ◽  
Vol 7 (2) ◽  
pp. 26 ◽  
Author(s):  
Hirofumi Miyajima ◽  
Norio Shiratori ◽  
Hiromi Miyajima

The use of cloud computing systems, the basic technology supporting ICT, is expanding. However, as the number of terminals connected to the cloud increases, the limits of its capacity are becoming apparent, and they lead to significant processing delays. As an architecture to improve on this, the edge computing system has been proposed; it is known as a new paradigm complementing the conventional cloud system. In the conventional cloud system, a terminal sends all data to the cloud, and the cloud returns the result to the terminal or to a thing directly connected to it. In the edge system, by contrast, a number of servers called edges are placed between the cloud and the terminal (or thing), connected directly or at close distance. Now consider the case of machine learning, which requires big data. The purpose of learning is to find the relationships (information) lurking in the collected data. To realize this, a system with several parameters is assumed, and the parameters are estimated by repeatedly updating them with learning data. Further, there is the problem of security for the learning data: users of cloud computing cannot escape concerns about the risk of information leakage. How can we build a cloud computing system that avoids such risks? Secure multiparty computation (SMC) is known as one method of realizing safe computation. Many learning methods that take SMC into account have been proposed. Then, what kind of learning method is suitable for edge computing with SMC? In this paper, a learning method suitable for edge computing with SMC is proposed. It is shown using an edge system composed of a client and m servers. The learning data are shared as m subsets across the m servers, learning is performed simultaneously on each server, and the system parameters are updated in the client using the servers' results. The idea of the learning method is illustrated using the BP algorithm for neural networks, and its effectiveness is shown by numerical simulations.
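The client-and-m-servers pattern can be sketched as follows: each server computes an update on its own data subset, and the client aggregates the results into the shared parameters. A one-layer model with plain gradient descent stands in for the paper's BP formulation; the partitioning, averaging, and all values are assumptions about the general pattern, not the authors' protocol.

import numpy as np

rng = np.random.default_rng(0)
m = 4                                    # number of edge servers (hypothetical)
X = rng.normal(size=(200, 3))            # learning data (features)
y = X @ np.array([1.0, -2.0, 0.5])       # targets from a known linear map

w = np.zeros(3)                          # system parameters held by the client
shards = np.array_split(np.arange(len(X)), m)  # one data subset per server

for epoch in range(500):
    # Each server computes a gradient on its own subset only.
    grads = [X[idx].T @ (X[idx] @ w - y[idx]) / len(idx) for idx in shards]
    # The client aggregates the per-server results and updates the parameters.
    w -= 0.1 * np.mean(grads, axis=0)

print(np.round(w, 3))                    # approaches [1.0, -2.0, 0.5]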


Author(s):  
Eugene Ie ◽  
Vihan Jain ◽  
Jing Wang ◽  
Sanmit Narvekar ◽  
Ritesh Agarwal ◽  
...  

Reinforcement learning methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items---which may have interacting effects on user choice---methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.
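The decomposition can be stated compactly: under the paper's assumptions on user choice, the LTV of a slate A in state s is Q(s, A) = Σ_{i∈A} P(i | s, A) · Q̄(s, i), a choice-probability-weighted sum of item-wise LTVs. The sketch below illustrates this with a softmax choice model that includes a no-click option; the scores, LTVs, and choice-model details are illustrative assumptions.

import numpy as np

def slate_value(item_ltv, item_score, null_score=0.0):
    """Combine item-wise LTVs into a slate LTV via a logit choice model."""
    scores = np.append(item_score, null_score)     # last slot = "no click"
    probs = np.exp(scores) / np.exp(scores).sum()  # P(i | s, A)
    return probs[:-1] @ item_ltv                   # the null choice has zero LTV

item_ltv = np.array([3.0, 1.5, 2.2])    # Qbar(s, i): per-item long-term values
item_score = np.array([0.8, 0.1, 0.5])  # user affinity scores for the slate items
print(slate_value(item_ltv, item_score))  # ~2.0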


2021 ◽  
Author(s):  
George Angelopoulos ◽  
Dimitris Metafas

Reinforcement learning methods such as Q-learning make use of action selection methods in order to train an agent to perform a task. As the complexity of the task grows, so does the time required to train the agent. In this paper, Q-learning is applied to the board game Dominion, and Forced ε-greedy, an expansion of the ε-greedy action selection method, is introduced. As shown in this paper, the Forced ε-greedy method accelerates the training process and improves its results, especially as the complexity of the task grows.
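For reference, here is plain ε-greedy alongside one plausible reading of a "forced" variant, in which exploration is forced toward actions not yet tried in the current state so that a large action space is covered faster. This reading and all names below are assumptions; the paper defines the actual Forced ε-greedy rule.

import random
from collections import defaultdict

def epsilon_greedy(q, state, actions, eps):
    """Standard rule: explore uniformly with probability eps, else exploit."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

tried = defaultdict(set)  # state -> actions already explored (hypothetical bookkeeping)

def forced_epsilon_greedy(q, state, actions, eps):
    """Assumed variant: exploration picks an untried action when one exists."""
    if random.random() < eps:
        untried = [a for a in actions if a not in tried[state]]
        choice = random.choice(untried or actions)
        tried[state].add(choice)
        return choice
    return max(actions, key=lambda a: q.get((state, a), 0.0))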


2013 ◽  
Vol 33 (12) ◽  
pp. 3527-3530
Author(s):  
Yongli DOU ◽  
Haichun WANG ◽  
Jian KANG

Author(s):  
Faxin Qi ◽  
Xiangrong Tong ◽  
Lei Yu ◽  
Yingjie Wang

With the development of the Internet and the progress of human-centered computing (HCC), the mode of man-machine collaborative work has become more and more popular. Valuable information on the Internet, such as user behavior and social labels, is often provided by users. Trust-based recommendation is an important human-computer interaction application in social networks. However, previous studies generally assume that the trust value between users is static and thus cannot respond to dynamic changes in user trust and preferences in a timely manner. In fact, after a user receives a recommendation, there is a difference between the actual evaluation and the expected evaluation, and this difference is correlated with the trust value. Based on the dynamics of trust and the changing process of trust between users, this paper proposes a trust boost method through reinforcement learning. The recursive least squares (RLS) algorithm is used to learn the dynamic impact of the evaluation difference on the user's trust. In addition, the deep Q-learning (DQN) reinforcement learning method is studied to simulate the process of learning the user's preferences and boosting the trust value. Experiments indicate that our method, applied to recommendation systems, can respond quickly to changes in the user's preferences. Compared with other methods, our method achieves better recommendation accuracy.
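The RLS component can be sketched as online regression: the evaluation difference is the input signal whose impact on trust is refined with each interaction. The feature construction, forgetting factor, and toy data below are illustrative assumptions rather than the paper's formulation.

import numpy as np

def rls_step(theta, P, x, y, lam=0.99):
    """One recursive least squares update of weights theta for target y."""
    x = x.reshape(-1, 1)
    k = P @ x / (lam + x.T @ P @ x)      # gain vector
    err = y - (x.T @ theta).item()       # prediction error
    theta = theta + k * err              # weight update
    P = (P - k @ x.T @ P) / lam          # covariance update
    return theta, P

theta = np.zeros((2, 1))                 # [impact of evaluation difference, bias]
P = np.eye(2) * 100.0                    # large initial covariance
for diff in [0.4, -0.2, 0.1, 0.3]:       # stream of evaluation differences
    x = np.array([diff, 1.0])
    y = 0.5 * diff                       # assumed true trust change, for the demo
    theta, P = rls_step(theta, P, x, y)
print(theta.ravel())                     # approaches [0.5, 0.0]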


Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 587
Author(s):  
Joao Pedro de Carvalho ◽  
Roussos Dimitrakopoulos

This paper presents a new truck dispatching policy approach that adapts to different mining complex configurations in order to deliver the supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator emulates the interactions arising from these mining operations. Repeatedly running this simulator, together with a reward function that assigns a score to each dispatching decision, generates the sample experiences used to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, so that when a new task is required, a well-informed decision can be taken quickly. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.
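The simulate-score-train loop described above follows a familiar pattern, sketched below with a replay buffer. A tabular dictionary stands in for the deep Q-network, and the simulator stub, state encoding, and rewards are all illustrative assumptions rather than the authors' implementation.

import random
from collections import deque

replay = deque(maxlen=10_000)            # buffer of past dispatching experiences
q = {}                                   # (state, action) -> value; network stand-in
alpha, gamma, eps = 0.1, 0.95, 0.1
actions = [0, 1, 2]                      # e.g., candidate destinations for a truck

def step_simulator(state, action):
    """Stub for the discrete event simulator: returns the next state and a
    reward scoring this dispatching decision (assumed interface)."""
    return (state + action) % 5, -abs(action - state % 3)

state = 0
for t in range(5_000):
    # epsilon-greedy dispatching decision
    if random.random() < eps:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda x: q.get((state, x), 0.0))
    nxt, r = step_simulator(state, a)
    replay.append((state, a, r, nxt))
    # train on a sampled past experience (the replay step of deep Q-learning)
    s, a2, r2, s2 = random.choice(replay)
    target = r2 + gamma * max(q.get((s2, x), 0.0) for x in actions)
    q[(s, a2)] = q.get((s, a2), 0.0) + alpha * (target - q.get((s, a2), 0.0))
    state = nxt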


Aerospace ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 113
Author(s):  
Pedro Andrade ◽  
Catarina Silva ◽  
Bernardete Ribeiro ◽  
Bruno F. Santos

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks for a specified time horizon. The checks are scheduled within an interval, and the goal is to schedule them as close as possible to their due dates. In doing so, the number of checks is reduced and fleet availability increases. A deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan generated with our approach is compared with a previous study, which presented a Dynamic Programming (DP) based approach, and with airline estimations for the same period. The results show a reduction in the number of checks scheduled, which indicates the potential of RL in solving this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model with these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.
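A toy version of the scheduling objective makes the "close to the due date" goal concrete: dates outside the allowed interval are invalid, and the reward peaks when a check lands exactly on its due date. The interval bounds, day numbering, and reward shape are illustrative assumptions, not the study's formulation.

def check_reward(scheduled_day, due_day, earliest_day):
    """Reward a hangar check by its proximity to the due date; scheduling
    outside the allowed interval is invalid."""
    if not earliest_day <= scheduled_day <= due_day:
        return float("-inf")             # outside the allowed interval
    return -(due_day - scheduled_day)    # 0 when exactly on the due date

print(check_reward(scheduled_day=98, due_day=100, earliest_day=80))  # -2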

