A proposal of privacy preserving reinforcement learning for secure multiparty computation

2017 ◽  
Vol 6 (2) ◽  
pp. 57 ◽  
Author(s):  
Hirofumi Miyajima ◽  
Noritaka Shigei ◽  
Syunki Makino ◽  
Hiromi Miyajima ◽  
Yohtaro Miyanishi ◽  
...  

Many studies have addressed the security of cloud computing. Data encryption is a typical approach, but it incurs high computational complexity for encrypting and decrypting data. Therefore, systems that process distributed data securely have attracted attention, and many studies have been conducted on them. Secure multiparty computation (SMC) is one such method. Specifically, two learning approaches for machine learning (ML) with SMC are known: one divides the learning data into several subsets and learns on each subset; the other divides each item of the learning data into shares and learns on the shares. So far, most work on ML with SMC has targeted supervised and unsupervised learning, such as the BP and k-means methods; there appear to be no studies on reinforcement learning (RL) with SMC. This paper proposes learning methods with SMC for Q-learning, one of the typical methods for RL. The effectiveness of the proposed methods is shown by numerical simulation on the maze problem.
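To make the item-splitting idea concrete, below is a minimal sketch of additive secret sharing, a standard SMC primitive, applied to the linear part of a tabular Q-learning update. The protocol, names, and values are illustrative assumptions, not the authors' actual scheme; in particular, the max over next-state actions is nonlinear and would require an interactive comparison protocol that is not shown.

import numpy as np

def share(secret, m, rng):
    """Split a scalar into m additive shares that sum to the secret."""
    shares = rng.normal(size=m - 1)
    return np.append(shares, secret - shares.sum())

def reconstruct(shares):
    """Recover the secret by summing all shares."""
    return shares.sum()

rng = np.random.default_rng(0)
m = 3                       # number of parties (hypothetical)
alpha, gamma = 0.1, 0.9     # learning rate and discount factor

r_sh  = share(1.0, m, rng)  # observed reward, secret-shared
q_sh  = share(0.5, m, rng)  # Q(s, a), secret-shared
qn_sh = share(0.8, m, rng)  # max_a' Q(s', a'), assumed already resolved
                            # by a separate comparison protocol

# Each party applies the linear Q-update to its own shares; no party
# ever sees the plaintext values.
new_q_sh = q_sh + alpha * (r_sh + gamma * qn_sh - q_sh)
print(reconstruct(new_q_sh))  # 0.5 + 0.1 * (1.0 + 0.9 * 0.8 - 0.5) = 0.622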

Author(s):  
Fumito Uwano ◽  
Keiki Takadama

This study discusses factors important for zero-communication multi-agent cooperation by comparing two modified reinforcement learning methods. The two methods assign different goal selections to the agents in cooperative tasks. The first, Profit Minimizing Reinforcement Learning (PMRL), forces agents to learn how to reach the farthest goal, after which the agent closest to a goal is directed to it. The second, Yielding Action Reinforcement Learning (YARL), forces agents to learn through a Q-learning process; when agents conflict, the agent closest to the contested goal learns to reach the next-closest goal instead. To compare the two methods, we designed experiments that vary the following maze factors: (1) the locations of the start point and goal; (2) the number of agents; and (3) the size of the maze. Intensive simulations on the maze cooperation task revealed that both methods successfully enabled the agents to exhibit cooperative behavior, even as the size of the maze and the number of agents changed. The PMRL mechanism always enables the agents to learn cooperative behavior, whereas the YARL mechanism lets the agents learn cooperative behavior within a small number of learning iterations. In zero-communication multi-agent cooperation, it is important that only the agents involved in a conflict cooperate with each other.
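As an illustration of the yielding rule, the sketch below computes, for two agents whose nearest goal coincides, the assignment YARL is described as converging to: the agent closer to the contested goal yields and retargets its next-closest goal. The grid positions, Manhattan distance, and greedy computation are illustrative assumptions; in the paper this outcome is learned via Q-learning, not computed directly.

def manhattan(p, q):
    """Grid distance, standing in for learned path cost."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def yield_on_conflict(agent_a, agent_b, goals):
    """Resolve a two-agent conflict: the closer agent yields the shared
    nearest goal and takes its next-closest goal instead."""
    g = min(goals, key=lambda x: manhattan(agent_a, x))
    assert g == min(goals, key=lambda x: manhattan(agent_b, x)), "no conflict"
    closer, other = sorted([agent_a, agent_b], key=lambda p: manhattan(p, g))
    alt = min((x for x in goals if x != g), key=lambda x: manhattan(closer, x))
    return {closer: alt, other: g}

print(yield_on_conflict((0, 0), (4, 1), goals=[(1, 1), (5, 5)]))
# {(0, 0): (5, 5), (4, 1): (1, 1)}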


2018 ◽  
Vol 7 (2) ◽  
pp. 26 ◽  
Author(s):  
Hirofumi Miyajima ◽  
Norio Shiratori ◽  
Hiromi Miyajima

The use of cloud computing systems, the basic technology supporting ICT, is expanding. However, as the number of terminals connected to the cloud increases, the limits of its capacity are becoming apparent, and they lead to significant processing delays. As an architecture to improve on this, the edge computing system has been proposed; it is known as a new paradigm complementing the conventional cloud system. In the conventional cloud system, a terminal sends all data to the cloud, and the cloud returns the result to the terminal or to a thing directly connected to it. In the edge system, by contrast, a number of servers called edges are placed between the cloud and the terminal (or thing), connected directly or at close distance. Now consider the case of machine learning, which requires big data. The purpose of learning is to find the relationships (information) lurking in the collected data. To realize this, a system with several parameters is assumed, and the parameters are estimated by repeatedly updating them with learning data. Further, there is the problem of security for the learning data: users of cloud computing cannot escape concerns about the risk of information leakage. How can we build a cloud computing system that avoids such risks? Secure multiparty computation (SMC) is known as one method of realizing safe computation. Many learning methods that take SMC into account have been proposed. Then, what kind of learning method is suitable for edge computing with SMC? In this paper, a learning method suitable for edge computing with SMC is proposed. It is shown using an edge system composed of a client and m servers. The learning data are shared as m subsets across the m servers, learning is performed simultaneously on each server, and the system parameters are updated in the client using the servers' results. The idea of the learning method is illustrated using the BP algorithm for neural networks, and its effectiveness is shown by numerical simulations.
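The client-and-m-servers pattern can be sketched as follows: each server computes an update on its own data subset, and the client aggregates the results into the shared parameters. A one-layer model with plain gradient descent stands in for the paper's BP formulation; the partitioning, averaging, and all values are assumptions about the general pattern, not the authors' protocol.

import numpy as np

rng = np.random.default_rng(0)
m = 4                                    # number of edge servers (hypothetical)
X = rng.normal(size=(200, 3))            # learning data (features)
y = X @ np.array([1.0, -2.0, 0.5])       # targets from a known linear map

w = np.zeros(3)                          # system parameters held by the client
shards = np.array_split(np.arange(len(X)), m)  # one data subset per server

for epoch in range(500):
    # Each server computes a gradient on its own subset only.
    grads = [X[idx].T @ (X[idx] @ w - y[idx]) / len(idx) for idx in shards]
    # The client aggregates the per-server results and updates the parameters.
    w -= 0.1 * np.mean(grads, axis=0)

print(np.round(w, 3))                    # approaches [1.0, -2.0, 0.5]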


Author(s):  
Eugene Ie ◽  
Vihan Jain ◽  
Jing Wang ◽  
Sanmit Narvekar ◽  
Ritesh Agarwal ◽  
...  

Reinforcement learning methods for recommender systems optimize recommendations for long-term user engagement. However, since users are often presented with slates of multiple items---which may have interacting effects on user choice---methods are required to deal with the combinatorics of the RL action space. We develop SlateQ, a decomposition of value-based temporal-difference and Q-learning that renders RL tractable with slates. Under mild assumptions on user choice behavior, we show that the long-term value (LTV) of a slate can be decomposed into a tractable function of its component item-wise LTVs. We demonstrate our methods in simulation, and validate the scalability and effectiveness of decomposed TD-learning on YouTube.
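The decomposition can be stated compactly: under the paper's assumptions on user choice, the LTV of a slate A in state s is Q(s, A) = Σ_{i∈A} P(i | s, A) · Q̄(s, i), a choice-probability-weighted sum of item-wise LTVs. The sketch below illustrates this with a softmax choice model that includes a no-click option; the scores, LTVs, and choice-model details are illustrative assumptions.

import numpy as np

def slate_value(item_ltv, item_score, null_score=0.0):
    """Combine item-wise LTVs into a slate LTV via a logit choice model."""
    scores = np.append(item_score, null_score)     # last slot = "no click"
    probs = np.exp(scores) / np.exp(scores).sum()  # P(i | s, A)
    return probs[:-1] @ item_ltv                   # the null choice has zero LTV

item_ltv = np.array([3.0, 1.5, 2.2])    # Qbar(s, i): per-item long-term values
item_score = np.array([0.8, 0.1, 0.5])  # user affinity scores for the slate items
print(slate_value(item_ltv, item_score))  # ~2.0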


2021 ◽  
Author(s):  
George Angelopoulos ◽  
Dimitris Metafas

Reinforcement learning methods such as Q-learning make use of action selection methods in order to train an agent to perform a task. As the complexity of the task grows, so does the time required to train the agent. In this paper, Q-learning is applied to the board game Dominion, and Forced ε-greedy, an expansion of the ε-greedy action selection method, is introduced. As shown in this paper, the Forced ε-greedy method accelerates the training process and improves its results, especially as the complexity of the task grows.
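For reference, here is plain ε-greedy alongside one plausible reading of a "forced" variant, in which exploration is forced toward actions not yet tried in the current state so that a large action space is covered faster. This reading and all names below are assumptions; the paper defines the actual Forced ε-greedy rule.

import random
from collections import defaultdict

def epsilon_greedy(q, state, actions, eps):
    """Standard rule: explore uniformly with probability eps, else exploit."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))

tried = defaultdict(set)  # state -> actions already explored (hypothetical bookkeeping)

def forced_epsilon_greedy(q, state, actions, eps):
    """Assumed variant: exploration picks an untried action when one exists."""
    if random.random() < eps:
        untried = [a for a in actions if a not in tried[state]]
        choice = random.choice(untried or actions)
        tried[state].add(choice)
        return choice
    return max(actions, key=lambda a: q.get((state, a), 0.0))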


2013 ◽  
Vol 33 (12) ◽  
pp. 3527-3530
Author(s):  
Yongli DOU ◽  
Haichun WANG ◽  
Jian KANG

Author(s):  
Faxin Qi ◽  
Xiangrong Tong ◽  
Lei Yu ◽  
Yingjie Wang

With the development of the Internet and the progress of human-centered computing (HCC), the mode of man-machine collaborative work has become more and more popular. Valuable information on the Internet, such as user behavior and social labels, is often provided by users. Trust-based recommendation is an important human-computer interaction application in social networks. However, previous studies generally assume that the trust value between users is static and thus cannot respond to dynamic changes in user trust and preferences in a timely manner. In fact, after a user receives a recommendation, there is a difference between the actual evaluation and the expected evaluation, and this difference is correlated with the trust value. Based on the dynamics of trust and the changing process of trust between users, this paper proposes a trust boost method through reinforcement learning. The recursive least squares (RLS) algorithm is used to learn the dynamic impact of the evaluation difference on the user's trust. In addition, the deep Q-learning (DQN) reinforcement learning method is studied to simulate the process of learning the user's preferences and boosting the trust value. Experiments indicate that our method, applied to recommendation systems, can respond quickly to changes in the user's preferences. Compared with other methods, our method achieves better recommendation accuracy.
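The RLS component can be sketched as online regression: the evaluation difference is the input signal whose impact on trust is refined with each interaction. The feature construction, forgetting factor, and toy data below are illustrative assumptions rather than the paper's formulation.

import numpy as np

def rls_step(theta, P, x, y, lam=0.99):
    """One recursive least squares update of weights theta for target y."""
    x = x.reshape(-1, 1)
    k = P @ x / (lam + x.T @ P @ x)      # gain vector
    err = y - (x.T @ theta).item()       # prediction error
    theta = theta + k * err              # weight update
    P = (P - k @ x.T @ P) / lam          # covariance update
    return theta, P

theta = np.zeros((2, 1))                 # [impact of evaluation difference, bias]
P = np.eye(2) * 100.0                    # large initial covariance
for diff in [0.4, -0.2, 0.1, 0.3]:       # stream of evaluation differences
    x = np.array([diff, 1.0])
    y = 0.5 * diff                       # assumed true trust change, for the demo
    theta, P = rls_step(theta, P, x, y)
print(theta.ravel())                     # approaches [0.5, 0.0]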


Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 587
Author(s):  
Joao Pedro de Carvalho ◽  
Roussos Dimitrakopoulos

This paper presents a new truck dispatching policy approach that adapts to different mining complex configurations in order to deliver the supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator emulates the interactions arising from these mining operations. Repeatedly running this simulator, together with a reward function that assigns a score to each dispatching decision, generates the sample experiences used to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, so that when a new task is required, a well-informed decision can be taken quickly. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.
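The simulate-score-train loop described above follows a familiar pattern, sketched below with a replay buffer. A tabular dictionary stands in for the deep Q-network, and the simulator stub, state encoding, and rewards are all illustrative assumptions rather than the authors' implementation.

import random
from collections import deque

replay = deque(maxlen=10_000)            # buffer of past dispatching experiences
q = {}                                   # (state, action) -> value; network stand-in
alpha, gamma, eps = 0.1, 0.95, 0.1
actions = [0, 1, 2]                      # e.g., candidate destinations for a truck

def step_simulator(state, action):
    """Stub for the discrete event simulator: returns the next state and a
    reward scoring this dispatching decision (assumed interface)."""
    return (state + action) % 5, -abs(action - state % 3)

state = 0
for t in range(5_000):
    # epsilon-greedy dispatching decision
    if random.random() < eps:
        a = random.choice(actions)
    else:
        a = max(actions, key=lambda x: q.get((state, x), 0.0))
    nxt, r = step_simulator(state, a)
    replay.append((state, a, r, nxt))
    # train on a sampled past experience (the replay step of deep Q-learning)
    s, a2, r2, s2 = random.choice(replay)
    target = r2 + gamma * max(q.get((s2, x), 0.0) for x in actions)
    q[(s, a2)] = q.get((s, a2), 0.0) + alpha * (target - q.get((s, a2), 0.0))
    state = nxt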


Aerospace ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 113
Author(s):  
Pedro Andrade ◽  
Catarina Silva ◽  
Bernardete Ribeiro ◽  
Bruno F. Santos

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks for a specified time horizon. The checks are scheduled within an interval, and the goal is to schedule them as close as possible to their due dates. In doing so, the number of checks is reduced and fleet availability increases. A deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan generated with our approach is compared with a previous study, which presented a Dynamic Programming (DP) based approach, and with airline estimations for the same period. The results show a reduction in the number of checks scheduled, which indicates the potential of RL in solving this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model with these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.
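A toy version of the scheduling objective makes the "close to the due date" goal concrete: dates outside the allowed interval are invalid, and the reward peaks when a check lands exactly on its due date. The interval bounds, day numbering, and reward shape are illustrative assumptions, not the study's formulation.

def check_reward(scheduled_day, due_day, earliest_day):
    """Reward a hangar check by its proximity to the due date; scheduling
    outside the allowed interval is invalid."""
    if not earliest_day <= scheduled_day <= due_day:
        return float("-inf")             # outside the allowed interval
    return -(due_day - scheduled_day)    # 0 when exactly on the due date

print(check_reward(scheduled_day=98, due_day=100, earliest_day=80))  # -2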

