A deep reinforcement learning based long-term recommender system

Recently, the application of deep reinforcement learning in the recommender system is flourishing and stands out by overcoming drawbacks of traditional methods and achieving high recommendation quality. The dynamics, long-term returns, and sparse data issues in the recommender system have been effectively solved. But the application of deep reinforcement learning brings problems of interpretability, overfitting, complex reward function design, and user cold start. This study proposed a tag-aware recommender system based on deep reinforcement learning without complex function design, taking advantage of tags to make up for the interpretability problems existing in the recommender system. Our experiment is carried out on the MovieLens dataset. The result shows that the DRL-based recommender system is superior than traditional algorithms in minimum error, and the application of tags have little effect on accuracy when making up for interpretability. In addition, the DRL-based recommender system has excellent performance on user cold start problems.

Download Full-text

Tag-Aware Recommender System Based on Deep Reinforcement Learning

10.20944/preprints202101.0176.v1 ◽

2021 ◽

Author(s):

Zhiruo Zhao ◽

Lei Cao ◽

Xiliang Chen ◽

Zhixiong Xu

Keyword(s):

Reinforcement Learning ◽

Recommender System ◽

Complex Function ◽

Cold Start ◽

Sparse Data ◽

Excellent Performance ◽

Minimum Error ◽

Reward Function ◽

Function Design

Recently, the application of deep reinforcement learning in recommender system is flourishing and stands out by overcoming drawbacks of traditional methods and achieving high recommendation quality. The dynamics, long-term returns and sparse data issues in recommender system have been effectively solved. But the application of deep reinforcement learning brings problems of interpretability, overfitting, complex reward function design, and user cold start. This paper proposed a tag-aware recommender system based on deep reinforcement learning without complex function design, taking advantage of tags to make up for the interpretability problems existing in recommender system. Our experiment is carried out on MovieLens dataset. The result shows that, DRL based recommender system is superior than traditional algorithms in minimum error and the application of tags has little effect on accuracy when making up for interpretability. In addition, DRL based recommender system has excellent performance on user cold start problems.

Download Full-text

Medium and Long-Term Stochastic Optimization of Hybrid Pumped Storage Reservoir via Reinforcement Learning Method

International Journal of Scientific and Research Publications (IJSRP) ◽

10.29322/ijsrp.8.11.2018.p8309 ◽

2018 ◽

Vol 8 (11) ◽

Author(s):

Daniel Eliote Mbanze ◽

Li Wenwu ◽

Zhang Xueying

Keyword(s):

Reinforcement Learning ◽

Stochastic Optimization ◽

Learning Method ◽

Storage Reservoir ◽

Pumped Storage

Download Full-text

MEG signatures of long-term effects of agreement and disagreement with the majority

Scientific Reports ◽

10.1038/s41598-021-82670-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

A. Gorin ◽

V. Klucharev ◽

A. Ossadtchi ◽

I. Zubarev ◽

V. Moiseeva ◽

...

Keyword(s):

Reinforcement Learning ◽

Social Influence ◽

Temporal Dynamics ◽

Peer Group ◽

Long Term Effects ◽

Learning Mechanisms ◽

Source Imaging ◽

The Face ◽

First Session

AbstractPeople often change their beliefs by succumbing to an opinion of others. Such changes are often referred to as effects of social influence. While some previous studies have focused on the reinforcement learning mechanisms of social influence or on its internalization, others have reported evidence of changes in sensory processing evoked by social influence of peer groups. In this study, we used magnetoencephalographic (MEG) source imaging to further investigate the long-term effects of agreement and disagreement with the peer group. The study was composed of two sessions. During the first session, participants rated the trustworthiness of faces and subsequently learned group rating of each face. In the first session, a neural marker of an immediate mismatch between individual and group opinions was found in the posterior cingulate cortex, an area involved in conflict-monitoring and reinforcement learning. To identify the neural correlates of the long-lasting effect of the group opinion, we analysed MEG activity while participants rated faces during the second session. We found MEG traces of past disagreement or agreement with the peers at the parietal cortices 230 ms after the face onset. The neural activity of the superior parietal lobule, intraparietal sulcus, and precuneus was significantly stronger when the participant’s rating had previously differed from the ratings of the peers. The early MEG correlates of disagreement with the majority were followed by activity in the orbitofrontal cortex 320 ms after the face onset. Altogether, the results reveal the temporal dynamics of the neural mechanism of long-term effects of disagreement with the peer group: early signatures of modified face processing were followed by later markers of long-term social influence on the valuation process at the ventromedial prefrontal cortex.

Download Full-text

Hierarchical multi-agent deep reinforcement learning to develop long-term coordination

Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing - SAC '19 ◽

10.1145/3297280.3297371 ◽

2019 ◽

Author(s):

Marie Ossenkopf ◽

Mackenzie Jorgensen ◽

Kurt Geihs

Keyword(s):

Reinforcement Learning ◽

Multi Agent

Download Full-text

Aircraft Maintenance Check Scheduling Using Reinforcement Learning

Aerospace ◽

10.3390/aerospace8040113 ◽

2021 ◽

Vol 8 (4) ◽

pp. 113

Author(s):

Pedro Andrade ◽

Catarina Silva ◽

Bernardete Ribeiro ◽

Bruno F. Santos

Keyword(s):

Reinforcement Learning ◽

Time Horizon ◽

Learning Algorithm ◽

Initial Conditions ◽

Q Learning ◽

Scheduling Policy ◽

Real Scenario ◽

Maintenance Plan ◽

Small Disturbances

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks for a specified time horizon. The checks are scheduled within an interval, and the goal is to, schedule them as close as possible to their due date. In doing so, the number of checks is reduced, and the fleet availability increases. A Deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan that is generated with our approach is compared with a previous study, which presented a Dynamic Programming (DP) based approach and airline estimations for the same period. The results show a reduction in the number of checks scheduled, which indicates the potential of RL in solving this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model with these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.

Download Full-text

Predicting Human Mobility with Reinforcement-Learning-Based Long-Term Periodicity Modeling

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3469860 ◽

2021 ◽

Vol 12 (6) ◽

pp. 1-23

Author(s):

Shuo Tao ◽

Jingang Jiang ◽

Defu Lian ◽

Kai Zheng ◽

Enhong Chen

Keyword(s):

Reinforcement Learning ◽

Human Mobility ◽

Recurrent Network ◽

Mobility Prediction ◽

Learning Framework ◽

Temporal Features ◽

Wide Range ◽

Spatio Temporal ◽

Historical Trajectory

Mobility prediction plays an important role in a wide range of location-based applications and services. However, there are three problems in the existing literature: (1) explicit high-order interactions of spatio-temporal features are not systemically modeled; (2) most existing algorithms place attention mechanisms on top of recurrent network, so they can not allow for full parallelism and are inferior to self-attention for capturing long-range dependence; (3) most literature does not make good use of long-term historical information and do not effectively model the long-term periodicity of users. To this end, we propose MoveNet and RLMoveNet. MoveNet is a self-attention-based sequential model, predicting each user’s next destination based on her most recent visits and historical trajectory. MoveNet first introduces a cross-based learning framework for modeling feature interactions. With self-attention on both the most recent visits and historical trajectory, MoveNet can use an attention mechanism to capture the user’s long-term regularity in a more efficient way. Based on MoveNet, to model long-term periodicity more effectively, we add the reinforcement learning layer and named RLMoveNet. RLMoveNet regards the human mobility prediction as a reinforcement learning problem, using the reinforcement learning layer as the regularization part to drive the model to pay attention to the behavior with periodic actions, which can help us make the algorithm more effective. We evaluate both of them with three real-world mobility datasets. MoveNet outperforms the state-of-the-art mobility predictor by around 10% in terms of accuracy, and simultaneously achieves faster convergence and over 4x training speedup. Moreover, RLMoveNet achieves higher prediction accuracy than MoveNet, which proves that modeling periodicity explicitly from the perspective of reinforcement learning is more effective.

Download Full-text

A nonlinear relationship between prediction errors and learning rates in human reinforcement learning

10.1101/751222 ◽

2019 ◽

Author(s):

Erdem Pulcu

Keyword(s):

Reinforcement Learning ◽

Nonlinear Relationship ◽

Prediction Errors ◽

Learning Rates ◽

The Face ◽

In The Wild ◽

Actual Outcome ◽

Update Rules ◽

Different Sources

AbstractWe are living in a dynamic world in which stochastic relationships between cues and outcome events create different sources of uncertainty1 (e.g. the fact that not all grey clouds bring rain). Living in an uncertain world continuously probes learning systems in the brain, guiding agents to make better decisions. This is a type of value-based decision-making which is very important for survival in the wild and long-term evolutionary fitness. Consequently, reinforcement learning (RL) models describing cognitive/computational processes underlying learning-based adaptations have been pivotal in behavioural2,3 and neural sciences4–6, as well as machine learning7,8. This paper demonstrates the suitability of novel update rules for RL, based on a nonlinear relationship between prediction errors (i.e. difference between the agent’s expectation and the actual outcome) and learning rates (i.e. a coefficient with which agents update their beliefs about the environment), that can account for learning-based adaptations in the face of environmental uncertainty. These models illustrate how learners can flexibly adapt to dynamically changing environments.

Download Full-text

Effective Treatment Recommendations for Type 2 Diabetes Management Using Reinforcement Learning: Treatment Recommendation Model Development and Validation (Preprint)

10.2196/preprints.27858 ◽

2021 ◽

Author(s):

Xingzhi Sun ◽

Yong Mong Bee ◽

Shao Wei Lam ◽

Zhuo Liu ◽

Wei Zhao ◽

...

Keyword(s):

Blood Pressure ◽

Type 2 Diabetes ◽

Reinforcement Learning ◽

Blood Lipids ◽

Diabetes Complications ◽

Data Driven ◽

Lipid Lowering ◽

Treatment Recommendation

BACKGROUND Type 2 diabetes mellitus (T2DM) and its related complications represent a growing economic burden for many countries and health systems. Diabetes complications can be prevented through better disease control, but there is a large gap between the recommended treatment and the treatment that patients actually receive. The treatment of T2DM can be challenging because of different comprehensive therapeutic targets and individual variability of the patients, leading to the need for precise, personalized treatment. OBJECTIVE The aim of this study was to develop treatment recommendation models for T2DM based on deep reinforcement learning. A retrospective analysis was then performed to evaluate the reliability and effectiveness of the models. METHODS The data used in our study were collected from the Singapore Health Services Diabetes Registry, encompassing 189,520 patients with T2DM, including 6,407,958 outpatient visits from 2013 to 2018. The treatment recommendation model was built based on 80% of the dataset and its effectiveness was evaluated with the remaining 20% of data. Three treatment recommendation models were developed for antiglycemic, antihypertensive, and lipid-lowering treatments by combining a knowledge-driven model and a data-driven model. The knowledge-driven model, based on clinical guidelines and expert experiences, was first applied to select the candidate medications. The data-driven model, based on deep reinforcement learning, was used to rank the candidates according to the expected clinical outcomes. To evaluate the models, short-term outcomes were compared between the model-concordant treatments and the model-nonconcordant treatments with confounder adjustment by stratification, propensity score weighting, and multivariate regression. For long-term outcomes, model-concordant rates were included as independent variables to evaluate if the combined antiglycemic, antihypertensive, and lipid-lowering treatments had a positive impact on reduction of long-term complication occurrence or death at the patient level via multivariate logistic regression. RESULTS The test data consisted of 36,993 patients for evaluating the effectiveness of the three treatment recommendation models. In 43.3% of patient visits, the antiglycemic medications recommended by the model were concordant with the actual prescriptions of the physicians. The concordant rates for antihypertensive medications and lipid-lowering medications were 51.3% and 58.9%, respectively. The evaluation results also showed that model-concordant treatments were associated with better glycemic control (odds ratio [OR] 1.73, 95% CI 1.69-1.76), blood pressure control (OR 1.26, 95% CI, 1.23-1.29), and blood lipids control (OR 1.28, 95% CI 1.22-1.35). We also found that patients with more model-concordant treatments were associated with a lower risk of diabetes complications (including 3 macrovascular and 2 microvascular complications) and death, suggesting that the models have the potential of achieving better outcomes in the long term. CONCLUSIONS Comprehensive management by combining knowledge-driven and data-driven models has good potential to help physicians improve the clinical outcomes of patients with T2DM; achieving good control on blood glucose, blood pressure, and blood lipids; and reducing the risk of diabetes complications in the long term.

Download Full-text

Long-Term CSI-based Design for RIS-Aided Multiuser MISO Systems Exploiting Deep Reinforcement Learning

IEEE Communications Letters ◽

10.1109/lcomm.2021.3140155 ◽

2022 ◽

pp. 1-1

Author(s):

Hong Ren ◽

Cunhua Pan ◽

Liang Wang ◽

Wang Liu ◽

Zhoubin Kou ◽

...

Keyword(s):

Reinforcement Learning

Download Full-text