Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests

Xiao Xu; Fang Dong; Yanghua Li; Shaojian He; Xin Li

doi:10.1609/aaai.v34i04.6125

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6125 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6518-6525

Author(s):

Xiao Xu ◽

Fang Dong ◽

Yanghua Li ◽

Shaojian He ◽

Xin Li

Keyword(s):

Learning Algorithm ◽

General Setting ◽

Personalized Recommendation ◽

Time Varying ◽

Bandit Problem ◽

User Interests ◽

Specific Preference ◽

Coefficient Vector ◽

Real World Datasets ◽

Efficient Learning

A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. Two models with disjoint and hybrid payoffs are considered to characterize the phenomenon that users' preferences towards different items vary differently over time. In the disjoint payoff model, the reward of playing an arm is determined by an arm-specific preference vector, which is piecewise-stationary with asynchronous and distinct changes across different arms. An efficient learning algorithm that is adaptive to abrupt reward changes is proposed and theoretical regret analysis is provided to show that a sublinear scaling of regret in the time length T is achieved. The algorithm is further extended to a more general setting with hybrid payoffs where the reward of playing an arm is determined by both an arm-specific preference vector and a joint coefficient vector shared by all arms. Empirical experiments are conducted on real-world datasets to verify the advantages of the proposed learning algorithms against baseline ones in both settings.
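The disjoint payoff setting can be illustrated with a minimal sketch: one ridge-regression estimate per arm, an upper-confidence-bound selection rule, and a per-arm restart when recent prediction errors become large. This is not the algorithm proposed in the paper. The restart test, the class name `DisjointArm`, and the `window` and `threshold` settings are assumptions chosen for illustration only.

```python
import numpy as np

class DisjointArm:
    """Ridge-regression state for one arm: expected reward is modeled as theta_a . x."""

    def __init__(self, dim, alpha=1.0, window=20, threshold=0.5):
        self.dim, self.alpha = dim, alpha
        # window and threshold are hypothetical detector settings, not from the paper.
        self.window, self.threshold = window, threshold
        self.resets = 0
        self._clear()

    def _clear(self):
        self.A = np.eye(self.dim)    # regularized Gram matrix
        self.b = np.zeros(self.dim)  # reward-weighted sum of contexts
        self.errors = []             # recent absolute prediction errors

    def theta(self):
        return np.linalg.solve(self.A, self.b)

    def ucb(self, x):
        # Point estimate plus an exploration bonus that shrinks as the arm is played.
        bonus = self.alpha * np.sqrt(x @ np.linalg.solve(self.A, x))
        return float(self.theta() @ x + bonus)

    def update(self, x, reward):
        self.errors.append(abs(reward - float(self.theta() @ x)))
        self.errors = self.errors[-self.window:]
        # Assumed change test: a full window of large errors restarts this arm only,
        # so other arms keep what they have learned (changes are asynchronous).
        if len(self.errors) == self.window and np.mean(self.errors) > self.threshold:
            self._clear()
            self.resets += 1
        self.A += np.outer(x, x)
        self.b += reward * x


def choose_arm(arms, contexts):
    """Play the arm with the largest upper confidence bound."""
    return int(np.argmax([arm.ucb(x) for arm, x in zip(arms, contexts)]))
```

Restarting a single arm rather than the whole learner reflects the abstract's point that preference changes are asynchronous and distinct across arms. A hybrid variant would add a coefficient vector shared by all arms, which this sketch omits.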

Design and Comparison of Reinforcement-Learning-Based Time-Varying PID Controllers with Gain-Scheduled Actions

Machines ◽

10.3390/machines9120319 ◽

2021 ◽

Vol 9 (12) ◽

pp. 319

Author(s):

Yi-Liang Yeh ◽

Po-Kai Yang

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

PID Controllers ◽

Time Varying ◽

Bandit Problem ◽

Proportional Integral Derivative ◽

System Behavior ◽

Performance Requirements ◽

Gain Scheduled ◽

Control Designs

This paper presents innovative reinforcement learning methods for automatically tuning the parameters of a proportional integral derivative controller. Conventionally, the high dimension of the Q-table is a primary drawback when implementing a reinforcement learning algorithm. To overcome this obstacle, the idea underlying the n-armed bandit problem is used in this paper. Moreover, gain-scheduled actions are presented to tune the algorithms and improve the overall system behavior, so that the proposed controllers fulfill multiple performance requirements. An experiment was conducted on a piezo-actuated stage to illustrate the effectiveness of the proposed control designs relative to competing algorithms.
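The bandit idea mentioned in the abstract can be sketched as follows: each candidate set of PID gains is treated as one arm, and an epsilon-greedy rule learns which arm gives the best step response. This is a simplified illustration, not the controllers designed in the paper. The first-order plant, the gain sets in `GAINS`, and the reward (negative integrated absolute error) are all assumptions, and gain scheduling is not modeled.

```python
import random

# Hypothetical candidate gain sets (kp, ki, kd); each one is an arm of the bandit.
GAINS = [(1.0, 0.0, 0.0), (2.0, 1.0, 0.0), (4.0, 4.0, 0.05), (8.0, 8.0, 0.1)]

def episode_reward(kp, ki, kd, steps=500, dt=0.01, tau=0.5):
    """Negative integrated absolute error of a unit step response.

    The plant is an assumed first-order lag, tau * dy/dt = -y + u, integrated with Euler steps.
    """
    y, integral, prev_error, cost = 0.0, 0.0, 1.0, 0.0
    for _ in range(steps):
        error = 1.0 - y
        integral += error * dt
        derivative = (error - prev_error) / dt
        u = kp * error + ki * integral + kd * derivative
        y += dt * (-y + u) / tau
        prev_error = error
        cost += abs(error) * dt
    return -cost

def tune(episodes=200, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit over GAINS; returns the index of the best arm found."""
    rng = random.Random(seed)
    counts = [0] * len(GAINS)
    values = [0.0] * len(GAINS)
    for episode in range(episodes):
        if episode < len(GAINS):
            arm = episode                      # play every arm once first
        elif rng.random() < epsilon:
            arm = rng.randrange(len(GAINS))    # explore
        else:
            arm = max(range(len(GAINS)), key=lambda i: values[i])  # exploit
        reward = episode_reward(*GAINS[arm])
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return max(range(len(GAINS)), key=lambda i: values[i])
```

Because the bandit keeps one value per arm instead of one value per state-action pair, the table grows with the number of gain sets only, which is the dimensionality saving the abstract alludes to.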

Neural network modelling of flow stress and mechanical properties for hot strip rolling of TRIP steel using efficient learning algorithm

Ironmaking & Steelmaking ◽

10.1179/1743281212y.0000000047 ◽

2013 ◽

Vol 40 (4) ◽

pp. 298-304 ◽

Cited By ~ 8

Author(s):

S K Das

Keyword(s):

Neural Network ◽

Mechanical Properties ◽

Flow Stress ◽

TRIP Steel ◽

Learning Algorithm ◽

Hot Strip Rolling ◽

Strip Rolling ◽

Network Modelling ◽

Hot Strip ◽

Efficient Learning

Iterative learning algorithm with a quadratic criterion for linear time-varying systems

Proceedings of the Institution of Mechanical Engineers Part I Journal of Systems and Control Engineering ◽

10.1177/095965180221600309 ◽

2002 ◽

Vol 216 (3) ◽

pp. 309-316 ◽

Cited By ~ 1

Author(s):

S N Huang ◽

K K Tan ◽

T H Lee

Keyword(s):

Learning Algorithm ◽

Linear Time ◽

Tracking Error ◽

Iterative Learning ◽

Tracking Performance ◽

Time Varying ◽

Control Scheme ◽

Time Varying Systems ◽

Simulation Results ◽

Learning Law

A novel iterative learning controller for linear time-varying systems is developed. The learning law is derived on the basis of a quadratic criterion. This control scheme does not include package information. The advantage of the proposed learning law is that convergence is guaranteed without the need for an empirical choice of parameters. Furthermore, the tracking error on the final iteration will be a class K function of the bounds on the uncertainties. Finally, simulation results reveal that the proposed controller achieves good setpoint tracking performance.
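A learning law derived from a quadratic criterion can be illustrated with a generic norm-optimal iterative learning update on a scalar linear time-varying system. This is a textbook-style sketch and not the law derived in the paper. The plant coefficients `a` and `b`, the weight `r`, and the choice of identity error weighting are assumptions, and the plant is taken to be known exactly.

```python
import numpy as np

def lifted_matrix(a, b):
    """Map inputs u[0..N-1] to outputs y[1..N] for x[t+1] = a[t]*x[t] + b[t]*u[t], y[t] = x[t], x[0] = 0."""
    n = len(a)
    G = np.zeros((n, n))
    for j in range(n):
        x = b[j]                      # x[j+1] produced by a unit input at time j
        for i in range(j, n):
            G[i, j] = x               # contribution to y[i+1]
            if i + 1 < n:
                x = a[i + 1] * x      # propagate one step with the time-varying gain
    return G

def ilc(G, reference, r=0.01, iterations=6):
    """Return the error norm after each trial, starting from a zero input."""
    n = G.shape[0]
    # Learning gain that minimizes |e_next|^2 + r * |u_next - u|^2 (Q = I, R = r*I assumed).
    L = np.linalg.solve(G.T @ G + r * np.eye(n), G.T)
    u = np.zeros(n)
    norms = []
    for _ in range(iterations):
        error = reference - G @ u     # one trial on the (here simulated) plant
        norms.append(float(np.linalg.norm(error)))
        u = u + L @ error             # learning update between trials
    return norms

t = np.arange(30)
a = 0.8 + 0.1 * np.sin(0.2 * t)       # hypothetical time-varying dynamics
b = 0.5 + 0.1 * np.cos(0.1 * t)
norms = ilc(lifted_matrix(a, b), np.sin(2 * np.pi * (t + 1) / 30))
```

With an exact model, the error after each trial is multiplied by a symmetric matrix whose eigenvalues lie strictly between 0 and 1, so the error norm in `norms` shrinks from trial to trial.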

Discovery and classification of user interests on social media

Information Discovery and Delivery ◽

10.1108/idd-03-2017-0023 ◽

2017 ◽

Vol 45 (3) ◽

pp. 130-138 ◽

Cited By ~ 8

Author(s):

Basit Shahzad ◽

Ikramullah Lali ◽

M. Saqib Nawaz ◽

Waqar Aslam ◽

Raza Mustafa ◽

...

Keyword(s):

Information Filtering ◽

Recommendation Systems ◽

Personalized Recommendation ◽

Support Vector ◽

User Interest ◽

Socioeconomic Impacts ◽

Content Type ◽

Positive Correlation ◽

User Interests ◽

Twitter Users

Purpose: Data generated by Twitter users, known as tweets, are now used not only for communication and opinion sharing but are also considered an important source for trendsetting, future prediction, recommendation systems and marketing. Using network features in tweet modeling and applying data mining and deep learning techniques to tweets are attracting growing interest.

Design/methodology/approach: In this paper, user interests are discovered from Twitter Trends using a modeling approach that uses network-based text data (tweets). First, the popular trends are collected and stored in separate documents. These data are then pre-processed and labeled with their respective categories. The data are then modeled, and user interest in each Trending topic is calculated by considering the positive tweets in that trend and the average retweet and favorite counts.

Findings: The proposed approach can be used to infer users’ topics of interest on Twitter and to categorize them. A support vector machine can be used for training and validation purposes. Positive tweets can be further analyzed to find user posting patterns. There is a positive correlation between tweets and Google data.

Practical implications: The results can be used in the development of information filtering and prediction systems, especially in personalized recommendation systems.

Social implications: The Twitter microblogging platform offers content posting and sharing to billions of internet users worldwide. Therefore, this work has significant socioeconomic impacts.

Originality/value: This study shows how Twitter network structure features can be exploited to discover user interests from tweets. Further, a positive correlation of Twitter Trends with Google Trends is reported, which validates the correctness of the authors’ approach.

Gromov-Wasserstein optimal transport to align single-cell multi-omics data

10.1101/2020.04.28.066787 ◽

2020 ◽

Cited By ~ 2

Author(s):

Pinar Demetci ◽

Rebecca Santorella ◽

Björn Sandstede ◽

William Stafford Noble ◽

Ritambhara Singh

Keyword(s):

Single Cell ◽

Optimal Transport ◽

Learning Algorithm ◽

State Of The Art ◽

Single Cells ◽

Wasserstein Distance ◽

Cell Alignment ◽

Shared Space ◽

Real World Datasets ◽

Unsupervised Algorithms

Data integration of single-cell measurements is critical for understanding cell development and disease, but the lack of correspondence between different types of measurements makes such efforts challenging. Several unsupervised algorithms can align heterogeneous single-cell measurements in a shared space, enabling the creation of mappings between single cells in different data domains. However, these algorithms require hyperparameter tuning for high-quality alignments, which is difficult in an unsupervised setting without correspondence information for validation. We present Single-Cell alignment using Optimal Transport (SCOT), an unsupervised learning algorithm that uses Gromov-Wasserstein-based optimal transport to align single-cell multi-omics datasets. We compare the alignment performance of SCOT with state-of-the-art algorithms on four simulated and two real-world datasets. SCOT performs on par with state-of-the-art methods but is faster and requires tuning fewer hyperparameters. Furthermore, we provide an algorithm for SCOT to use the Gromov-Wasserstein distance to guide the parameter selection. Thus, unlike previous methods, SCOT aligns well without using any orthogonal correspondence information to pick the hyperparameters. Our source code and scripts for replicating the results are available at https://github.com/rsinghlab/SCOT.
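For orientation, the standard discrete Gromov-Wasserstein problem that this family of methods builds on can be written as below. The notation is an assumption introduced here, not taken from the abstract: $D^{X}$ and $D^{Y}$ are intra-domain distance matrices over $n$ and $m$ cells, $p$ and $q$ are the marginal weights, $L$ is a loss such as the squared difference, and $\Gamma$ is the coupling that serves as a soft alignment.

```latex
\mathrm{GW}(p, q) = \min_{\Gamma \in \Pi(p, q)} \sum_{i,k=1}^{n} \sum_{j,l=1}^{m}
  L\big(D^{X}_{ik}, D^{Y}_{jl}\big)\, \Gamma_{ij}\, \Gamma_{kl},
\qquad
\Pi(p, q) = \{\Gamma \ge 0 : \Gamma \mathbf{1}_m = p,\ \Gamma^{\top} \mathbf{1}_n = q\}
```

The objective compares distances within each domain rather than distances across domains, which is why no correspondence information between the two measurement types is needed.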

Label Distribution for Learning with Noisy Labels

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/356 ◽

2020 ◽

Author(s):

Yun-Peng Liu ◽

Ning Xu ◽

Yu Zhang ◽

Xin Geng

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Learning Algorithm ◽

State Of The Art ◽

Confidence Estimation ◽

Novel Method ◽

Real World Datasets ◽

Label Distribution ◽

Noisy Labels

The performance of deep neural networks (DNNs) relies crucially on the quality of labeling. In some situations, labels are easily corrupted and become noisy. Thus, designing algorithms that deal with noisy labels is of great importance for learning robust DNNs. However, it is difficult to distinguish between clean labels and noisy labels, which becomes the bottleneck of many methods. To address the problem, this paper proposes a novel method named Label Distribution based Confidence Estimation (LDCE). LDCE estimates the confidence of the observed labels based on label distribution. Then, the boundary between clean labels and noisy labels becomes clear according to confidence scores. To verify the effectiveness of the method, LDCE is combined with an existing learning algorithm to train robust DNNs. Experiments on both synthetic and real-world datasets substantiate the superiority of the proposed algorithm against state-of-the-art methods.

Monetary transmission in three central European economies: evidence from time-varying coefficient vector autoregressions

Empirica ◽

10.1007/s10663-012-9197-4 ◽

2012 ◽

Vol 40 (2) ◽

pp. 363-390 ◽

Cited By ~ 8

Author(s):

Zsolt Darvas

Keyword(s):

Monetary Transmission ◽

Time Varying ◽

Vector Autoregressions ◽

Central European ◽

Varying Coefficient ◽

Coefficient Vector ◽

European Economies

Learning Tractable Word Alignment Models with Complex Constraints

Computational Linguistics ◽

10.1162/coli_a_00007 ◽

2010 ◽

Vol 36 (3) ◽

pp. 481-504 ◽

Cited By ~ 6

Author(s):

João V. Graça ◽

Kuzman Ganchev ◽

Ben Taskar

Keyword(s):

Probabilistic Models ◽

Learning Algorithm ◽

Word Alignment ◽

Word Level ◽

Word Alignments ◽

Symmetry Constraints ◽

Critical Resource ◽

Complex Constraints ◽

Bilingual Text ◽

Efficient Learning

Word-level alignment of bilingual text is a critical resource for a growing variety of tasks. Probabilistic models for word alignment present a fundamental trade-off between richness of captured constraints and correlations versus efficiency and tractability of inference. In this article, we use the Posterior Regularization framework (Graça, Ganchev, and Taskar 2007) to incorporate complex constraints into probabilistic models during learning without changing the efficiency of the underlying model. We focus on the simple and tractable hidden Markov model, and present an efficient learning algorithm for incorporating approximate bijectivity and symmetry constraints. Models estimated with these constraints produce a significant boost in performance as measured by both precision and recall of manually annotated alignments for six language pairs. We also report experiments on two different tasks where word alignments are required: phrase-based machine translation and syntax transfer, and show promising improvements over standard methods.
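The Posterior Regularization objective referred to above is commonly stated in the following form. The symbols are assumptions introduced here for illustration: $\mathcal{L}(\theta)$ is the marginal log-likelihood, $Z$ the latent alignments, $\phi$ the constraint features (for example, counts that measure violations of bijectivity or symmetry), and $b$ the bounds on their expectations.

```latex
J_Q(\theta) = \mathcal{L}(\theta) - \min_{q \in Q} \mathrm{KL}\big(q(Z) \,\|\, p_\theta(Z \mid X)\big),
\qquad
Q = \{\, q : \mathbb{E}_q[\phi(X, Z)] \le b \,\}
```

The constraints act on the posterior distribution over alignments rather than on the model's parameterization, which is how the underlying hidden Markov model keeps its efficient inference.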

Adaptive Learning Recommendation Strategy Based on Deep Q-learning

Applied Psychological Measurement ◽

10.1177/0146621619858674 ◽

2019 ◽

Vol 44 (4) ◽

pp. 251-266 ◽

Cited By ~ 1

Author(s):

Chunxi Tan ◽

Ruijian Han ◽

Rougang Ye ◽

Kani Chen

Keyword(s):

Adaptive Learning ◽

Recommendation System ◽

Personalized Recommendation ◽

Early Stopping ◽

Q-Learning ◽

E-Learning ◽

Learning Scenarios ◽

Efficient Learning ◽

Recommendation Strategy ◽

Full Utilization

Personalized recommendation systems that adapt to each learner’s own learning pace have been widely adopted in the e-learning field. With full utilization of learning behavior data, psychometric assessment models keep track of the learner’s proficiency on knowledge points, and a well-designed recommendation strategy then selects a sequence of actions to maximize the learner’s learning efficiency. This article proposes a novel adaptive recommendation strategy under the framework of reinforcement learning. The proposed strategy is realized by deep Q-learning algorithms, which are the techniques that contributed to the success of AlphaGo Zero in achieving a super-human level in the game of Go. The proposed algorithm incorporates early stopping to account for the possibility that learners may choose to stop learning. It can properly deal with missing data and can handle more individual-specific features for better recommendations. The recommendation strategy guides individual learners along efficient learning paths that vary from person to person. The authors showcase concrete examples with numeric analysis of substantive learning scenarios to further demonstrate the power of the proposed method.

Self-Paced Robust Learning for Leveraging Clean Labels in Noisy Data

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6166 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6853-6860

Author(s):

Xuchao Zhang ◽

Xian Wu ◽

Fanglan Chen ◽

Liang Zhao ◽

Chang-Tien Lu

Keyword(s):

Real World ◽

Large Scale ◽

Learning Algorithm ◽

Noisy Data ◽

Training Set ◽

Robust Learning ◽

Robust Model ◽

Small Set ◽

Real World Datasets ◽

Theoretical Analyses

The success of training accurate models strongly depends on the availability of a sufficient collection of precisely labeled data. However, real-world datasets contain erroneously labeled data samples that substantially hinder the performance of machine learning models. Meanwhile, well-labeled data is usually expensive to obtain and only a limited amount is available for training. In this paper, we consider the problem of training a robust model by using large-scale noisy data in conjunction with a small set of clean data. To leverage the information contained in the clean labels, we propose a novel self-paced robust learning algorithm (SPRL) that trains the model in a process from more reliable (clean) data instances to less reliable (noisy) ones under the supervision of well-labeled data. The self-paced learning process hedges the risk of selecting corrupted data into the training set. Moreover, theoretical analyses on the convergence of the proposed algorithm are provided under mild assumptions. Extensive experiments on synthetic and real-world datasets demonstrate that our proposed approach can achieve a considerable improvement in effectiveness and robustness over existing methods.
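The general idea of training from reliable to less reliable samples under the supervision of a small clean set can be sketched with a classic hard-threshold self-paced loop on linear regression. This is not the SPRL algorithm from the paper. The function name `self_paced_fit`, the squared-loss threshold `lam`, its growth schedule, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def self_paced_fit(X, y, clean_idx, lam=0.01, growth=2.0, rounds=5):
    """Alternate between fitting on trusted samples and admitting low-loss samples."""
    selected = np.zeros(len(y), dtype=bool)
    selected[clean_idx] = True                    # start from the small clean set
    for _ in range(rounds):
        w, *_ = np.linalg.lstsq(X[selected], y[selected], rcond=None)
        losses = (X @ w - y) ** 2
        selected = losses < lam                   # admit samples the model finds easy
        selected[clean_idx] = True                # clean samples always supervise the fit
        lam *= growth                             # raise the pace: harder samples next round
    return w, selected

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(200, 3))
y = X @ w_true + 0.01 * rng.normal(size=200)
noisy = np.arange(200) % 10 < 3                   # corrupt 30% of the labels...
noisy[:20] = False                                # ...but keep the first 20 samples clean
y[noisy] += 5.0
w_hat, selected = self_paced_fit(X, y, clean_idx=np.arange(20))
```

In this toy setup the corrupted labels sit far from the model's predictions, so their loss never falls under the threshold and they stay out of the training set, while the clean samples are admitted in the first round.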
