Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests

2020 ◽  
Vol 34 (04) ◽  
pp. 6518-6525
Author(s):  
Xiao Xu ◽  
Fang Dong ◽  
Yanghua Li ◽  
Shaojian He ◽  
Xin Li

A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. Two models with disjoint and hybrid payoffs are considered to characterize the phenomenon that users' preferences towards different items vary differently over time. In the disjoint payoff model, the reward of playing an arm is determined by an arm-specific preference vector, which is piecewise-stationary with asynchronous and distinct changes across different arms. An efficient learning algorithm that is adaptive to abrupt reward changes is proposed and theoretical regret analysis is provided to show that a sublinear scaling of regret in the time length T is achieved. The algorithm is further extended to a more general setting with hybrid payoffs where the reward of playing an arm is determined by both an arm-specific preference vector and a joint coefficient vector shared by all arms. Empirical experiments are conducted on real-world datasets to verify the advantages of the proposed learning algorithms against baseline ones in both settings.
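To make the disjoint payoff setting concrete, the following is a minimal sketch rather than the authors' algorithm. It keeps one ridge-regression model per arm, scores arms with a LinUCB-style upper confidence bound, and restarts a single arm when its recent prediction error grows large. The class name `ArmModel`, the function `select_arm`, and the `window` and `threshold` parameters of the reset test are assumptions introduced here for illustration; the abstract does not describe the paper's change-detection rule.

```python
import math


class ArmModel:
    """Ridge-regression state for one arm (disjoint payoff: nothing is shared)."""

    def __init__(self, dim, ridge=1.0):
        self.dim = dim
        self.ridge = ridge
        self.reset()

    def reset(self):
        # A_inv is the inverse of (ridge * I + sum of x x^T); b is sum of reward * x.
        self.A_inv = [[(1.0 / self.ridge if i == j else 0.0)
                       for j in range(self.dim)] for i in range(self.dim)]
        self.b = [0.0] * self.dim
        self.recent_errors = []

    def _times_A_inv(self, x):
        return [sum(self.A_inv[i][j] * x[j] for j in range(self.dim))
                for i in range(self.dim)]

    def theta(self):
        # Current estimate of this arm's preference vector.
        return self._times_A_inv(self.b)

    def ucb(self, x, alpha):
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        quad = sum(xi * a for xi, a in zip(x, self._times_A_inv(x)))
        return mean + alpha * math.sqrt(max(quad, 0.0))

    def update(self, x, reward, window=20, threshold=0.5):
        # Record the prediction error before learning from this sample.
        err = abs(reward - sum(t * xi for t, xi in zip(self.theta(), x)))
        self.recent_errors = (self.recent_errors + [err])[-window:]
        full = len(self.recent_errors) == window
        if full and sum(self.recent_errors) / window > threshold:
            # Hypothetical change test: restart this arm only, so changes
            # on different arms are handled asynchronously.
            self.reset()
        # Sherman-Morrison rank-one update of A_inv (A_inv is symmetric).
        Ax = self._times_A_inv(x)
        denom = 1.0 + sum(xi * a for xi, a in zip(x, Ax))
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= Ax[i] * Ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_arm(models, x, alpha=1.0):
    """Pick the arm with the largest upper confidence bound for context x."""
    scores = [m.ucb(x, alpha) for m in models]
    return max(range(len(models)), key=scores.__getitem__)
```

Resetting only the affected arm is one simple way to respect changes that occur at different times on different arms; a hybrid payoff variant would additionally need a coefficient vector shared across arms, which this sketch omits.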

Machines ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 319
Author(s):  
Yi-Liang Yeh ◽  
Po-Kai Yang

This paper presents innovative reinforcement learning methods for automatically tuning the parameters of a proportional integral derivative controller. Conventionally, the high dimension of the Q-table is a primary drawback when implementing a reinforcement learning algorithm. To overcome the obstacle, the idea underlying the n-armed bandit problem is used in this paper. Moreover, gain-scheduled actions are presented to tune the algorithms to improve the overall system behavior; therefore, the proposed controllers fulfill the multiple performance requirements. An experiment was conducted for the piezo-actuated stage to illustrate the effectiveness of the proposed control designs relative to competing algorithms.
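The bandit idea can be illustrated with a small sketch, which is not the paper's method: each arm is one candidate set of PID gains, and the reward is the negative tracking cost of a simulated step response. The toy first-order plant, the candidate gain sets, the noise level, and the function names `step_response_cost` and `tune_gains` are all assumptions made for illustration; the gain-scheduled actions mentioned in the abstract are not modeled.

```python
import random


def step_response_cost(kp, ki, kd, steps=200, dt=0.01):
    """Integrated absolute error of a PID loop on a toy plant dy/dt = -y + u."""
    y, integral, prev_err, cost = 0.0, 0.0, 1.0, 0.0
    for _ in range(steps):
        err = 1.0 - y                      # unit step setpoint
        integral += err * dt
        derivative = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * derivative
        y += (-y + u) * dt                 # forward-Euler plant update
        prev_err = err
        cost += abs(err) * dt
    return cost


def tune_gains(candidates, episodes=200, epsilon=0.1, seed=0):
    """Epsilon-greedy n-armed bandit over a fixed list of (kp, ki, kd) tuples."""
    rng = random.Random(seed)
    counts = [0] * len(candidates)
    values = [0.0] * len(candidates)       # running mean reward per arm
    for t in range(episodes):
        if t < len(candidates):
            arm = t                        # try every arm once first
        elif rng.random() < epsilon:
            arm = rng.randrange(len(candidates))
        else:
            arm = max(range(len(candidates)), key=values.__getitem__)
        # Reward is the negative cost plus simulated measurement noise.
        reward = -step_response_cost(*candidates[arm]) + rng.gauss(0.0, 0.01)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    best = max(range(len(candidates)), key=values.__getitem__)
    return candidates[best], values


if __name__ == "__main__":
    gains = [(1.0, 0.0, 0.0), (5.0, 1.0, 0.0), (20.0, 10.0, 0.0)]
    print(tune_gains(gains))
```

Because the learner stores only one value estimate per arm, its memory grows with the number of candidate gain sets rather than with a state-action table, which is the dimensionality advantage the abstract alludes to.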


Author(s):  
S N Huang ◽  
K K Tan ◽  
T H Lee

A novel iterative learning controller for linear time-varying systems is developed. The learning law is derived on the basis of a quadratic criterion. This control scheme does not include package information. The advantage of the proposed learning law is that the convergence is guaranteed without the need for empirical choice of parameters. Furthermore, the tracking error on the final iteration will be a class K function of the bounds on the uncertainties. Finally, simulation results reveal that the proposed control has a good setpoint tracking performance.


2017 ◽  
Vol 45 (3) ◽  
pp. 130-138 ◽  
Author(s):  
Basit Shahzad ◽  
Ikramullah Lali ◽  
M. Saqib Nawaz ◽  
Waqar Aslam ◽  
Raza Mustafa ◽  
...  

Purpose: Twitter users’ generated data, known as tweets, are now not only used for communication and opinion sharing, but they are considered an important source of trendsetting, future prediction, recommendation systems and marketing. Using network features in tweet modeling and applying data mining and deep learning techniques on tweets is gaining more and more interest. Design/methodology/approach: In this paper, user interests are discovered from Twitter Trends using a modeling approach that uses network-based text data (tweets). First, the popular trends are collected and stored in separate documents. These data are then pre-processed, followed by their labeling in respective categories. Data are then modeled and user interest for each Trending topic is calculated by considering positive tweets in that trend, average retweet and favorite count. Findings: The proposed approach can be used to infer users’ topics of interest on Twitter and to categorize them. Support vector machine can be used for training and validation purposes. Positive tweets can be further analyzed to find user posting patterns. There is a positive correlation between tweets and Google data. Practical implications: The results can be used in the development of information filtering and prediction systems, especially in personalized recommendation systems. Social implications: Twitter microblogging platform offers content posting and sharing to billions of internet users worldwide. Therefore, this work has significant socioeconomic impacts. Originality/value: This study guides on how Twitter network structure features can be exploited in discovering user interests using tweets. Further, positive correlation of Twitter Trends with Google Trends is reported, which validates the correctness of the authors’ approach.


Author(s):  
Pinar Demetci ◽  
Rebecca Santorella ◽  
Björn Sandstede ◽  
William Stafford Noble ◽  
Ritambhara Singh

Data integration of single-cell measurements is critical for understanding cell development and disease, but the lack of correspondence between different types of measurements makes such efforts challenging. Several unsupervised algorithms can align heterogeneous single-cell measurements in a shared space, enabling the creation of mappings between single cells in different data domains. However, these algorithms require hyperparameter tuning for high-quality alignments, which is difficult in an unsupervised setting without correspondence information for validation. We present Single-Cell alignment using Optimal Transport (SCOT), an unsupervised learning algorithm that uses Gromov Wasserstein-based optimal transport to align single-cell multi-omics datasets. We compare the alignment performance of SCOT with state-of-the-art algorithms on four simulated and two real-world datasets. SCOT performs on par with state-of-the-art methods but is faster and requires tuning fewer hyperparameters. Furthermore, we provide an algorithm for SCOT to use Gromov Wasserstein distance to guide the parameter selection. Thus, unlike previous methods, SCOT aligns well without using any orthogonal correspondence information to pick the hyperparameters. Our source code and scripts for replicating the results are available at https://github.com/rsinghlab/SCOT.


Author(s):  
Yun-Peng Liu ◽  
Ning Xu ◽  
Yu Zhang ◽  
Xin Geng

The performances of deep neural networks (DNNs) crucially rely on the quality of labeling. In some situations, labels are easily corrupted, and therefore some labels become noisy labels. Thus, designing algorithms that deal with noisy labels is of great importance for learning robust DNNs. However, it is difficult to distinguish between clean labels and noisy labels, which becomes the bottleneck of many methods. To address the problem, this paper proposes a novel method named Label Distribution based Confidence Estimation (LDCE). LDCE estimates the confidence of the observed labels based on label distribution. Then, the boundary between clean labels and noisy labels becomes clear according to confidence scores. To verify the effectiveness of the method, LDCE is combined with the existing learning algorithm to train robust DNNs. Experiments on both synthetic and real-world datasets substantiate the superiority of the proposed algorithm against state-of-the-art methods.


2010 ◽  
Vol 36 (3) ◽  
pp. 481-504 ◽  
Author(s):  
João V. Graça ◽  
Kuzman Ganchev ◽  
Ben Taskar

Word-level alignment of bilingual text is a critical resource for a growing variety of tasks. Probabilistic models for word alignment present a fundamental trade-off between richness of captured constraints and correlations versus efficiency and tractability of inference. In this article, we use the Posterior Regularization framework (Graça, Ganchev, and Taskar 2007) to incorporate complex constraints into probabilistic models during learning without changing the efficiency of the underlying model. We focus on the simple and tractable hidden Markov model, and present an efficient learning algorithm for incorporating approximate bijectivity and symmetry constraints. Models estimated with these constraints produce a significant boost in performance as measured by both precision and recall of manually annotated alignments for six language pairs. We also report experiments on two different tasks where word alignments are required: phrase-based machine translation and syntax transfer, and show promising improvements over standard methods.


2019 ◽  
Vol 44 (4) ◽  
pp. 251-266 ◽  
Author(s):  
Chunxi Tan ◽  
Ruijian Han ◽  
Rougang Ye ◽  
Kani Chen

Personalized recommendation systems that adapt to each learner’s own learning pace have been widely adopted in the E-learning field. With full utilization of learning behavior data, psychometric assessment models keep track of the learner’s proficiency on knowledge points, and a well-designed recommendation strategy then selects a sequence of actions to maximize the learner’s learning efficiency. This article proposes a novel adaptive recommendation strategy under the framework of reinforcement learning. The proposed strategy is realized by deep Q-learning algorithms, the techniques that contributed to the success of AlphaGo Zero in reaching a super-human level in the game of Go. The proposed algorithm incorporates early stopping to account for the possibility that learners may choose to stop learning. It can properly deal with missing data and can handle more individual-specific features for better recommendations. The recommendation strategy guides individual learners with efficient learning paths that vary from person to person. The authors showcase concrete examples with numeric analysis of substantive learning scenarios to further demonstrate the power of the proposed method.


2020 ◽  
Vol 34 (04) ◽  
pp. 6853-6860
Author(s):  
Xuchao Zhang ◽  
Xian Wu ◽  
Fanglan Chen ◽  
Liang Zhao ◽  
Chang-Tien Lu

The success of training accurate models strongly depends on the availability of a sufficient collection of precisely labeled data. However, real-world datasets contain erroneously labeled data samples that substantially hinder the performance of machine learning models. Meanwhile, well-labeled data is usually expensive to obtain and only a limited amount is available for training. In this paper, we consider the problem of training a robust model by using large-scale noisy data in conjunction with a small set of clean data. To leverage the information contained in the clean labels, we propose a novel self-paced robust learning algorithm (SPRL) that trains the model in a process from more reliable (clean) data instances to less reliable (noisy) ones under the supervision of well-labeled data. The self-paced learning process hedges the risk of selecting corrupted data into the training set. Moreover, theoretical analyses on the convergence of the proposed algorithm are provided under mild assumptions. Extensive experiments on synthetic and real-world datasets demonstrate that our proposed approach can achieve a considerable improvement in effectiveness and robustness over existing methods.
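The general self-paced idea of moving from reliable to less reliable samples can be sketched as follows. This is a generic illustration under stated assumptions and not the SPRL algorithm itself: the model is a one-parameter least-squares fit, the clean set is always kept in the training pool, and noisy samples are admitted only while their squared loss stays below a threshold that grows each round. The function names and the `start`, `growth`, and `rounds` parameters are hypothetical.

```python
def fit_slope(points):
    """Least-squares slope of y = w * x through the origin."""
    num = sum(x * y for x, y in points)
    den = sum(x * x for x, _ in points)
    return num / den if den else 0.0


def self_paced_fit(clean, noisy, start=0.5, growth=1.5, rounds=5):
    """Alternate between selecting easy noisy samples and refitting the model."""
    w = fit_slope(clean)                   # initial model from clean data only
    threshold = start
    selected = []
    for _ in range(rounds):
        # Admit noisy samples whose squared loss under the current model is small.
        selected = [(x, y) for x, y in noisy if (y - w * x) ** 2 <= threshold]
        # Refit on clean data plus the currently trusted noisy samples.
        w = fit_slope(clean + selected)
        threshold *= growth                # gradually admit harder samples
    return w, selected


if __name__ == "__main__":
    clean = [(1.0, 2.0), (2.0, 4.0)]
    noisy = [(3.0, 6.1), (4.0, 7.9), (5.0, -10.0), (6.0, 30.0)]
    print(self_paced_fit(clean, noisy))
```

In this toy run the two grossly mislabeled points never fall under the threshold, so they are excluded from every refit, while the two mildly noisy points are used alongside the clean set.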

