Training and testing of recommender systems on data missing not at random

Author(s): Harald Steck

Epidemiology, 2018, Vol 29 (3), pp. 364-368
Author(s): Jessica R. Marden, Linbo Wang, Eric J. Tchetgen Tchetgen, Stefan Walter, M. Maria Glymour, ...

2017, Vol 28 (1), pp. 134-150
Author(s): Chi-hong Tseng, Yi-Hau Chen

In longitudinal studies, missing data commonly occur because of subjects' nonresponse, missed visits, dropout, death, or other reasons during the course of the study. To perform valid analysis in this setting, data missing not at random (MNAR) have to be considered. However, models for MNAR data often suffer from identifiability problems, which make estimation difficult and hinder computational convergence. To ameliorate this issue, we propose LASSO and ridge-regularized selection models that regularize the missing data mechanism model to handle MNAR data, with the regularization parameter selected via a cross-validation procedure. The proposed models can also be employed for sensitivity analysis to examine how different assumptions about the missing data mechanism affect inference. We illustrate the performance of the proposed models via simulation studies and the analysis of data from a randomized clinical trial.
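As a rough illustration of what regularizing the missing data mechanism can look like, the sketch below writes a penalized negative log-likelihood for a deliberately simplified, cross-sectional selection model. Everything specific here is an assumption made for illustration and is not taken from the paper: the outcome model y ~ N(mu, 1), the logistic mechanism P(observed | y) = expit(a0 + a1*y), the trapezoid integration, and the function names. The penalty is applied only to a1, the parameter that makes the mechanism depend on the possibly unobserved outcome.

```python
import math

def expit(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def normal_pdf(y, mu):
    # Density of N(mu, 1) at y (unit variance is an illustrative assumption).
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)

def prob_missing(mu, a0, a1, half_width=8.0, steps=2000):
    # P(missing) = integral of N(y; mu, 1) * (1 - expit(a0 + a1*y)) dy,
    # approximated with the trapezoid rule on [mu - 8, mu + 8].
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        y = mu - half_width + k * h
        f = normal_pdf(y, mu) * (1.0 - expit(a0 + a1 * y))
        total += f * (0.5 if k in (0, steps) else 1.0)
    return total * h

def penalized_nll(mu, a0, a1, y_obs, n_missing, lam, penalty="ridge"):
    # Observed subjects contribute f(y) * P(observed | y).
    nll = 0.0
    for y in y_obs:
        nll -= math.log(normal_pdf(y, mu)) + math.log(expit(a0 + a1 * y))
    # Each subject with a missing outcome contributes the marginal P(missing).
    nll -= n_missing * math.log(prob_missing(mu, a0, a1))
    # Penalize only a1: a1 = 0 means missingness does not depend on y.
    pen = a1 ** 2 if penalty == "ridge" else abs(a1)
    return nll + lam * pen
```

In a sketch like this, the objective would be minimized over (mu, a0, a1) for each candidate `lam`, with `lam` chosen by cross-validation as the abstract describes. Fixing a1 at several values instead gives a simple form of sensitivity analysis.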


2015, Vol 2015, pp. 1-13
Author(s): Xiangyu Zhao, Zhendong Niu, Kaiyi Wang, Ke Niu, Zhongqiang Liu

Recommender systems are increasingly important for coping with information overload. Data sparsity is a main challenge in this area: massive numbers of unrated items constitute missing data, with only a few observed ratings. Most studies treat missing data as unknown information and use only the observed data to learn models and generate recommendations. However, the data are missing not at random. Part of the missing data arises because users choose not to rate certain items, and these missing entries are negative examples of user preferences. Utilizing this information is expected to improve the performance of recommendation algorithms. Unfortunately, negative examples are mixed with unlabeled positive examples in the missing data, and the two are hard to distinguish. In this paper, we propose three schemes to utilize the negative examples in missing data. The schemes are then combined with SVD++, a state-of-the-art matrix factorization recommendation approach, to generate recommendations. Experimental results on two real datasets show that our proposed approaches achieve better top-N performance than the baselines in both accuracy and diversity.
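The abstract above does not spell out its three schemes, so the following is only a generic sketch of one common way to use missing entries as weak negative evidence: every unobserved user-item cell is given a low imputed target with a small weight, and a model is scored with a weighted squared error. The names `weighted_targets` and `weighted_squared_error`, and the defaults `r_missing` and `w_missing`, are hypothetical choices made for illustration.

```python
def weighted_targets(observed, n_users, n_items, r_missing=1.0, w_missing=0.1):
    """Build a (target, weight) pair for every user-item cell.

    observed maps (user, item) to a rating. Observed cells keep their
    rating with weight 1.0; missing cells get the low imputed rating
    r_missing with the small weight w_missing (both are assumptions).
    """
    cells = {}
    for u in range(n_users):
        for i in range(n_items):
            if (u, i) in observed:
                cells[(u, i)] = (observed[(u, i)], 1.0)
            else:
                cells[(u, i)] = (r_missing, w_missing)
    return cells

def weighted_squared_error(cells, predict):
    # Weighted loss over all cells; predict(u, i) returns a predicted rating.
    return sum(w * (t - predict(u, i)) ** 2 for (u, i), (t, w) in cells.items())

# Usage: two observed ratings in a 2 x 2 matrix, scored against a
# constant predictor that always returns 3.0.
observed = {(0, 0): 5.0, (1, 1): 4.0}
cells = weighted_targets(observed, n_users=2, n_items=2)
loss = weighted_squared_error(cells, lambda u, i: 3.0)
```

Setting `w_missing` to 0 recovers the usual practice of training on observed ratings only, while larger values treat missing entries more strongly as negatives. A matrix factorization model could be fit by minimizing this loss, which is not shown here.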


2021, pp. 284-298
Author(s): Julissa Villanueva Llerena, Denis Deratani Mauá, Alessandro Antonucci

2018
Author(s): Eric Tchetgen Tchetgen, Baoluo Sun, Lan Liu, Wang Miao, Kathleen Wirth, ...

2017, Vol 18 (2), pp. 113-128
Author(s): Juho Kopra, Juha Karvanen, Tommi Härkänen

In epidemiological surveys, data missing not at random (MNAR) due to survey nonresponse may bias the risk factor estimates. We propose an approach based on Bayesian data augmentation and survival modelling to reduce the nonresponse bias. The approach requires additional information based on follow-up data. We present a case study of smoking prevalence using FINRISK data collected between 1972 and 2007 with follow-up to the end of 2012, and compare our approach to commonly applied missing at random (MAR) imputation approaches. A simulation experiment is carried out to study the validity of the approaches. Our approach appears to reduce the nonresponse bias substantially, whereas MAR imputation did not succeed in reducing the bias.

