Training and testing of recommender systems on data missing not at random

It is common in longitudinal studies for data to be missing because of subjects' nonresponse, missed visits, dropout, death or other reasons during the course of the study. To perform valid analysis in this setting, data missing not at random (MNAR) have to be considered. However, models for MNAR data often suffer from identifiability issues, which lead to difficulties in estimation and computational convergence. To ameliorate this issue, we propose LASSO- and ridge-regularized selection models that regularize the missing-data mechanism model to handle MNAR data, with the regularization parameter selected via cross-validation. The proposed models can also be employed for sensitivity analysis, to examine how different assumptions about the missing-data mechanism affect inference. We illustrate the performance of the proposed models via simulation studies and the analysis of data from a randomized clinical trial.
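The abstract does not give the model equations, so the following is only a minimal sketch under assumed choices: a single normally distributed outcome (not longitudinal data) and a logistic missingness model whose coefficient `b` on the possibly unobserved outcome is what makes the data MNAR. The hypothetical function `penalized_nll` evaluates the observed-data negative log-likelihood of this selection model plus a ridge (λb²) or LASSO (λ|b|) penalty on `b`; maximizing it and choosing λ by cross-validation are left out.

```python
import math

def log_expit(t):
    """Numerically stable log of the logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))

def log_normal_pdf(y, mu, sigma):
    return -0.5 * ((y - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def penalized_nll(y, r, mu, sigma, a, b, lam, penalty="ridge", grid_size=401):
    """Penalized negative log-likelihood of a toy selection model (hypothetical).

    Outcome model:     Y ~ N(mu, sigma^2)
    Missingness model: P(R = 1 | Y = y) = expit(a + b * y), where R = 1 means observed.
    b != 0 makes the data MNAR; the penalty ("ridge" or "lasso") shrinks b toward 0 (MAR).
    y holds None wherever r == 0.
    """
    # P(R = 0) = integral of f(y) * {1 - expit(a + b*y)} dy,
    # approximated by the trapezoidal rule on mu +/- 8 sigma.
    lo, hi = mu - 8.0 * sigma, mu + 8.0 * sigma
    h = (hi - lo) / (grid_size - 1)
    vals = []
    for k in range(grid_size):
        t = lo + k * h
        # 1 - expit(s) == expit(-s)
        vals.append(math.exp(log_normal_pdf(t, mu, sigma) + log_expit(-(a + b * t))))
    p_missing = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

    loglik = 0.0
    for yi, ri in zip(y, r):
        if ri == 1:
            # Observed: outcome density times probability of being observed.
            loglik += log_normal_pdf(yi, mu, sigma) + log_expit(a + b * yi)
        else:
            # Missing: outcome integrated out.
            loglik += math.log(p_missing)
    penalty_term = lam * abs(b) if penalty == "lasso" else lam * b * b
    return -loglik + penalty_term
```

With λ = 0 this is the unpenalized selection-model likelihood, which is often nearly flat in `b`; increasing λ pulls the fit toward `b = 0`, that is, toward MAR.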


Semiparametric maximum likelihood estimation with data missing not at random

Canadian Journal of Statistics, Vol 45 (4), pp. 393-409, 2017

DOI: 10.1002/cjs.11340

Cited By ~ 7

Author(s): Kosuke Morikawa, Jae Kwang Kim, Yutaka Kano

Keyword(s): Maximum Likelihood, Maximum Likelihood Estimation, Likelihood Estimation, Missing Not At Random, Data Missing


Robust estimation for moment condition models with data missing not at random

Journal of Statistical Planning and Inference, Vol 207, pp. 246-254, 2020

DOI: 10.1016/j.jspi.2020.01.001

Cited By ~ 1

Author(s): Wei Li, Shu Yang, Peisong Han

Keyword(s): Robust Estimation, Moment Condition, Missing Not At Random, Data Missing


Improving Top-N Recommendation Performance Using Missing Data

Mathematical Problems in Engineering, Vol 2015, pp. 1-13, 2015

DOI: 10.1155/2015/380472

Cited By ~ 4

Author(s): Xiangyu Zhao, Zhendong Niu, Kaiyi Wang, Ke Niu, Zhongqiang Liu

Keyword(s): Missing Data, Recommender Systems, Matrix Factorization, State Of The Art, User Preferences, Missing Not At Random, Main Challenge, Recommendation Algorithms, Random Part, Problem Data

Recommender systems are becoming increasingly important for addressing the information explosion problem. Data sparsity is a major challenge in this area: the massive number of unrated items constitutes missing data, with only a few ratings observed. Most studies treat missing data as unknown information and use only the observed data to learn models and generate recommendations. However, the data are missing not at random. Part of the missing data arises because users chose not to rate those items, and this part constitutes negative examples of user preferences. Exploiting this information is expected to improve the performance of recommendation algorithms. Unfortunately, these negative examples are mixed with unlabeled positive examples in the missing data, and the two are hard to distinguish. In this paper, we propose three schemes to utilize the negative examples in missing data. The schemes are then adapted to SVD++, a state-of-the-art matrix factorization recommendation approach, to generate recommendations. Experimental results on two real datasets show that our proposed approaches achieve better top-N performance than the baselines in both accuracy and diversity.
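The abstract does not spell out the three schemes, so the code below does not reproduce them, and it uses plain matrix factorization rather than SVD++. It is a minimal sketch of one generic way to treat missing data as negative examples: for each observed rating, sample a few unrated items for the same user and fit them toward 0 with a reduced weight, since some of them are unlabeled positives. The function names, the rescaling of ratings to (0, 1] and all default hyperparameters are assumptions for illustration.

```python
import random

def train_mf_with_sampled_negatives(ratings, n_users, n_items, k=8, neg_per_pos=2,
                                    neg_weight=0.3, lr=0.05, reg=0.02, epochs=100, seed=0):
    """Hypothetical MF trained by SGD on observed ratings plus sampled unrated
    items treated as weak negatives (target 0, weight neg_weight < 1).

    ratings: dict mapping (user, item) -> rating rescaled to (0, 1].
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    rated = {u: set() for u in range(n_users)}
    for u, i in ratings:
        rated[u].add(i)

    def sgd_step(u, i, target, weight):
        pred = sum(P[u][f] * Q[i][f] for f in range(k))
        err = weight * (target - pred)
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)

    observed = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(observed)
        for (u, i), r in observed:
            sgd_step(u, i, r, 1.0)                      # observed rating, full weight
            unrated = [j for j in range(n_items) if j not in rated[u]]
            for j in rng.sample(unrated, min(neg_per_pos, len(unrated))):
                sgd_step(u, j, 0.0, neg_weight)         # missing entry as weak negative
    return P, Q, rated

def score(P, Q, u, i):
    return sum(a * b for a, b in zip(P[u], Q[i]))

def top_n(P, Q, rated, u, n):
    """Rank the items user u has not rated by predicted score."""
    candidates = [i for i in range(len(Q)) if i not in rated[u]]
    return sorted(candidates, key=lambda i: score(P, Q, u, i), reverse=True)[:n]

# Two hypothetical taste groups: users 0-4 rate items 0-3, users 5-9 rate items 5-8.
ratings = {(u, i): 1.0 for u in range(5) for i in range(4)}
ratings.update({(u, i): 1.0 for u in range(5, 10) for i in range(5, 9)})
P, Q, rated = train_mf_with_sampled_negatives(ratings, n_users=10, n_items=10)
print(top_n(P, Q, rated, 0, 3))
```

The reduced `neg_weight` is the design choice that reflects the mixing problem described above: a sampled unrated item is only weak evidence of dislike.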


Cautious Classification with Data Missing Not at Random Using Generative Random Forests

DOI: 10.1007/978-3-030-86772-0_21

2021, pp. 284-298

Author(s): Julissa Villanueva Llerena, Denis Deratani Mauá, Alessandro Antonucci

Keyword(s): Random Forests, Missing Not At Random, Data Missing


Semiparametric Estimation with Data Missing Not at Random Using an Instrumental Variable

Statistica Sinica, 2018

DOI: 10.5705/ss.202016.0324

Author(s): Eric Tchetgen Tchetgen, Baoluo Sun, Lan Liu, Wang Miao, Kathleen Wirth, ...

Keyword(s): Instrumental Variable, Semiparametric Estimation, Missing Not At Random, Data Missing


Bayesian models for data missing not at random in health examination surveys

Statistical Modelling, Vol 18 (2), pp. 113-128, 2017

DOI: 10.1177/1471082x17722605

Cited By ~ 3

Author(s): Juho Kopra, Juha Karvanen, Tommi Härkänen

Keyword(s): Data Augmentation, Missing At Random, Bias Reduction, Nonresponse Bias, Health Examination, Missing Not At Random, Additional Information, Epidemiological Surveys, Data Missing

In epidemiological surveys, data missing not at random (MNAR) because of survey nonresponse may bias the risk factor estimates. We propose an approach based on Bayesian data augmentation and survival modelling to reduce the nonresponse bias. The approach requires additional information from follow-up data. We present a case study of smoking prevalence using FINRISK data collected between 1972 and 2007, with follow-up to the end of 2012, and compare our approach with other commonly applied missing at random (MAR) imputation approaches. A simulation experiment is carried out to study the validity of the approaches. Our approach appears to reduce the nonresponse bias substantially, whereas MAR imputation did not succeed in reducing the bias.
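The Bayesian data-augmentation and survival-modelling approach, and its use of follow-up data, is not reproduced here. The toy simulation below only illustrates the mechanism behind the last finding: when nonresponse depends on the unobserved smoking status itself, both the complete-case estimate and a MAR imputation that conditions on an observed covariate underestimate prevalence. The covariate, the response probabilities and the function names are invented for illustration and are not FINRISK values.

```python
import random

def simulate_survey(n=100_000, seed=1):
    """Hypothetical survey where nonresponse depends on smoking status itself (MNAR).

    Returns (x, s, responded) tuples: x is a binary covariate that is always
    observed; s (smoking) is observed only if responded is True.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        s = 1 if rng.random() < 0.2 + 0.2 * x else 0          # prevalence 0.2 (x=0), 0.4 (x=1)
        responded = rng.random() < (0.5 if s == 1 else 0.8)   # smokers respond less often
        rows.append((x, s, responded))
    return rows

def prevalence_estimates(rows):
    """Return (true prevalence, complete-case estimate, MAR-imputation estimate)."""
    true_prev = sum(s for _, s, _ in rows) / len(rows)   # oracle: also uses nonrespondents' s
    resp = [(x, s) for x, s, responded in rows if responded]
    complete_case = sum(s for _, s in resp) / len(resp)
    # MAR imputation: each nonrespondent gets the respondents' prevalence in its x stratum.
    stratum_prev = {}
    for xv in (0, 1):
        sx = [s for x, s in resp if x == xv]
        stratum_prev[xv] = sum(sx) / len(sx)
    mar = sum(s if responded else stratum_prev[x] for x, s, responded in rows) / len(rows)
    return true_prev, complete_case, mar

truth, cc, mar = prevalence_estimates(simulate_survey())
print(f"true {truth:.3f}  complete-case {cc:.3f}  MAR-imputed {mar:.3f}")
```

Under these assumed probabilities the true prevalence is 0.30, while in expectation the complete-case estimate is 0.15/0.71 ≈ 0.211 and the MAR-imputed estimate is about 0.215: conditioning on `x` cannot correct a bias that operates within each stratum.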
