Evaluating Recommender Systems
Replicating the results of recommender system evaluations is one of the main concerns in the area. This paper discusses the issue from three angles: 1) it investigates how uniform the evaluation designs of recommenders presented in practice are and how consistent they are with theory; 2) it highlights some of the issues and challenges that face evaluators of recommenders; and 3) it provides stepwise guidelines for offline evaluation settings. A quantitative study of articles published in the last decade is conducted. The search process combines a manual search of a conference with a random search of journals. The results show a lack of uniformity and consistency in how evaluation methods are presented. Most of the articles omit at least one evaluation aspect (i.e., some aspects are not reported in the article). These discrepancies and the wide variety of evaluation settings lead to non-replicable experiments. To mitigate this issue, the paper proposes the recommender evaluation guidelines (REval), which provide a roadmap for evaluators of recommender systems.