Flatter Is Better

It is well known that explicit user ratings in recommender systems are biased toward high ratings and that users differ significantly in their usage of the rating scale. Implementers usually compensate for these issues through rating normalization or the inclusion of a user bias term in factorization models. However, these methods adjust only for the central tendency of users’ distributions. In this work, we demonstrate that a lack of flatness in rating distributions is negatively correlated with recommendation performance. We propose a rating transformation model that compensates for skew in the rating distribution as well as its central tendency by converting ratings into percentile values as a pre-processing step before recommendation generation. This transformation flattens the rating distribution, better compensates for differences in rating distributions, and improves recommendation performance. We also show that a smoothed version of this transformation can yield more intuitive results for users with very narrow rating distributions. A comprehensive set of experiments, with state-of-the-art recommendation algorithms on four real-world datasets, shows improved ranking performance for these percentile transformations.
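
The percentile idea described above can be illustrated with a minimal sketch. It assumes a mid-rank definition of percentile (ratings strictly below, plus half of the ties, divided by the profile size); the abstract does not give the exact formula or the smoothing scheme, and the function name `percentile_transform` is hypothetical.

```python
from collections import Counter


def percentile_transform(ratings):
    """Map each rating in ONE user's profile to a mid-rank percentile.

    Assumption (not taken from the paper): the percentile of a rating r is
    100 * (number of ratings below r + half the ratings equal to r) / n.
    """
    n = len(ratings)
    if n == 0:
        return []
    counts = Counter(ratings)

    # For every distinct rating value, count how many ratings lie strictly below it.
    below = {}
    running = 0
    for value in sorted(counts):
        below[value] = running
        running += counts[value]

    return [100.0 * (below[r] + 0.5 * counts[r]) / n for r in ratings]


# A user who gives almost everything the top rating.
print(percentile_transform([5, 5, 4, 5]))  # [62.5, 62.5, 12.5, 62.5]
```

Under this definition a 5 from a user who rates nearly everything 5 lands near the middle of the scale, while the same user's rare 4 is pushed toward the bottom, which is the flattening effect the abstract refers to.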


Novel Collaborative Filtering Recommender Friendly to Privacy Protection

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/668 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jun Wang ◽

Qiang Tang ◽

Afonso Arriaga ◽

Peter Y. A. Ryan

Keyword(s):

Privacy Protection ◽

State Of The Art ◽

Research Question ◽

Huge Amount ◽

Recommendation Algorithm ◽

Recommendation Algorithms ◽

Very Large Datasets ◽

Recommendation Accuracy ◽

Real World Datasets ◽

Indispensable Tool

Nowadays, the recommender system is an indispensable tool in many information services, and a large number of algorithms have been designed and implemented. However, when fed with very large datasets, state-of-the-art recommendation algorithms often face an efficiency bottleneck, i.e., training a recommendation model takes a huge amount of computing resources. When the needs of privacy-savvy users who do not want to disclose their information to the service provider must also be satisfied, the complexity of most existing solutions becomes prohibitive. As such, it is an interesting research question to design simple and efficient recommendation algorithms that achieve reasonable accuracy and facilitate privacy protection at the same time. In this paper, we propose an efficient recommendation algorithm, named CryptoRec, which has two nice properties: (1) it can estimate a new user's preferences by directly using a model pre-learned from an expert dataset, so the new user's data is not required to train the model; (2) it can compute recommendations with only addition and multiplication operations. For the evaluation, we first test the recommendation accuracy on three real-world datasets and show that CryptoRec is competitive with state-of-the-art recommenders. Then, we evaluate the performance of the privacy-preserving variants of CryptoRec and show that predictions can be computed in seconds on a PC. In contrast, existing solutions need tens or hundreds of hours on more powerful computers.


Exploring Clustering-Based Reinforcement Learning for Personalized Book Recommendation in Digital Library

Information ◽

10.3390/info12050198 ◽

2021 ◽

Vol 12 (5) ◽

pp. 198

Author(s):

Xinhua Wang ◽

Yuchen Wang ◽

Lei Guo ◽

Liancheng Xu ◽

Baozhong Gao ◽

...

Keyword(s):

Reinforcement Learning ◽

Digital Library ◽

Learning Algorithm ◽

State Of The Art ◽

Decision Making Process ◽

Learning Method ◽

Large Collection ◽

Recommendation Algorithms ◽

Small Set ◽

Real World Datasets

The digital library, as one of the most important ways of helping students acquire professional knowledge and improve their professional level, has gained great attention in recent years. However, its large collection (especially the book resources) hinders students from finding the resources that they are interested in. To overcome this challenge, many researchers have already turned to recommendation algorithms. Compared with traditional recommendation tasks, book recommendation in the digital library faces two challenges. The first is that users may borrow books that they are not interested in (i.e., noisy borrowing behaviours), such as borrowing books for classmates. The second is that the number of books in a digital library is usually very large, which means that any one student's borrowing history covers only a small set of books (i.e., the data sparsity issue). As the noisy interactions in students’ borrowing sequences may harm the recommendation performance of a book recommender, we focus on refining recommendations by filtering out noisy data. Moreover, due to the lack of direct supervision information, we treat noise filtering in sequences as a decision-making process and introduce a reinforcement learning method as our recommendation framework. Furthermore, to overcome the sparsity issue of students’ borrowing behaviours, a clustering-based reinforcement learning algorithm is developed. Experimental results on two real-world datasets demonstrate the superiority of our proposed method compared with several state-of-the-art recommendation methods.


Auxiliary Information-Enhanced Recommendations

Applied Sciences ◽

10.3390/app11198830 ◽

2021 ◽

Vol 11 (19) ◽

pp. 8830

Author(s):

Shoujin Wang ◽

Wanggen Wan ◽

Tong Qu ◽

Yanqiu Dong

Keyword(s):

Real World ◽

State Of The Art ◽

Auxiliary Information ◽

Recommendation Algorithm ◽

Textual Information ◽

Image Information ◽

Sequential Dependencies ◽

Recommendation Algorithms ◽

The Rich ◽

Real World Datasets

Sequential recommendations have attracted increasing attention from both academia and industry in recent years. They predict a given user’s next choice of items mainly by modeling the sequential relations over a sequence of the user’s interactions with the items. However, most of the existing sequential recommendation algorithms focus on the sequential dependencies between item IDs within sequences, while ignoring the rich and complex relations embedded in the auxiliary information, such as items’ image information and textual information. Such complex relations can help us better understand users’ preferences towards items, and thus benefit the recommendations. To bridge this gap, we propose an auxiliary information-enhanced sequential recommendation algorithm called memory fusion network for recommendation (MFN4Rec) to incorporate both items’ image and textual information for sequential recommendations. Accordingly, item IDs, item image information and item textual information are regarded as three modalities. By comprehensively modelling the sequential relations within modalities and interaction relations across modalities, MFN4Rec can learn a more informative representation of users’ preferences for more accurate recommendations. Extensive experiments on two real-world datasets demonstrate the superiority of MFN4Rec over state-of-the-art sequential recommendation algorithms.


Understanding Users' Budgets for Recommendation with Hierarchical Poisson Factorization

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/247 ◽

2017 ◽

Cited By ~ 1

Author(s):

Yunhui Guo ◽

Congfu Xu ◽

Hanzhang Song ◽

Xin Wang

Keyword(s):

Recommender Systems ◽

Real World ◽

Online Shopping ◽

State Of The Art ◽

Generative Model ◽

Personal Consumption ◽

Proposed Model ◽

Recommendation Algorithms ◽

Real World Datasets ◽

Consumption Habits

People consume and rate products on online shopping websites. The historical purchases of customers reflect their personal consumption habits and indicate their future shopping behaviors. Traditional preference-based recommender systems try to provide recommendations by analyzing users' feedback such as ratings and clicks. Unfortunately, most of the existing recommendation algorithms ignore the budget of the users. As a result, they cannot avoid recommending products that exceed users' budgets, and they cannot capture how users allocate their budgets across different products. In this paper, we develop a generative model named collaborative budget-aware Poisson factorization (CBPF) to connect users' ratings and budgets. The CBPF model is intuitive and highly interpretable. We compare the proposed model with several state-of-the-art budget-unaware recommendation methods on several real-world datasets. The results show the advantage of uncovering users' budgets for recommendation.


G-Tric: generating three-way synthetic datasets with triclustering solutions

BMC Bioinformatics ◽

10.1186/s12859-020-03925-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

João Lobo ◽

Rui Henriques ◽

Sara C. Madeira

Keyword(s):

State Of The Art ◽

Synthetic Data ◽

Ground Truth ◽

Real Data ◽

Three Dimensions ◽

Additional Advantage ◽

Urban Dynamics ◽

Data Generator ◽

Real World Datasets ◽

Synthetic Datasets

Background: Three-way data started to gain popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions along time, urban dynamics, or complex geophysical phenomena. Triclustering, subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data, without a known ground truth, thus limiting the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of further providing the ground truth (triclustering solution) as output. Results: G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. Conclusions: Triclustering evaluation using G-Tric provides the possibility to combine both intrinsic and extrinsic metrics to compare solutions that produce more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric’s potential to advance the triclustering state of the art by easing the process of evaluating the quality of new triclustering approaches.


TransET: Knowledge Graph Embedding with Entity Types

Electronics ◽

10.3390/electronics10121407 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1407

Author(s):

Peng Wang ◽

Jing Zhou ◽

Yuzhang Liu ◽

Xingchen Zhou

Keyword(s):

Link Prediction ◽

State Of The Art ◽

Score Function ◽

Graph Embedding ◽

Vector Spaces ◽

Knowledge Graph ◽

Semantic Features ◽

Knowledge Graphs ◽

Real World Datasets ◽

Low Dimensional

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods only focus on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circle convolution based on the embeddings of entities and entity types is used to map the head entity and the tail entity to type-specific representations, and then a translation-based score function is used to learn the representations of triples. We evaluated our model on real-world datasets with two benchmark tasks, link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
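
The two building blocks named above, circular convolution and a translation-based score, can be sketched as follows. This is an illustration only: the abstract does not specify how TransET combines them, so the composition in `type_aware_distance` (convolving each entity embedding with its type embedding, then measuring the Euclidean norm of h + r - t) is an assumption, and all function names are hypothetical.

```python
import math


def circular_convolution(a, b):
    """Circular convolution of two equal-length vectors:
    out[k] = sum_i a[i] * b[(k - i) mod d]."""
    d = len(a)
    if len(b) != d:
        raise ValueError("vectors must have the same length")
    return [sum(a[i] * b[(k - i) % d] for i in range(d)) for k in range(d)]


def translation_distance(h, r, t):
    """TransE-style distance ||h + r - t||_2; lower means a more plausible triple."""
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def type_aware_distance(h, h_type, r, t, t_type):
    """Assumed composition: make entities type-specific first, then translate."""
    h_specific = circular_convolution(h, h_type)
    t_specific = circular_convolution(t, t_type)
    return translation_distance(h_specific, r, t_specific)


# With the "identity" type vector [1, 0, 0], convolution leaves an entity unchanged.
print(circular_convolution([1.0, 0.0, 0.0], [2.0, 3.0, 4.0]))  # [2.0, 3.0, 4.0]
print(translation_distance([0.0, 0.0], [3.0, 4.0], [0.0, 0.0]))  # 5.0
```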


Density Guarantee on Finding Multiple Subgraphs and Subtensors

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3446668 ◽

2021 ◽

Vol 15 (5) ◽

pp. 1-32

Author(s):

Quang-huy Duong ◽

Heri Ramampiaro ◽

Kjetil Nørvåg ◽

Thu-lan Dam

Keyword(s):

Lower Bound ◽

State Of The Art ◽

The State ◽

The Other ◽

Exact Methods ◽

Practical Solution ◽

Novel Approach ◽

Wide Range ◽

Real World Datasets ◽

Tensor Data

Dense subregion (subgraph & subtensor) detection is a well-studied area, with a wide range of applications, and numerous efficient approaches and algorithms have been proposed. Approximation approaches are commonly used for detecting dense subregions due to the complexity of the exact methods. Existing algorithms are generally efficient for dense subtensor and subgraph detection, and can perform well in many applications. However, most of the existing works rely on the state-of-the-art greedy 2-approximation algorithm, which provides solutions with only a loose theoretical density guarantee. The main drawback of most of these algorithms is that they can estimate only one subtensor, or subgraph, at a time, with a low guarantee on its density. While some methods can, on the other hand, estimate multiple subtensors, they can give a guarantee on the density with respect to the input tensor for the first estimated subtensor only. We address these drawbacks by providing both theoretical and practical solutions for estimating multiple dense subtensors in tensor data and giving a higher lower bound of the density. In particular, we guarantee and prove a higher bound of the lower-bound density of the estimated subgraphs and subtensors. We also propose a novel approach to show that there are multiple dense subtensors with a guarantee on their density that is greater than the lower bound used in the state-of-the-art algorithms. We evaluate our approach with extensive experiments on several real-world datasets, which demonstrate its efficiency and feasibility.
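
For context, the greedy 2-approximation mentioned above can be sketched for the plain graph case. This is the classic greedy peeling procedure for a single densest subgraph (density measured as edges per vertex), not the multi-subtensor method the abstract proposes; the function name and the simple quadratic-time implementation are illustrative choices.

```python
def greedy_densest_subgraph(edges):
    """Greedy peeling: repeatedly delete a minimum-degree vertex and keep the
    densest intermediate vertex set, where density = |E| / |V|.

    The returned density is at least half of the optimum (2-approximation).
    """
    # Build an undirected adjacency map, ignoring self-loops and duplicates.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    remaining = set(adj)
    best_nodes, best_density = set(remaining), 0.0

    while remaining:
        density = num_edges / len(remaining)
        if density > best_density:
            best_nodes, best_density = set(remaining), density

        # Peel off one vertex of minimum current degree.
        u = min(remaining, key=lambda x: len(adj[x]))
        for v in adj[u]:
            adj[v].discard(u)
        num_edges -= len(adj[u])
        adj[u] = set()
        remaining.remove(u)

    return best_nodes, best_density


# A 4-clique with one pendant vertex attached to node 3.
clique_plus_tail = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
print(greedy_densest_subgraph(clique_plus_tail))  # ({0, 1, 2, 3}, 1.5)
```

Note that a single run yields one subgraph with one density guarantee, which is the limitation the abstract sets out to address.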


A social trust and preference segmentation-based matrix factorization recommendation algorithm

EURASIP Journal on Wireless Communications and Networking ◽

10.1186/s13638-019-1600-4 ◽

2019 ◽

Vol 2019 (1) ◽

Cited By ~ 2

Author(s):

Wei Peng ◽

Baogui Xin

Keyword(s):

Matrix Factorization ◽

Social Trust ◽

State Of The Art ◽

User Preference ◽

Social Recommendation ◽

Recommendation Algorithm ◽

Commercial Activities ◽

Trust Relationships ◽

Recommendation Algorithms ◽

The Difference

A recommendation can inspire potential demands of users and make e-commerce platforms more intelligent, and is essential for e-commerce enterprises’ sustainable development. The traditional social recommendation algorithm ignores the following fact: the preferences of users with trust relationships are not necessarily similar, and the consideration of user preference similarity should be limited to specific areas. To solve the problems mentioned above, we propose a social trust and preference segmentation-based matrix factorization (SPMF) recommendation algorithm. Experimental results based on the Ciao and Epinions datasets show that the accuracy of the SPMF algorithm is significantly superior to that of some state-of-the-art recommendation algorithms. By distinguishing differences in trust relations and preference domains, the SPMF algorithm provides better recommendations and can support commercial activities such as product marketing.


Chi-Squared Distance Metric Learning for Histogram Data

Mathematical Problems in Engineering ◽

10.1155/2015/352849 ◽

2015 ◽

Vol 2015 ◽

pp. 1-12 ◽

Cited By ~ 2

Author(s):

Wei Yang ◽

Luhui Xu ◽

Xiaopan Chen ◽

Fengbin Zheng ◽

Yang Liu

Keyword(s):

Nearest Neighbor ◽

State Of The Art ◽

Metric Learning ◽

Nearest Neighbors ◽

Distance Metric Learning ◽

Distance Metric ◽

Projected Gradient Method ◽

Proper Distance ◽

Chi Squared ◽

Real World Datasets

Learning a proper distance metric for histogram data plays a crucial role in many computer vision tasks. The chi-squared distance is a nonlinear metric and is widely used to compare histograms. In this paper, we show how to learn a general form of chi-squared distance based on the nearest neighbor model. In our method, the margin of a sample is first defined with respect to the nearest hits (nearest neighbors from the same class) and the nearest misses (nearest neighbors from different classes), and then the simplex-preserving linear transformation is trained by maximizing the margin while minimizing the distance between each sample and its nearest hits. With the iterative projected gradient method for optimization, we naturally introduce the ℓ2,1-norm regularization into the proposed method for sparse metric learning. Comparative studies with state-of-the-art approaches on five real-world datasets verify the effectiveness of the proposed method.
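
As a reference point, the standard (unlearned) chi-squared distance between two histograms can be sketched as below. It assumes the common definition with a factor of 1/2, i.e. half the sum over bins of (p_i - q_i)^2 / (p_i + q_i); some texts omit that factor. The learned, generalized form from the paper is not reproduced here, and the function name is hypothetical.

```python
def chi_squared_distance(p, q):
    """Chi-squared distance between two non-negative histograms of equal length.

    Bins where both histograms are zero contribute nothing, which avoids
    division by zero.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(p, q):
        s = a + b
        if s > 0:
            total += (a - b) ** 2 / s
    return 0.5 * total


print(chi_squared_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 for disjoint normalized histograms
print(chi_squared_distance([0.2, 0.8], [0.2, 0.8]))  # 0.0 for identical histograms
```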


SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6077 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6127-6136

Author(s):

Chao Wang ◽

Hengshu Zhu ◽

Chen Zhu ◽

Chuan Qin ◽

Hui Xiong

Keyword(s):

Bayesian Approach ◽

Posterior Probability ◽

State Of The Art ◽

User Preferences ◽

Implicit Feedback ◽

Pairwise Preference ◽

Entire List ◽

Collaborative Ranking ◽

Real World Datasets ◽

Made In

The recent development of online recommender systems has focused on collaborative ranking from implicit feedback, such as user clicks and purchases. Unlike explicit ratings, which reflect graded user preferences, implicit feedback only generates positive and unobserved labels. While considerable efforts have been made in this direction, the well-known pairwise and listwise approaches are still limited by various challenges. Specifically, for the pairwise approaches, the assumption of independent pairwise preferences does not always hold in practice. Also, the listwise approaches cannot efficiently accommodate “ties” due to the precondition of the entire list permutation. To this end, in this paper, we propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to inherently accommodate the characteristics of implicit feedback in recommender systems. Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons and can be implemented with matrix factorization and neural networks. Meanwhile, we also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to √M/N, where M and N are the numbers of items and users, respectively. Finally, extensive experiments on four real-world datasets clearly validate the superiority of SetRank compared with various state-of-the-art baselines.
