Efficient Retrieval of Matrix Factorization-Based Top-k Recommendations: A Survey of Recent Approaches

Top-k recommendation seeks to deliver a personalized list of k items to each user. Matrix factorization (MF), which typically represents users and items as vectors in a low-dimensional space, is an established and effective approach to building recommender systems, owing to its strong recommendation quality and scalability. A typical MF recommender system has two main phases: preference elicitation and recommendation retrieval. The former analyzes user-generated data to learn user preferences and item characteristics in the form of latent feature vectors, whereas the latter ranks the candidate items by scores computed from the learned vectors and returns the top-k items from the ranked list. For preference elicitation, numerous works have built accurate MF-based recommendation algorithms that can learn from large datasets. For the recommendation retrieval phase, however, naively scanning a large number of items to identify the few most relevant ones may be too slow for truly real-time applications. In this work, we survey recent advances and state-of-the-art approaches that enable fast and accurate retrieval for MF-based personalized recommendations. We also discuss the approaches analytically along different dimensions to give readers a more comprehensive understanding of the surveyed works.
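
To make the retrieval bottleneck concrete, the following Python sketch shows the naive baseline the survey refers to: score every item by the inner product of the learned user and item vectors and keep the k best. It assumes dot-product scoring and plain Python lists; the function names and toy vectors are illustrative, not taken from any surveyed method, and the fast retrieval techniques the survey covers (which avoid this full scan) are not implemented here.

```python
import heapq
from typing import AbstractSet, List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product between a user vector and an item vector."""
    return sum(a * b for a, b in zip(u, v))


def top_k_naive(
    user_vec: Sequence[float],
    item_vecs: Sequence[Sequence[float]],
    k: int,
    exclude: AbstractSet[int] = frozenset(),
) -> List[Tuple[int, float]]:
    """Exhaustive top-k retrieval: score every item, keep the k highest.

    Costs O(n * d) per user for n items of dimension d, which is the
    linear scan that fast retrieval methods try to avoid.
    """
    scored = (
        (dot(user_vec, vec), idx)
        for idx, vec in enumerate(item_vecs)
        if idx not in exclude  # e.g. items the user has already consumed
    )
    # heapq.nlargest maintains a heap of size k, so extra memory is O(k).
    return [(idx, score) for score, idx in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.2]]
    user = [0.9, 0.1]
    print(top_k_naive(user, items, k=2))  # items 0 and 2 score highest
```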

Community detection in complex network by network embedding and density clustering

Journal of Intelligent & Fuzzy Systems, 2021, pp. 1-12. DOI: 10.3233/jifs-202961

Author(s): JinFang Sheng, Huaiyu Zuo, Bin Wang, Qiong Li

Keyword(s): Complex Network; Community Detection; Dimensional Space; Detection Algorithm; Superior Performance; Network Embedding; Detection Algorithms; Density Clustering; Community Detection Algorithm; Low Dimensional

In a complex network system, the structure of the network is a crucial element for analyzing the system, and community detection algorithms are key to exploring that structure. Traditional community detection algorithms represent the network by an adjacency matrix built from observations, which may contain redundant information or noise that interferes with the detection results. In this paper, we propose a community detection algorithm based on density clustering. To improve the performance of density clustering, we adopt an algorithmic framework that learns continuous representations of network nodes in a low-dimensional space. Network embedding effectively preserves the network structure, and density clustering is then applied in the embedded low-dimensional space to compute node similarities, which in turn reveal the implicit structure of a given network. Experiments show that the algorithm outperforms other advanced community detection algorithms on real-world networks from multiple domains as well as on synthetic networks, especially when the network data are highly disordered.
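
As an illustration of the final step (clustering nodes by density in the embedded space), here is a minimal DBSCAN sketch in Python. The abstract does not say which density-clustering algorithm is used, so DBSCAN is a swapped-in stand-in, and the node vectors are assumed to come from some network-embedding method; `eps`, `min_pts`, and the toy coordinates are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Density-based clustering: returns a cluster id per point, or -1 for noise.

    A point is a core point if at least min_pts points (itself included) lie
    within eps of it; clusters grow outward from core points.
    """
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"

    def region(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(neighbours)
    return [lab if lab is not None else -1 for lab in labels]
```

Each resulting cluster id can be read as a community label; nodes labelled -1 fall in low-density regions and would need a separate assignment rule, which this sketch does not provide.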

An anomaly detection method based on double encoder–decoder generative adversarial networks

Industrial Robot: The International Journal of Robotics Research and Application, 2020, ahead-of-print. DOI: 10.1108/ir-09-2020-0200

Author(s): Hui Liu, Tinglong Tang, Jake Luo, Meng Zhao, Baole Zheng, ...

Keyword(s): Anomaly Detection; Latent Variables; Dimensional Space; Superior Performance; Generative Adversarial Networks; Training Process; Content Type; Detection Model; Adversarial Networks; Low Dimensional

Purpose: This study addresses the challenge of training a model that lets a robot detect abnormal samples in an industrial environment, where abnormal patterns are very rare.

Design/methodology/approach: The authors propose a new model based on double encoder–decoder (DED) generative adversarial networks that detects anomalies while being trained without any abnormal patterns. The DED approach maps high-dimensional input images to a low-dimensional space to obtain latent variables. Minimizing the change in the latent variables during training helps the model learn the data distribution. Anomalies are detected by calculating the distance between the two low-dimensional vectors obtained from the two encoders.

Findings: The proposed method achieves better accuracy and F1 score than traditional anomaly detection models.

Originality/value: A new DED architecture is designed to capture the distribution of images during training so that anomalous samples are accurately identified. A new weight function is introduced to control the proportion of the losses in the encoding, reconstruction, and adversarial phases, which improves results. The proposed anomaly detection model outperforms prior state-of-the-art approaches.
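
The scoring rule in the abstract (distance between the codes of two encoders) can be sketched as below. The wiring shown, where the second encoder re-encodes the decoder's reconstruction, is an assumption modelled on common encoder–decoder–encoder anomaly detectors; the abstract does not spell it out. The encoder/decoder callables, the Euclidean distance, and the quantile-based threshold are all hypothetical choices.

```python
import math
from typing import Callable, List, Sequence

Vec = List[float]
Fn = Callable[[Vec], Vec]


def anomaly_score(x: Vec, enc1: Fn, dec: Fn, enc2: Fn) -> float:
    """Distance between the latent codes produced by the two encoders."""
    z1 = enc1(x)      # code of the input image
    x_hat = dec(z1)   # reconstruction from that code
    z2 = enc2(x_hat)  # code of the reconstruction (assumed wiring)
    return math.dist(z1, z2)


def fit_threshold(normal_scores: Sequence[float], quantile: float = 0.95) -> float:
    """Empirical quantile of scores computed on normal samples only.

    Training sees no abnormal samples, so the cut-off is calibrated on normal ones.
    """
    ranked = sorted(normal_scores)
    idx = min(len(ranked) - 1, int(quantile * len(ranked)))
    return ranked[idx]


if __name__ == "__main__":
    # Toy stand-ins (hypothetical): "normal" inputs lie on the line x0 == x1,
    # and the decoder can only produce points on that line.
    enc = lambda x: list(x)
    dec = lambda z: [(z[0] + z[1]) / 2] * 2
    print(anomaly_score([1.0, 1.0], enc, dec, enc))   # 0.0
    print(anomaly_score([1.0, -1.0], enc, dec, enc))  # about 1.414
```

Because training sees only normal samples, the threshold here is calibrated on scores of held-out normal data; an input whose score exceeds it is flagged as anomalous.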

Prediction of Disease-related microRNAs through Integrating Attributes of microRNA Nodes and Multiple Kinds of Connecting Edges

Molecules, 2019, Vol 24 (17), pp. 3099. DOI: 10.3390/molecules24173099. Cited by ~3

Author(s): Xuan, Li, Zhang, Song

Keyword(s): State Of The Art; Dimensional Space; Nonnegative Matrix; Superior Performance; Pancreatic Cancers; Node Attribute; Disease Associations; Node Attributes; Novel Method; Low Dimensional

Identifying disease-associated microRNAs (disease miRNAs) contributes to understanding disease pathogenesis. Most previous computational biology studies focused on multiple kinds of connecting edges between miRNAs and diseases, including miRNA–miRNA similarities, disease–disease similarities, and miRNA–disease associations. Few methods have exploited the node attribute information related to miRNA families and clusters, and previous methods do not fully account for the sparsity of node attributes. Additionally, it is challenging to deeply integrate the node attributes of miRNAs with the similarities and associations related to miRNAs and diseases. In the present study, we propose a novel method, MDAPred, based on nonnegative matrix factorization to predict candidate disease miRNAs. MDAPred integrates the node attributes of miRNAs with the related similarities and associations of miRNAs and diseases. Since a miRNA typically belongs to only one family or cluster, the node attributes of miRNAs are sparse. Similarly, the data for miRNA and disease similarities are sparse. Projecting the miRNA and disease similarities and the miRNA node attributes into a common low-dimensional space helps estimate miRNA–disease associations. At the same time, the likelihood that a miRNA is associated with a disease depends on the miRNA's neighbour information. Therefore, MDAPred deeply integrates projections of multiple kinds of connecting edges, projections of miRNA node attributes, and neighbour information of miRNAs. Cross-validation results showed that MDAPred outperformed other state-of-the-art methods for predicting disease–miRNA associations. MDAPred also retrieves more true miRNA–disease associations among its top-ranked predictions, which is very important for biologists. Additionally, case studies of breast, lung, and pancreatic cancers further confirmed MDAPred's ability to discover potential miRNA–disease associations.
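
For readers unfamiliar with the underlying tool, the following sketch factorizes a small non-negative matrix with the standard Lee–Seung multiplicative updates. It is generic NMF only: MDAPred's additional terms for node attributes, similarities, and neighbour information are not reproduced, and the toy matrix is hypothetical. In an association-prediction setting, V would hold known miRNA–disease links, and high entries of W @ H at unobserved positions would be candidate associations.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def nmf(v: Matrix, rank: int, iters: int = 300, seed: int = 0,
        eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factor non-negative V (m x n) as W (m x rank) @ H (rank x n).

    Lee-Seung multiplicative updates for the squared Frobenius loss; starting
    from positive W and H, both factors stay non-negative.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # H <- H * (W^T V) / (W^T W H)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # W <- W * (V H^T) / (W H H^T)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(m)]
    return w, h
```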

Simple Surveys: Response Retrieval Inspired by Recommendation Systems

Social Science Computer Review, 2019, pp. 089443931984837. DOI: 10.1177/0894439319848374

Author(s): Nandana Sengupta, Nati Srebro, James Evans

Keyword(s): Social Science; Digital Media; Predictive Accuracy; Dimensional Space; Cognitive Effort; User Preferences; Machine Learning Algorithms; Perceived Safety; Science Application; Low Dimensional

In the last decade, simple rating and comparison surveys have proliferated on social and digital media platforms to fuel recommendations. These simple surveys, extrapolated with machine learning algorithms such as matrix factorization, shed light on user preferences over large and growing pools of items such as movies, songs, and ads. Social scientists also have a long history of measuring perceptions, preferences, and opinions, but typically over smaller, discrete item sets with exhaustive rating or ranking surveys. This article introduces simple surveys for social science applications. We ran experiments to compare the predictive accuracy of both individual and aggregate comparative assessments using four types of simple surveys (pairwise comparisons (PCs) and ratings on 2-point, 5-point, and continuous scales) in three contexts: perceived safety of Google Street View images, likability of artwork, and hilarity of animal GIFs. Across contexts, we find that continuous-scale ratings best predict individual assessments but consume the most time and cognitive effort. Binary choice surveys are quick and best predict aggregate assessments, which is useful for collective decision tasks, but they poorly predict personalized preferences, the very task for which Netflix currently uses them to recommend movies. PCs, by contrast, successfully predict personal assessments but poorly predict aggregate assessments, despite being widely used to crowdsource ideas and collective preferences. We also demonstrate how findings from these surveys can be visualized in a low-dimensional space to reveal distinct respondent interpretations of the questions asked in each context. We conclude by reflecting on differences between sparse, incomplete “simple surveys” and their traditional survey counterparts in terms of efficiency, information elicited, and settings in which knowing less about more may be critical for social science.
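
The extrapolation step the abstract mentions (filling in unrated items from a sparse set of survey responses) can be sketched with a basic matrix factorization trained by stochastic gradient descent. This is a generic Funk-style MF, not necessarily the model used in the article; the latent dimension, learning rate, regularization, and toy ratings are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Rating = Tuple[int, int, float]  # (user, item, observed rating)


def predict(p: List[List[float]], q: List[List[float]], u: int, i: int) -> float:
    """Predicted rating for a (user, item) pair, including pairs never surveyed."""
    return sum(a * b for a, b in zip(p[u], q[i]))


def train_mf(ratings: Sequence[Rating], n_users: int, n_items: int, dim: int = 2,
             lr: float = 0.02, reg: float = 0.01, epochs: int = 1000, seed: int = 0):
    """Learn user vectors P and item vectors Q so that P[u] . Q[i] approximates r_ui."""
    rng = random.Random(seed)
    p = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
    q = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(p, q, u, i)
            for f in range(dim):
                pu, qi = p[u][f], q[i][f]
                # SGD step on the L2-regularized squared error (factor 2 folded into lr)
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q
```

After training, `predict(p, q, u, i)` gives an estimate for any user–item pair, including pairs the respondent was never asked about; the same code applies whether ratings come from a 2-point, 5-point, or continuous scale.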

Network-Specific Variational Auto-Encoder for Embedding in Attribute Networks

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019. DOI: 10.24963/ijcai.2019/370. Cited by ~6

Author(s): Di Jin, Bingyi Li, Pengfei Jiao, Dongxiao He, Weixiong Zhang

Keyword(s): Information Integration; Dimensional Space; Gaussian Mixture; Superior Performance; Network Structures; New Approach; Network Topologies; Node Attributes; Novel Method; Low Dimensional

Network embedding (NE) maps a network into a low-dimensional space while preserving intrinsic features of the network. Variational auto-encoders (VAEs) have been actively studied for NE. These VAE-based methods typically use both network topology and node semantics and treat the two types of data in the same way. However, network topology and node semantics carry orthogonal information and often come from different sources: the former quantifies coupling relationships among nodes, whereas the latter represents node-specific properties. Ignoring this difference degrades NE. To address this issue, we develop a network-specific VAE for NE, named NetVAE. In the encoding phase of our approach, the compression of network structure and the compression of node attributes share the same encoder, so that co-training achieves transfer learning and information integration. In the decoding phase, a dual decoder reconstructs network topology and node attributes separately. Specifically, as part of the dual decoder, we develop a novel method based on a Gaussian mixture model and the block model to reconstruct network structure. Extensive experiments on large real-world networks demonstrate the superior performance of the new approach over state-of-the-art methods.
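
The VAE ingredients the abstract refers to can be sketched as follows: a reparameterized sample from each node's Gaussian posterior, the closed-form KL term, and two separate decoders, one for structure and one for attributes. Assumptions: the shared encoder is taken to output a mean and log-variance per node, the structure decoder is a plain sigmoid of the inner product (a stand-in, not NetVAE's Gaussian-mixture/block-model decoder), and the attribute decoder is a hypothetical linear map.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * noise with noise ~ N(0, I) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), the usual VAE prior term."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def decode_link(z_i: Sequence[float], z_j: Sequence[float]) -> float:
    """Structure decoder stand-in: edge probability = sigmoid(z_i . z_j)."""
    return 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(z_i, z_j))))


def decode_attributes(z: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Attribute decoder stand-in: a separate linear map from z to attribute scores."""
    return [sum(w * x for w, x in zip(row, z)) for row in weights]
```

Keeping the two decoders as separate functions mirrors the abstract's point that topology and attributes should be reconstructed separately, even though both read the same latent code.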

A Convolutional Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction

2020. DOI: 10.1101/2020.09.29.318642

Author(s): Ziqi Ke, Haris Vikalo

Keyword(s): High Throughput Sequencing; Dimensional Space; Superior Performance; Stochastic Gradient Descent; Viral Quasispecies; Sequencing Data; Consensus Sequences; Haplotype Assembly; Sequencing Technologies; Low Dimensional

Haplotype assembly and viral quasispecies reconstruction are challenging tasks concerned with the analysis of genomic mixtures using sequencing data. High-throughput sequencing technologies generate enormous numbers of short fragments (reads) that essentially oversample the components of a mixture; this representation redundancy enables reconstruction of the components (haplotypes, viral strains). The reconstruction problem, known to be NP-hard, boils down to grouping together reads that originate from the same component of a mixture. Existing methods struggle to solve this problem with the required accuracy and low runtime, and the problem becomes increasingly challenging as the number and length of the components grow. This paper proposes a read clustering method based on a convolutional auto-encoder designed to first project sequenced fragments to a low-dimensional space and then estimate the probability of each read's origin using the learned embedded features. The components are reconstructed by finding consensus sequences that agglomerate reads from the same origin. Mini-batch stochastic gradient descent and dimension reduction of reads allow the proposed method to deal efficiently with massive numbers of long reads. Experiments on simulated, semi-experimental, and experimental data demonstrate the ability of the proposed method to accurately reconstruct haplotypes and viral quasispecies, often outperforming state-of-the-art methods.
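
The final reconstruction step (building one consensus sequence per group of reads) can be sketched as a per-position majority vote. The sketch assumes the reads are already aligned to a common coordinate frame, with `-` marking positions a read does not cover, and that cluster labels come from the read-clustering model; the convolutional auto-encoder itself is not reproduced, and the toy reads are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def consensus(reads: Sequence[str], labels: Sequence[int], gap: str = "-") -> Dict[int, str]:
    """Majority-vote consensus per cluster over aligned reads (gap = no coverage)."""
    length = max(len(r) for r in reads)
    groups: Dict[int, List[str]] = {}
    for read, lab in zip(reads, labels):
        groups.setdefault(lab, []).append(read.ljust(length, gap))  # pad short reads
    result: Dict[int, str] = {}
    for lab, members in groups.items():
        seq = []
        for pos in range(length):
            # Count only reads that actually cover this position.
            counts = Counter(r[pos] for r in members if r[pos] != gap)
            seq.append(counts.most_common(1)[0][0] if counts else gap)
        result[lab] = "".join(seq)
    return result
```

Ties at a position are broken by whichever base was seen first, since `Counter.most_common` keeps first-encountered order for equal counts; a real pipeline would likely weigh base qualities instead.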

Improving Top-N Recommendation Performance Using Missing Data

Mathematical Problems in Engineering, 2015, Vol 2015, pp. 1-13. DOI: 10.1155/2015/380472. Cited by ~4

Author(s): Xiangyu Zhao, Zhendong Niu, Kaiyi Wang, Ke Niu, Zhongqiang Liu

Keyword(s): Missing Data; Recommender Systems; Matrix Factorization; State Of The Art; User Preferences; Missing Not At Random; Main Challenge; Recommendation Algorithms; Random Part; Problem Data

Recommender systems are becoming increasingly important for solving the information explosion problem. Data sparsity is a major challenge in this area: massive numbers of unrated items constitute missing data, with only a few observed ratings. Most studies treat missing data as unknown information and use only observed data to learn models and generate recommendations. However, data are missing not at random. Part of the missing data arises because users choose not to rate certain items, and this part constitutes negative examples of user preferences. Utilizing this information is expected to improve the performance of recommendation algorithms. Unfortunately, negative examples are mixed with unlabeled positive examples in the missing data and are hard to distinguish. In this paper, we propose three schemes to utilize the negative examples in missing data. The schemes are then adapted to SVD++, a state-of-the-art matrix factorization recommendation approach, to generate recommendations. Experimental results on two real datasets show that our proposed approaches achieve better top-N performance than the baselines in both accuracy and diversity.
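
The abstract does not detail the three schemes, so the following sketch shows only one generic way to use missing data: sample some unrated items per user, label them with a pseudo-rating, and give them a lower confidence weight than observed ratings, reflecting that they mix true negatives with unlabeled positives. The sampling ratio, pseudo-rating, and weight are hypothetical, and this is not claimed to be any of the paper's schemes.

```python
import random
from typing import Dict, List, Tuple

Example = Tuple[int, int, float, float]  # (user, item, target, confidence weight)


def with_sampled_negatives(observed: Dict[int, Dict[int, float]], n_items: int,
                           ratio: float = 1.0, neg_value: float = 0.0,
                           neg_weight: float = 0.3, seed: int = 0) -> List[Example]:
    """Augment observed ratings with unrated items treated as weak negatives.

    Missing entries mix true negatives with unlabeled positives, so sampled
    negatives get a lower confidence weight than observed ratings.
    """
    rng = random.Random(seed)
    examples: List[Example] = []
    for u, rated in sorted(observed.items()):
        for i, r in sorted(rated.items()):
            examples.append((u, i, r, 1.0))  # observed rating, full confidence
        unrated = [i for i in range(n_items) if i not in rated]
        k = min(len(unrated), round(ratio * len(rated)))
        for i in rng.sample(unrated, k):
            examples.append((u, i, neg_value, neg_weight))  # hypothetical pseudo-rating
    return examples
```

A matrix factorization model could then minimize a weighted squared error, the sum of weight * (target - prediction)^2, over these examples.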

Predicting miRNA-Disease Associations by Incorporating Projections in Low-Dimensional Space and Local Topological Information

Genes, 2019, Vol 10 (9), pp. 685. DOI: 10.3390/genes10090685. Cited by ~1

Author(s): Xuan, Zhang, Li, Zhao

Keyword(s): Dimensional Space; Characteristic Curve; Feature Space; Superior Performance; Topological Information; Heterogeneous Information; Feature Representations; Disease Associations; Precision Recall Curve; Low Dimensional

Predicting potential microRNA (miRNA) candidates associated with a disease helps in exploring the mechanisms of disease development. Most recent approaches have utilized heterogeneous information about miRNAs and diseases, including miRNA similarities, disease similarities, and miRNA-disease associations. However, these methods do not utilize the projections of miRNAs and diseases in a low-dimensional space. Thus, a method is needed that can use the information in the low-dimensional space to predict potential disease-related miRNA candidates. We propose a method based on non-negative matrix factorization, named DMAPred, to predict potential miRNA-disease associations. DMAPred exploits the similarities and associations of diseases and miRNAs, and it integrates local topological information of the miRNA network. The likelihood that a miRNA is associated with a disease also depends on their projections in the low-dimensional space. Therefore, we project miRNAs and diseases into a low-dimensional feature space to obtain low-dimensional, dense feature representations. Moreover, we incorporate the sparsity of miRNA-disease associations to make the predictive model more credible. DMAPred achieved superior performance on 15 well-characterized diseases, with AUCs (area under the receiver operating characteristic curve) ranging from 0.860 to 0.973 and AUPRs (area under the precision-recall curve) ranging from 0.118 to 0.761. In addition, case studies on breast, prostatic, and lung neoplasms demonstrated the ability of DMAPred to discover potential disease-related miRNAs.
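
The two reported metrics can be computed from a ranked list of predictions as in the sketch below. The abstract does not state how AUC and AUPR were computed; the AUC here uses the standard pairwise (Mann–Whitney) definition, and AUPR is estimated by average precision, one common estimator that can differ slightly from trapezoidal versions. The scores and labels in the test are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = probability a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of precision@k over the ranks k at which a positive appears."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0
```

The pairwise AUC is O(P * N) for P positives and N negatives, which is fine for per-disease evaluation but would be replaced by a sort-based version for very large candidate sets.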

Deep multiple non-negative matrix factorization for multi-view clustering

Intelligent Data Analysis, 2021, Vol 25 (2), pp. 339-357. DOI: 10.3233/ida-195075

Author(s): Guowang Du, Lihua Zhou, Kevin Lü, Haiyan Ding

Keyword(s): Matrix Factorization; Superior Performance; Dimensional Representation; Geometric Information; Heterogeneous Information; Benchmark Datasets; Abstract Level; Low Dimensional; Shallow Structure; Non Negative Matrix Factorization

Multi-view clustering aims to group similar samples into the same clusters and dissimilar samples into different clusters by integrating heterogeneous information from multi-view data. Non-negative matrix factorization (NMF) has been widely applied to multi-view clustering owing to its interpretability. However, most NMF-based algorithms factorize multi-view data with only a shallow structure, neglecting the complex hierarchical and heterogeneous information in multi-view data. In this paper, we propose a deep multiple non-negative matrix factorization (DMNMF) framework based on AutoEncoder for multi-view clustering. DMNMF consists of multiple Encoder Components and Decoder Components with deep structures. Each pair of Encoder and Decoder Components hierarchically factorizes the input data from one view to capture hierarchical information, and all Encoder and Decoder Components are integrated at an abstract level to learn a common low-dimensional representation that combines the heterogeneous information across views. Furthermore, graph regularizers are introduced to preserve the local geometric information of each view. To optimize the proposed framework, an iterative updating scheme is developed, and the corresponding algorithm, MVC-DMNMF, is proposed and implemented. Extensive experiments on six benchmark datasets demonstrate the superior performance of MVC-DMNMF for multi-view clustering compared with other baseline algorithms.
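
Graph regularizers of the kind mentioned above are commonly written in the Laplacian form used by graph-regularized NMF. Assuming that form (the abstract does not give it), the sketch below computes it two ways and shows they agree: tr(H L H^T) with L = D - S, and the pairwise form 1/2 * sum_ij S_ij * ||h_i - h_j||^2, where h_i is the low-dimensional representation of sample i and S is a symmetric similarity matrix for one view. Small values mean neighbouring samples receive similar representations.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def graph_reg_pairwise(h_cols: Sequence[Sequence[float]], s: Matrix) -> float:
    """1/2 * sum_ij S_ij * ||h_i - h_j||^2 over sample representations h_i."""
    n = len(h_cols)
    return 0.5 * sum(
        s[i][j] * sum((a - b) ** 2 for a, b in zip(h_cols[i], h_cols[j]))
        for i in range(n) for j in range(n)
    )


def graph_reg_trace(h_cols: Sequence[Sequence[float]], s: Matrix) -> float:
    """tr(H L H^T) with L = D - S and H = [h_1 ... h_n] (one column per sample)."""
    n, dim = len(h_cols), len(h_cols[0])
    deg = [sum(row) for row in s]  # D is the diagonal degree matrix of S
    lap = [[(deg[i] if i == j else 0.0) - s[i][j] for j in range(n)] for i in range(n)]
    # tr(H L H^T) = sum_k sum_ij H[k][i] * L[i][j] * H[k][j]
    return sum(h_cols[i][k] * lap[i][j] * h_cols[j][k]
               for k in range(dim) for i in range(n) for j in range(n))
```

The two forms agree only when S is symmetric; the trace form is the one that plugs directly into matrix-based update rules.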

Convolutional Gaussian Embeddings for Personalized Recommendation with Uncertainty

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019. DOI: 10.24963/ijcai.2019/367. Cited by ~2

Author(s): Junyang Jiang, Deqing Yang, Yanghua Xiao, Chenlu Shen

Keyword(s): State Of The Art; Dimensional Space; Monte Carlo Sampling; Superior Performance; Personalized Recommendation; Latent Features; Benchmark Datasets; Low Dimensional; Uncertain Preferences; Candidate Item

Most existing embedding-based recommendation models represent users and items with embeddings (vectors) that contain their latent features. Each such embedding corresponds to a single fixed point in a low-dimensional space and thus fails to precisely represent users or items with uncertainty, which is often observed in recommender systems. To address this problem, we propose a unified deep recommendation framework employing Gaussian embeddings, which are shown to adapt to the uncertain preferences exhibited by some users, resulting in better user representations and recommendation performance. Furthermore, our framework adopts Monte-Carlo sampling and convolutional neural networks to compute the correlation between the target user and the candidate item, on which precise recommendations are based. Extensive experiments on two benchmark datasets not only show that the proposed Gaussian embeddings capture user uncertainty well but also demonstrate the framework's superior performance over state-of-the-art recommendation models.
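
The role of Monte-Carlo sampling with Gaussian embeddings can be sketched as below: draw samples of a user and an item from their Gaussians and average a score over the draws. In the paper the sampled embeddings feed a convolutional network; here the CNN is replaced by a plain inner product, a hypothetical simplification chosen because its expectation has a closed form (mu_u . mu_i for independent Gaussians) against which the estimate can be checked. Diagonal covariances and the toy parameters are also assumptions.

```python
import math
import random
from typing import List, Sequence


def sample_gaussian(mu: Sequence[float], var: Sequence[float],
                    rng: random.Random) -> List[float]:
    """One draw from N(mu, diag(var)); larger var means a more uncertain user or item."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]


def mc_score(mu_u: Sequence[float], var_u: Sequence[float],
             mu_i: Sequence[float], var_i: Sequence[float],
             n_samples: int = 2000, seed: int = 0) -> float:
    """Monte-Carlo estimate of E[u . i] for independent Gaussian embeddings u and i."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        u = sample_gaussian(mu_u, var_u, rng)
        i = sample_gaussian(mu_i, var_i, rng)
        total += sum(a * b for a, b in zip(u, i))  # stand-in for the paper's CNN scorer
    return total / n_samples
```

With a nonlinear scorer such as a CNN, the expectation no longer reduces to the means, which is the situation where sampling earns its cost.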
