ALLEVIATING THE SPARSITY PROBLEM OF COLLABORATIVE FILTERING USING AN EFFICIENT ITERATIVE CLUSTERED PREDICTION TECHNIQUE

Author(s):  
AMIRA ABDELWAHAB ◽  
HIROO SEKIYA ◽  
IKUO MATSUBA ◽  
YASUO HORIUCHI ◽  
SHINGO KUROIWA

Collaborative filtering (CF) is one of the most prevalent recommendation techniques, providing personalized recommendations to users based on their previously expressed preferences and those of other similar users. Although CF has been widely applied, its applicability is restricted by data sparsity, by insufficient data for new users and new items (the cold-start problem), and by growth in the number of users and items in the database (the scalability problem). In this paper, we propose an efficient iterative clustered prediction technique that transforms the sparse user-item matrix into a dense one and overcomes the scalability problem. In this technique, a spectral clustering algorithm is used to optimize neighborhood selection and to group the data into user clusters and item clusters. Then, the clustered user-based and clustered item-based approaches are aggregated to efficiently predict the unknown ratings. Our experiments on the MovieLens and Book-Crossing data sets indicate substantial and consistent improvements in recommendation accuracy compared with three baselines: the hybrid user-based and item-based approach without clustering, the hybrid approach with k-means, and singular value decomposition (SVD)-based CF. Furthermore, we demonstrate the effectiveness of the proposed iterative technique and evaluate its performance over a varying number of iterations.
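The aggregation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the cluster labels are assumed to be given (the paper obtains them with spectral clustering), and the cosine similarity, the blending weight `alpha`, and the function names are all hypothetical choices made for illustration.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity over co-rated entries (None marks an unknown rating)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    na = sqrt(sum(x * x for x, _ in pairs))
    nb = sqrt(sum(y * y for _, y in pairs))
    return dot / (na * nb) if na and nb else 0.0

def column(R, j):
    """Ratings of item j across all users."""
    return [row[j] for row in R]

def weighted_vote(target, neighbours, values):
    """Similarity-weighted average of the neighbours' known values."""
    num = den = 0.0
    for vec, val in zip(neighbours, values):
        if val is None:
            continue
        s = cosine(target, vec)
        num += s * val
        den += abs(s)
    return num / den if den else None

def predict(R, u, i, user_cluster, item_cluster, alpha=0.5):
    """Blend a user-based and an item-based estimate, each restricted to one cluster.

    alpha is an assumed blending weight, not a value taken from the paper.
    """
    users = [v for v in range(len(R)) if v != u and user_cluster[v] == user_cluster[u]]
    items = [j for j in range(len(R[0])) if j != i and item_cluster[j] == item_cluster[i]]
    ub = weighted_vote(R[u], [R[v] for v in users], [R[v][i] for v in users])
    ib = weighted_vote(column(R, i), [column(R, j) for j in items], [R[u][j] for j in items])
    if ub is None:
        return ib
    if ib is None:
        return ub
    return alpha * ub + (1 - alpha) * ib

def fill_once(R, user_cluster, item_cluster, alpha=0.5):
    """One densification pass: predict every unknown entry from the current matrix."""
    out = [row[:] for row in R]
    for u, row in enumerate(R):
        for i, r in enumerate(row):
            if r is None:
                out[u][i] = predict(R, u, i, user_cluster, item_cluster, alpha)
    return out
```

Calling `fill_once` repeatedly on its own output is one plausible reading of "iterative": entries predicted in an earlier pass can serve as evidence in later passes. Entries with no usable neighbour in either cluster stay `None`.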

2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Zhe Zhang ◽  
Xiyu Liu ◽  
Lin Wang

The traditional spectral clustering algorithm has two problems. First, when a Gaussian kernel function is used to construct the similarity matrix, different scale parameters in the kernel lead to different clustering results. Second, the K-means algorithm is often used in the clustering stage of spectral clustering; it initializes the cluster centers randomly, which makes the results unstable. In this paper, an improved spectral clustering algorithm is proposed to solve these two problems. To construct the similarity matrix, we propose an improved Gaussian kernel function that is based on the distance information of some nearest neighbors and can adaptively select the scale parameters. In the clustering stage, a beetle antennae search algorithm with a damping factor is proposed to complete the clustering and overcome the instability of the clustering results. In the experiments, we use four artificial data sets and seven UCI data sets to verify the performance of our algorithm. In addition, four images from the BSDS500 image data set are segmented, and the results show that our algorithm outperforms the comparison algorithms in image segmentation.
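To make the idea of a neighbor-based adaptive scale concrete, here is a minimal sketch of one well-known variant, local scaling, in which each point's scale is its distance to its k-th nearest neighbor. This is an assumed stand-in for illustration; the abstract does not give the paper's exact kernel, and the function name and the default `k` are hypothetical.

```python
from math import dist, exp

def local_scale_affinity(points, k=2):
    """Gaussian affinity with a per-point scale taken from the k-th nearest neighbour.

    W[i][j] = exp(-d(i, j)^2 / (sigma_i * sigma_j)), with zeros on the diagonal.
    Requires k < len(points) and no point duplicated more than k - 1 times
    (otherwise a scale of zero would appear).
    """
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    # sorted(row)[0] is the point's distance to itself, so index k is the k-th neighbour.
    sigma = [sorted(row)[k] for row in d]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                W[i][j] = exp(-d[i][j] ** 2 / (sigma[i] * sigma[j]))
    return W
```

Because each scale adapts to the local density, no single global scale parameter has to be tuned by hand, which is the sensitivity the abstract describes.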


2011 ◽  
Vol 8 (4) ◽  
pp. 1143-1157 ◽  
Author(s):  
Xinyue Liu ◽  
Xing Yong ◽  
Hongfei Lin

The similarity matrix is critical to the performance of spectral clustering. Mercer kernels have become popular largely because of their success in kernel methods such as kernel PCA. A novel spectral clustering method based on local neighborhoods in kernel space (SC-LNK) is proposed, which assumes that each data point can be linearly reconstructed from its neighbors. The SC-LNK algorithm projects the data into a feature space using a Mercer kernel and then learns a sparse matrix by linear reconstruction, which serves as the similarity graph for spectral clustering. Experiments on synthetic and real-world data sets show that spectral clustering based on linear reconstruction in kernel space outperforms conventional spectral clustering and two other algorithms, especially on real-world data sets.


Author(s):  
Badr Hssina ◽  
Abdelkader Grota ◽  
Mohammed Erritali

Nowadays, recommendation systems are used successfully to provide items (for example, movies, music, books, news, and images) tailored to user preferences. Among the existing approaches for recommending suitable content, we use collaborative filtering, which finds information that satisfies a user by drawing on the reviews of other users. These reviews are stored in matrices whose sizes increase exponentially and are used to predict whether an item is relevant or not. Evaluation shows that these systems provide unsatisfactory recommendations because of what we call the cold-start factor. Our objective is to apply a hybrid approach to improve the quality of our recommendation system. The benefit of this approach is that it does not require a new algorithm for calculating the predictions. We apply two algorithms: k-nearest neighbours (KNN) and the matrix factorization algorithm of collaborative filtering, which is based on singular value decomposition (SVD). Our combined model has very high precision, and the experiments show that our method can achieve better results.


2021 ◽  
Author(s):  
Guangliang Chen

Chen (2018) proposed a scalable spectral clustering algorithm for cosine similarity to handle the task of clustering large data sets. It runs extremely fast, with complexity linear in the size of the data, and achieves state-of-the-art accuracy. This paper conducts a perturbation analysis of the algorithm to understand the effect of discarding a perturbation term in an eigendecomposition step. Our results show that the accuracy of the approximation by the scalable algorithm depends on the connectivity of the clusters, their separation, and their sizes, and that the approximation is especially accurate for large data sets.
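One way a cosine-similarity method can reach linear cost is by never forming the n-by-n similarity matrix. The sketch below illustrates only that idea for the degree computation, under the assumption that the similarity matrix is W = X X^T with its diagonal removed, where the rows of X have unit length. It is not a transcription of Chen (2018), and the function names are hypothetical.

```python
from math import sqrt

def normalize_rows(X):
    """Scale every row to unit Euclidean length so that dot products are cosines."""
    out = []
    for row in X:
        norm = sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row])
    return out

def degrees_fast(X):
    """Row sums of W = X X^T - I, computed as X (X^T 1) - 1 without forming W.

    Cost is O(n * dim) instead of O(n^2 * dim). Rows of X must have unit length.
    """
    dim = len(X[0])
    col_sum = [sum(row[j] for row in X) for j in range(dim)]  # X^T 1
    return [sum(a * b for a, b in zip(row, col_sum)) - 1.0 for row in X]

def degrees_naive(X):
    """Reference version: sum the pairwise cosine similarities explicitly."""
    n = len(X)
    return [
        sum(sum(a * b for a, b in zip(X[i], X[j])) for j in range(n) if j != i)
        for i in range(n)
    ]
```

The two functions agree up to rounding because the row sum of X X^T equals X applied to the column sums of X, and each unit-length row contributes exactly 1 on the diagonal.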


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 596
Author(s):  
Krishna Kumar Sharma ◽  
Ayan Seal ◽  
Enrique Herrera-Viedma ◽  
Ondrej Krejcar

Calculating and monitoring customer churn metrics is important for companies that want to retain customers and earn more profit. In this study, a churn prediction framework is developed using a modified spectral clustering (SC) algorithm. The similarity measure plays an essential role in clustering industrial data to predict churn accurately. The linear Euclidean distance in traditional SC is therefore replaced by the non-linear S-distance (Sd), which is derived from the concept of S-divergence (SD). Several characteristics of Sd are discussed in this work. Experiments are conducted to validate the proposed clustering algorithm on four synthetic databases, eight UCI databases, two industrial databases, and one telecommunications database related to customer churn. Three existing clustering algorithms (k-means, density-based spatial clustering of applications with noise, and conventional SC) are also run on these 15 databases. The empirical outcomes show that the proposed clustering algorithm beats the three existing algorithms in terms of Jaccard index, F-score, recall, precision, and accuracy. Finally, we test the significance of the clustering results with Wilcoxon's signed-rank test, Wilcoxon's rank-sum test, and sign tests. The comparative study shows that the outcomes of the proposed algorithm are interesting, especially for clusters of arbitrary shape.
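The abstract does not state the S-distance formula. As a hedged illustration, the sketch below uses the form that the S-divergence takes when restricted to diagonal positive definite matrices, that is, to vectors with strictly positive entries, and takes its square root. Whether this matches the paper's exact definition is an assumption, and the function name is hypothetical.

```python
from math import log, sqrt

def s_distance(x, y):
    """Square root of sum_i [ log((x_i + y_i) / 2) - (log x_i + log y_i) / 2 ].

    Each term is non-negative by the arithmetic-geometric mean inequality.
    Inputs must have equal length and strictly positive entries.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v <= 0 for v in x) or any(v <= 0 for v in y):
        raise ValueError("entries must be strictly positive")
    total = sum(log((a + b) / 2) - (log(a) + log(b)) / 2 for a, b in zip(x, y))
    # Guard against a tiny negative value caused by floating-point rounding.
    return sqrt(max(total, 0.0))
```

Unlike the Euclidean distance, this measure depends on ratios rather than differences: scaling both vectors by the same positive constant leaves it unchanged.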

