A collaborative filtering similarity measure based on potential field

Kybernetes ◽  
2016 ◽  
Vol 45 (3) ◽  
pp. 434-445 ◽  
Author(s):  
Yajun Leng ◽  
Qing Lu ◽  
Changyong Liang

Purpose – Collaborative recommender systems play a crucial role in providing personalized services to online consumers. Most online shopping sites and many other applications now use collaborative recommender systems, and similarity measurement plays a fundamental role in them. Some of the best-known similarity measures are Pearson’s correlation coefficient, cosine similarity and mean squared differences. However, data sparsity reduces the accuracy of these measures, which leads to inaccurate neighborhoods and thereby to poor recommendations. The purpose of this paper is to propose a novel similarity measure based on potential field. Design/methodology/approach – The proposed approach constructs a dense matrix, the user-user potential matrix, and uses it to compute potential similarities between users. The potential similarities are then modified based on users’ preliminary neighborhoods, and the k users with the highest modified similarity values are selected as the active user’s nearest neighbors. Because the potential matrix is much denser than the rating matrix, the sparsity problem can be effectively alleviated. The similarity modification scheme considers the number of common neighbors of two users, which further improves the accuracy of similarity computation. Findings – Experimental results show that the proposed approach outperforms the traditional similarity measures. Originality/value – The research highlights of this paper are as follows: the authors construct a dense matrix, the user-user potential matrix, and use it to compute potential similarities between users; the potential similarities are modified based on users’ preliminary neighborhoods, and the k users with the highest modified similarity values are selected as the active user’s nearest neighbors; and the proposed approach performs better than the traditional similarity measures.
The manuscript will be of particular interest to scientists working on recommender systems, as well as to readers interested in solving related complex practical engineering problems.
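For reference, the three traditional measures named above can be sketched as follows. This is a minimal illustration, not the paper’s potential-field measure (whose formula the abstract does not give). It assumes ratings stored as dicts mapping item id to a rating on a 1 to 5 scale, and it computes every measure over co-rated items only; other variants (for example, Pearson with each user’s overall mean) exist. The one-item-overlap case shows how sparsity starves these measures of evidence.

```python
import math

# Assumption: each user's ratings are a dict {item_id: rating}, ratings on a 1-5 scale.

def co_rated(u, v):
    """Items rated by both users; all three measures below use only these."""
    return sorted(set(u) & set(v))

def pearson(u, v):
    """Pearson correlation, with means taken over the co-rated items (one common variant)."""
    items = co_rated(u, v)
    if len(items) < 2:
        return 0.0  # too little overlap to correlate: a typical symptom of sparse data
    mu_u = sum(u[i] for i in items) / len(items)
    mu_v = sum(v[i] for i in items) / len(items)
    num = sum((u[i] - mu_u) * (v[i] - mu_v) for i in items)
    den = math.sqrt(sum((u[i] - mu_u) ** 2 for i in items)) * math.sqrt(
        sum((v[i] - mu_v) ** 2 for i in items))
    return num / den if den else 0.0

def cosine(u, v):
    """Cosine of the angle between the two users' co-rated rating vectors."""
    items = co_rated(u, v)
    if not items:
        return 0.0
    num = sum(u[i] * v[i] for i in items)
    den = math.sqrt(sum(u[i] ** 2 for i in items)) * math.sqrt(sum(v[i] ** 2 for i in items))
    return num / den if den else 0.0

def msd(u, v, r_min=1.0, r_max=5.0):
    """Mean squared differences turned into a similarity: 1 - mean normalized squared gap."""
    items = co_rated(u, v)
    if not items:
        return 0.0
    gaps = [((u[i] - v[i]) / (r_max - r_min)) ** 2 for i in items]
    return 1.0 - sum(gaps) / len(gaps)

alice = {"a": 5, "b": 3, "c": 4}
bob = {"a": 4, "b": 2, "d": 1}
print(pearson(alice, bob), cosine(alice, bob), msd(alice, bob))
```

Here `alice` and `bob` share only items `a` and `b`, so all three scores rest on two ratings each; with a single shared item, `pearson` falls back to 0.0.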

2015 ◽  
Vol 37 ◽  
pp. 339 ◽  
Author(s):  
Saeed Garmsiri ◽  
Ali Hamzeh

A trust network in a social network can be modeled as a graph in which trustors and trustees are vertices and edges represent the measured trust values between them. To evaluate trust between trustors and trustees, similarity measures are used to compute the similarity among trustors or among trustees, and the resulting values are then used to predict the trust value between them. The choice of similarity measure has an important effect on final accuracy. In this paper we propose a graph-based similarity measure: the similarity between two users is computed from the connections between them in the graph, and this similarity is then used with the k-nearest neighbors method to predict trust between users. To the best of our knowledge, this is the first work to introduce a graph-based similarity measure; empirical results on two real datasets show that trust predicted using the proposed similarity measure is more accurate than trust predicted without it.
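The k-nearest-neighbors prediction step described above can be sketched as below. The abstract does not define its graph-based similarity, so the hypothetical `neighbor_jaccard` (overlap of two trustors’ out-neighbors) stands in for it; the graph format (trustor to {trustee: trust in [0, 1]}) is also an assumption. The predicted trust is the similarity-weighted mean of the trust that the k most similar trustors place in the target.

```python
# Hypothetical graph format: trustor -> {trustee: trust value in [0, 1]}.

def neighbor_jaccard(graph, a, b):
    """Hypothetical stand-in for a graph-based similarity: overlap of out-neighbors."""
    na, nb = set(graph.get(a, {})), set(graph.get(b, {}))
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

def predict_trust(graph, trustor, trustee, k=2):
    """Similarity-weighted mean trust from the k most similar trustors who rated trustee."""
    candidates = [
        (neighbor_jaccard(graph, trustor, other), graph[other][trustee])
        for other in graph
        if other != trustor and trustee in graph[other]
    ]
    top = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no similar trustor has rated this trustee
    return sum(sim * trust for sim, trust in top) / total

graph = {
    "alice": {"bob": 0.9, "carol": 0.4},
    "dave": {"bob": 0.8, "carol": 0.5, "erin": 0.7},
    "frank": {"carol": 0.2, "erin": 0.1},
}
print(predict_trust(graph, "alice", "erin"))
```

`dave` shares two of three out-neighbors with `alice` and `frank` shares one, so `dave`’s trust in `erin` (0.7) gets twice the weight of `frank`’s (0.1).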


2019 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Manjula Wijewickrema ◽  
Vivien Petras ◽  
Naomal Dias

Purpose The purpose of this paper is to develop a journal recommender system that compares the content similarity between a manuscript and existing journal articles in two subject corpora (covering the social sciences and medicine). The study examines the appropriateness of three text similarity measures and the impact of numerous aspects of corpus documents on system performance. Design/methodology/approach The three similarity measures were implemented one at a time in a journal recommender system with two separate journal corpora. Two distinct samples of test abstracts were classified, and the results were evaluated using normalized discounted cumulative gain. Findings The BM25 similarity measure outperforms both the cosine and unigram language model similarity measures overall. The unigram language model measure shows the lowest performance. Performance differs significantly between every pair of similarity measures, although the BM25 and cosine similarity measures are moderately correlated. Cosine similarity performs better for subjects with a higher density of technical vocabulary and shorter corpus documents. Moreover, increasing the number of corpus journals in the social sciences improved performance for cosine similarity and BM25. Originality/value This is the first work to compare the suitability of a number of string-based similarity measures with distinct corpora for journal recommender systems.
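To make the BM25 measure concrete, here is a minimal sketch of Okapi BM25 scoring. It assumes lowercase whitespace tokenization, the common defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant used by Lucene; the abstract does not state which BM25 variant or parameters the study used. The document-length normalization is one reason BM25 can behave differently from cosine similarity on corpora with documents of different lengths.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each document for the query (whitespace tokenization)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in set(query.lower().split()):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            f = tf[term]
            norm = f + k1 * (1 - b + b * len(doc) / avgdl)  # length normalization
            score += idf * f * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "patient clinical trial outcomes",
    "social network analysis of online communities",
    "randomized clinical trial of a new drug",
]
print(bm25_scores("clinical trial", docs))
```

Both the first and third documents contain each query term once, but the shorter first document scores higher because of the length normalization; the second document scores 0.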


2015 ◽  
Vol 14 (05) ◽  
pp. 947-970 ◽  
Author(s):  
Jiajin Huang ◽
Xi Yuan ◽  
Ning Zhong ◽  
Yiyu Yao

A recommender system aims to recommend items that users might be interested in. With the increasing popularity of social tagging systems, it has become urgent to model recommendations on users, items, and tags in a unified way. In this paper, we propose a framework for studying recommender systems by modeling user preferences as a relation on (user, item, tag) triples. We discuss tag-aware recommender systems from two aspects. On the one hand, we compute associations between users and items related to tags by using an adaptive method, and recommend tags to users or predict item properties for users. On the other hand, taking similarity-based recommendation as a case study, we discuss similarity measures from both qualitative and quantitative perspectives, as well as k-nearest neighbors and reverse k-nearest neighbors for recommendation.
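The difference between k-nearest neighbors and reverse k-nearest neighbors can be illustrated with a short sketch. It assumes a precomputed symmetric similarity table (a dict of dicts, hypothetical values) rather than the paper’s tag-based similarities: the kNN of u are the k users most similar to u, whereas the reverse kNN of u are the users that count u among their own k nearest neighbors.

```python
# Hypothetical precomputed symmetric user-user similarities.
SIM = {
    "a": {"b": 0.9, "c": 0.1, "d": 0.2},
    "b": {"a": 0.9, "c": 0.3, "d": 0.8},
    "c": {"a": 0.1, "b": 0.3, "d": 0.4},
    "d": {"a": 0.2, "b": 0.8, "c": 0.4},
}

def knn(sim, u, k):
    """The k users most similar to u."""
    others = [v for v in sim if v != u]
    return set(sorted(others, key=lambda v: sim[u][v], reverse=True)[:k])

def reverse_knn(sim, u, k):
    """Users whose own k nearest neighbors include u."""
    return {v for v in sim if v != u and u in knn(sim, v, k)}

for user in SIM:
    print(user, knn(SIM, user, 1), reverse_knn(SIM, user, 1))
```

The relation is asymmetric: with k = 1, user `b` is the nearest neighbor of both `a` and `d`, so its reverse-kNN set has two members, while `c` appears in nobody’s kNN list and has an empty reverse-kNN set.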


2021 ◽  
Vol 11 (13) ◽  
pp. 6108
Author(s):  
Jehan Al-Safi ◽  
Cihan Kaleli

Collaborative filtering is a technique employed by recommendation systems that predicts item ratings and recommends items that may interest the user. Naturally, users have diverse opinions, and relying only on user ratings of products may produce inaccurate recommendations. It is therefore essential to offer a new similarity measure that improves recommendation accuracy, even for customers who leave only a few ratings. This article proposes an algorithm for measuring user similarity that exploits item genre information to make more accurate recommendations. The algorithm measures the relationship between users using item genre information, discovers the active user’s nearest neighbors in each genre, and builds a final list of nearest neighbors who share the active user’s preferences within a genre. Finally, it predicts the active user’s ratings of items using a specific prediction procedure. To measure accuracy, we propose new evaluation criteria: the rating level, and the reliability among users according to rating level. We implement the proposed method on real datasets. The empirical results show that the proposed algorithm achieves better predicted-rating accuracy, rating level, and reliability between users than many existing collaborative filtering algorithms.


Author(s):  
B. Mathura Bai ◽  
N. Mangathayaru ◽  
B. Padmaja Rani ◽  
Shadi Aljawarneh

Missing attribute values are one of the most common problems faced when mining medical datasets. Estimating missing values is a major challenge in dataset pre-processing. Any wrong estimate of missing attribute values can lead to inefficient and improper classification, resulting in lower classifier accuracy. Similarity measures play a key role in the imputation process, and an appropriate similarity measure can help achieve better imputation and improved classification accuracy. This paper proposes a novel imputation measure for finding the similarity between missing and non-missing instances in medical datasets. Experiments are carried out by applying both the proposed imputation technique and popular existing benchmark imputation techniques. Classification is carried out using the KNN, J48, SMO and RBFN classifiers. The experimental analysis showed that, after imputing medical records with the proposed technique, the classification accuracies reported by the KNN, J48 and SMO classifiers improved compared with other existing benchmark imputation techniques.
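The role of the similarity measure in imputation can be shown with a generic nearest-neighbor imputation sketch. The paper’s novel measure is not given in the abstract, so the hypothetical `similarity` function below (1 / (1 + mean absolute difference) over attributes present in both records) is only a placeholder; records are assumed to be numeric lists with `None` marking a missing value. Each missing value is filled with the mean of that attribute across the k most similar complete records.

```python
def similarity(x, y):
    """Hypothetical placeholder measure: 1 / (1 + mean abs diff) over shared attributes."""
    shared = [i for i, (a, b) in enumerate(zip(x, y)) if a is not None and b is not None]
    if not shared:
        return 0.0
    mad = sum(abs(x[i] - y[i]) for i in shared) / len(shared)
    return 1.0 / (1.0 + mad)

def impute(records, k=2):
    """Fill each missing value with the mean of that attribute over the k most similar complete records."""
    complete = [r for r in records if None not in r]
    out = []
    for r in records:
        filled = list(r)
        if None in r and complete:
            neighbors = sorted(complete, key=lambda c: similarity(r, c), reverse=True)[:k]
            for i, v in enumerate(r):
                if v is None:
                    filled[i] = sum(n[i] for n in neighbors) / len(neighbors)
        out.append(filled)
    return out

records = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [5.0, 6.0, 7.0],
    [1.0, 2.0, None],  # third attribute missing
]
print(impute(records, k=2))
```

The last record is closest to the first two, so its missing value becomes the mean of 3.0 and 3.2 rather than being pulled toward the distant third record; swapping in a better measure changes which neighbors are chosen.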


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Ali A. Amer ◽  
Hassan I. Abdalla

Similarity measures have long been used in information retrieval and machine learning for multiple purposes, including text retrieval, text clustering, text summarization, plagiarism detection, and several other text-processing applications. However, until recently, no single measure had been reported to be both highly effective and efficient. The quest for an efficient and effective similarity measure therefore remains an open challenge. This study introduces a new, highly effective and time-efficient similarity measure for text clustering and classification. Furthermore, the study provides a comprehensive examination of seven of the most widely used similarity measures, mainly concerning their effectiveness and efficiency. All similarity measures are examined in detail using the K-nearest neighbor algorithm (KNN) for classification, the K-means algorithm for clustering, and the bag-of-words (BoW) model for feature selection. The experimental evaluation was conducted on two of the most popular datasets, namely Reuters-21 and Web-KB. The results confirm that the proposed set theory-based similarity measure (STB-SM) significantly outperforms all state-of-the-art measures with regard to both effectiveness and efficiency.


2021 ◽  
Vol 10 (2) ◽  
pp. 90
Author(s):  
Jin Zhu ◽  
Dayu Cheng ◽  
Weiwei Zhang ◽  
Ci Song ◽  
Jie Chen ◽  
...  

People spend more than 80% of their time in indoor spaces, such as shopping malls and office buildings. Indoor trajectories collected by indoor positioning devices, such as WiFi and Bluetooth devices, can reflect human movement behaviors in indoor spaces. Insightful indoor movement patterns can be discovered from indoor trajectories using various clustering methods, which rely on a measure of the degree of similarity between indoor trajectories. Researchers have proposed many trajectory similarity measures. However, existing measures ignore the movement constraints imposed by the indoor space and the characteristics of indoor positioning sensors, which leads to inaccurate measurement of indoor trajectory similarity. Additionally, most of these works focus on the spatial and temporal dimensions of trajectories and pay less attention to indoor semantic information. Integrating indoor semantic information, such as indoor points of interest, into indoor trajectory similarity measurement helps discover pedestrians with similar intentions. In this paper, we propose an accurate and reasonable indoor trajectory similarity measure called the indoor semantic trajectory similarity measure (ISTSM), which considers the features of indoor trajectories and indoor semantic information simultaneously. The ISTSM is a modification of the edit distance, which measures the distance between string sequences. The key component of the ISTSM is an indoor navigation graph, transformed from an indoor floor plan of the space, that is used to compute accurate indoor walking distances. The indoor walking distances and indoor semantic information are fused seamlessly into the edit distance. The ISTSM is evaluated using a synthetic dataset and a real dataset from a shopping mall.
The experiment with the synthetic dataset reveals that the ISTSM is more accurate and reasonable than three other popular trajectory similarity measures, namely the longest common subsequence (LCSS), edit distance on real sequence (EDR), and the multidimensional similarity measure (MSM). The shopping mall case study shows that the ISTSM effectively reveals indoor customer movement patterns.
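The idea of fusing walking distances into the edit distance can be sketched as a weighted edit distance whose substitution cost depends on how far apart two locations are. This is a generic illustration, not the ISTSM itself: trajectories are assumed to be sequences of location labels, and the walking distances in `WALK` and the 50 m saturation threshold are hypothetical values standing in for distances computed on an indoor navigation graph.

```python
# Hypothetical walking distances (meters) between indoor locations.
WALK = {("A", "B"): 10.0, ("A", "C"): 40.0, ("B", "C"): 30.0}

def walk_cost(p, q, threshold=50.0):
    """Substitution cost in [0, 1]: 0 for the same place, growing with walking distance."""
    if p == q:
        return 0.0
    dist = WALK[(p, q)] if (p, q) in WALK else WALK[(q, p)]
    return min(1.0, dist / threshold)

def weighted_edit_distance(s, t, sub_cost, indel_cost=1.0):
    """Dynamic-programming edit distance with a pluggable substitution cost."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * indel_cost
    for j in range(1, n + 1):
        d[0][j] = j * indel_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                        # deletion
                d[i][j - 1] + indel_cost,                        # insertion
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # substitution
            )
    return d[m][n]

print(weighted_edit_distance(["A", "B", "C"], ["A", "A", "C"], walk_cost))
```

With unit substitution costs the two trajectories would differ by 1; because `B` is only 10 m from `A`, the weighted distance is 0.2, reflecting that the two visitors moved through nearby locations.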


2021 ◽  
Vol 13 (1) ◽  
pp. 1-25
Author(s):  
Michael Loster ◽  
Ioannis Koumarelas ◽  
Felix Naumann

The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity (duplicates) into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is time-consuming and requires extensive domain expertise. We propose a deep Siamese neural network capable of learning a similarity measure that is tailored to the characteristics of a particular dataset. By exploiting the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluated our method on multiple datasets and compared our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent.


Author(s):  
Djamel Guessoum ◽  
Moeiz Miraoui ◽  
Chakib Tadj

Purpose This paper aims to apply contextual case-based reasoning (CBR) to a mobile device. The CBR method was chosen because it does not require training, demands minimal processing resources and integrates easily with the dynamic and uncertain nature of pervasive computing. Based on a mobile user’s location and activity, which can be determined through the device’s inertial sensors and GPS capabilities, it is possible to select and offer appropriate services to this user. Design/methodology/approach The proposed approach comprises two stages. The first stage uses simple semantic similarity measures to retrieve the case from the case base that best matches the current case. In the second stage, the resulting selection of services is filtered based on current contextual information. Findings This two-stage method makes the services proposed to the user more relevant, yet it is easy to implement on a mobile device. Originality/value A two-stage CBR approach that uses lightweight processing methods and generates context-aware services is discussed. Ontological location modeling adds reasoning flexibility and knowledge-sharing capabilities.
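The two-stage flow (retrieve the best-matching case, then filter its services by current context) can be sketched as follows. Everything concrete here is hypothetical: the case base, the attribute weights, and the service requirements are invented for illustration, and weighted exact matching replaces the paper’s semantic similarity measures and ontological location model.

```python
# Hypothetical case base: past situations and the services offered in them.
CASE_BASE = [
    {"location": "office", "activity": "sitting", "services": ["calendar", "email", "music"]},
    {"location": "street", "activity": "walking", "services": ["navigation", "music", "email"]},
    {"location": "gym", "activity": "running", "services": ["music", "heart_rate"]},
]
# Hypothetical attribute weights and per-service context requirements.
WEIGHTS = {"location": 0.6, "activity": 0.4}
REQUIRES = {"email": "network", "navigation": "gps"}

def case_similarity(query, case):
    """Stage 1 similarity: weighted exact match on location and activity (a simplification)."""
    return sum(w for attr, w in WEIGHTS.items() if query.get(attr) == case.get(attr))

def available(service, context):
    """Stage 2 check: a service is kept if it needs nothing, or its requirement holds now."""
    need = REQUIRES.get(service)
    return need is None or context.get(need, False)

def recommend(query, context):
    best = max(CASE_BASE, key=lambda c: case_similarity(query, c))  # stage 1: retrieval
    return [s for s in best["services"] if available(s, context)]   # stage 2: filtering

print(recommend({"location": "street", "activity": "walking"}, {"gps": True, "network": False}))
```

The retrieved street case offers navigation, music and email, but with no network connection in the current context the second stage drops email. Neither stage involves training, which matches the lightweight profile described above.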

