similarity measure
Recently Published Documents

Total documents: 2568 (five years: 555)
H-index: 64 (five years: 8)

2022 · Vol 16 (3) · pp. 1-32
Author(s): Junchen Jin, Mark Heimann, Di Jin, Danai Koutra

While most network embedding techniques model the proximity between nodes in a network, there has recently been significant interest in structural embeddings that are based on node equivalences, a notion rooted in sociology: equivalences or positions are collections of nodes that have similar roles (i.e., similar functions, ties, or interactions with nodes in other positions), irrespective of their distance or reachability in the network. Unlike proximity-based methods, which are rigorously evaluated in the literature, structural embeddings have a less mature evaluation practice: it relies on small synthetic or real networks with imperfectly defined labels, and its connection to sociological equivalences has so far been vague and tenuous. With new node embedding methods being developed at a breakneck pace, proper evaluation and systematic characterization of existing approaches will be essential to progress. To fill this gap, we set out to understand what types of equivalences structural embeddings capture. We are the first to contribute a rigorous intrinsic and extrinsic evaluation methodology for structural embeddings, along with carefully designed, diverse datasets of varying sizes. We observe that a number of evaluation variables (e.g., the choice of similarity measure, classifier, and label definitions) can lead to different results. We find that degree distributions within nodes' local neighborhoods can serve as simple yet effective baselines in their own right and can guide the future development of structural embeddings. We hope that our findings will influence the design of further node embedding methods and pave the way for more comprehensive and fair evaluation of structural embedding methods.
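
The observation that neighborhood degree distributions make effective structural baselines can be illustrated with a minimal sketch. It assumes an undirected graph stored as an adjacency dict and represents each node by a log2-binned histogram of its neighbors' degrees; the binning and the function names are hypothetical choices for illustration, not necessarily the features used in the paper. Nodes are compared with cosine similarity, so two nodes with the same role score highly even when they are far apart in the network.

```python
import math

def neighbor_degree_histogram(adj, node, num_bins=4):
    """Log2-binned histogram of the degrees of `node`'s neighbors.
    A neighbor of degree d goes to bin floor(log2(d)); the last bin
    absorbs all larger degrees (hypothetical binning)."""
    hist = [0.0] * num_bins
    for nbr in adj[node]:
        d = len(adj[nbr])  # d >= 1, since nbr is adjacent to node
        hist[min(int(math.log2(d)), num_bins - 1)] += 1.0
    return hist

def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Two 3-leaf stars joined by the path A - x - y - B: hubs A and B are three
# hops apart but play the same structural role.
edges = [("A", "a1"), ("A", "a2"), ("A", "a3"), ("A", "x"), ("x", "y"),
         ("y", "B"), ("B", "b1"), ("B", "b2"), ("B", "b3")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

h = {n: neighbor_degree_histogram(adj, n) for n in adj}
print(cosine(h["A"], h["B"]))  # ~1.0: same role despite the distance
print(cosine(h["A"], h["x"]))  # much lower: a different role
```

A proximity-based embedding would place A closer to x than to B; this degree-based feature does the opposite, which is the kind of equivalence structural embeddings are meant to capture.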


2022 · Vol 24 (3) · pp. 1-17
Author(s): Sunita Tiwari, Saroj Kaushik

The low cost and easy availability of handheld mobile devices, together with the ubiquity of location acquisition services such as GPS and GSM networks, have made it easy to log and share the location histories of mobile users. This work aims to find semantic user similarity from users' past travel histories. Applications of such a semantic similarity measure include tourism-related recommender systems and information retrieval. The paper presents an Earth Mover's Distance (EMD)-based semantic user similarity measure computed from users' GPS logs. The measure is applied to and evaluated on the GPS dataset of 182 users collected from April 2007 to August 2012 by Microsoft's GeoLife project, and it is compared with conventional similarity measures used in the literature, such as Jaccard, Dice, and Pearson's correlation. The EMD-based approach improves on the existing approaches by 10.70% in average RMSE and by 5.73% in average MAE.
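
Why EMD can beat set-overlap measures such as Jaccard and Dice can be shown with a small sketch. It assumes each user's history is summarized as visit counts over a shared set of ordered bins, with a ground distance of |i - j| between bins i and j. Under that assumption, the EMD between two normalized histograms equals the L1 distance between their cumulative distributions. The bins, the ground distance, and the mapping 1 / (1 + EMD) from distance to similarity are all hypothetical simplifications; the paper's semantic location representation is not specified in this abstract.

```python
from itertools import accumulate

def emd_1d(p, q):
    """Earth Mover's Distance between two non-empty histograms over the same
    ordered bins, with ground distance |i - j|. Both are normalized to unit
    mass, so EMD reduces to the L1 distance between their CDFs."""
    sp, sq = sum(p), sum(q)
    cdf_p = accumulate(x / sp for x in p)
    cdf_q = accumulate(x / sq for x in q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))

def emd_similarity(p, q):
    """Map a distance to a similarity in (0, 1] (a hypothetical choice)."""
    return 1.0 / (1.0 + emd_1d(p, q))

def support(hist):
    """Set of bins visited at least once (input to the set-based baselines)."""
    return {i for i, count in enumerate(hist) if count}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 1.0

def dice(a, b):
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0

# Hypothetical visit counts over four ordered bins.
u = [4, 0, 0, 0]
v = [0, 4, 0, 0]  # one bin away from u
w = [0, 0, 0, 4]  # three bins away from u

print(jaccard(support(u), support(v)), jaccard(support(u), support(w)))  # 0.0 0.0
print(emd_similarity(u, v), emd_similarity(u, w))  # 0.5 0.25
```

Jaccard (and likewise Dice) scores both pairs as completely dissimilar because they share no bins, whereas EMD gives partial credit to the near miss. This graded behavior is one plausible reason a transport-based measure can reduce prediction error.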


2022
Author(s): Andy Lin, Brooke L. Deatherage Kaiser, Janine R. Hutchison, Jeffrey A. Bilmes, William Stafford Noble

Interpretation of newly acquired mass spectrometry data can be improved by identifying, from an online repository, previous mass spectrometry runs that resemble the new data. However, this retrieval task requires computing the similarity between an arbitrary pair of mass spectrometry runs, which is particularly challenging for runs acquired using different experimental protocols. We propose a method, MS1Connect, that calculates the similarity between a pair of runs by examining only the intact peptide (MS1) scans, and we show evidence that the MS1Connect score is accurate. Specifically, we show that MS1Connect outperforms several baseline methods on the task of predicting the species from which a given proteomics sample originated. In addition, we show that MS1Connect scores are highly correlated with similarities computed from fragment (MS2) scans, even though these data are not used by MS1Connect.
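
The species-prediction evaluation can be framed as nearest-neighbor retrieval. Given any pairwise run-similarity function, a query run receives the species label of its most similar run in the repository. In the sketch below, `run_similarity` is a hypothetical placeholder (cosine similarity of made-up binned MS1 intensity vectors), not the MS1Connect score. The repository labels and vectors are invented purely for illustration.

```python
import math

def run_similarity(x, y):
    """Placeholder pairwise score: cosine similarity of two binned MS1
    intensity vectors. A hypothetical stand-in, NOT the MS1Connect score."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def predict_species(query, repository, similarity=run_similarity):
    """Label `query` with the species of its most similar repository run.
    `repository` is a list of (species_label, run_vector) pairs."""
    species, _ = max(repository, key=lambda item: similarity(query, item[1]))
    return species

# Made-up repository with hypothetical labels and vectors.
repository = [
    ("species_A", [5, 1, 0, 0]),
    ("species_B", [0, 1, 5, 2]),
    ("species_C", [1, 0, 1, 6]),
]
print(predict_species([4, 2, 0, 0], repository))  # species_A
print(predict_species([0, 0, 2, 7], repository))  # species_C
```

Because `similarity` is a parameter, competing scores can be plugged into the same harness and compared by prediction accuracy, which is the spirit of the baseline comparison described above.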


2022 · Vol 3
Author(s): Günther Wirsching

Reasonable quantification of uncertainty is a major issue in cognitive infocommunications, and logic is a backbone of successful communication. Here, an axiomatic approach to quantum logic is presented that highlights its similarities to and differences from classical logic. The axiomatic method ensures that applications are not restricted to quantum physics. On this basis, algorithms are developed that assign to an incoming signal a measure of its similarity to a pattern generated by a set of training signals.
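
In quantum logic, propositions correspond to subspaces, and the degree to which a state satisfies a proposition is the squared length of its projection onto that subspace. A minimal sketch of a signal-to-pattern similarity in that style follows. It assumes that the "pattern" is the subspace spanned by the training signals and that similarity is ||Px||² / ||x||². This is the standard quantum-probability construction and only an assumption about what the paper's algorithms compute.

```python
import math

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: orthonormal basis of the span of `vectors`,
    skipping vectors that are (numerically) linearly dependent."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = sum(a * b for a, b in zip(w, e))
            w = [a - c * b for a, b in zip(w, e)]
        n = math.sqrt(sum(a * a for a in w))
        if n > tol:
            basis.append([a / n for a in w])
    return basis

def subspace_similarity(signal, basis):
    """Squared length of the projection of the (nonzero) signal onto the
    pattern subspace, divided by the signal's squared length: a value in
    [0, 1] analogous to a Born-rule probability."""
    norm_sq = sum(a * a for a in signal)
    proj_sq = sum(sum(a * b for a, b in zip(signal, e)) ** 2 for e in basis)
    return proj_sq / norm_sq

training = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]  # hypothetical training signals
basis = orthonormal_basis(training)
print(subspace_similarity([1.0, 1.0, 0.0], basis))  # 1.0: inside the pattern
print(subspace_similarity([0.0, 0.0, 1.0], basis))  # 0.0: orthogonal to it
print(subspace_similarity([1.0, 0.0, 1.0], basis))  # 0.5: half inside
```

Unlike a classical yes/no membership test, the projection yields a graded value, which is one way a quantum-logic view can quantify uncertainty about whether a signal matches a pattern.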


2022 · Vol 23 (1)
Author(s): Juan J. Lastra-Díaz, Alicia Lara-Clares, Ana Garcia-Serrano

Background. Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and the Gene Ontology (GO) are extensively used in many applications in biomedical text mining and genomics, which has encouraged the development of semantic measures libraries based on these ontologies. However, current state-of-the-art semantic measures libraries have performance and scalability drawbacks derived from their ontology representations, which are based either on relational databases or on naive in-memory graphs. Likewise, a recent reproducible survey on word similarity shows that a hybrid IC-based measure that integrates a shortest-path computation sets the state of the art among ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for real-time computation prevents the practical use of this measure in any application, as well as the use of any other path-based semantic similarity measure.

Results. To bridge these two gaps, this work introduces, for the first time, an updated version of the HESML Java software library especially designed for the biomedical domain. It implements the most efficient and scalable ontology representation reported in the literature, together with a new method for approximating Dijkstra's algorithm on taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure.

Conclusions. We introduce a set of reproducible benchmarks showing that HESML outperforms the current state-of-the-art libraries by several orders of magnitude on the three aforementioned biomedical ontologies, and demonstrating the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. We also show that AncSPL scales linearly with the size of the common-ancestor subgraph, regardless of the ontology size. Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies such as SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
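
The idea behind an ancestors-based shortest path can be sketched as follows. Walk upward from each concept through its is-a links, then take the cheapest route that meets at a common ancestor. This touches only the two ancestor subgraphs, never the whole ontology. The code is a simplified illustration under that assumption, not the published AncSPL algorithm or the HESML API. In a tree it gives the exact path length; in a taxonomy with multiple inheritance, it is an upper bound that ignores paths dipping down through descendants. The taxonomy is a made-up example.

```python
from collections import deque

def upward_distances(parents, node):
    """BFS over is-a (child -> parents) links: hop distance from `node`
    to each of its ancestors, including itself at distance 0."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for p in parents.get(cur, ()):
            if p not in dist:
                dist[p] = dist[cur] + 1
                queue.append(p)
    return dist

def ancestor_path_length(parents, x, y):
    """Approximate shortest-path length between x and y as the shortest
    route that climbs from both concepts to a common ancestor."""
    dx = upward_distances(parents, x)
    dy = upward_distances(parents, y)
    common = dx.keys() & dy.keys()
    return min(dx[a] + dy[a] for a in common) if common else None

parents = {  # hypothetical is-a taxonomy: child -> list of parents
    "cat": ["feline"], "feline": ["mammal"],
    "dog": ["canine"], "canine": ["mammal"],
    "sparrow": ["bird"], "mammal": ["animal"], "bird": ["animal"],
}
print(ancestor_path_length(parents, "cat", "dog"))      # 4 (via mammal)
print(ancestor_path_length(parents, "cat", "sparrow"))  # 5 (via animal)
```

The work done depends only on how many ancestors the two concepts have, which is consistent in spirit with the reported linear scaling in the size of the common-ancestor subgraph.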


2022 · Vol 11 (2) · pp. 167-180
Author(s): Laxminarayan Sahoo

The aim of this paper is to propose similarity measures between Fermatean fuzzy sets (FFSs). First, we propose score-based similarity measures for FFSs, including score-based cosine similarity measures. Furthermore, we introduce three new score functions for the effective use of Fermatean fuzzy sets and discuss some relevant properties of the cosine similarity measure. Fermatean fuzzy sets, introduced by Senapati and Yager, can handle uncertain information more easily in multi-criteria decision making (MCDM) and group decision making. Here, we investigate score-based similarity measures of Fermatean fuzzy sets and explore the use of FFSs in pattern recognition. Using different types of similarity measures, a pattern-recognition problem, namely personnel appointment, is presented to illustrate the use of FFSs, their similarity measures, and their scores. The simulated results show that the proposed method is more flexible than the existing method(s). Finally, concluding remarks and the scope for future research on the proposed approach are given.
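
As a concrete reference point, an element of a Fermatean fuzzy set carries a membership grade μ and a non-membership grade ν with μ³ + ν³ ≤ 1. The sketch below shows this constraint, the basic score μ³ − ν³, and one common form of cosine similarity between FFSs, which treats the cubed grades as coordinates and averages over criteria. The cosine form and the personnel-appointment data are assumptions for illustration. The paper's own score-based measures and its three new score functions are not reproduced here.

```python
import math

def is_fermatean(mu, nu):
    """Check the FFS constraint 0 <= mu, nu <= 1 and mu^3 + nu^3 <= 1."""
    return 0 <= mu <= 1 and 0 <= nu <= 1 and mu**3 + nu**3 <= 1 + 1e-12

def score(mu, nu):
    """Basic score mu^3 - nu^3 in [-1, 1] (not the paper's new score functions)."""
    return mu**3 - nu**3

def ffs_cosine(a, b):
    """Average per-criterion cosine similarity between two FFSs given as lists
    of (mu, nu) pairs over the same criteria, using the cubed grades as the
    coordinates (one common form, assumed here)."""
    total = 0.0
    for (ma, na), (mb, nb) in zip(a, b):
        num = ma**3 * mb**3 + na**3 * nb**3
        den = math.sqrt(ma**6 + na**6) * math.sqrt(mb**6 + nb**6)
        total += num / den if den else 0.0  # convention for all-zero grades
    return total / len(a)

# Hypothetical personnel-appointment example: choose the candidate most
# similar to an ideal profile over two criteria.
ideal = [(0.9, 0.2), (0.8, 0.3)]
candidates = {
    "candidate_1": [(0.85, 0.25), (0.75, 0.35)],
    "candidate_2": [(0.3, 0.9), (0.4, 0.8)],
}
best = max(candidates, key=lambda name: ffs_cosine(ideal, candidates[name]))
print(best)  # candidate_1
print(is_fermatean(0.9, 0.6))  # True, although 0.9 + 0.6 > 1 and 0.81 + 0.36 > 1
```

The last line shows why FFSs can encode assessments that intuitionistic (μ + ν ≤ 1) and Pythagorean (μ² + ν² ≤ 1) fuzzy sets cannot.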


2022 · Vol 12 (1)

The traditional frequency-based approach to creating a multi-document extractive summary ranks sentences by scores computed by summing the TF*IDF weights of the words they contain. In this approach, TF (term frequency) is calculated from how frequently a term (word) occurs in the input, and TF calculated in this way does not take into account the semantic relations among terms. In this paper, we propose methods that exploit semantic term relations to improve the sentence-ranking and redundancy-removal steps of a summarization system. Our proposed summarization system has been tested on the DUC 2003 and DUC 2004 benchmark multi-document summarization datasets. The experimental results reveal that the performance of our multi-document text summarizer improves significantly when a distributional term similarity measure is used to find semantic term relations. Our summarizer also outperforms some well-known summarization baselines to which it is compared.
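
The frequency-based baseline described above can be sketched in a few lines. Term frequency is counted over the whole input cluster, and each sentence's score is the sum of TF*IDF over its words. Two details are assumptions: IDF is computed as log(N / df) over the N input documents (a background corpus is another common choice), and tokenization is a simple lowercase regex with no stemming or stopword removal.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens (no stemming or stopword removal, for brevity)."""
    return re.findall(r"[a-z]+", text.lower())

def rank_sentences(documents):
    """Rank every sentence in a document cluster by the sum of the TF*IDF
    weights of its words. TF counts occurrences over the whole input; IDF is
    log(N / df) over the N input documents (an assumption). No semantic
    relations between terms are used."""
    doc_tokens = [tokenize(d) for d in documents]
    tf = Counter(t for toks in doc_tokens for t in toks)
    df = Counter(t for toks in doc_tokens for t in set(toks))
    idf = {t: math.log(len(documents) / df[t]) for t in df}
    sentences = [s.strip() for d in documents
                 for s in re.split(r"(?<=[.!?])\s+", d) if s.strip()]
    scored = [(sum(tf[t] * idf[t] for t in tokenize(s)), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

docs = [
    "Rain flooded the valley. Rain kept falling.",
    "Rain closed schools.",
    "A parade went ahead.",
]
for weight, sentence in rank_sentences(docs):
    print(f"{weight:.3f}  {sentence}")
```

Because every surface form is counted separately, synonyms such as "rain" and "downpour" would never reinforce each other. This is the gap that exploiting semantic term relations is meant to close.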

