pairwise similarity
Recently Published Documents

TOTAL DOCUMENTS: 85 (FIVE YEARS: 28)
H-INDEX: 15 (FIVE YEARS: 3)

2021 ◽ Vol 2021 ◽ pp. 1-10
Author(s): Shaohua Wang ◽ Xiao Kang ◽ Fasheng Liu ◽ Xiushan Nie ◽ Xingbo Liu

Cross-modal hashing maps heterogeneous multimodal data into compact binary codes that preserve semantic similarity, which can significantly improve the convenience of cross-modal retrieval. However, currently available supervised cross-modal hashing methods generally factorize only the label matrix and do not fully exploit the supervision information. Furthermore, these methods often use only a one-directional mapping, which results in an unstable hash learning process. To address these problems, we propose a new supervised cross-modal hash learning method, Discrete Two-step Cross-modal Hashing (DTCH), that exploits pairwise relations. Specifically, the method fully exploits the pairwise similarity relations contained in the supervision information: for the label matrix, the hash learning process is stabilized by combining matrix factorization with label regression; for the pairwise similarity matrix, a semi-relaxed, semi-discrete strategy is adopted to reduce cumulative quantization errors while improving retrieval efficiency and accuracy. The approach further combines an exploration of fine-grained features in the objective function with a novel out-of-sample extension strategy to implicitly preserve consistency between the different modal distributions of samples and the pairwise similarity relations. The superiority of our method is verified through extensive experiments on two widely used datasets.
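A common ingredient in supervised hashing methods like the one above is a pairwise semantic-similarity matrix derived from the label matrix. The sketch below shows the standard construction (two samples are similar if they share at least one label); it is an illustration of that general ingredient, not DTCH's own formulation.

```python
import numpy as np

def pairwise_label_similarity(labels: np.ndarray) -> np.ndarray:
    """Binary pairwise similarity: S[i, j] = 1 if samples i and j
    share at least one label, else 0 (a common supervised signal)."""
    # labels: (n_samples, n_classes) multi-hot label matrix
    overlap = labels @ labels.T          # counts of shared labels per pair
    return (overlap > 0).astype(np.float64)

L = np.array([[1, 0, 1],    # sample 0 has labels {0, 2}
              [0, 1, 0],    # sample 1 has label  {1}
              [1, 1, 0]])   # sample 2 has labels {0, 1}
S = pairwise_label_similarity(L)
# S[0, 1] == 0.0 (no shared label); S[0, 2] == 1.0 (label 0 shared)
```

Note the matrix is symmetric by construction, which is why one-directional mappings built from it can be replaced by the two-step scheme the abstract describes.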


2021
Author(s): Weiren Yu ◽ Sima Iranmanesh ◽ Aparajita Haldar ◽ Maoyin Zhang ◽ Hakan Ferhatosmanoglu

RoleSim and SimRank are among the most popular graph-theoretic similarity measures, with many applications in, e.g., web search, collaborative filtering, and sociometry. While RoleSim addresses the automorphic (role) equivalence of pairwise similarity that SimRank lacks, it ignores the neighboring similarity information outside the automorphically equivalent set. Consequently, two pairs of nodes that are not automorphically equivalent by nature cannot be well distinguished by RoleSim if the averages of their neighboring similarities over the automorphically equivalent set are the same. To alleviate this problem: 1) We propose a novel similarity model, RoleSim*, which evaluates pairwise role similarities more comprehensively. RoleSim* not only guarantees the automorphic equivalence that SimRank lacks, but also takes into account the neighboring similarity information outside the automorphically equivalent sets that is overlooked by RoleSim. 2) We prove the existence and uniqueness of the RoleSim* solution, and establish its three axiomatic properties (symmetry, boundedness, and non-increasing monotonicity). 3) We provide a concise bound for the iterative computation of RoleSim* and estimate the number of iterations required to attain a desired accuracy. 4) We induce a distance metric from RoleSim* similarity and show that the RoleSim* metric fulfills the triangle inequality, which implies the sum-transitivity of its similarity scores. 5) We present a threshold-based RoleSim* model that further reduces the computational time with a provable accuracy guarantee. 6) We propose a single-source RoleSim* model, which scales well to sizable graphs. 7) We also devise methods to scale RoleSim*-based search by combining its triangle-inequality property with partitioning techniques. Our experimental results on real datasets demonstrate that RoleSim* achieves higher accuracy than its competitors while scaling well on sizable graphs with billions of edges.
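The family of measures discussed above is computed as a fixed-point iteration whose residual decays geometrically, which is what makes an a-priori iteration bound possible. The sketch below implements plain SimRank as a stand-in (RoleSim* itself additionally averages over matchings of neighbor sets, which is omitted here); the iteration count comes from the bound C**k <= eps.

```python
import numpy as np

def simrank(adj: np.ndarray, C: float = 0.8, eps: float = 1e-4) -> np.ndarray:
    """Plain SimRank iteration, shown as a stand-in for the
    RoleSim*-style fixed-point computation. adj[i, j] = 1 means an
    edge i -> j; similarity is averaged over in-neighbor pairs."""
    n = adj.shape[0]
    S = np.eye(n)
    # residual after k iterations is bounded by C**k, so solve C**k <= eps
    iters = int(np.ceil(np.log(eps) / np.log(C)))
    for _ in range(iters):
        T = np.eye(n)
        for a in range(n):
            for b in range(a + 1, n):
                Ia = np.nonzero(adj[:, a])[0]   # in-neighbors of a
                Ib = np.nonzero(adj[:, b])[0]   # in-neighbors of b
                if len(Ia) and len(Ib):
                    T[a, b] = T[b, a] = C * S[np.ix_(Ia, Ib)].sum() / (len(Ia) * len(Ib))
        S = T
    return S

# node 0 points to both 1 and 2, so s(1, 2) converges to C = 0.8
adj = np.zeros((3, 3))
adj[0, 1] = adj[0, 2] = 1
S = simrank(adj)
```

The naive iteration is O(n^2 * d^2) per pass; the abstract's threshold-based and single-source variants exist precisely to avoid this all-pairs cost.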


Author(s): Donglin Zhang ◽ Xiao-Jun Wu ◽ Jun Yu

Hashing methods have sparked a revolution in large-scale cross-media search thanks to their effectiveness and efficiency. Most existing approaches learn a unified hash representation in a common Hamming space to represent all multimodal data. However, unified hash codes may not characterize cross-modal data discriminatively, because the modalities can differ greatly in dimensionality, physical properties, and statistical structure. In addition, most existing supervised cross-modal algorithms preserve similarity relationships by constructing an n × n pairwise similarity matrix, which is computationally expensive and discards category information. To mitigate these issues, this article proposes a novel cross-media hashing approach, dubbed label flexible matrix factorization hashing (LFMH). Specifically, LFMH jointly learns modality-specific latent subspaces with similar semantics via flexible matrix factorization. In addition, LFMH guides hash learning using the semantic labels directly instead of the large n × n pairwise similarity matrix. LFMH transforms the heterogeneous data into modality-specific latent semantic representations, so hash codes can be obtained by quantizing these representations, and the learned hash codes are consistent with the supervised labels of the multimodal data. The resulting binary codes characterize the samples of each modality flexibly, and the derived hash codes therefore have more discriminative power for single-modal and cross-modal retrieval tasks. Extensive experiments on eight different databases demonstrate that our model outperforms several competitive approaches.
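The "quantizing the representations" step mentioned above is, in most matrix-factorization hashing methods, a simple sign threshold on the learned latent vectors, after which retrieval reduces to Hamming distance. A minimal sketch (function names and values are illustrative, not LFMH's notation):

```python
import numpy as np

def quantize_to_hash(latent: np.ndarray) -> np.ndarray:
    """Sign quantization: real-valued latent representation -> binary
    hash code. Returns {0, 1} bits; use 2*b - 1 for {-1, +1} codes."""
    return (latent >= 0).astype(np.uint8)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Retrieval metric in the binary code space."""
    return int(np.count_nonzero(a != b))

# two semantically close samples produce nearby latent vectors ...
z1 = np.array([0.7, -1.2,  0.1, -0.4])
z2 = np.array([0.9, -0.8, -0.3, -0.5])
# ... and therefore nearby hash codes
h1, h2 = quantize_to_hash(z1), quantize_to_hash(z2)
# h1 = [1, 0, 1, 0], h2 = [1, 0, 0, 0] -> Hamming distance 1
```

The quantization loss this threshold introduces is exactly the "cumulative quantization error" that discrete and semi-discrete optimization strategies in this literature try to control.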


2021 ◽ Vol 6 (1)
Author(s): Carla Intal ◽ Taha Yasseri

The British party system is known for its discipline and cohesion, but it remains wedged on one issue: European integration. We offer a methodology, using social network analysis, that considers the individual interactions of MPs in the voting process. Using public Parliamentary records, we scraped votes of individual MPs in the 57th Parliament (June 2017 to April 2019), computed pairwise similarity scores, and calculated rebellion metrics based on eigenvector centralities. Comparing the networks of Brexit and non-Brexit divisions, our methodology detected a significant difference in eurosceptic behaviour in the former, and using a rebellion metric we predicted how MPs would vote on a forthcoming Brexit deal with over 90% accuracy.
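The pipeline above (vote vectors → pairwise similarity → eigenvector centrality) can be sketched in a few lines. This is a toy illustration of the general technique, not the authors' code; the vote encoding (+1 aye, -1 no) and the three invented MPs are assumptions for the example.

```python
import numpy as np

# Each row is one MP's votes across divisions: +1 aye, -1 no.
votes = np.array([[ 1,  1, -1,  1],    # MP A
                  [ 1,  1, -1,  1],    # MP B: always votes with A
                  [-1, -1,  1, -1]])   # MP C: votes against the bloc

# Pairwise cosine similarity between vote vectors.
unit = votes / np.linalg.norm(votes, axis=1, keepdims=True)
sim = unit @ unit.T

# Build a weighted network from positive ties and compute eigenvector
# centrality by power iteration.
W = np.clip(sim, 0, None)
np.fill_diagonal(W, 0)
x = np.ones(W.shape[0])
for _ in range(100):
    x = W @ x
    x = x / np.linalg.norm(x)
# MPs who vote against their bloc end up isolated in the network and
# receive low centrality -- the basis for a "rebellion" score.
```

Here MP C's centrality is zero because every tie to the loyal bloc is negative and gets clipped away, which is the intuition behind reading low centrality as rebellion.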


2021
Author(s): Erin M Chinn ◽ Rohit Arora ◽ Ramy Arnaout ◽ Rima Arnaout

Deep learning (DL) has been applied successfully in proofs of concept across biomedical imaging, including across modalities and medical specialties [1-17]. Labeled data is critical to training and testing DL models, and such models traditionally require large amounts of training data, straining the limited (human) resources available for expert labeling and annotation. Ideally, one would prioritize labeling those images that are most likely to improve model performance and skip images that are redundant. However, straightforward, robust, and quantitative metrics for measuring and eliminating redundancy in datasets have not yet been described. Here, we introduce a new method, ENRIch (Eliminate Needless Redundancy in Imaging datasets), for assessing image dataset redundancy, and test it on a well-benchmarked medical imaging dataset [3]. First, we compute pairwise similarity metrics for images in a given dataset, resulting in a matrix of pairwise similarity values. We then rank images based on this matrix and use these rankings to curate the dataset so as to minimize redundancy. Using this method, we achieve similar AUC scores in a binary classification task with just a fraction of our original dataset (AUC of 0.99 +/- 1.35e-05 on 44 percent of available images vs. AUC of 0.99 +/- 9.32e-06 on all available images, p-value 0.0002) and better scores than same-sized training subsets chosen at random. We also demonstrate similar Jaccard scores in a multi-class segmentation task while eliminating redundant images (average Jaccard index of 0.58 on 80 percent of available images vs. 0.60 on all available images). Thus, algorithms that reduce dataset redundancy based on image similarity can significantly reduce the number of training images required, while preserving performance, in medical imaging datasets.
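The "rank by the similarity matrix, then curate" step can be illustrated with a greedy pruning loop: repeatedly drop the image whose maximum similarity to the remaining set is highest. This is a sketch in the spirit of ENRIch, not the published implementation, and the threshold-free greedy rule is an assumption for the example.

```python
import numpy as np

def prune_redundant(sim: np.ndarray, keep_frac: float) -> list[int]:
    """Greedily remove the most redundant items until only
    ceil(keep_frac * n) remain. sim is a symmetric pairwise
    similarity matrix; returns indices of the kept items."""
    n = sim.shape[0]
    sim = sim.astype(float).copy()
    np.fill_diagonal(sim, -np.inf)        # ignore self-similarity
    keep = list(range(n))
    target = int(np.ceil(keep_frac * n))
    while len(keep) > target:
        sub = sim[np.ix_(keep, keep)]
        # drop the member most similar to some other remaining member
        keep.pop(int(np.argmax(sub.max(axis=1))))
    return keep

# images 0 and 1 are near-duplicates (similarity 0.99); one gets dropped
sim = np.full((4, 4), 0.1)
sim[0, 1] = sim[1, 0] = 0.99
kept = prune_redundant(sim, keep_frac=0.75)
```

The same matrix supports the paper's comparison against random subsets: a random 75% keep would retain both near-duplicates half the time, while the similarity-ranked prune never does.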


2021
Author(s): Isabella Destefano ◽ Timothy F. Brady ◽ Edward Vul

“Similarity” is often thought to dictate memory errors. For example, in visual memory, memory judgements of lures are related to their psychophysical similarity to targets: an approximately exponential function in stimulus space (Schurgin et al. 2020). However, similarity is ill-defined for more complex stimuli, and memory errors seem to depend on all the remembered items, not just pairwise similarity. Such effects can be captured by a model that views similarity as a byproduct of Bayesian generalization (Tenenbaum & Griffiths, 2001). Here we ask whether the propensity of people to generalize from a set to an item predicts memory errors to that item. We use the “number game” generalization task to collect human judgements about set membership for symbolic numbers and show that memory errors for numbers are consistent with these generalization judgements rather than pairwise similarity. These results suggest that generalization propensity, rather than “similarity”, drives memory errors.


2021
Author(s): Natalie S Fox ◽ Constance H Li ◽ Syed Haider ◽ Paul C Boutros

There are myriad types of biomedical data: genetics, transcriptomics, clinical, imaging, wearable devices and many more. When a group of patients with the same underlying disease exhibits similarities across multiple types of data, this is called a subtype. Disease subtypes can reflect etiology and sometimes predict clinical behaviour. Existing subtyping approaches struggle to handle multiple diverse data types simultaneously, particularly when there is missing information, as is common in most real-world clinical datasets. To improve subtype discovery, we exploited changes in the correlation structure between different data types to create iSubGen, an algorithm for integrative subtype generation. iSubGen can combine arbitrary data types for subtype discovery, such as merging molecular, mutational signature, pathway and micro-environmental data. It recapitulates known subtypes across multiple diseases, even in the face of substantial missing data, and identifies groups of patients with divergent clinical outcomes. iSubGen can accommodate any feature that can be compared with a similarity metric, providing a versatile approach for creating subtypes. It is available at https://CRAN.R-project.org/package=iSubGen.
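One way to combine arbitrary data types while tolerating missing information is to compute a patient-by-patient similarity matrix per data type and average over whichever types are observed for each pair. The sketch below is a simplified stand-in for iSubGen's integration step, not its actual algorithm; the data-type names are invented.

```python
import numpy as np

def integrative_similarity(sims: list[np.ndarray]) -> np.ndarray:
    """Average per-data-type patient similarity matrices, ignoring
    missing entries (NaN). Pairs observed in no data type stay NaN."""
    stack = np.stack(sims)              # (n_types, n_patients, n_patients)
    observed = ~np.isnan(stack)
    counts = observed.sum(axis=0)       # data types observed per pair
    totals = np.where(observed, stack, 0.0).sum(axis=0)
    return np.divide(totals, counts,
                     out=np.full(totals.shape, np.nan),
                     where=counts > 0)

# molecular similarity is complete; imaging similarity is missing for pair (0, 1)
molecular = np.array([[1.0, 0.5], [0.5, 1.0]])
imaging   = np.array([[1.0, np.nan], [np.nan, 1.0]])
combined = integrative_similarity([molecular, imaging])
# combined[0, 1] falls back to the one observed data type: 0.5
```

Averaging over observed types only is what lets a method of this kind "recapitulate known subtypes even in the face of substantial missing data", since a missing assay degrades a pair's estimate rather than excluding the patient.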


2021 ◽ pp. 1-13
Author(s): Jenish Dhanani ◽ Rupa Mehta ◽ Dipti Rana

Legal practitioners analyze relevant previous judgments to prepare favorable and advantageous arguments for an ongoing case. In the legal domain, recommender systems (RS) effectively identify and recommend referentially and/or semantically relevant judgments. Because of the enormous number of judgments, an RS needs to compute pairwise similarity scores for all unique judgment pairs in advance, aiming to minimize recommendation response time. This practice introduces a scalability issue, as the number of pairs to be computed grows quadratically with the number of judgments, i.e., O(n²). However, only a limited number of pairs exhibit strong relevance among the judgments, so computing similarities for pairs with only trivial relevance is wasteful. To address this scalability issue, this research proposes a novel graph-clustering-based Legal Document Recommendation System (LDRS) that forms clusters of referentially similar judgments and then finds semantically relevant judgments within those clusters. Pairwise similarity scores are thus computed per cluster, restricting the search space to within-cluster pairs instead of the entire corpus. The proposed LDRS thereby drastically reduces the number of similarity computations, enabling large numbers of judgments to be handled. It exploits the highly scalable Louvain approach to cluster the judgment citation network, and Doc2Vec to capture the semantic relevance among judgments within a cluster. The efficacy and efficiency of the proposed LDRS are evaluated and analyzed using large, real-life judgments of the Supreme Court of India. The experimental results demonstrate encouraging performance in terms of accuracy, F1 scores, MCC scores, and computational complexity, validating its applicability for scalable recommender systems.
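The cluster-then-compare idea above reduces the O(n²) pair count by enumerating only within-cluster pairs. A minimal sketch, where the cluster labels stand in for Louvain communities of the citation network and the enumerated pairs are the ones a Doc2Vec similarity would then score (both substitutions are assumptions for illustration):

```python
from itertools import combinations

def within_cluster_pairs(cluster_of: dict[int, int]):
    """Yield only within-cluster judgment pairs, instead of all
    C(n, 2) pairs over the whole corpus."""
    by_cluster: dict[int, list[int]] = {}
    for doc, c in cluster_of.items():
        by_cluster.setdefault(c, []).append(doc)
    for members in by_cluster.values():
        yield from combinations(sorted(members), 2)

# 5 judgments in 2 clusters: 4 within-cluster pairs instead of C(5, 2) = 10
cluster_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}
pairs = list(within_cluster_pairs(cluster_of))
```

With k roughly equal-sized clusters the pair count drops from about n²/2 to about n²/(2k), which is the source of the scalability gain the abstract claims.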

