pairwise similarity
Recently Published Documents

TOTAL DOCUMENTS: 85 (FIVE YEARS: 28)
H-INDEX: 15 (FIVE YEARS: 3)

2021 ◽ Vol 2021 ◽ pp. 1-10
Author(s): Shaohua Wang ◽ Xiao Kang ◽ Fasheng Liu ◽ Xiushan Nie ◽ Xingbo Liu

Cross-modal hashing maps heterogeneous multimodal data into compact binary codes that preserve semantic similarity, which can significantly improve the convenience of cross-modal retrieval. However, currently available supervised cross-modal hashing methods generally factorize only the label matrix and do not fully exploit the supervision information. Furthermore, these methods often use only a one-directional mapping, which results in an unstable hash learning process. To address these problems, we propose a new supervised cross-modal hash learning method, Discrete Two-step Cross-modal Hashing (DTCH), that exploits pairwise relations. Specifically, the method fully exploits the pairwise similarity relations contained in the supervision information: for the label matrix, the hash learning process is stabilized by combining matrix factorization with label regression; for the pairwise similarity matrix, a semi-relaxed, semi-discrete strategy is adopted to reduce cumulative quantization errors while improving retrieval efficiency and accuracy. The approach further combines an exploration of fine-grained features in the objective function with a novel out-of-sample extension strategy to implicitly preserve consistency between the different modal distributions of samples and the pairwise similarity relations. The superiority of our method is verified through extensive experiments on two widely used datasets.
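A common ingredient in supervised hashing methods like the one above is a pairwise semantic-similarity matrix derived from the label matrix. The sketch below shows the standard construction (two samples are similar if they share at least one label); it is an illustration of that general ingredient, not DTCH's own formulation.

```python
import numpy as np

def pairwise_label_similarity(labels: np.ndarray) -> np.ndarray:
    """Binary pairwise similarity: S[i, j] = 1 if samples i and j
    share at least one label, else 0 (a common supervised signal)."""
    # labels: (n_samples, n_classes) multi-hot label matrix
    overlap = labels @ labels.T          # counts of shared labels per pair
    return (overlap > 0).astype(np.float64)

L = np.array([[1, 0, 1],    # sample 0 has labels {0, 2}
              [0, 1, 0],    # sample 1 has label  {1}
              [1, 1, 0]])   # sample 2 has labels {0, 1}
S = pairwise_label_similarity(L)
# S[0, 1] == 0.0 (no shared label); S[0, 2] == 1.0 (label 0 shared)
```

Note the matrix is symmetric by construction, which is why one-directional mappings built from it can be replaced by the two-step scheme the abstract describes.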


2021
Author(s): Weiren Yu ◽ Sima Iranmanesh ◽ Aparajita Haldar ◽ Maoyin Zhang ◽ Hakan Ferhatosmanoglu

RoleSim and SimRank are among the most popular graph-theoretic similarity measures, with many applications in, e.g., web search, collaborative filtering, and sociometry. While RoleSim addresses the automorphic (role) equivalence of pairwise similarity that SimRank lacks, it ignores the neighboring similarity information outside the automorphically equivalent set. Consequently, two pairs of nodes that are not automorphically equivalent by nature cannot be well distinguished by RoleSim if the averages of their neighboring similarities over the automorphically equivalent set are the same. To alleviate this problem: 1) We propose a novel similarity model, RoleSim*, which evaluates pairwise role similarities more comprehensively. RoleSim* not only guarantees the automorphic equivalence that SimRank lacks, but also takes into account the neighboring similarity information outside the automorphically equivalent sets that is overlooked by RoleSim. 2) We prove the existence and uniqueness of the RoleSim* solution, and establish its three axiomatic properties (symmetry, boundedness, and non-increasing monotonicity). 3) We provide a concise bound for the iterative computation of RoleSim* and estimate the number of iterations required to attain a desired accuracy. 4) We induce a distance metric from RoleSim* similarity and show that the RoleSim* metric fulfills the triangle inequality, which implies the sum-transitivity of its similarity scores. 5) We present a threshold-based RoleSim* model that further reduces the computational time with a provable accuracy guarantee. 6) We propose a single-source RoleSim* model, which scales well to sizable graphs. 7) We also devise methods to scale RoleSim*-based search by combining its triangle-inequality property with partitioning techniques. Our experimental results on real datasets demonstrate that RoleSim* achieves higher accuracy than its competitors while scaling well on sizable graphs with billions of edges.
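The family of measures discussed above is computed as a fixed-point iteration whose residual decays geometrically, which is what makes an a-priori iteration bound possible. The sketch below implements plain SimRank as a stand-in (RoleSim* itself additionally averages over matchings of neighbor sets, which is omitted here); the iteration count comes from the bound C**k <= eps.

```python
import numpy as np

def simrank(adj: np.ndarray, C: float = 0.8, eps: float = 1e-4) -> np.ndarray:
    """Plain SimRank iteration, shown as a stand-in for the
    RoleSim*-style fixed-point computation. adj[i, j] = 1 means an
    edge i -> j; similarity is averaged over in-neighbor pairs."""
    n = adj.shape[0]
    S = np.eye(n)
    # residual after k iterations is bounded by C**k, so solve C**k <= eps
    iters = int(np.ceil(np.log(eps) / np.log(C)))
    for _ in range(iters):
        T = np.eye(n)
        for a in range(n):
            for b in range(a + 1, n):
                Ia = np.nonzero(adj[:, a])[0]   # in-neighbors of a
                Ib = np.nonzero(adj[:, b])[0]   # in-neighbors of b
                if len(Ia) and len(Ib):
                    T[a, b] = T[b, a] = C * S[np.ix_(Ia, Ib)].sum() / (len(Ia) * len(Ib))
        S = T
    return S

# node 0 points to both 1 and 2, so s(1, 2) converges to C = 0.8
adj = np.zeros((3, 3))
adj[0, 1] = adj[0, 2] = 1
S = simrank(adj)
```

The naive iteration is O(n^2 * d^2) per pass; the abstract's threshold-based and single-source variants exist precisely to avoid this all-pairs cost.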


Author(s): Donglin Zhang ◽ Xiao-Jun Wu ◽ Jun Yu

Hashing methods have sparked a revolution in large-scale cross-media search thanks to their effectiveness and efficiency. Most existing approaches learn a unified hash representation in a common Hamming space to represent all multimodal data. However, unified hash codes may not characterize cross-modal data discriminatively, because the modalities can differ greatly in dimensionality, physical properties, and statistical structure. In addition, most existing supervised cross-modal algorithms preserve similarity relationships by constructing an n × n pairwise similarity matrix, which is computationally expensive and discards category information. To mitigate these issues, this article proposes a novel cross-media hashing approach, dubbed label flexible matrix factorization hashing (LFMH). Specifically, LFMH jointly learns modality-specific latent subspaces with similar semantics via flexible matrix factorization. In addition, LFMH guides hash learning using the semantic labels directly instead of the large n × n pairwise similarity matrix. LFMH transforms the heterogeneous data into modality-specific latent semantic representations, so hash codes can be obtained by quantizing these representations, and the learned hash codes are consistent with the supervised labels of the multimodal data. The resulting binary codes characterize the samples of each modality flexibly, and the derived hash codes therefore have more discriminative power for single-modal and cross-modal retrieval tasks. Extensive experiments on eight different databases demonstrate that our model outperforms several competitive approaches.
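The "quantizing the representations" step mentioned above is, in most matrix-factorization hashing methods, a simple sign threshold on the learned latent vectors, after which retrieval reduces to Hamming distance. A minimal sketch (function names and values are illustrative, not LFMH's notation):

```python
import numpy as np

def quantize_to_hash(latent: np.ndarray) -> np.ndarray:
    """Sign quantization: real-valued latent representation -> binary
    hash code. Returns {0, 1} bits; use 2*b - 1 for {-1, +1} codes."""
    return (latent >= 0).astype(np.uint8)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Retrieval metric in the binary code space."""
    return int(np.count_nonzero(a != b))

# two semantically close samples produce nearby latent vectors ...
z1 = np.array([0.7, -1.2,  0.1, -0.4])
z2 = np.array([0.9, -0.8, -0.3, -0.5])
# ... and therefore nearby hash codes
h1, h2 = quantize_to_hash(z1), quantize_to_hash(z2)
# h1 = [1, 0, 1, 0], h2 = [1, 0, 0, 0] -> Hamming distance 1
```

The quantization loss this threshold introduces is exactly the "cumulative quantization error" that discrete and semi-discrete optimization strategies in this literature try to control.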


2021 ◽ Vol 6 (1)
Author(s): Carla Intal ◽ Taha Yasseri

The British party system is known for its discipline and cohesion, but it remains wedged on one issue: European integration. We offer a methodology, using social network analysis, that considers the individual interactions of MPs in the voting process. Using public Parliamentary records, we scraped votes of individual MPs in the 57th Parliament (June 2017 to April 2019), computed pairwise similarity scores, and calculated rebellion metrics based on eigenvector centralities. Comparing the networks of Brexit and non-Brexit divisions, our methodology detected a significant difference in eurosceptic behaviour in the former, and using a rebellion metric we predicted how MPs would vote on a forthcoming Brexit deal with over 90% accuracy.
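The pipeline above (vote vectors → pairwise similarity → eigenvector centrality) can be sketched in a few lines. This is a toy illustration of the general technique, not the authors' code; the vote encoding (+1 aye, -1 no) and the three invented MPs are assumptions for the example.

```python
import numpy as np

# Each row is one MP's votes across divisions: +1 aye, -1 no.
votes = np.array([[ 1,  1, -1,  1],    # MP A
                  [ 1,  1, -1,  1],    # MP B: always votes with A
                  [-1, -1,  1, -1]])   # MP C: votes against the bloc

# Pairwise cosine similarity between vote vectors.
unit = votes / np.linalg.norm(votes, axis=1, keepdims=True)
sim = unit @ unit.T

# Build a weighted network from positive ties and compute eigenvector
# centrality by power iteration.
W = np.clip(sim, 0, None)
np.fill_diagonal(W, 0)
x = np.ones(W.shape[0])
for _ in range(100):
    x = W @ x
    x = x / np.linalg.norm(x)
# MPs who vote against their bloc end up isolated in the network and
# receive low centrality -- the basis for a "rebellion" score.
```

Here MP C's centrality is zero because every tie to the loyal bloc is negative and gets clipped away, which is the intuition behind reading low centrality as rebellion.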


2021
Author(s): Erin M Chinn ◽ Rohit Arora ◽ Ramy Arnaout ◽ Rima Arnaout

Deep learning (DL) has been applied successfully in proofs of concept across biomedical imaging, including across modalities and medical specialties [1-17]. Labeled data is critical to training and testing DL models, and such models traditionally require large amounts of training data, straining the limited (human) resources available for expert labeling and annotation. Ideally, one would prioritize labeling those images that are most likely to improve model performance and skip images that are redundant. However, straightforward, robust, and quantitative metrics for measuring and eliminating redundancy in datasets have not yet been described. Here, we introduce a new method, ENRIch (Eliminate Needless Redundancy in Imaging datasets), for assessing image dataset redundancy, and test it on a well-benchmarked medical imaging dataset [3]. First, we compute pairwise similarity metrics for images in a given dataset, resulting in a matrix of pairwise similarity values. We then rank images based on this matrix and use these rankings to curate the dataset so as to minimize redundancy. Using this method, we achieve similar AUC scores in a binary classification task with just a fraction of our original dataset (AUC of 0.99 +/- 1.35e-05 on 44 percent of available images vs. AUC of 0.99 +/- 9.32e-06 on all available images, p-value 0.0002) and better scores than same-sized training subsets chosen at random. We also demonstrate similar Jaccard scores in a multi-class segmentation task while eliminating redundant images (average Jaccard index of 0.58 on 80 percent of available images vs. 0.60 on all available images). Thus, algorithms that reduce dataset redundancy based on image similarity can significantly reduce the number of training images required, while preserving performance, in medical imaging datasets.
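The "rank by the similarity matrix, then curate" step can be illustrated with a greedy pruning loop: repeatedly drop the image whose maximum similarity to the remaining set is highest. This is a sketch in the spirit of ENRIch, not the published implementation, and the threshold-free greedy rule is an assumption for the example.

```python
import numpy as np

def prune_redundant(sim: np.ndarray, keep_frac: float) -> list[int]:
    """Greedily remove the most redundant items until only
    ceil(keep_frac * n) remain. sim is a symmetric pairwise
    similarity matrix; returns indices of the kept items."""
    n = sim.shape[0]
    sim = sim.astype(float).copy()
    np.fill_diagonal(sim, -np.inf)        # ignore self-similarity
    keep = list(range(n))
    target = int(np.ceil(keep_frac * n))
    while len(keep) > target:
        sub = sim[np.ix_(keep, keep)]
        # drop the member most similar to some other remaining member
        keep.pop(int(np.argmax(sub.max(axis=1))))
    return keep

# images 0 and 1 are near-duplicates (similarity 0.99); one gets dropped
sim = np.full((4, 4), 0.1)
sim[0, 1] = sim[1, 0] = 0.99
kept = prune_redundant(sim, keep_frac=0.75)
```

The same matrix supports the paper's comparison against random subsets: a random 75% keep would retain both near-duplicates half the time, while the similarity-ranked prune never does.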


2021
Author(s): Isabella Destefano ◽ Timothy F. Brady ◽ Edward Vul

“Similarity” is often thought to dictate memory errors. For example, in visual memory, memory judgements of lures are related to their psychophysical similarity to targets: an approximately exponential function in stimulus space (Schurgin et al. 2020). However, similarity is ill-defined for more complex stimuli, and memory errors seem to depend on all the remembered items, not just pairwise similarity. Such effects can be captured by a model that views similarity as a byproduct of Bayesian generalization (Tenenbaum & Griffiths, 2001). Here we ask whether the propensity of people to generalize from a set to an item predicts memory errors to that item. We use the “number game” generalization task to collect human judgements about set membership for symbolic numbers and show that memory errors for numbers are consistent with these generalization judgements rather than pairwise similarity. These results suggest that generalization propensity, rather than “similarity”, drives memory errors.


2021
Author(s): Natalie S Fox ◽ Constance H Li ◽ Syed Haider ◽ Paul C Boutros

There are myriad types of biomedical data: genetics, transcriptomics, clinical, imaging, wearable devices and many more. When a group of patients with the same underlying disease exhibits similarities across multiple types of data, this is called a subtype. Disease subtypes can reflect etiology and sometimes predict clinical behaviour. Existing subtyping approaches struggle to handle multiple diverse data types simultaneously, particularly when there is missing information, as is common in most real-world clinical datasets. To improve subtype discovery, we exploited changes in the correlation structure between different data types to create iSubGen, an algorithm for integrative subtype generation. iSubGen can combine arbitrary data types for subtype discovery, such as merging molecular, mutational signature, pathway and micro-environmental data. It recapitulates known subtypes across multiple diseases, even in the face of substantial missing data, and identifies groups of patients with divergent clinical outcomes. iSubGen can accommodate any feature that can be compared with a similarity metric, providing a versatile approach for creating subtypes. It is available at https://CRAN.R-project.org/package=iSubGen.
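One way to combine arbitrary data types while tolerating missing information is to compute a patient-by-patient similarity matrix per data type and average over whichever types are observed for each pair. The sketch below is a simplified stand-in for iSubGen's integration step, not its actual algorithm; the data-type names are invented.

```python
import numpy as np

def integrative_similarity(sims: list[np.ndarray]) -> np.ndarray:
    """Average per-data-type patient similarity matrices, ignoring
    missing entries (NaN). Pairs observed in no data type stay NaN."""
    stack = np.stack(sims)              # (n_types, n_patients, n_patients)
    observed = ~np.isnan(stack)
    counts = observed.sum(axis=0)       # data types observed per pair
    totals = np.where(observed, stack, 0.0).sum(axis=0)
    return np.divide(totals, counts,
                     out=np.full(totals.shape, np.nan),
                     where=counts > 0)

# molecular similarity is complete; imaging similarity is missing for pair (0, 1)
molecular = np.array([[1.0, 0.5], [0.5, 1.0]])
imaging   = np.array([[1.0, np.nan], [np.nan, 1.0]])
combined = integrative_similarity([molecular, imaging])
# combined[0, 1] falls back to the one observed data type: 0.5
```

Averaging over observed types only is what lets a method of this kind "recapitulate known subtypes even in the face of substantial missing data", since a missing assay degrades a pair's estimate rather than excluding the patient.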


2021 ◽ pp. 1-13
Author(s): Jenish Dhanani ◽ Rupa Mehta ◽ Dipti Rana

Legal practitioners analyze relevant previous judgments to prepare favorable and advantageous arguments for an ongoing case. In the legal domain, recommender systems (RS) effectively identify and recommend referentially and/or semantically relevant judgments. Because of the enormous number of judgments, an RS needs to compute pairwise similarity scores for all unique judgment pairs in advance, aiming to minimize recommendation response time. This practice introduces a scalability issue, as the number of pairs to be computed grows quadratically with the number of judgments, i.e., O(n²). However, only a limited number of pairs exhibit strong relevance among the judgments, so computing similarities for pairs with only trivial relevance is wasteful. To address this scalability issue, this research proposes a novel graph-clustering-based Legal Document Recommendation System (LDRS) that forms clusters of referentially similar judgments and then finds semantically relevant judgments within those clusters. Pairwise similarity scores are thus computed per cluster, restricting the search space to within-cluster pairs instead of the entire corpus. The proposed LDRS thereby drastically reduces the number of similarity computations, enabling large numbers of judgments to be handled. It exploits the highly scalable Louvain approach to cluster the judgment citation network, and Doc2Vec to capture the semantic relevance among judgments within a cluster. The efficacy and efficiency of the proposed LDRS are evaluated and analyzed using large, real-life judgments of the Supreme Court of India. The experimental results demonstrate encouraging performance in terms of accuracy, F1 scores, MCC scores, and computational complexity, validating its applicability for scalable recommender systems.
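The cluster-then-compare idea above reduces the O(n²) pair count by enumerating only within-cluster pairs. A minimal sketch, where the cluster labels stand in for Louvain communities of the citation network and the enumerated pairs are the ones a Doc2Vec similarity would then score (both substitutions are assumptions for illustration):

```python
from itertools import combinations

def within_cluster_pairs(cluster_of: dict[int, int]):
    """Yield only within-cluster judgment pairs, instead of all
    C(n, 2) pairs over the whole corpus."""
    by_cluster: dict[int, list[int]] = {}
    for doc, c in cluster_of.items():
        by_cluster.setdefault(c, []).append(doc)
    for members in by_cluster.values():
        yield from combinations(sorted(members), 2)

# 5 judgments in 2 clusters: 4 within-cluster pairs instead of C(5, 2) = 10
cluster_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}
pairs = list(within_cluster_pairs(cluster_of))
```

With k roughly equal-sized clusters the pair count drops from about n²/2 to about n²/(2k), which is the source of the scalability gain the abstract claims.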

