Similarity Measuring: Latest Research Papers

In the data mining of road networks, trajectory clustering of moving objects plays an important role in many applications. Most existing algorithms for this problem are based on every position point in a trajectory and face a significant challenge in dealing with complex and length-varying trajectories. This paper proposes a grid-based whole trajectory clustering model (GBWTC) in road networks, which regards the trajectory as a whole. In this model, we first propose a trajectory mapping algorithm based on grid estimation, which transforms the trajectories in road network space into grid sequences in grid space and forms grid trajectories by recognizing and eliminating redundant, abnormal, and stranded information of grid sequences. We then design an algorithm to extract initial clustering centers based on density weight and improve a shape similarity measuring algorithm to measure the distance between two grid trajectories. Finally, we dynamically allocate every grid trajectory to the best clusters by the nearest neighbor principle and an outlier function. For the evaluation of clustering performance, we establish a clustering criterion based on the classical Silhouette Coefficient to maximize intercluster separation and intracluster homogeneity. The clustering accuracy and performance superiority of the proposed algorithm are illustrated on a real-world dataset in comparison with existing algorithms.
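
The grid-mapping step described above can be pictured with a minimal sketch. It assumes uniform square cells and only removes redundant consecutive samples; the abnormal and stranded cases from the abstract are not handled, and the function name and `cell_size` parameter are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def to_grid_trajectory(points: List[Point], cell_size: float) -> List[Cell]:
    """Map (x, y) points to grid cells and drop consecutive repeats.

    Assumption: a uniform square grid; the paper's grid estimation
    may differ.
    """
    cells: List[Cell] = []
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        # Skip redundant samples that stay in the same cell as the previous one.
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells


# Hypothetical trajectory, for illustration only.
trajectory = [(0.2, 0.1), (0.7, 0.4), (1.3, 0.6), (1.8, 1.2), (2.5, 1.9)]
grid_trajectory = to_grid_trajectory(trajectory, cell_size=1.0)
print(grid_trajectory)
```

Trajectories of different lengths collapse to comparable cell sequences, which is what lets a shape-based distance treat each trajectory as a whole.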

A Novel Hybrid Methodology of Measuring Sentence Similarity

Symmetry ◽ 10.3390/sym13081442 ◽ 2021 ◽ Vol 13 (8) ◽ pp. 1442

Author(s): Yongmin Yoo ◽ Tak-Sung Heo ◽ Yeongjoon Park ◽ Kyungsun Kim

Keyword(s): Deep Learning ◽ Natural Language Processing ◽ Natural Language ◽ Correlation Coefficient ◽ Language Processing ◽ Pearson Correlation ◽ Word Structure ◽ Sentence Similarity ◽ Similarity Measuring ◽ Evaluation Metric

Measuring sentence similarity accurately is an essential problem in natural language processing. Sentence similarity measuring is the task of finding semantic symmetry between two sentences, regardless of word order and context of the words. There are many approaches to measuring sentence similarity. Deep learning methodology shows state-of-the-art performance in many natural language processing fields and is widely used in sentence similarity measurement methods. However, in natural language processing, considering the structure of the sentence or the structure of the words that make up the sentence is also important. In this study, we propose a methodology that combines deep learning with a method that considers lexical relationships. Our evaluation metrics are the Pearson and Spearman correlation coefficients. The proposed method outperforms current approaches on KorSTS, a standard Korean benchmark dataset. Moreover, it achieves up to a 65% increase over using deep learning methodology alone. Experiments show that our proposed method generally performs better than methods that use only a deep learning model.
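
As a rough illustration of the two evaluation metrics named above, the following standard-library sketch computes Pearson and Spearman correlation between gold similarity scores and model predictions. The score lists are invented for illustration and are not results from the paper.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gold and predicted similarity scores on a 0-5 scale.
gold = [0.0, 1.5, 2.0, 3.5, 5.0]
pred = [0.1, 0.9, 2.6, 3.0, 4.8]
print(round(pearson(gold, pred), 3), round(spearman(gold, pred), 3))
```

Pearson rewards predictions that are linearly related to the gold scores, while Spearman only checks that sentence pairs are ranked in the same order, which is why benchmarks of this kind commonly report both.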

Automated Extraction of Labels from Large-Scale Historical Maps

AGILE: GIScience Series ◽ 10.5194/agile-giss-2-12-2021 ◽ 2021 ◽ Vol 2 ◽ pp. 1-14

Author(s): Inga Schlegel

Keyword(s): Spatial Orientation ◽ Large Scale ◽ Text Detection ◽ Historical Maps ◽ True Positive ◽ Automated Extraction ◽ Manual Intervention ◽ Ancillary Information ◽ Similarity Measuring ◽ Machine Readable

Historical maps are frequently neither readable, searchable, nor analyzable by machines because databases or ancillary information about their content are lacking. Identifying and annotating map labels is seen as a first step towards making such maps automatically legible. This article investigates a universal and transferable methodology for working with large-scale historical maps and comparing them with others while reducing manual intervention to a minimum. We present an end-to-end approach which increases the number of true positive identified labels by combining available text detection, recognition, and similarity measuring tools with our own enhancements. Comparing recognized historical street names with current ones produces a satisfactory agreement, which can be used to assign their point-like representatives within a final rough georeferencing. The demonstrated workflow facilitates spatial orientation within large-scale historical maps by enabling the establishment of related databases. Assigning the identified labels to the geometries of related map features may contribute to machine-readable and analyzable historical maps.
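
The comparison of recognized historical street names with current ones can be sketched with Python's `difflib`. This is only an assumed stand-in: the abstract does not say which similarity tool the article uses, and the street names, the `best_match` helper, and the 0.7 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def best_match(
    recognized: str, current_names: List[str], threshold: float = 0.7
) -> Optional[Tuple[str, float]]:
    """Return the most similar current name, or None if below the threshold."""
    best_name: Optional[str] = None
    best_score = 0.0
    for name in current_names:
        # ratio() is 2 * matches / total length, a value in [0, 1].
        score = SequenceMatcher(None, recognized.lower(), name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None
    return best_name, best_score


# Hypothetical gazetteer of current street names.
current = ["Hauptstrasse", "Bahnhofstrasse", "Kirchenallee"]

# A recognized label with a dropped letter, as text recognition might produce.
print(best_match("Hauptstrase", current))
print(best_match("xyz", current))
```

A threshold of this kind trades recall for precision: raising it rejects more doubtful labels, which leaves fewer but more trustworthy anchor points for a rough georeferencing.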

Software similarity measurements using UML diagrams: A systematic literature review

Register Jurnal Ilmiah Teknologi Sistem Informasi ◽ 10.26594/register.v8i1.2248 ◽ 2021 ◽ Vol 8 (1) ◽ pp. 10

Author(s): Evi Triandini ◽ Reza Fauzan ◽ Daniel O. Siahaan ◽ Siti Rochimah ◽ I Gede Suardika ◽ ...

Keyword(s): Literature Review ◽ Systematic Literature Review ◽ Software Reuse ◽ Unified Modeling Language ◽ Structural Similarity ◽ Future Research ◽ Unified Modeling ◽ Software Products ◽ UML Diagrams ◽ Similarity Measuring

Every piece of software uses a model to derive its operational, auxiliary, and functional procedures. Unified Modeling Language (UML) is a standard modeling language for specifying, documenting, and building a software product. Researchers have used several algorithms to measure similarities between UML artifacts. However, no literature studies have considered measurements of UML diagram similarities. This paper presents the results of a systematic literature review concerning similarity measurements between the UML diagrams of different software products. The study reviews and identifies similarity measurements of UML artifacts; the class diagram, sequence diagram, statechart diagram, and use case diagram are the UML diagrams most widely used as research objects for measuring similarity. Measuring similarity helps address the problem domains of software reuse, similarity measurement, and clone detection. The instruments used to measure similarity are semantic and structural similarity. The findings indicate opportunities for future research regarding calculating other UML diagrams, compiling calculation information for each diagram, adapting semantic and structural similarity calculation methods, determining the best weight for each item in the diagram, testing novel proposed methods, and building or finding good datasets for use as testing material.
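
To make the idea of combining semantic and structural similarity with per-item weights concrete, here is a minimal sketch for two UML classes. It is not a method from the review: the `UmlClass` type, the use of string matching on class names as a lexical stand-in for semantic similarity, the Jaccard index over members as the structural part, and the 0.4 weight are all assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import FrozenSet


@dataclass(frozen=True)
class UmlClass:
    """Hypothetical, simplified view of a class in a UML class diagram."""
    name: str
    attributes: FrozenSet[str] = frozenset()
    operations: FrozenSet[str] = frozenset()


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Shared members divided by all members; two empty sets count as equal."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def class_similarity(c1: UmlClass, c2: UmlClass, name_weight: float = 0.4) -> float:
    """Weighted mix of name similarity and member overlap, in [0, 1]."""
    # Lexical stand-in for semantic similarity (an assumption of this sketch).
    semantic = SequenceMatcher(None, c1.name.lower(), c2.name.lower()).ratio()
    structural = jaccard(
        c1.attributes | c1.operations, c2.attributes | c2.operations
    )
    return name_weight * semantic + (1.0 - name_weight) * structural


customer = UmlClass("Customer", frozenset({"name", "email"}), frozenset({"placeOrder"}))
client = UmlClass("Client", frozenset({"name", "phone"}), frozenset({"placeOrder"}))
print(round(class_similarity(customer, client), 3))
```

The `name_weight` parameter is the kind of per-item weight the review lists as an open question; in practice it would need to be tuned against a labeled dataset.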

Steadiness analysis of means-end conceptual paths and problem-chains based on concept lattices and similarity measuring

International Journal of Machine Learning and Cybernetics ◽ 10.1007/s13042-021-01309-5 ◽ 2021

Author(s): Lankun Guo ◽ Zhenhua Jia ◽ Qingguo Li ◽ Jianhua Dai

Keyword(s): Concept Lattices ◽ Analysis Of Means ◽ Similarity Measuring

Intelligent recognition of semantic relationships based on antonymy

Multiagent and Grid Systems ◽ 10.3233/mgs-200332 ◽ 2020 ◽ Vol 16 (3) ◽ pp. 263-290

Author(s): Hui Guan ◽ Chengzhen Jia ◽ Hongji Yang

Keyword(s): Semantic Similarity ◽ New Approach ◽ Word Similarity ◽ Semantic Relationships ◽ Proposed Model ◽ Path Distance ◽ The Hierarchical Structure ◽ Thinking Process ◽ Similarity Measuring ◽ Intelligent Recognition

Since computing semantic similarity tends to simulate the thinking process of humans, semantic dissimilarity must play a part in this process. In this paper, we present a new approach to semantic similarity measuring that incorporates dissimilarity into the computation. Specifically, the proposed measures explore the potential antonymy in the hierarchical structure of WordNet to represent the dissimilarity between concepts and then combine the dissimilarity with the results of existing methods to obtain semantic similarity results. The relation between parameters and the correlation value is discussed in detail. The proposed model is then applied to different text granularity levels to validate the correctness of the similarity measurement. Experimental results show that the proposed approach not only achieves a high correlation with human ratings but also effectively improves existing path-distance-based methods at the word similarity level, and in some cases it corrects an existing sentence similarity method on the Microsoft Research Paraphrase Corpus and the SemEval-2014 dataset.
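
The idea of correcting a path-distance similarity with antonymy can be illustrated on a toy hierarchy. This sketch does not use WordNet and does not reproduce the paper's formulas: the hand-made taxonomy, the antonym list, the `1 / (1 + distance)` score, and the multiplicative `penalty` parameter are all assumptions.

```python
from typing import Dict, List, Set, FrozenSet

# Hypothetical taxonomy: child -> parent.
PARENT: Dict[str, str] = {
    "hot": "temperature",
    "cold": "temperature",
    "warm": "temperature",
    "temperature": "property",
    "big": "size",
    "size": "property",
}

# Hypothetical antonym pairs, stored as unordered pairs.
ANTONYMS: Set[FrozenSet[str]] = {frozenset({"hot", "cold"})}


def path_to_root(concept: str) -> List[str]:
    """List the concept followed by its ancestors up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def path_similarity(a: str, b: str) -> float:
    """1 / (1 + edges on the shortest path through the lowest common ancestor)."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    common = next((node for node in path_a if node in path_b), None)
    if common is None:
        return 0.0  # No shared ancestor in this toy hierarchy.
    distance = path_a.index(common) + path_b.index(common)
    return 1.0 / (1.0 + distance)


def adjusted_similarity(a: str, b: str, penalty: float = 0.5) -> float:
    """Scale the path similarity down when the pair is a known antonym pair."""
    base = path_similarity(a, b)
    if frozenset({a, b}) in ANTONYMS:
        return base * (1.0 - penalty)
    return base


print(path_similarity("hot", "cold"), path_similarity("hot", "warm"))
print(adjusted_similarity("hot", "cold"), adjusted_similarity("hot", "warm"))
```

A pure path-distance measure scores "hot"/"cold" the same as "hot"/"warm" because both pairs are siblings; the antonymy adjustment separates them, which is the kind of correction the abstract describes.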
