Similarity Measure for Matching Fuzzy Object Shapes

In this chapter, the Common Bin Similarity Measure (CBSM) is introduced to estimate the degree of overlap between query and database objects. Available similarity measures fail to handle the problem of Integrated Region Matching (IRM). The procedure for extracting objects from images is defined and illustrated with an example. The performance of CBSM is compared with well-known methods, and the results are reported. The effect of IRM with CBSM is also demonstrated by the experimental results, and the performance of CBSM on encoded features is compared with similar approaches. Overall, CBSM is a novel measure well suited to matching objects and ranking them by similarity.
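The abstract does not reproduce the CBSM formula. As a rough illustration of measuring overlap through shared bins, a minimal sketch, assuming normalized feature histograms and a simple min-overlap rule (both hypothetical, not necessarily the chapter's definition), might look like:

```python
def common_bin_similarity(h1, h2):
    """Overlap between two normalized feature histograms: the total mass
    shared in each common bin. (Hypothetical sketch; the chapter's actual
    CBSM may be defined differently.)"""
    assert len(h1) == len(h2), "histograms must share the same bins"
    return sum(min(a, b) for a, b in zip(h1, h2))

# Identical normalized histograms overlap completely.
print(common_bin_similarity([0.5, 0.25, 0.25], [0.5, 0.25, 0.25]))  # → 1.0
```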

Author(s): Qiang Shen, Tossapon Boongoen

In the wake of recent terrorist atrocities, intelligence experts have commented that failures in detecting terrorist and criminal activities are not so much due to a lack of data as to difficulties in relating and interpreting the available intelligence. An intelligent tool for monitoring and interpreting intelligence data would give analysts a helpful means to consider emerging scenarios of plausible threats, thereby offering useful assistance in devising and deploying preventive measures. One major problem in need of such attention is detecting false identities, which have become a common denominator of all serious crime, especially terrorism. Typical approaches to this problem rely on similarity measures over textual and other content-based characteristics, which are usually not applicable to deceptive and erroneous descriptions. This barrier may be overcome through link information present in communication behaviors, financial interactions, and social networks. Quantitative link-based similarity measures have proven effective for identifying similar entities in the Internet and publication domains. However, these numerical methods concentrate only on link structures and fail to achieve an accurate and coherent interpretation of the information. Inspired by this observation, the chapter presents a novel qualitative similarity measure that makes use of multiple link properties to refine the underlying similarity estimation process and consequently derive semantically rich similarity descriptors. The approach is based on order-of-magnitude reasoning. Its performance is empirically evaluated on a terrorism-related dataset and compared against several state-of-the-art link-based algorithms and other alternative methods.
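The qualitative measure itself is not specified in this abstract. The flavor of combining a numeric link-based score with coarse qualitative bands can be sketched as follows; the Jaccard score and the band thresholds are illustrative assumptions, not the chapter's actual order-of-magnitude model:

```python
def link_similarity(neighbors_a, neighbors_b):
    """Numeric link-based similarity: Jaccard overlap of two nodes' neighbor
    sets (an illustrative choice, not the chapter's measure)."""
    a, b = set(neighbors_a), set(neighbors_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def qualitative_label(score):
    """Map the numeric score onto coarse qualitative bands, loosely in the
    spirit of order-of-magnitude reasoning (band thresholds are made up)."""
    if score >= 0.75:
        return "high"
    if score >= 0.25:
        return "medium"
    if score > 0.0:
        return "low"
    return "none"

# Two suspects sharing 2 of 4 distinct contacts land in the middle band.
print(qualitative_label(link_similarity({"x", "y", "z"}, {"y", "z", "w"})))  # → medium
```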


2012, Vol 38 (5), pp. 459-475
Author(s): Peigang Xu, Yadong Wang, Bo Liu

Ontology matching, aimed at finding semantically related entities in different ontologies, plays an important role in establishing interoperability among Semantic Web applications. Recently, many similarity measures have been proposed to exploit the lexical, structural, or semantic features of ontologies. However, a key problem is how to integrate the various similarities automatically. In this paper, we define a novel metric, termed a "differentor," to assess the probability that a similarity measure can find one-to-one mappings between two ontologies at the entity level, and use it to integrate different similarity measures. The proposed approach assigns weights automatically to each pair of entities from different ontologies without any prior knowledge, and the aggregation task is accomplished based on these weights. The approach has been tested on the OAEI2010 benchmarks. The experimental results show that the differentor reflects the performance of individual similarity measures and that a differentor-based aggregation strategy outperforms existing aggregation strategies.
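The differentor's exact definition is not given in this abstract. As a hedged sketch of the aggregation idea, one could weight each measure by a crude proxy for its ability to induce one-to-one mappings, such as its rate of mutual best matches; this proxy is an illustrative stand-in, not the paper's metric:

```python
def mutual_best_rate(sim):
    """Fraction of rows whose best column also picks that row back: a crude
    proxy for a measure's ability to induce one-to-one mappings.
    (Illustrative stand-in; the paper's differentor is defined differently.)"""
    n_rows, n_cols = len(sim), len(sim[0])
    hits = 0
    for i in range(n_rows):
        j = max(range(n_cols), key=lambda c: sim[i][c])
        if max(range(n_rows), key=lambda r: sim[r][j]) == i:
            hits += 1
    return hits / n_rows

def aggregate(measures):
    """Combine similarity matrices, weighting each by its mutual-best rate."""
    weights = [mutual_best_rate(m) for m in measures]
    total = sum(weights) or 1.0
    rows, cols = len(measures[0]), len(measures[0][0])
    return [[sum(w * m[i][j] for w, m in zip(weights, measures)) / total
             for j in range(cols)] for i in range(rows)]

sharp = [[0.9, 0.1], [0.1, 0.9]]  # cleanly separates the entity pairs
flat = [[0.5, 0.5], [0.5, 0.5]]   # cannot discriminate at all
print(mutual_best_rate(sharp), mutual_best_rate(flat))  # → 1.0 0.5
```

A sharper measure thus contributes more to the aggregated similarity matrix than an undiscriminating one.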


2010, Vol 29-32, pp. 2620-2626
Author(s): Jing Li Zhou, Xue Jun Nie, Lei Hua Qin, Jian Feng Zhu

This paper proposes a novel fuzzy similarity measure based on the relationships between terms and categories. A term-category matrix represents these relationships; each element denotes the membership degree of a term to a category, computed using term frequency-inverse document frequency (TF-IDF) and the fuzzy relationships between documents and categories. The fuzzy similarity accounts for documents that belong to multiple categories and is computed using fuzzy operators. The experimental results show that the proposed fuzzy similarity surpasses other common similarity measures, both in the reliability of the derived document clusters and in document clustering accuracy.
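As a sketch of the fuzzy-operator step, a similarity between two documents' category-membership vectors can be computed with min/max operators; the membership degrees below are made-up inputs, whereas the paper derives them from TF-IDF and document-category relationships:

```python
def fuzzy_similarity(mu_a, mu_b):
    """Fuzzy similarity between two documents' category-membership vectors,
    using the min/max fuzzy operators (a common formulation; the paper's
    exact measure may differ)."""
    num = sum(min(a, b) for a, b in zip(mu_a, mu_b))
    den = sum(max(a, b) for a, b in zip(mu_a, mu_b))
    return num / den if den else 0.0

# Documents belonging to several categories with similar degrees score high.
print(fuzzy_similarity([0.8, 0.4, 0.0], [0.6, 0.5, 0.1]))
```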


Author(s): B. Mathura Bai, N. Mangathayaru, B. Padmaja Rani, Shadi Aljawarneh

Missing attribute values in medical datasets are one of the most common problems faced when mining them. Estimating missing values is a major challenge in dataset pre-processing: any wrong estimate of a missing attribute value can lead to inefficient and improper classification, and thus lower classifier accuracy. Similarity measures play a key role during the imputation process, and an appropriate similarity measure can help achieve better imputation and improved classification accuracy. This paper proposes a novel imputation measure for finding the similarity between missing and non-missing instances in medical datasets. Experiments are carried out applying both the proposed imputation technique and popular benchmark imputation techniques. Classification is performed using the KNN, J48, SMO, and RBFN classifiers. The experimental analysis shows that after imputing medical records with the proposed technique, the classification accuracies reported by KNN, J48, and SMO improve compared to the existing benchmark imputation techniques.
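The proposed measure itself is not reproduced in this abstract. A minimal nearest-neighbour imputation sketch, assuming numeric attributes and `None` for missing values, conveys the general idea of borrowing values from the most similar complete record:

```python
def impute(record, complete_records):
    """Fill each missing field (None) with the value taken from the most
    similar complete record, compared only on the fields that are present.
    (Nearest-neighbour sketch; the paper's proposed measure is different.)"""
    observed = [i for i, v in enumerate(record) if v is not None]

    def distance(other):
        return sum((record[i] - other[i]) ** 2 for i in observed)

    nearest = min(complete_records, key=distance)
    return [v if v is not None else nearest[i] for i, v in enumerate(record)]

donors = [[1.0, 2.0, 3.0], [9.0, 9.0, 9.0]]
print(impute([1.1, None, 2.9], donors))  # → [1.1, 2.0, 2.9]
```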


2020, Vol 7 (1)
Author(s): Ali A. Amer, Hassan I. Abdalla

Similarity measures have long been utilized in information retrieval and machine learning for multiple purposes, including text retrieval, text clustering, text summarization, plagiarism detection, and several other text-processing applications. The problem with these measures is that, until recently, no single measure had been recorded as both highly effective and efficient. Thus, the quest for an efficient and effective similarity measure remains an open challenge. This study therefore introduces a new highly effective and time-efficient similarity measure for text clustering and classification. Furthermore, it provides a comprehensive examination of seven of the most widely used similarity measures, mainly concerning their effectiveness and efficiency. Using the K-nearest neighbor (KNN) algorithm for classification, the K-means algorithm for clustering, and the bag-of-words (BoW) model for feature selection, all similarity measures are examined in detail. The experimental evaluation is made on two popular datasets, Reuters-21 and Web-KB. The obtained results confirm that the proposed set-theory-based similarity measure (STB-SM) significantly outperforms all state-of-the-art measures with regard to both effectiveness and efficiency.
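The STB-SM formula is not reproduced here. A generic set-based measure over BoW token sets, plugged into a 1-NN classifier, illustrates the experimental setup; the Jaccard measure and toy corpus are assumptions, not the paper's STB-SM:

```python
def set_similarity(doc_a, doc_b):
    """Set-based text similarity: Jaccard over bag-of-words token sets.
    (Illustrative; the paper's STB-SM is a different, refined measure.)"""
    a, b = set(doc_a.lower().split()), set(doc_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def knn_label(query, labeled_docs):
    """Classify a document with the label of its most similar training
    document (1-NN, a special case of the KNN setup in the study)."""
    return max(labeled_docs, key=lambda dl: set_similarity(query, dl[0]))[1]

train = [("the market rose today", "finance"), ("the team won the match", "sport")]
print(knn_label("stocks in the market fell", train))  # → finance
```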


2021, Vol 10 (2), pp. 90
Author(s): Jin Zhu, Dayu Cheng, Weiwei Zhang, Ci Song, Jie Chen, ...

People spend more than 80% of their time in indoor spaces such as shopping malls and office buildings. Indoor trajectories collected by indoor positioning devices, such as WiFi and Bluetooth sensors, reflect human movement behavior in these spaces, and insightful movement patterns can be discovered from them using various clustering methods. These methods rely on a measure of the degree of similarity between indoor trajectories. Researchers have proposed many trajectory similarity measures; however, existing ones ignore the movement constraints imposed by indoor space and the characteristics of indoor positioning sensors, which leads to inaccurate measures of indoor trajectory similarity. Additionally, most of these works focus on the spatial and temporal dimensions of trajectories and pay less attention to indoor semantic information. Integrating semantic information, such as indoor points of interest, into the similarity measurement helps discover pedestrians with similar intentions. In this paper, we propose an accurate and reasonable indoor trajectory similarity measure called the indoor semantic trajectory similarity measure (ISTSM), which considers the features of indoor trajectories and indoor semantic information simultaneously. The ISTSM is adapted from the edit distance, a measure of the distance between string sequences. Its key component is an indoor navigation graph, transformed from an indoor floor plan, which is used to compute accurate indoor walking distances; the walking distances and semantic information are fused into the edit distance seamlessly. The ISTSM is evaluated using a synthetic dataset and a real dataset from a shopping mall. The experiment with the synthetic dataset shows that the ISTSM is more accurate and reasonable than three other popular trajectory similarity measures, namely the longest common subsequence (LCSS), edit distance on real sequence (EDR), and the multidimensional similarity measure (MSM). The case study of the shopping mall shows that the ISTSM effectively reveals the movement patterns of indoor customers.
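The edit-distance backbone that such a measure builds on can be sketched as follows; the `same` predicate is a placeholder where the ISTSM would plug in indoor walking distances and semantic costs:

```python
def trajectory_edit_distance(t1, t2, same=lambda a, b: a == b):
    """Classic edit distance over two location sequences. The `same`
    predicate decides when two points match; for indoor data it could test
    walking distance on the navigation graph or a shared point of interest."""
    m, n = len(t1), len(t2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if same(t1[i - 1], t2[j - 1]) else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete a point
                          d[i][j - 1] + 1,         # insert a point
                          d[i - 1][j - 1] + cost)  # match or substitute
    return d[m][n]

print(trajectory_edit_distance(["lobby", "shopA", "exit"],
                               ["lobby", "shopB", "exit"]))  # → 1
```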


2021, Vol 13 (1), pp. 1-25
Author(s): Michael Loster, Ioannis Koumarelas, Felix Naumann

The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity, i.e., duplicates, into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to manually design and configure such measures, which is both time-consuming and requires extensive domain expertise. We propose a deep Siamese neural network capable of learning a similarity measure tailored to the characteristics of a particular dataset. With the properties of deep learning methods, we are able to eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluate our method on multiple datasets and compare our approach to state-of-the-art deduplication methods. Our approach outperforms competitors by up to +26 percent F-measure, depending on task and dataset. In addition, we show that knowledge transfer is not only feasible, but in our experiments led to an improvement in F-measure of up to +4.7 percent.
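The essential property of a Siamese architecture, one shared encoder applied to both records, can be sketched with a toy linear encoder; the weights, distance, and scoring rule here are illustrative, whereas the paper uses a trained deep network:

```python
import math

def embed(features, weights):
    """Shared encoder: the SAME weight matrix maps both records, which is
    the defining property of a Siamese architecture."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def siamese_score(rec_a, rec_b, weights):
    """Similarity from the distance between the two shared embeddings;
    exp(-distance) gives 1.0 for identical embeddings and decays toward 0."""
    ea, eb = embed(rec_a, weights), embed(rec_b, weights)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(ea, eb)))
    return math.exp(-dist)

W = [[1.0, 0.0], [0.0, 1.0]]  # toy fixed weights; the paper learns them
print(siamese_score([0.2, 0.9], [0.2, 0.9], W))  # → 1.0
```

Training would adjust the shared weights so that duplicate pairs score near 1 and distinct pairs near 0.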


Author(s): Guanghsu A. Chang, Cheng-Chung Su, John W. Priest

Artificial intelligence (AI) approaches have been successfully applied to many fields. Among them, Case-Based Reasoning (CBR) is an approach that mainly focuses on the reuse of knowledge and experience. However, little work has been done on applying CBR to improve assembly part design. Similarity measures and the weights of different features are crucial in determining the accuracy of retrieving cases from the case base. To develop the feature weights and retrieve similar part designs, this research proposes using Genetic Algorithms (GAs) to learn the optimal feature weights and a nearest-neighbor technique to measure the similarity of assembly part designs. Early experimental results indicate that similar part designs are effectively retrieved by these similarity measures.
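The retrieval side can be sketched as a weighted nearest-neighbour search; the feature weights, fixed here for illustration, are what the GA would evolve (the feature values and case names are hypothetical):

```python
def weighted_similarity(a, b, weights):
    """Nearest-neighbour similarity over part features; each feature's
    weight would be learned by the GA (fixed here for illustration)."""
    return -sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))

def retrieve(query, case_base, weights):
    """Return the stored part design most similar to the query part."""
    return max(case_base, key=lambda case: weighted_similarity(query, case[0], weights))

cases = [([10.0, 2.0], "bracket-A"), ([3.0, 8.0], "housing-B")]
print(retrieve([9.5, 2.5], cases, [1.0, 1.0])[1])  # → bracket-A
```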


In data mining, many techniques use distance-based measures for data clustering, and improving clustering performance is the fundamental goal in cluster-related tasks. Many techniques are available for clustering numerical as well as categorical data. Clustering is an unsupervised learning technique in which objects are grouped based on the similarity among them. A new cluster similarity measure, the cosine-like cluster similarity measure (CLCSM), is proposed in this paper and used for data classification. Extensive experiments are conducted on UCI machine learning datasets. The experimental results show that the proposed cosine-like cluster similarity measure is superior to many existing cluster similarity measures for data classification.
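As an illustrative sketch (not the paper's CLCSM), classification by cluster similarity can be done by assigning each instance the label of the most cosine-similar cluster centroid:

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

def classify(instance, centroids):
    """Assign the label of the most similar cluster centroid
    (illustrative stand-in for the paper's CLCSM)."""
    return max(centroids, key=lambda c: cosine(instance, c[0]))[1]

centroids = [([1.0, 0.0], "class-1"), ([0.0, 1.0], "class-2")]
print(classify([0.9, 0.1], centroids))  # → class-1
```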


2021, pp. 122-146
Author(s): Matthew Johnson, Jeffrey M. Bradshaw

Current attempts to understand human-machine systems are complex and unwieldy. Multiple disciplines throw different concepts and constructs at the problem, but there is no agreed-upon framework for assembling these interrelated moving parts into a coherent system. We propose interdependence as the common factor that unifies and explains these moving parts and undergirds the different terms people use to talk about them. In this chapter, we describe a sound and practical theoretical framework based on interdependence that enables researchers to predict and explain experimental results in terms of interlocking relationships among well-defined operational principles. Our exposition is not intended to be exhaustive; instead it aims to describe the basic principles in a way that allows the gist to be grasped by a broad cross-disciplinary audience through simple illustrations.

