A COMPARISON OF SIMILARITY MEASURES FOR CLUSTERING OF QRS COMPLEXES

Similarity or distance measures play an important role in the performance of algorithms for ECG clustering problems. This paper compares four similarity measures for clustering QRS complexes: the city block (L1-norm), Euclidean (L2-norm), normalized correlation coefficient, and simplified grey relational grade. The measures are compared on classification accuracy, threshold value selection, noise robustness, execution time, and the capability of automated selection of templates. The clustering algorithm used is the so-called two-step unsupervised method. For each similarity measure, the best of 10 independent runs of the clustering algorithm, each started from a randomly selected initial template beat, is used for the comparison. To investigate the capability of automated selection of templates for ECG classification algorithms, we use the cluster centers generated by the clustering algorithm with each measure as templates, which yields four sets of templates, one per measure. The four sets are then used in the k-nearest neighbor classification method to evaluate the performance of the templates. Tested with MIT/BIH arrhythmia data, the simplified grey relational grade outperforms the other measures in classification accuracy, threshold value selection, noise robustness, and the capability of automated selection of templates.
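As a rough illustration of three of the measures named above, the following sketch compares two equal-length beat segments. The function names and the sample beat are hypothetical, and the mean-removed (Pearson) form of the normalized correlation coefficient is an assumption, since the abstract does not give the exact definition. The simplified grey relational grade is not sketched because its formula is not stated here.

```python
import numpy as np

def city_block(x, y):
    """L1-norm distance: sum of absolute sample differences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sum(np.abs(x - y)))

def euclidean(x, y):
    """L2-norm distance: square root of the summed squared differences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def normalized_correlation(x, y):
    """Correlation coefficient in [-1, 1]; higher means more similar.

    Assumption: both segments are mean-removed before correlating.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.linalg.norm(xc) * np.linalg.norm(yc)
    return float(np.dot(xc, yc) / denom) if denom > 0 else 0.0

# Hypothetical beat and an amplitude-scaled copy of it.
beat = np.array([0.0, 0.2, 1.0, -0.3, 0.0])
scaled = 2.0 * beat
print(city_block(beat, scaled), euclidean(beat, scaled),
      normalized_correlation(beat, scaled))
```

The two norms grow with the amplitude difference, while the correlation coefficient stays at 1 for a scaled copy. This is one reason the choice of measure can change which beats end up in the same cluster.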


Boolean logic algebra driven similarity measure for text based applications

PeerJ Computer Science ◽

10.7717/peerj-cs.641 ◽

2021 ◽

Vol 7 ◽

pp. e641

Author(s):

Hassan I. Abdalla ◽

Ali A. Amer

Keyword(s):

Similarity Measure ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Comprehensive Evaluation ◽

State Of The Art ◽

Similarity Measures ◽

Boolean Logic ◽

K Nearest Neighbor ◽

Complex Design ◽

Clustering And Classification

In Information Retrieval (IR), Data Mining (DM), and Machine Learning (ML), similarity measures have been widely used for text clustering and classification. The similarity measure is the cornerstone on which the performance of most DM and ML algorithms depends. Nevertheless, the search in the literature for an effective and efficient similarity measure is still immature. Some recently proposed similarity measures are effective but have a complex design and suffer from inefficiencies. This work therefore develops an effective and efficient similarity measure of simple design for text-based applications. The measure, driven by Boolean logic algebra basics (BLAB-SM), aims to reach the desired accuracy at the fastest run time compared with recently developed state-of-the-art measures. A comprehensive evaluation is presented using the term frequency–inverse document frequency (TF-IDF) schema, the K-nearest neighbor (KNN) classifier, and the K-means clustering algorithm. BLAB-SM was evaluated experimentally against seven similarity measures on two popular datasets, Reuters-21 and Web-KB. The experimental results show that BLAB-SM is not only more efficient but also significantly more effective than state-of-the-art similarity measures on both classification and clustering tasks.
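The evaluation setup described above (TF-IDF vectors fed to a KNN classifier with a pluggable similarity measure) can be sketched roughly as follows. Cosine similarity is used only as a stand-in, because the abstract does not define BLAB-SM. The IDF variant, the function names, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document.

    Assumption: idf = log(N / df) + 1, one common variant among several.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    return [{t: c / len(doc) * idf[t] for t, c in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors (stand-in measure)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(query, train_vecs, train_labels, k=3, sim=cosine):
    """Majority vote among the k training vectors most similar to query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: sim(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy corpus (hypothetical); the last document is the query.
docs = [["oil", "price", "rise"], ["crude", "oil", "export"],
        ["team", "win", "match"], ["match", "score", "goal"],
        ["oil", "export", "price"]]
labels = ["energy", "energy", "sport", "sport"]
vecs = tfidf_vectors(docs)
prediction = knn_predict(vecs[4], vecs[:4], labels, k=3)
print(prediction)
```

Passing a different function as `sim` is all that is needed to compare measures under the same classifier, which mirrors the kind of head-to-head comparison the abstract reports.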


Prediction of Customer Churn in Telecom Sector using Clustering Technique

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1207.0886s219 ◽

2019 ◽

Vol 8 (6S2) ◽

pp. 826-832

Keyword(s):

Clustering Algorithm ◽

Scientific Discovery ◽

Distance Measures ◽

Churn Prediction ◽

Customer Churn ◽

Main Challenge ◽

Validity Indices ◽

China Telecom ◽

Telecom Sector ◽

Selection Of

Data is now being produced at an incredible rate, and handling and analyzing such big data within a limited time is the main challenge today. Clustering is widely used to analyze data visually and to support efficient decision making. It is applied in a range of areas such as education, computer science, marketing, insurance, surveillance detection, fraud detection, and scientific discovery to mine useful information from data. This paper concentrates on the unsupervised k-means clustering algorithm to analyze churn prediction in the telecom sector. The selection of distance measures, and of the category of data that a clustering algorithm can handle, is a decisive step in clustering: it defines how two elements resemble each other and how this resemblance shapes the clusters. Another major difficulty in the clustering process is determining the goodness or validity of the clusters. Hence this paper discusses and addresses the different issues with k-means clustering. Experiments were carried out on China Telecom data to identify similar groups of clients who are more likely to leave the service. The results were analyzed to identify the best features, distance measures, and validity indices for obtaining good-quality clusters.
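To make the interplay between distance measure and validity index concrete, here is a minimal sketch of the silhouette coefficient with a selectable metric. The silhouette is one widely used validity index; the abstract does not say which indices the paper evaluated, so this choice, the function names, and the toy data are assumptions.

```python
import numpy as np

def pairwise(X, metric="euclidean"):
    """Full pairwise distance matrix under the chosen metric."""
    diff = X[:, None, :] - X[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    raise ValueError(f"unknown metric: {metric}")

def silhouette(X, labels, metric="euclidean"):
    """Mean silhouette coefficient; assumes at least two clusters."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    D = pairwise(X, metric)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                      # exclude the point itself
        if not same.any():
            scores.append(0.0)               # convention for singletons
            continue
        a = D[i, same].mean()                # mean intra-cluster distance
        b = min(D[i, labels == c].mean()     # nearest other cluster
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return float(np.mean(scores))

# Two well-separated toy groups (hypothetical data).
X = [[0, 0], [0, 1], [10, 10], [10, 11]]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
for m in ("euclidean", "manhattan"):
    print(m, silhouette(X, good, m), silhouette(X, bad, m))
```

A labeling that matches the natural groups scores close to 1 under both metrics, while the scrambled labeling scores below zero.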


About Similarity Measures of Components Arrangement of Naturally Ordered Data Arrays

SPIIRAS Proceedings ◽

10.15622/sp.18.2.471-503 ◽

2019 ◽

Vol 18 (2) ◽

pp. 471-503

Author(s):

Alexander Gumenyuk ◽

Artemiy Skiba ◽

Nikolay Pozdnichenko ◽

Stanislav Shpynov

Keyword(s):

Probabilistic Models ◽

Geometric Mean ◽

A Priori ◽

Similarity Measures ◽

Mathematical Linguistics ◽

Distance Measures ◽

Numerical Sequence ◽

Symbolic Sequences ◽

Ordered Data ◽

Selection Of

At present, adequate mathematical tools are not used to analyze the arrangement of components in arrays of naturally ordered data of different kinds, including words or letters in texts, notes in musical compositions, symbols in sign sequences, monitoring data, numbers representing ordered measurement results, and components in genetic texts. It is therefore difficult or impossible to measure and compare the order of messages allocated in long information chains. The main approaches to comparing symbol sequences use probabilistic models and statistical tools, or pairwise and multiple alignment, which makes it possible to determine the degree of similarity of sequences using edit distance measures. The application of pseudospectral and fractal representations of symbolic sequences is somewhat exotic. "The curse of a priori unconscious knowledge" of the obvious orderliness of the sequence deserves special notice, as it is widespread in mathematical linguistics, bioinformatics (mathematical biology), and other similar fields of science. The approaches noted above pay almost no attention to studying and detecting the patterns in the specific arrangement of all the symbols, words, and components of data sets that constitute a separate sequence. The object of study in our work is a specifically organized numerical tuple: the arrangement of components (order) in a symbolic or numerical sequence. The intervals between the closest identical components of the order are used as the basis for the quantitative representation of the chain arrangement. Multiplying all the intervals, or summing their logarithms, yields numbers that uniquely reflect the arrangement of components in a particular sequence. These numbers allow us to obtain a whole set of normalized characteristics of the order, among them the geometric mean interval and its logarithm. Such characteristics reflect the arrangement of the components in symbolic sequences surprisingly accurately.
In this paper, we present an approach for quantitatively comparing the arrangement of arrays of naturally ordered data (information chains) of an arbitrary nature. We propose measures of similarity/distinction and a procedure for comparing chain order, based on selecting a list of subsequences (components) that are equal or similar in their order characteristics. Rank distributions are used for faster selection of the list of matching components. The paper presents a toolkit for comparing the order of information chains and demonstrates some of its applications for studying the structure of nucleotide sequences.
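The interval-based characteristics mentioned in the abstract can be sketched as follows. The boundary rule (the first interval is counted from just before the start of the sequence) is an assumption, since the abstract does not say how the first occurrence is handled. The function names and the example sequence are hypothetical.

```python
import math

def intervals(sequence, symbol):
    """Distances between consecutive occurrences of `symbol`.

    Assumption: the first interval is measured from position -1,
    i.e. from just before the start of the sequence.
    """
    result, previous = [], -1
    for pos, item in enumerate(sequence):
        if item == symbol:
            result.append(pos - previous)
            previous = pos
    return result

def sum_log_intervals(ivals, base=2):
    """Sum of interval logarithms (the log of the interval product)."""
    return sum(math.log(d, base) for d in ivals)

def geometric_mean_interval(ivals):
    """n-th root of the product of n intervals, computed via logarithms."""
    return math.exp(sum(math.log(d) for d in ivals) / len(ivals))

chain = "ACGACCA"                  # hypothetical nucleotide-like chain
iv = intervals(chain, "A")         # occurrences at positions 0, 3, 6
print(iv, sum_log_intervals(iv), geometric_mean_interval(iv))
```

Working with the sum of logarithms rather than the raw product avoids overflow on long chains, where the product of intervals grows very quickly.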


A New TOPSIS Approach Using Cosine Similarity Measures and Cubic Bipolar Fuzzy Information for Sustainable Plastic Recycling Process

Mathematical Problems in Engineering ◽

10.1155/2021/4309544 ◽

2021 ◽

Vol 2021 ◽

pp. 1-18

Author(s):

Muhammad Riaz ◽

Dragan Pamucar ◽

Anam Habib ◽

Mishal Riaz

Keyword(s):

Fuzzy Numbers ◽

Similarity Measures ◽

Cosine Similarity ◽

Distance Measures ◽

Recycling Process ◽

Plastic Recycling ◽

Multicriteria Group Decision Making ◽

Topsis Approach ◽

Cosine Similarity Measures ◽

Selection Of

A cubic bipolar fuzzy set (CBFS) is a robust paradigm for expressing bipolarity and vagueness in terms of bipolar fuzzy numbers and interval-valued bipolar fuzzy numbers. The abstraction of similarity measures (SMs) has a large number of applications in various fields. In this study, taking advantage of CBFSs, three cosine similarity measures for CBFSs are therefore proposed in turn, using the cosine of the angle between two vectors, new distance measures, and the cosine function. Some key properties of these SMs are explored. Based on the suggested SMs, the problem of bacteria recognition is analyzed, and an important application is provided to demonstrate the efficiency of the proposed SMs for CBF information. Moreover, a TOPSIS approach based on cosine SMs is developed for multicriteria group decision-making (MCGDM) problems. An illustrative example on the selection of a sustainable plastic recycling process is presented to discuss the efficiency of the suggested MCGDM technique.
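For readers unfamiliar with TOPSIS, the sketch below shows the classical crisp version with Euclidean distances. It is not the paper's CBFS variant, which replaces the distance step with cosine similarity measures over cubic bipolar fuzzy information. The function name and the example data are hypothetical.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Classical TOPSIS closeness coefficients, one per alternative.

    matrix:  alternatives x criteria scores (crisp numbers)
    weights: criterion weights (normalized here to sum to 1)
    benefit: True where larger is better, False for cost criteria
    Assumes no criterion column is all zeros and that the
    alternatives are not all identical.
    """
    M = np.asarray(matrix, float)
    w = np.asarray(weights, float) / np.sum(weights)
    benefit = np.asarray(benefit, bool)
    V = M / np.linalg.norm(M, axis=0) * w          # normalize, then weight
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_plus = np.linalg.norm(V - ideal, axis=1)     # distance to ideal
    d_minus = np.linalg.norm(V - anti, axis=1)     # distance to anti-ideal
    return d_minus / (d_plus + d_minus)

# Two hypothetical processes scored on yield, quality, and cost.
scores = topsis([[9, 9, 100], [5, 5, 200]],
                weights=[1, 1, 1],
                benefit=[True, True, False])
print(scores)
```

The alternative with the highest closeness coefficient is ranked first. In this example the first process dominates on every criterion, so it coincides with the ideal solution.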


Intuitionistic Fuzzy Similarity Measures and Their Role in Classification

Journal of Intelligent Systems ◽

10.1515/jisys-2015-0086 ◽

2016 ◽

Vol 25 (2) ◽

pp. 221-237 ◽

Cited By ~ 5

Author(s):

Leila Baccour ◽

Adel M. Alimi ◽

Robert I. John

Keyword(s):

Similarity Measures ◽

Intuitionistic Fuzzy Sets ◽

Distance Measures ◽

Fuzzy Information ◽

Intuitionistic Fuzzy ◽

Fuzzy Similarity ◽

Metric Distance ◽

Handwritten Arabic ◽

Selection Of

We present some similarity and distance measures between intuitionistic fuzzy sets (IFSs), and we propose two semi-metric distance measures between IFSs. The measures are applied to the classification of shapes and handwritten Arabic sentences described with intuitionistic fuzzy information. The experimental results allow a comparative analysis of intuitionistic fuzzy similarity and distance measures, which can facilitate the selection of such a measure in similar applications.
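As an example of the kind of measure being compared, the sketch below implements the well-known normalized Hamming distance between IFSs, which includes the hesitation degree. It is not one of the two semi-metric measures proposed in the paper, whose definitions are not given in the abstract. The function names and the pair-based representation are assumptions.

```python
def ifs_hamming(A, B):
    """Normalized Hamming distance between two IFSs.

    A and B are equal-length lists of (membership, non_membership)
    pairs over the same universe; hesitation is 1 - mu - nu.
    """
    if len(A) != len(B) or not A:
        raise ValueError("IFSs must be non-empty and of equal length")
    total = 0.0
    for (mu_a, nu_a), (mu_b, nu_b) in zip(A, B):
        pi_a = 1.0 - mu_a - nu_a
        pi_b = 1.0 - mu_b - nu_b
        total += abs(mu_a - mu_b) + abs(nu_a - nu_b) + abs(pi_a - pi_b)
    return total / (2 * len(A))

def ifs_similarity(A, B):
    """Similarity derived from the distance as 1 - d (one common choice)."""
    return 1.0 - ifs_hamming(A, B)

A = [(0.6, 0.3), (0.2, 0.7)]
B = [(0.5, 0.4), (0.3, 0.6)]
print(ifs_hamming(A, B), ifs_similarity(A, B))
```

The distance is 0 for identical sets and 1 for fully opposite ones, such as full membership against full non-membership.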


Kernel-Based Robust Bias-Correction Fuzzy Weighted C-Ordered-Means Clustering Algorithm

Symmetry ◽

10.3390/sym11060753 ◽

2019 ◽

Vol 11 (6) ◽

pp. 753

Author(s):

Wenyuan Zhang ◽

Xijuan Guo ◽

Tianyu Huang ◽

Jiale Liu ◽

Jun Chen

Keyword(s):

Bias Correction ◽

Euclidean Distance ◽

Clustering Algorithm ◽

Distance Measure ◽

Similarity Measures ◽

Distance Measures ◽

Background Information ◽

Local Similarity ◽

Original Algorithm ◽

Fcm Clustering

The spatially constrained Fuzzy C-means clustering (FCM) is an effective algorithm for image segmentation. Its use of background information improves its insensitivity to noise to some extent. However, the membership degree based on Euclidean distance is not suitable for revealing the non-Euclidean structure of input data, since it still lacks sufficient robustness to noise and outliers. To overcome this problem, this paper proposes a new kernel-based algorithm built on the Kernel-induced Distance Measure, which we call the Kernel-based Robust Bias-correction Fuzzy Weighted C-ordered-means Clustering Algorithm (KBFWCM). In the construction of the objective function, the KBFWCM algorithm takes into account that the spatially constrained FCM clustering algorithm is insensitive to image noise and involves highly intensive computation. Aiming at the insensitivity of the spatially constrained FCM clustering algorithm to noise and its processing of image detail, the KBFWCM algorithm combines fuzzy local similarity measures (space and grayscale) with the typicality of data attributes. Aiming at the poor robustness of the original algorithm to noise and outliers and its highly intensive computation, a kernel-based clustering method that includes a class of robust non-Euclidean distance measures is proposed. The experimental results show that the KBFWCM algorithm achieves stronger denoising and robustness on noisy images.
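A kernel-induced distance of the kind referred to above can be sketched with a Gaussian kernel. The abstract does not state which kernel KBFWCM uses, so the Gaussian choice, the `sigma` parameter, and the function names are assumptions. The identity used is the standard feature-space expansion ||phi(x) - phi(v)||^2 = K(x,x) + K(v,v) - 2K(x,v).

```python
import numpy as np

def gaussian_kernel(x, v, sigma=1.0):
    """K(x, v) = exp(-||x - v||^2 / sigma^2); K(x, x) is always 1."""
    x, v = np.asarray(x, float), np.asarray(v, float)
    return float(np.exp(-np.sum((x - v) ** 2) / sigma ** 2))

def kernel_distance_sq(x, v, sigma=1.0):
    """Squared feature-space distance, 2 * (1 - K(x, v)) for this kernel."""
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))

center = [0.0, 0.0]
for point in ([0.5, 0.0], [3.0, 0.0], [100.0, 100.0]):
    euclid_sq = float(np.sum((np.asarray(point) - center) ** 2))
    print(point, euclid_sq, kernel_distance_sq(point, center))
```

The squared Euclidean distance grows without bound, whereas the kernel-induced distance saturates at 2. A distant outlier therefore pulls on a cluster center no more than a moderately distant point does, which is the usual argument for its robustness.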


MULTICRITERIA SELECTION OF PROJECT MANAGERS BY APPLYING GREY CRITERIA / PROJEKTŲ VALDYTOJO PARINKIMO DAUGIATIKSLIO VERTINIMO MODELIS

Technological and Economic Development of Economy ◽

10.3846/1392-8619.2008.14.462-477 ◽

2008 ◽

Vol 14 (4) ◽

pp. 462-477 ◽

Cited By ~ 109

Author(s):

Edmundas Kazimieras Zavadskas ◽

Zenonas Turskis ◽

Jolanta Tamošaitienė

Keyword(s):

Construction Projects ◽

Construction Project ◽

Project Managers ◽

Project Manager ◽

Management Personnel ◽

Grey Relational Grade ◽

Effective Decision ◽

Grey Relational ◽

Selection Of

There are a number of criteria and associated sub‐criteria influencing the match of managers to construction projects. Criteria and sub‐criteria were identified based on a thorough review of the related literature and interviews with management personnel involved in project manager selection. Project managers' characteristics are considered to be less important for effective project management. The model is based on multicriteria evaluation of project managers. The evaluation embraces the identified criteria influencing the process of construction project manager selection. This paper considers the application of grey relations methodology to defining the utility of alternatives, and offers a multiple criteria method of Complex Proportional Assessment of alternatives with grey relations (COPRAS‐G) for analysis. In this model, the parameters of the alternatives are determined by the grey relational grade and expressed in terms of intervals. A case study presents the selection of a construction project manager. The results obtained show that this method may be used as an effective decision aid in multicriteria selection.
Summary (Santrauka, translated from Lithuanian): The number of criteria and associated sub-criteria affects the selection of a construction project manager. After a review of the literature, the main criteria influencing project manager selection were identified. The most important project manager characteristics that improve effective construction project management are singled out. The project manager model is based on multicriteria evaluation methods. The article reviews the application of a methodology for solutions expressed in intervals and defines the utility of alternatives. A multicriteria method of complex proportional assessment using criteria expressed in intervals (COPRAS-G) is presented. In the presented model, the parameters of the alternatives are expressed in intervals, the alternatives are ranked, and their priority is determined. The worked example demonstrates the effective selection of a construction project manager. This is illustrated by the results obtained.


Grey Comprehensive Relational Analysis of Influencing Factors of Communication Equipment Fighting Efficiency

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.765-767.2469 ◽

2013 ◽

Vol 765-767 ◽

pp. 2469-2473

Author(s):

Hai Xia Yu ◽

Guang Dong Liang ◽

Xian Jun Gao ◽

Hong Chun Zhang

Keyword(s):

Qualitative Analysis ◽

Influencing Factors ◽

Communication System ◽

Practical Application ◽

Early Communication ◽

Relational Analysis ◽

Grey Relational Grade ◽

Grey Relational ◽

Communication Equipment ◽

Selection Of

The paper introduces the grey comprehensive relational grade into the transverse analysis and comparison of several similar items of communication equipment. By analyzing the grey relational principle, the relational coefficient, and the selection of the distinguishing coefficient, and by studying several relational grades, the grey comprehensive relational grade is successfully applied to the analysis of communication equipment fighting efficiency. Results of the practical application show that the method based on comprehensive grey relational grade analysis has the advantages of simplified computation, feasibility, and consistency with the qualitative analysis result. It can provide theoretical guidance for targeted maintenance of the equipment and for rapidly improving the fighting efficiency of the airborne early communication system in war.
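The relational coefficient and distinguishing coefficient mentioned above can be illustrated with Deng's basic grey relational grade. This is a sketch of the classical grade, not the comprehensive grade used in the paper, and it assumes the inputs have already been normalized to comparable scales. The function name, the default `zeta=0.5`, and the example values are assumptions.

```python
import numpy as np

def grey_relational_grades(reference, candidates, zeta=0.5):
    """Deng's grey relational grade of each candidate to the reference.

    reference:  1-D sequence of factor values (already normalized)
    candidates: 2-D array, one row per compared sequence
    zeta:       distinguishing coefficient in (0, 1], commonly 0.5
    """
    ref = np.asarray(reference, float)
    cand = np.asarray(candidates, float)
    delta = np.abs(cand - ref)                 # absolute differences
    d_min, d_max = delta.min(), delta.max()    # taken over all rows, factors
    if d_max == 0:
        return np.ones(len(cand))              # all identical to reference
    coeff = (d_min + zeta * d_max) / (delta + zeta * d_max)
    return coeff.mean(axis=1)                  # equal-weight average

# Hypothetical normalized indicators for two pieces of equipment.
grades = grey_relational_grades([1.0, 1.0, 1.0],
                                [[0.9, 0.8, 1.0],
                                 [0.5, 0.6, 0.4]])
print(grades)
```

A smaller `zeta` widens the spread between coefficients, which is why the selection of the distinguishing coefficient affects how sharply the alternatives are separated.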


Grey relational grade decision model for selection of project delivery system

2009 IEEE International Conference on Grey Systems and Intelligent Services (GSIS 2009) ◽

10.1109/gsis.2009.5408015 ◽

2009 ◽

Cited By ~ 2

Author(s):

Li Huimin ◽

Wang Zhuofu

Keyword(s):

Delivery System ◽

Decision Model ◽

Project Delivery ◽

Grey Relational Grade ◽

Grey Relational ◽

Selection Of


DRSA: a non-hierarchical clustering algorithm using k-NN graph and its application in vegetation classification

Vegetation of Russia ◽

10.31111/vegrus/2015.27.125 ◽

2015 ◽

pp. 125-138 ◽

Cited By ~ 2

Author(s):

I. V. Goncharenko

Keyword(s):

Cluster Analysis ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Clustering Algorithms ◽

Protein Structures ◽

Hierarchical Cluster ◽

Vegetation Classification ◽

K Nearest Neighbor ◽

Neighbor Graph ◽

Nearest Neighbor Graph

In this article we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later the term “k-NN graph” and a few algorithms of k-NN clustering appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an “excessive” graph, a so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed another strategy, “upward” clustering, which forms (assembles sequentially) one cluster after another. Until now, graph-based cluster analysis has not been considered for the classification of vegetation datasets.
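The k-NN graph that such methods start from can be built with a few lines of NumPy. This sketch covers only graph construction, not the DRSA clustering procedure itself, which the abstract does not specify. The Euclidean metric, the function name, and the toy data are assumptions; vegetation data would typically call for an ecological dissimilarity instead.

```python
import numpy as np

def knn_graph(X, k):
    """Directed k-NN graph as {node: [indices of its k nearest nodes]}.

    Assumption: Euclidean distance between rows of X.
    """
    X = np.asarray(X, float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(D, np.inf)                # a node is not its own neighbor
    neighbors = np.argsort(D, axis=1)[:, :k]   # k smallest distances per row
    return {i: [int(j) for j in row] for i, row in enumerate(neighbors)}

# Four hypothetical plots along one gradient, forming two obvious pairs.
graph = knn_graph([[0.0], [1.0], [10.0], [11.0]], k=1)
print(graph)
```

With k=1 the edges already link each plot to its partner, so the two groups appear as separate connected components of the graph.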
