Exemplifying the Effects of Distance Metrics on Clustering Techniques: F-measure, Accuracy and Efficiency

The collateral mechanism in the electricity market ensures that payments are executed in a timely manner and thus maintains continuous cash flow. To value collateral, Takasbank, the authorized central settlement bank, segments market participants by considering their short-term and long-term debt/credit information arising from all market activities. In this study, data on participants' daily and monthly debt payment and penalty behavior is analyzed with the aim of discovering high-risk participants that frequently fail to clear their debts on time. Different clustering techniques along with different distance metrics are considered to obtain the best clustering. Moreover, data preprocessing techniques along with Recency, Frequency, Monetary Value (RFM) scoring have been used to determine the best representation of the data. The results show that Agglomerative Clustering with cosine distance achieves the best-separated clustering when the non-normalized dataset is used; this is also confirmed by a domain expert.
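One property that may be relevant to the cosine-distance result on non-normalized data is that cosine distance depends only on the direction of a vector, not its magnitude, whereas Euclidean distance is dominated by scale. The sketch below illustrates this with made-up two-feature vectors; the feature names and values are hypothetical and are not taken from the study.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance; sensitive to the scale of the features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical (late_payments, penalties) vectors, invented for illustration.
small = [2.0, 1.0]
scaled = [200.0, 100.0]  # same proportions as `small`, 100x the magnitude
other = [1.0, 2.0]       # similar magnitude to `small`, different proportions

for name, vec in (("scaled", scaled), ("other", other)):
    print(name,
          "cosine:", round(cosine_distance(small, vec), 4),
          "euclidean:", round(euclidean_distance(small, vec), 4))
```

Under cosine distance `small` is closest to `scaled`; under Euclidean distance it is closest to `other`. Which behavior is preferable depends on the data.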

EKEGWO: Enhanced Kernel-Based Exponential Grey Wolf Optimizer for Bi-Objective Data Clustering

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488519500296 ◽

2019 ◽

Vol 27 (04) ◽

pp. 669-688 ◽

Cited By ~ 1

Author(s):

Amolkumar Narayan Jadhav ◽

Gomathi N.

Keyword(s):

Data Clustering ◽

Clustering Algorithm ◽

Fitness Function ◽

Multidimensional Data ◽

Grey Wolf Optimizer ◽

Grey Wolf ◽

Widespread Application ◽

Clustering Techniques ◽

Cluster Distance ◽

F Measure

The widespread application of clustering in various fields has led to the development of different clustering techniques for partitioning multidimensional data into separable clusters. Although various clustering approaches are used in the literature, optimized clustering techniques with multi-objective consideration are rare. This paper proposes a novel data clustering algorithm, Enhanced Kernel-based Exponential Grey Wolf Optimization (EKEGWO), that handles two objectives. EKEGWO, an extension of KEGWO, adopts weighted exponential functions to improve the search process of clustering. Moreover, the fitness function of the algorithm includes the intra-cluster distance and the inter-cluster distance as objectives to provide an optimum selection of cluster centroids. The performance of the proposed technique is evaluated by comparing it with the existing approaches PSC, mPSC, GWO, and EGWO on two datasets: banknote authentication and iris. Four metrics, Mean Square Error (MSE), F-measure, Rand coefficient, and Jaccard coefficient, estimate the clustering efficiency of the algorithm. The proposed EKEGWO algorithm attains an MSE of 837, an F-measure of 0.9657, a Rand coefficient of 0.8472, and a Jaccard coefficient of 0.7812 for the banknote dataset.
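The Rand and Jaccard coefficients named above are commonly defined by counting pairs of points on which two labelings agree. The sketch below uses those standard pair-counting definitions, plus a pairwise F-measure; whether the paper computes its F-measure this way is an assumption, and the label lists are invented for illustration. It also assumes each labeling places at least one pair of points in the same cluster, so no denominator is zero.

```python
from itertools import combinations

def pair_counts(truth, predicted):
    """Count point pairs by agreement between two labelings.

    a: same cluster in both, b: same in truth only,
    c: same in predicted only, d: different in both.
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = predicted[i] == predicted[j]
        if same_truth and same_pred:
            a += 1
        elif same_truth:
            b += 1
        elif same_pred:
            c += 1
        else:
            d += 1
    return a, b, c, d

def rand_coefficient(truth, predicted):
    a, b, c, d = pair_counts(truth, predicted)
    return (a + d) / (a + b + c + d)

def jaccard_coefficient(truth, predicted):
    a, b, c, _ = pair_counts(truth, predicted)
    return a / (a + b + c)

def pairwise_f_measure(truth, predicted):
    a, b, c, _ = pair_counts(truth, predicted)
    precision = a / (a + c)
    recall = a / (a + b)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels for six points (not from either dataset in the paper).
truth = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 1]

print("Rand:", rand_coefficient(truth, predicted))
print("Jaccard:", jaccard_coefficient(truth, predicted))
print("F-measure:", pairwise_f_measure(truth, predicted))
```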

Development Synonym Set for the English Wordnet Using the Method of Comutative and Agglomerative Clustering

Jurnal Sisfokom (Sistem Informasi dan Komputer) ◽

10.32736/sisfokom.v9i2.855 ◽

2020 ◽

Vol 9 (2) ◽

pp. 171

Author(s):

Munirsyah Munirsyah ◽

Moch. Arif Bijaksana ◽

Widi Astuti

Keyword(s):

English Language ◽

Reference Data ◽

Iteration Process ◽

Threshold Value ◽

Agglomerative Clustering ◽

Clustering Techniques ◽

The Difference ◽

F Measure

Wordnet is a collection of words that interpret or present a meaning. An important part of its development is the synonym set, or synset. Building synonym sets requires synonyms and relies on the commutative nature of words. To obtain synonyms, an English-language thesaurus serves as the reference data. Broadly speaking, the difference between Wordnet and a dictionary is that in Wordnet the meaning of a word is related to other words; determining this equivalence requires a commutative process. Commutative methods simplify this process and produce candidate synonym sets. Candidate synonym sets cannot be used for word syntax, so a word-grouping process that produces the synonym set as the final result must be carried out. One way to group words is with clustering techniques; this study uses Agglomerative Clustering. Agglomerative clustering has a threshold value that determines the number of repetitions, or serves as the condition for stopping the iteration process. The clustering process in this study uses threshold values from 0.1 to 1 to find the threshold that produces the best synonym set, and calculates its accuracy. Accuracy calculation and evaluation use the F-measure to find the best results.
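The role of the threshold as a stopping condition can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes average linkage, assumes the threshold applies to a distance (the abstract does not say whether it is a distance or a similarity), and uses a tiny invented thesaurus with Jaccard distance between synonym lists.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets."""
    return 1.0 - len(a & b) / len(a | b)

def agglomerative(items, distance, threshold):
    """Average-linkage merging that stops once the closest pair of
    clusters is farther apart than `threshold`."""
    clusters = [[name] for name in items]

    def linkage(c1, c2):
        total = sum(distance(items[x], items[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best > threshold:
            break  # stopping condition: nothing left that is close enough
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

# Hypothetical thesaurus entries (each word mapped to itself plus synonyms).
thesaurus = {
    "big": {"big", "large", "huge"},
    "large": {"big", "large", "huge"},
    "huge": {"big", "large", "huge", "vast"},
    "quick": {"quick", "fast", "rapid"},
    "fast": {"quick", "fast", "rapid"},
}

for threshold in (0.1, 0.5):
    print(threshold, agglomerative(thesaurus, jaccard_distance, threshold))
```

With these invented entries, the lower threshold leaves "huge" in its own group while the higher one merges it with "big" and "large", which shows how the threshold controls the granularity of the resulting sets.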

Clustering techniques for thyroid nodules malignancy inference in the era of personalized medicine

Endocrine Abstracts ◽

10.1530/endoabs.70.ep445 ◽

2020 ◽

Author(s):

Andrea Giani ◽

de Souza Patricia Borges ◽

Stefania Bartoletti ◽

Flavio Morselli ◽

Andrea Conti ◽

...

Keyword(s):

Personalized Medicine ◽

Thyroid Nodules ◽

Clustering Techniques

A Survival Study on Data Structure Based Clustering Techniques for Multidimensional Data Stream Analysis

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v5i12.101108 ◽

2017 ◽

Vol 5 (12) ◽

pp. 101-108

Author(s):

K. Chitra ◽

D. Maheswari

Keyword(s):

Data Structure ◽

Data Stream ◽

Multidimensional Data ◽

Clustering Techniques ◽

Survival Study ◽

Data Stream Analysis

A State of Art Approaches on Energy Efficient Clustering Techniques in WSN

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i3.5054 ◽

2019 ◽

Vol 7 (3) ◽

pp. 50-54

Author(s):

N. Thilagavathi ◽

Christy Wood ◽

V. Hemalakshumi ◽

V. Mathumiithaa

Keyword(s):

Energy Efficient ◽

Clustering Techniques ◽

Energy Efficient Clustering ◽

State Of Art

Examination of Clustering Techniques using Genetic Algorithm

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i4.374378 ◽

2018 ◽

Vol 6 (4) ◽

pp. 374-378

Author(s):

S. Ramya ◽

N. Subha

Keyword(s):

Genetic Algorithm ◽

Clustering Techniques

Systematic Defect Identification through Layout Snippet Clustering

ISTFA 2010: Conference Proceedings from the 36th International Symposium for Testing and Failure Analysis ◽

10.31399/asm.cp.istfa2010p0320 ◽

2010 ◽

Author(s):

Wing Chiu Tam ◽

Osei Poku ◽

R. D. (Shawn) Blanton

Keyword(s):

Design Process ◽

Integrated Circuit ◽

Yield Loss ◽

Defect Identification ◽

Clustering Techniques ◽

Dominant Component

Systematic defects due to design-process interactions are a dominant component of integrated circuit (IC) yield loss in nano-scaled technologies. Test structures do not adequately represent the product in terms of feature diversity and feature volume, and therefore are unable to identify all the systematic defects that affect the product. This paper describes a method that uses diagnosis to identify layout features that do not yield as expected. Specifically, clustering techniques are applied to layout snippets of diagnosis-implicated regions from (ideally) a statistically significant number of IC failures to identify feature commonalities. Experiments involving an industrial chip demonstrate the identification of possible systematic yield loss due to lithographic hotspots.

Estimation and Analysis of Heart Disease using Novel Clustering Techniques

International Journal of Pharmaceutical Research ◽

10.31838/ijpr/2020.sp2.438 ◽

2020 ◽

Vol 12 (sp2) ◽

Keyword(s):

Heart Disease ◽

Clustering Techniques

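Aplikasi Rekomendasi Buku Pada Katalog Perpustakaan Universitas Multimedia Nusantara Menggunakan Vector Space Model [Book Recommendation Application for the Universitas Multimedia Nusantara Library Catalogue Using the Vector Space Model]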

Jurnal ULTIMATICS ◽

10.31937/ti.v9i2.639 ◽

2018 ◽

Vol 9 (2) ◽

pp. 97-105

Author(s):

Richard Firdaus Oeyliawan ◽

Dennis Gunawan

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Vector Model ◽

Library Management ◽

Space Model ◽

Library Management System ◽

Index Terms ◽

Library Catalogue ◽

Language Sample ◽

F Measure

A library is a facility that provides information and knowledge resources and acts as an academic aid for readers. Because of the huge number of books a library holds, readers usually have difficulty finding the books they need. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as its library catalogue. SLiMS has many features that help readers, but it still lacks a recommendation feature to help readers find books relevant to a specific book they choose. The application was developed using the Vector Space Model to represent documents as vectors. Recommendations in this application are based on the similarity of the books' descriptions. In the testing phase, using a one-language sample of relevant books, the F-measure obtained is 55% with a cosine similarity threshold of 0.1. The books' descriptions and the variety of languages affect the F-measure obtained. Index Terms: Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model
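A description-based recommender of this kind can be sketched with TF-IDF vectors and a cosine similarity cutoff. The sketch below is a simplified illustration under stated assumptions, not the application's code: it tokenizes on whitespace, omits the Porter stemming step listed in the index terms, uses a plain `tf * log(N / df)` weighting, and runs on invented book descriptions. Only the 0.1 threshold comes from the abstract.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for a list of texts."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: count * math.log(n / df[t]) for t, count in tf.items()})
    return vectors

def cosine_similarity(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def recommend(index, docs, threshold=0.1):
    """Indices of documents whose similarity to docs[index] meets the
    threshold, most similar first."""
    vectors = tfidf_vectors(docs)
    scores = [(i, cosine_similarity(vectors[index], v))
              for i, v in enumerate(vectors) if i != index]
    ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
    return [i for i, score in ranked if score >= threshold]

# Hypothetical book descriptions, invented for illustration.
descriptions = [
    "python programming basics",
    "python programming advanced",
    "history of ancient rome",
]

print(recommend(0, descriptions))  # books similar to the first description
print(recommend(2, descriptions))  # books similar to the third description
```

Raising the threshold returns fewer, more similar books; lowering it returns more candidates at the cost of relevance.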
