Exemplifying the Effects of Distance Metrics on Clustering Techniques: F-measure, Accuracy and Efficiency

Author(s):  
Tasleem Nizam ◽  
Sayed Imtiyaz Hassan
2020 ◽  
pp. 1-12
Author(s):  
Ayla Gülcü ◽  
Sedrettin Çalişkan

Collateral mechanism in the Electricity Market ensures the payments are executed on a timely manner; thus maintains the continuous cash flow. In order to value collaterals, Takasbank, the authorized central settlement bank, creates segments of the market participants by considering their short-term and long-term debt/credit information arising from all market activities. In this study, the data regarding participants’ daily and monthly debt payment and penalty behaviors is analyzed with the aim of discovering high-risk participants that fail to clear their debts on-time frequently. Different clustering techniques along with different distance metrics are considered to obtain the best clustering. Moreover, data preprocessing techniques along with Recency, Frequency, Monetary Value (RFM) scoring have been used to determine the best representation of the data. The results show that Agglomerative Clustering with cosine distance achieves the best separated clustering when the non-normalized dataset is used; this is also acknowledged by a domain expert.


Author(s):  
Amolkumar Narayan Jadhav ◽  
Gomathi N.

The widespread application of clustering in various fields leads to the discovery of different clustering techniques in order to partition multidimensional data into separable clusters. Although there are various clustering approaches used in literature, optimized clustering techniques with multi-objective consideration are rare. This paper proposes a novel data clustering algorithm, Enhanced Kernel-based Exponential Grey Wolf Optimization (EKEGWO), handling two objectives. EKEGWO, which is the extension of KEGWO, adopts weight exponential functions to improve the searching process of clustering. Moreover, the fitness function of the algorithm includes intra-cluster distance and the inter-cluster distance as an objective to provide an optimum selection of cluster centroids. The performance of the proposed technique is evaluated by comparing with the existing approaches PSC, mPSC, GWO, and EGWO for two datasets: banknote authentication and iris. Four metrics, Mean Square Error (MSE), F-measure, rand and jaccord coefficient, estimates the clustering efficiency of the algorithm. The proposed EKEGWO algorithm can attain an MSE of 837, F-measure of 0.9657, rand coefficient of 0.8472, jaccord coefficient of 0.7812, for the banknote dataset.


2020 ◽  
Vol 9 (2) ◽  
pp. 171
Author(s):  
Munirsyah Munirsyah ◽  
Moch. Arif Bijaksana ◽  
Widi Astuti

Wordnet is a collection of words that interpret or present a meaning, in its development Wordnet has an important part, the Synonym Set or Synset. In making Synonym sets, synonyms are needed and the commutative nature of words is needed. To get word synonyms, the English language thesaurus becomes the reference data for taking synonym data. Broadly speaking, the difference between Wordnet and the dictionary is that the meaning of the word is related to other words, to determine the equation requires a commutative process. The process is made easy by using commutative methods that will produce a candidate synonym set. Candidates for the synonym set cannot be used for word syntax, the grouping process of words which produces the Synonym set as the final result must be carried out. The process of grouping words can one of them use clustering techniques, in this study will use Agglomerative Clustering techniques. In the process of agglomerative clustering techniques there is a threshold value to determine the number of repetitions or as a condition to stop the iteration process. The clustering process in this study will use a threshold value of 0.1 to 1 to test the best threshold value to produce the best Synonym set and calculate its accuracy value. Accuracy calculation and evaluation will use the F-measure method to find the best results.


2020 ◽  
Author(s):  
Andrea Giani ◽  
de Souza Patricia Borges ◽  
Stefania Bartoletti ◽  
Flavio Morselli ◽  
Andrea Conti ◽  
...  

2019 ◽  
Vol 7 (3) ◽  
pp. 50-54
Author(s):  
N. Thilagavathi ◽  
Christy Wood ◽  
V. Hemalakshumi ◽  
V. Mathumiithaa

Author(s):  
Wing Chiu Tam ◽  
Osei Poku ◽  
R. D. (Shawn) Blanton

Abstract Systematic defects due to design-process interactions are a dominant component of integrated circuit (IC) yield loss in nano-scaled technologies. Test structures do not adequately represent the product in terms of feature diversity and feature volume, and therefore are unable to identify all the systematic defects that affect the product. This paper describes a method that uses diagnosis to identify layout features that do not yield as expected. Specifically, clustering techniques are applied to layout snippets of diagnosis-implicated regions from (ideally) a statistically-significant number of IC failures for identifying feature commonalties. Experiments involving an industrial chip demonstrate the identification of possible systematic yield loss due to lithographic hotspots.


2018 ◽  
Vol 9 (2) ◽  
pp. 97-105
Author(s):  
Richard Firdaus Oeyliawan ◽  
Dennis Gunawan

Library is one of the facilities which provides information, knowledge resource, and acts as an academic helper for readers to get the information. The huge number of books which library has, usually make readers find the books with difficulty. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as the library catalogue. SLiMS has many features which help readers, but there is still no recommendation feature to help the readers finding the books which are relevant to the specific book that readers choose. The application has been developed using Vector Space Model to represent the document in vector model. The recommendation in this application is based on the similarity of the books description. Based on the testing phase using one-language sample of the relevant books, the F-Measure value gained is 55% using 0.1 as cosine similarity threshold. The books description and variety of languages affect the F-Measure value gained. Index Terms—Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model


Sign in / Sign up

Export Citation Format

Share Document