Comparison of Distance Measures in K-Medoids Clustering for Grouping ISPA Cases

2021 ◽  
Vol 5 (1) ◽  
pp. 99-107
Author(s):  
Mia Nuranti Putri Pamulang ◽  
◽  
Mia Nuur Aini ◽  
Ultach Enri ◽  
◽  
...  

K-Medoids is an unsupervised algorithm that uses a distance measure to group data. A distance measure helps the algorithm group data based on the similarity of the variables. Several studies have shown that choosing the right distance measure can improve clustering performance. Euclidean and Chebyshev are two of the distance measures that can be used. In 2016, the Karawang Health Office stated that 175,891 Karawang citizens were suffering from ISPA. This figure continued to increase in the following years until 2019, when the total number of Karawang citizens suffering from ISPA reached 181,945. To assist the government in overcoming this problem, a clustering process is carried out to group the areas where ISPA is spreading in Karawang District. The areas are divided into three clusters: low, medium, and high. The distance measures are compared to find the best model based on the Davies-Bouldin Index (DBI). Euclidean distance produces a DBI score of 0.088, while Chebyshev distance produces a DBI score of 0.116. K-Medoids with Euclidean distance is therefore considered to perform better than with Chebyshev distance, because its DBI score is closer to 0.
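
The comparison described in this abstract can be sketched in a few lines. The code below is a minimal illustration, not the authors' pipeline: the toy `points`, the seeding with the first `k` points, and the medoid-based variant of the Davies-Bouldin Index (scatter and separation measured with the same distance as the clustering) are all assumptions made for the example.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    """Largest absolute difference along any single feature."""
    return max(abs(x - y) for x, y in zip(a, b))

def assign(points, medoids, dist):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist(p, points[medoids[c]]))
            for p in points]

def k_medoids(points, k, dist, max_iter=100):
    """Alternating k-medoids; seeding with the first k points is an assumption."""
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(points, medoids, dist)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimises the total distance to its cluster members.
            new_medoids.append(min(
                members,
                key=lambda i: sum(dist(points[i], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, dist)

def davies_bouldin(points, medoids, labels, dist):
    """Medoid-based DBI variant (assumption): lower means tighter, better separated clusters."""
    k = len(medoids)
    scatter = []
    for c in range(k):
        members = [points[i] for i, lab in enumerate(labels) if lab == c]
        scatter.append(sum(dist(p, points[medoids[c]]) for p in members) / len(members))
    worst = [max((scatter[i] + scatter[j]) / dist(points[medoids[i]], points[medoids[j]])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

# Hypothetical toy data (two features per area); not the Karawang data.
points = [(1, 1), (5, 5), (9, 1), (1, 2), (2, 1), (5, 6), (6, 5), (9, 2), (10, 1)]

results = {}
for name, dist in (("euclidean", euclidean), ("chebyshev", chebyshev)):
    medoids, labels = k_medoids(points, k=3, dist=dist)
    results[name] = (medoids, labels, davies_bouldin(points, medoids, labels, dist))
    print(name, "DBI =", round(results[name][2], 3))
```

The toy scores are illustrative only and do not reproduce the DBI values reported in the abstract.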

2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Shumpei Haginoya ◽  
Aiko Hanayama ◽  
Tamae Koike

Purpose: The purpose of this paper was to compare the accuracy of linking crimes using geographical proximity between three distance measures: Euclidean (distance measured by the length of a straight line between two locations), Manhattan (distance obtained by summing north-south distance and east-west distance) and the shortest route distances.
Design/methodology/approach: A total of 194 cases committed by 97 serial residential burglars in Aomori Prefecture in Japan between 2004 and 2015 were used in the present study. The Mann–Whitney U test was used to compare linked (two offenses committed by the same offender) and unlinked (two offenses committed by different offenders) pairs for each distance measure. Discrimination accuracy between linked and unlinked crime pairs was evaluated using area under the receiver operating characteristic curve (AUC).
Findings: The Mann–Whitney U test showed that the distances of the linked pairs were significantly shorter than those of the unlinked pairs for all distance measures. Comparison of the AUCs showed that the shortest route distance achieved significantly higher accuracy compared with the Euclidean distance, whereas there was no significant difference between the Euclidean and the Manhattan distance or between the Manhattan and the shortest route distance. These findings give partial support to the idea that distance measures taking the impact of environmental factors into consideration might be able to identify a crime series more accurately than Euclidean distances.
Research limitations/implications: Although the results suggested a difference between the Euclidean and the shortest route distance, it was small, and all distance measures resulted in outstanding AUC values, probably because of ceiling effects. Further investigation that makes the same comparison in a narrower area is needed to avoid this potential inflation of discrimination accuracy.
Practical implications: The shortest route distance might contribute to improving the accuracy of crime linkage based on geographical proximity. However, further investigation is needed before recommending the shortest route distance in practice. Given that the targeted area in the present study was relatively large, the findings may contribute especially to improving the accuracy of proactive comparative case analysis, which estimates the whole picture of the distribution of serial crimes in the region, by selecting a more effective distance measure.
Social implications: Improving the accuracy of crime linkage may assist crime investigations and lead to the earlier arrest of offenders.
Originality/value: The results of the present study provide an initial indication of the efficacy of using distance measures that take environmental factors into account.
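
The evaluation in this abstract can be illustrated with a small sketch. It computes Euclidean and Manhattan distances for pairs of offense locations and derives the AUC as the share of (linked, unlinked) comparisons in which the linked pair is closer, which equals the Mann–Whitney U statistic divided by the number of comparisons. The planar coordinates and pair lists are hypothetical, and the shortest route distance is omitted because it needs a road network.

```python
import math

def euclidean(p, q):
    """Length of the straight line between two locations."""
    return math.dist(p, q)

def manhattan(p, q):
    """North-south distance plus east-west distance."""
    return sum(abs(a - b) for a, b in zip(p, q))

def auc_closer_is_linked(linked, unlinked):
    """Probability that a random linked pair is closer than a random unlinked pair.

    Ties count one half, so the value equals U / (n_linked * n_unlinked).
    """
    wins = 0.0
    for d_linked in linked:
        for d_unlinked in unlinked:
            if d_linked < d_unlinked:
                wins += 1.0
            elif d_linked == d_unlinked:
                wins += 0.5
    return wins / (len(linked) * len(unlinked))

# Hypothetical offense locations on a planar grid (kilometres).
linked_pairs = [((0, 0), (1, 1)), ((2, 3), (2, 5)), ((5, 5), (6, 8))]
unlinked_pairs = [((0, 0), (4, 3)), ((1, 1), (3, 2)), ((0, 5), (7, 5))]

auc = {}
for name, dist in (("euclidean", euclidean), ("manhattan", manhattan)):
    linked = [dist(p, q) for p, q in linked_pairs]
    unlinked = [dist(p, q) for p, q in unlinked_pairs]
    auc[name] = auc_closer_is_linked(linked, unlinked)
    print(name, "AUC =", round(auc[name], 3))
```

An AUC of 0.5 would mean distance carries no information about linkage, while values near 1 mean linked pairs are almost always closer than unlinked ones.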


Author(s):  
Julian Le Grand ◽  
Bill New

This chapter examines the politics of paternalism. It first considers the question of whether the government can do better than the individual, outlining a set of justifications for government paternalism and showing how the state can intervene to improve the well-being of its citizens. It then discusses possible ways in which the government could be held to account to ensure that, in its paternalistic interventions aimed at improving its citizens' well-being, it does actually pursue the “right” agenda. It argues that the government can indeed raise the well-being of individuals who suffer from reasoning failure, even when allowance is made for possible reasoning failure among those individuals who constitute the government. However, democratic mechanisms must be put in place to ensure that the latter do not pursue their own agenda and turn the paternalistic state into an instrument of authoritarianism.


2013 ◽  
Vol 457-458 ◽  
pp. 1064-1068
Author(s):  
Dan Li ◽  
Xin Bao Li

The K-means algorithm is a popular method in cluster analysis, and it is mostly based on the Euclidean distance. In this paper, a modified version of the K-means algorithm based on the shape similarity distance (SSD-K-means) is presented. The shape similarity distance is a non-metric distance measure for similarity estimation based on the characteristic of differences. To demonstrate the effectiveness of the proposed method, the new algorithm was tested on three shape datasets. Experimental results show that SSD-K-means performs better than the classical K-means algorithm based on the traditional Euclidean and Manhattan distances.


2021 ◽  
Vol 12 (1) ◽  
pp. 69
Author(s):  
Lu Wei ◽  
Zheng Qian ◽  
Yan Pei ◽  
Jingyue Wang

Wind farm operators are overwhelmed by a large number of supervisory control and data acquisition (SCADA) alarms when faults occur. This paper presents an online root fault identification method for SCADA alarms to assist operators in wind turbine fault diagnosis. The proposed method is based on the similarity analysis between an unknown alarm vector and the feature vectors of known faults. The alarm vector is obtained from segmented alarm lists, which are filtered and simplified. The feature vector, which is a unique signature representing the occurrence of a fault, is extracted from the alarm lists belonging to the same fault. To mine the coupling correspondence between alarms and faults, we define the weights of the alarms in each fault. Similarity is measured by the weighted Euclidean distance and the weighted Hamming distance, respectively. One year of SCADA alarms and maintenance records is used to verify the proposed method. The results show that the weighted Hamming distance performs better than the weighted Euclidean distance; 84.1% of alarm lists are labeled with the right root fault.
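
The matching step described here can be sketched as a nearest-signature lookup. The example below assumes binary alarm vectors and one weight per alarm per fault; the fault names, feature vectors, weights, and the exact form of the two weighted distances are hypothetical stand-ins, since the abstract does not give the paper's definitions.

```python
import math

def weighted_hamming(alarm, feature, weights):
    """Sum of the weights of the positions where the two vectors disagree."""
    return sum(w for a, f, w in zip(alarm, feature, weights) if a != f)

def weighted_euclidean(alarm, feature, weights):
    """Square root of the weighted sum of squared differences."""
    return math.sqrt(sum(w * (a - f) ** 2 for a, f, w in zip(alarm, feature, weights)))

def nearest_fault(alarm, faults, dist):
    """Return the known fault whose feature vector is closest to the alarm vector."""
    return min(faults, key=lambda name: dist(alarm, *faults[name]))

# Hypothetical fault signatures: name -> (feature vector, per-alarm weights).
faults = {
    "gearbox_overheat": ([1, 1, 0, 0], [0.5, 0.3, 0.1, 0.1]),
    "pitch_failure": ([0, 0, 1, 1], [0.1, 0.1, 0.4, 0.4]),
}

# An unknown alarm vector: only the first alarm fired.
alarm = [1, 0, 0, 0]

by_hamming = nearest_fault(alarm, faults, weighted_hamming)
by_euclidean = nearest_fault(alarm, faults, weighted_euclidean)
print(by_hamming, by_euclidean)
```

On binary vectors the two measures rank candidates in the same order when they share the same weights, so any difference in practice would come from how the weights and vectors are built.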


2020 ◽  
Vol 28 (1) ◽  
pp. 51-63 ◽  
Author(s):  
Rodrigo Naranjo ◽  
Matilde Santos ◽  
Luis Garmendia

A new method to measure the distance between fuzzy singletons (FSNs) is presented. It first fuzzifies a crisp number to a generalized trapezoidal fuzzy number (GTFN) using the Mamdani fuzzification method. It then treats an FSN as an impulse signal and transforms the FSN into a new GTFN by convolving it with the original GTFN. In so doing, an existing distance measure for GTFNs can be used to measure the distance between FSNs. It is shown that the new measure behaves more desirably than the Euclidean and weighted distance measures in the following sense: under the new measure, the distance between two FSNs is larger when they are in different GTFNs, and smaller when they are in the same GTFN. The advantage of the new measure is demonstrated on a fuzzy forecasting trading system over two different real stock markets, where it provides better predictions with larger profits than those obtained using the Euclidean distance measure for the same system.


2016 ◽  
Vol 8 (2) ◽  
pp. 23
Author(s):  
Songul Cinaroglu

Out-of-pocket health expenditures are the payments made by households at the point they receive health services. Frequently these include doctor consultation fees, purchase of medication and hospital bills. In this study, a hierarchical clustering method was used to classify the 34 member countries of the OECD (Organization for Economic Cooperation and Development) in terms of out-of-pocket health expenditures for the years 1995-2011. Longest common subsequences (LCS), the correlation coefficient and the Euclidean distance measure were used as measures of similarity and distance in hierarchical clustering. The analysis found that the LCS and Euclidean distance measures were the best for determining clusters. Furthermore, the study results helped in understanding how OECD countries group according to health expenditures.
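
To make the three measures concrete, the sketch below computes each of them for two short expenditure series. The series are hypothetical, and applying LCS to up/down trend strings is an assumption made for illustration; the abstract does not say how the series were encoded.

```python
import math

def lcs_length(a, b):
    """Length of the longest common subsequence (classic dynamic programme)."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]

def trend(series):
    """Encode a series as 'U' (rise) or 'D' (no rise) per step; an assumed encoding."""
    return "".join("U" if later > earlier else "D"
                   for earlier, later in zip(series, series[1:]))

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length series."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical yearly out-of-pocket expenditure shares for two countries.
country_a = [10, 12, 11, 13, 15]
country_b = [20, 23, 22, 25, 24]

lcs = lcs_length(trend(country_a), trend(country_b))
corr = pearson(country_a, country_b)
eucl = math.dist(country_a, country_b)
print(lcs, round(corr, 3), round(eucl, 3))
```

LCS and correlation reward series that move together regardless of level, whereas Euclidean distance also penalises a constant offset between the two countries.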


2019 ◽  
Vol 2019 ◽  
pp. 1-20 ◽  
Author(s):  
Semra Erpolat Taşabat

Decision-making, briefly defined as choosing the best among the possible alternatives within the possibilities and conditions available, is a far more comprehensive process than an instantaneous choice. The decision-making process often involves many criteria as well as many alternatives. In this case, methods referred to as Multicriteria Decision-Making (MCDM) are applied. The main purpose of these methods is to facilitate the decision-maker's job and to guide the decision-maker toward the right decisions when there are too many options. Making effective and useful decisions when there are many criteria was first addressed at the beginning of the 1960s and has been supported by ongoing work since. A variety of methods have been developed for this purpose, and some of them are based on distance measures. The best-known distance-based method in the literature is the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS). In this study, a new MCDM method that uses distance, similarity, and correlation measures is proposed. The new method is called DSC TOPSIS for short, prefixing the TOPSIS name with the initials of distance, similarity, and correlation. In the method, Euclidean was used as the distance measure, cosine as the similarity measure, and Pearson correlation as the relation measure. Using the positive-ideal and negative-ideal values obtained from these measures, respectively, a common positive-ideal value and a common negative-ideal value were obtained. DSC TOPSIS is then discussed in terms of standardization and weighting. The study also proposes three new ranking indexes that differ from the ranking index used in the traditional TOPSIS method. The proposed method was tested on variables that indicate the development levels of countries, which are of great importance today. The results obtained were compared with the Human Development Index (HDI) value developed by the United Nations.
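
For orientation, here is a minimal sketch of the traditional, Euclidean-only TOPSIS that the proposed method extends. It is not DSC TOPSIS, whose combination rules are not given in the abstract; the decision matrix, weights, and the `benefit` flags below are hypothetical.

```python
import math

def topsis(matrix, weights, benefit):
    """Classic TOPSIS closeness scores (higher is better).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True if larger is better for that criterion, False for a cost
    """
    n_crit = len(weights)
    # Step 1: vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Step 2: positive-ideal and negative-ideal points.
    columns = list(zip(*v))
    positive = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    negative = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # Step 3: Euclidean distance to both ideals and the relative closeness.
    scores = []
    for row in v:
        d_pos = math.dist(row, positive)
        d_neg = math.dist(row, negative)
        scores.append(d_neg / (d_pos + d_neg))
    return scores

# Hypothetical development indicators for three countries (both are benefits).
matrix = [[9, 9], [5, 5], [1, 1]]
scores = topsis(matrix, weights=[0.5, 0.5], benefit=[True, True])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

A similarity-based or correlation-based variant would replace `math.dist` in step 3 with another measure, which is the kind of substitution the abstract describes.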


Symmetry ◽  
2019 ◽  
Vol 11 (6) ◽  
pp. 753
Author(s):  
Wenyuan Zhang ◽  
Xijuan Guo ◽  
Tianyu Huang ◽  
Jiale Liu ◽  
Jun Chen

Spatially constrained Fuzzy C-means clustering (FCM) is an effective algorithm for image segmentation. Its background information improves the insensitivity to noise to some extent. In addition, the membership degree of Euclidean distance is not suitable for revealing the non-Euclidean structure of input data, since it still lacks enough robustness to noise and outliers. To overcome these problems, this paper proposes a new kernel-based algorithm built on a kernel-induced distance measure, which we call the Kernel-based Robust Bias-correction Fuzzy Weighted C-ordered-means Clustering Algorithm (KBFWCM). In constructing the objective function, the KBFWCM algorithm comprehensively takes into account that the spatially constrained FCM clustering algorithm is insensitive to image noise and involves highly intensive computation. Aiming at the insensitivity of the spatially constrained FCM clustering algorithm to noise and its image detail processing, the KBFWCM algorithm combines fuzzy local similarity measures (space and grayscale) with the typicality of data attributes. Aiming at the poor robustness of the original algorithm to noise and outliers and its highly intensive computation, a kernel-based clustering method that includes a class of robust non-Euclidean distance measures is proposed in this paper. The experimental results show that the KBFWCM algorithm has a stronger denoising effect and greater robustness on noisy images.
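
One common kernel-induced distance illustrates why such measures are more robust to outliers. The sketch assumes a Gaussian kernel, which the abstract does not confirm. In that case the squared distance in feature space is K(x,x) - 2K(x,y) + K(y,y) = 2(1 - K(x,y)), which is bounded by 2, whereas the squared Euclidean distance grows without limit.

```python
import math

def squared_euclidean(x, y):
    """Ordinary squared Euclidean distance; unbounded for far-away points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel; sigma is an assumed bandwidth."""
    return math.exp(-squared_euclidean(x, y) / (2.0 * sigma ** 2))

def kernel_distance_sq(x, y, sigma=1.0):
    """Squared feature-space distance induced by the Gaussian kernel: 2 * (1 - K)."""
    return 2.0 * (1.0 - gaussian_kernel(x, y, sigma))

# Hypothetical cluster centre, a nearby pixel feature, and an outlier.
center, inlier, outlier = (0.0, 0.0), (1.0, 0.0), (10.0, 0.0)

for name, point in (("inlier", inlier), ("outlier", outlier)):
    print(name,
          "euclidean^2 =", squared_euclidean(center, point),
          "kernel^2 =", round(kernel_distance_sq(center, point), 4))
```

Because the outlier's kernel-induced distance saturates near 2, it pulls a cluster centre far less than it would under the squared Euclidean distance.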


2018 ◽  
Vol 2018 ◽  
pp. 1-7 ◽  
Author(s):  
Jingqin Lv ◽  
Jiangxiong Fang

In computer vision, Euclidean distance is generally used to measure the distance between two colors, and how to deal with illumination change remains an important research topic. However, our evaluation results demonstrate that Euclidean distance does not perform well under illumination change. Since human eyes can recognize similar or irrelevant colors under illumination change, a novel color distance model based on visual recognition is proposed. First, we find that colors are distributed in a complex way in color spaces. We propose to divide the HSV space into three less complex subspaces and study their specific distance models. Then a novel hue distance is modeled based on visual recognition, and the chromatic distance model is proposed in line with our visual color distance principles. Finally, the gray distance model and the dark distance model are studied according to the natures of their respective subspaces. Experimental results show that the proposed model outperforms Euclidean distance and related methods and provides a good distance measure under illumination change. In addition, the proposed model performs well in matching patches of pedestrian images. The proposed model can be applied to image segmentation, pedestrian reidentification, visual tracking, and patch- or superpixel-based tasks.


2019 ◽  
Vol 2019 ◽  
pp. 1-21 ◽  
Author(s):  
Cong Liu ◽  
Qianqian Chen ◽  
Yingxia Chen ◽  
Jie Liu

Most existing clustering algorithms are based on the Euclidean distance measure. However, using only the Euclidean distance measure may not be sufficient to partition a dataset with different structures. Thus, it is necessary to combine multiple distance measures in clustering. However, the weights for different distance measures are hard to set. Accordingly, it appears natural to keep multiple distance measures separate and to optimize them simultaneously by applying a multiobjective optimization technique. Recently, a new clustering algorithm called ‘multiobjective evolutionary clustering based on combining multiple distance measures’ (MOECDM) was proposed to integrate Euclidean and Path distance measures for partitioning datasets with different structures. However, it is time-consuming due to its large-sized genes. This paper proposes a fast multiobjective fuzzy clustering algorithm for partitioning datasets with different structures. In this algorithm, a real encoding scheme is adopted to represent the individual. Two fuzzy clustering objective functions are designed based on Euclidean and Path distance measures, respectively, to evaluate the goodness of each individual. An improved evolutionary operator is also introduced to increase the convergence speed and the diversity of the population. In the final generation, a set of nondominated solutions is obtained. The best solution and the best distance measure are selected using a semisupervised method. Afterwards, an updated algorithm is designed to detect the optimal cluster number automatically. The proposed algorithms are applied to many datasets with different structures, and the results for eight artificial and six real-life datasets are shown in the experiments. Experimental results show that the proposed algorithms can not only successfully partition datasets with different structures, but also reduce the computational cost.
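
The "set of nondominated solutions" mentioned in this abstract can be illustrated with a small Pareto filter. The sketch assumes both clustering objectives are minimized and uses hypothetical objective values; it shows only the dominance test, not the evolutionary algorithm itself.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def nondominated(solutions):
    """Keep the solutions that no other solution dominates (the Pareto front)."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]

# Hypothetical (Euclidean-based objective, Path-based objective) values,
# one tuple per candidate clustering; smaller is better for both.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]

front = nondominated(candidates)
print(front)
```

Each solution on the front represents a different trade-off between the two distance-based objectives, which is why a separate selection step is needed to pick a single final clustering.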

