An efficient distance estimation and centroid selection based on k-means clustering for small and large dataset

Author(s):  
Girdhar Gopal Ladha ◽  
Ravi Kumar Singh Pippal

This paper presents an efficient distance estimation and centroid selection approach based on k-means clustering for small and large datasets. The dataset was first pre-processed; the PIMA Indian diabetes dataset was used for the complete study and analysis. After pre-processing, distance and centroid estimation was performed: initial centroids were selected at random and then updated until the specified number of iterations (epochs) was reached. The distance measures used are Euclidean distance (Ed), Pearson coefficient distance (PCd), Chebyshev distance (Csd) and Canberra distance (Cad). The results indicate that all the distance measures performed reasonably well for clustering, but Cad outperformed the others in terms of computation time.
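As a minimal sketch of the four distance measures named in the abstract, the functions below use their textbook definitions. The abstract does not give formulas, so the function names, the choice of 1 - r for the Pearson coefficient distance, and the handling of zero denominators in the Canberra distance are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    """Sum of |x - y| / (|x| + |y|); terms with a zero denominator are skipped (assumed convention)."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total


def pearson_distance(a, b):
    """1 - Pearson correlation coefficient (assumed form of the Pearson coefficient distance)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("Pearson correlation is undefined for a constant vector")
    return 1.0 - cov / math.sqrt(var_a * var_b)
```

Euclidean, Chebyshev and Canberra distances each need a single pass over the coordinates, while the Pearson form needs the means first, so relative timings will depend on the implementation as much as on the measure.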

2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Shumpei Haginoya ◽  
Aiko Hanayama ◽  
Tamae Koike

Purpose: The purpose of this paper was to compare the accuracy of linking crimes using geographical proximity between three distance measures: Euclidean (distance measured by the length of a straight line between two locations), Manhattan (distance obtained by summing north-south distance and east-west distance) and the shortest route distances.
Design/methodology/approach: A total of 194 cases committed by 97 serial residential burglars in Aomori Prefecture in Japan between 2004 and 2015 were used in the present study. The Mann–Whitney U test was used to compare linked (two offenses committed by the same offender) and unlinked (two offenses committed by different offenders) pairs for each distance measure. Discrimination accuracy between linked and unlinked crime pairs was evaluated using area under the receiver operating characteristic curve (AUC).
Findings: The Mann–Whitney U test showed that the distances of the linked pairs were significantly shorter than those of the unlinked pairs for all distance measures. Comparison of the AUCs showed that the shortest route distance achieved significantly higher accuracy compared with the Euclidean distance, whereas there was no significant difference between the Euclidean and the Manhattan distance or between the Manhattan and the shortest route distance. These findings give partial support to the idea that distance measures taking the impact of environmental factors into consideration might be able to identify a crime series more accurately than Euclidean distances.
Research limitations/implications: Although the results suggested a difference between the Euclidean and the shortest route distance, it was small, and all distance measures resulted in outstanding AUC values, probably because of ceiling effects. Further investigation that makes the same comparison in a narrower area is needed to avoid this potential inflation of discrimination accuracy.
Practical implications: The shortest route distance might contribute to improving the accuracy of crime linkage based on geographical proximity. However, further investigation is needed before the shortest route distance can be recommended for use in practice. Given that the targeted area in the present study was relatively large, the findings may contribute especially to improving the accuracy of proactive comparative case analysis for estimating the whole picture of the distribution of serial crimes in the region by selecting a more effective distance measure.
Social implications: Improving the accuracy of linking crimes may contribute to assisting crime investigations and the earlier arrest of offenders.
Originality/value: The results of the present study provide an initial indication of the efficacy of using distance measures that take environmental factors into account.
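The comparison described in this abstract can be sketched as follows, assuming locations are given as planar (x, y) coordinates. The AUC is computed as the proportion of linked/unlinked pairs in which the linked distance is the shorter one, which is the Mann–Whitney U statistic divided by the number of pairs. The function names are hypothetical, and the shortest route distance is omitted because it requires road-network data.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two (x, y) locations."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p, q):
    """Sum of the east-west and north-south distances."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def auc_shorter_is_linked(linked, unlinked):
    """Probability that a linked pair's distance is shorter than an unlinked pair's.

    Ties count as one half, so the value equals U / (n_linked * n_unlinked).
    """
    wins = 0.0
    for d_linked in linked:
        for d_unlinked in unlinked:
            if d_linked < d_unlinked:
                wins += 1.0
            elif d_linked == d_unlinked:
                wins += 0.5
    return wins / (len(linked) * len(unlinked))
```

An AUC of 0.5 means the distance measure does not separate linked from unlinked pairs at all, while values near 1.0 leave little room for one measure to outperform another, which is the ceiling effect the abstract mentions.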


2020 ◽  
Vol 28 (1) ◽  
pp. 51-63 ◽  
Author(s):  
Rodrigo Naranjo ◽  
Matilde Santos ◽  
Luis Garmendia

A new method to measure the distance between fuzzy singletons (FSNs) is presented. It first fuzzifies a crisp number to a generalized trapezoidal fuzzy number (GTFN) using the Mamdani fuzzification method. It then treats an FSN as an impulse signal and transforms the FSN into a new GTFN by convoluting it with the original GTFN. In so doing, an existing distance measure for GTFNs can be used to measure distance between FSNs. It is shown that the new measure offers a desirable behavior over the Euclidean and weighted distance measures in the following sense: Under the new measure, the distance between two FSNs is larger when they are in different GTFNs, and smaller when they are in the same GTFN. The advantage of the new measure is demonstrated on a fuzzy forecasting trading system over two different real stock markets, which provides better predictions with larger profits than those obtained using the Euclidean distance measure for the same system.


2016 ◽  
Vol 8 (2) ◽  
pp. 23
Author(s):  
Songul Cinaroglu

Out-of-pocket health expenditures are the payments made by households at the point they receive health services. Frequently these include doctor consultation fees, purchases of medication and hospital bills. In this study, a hierarchical clustering method was used to classify the 34 member countries of the OECD (Organization for Economic Cooperation and Development) in terms of out-of-pocket health expenditures for the years 1995-2011. Longest common subsequence (LCS), correlation coefficient and Euclidean distance measures were used as measures of similarity and distance in hierarchical clustering. The analysis found that the LCS and Euclidean distance measures were the best for determining clusters. Furthermore, the study results helped in understanding the grouping of OECD countries according to health expenditures.
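For readers unfamiliar with the LCS measure, the sketch below computes the length of the longest common subsequence of two discrete sequences with the standard dynamic-programming recurrence. It then turns that length into a dissimilarity that could feed a hierarchical clustering routine. The abstract does not say how the expenditure series were discretized or how LCS was normalized, so the `lcs_distance` normalization here is an assumption.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the best subsequence that ends before both symbols.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one symbol from either sequence.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a, b):
    """Dissimilarity in [0, 1]: 1 - LCS / length of the shorter sequence (assumed normalization)."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0 if len(a) == len(b) else 1.0
    return 1.0 - lcs_length(a, b) / shortest
```

Because LCS compares the order of symbols and not their magnitudes, two countries with similar expenditure trends can be close under this measure even when their spending levels differ.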


The main aim of this paper is to handle centroid calculation in k-means efficiently, so that distance estimation is more accurate and better clustering results are obtained. The PIMA database is used for this purpose. Data preprocessing is performed to remove unwanted data in terms of missing values. Centroid initialization is then performed based on centroid tuning and randomization. For distance estimation, the Euclidean, Pearson coefficient, Chebyshev and Canberra measures are used. The evaluation is based on computational time analysis, with the time measured on different random sets. The approach is found to perform well in all cases, across the variations in distance measure and population.
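The procedure outlined above (random initialization, then repeated assignment and centroid update for a fixed number of epochs) can be sketched with a pluggable distance function. This is a generic k-means loop written for illustration and not the authors' implementation: the "centroid tuning" step is not described in the abstract and is left out, and the names `kmeans`, `epochs` and `seed` are assumptions.

```python
import math
import random


def euclidean(a, b):
    """Default distance; any function of two points can be passed to kmeans instead."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids, distance):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: distance(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, distance=euclidean, epochs=10, seed=0):
    """Return (centroids, labels) after a fixed number of epochs."""
    rng = random.Random(seed)
    # Random initialization: pick k distinct data points as starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(epochs):
        labels = assign(points, centroids, distance)
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Final assignment so the labels match the returned centroids.
    return centroids, assign(points, centroids, distance)
```

Note that updating centroids by the coordinate-wise mean is only guaranteed to reduce the clustering objective for squared Euclidean distance; with other measures the loop still runs but is a heuristic. Timing each run with `time.perf_counter()` around the `kmeans` call would give the kind of per-measure comparison the abstract describes.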


Author(s):  
E. VENKATESWARLU ◽  
K.SOUNDARA RAJAN

This paper presents an approach for image retrieval using multiwavelets and the HSV color space. HSV stands for hue, saturation and value, and provides a perceptual representation in accordance with human visual features. Multiwavelets offer simultaneous orthogonality, symmetry and short support. We tested 140 images in 5 different categories. The experimental results show better results in terms of retrieval accuracy and computational complexity. The performance of this approach is measured and results are shown. Euclidean distance and Canberra distance are used as similarity measures in the proposed CBIR system.


2017 ◽  
Vol 248 ◽  
pp. 11-18 ◽  
Author(s):  
Diego P.P. Mesquita ◽  
João P.P. Gomes ◽  
Amauri H. Souza Junior ◽  
Juvêncio S. Nobre

Sensors ◽  
2019 ◽  
Vol 19 (14) ◽  
pp. 3142 ◽  
Author(s):  
Sai Krishna Pathi ◽  
Andrey Kiselev ◽  
Annica Kristoffersson ◽  
Dirk Repsilber ◽  
Amy Loutfi

Estimating distances between people and robots plays a crucial role in understanding social Human–Robot Interaction (HRI) from an egocentric view. It is a key step if robots are to engage in social interactions and collaborate with people as part of human–robot teams. For distance estimation between a person and a robot, different sensors can be employed, and the number of challenges to be addressed by the distance estimation methods rises with the simplicity of the sensor technology. When estimating distances using individual images from a single camera in an egocentric position, it is often required that individuals in the scene are facing the camera, do not occlude each other, and are fairly visible so that specific facial or body features can be identified. In this paper, we propose a novel method for estimating distances between a robot and people using single images from a single egocentric camera. The method is based on previously proven 2D pose estimation, which allows partial occlusions, cluttered backgrounds, and relatively low resolution. The method estimates distance with respect to the camera based on the Euclidean distance between the ear and torso of people in the image plane. Ear and torso characteristic points have been selected based on their relatively high visibility regardless of a person's orientation and a certain degree of uniformity with regard to age and gender. Experimental validation demonstrates the effectiveness of the proposed method.
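To illustrate the geometric idea only, the sketch below measures the Euclidean distance between two 2D keypoints in pixels and converts it to a range estimate with a simple pinhole-camera relation. The abstract does not state how the authors map pixel distance to metric distance, so the pinhole model, the function names and the parameters `real_length_m` and `focal_length_px` are all assumptions.

```python
import math


def pixel_distance(p, q):
    """Euclidean distance between two (x, y) keypoints in the image plane, in pixels."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def estimate_range(ear_px, torso_px, real_length_m, focal_length_px):
    """Range estimate under an assumed pinhole model: Z = f * L / l.

    real_length_m is an assumed physical ear-to-torso length (L) and
    focal_length_px an assumed camera focal length in pixels (f).
    """
    length_px = pixel_distance(ear_px, torso_px)
    if length_px == 0:
        raise ValueError("keypoints coincide; range cannot be estimated")
    return focal_length_px * real_length_m / length_px
```

Under this model the apparent ear-to-torso length shrinks in inverse proportion to the person's distance from the camera, which is why a body segment that stays visible from most orientations is a convenient cue.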


2020 ◽  
Vol 8 (4) ◽  
Author(s):  
Juan Luis Villareal–Haro ◽  
Alonso Ramirez–Manzanares ◽  
Juan Antonio Pichardo-Corpus

Measuring differences among complex networks is a well-studied research topic. In the context of brain networks in particular, there are several proposals. Nevertheless, most of them address the problem considering unweighted networks. Here, we propose a metric based on modularity and the Jaccard index to measure differences among weighted brain-connectivity networks built from diffusion-weighted magnetic resonance data. We use a large dataset to test our metric: a synthetic Ground Truth network (GT) and a set of networks available from a tractography challenge, three sets computed from GT perturbations, and a set of classic random graphs. We compare the performance of our proposal with the most commonly used methods, such as the Euclidean distance between matrices and a kernel-based distance. Our results indicate that the proposed metric outperforms those previously published distances. More importantly, this work provides a methodology that allows differentiating diverse groups of graphs based on their differences in topological structure.
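Two of the building blocks mentioned here have simple standard definitions, sketched below: the Jaccard index of two sets (for example, the node sets of two communities) and the Euclidean distance between two weighted adjacency matrices. How the authors combine the Jaccard index with modularity is not described in the abstract, so this is not their metric, and the function names are hypothetical.

```python
import math


def jaccard_index(a, b):
    """|A ∩ B| / |A ∪ B| for two collections treated as sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def matrix_euclidean(m1, m2):
    """Euclidean (Frobenius) distance between two equally sized matrices given as nested lists."""
    return math.sqrt(
        sum(
            (x - y) ** 2
            for row1, row2 in zip(m1, m2)
            for x, y in zip(row1, row2)
        )
    )
```

The matrix distance compares edge weights entry by entry and ignores how nodes group together, whereas a Jaccard comparison of communities is sensitive to that grouping, which is the kind of topological difference the abstract emphasizes.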


Symmetry ◽  
2019 ◽  
Vol 11 (6) ◽  
pp. 753
Author(s):  
Wenyuan Zhang ◽  
Xijuan Guo ◽  
Tianyu Huang ◽  
Jiale Liu ◽  
Jun Chen

The spatially constrained Fuzzy C-means clustering (FCM) is an effective algorithm for image segmentation. Its background information improves the insensitivity to noise to some extent. In addition, the membership degree of Euclidean distance is not suitable for revealing the non-Euclidean structure of input data, since it still lacks enough robustness to noise and outliers. In order to overcome the problem above, this paper proposes a new kernel-based algorithm based on the Kernel-induced Distance Measure, which we call the Kernel-based Robust Bias-correction Fuzzy Weighted C-ordered-means Clustering Algorithm (KBFWCM). In the construction of the objective function, the KBFWCM algorithm comprehensively takes into account that the spatially constrained FCM clustering algorithm is insensitive to image noise and involves a highly intensive computation. Aiming at the insensitivity of the spatially constrained FCM clustering algorithm to noise and its image detail processing, the KBFWCM algorithm proposes a comprehensive algorithm combining fuzzy local similarity measures (space and grayscale) and the typicality of data attributes. Aiming at the poor robustness of the original algorithm to noise and outliers and its highly intensive computation, a kernel-based clustering method that includes a class of robust non-Euclidean distance measures is proposed in this paper. The experimental results show that the KBFWCM algorithm has a stronger denoising and robust effect on noisy images.
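A kernel-induced distance of the kind referred to above can be sketched for the Gaussian kernel, where the squared distance in feature space is K(x, x) + K(y, y) - 2K(x, y) = 2(1 - K(x, y)). The abstract does not say which kernel or bandwidth KBFWCM uses, so the Gaussian kernel and the `sigma` parameter below are assumptions made for illustration.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); sigma is an assumed bandwidth."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared / (2.0 * sigma ** 2))


def kernel_distance_sq(x, y, sigma=1.0):
    """Squared kernel-induced distance 2 * (1 - K(x, y)), using K(x, x) = 1 for the Gaussian kernel."""
    return 2.0 * (1.0 - gaussian_kernel(x, y, sigma))
```

Unlike the squared Euclidean distance, this quantity is bounded above by 2, so a far-away outlier cannot dominate a centroid update, which is one common argument for the robustness of kernel-induced measures.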

