scholarly journals An Empirical Perusal of Distance Measures for Clustering with Big Data Mining

The distance measure is the core idea of data mining techniques such as classification, clustering, and statistical analysis and so on. All clustering taxonomies such as partition, hierarchical, density, grid, model, fuzzy and graphs used to distance measures for the data point’s categorization under difference cluster, cluster construction and validation. Big data mining is the advanced concept of data mining respect to the big data dimensions. When traditional clustering algorithm is used under the big data mining the distance measure is needed for scalable under big data mining and support to a huge size dataset, heterogeneous data and sources, and velocity characteristics of the big data. From a theoretically, practically and the existing research perspective, the paper focuses on volume, variety, and velocity big data criterion for identifying a distance measure for the big data mining and recognize how to distance measure works under clustering taxonomy. This study also analyzed all distance measures accuracy with the help of a confusion matrix through clustering.

2021 ◽  
Vol 14 (2) ◽  
pp. 26
Author(s):  
Na Li ◽  
Lianguan Huang ◽  
Yanling Li ◽  
Meng Sun

In recent years, with the development of the Internet, the data on the network presents an outbreak trend. Big data mining aims at obtaining useful information through data processing, such as clustering, clarifying and so on. Clustering is an important branch of big data mining and it is popular because of its simplicity. A new trend for clients who lack of storage and computational resources is to outsource the data and clustering task to the public cloud platforms. However, as datasets used for clustering may contain some sensitive information (e.g., identity information, health information), simply outsourcing them to the cloud platforms can't protect the privacy. So clients tend to encrypt their databases before uploading to the cloud for clustering. In this paper, we focus on privacy protection and efficiency promotion with respect to k-means clustering, and we propose a new privacy-preserving multi-user outsourced k-means clustering algorithm which is based on locality sensitive hashing (LSH). In this algorithm, we use a Paillier cryptosystem encrypting databases, and combine LSH to prune off some unnecessary computations during the clustering. That is, we don't need to compute the Euclidean distances between each data record and each clustering center. Finally, the theoretical and experimental results show that our algorithm is more efficient than most existing privacy-preserving k-means clustering.


2015 ◽  
Vol 7 (1) ◽  
pp. 17-32
Author(s):  
J.S. Shyam Mohan ◽  
P. Shanmugapriya ◽  
Bhamidipati Vinay Pawan Kumar

Abstract Finding out the widely used URL’s from online shopping sites for any particular category is a difficult task as there are many heterogeneous and multi-dimensional data set which depends on various factors. Traditional data mining methods are limited to homogenous data source, so they fail to sufficiently consider the characteristics of heterogeneous data. This paper presents a consistent Big Data mining search which performs analytics on text data to find the top rated URL’s. Though many heuristic search methods are available, our proposed method solves the problem of searching compared with traditional methods in data mining. The sample results are obtained in optimal time and are compared with other methods which is effective and efficient.


Author(s):  
Feng Ye ◽  
Zhi-Jian Wang ◽  
Fa-Chao Zhou ◽  
Ya-Pu Wang ◽  
Yuan-Chao Zhou
Keyword(s):  
Big Data ◽  

Sign in / Sign up

Export Citation Format

Share Document