Improvement of the Fast Clustering Algorithm Improved by K-Means in the Big Data

Abstract: Clustering, a fundamental unsupervised learning task, is an important method of data analysis, and K-means is demonstrably the most popular clustering algorithm. In this paper, we consider clustering in feature space to address the low efficiency of K-means on Big Data. Unlike traditional methods, the algorithm guarantees consistent clustering accuracy before and after dimensionality reduction, accelerates K-means when the cluster centers and distance functions satisfy certain conditions, fully matches the preprocessing step with the clustering step, and improves both efficiency and accuracy. Experimental results demonstrate the effectiveness of the proposed algorithm.
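As a hedged illustration of how clustering results can stay consistent after dimensionality reduction, the derivation below shows one standard sufficient condition: the data lie in a subspace with an orthonormal basis. The symbols U, z_x and z_c are introduced here for illustration only, and whether this is the condition used in the paper is an assumption.

```latex
% Assumption: U \in \mathbb{R}^{d \times r} has orthonormal columns, U^\top U = I_r,
% and both the point x and the center c lie in the column space of U.
\begin{align*}
x &= U z_x, \qquad c = U z_c, \qquad z_x, z_c \in \mathbb{R}^{r}, \\
\lVert x - c \rVert_2^2
  &= (z_x - z_c)^\top U^\top U \,(z_x - z_c)
   = \lVert z_x - z_c \rVert_2^2 .
\end{align*}
% A mean of points in the column space of U stays in that space:
% \frac{1}{n}\sum_i U z_i = U \Big( \frac{1}{n}\sum_i z_i \Big).
```

Under this assumption every point-to-center distance is the same in the original d-dimensional space and in the r-dimensional coordinates, so K-means started from matching centers produces the same assignments in both, while each distance costs O(r) instead of O(d).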

A Parallel Clustering Algorithm for Power Big Data Analysis

Communications in Computer and Information Science - Parallel Architecture, Algorithm and Programming ◽

10.1007/978-981-10-6442-5_51 ◽

2017 ◽

pp. 533-540

Author(s):

Xiangjun Meng ◽

Liang Chen ◽

Yidong Li

Keyword(s):

Big Data ◽

Data Analysis ◽

Clustering Algorithm ◽

Big Data Analysis ◽

Parallel Clustering

A Research Roadmap of Big Data Clustering Algorithms for Future Internet of Things

International Journal of Organizational and Collective Intelligence ◽

10.4018/ijoci.2019040102 ◽

2019 ◽

Vol 9 (2) ◽

pp. 16-30 ◽

Cited By ~ 1

Author(s):

Hind Bangui ◽

Mouzhi Ge ◽

Barbora Buhnova

Keyword(s):

Big Data ◽

Internet Of Things ◽

Mobile Networks ◽

Data Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Future Internet ◽

Research Challenges ◽

Initial Stage ◽

Big Data Technologies

Due to the massive increase of data in different Internet of Things (IoT) domains, such as healthcare IoT and Smart City IoT, Big Data technologies have emerged as critical tools for analyzing IoT data. Among these technologies, data clustering is one of the essential approaches to processing IoT data. However, how to select a suitable clustering algorithm for IoT data is still unclear. Furthermore, since Big Data technologies are still in their initial stage in different IoT domains, it is valuable to propose and structure the research challenges between Big Data and IoT. This article therefore starts by reviewing and comparing the data clustering algorithms that can be applied to IoT datasets, and then extends the discussion to a broader IoT context, such as IoT dynamics and IoT mobile networks. Finally, the article identifies a set of research challenges that yield a research roadmap for Big Data research in IoT domains. The proposed roadmap aims to bridge the research gaps between Big Data and various IoT contexts.

A Novel on Transmission Line Tower Big Data Analysis Model Using Altered K-means and ADQL

Sustainability ◽

10.3390/su11133499 ◽

2019 ◽

Vol 11 (13) ◽

pp. 3499 ◽

Cited By ~ 5

Author(s):

Se-Hoon Jung ◽

Jun-Ho Huh

Keyword(s):

Big Data ◽

Data Analysis ◽

Transmission Line ◽

Clustering Algorithm ◽

Learning Algorithm ◽

Principal Component ◽

Big Data Analysis ◽

Standard Normal Distribution ◽

Analysis Model ◽

Q Learning

This study proposes a big data analysis and prediction model for transmission line tower outliers, based on deep reinforcement learning, to assess when something is wrong in transmission line tower big data. The model automatically chooses the number of clusters K from unlabeled sensor big data. It also measures the distance of action between data inside a cluster, with the Q-value representing the network output, in the altered transmission line tower big data clustering algorithm containing transmission line tower outliers and the old Deep Q Network. Specifically, this study performed principal component analysis to categorize transmission line tower data and proposed an automatic approach to choosing initial central points through the standard normal distribution. It also proposed the A-Deep Q-Learning algorithm, altered from the deep Q-Learning algorithm, to explore policies based on the experience of learning from clustered data. It can be used to learn transmission line tower outlier data based on the distance of data within a cluster. The performance evaluation shows that the proposed model achieved an approximately 2.29% to 4.19% higher prediction rate and around 0.8% to 4.3% higher accuracy than the old transmission line tower big data analysis model.

The fast clustering algorithm for the big data based on K-means

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691320500538 ◽

2020 ◽

Vol 18 (06) ◽

pp. 2050053

Author(s):

Ting Xie ◽

Taiping Zhang

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Feature Space ◽

Data Sets ◽

Benchmark Data ◽

Clustering Model ◽

Alternating Direction ◽

Learning Technique ◽

Noise Data

As a powerful unsupervised learning technique, clustering is a fundamental task of big data analysis. However, many traditional clustering algorithms perform poorly, in both computational efficiency and clustering accuracy, on big data that is high-dimensional, sparse, and noisy. To alleviate these problems, this paper presents a Feature K-means clustering model on the feature space of big data and introduces a fast algorithm for it based on the Alternating Direction Method of Multipliers (ADMM). We show the equivalence of the Feature K-means model in the original space and the feature space and prove the convergence of its iterative algorithm. Computationally, we compare Feature K-means with Spherical K-means and Kernel K-means on several benchmark data sets, including artificial data and four face databases. Experiments show that the proposed approach is comparable to state-of-the-art algorithms in big data clustering.
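For readers unfamiliar with the Spherical K-means baseline named in this abstract, the following is a minimal textbook sketch in plain Python. It is not the paper's Feature K-means or its ADMM solver; the function names `normalize` and `spherical_kmeans` and the toy data are illustrative assumptions.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def spherical_kmeans(points, init_centers, max_iter=100):
    """Textbook spherical k-means: cluster unit vectors by cosine similarity.

    Hypothetical helper for illustration only.
    """
    unit_points = [normalize(p) for p in points]
    centers = [normalize(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: pick the center with the largest dot product,
        # which equals cosine similarity because all vectors have unit length.
        new_labels = [
            max(range(len(centers)),
                key=lambda c: sum(a * b for a, b in zip(p, centers[c])))
            for p in unit_points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center becomes the normalized sum of its members.
        for c in range(len(centers)):
            members = [p for p, l in zip(unit_points, labels) if l == c]
            if members:
                centers[c] = normalize([sum(col) for col in zip(*members)])
    return labels, centers


points = [(1.0, 0.0), (2.0, 0.1), (0.0, 1.0), (0.1, 3.0)]
labels, centers = spherical_kmeans(points, [(1.0, 0.0), (0.0, 1.0)])
```

Because only directions matter, the points (1.0, 0.0) and (2.0, 0.1) fall in the same cluster even though their lengths differ, which is why this variant is often used for sparse, high-dimensional data.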

A Succinct Distributive Big Data Clustering Algorithm Based on Local-Remote Coordination

2015 IEEE International Conference on Systems, Man, and Cybernetics ◽

10.1109/smc.2015.322 ◽

2015 ◽

Author(s):

Chao Ma ◽

Xun Liang ◽

Yuefeng Ma

Keyword(s):

Big Data ◽

Data Clustering ◽

Clustering Algorithm

Geomarketing Big Data Analysis

INFORMACIONNYE TEHNOLOGII ◽

10.17587/it.27.180-187 ◽

2021 ◽

Vol 27 (4) ◽

pp. 180-187

Author(s):

S. V. Shaytura ◽

D. A. Galkin

Keyword(s):

Big Data ◽

Data Analysis ◽

Data Clustering ◽

Big Data Analysis ◽

Geospatial Data ◽

Shopping Center ◽

Housing Cost ◽

Cost Assessment ◽

Bank Branch ◽

New Approaches

The accumulation of a large amount of geospatial data requires new approaches to their processing and visualization. One of these approaches is the creation of a geomarketing system with a fundamentally new toolkit based on data clustering. The capabilities of such a system are shown using examples of housing cost assessment, determining the location of a new shopping center, a bank branch and a clinic.

Nonlinear Data Analysis Using a New Hybrid Data Clustering Algorithm

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-642-01307-2_17 ◽

2009 ◽

pp. 160-171 ◽

Cited By ~ 2

Author(s):

Ureerat Wattanachon ◽

Jakkarin Suksawatchon ◽

Chidchanok Lursinsap

Keyword(s):

Data Analysis ◽

Data Clustering ◽

Clustering Algorithm ◽

Hybrid Data ◽

Nonlinear Data Analysis

Big Data Clustering Algorithm Based on Computer Cloud Platform

10.1007/978-3-030-89511-2_32 ◽

2021 ◽

pp. 254-262

Author(s):

Xiaoyun Gong

Keyword(s):

Big Data ◽

Data Clustering ◽

Clustering Algorithm ◽

Cloud Platform

Big Data Clustering Analysis Algorithm for Internet of Things Based on K-Means

International Journal of Distributed Systems and Technologies ◽

10.4018/ijdst.2019010101 ◽

2019 ◽

Vol 10 (1) ◽

pp. 1-12 ◽

Cited By ~ 2

Author(s):

Zhanqiu Yu

Keyword(s):

Big Data ◽

Internet Of Things ◽

Clustering Analysis ◽

Data Clustering ◽

Clustering Algorithm ◽

Prototype System ◽

Point Selection ◽

Logistics System ◽

Relational Schema ◽

Analysis Algorithm

To explore applications in Internet of Things (IoT) logistics systems, this article discusses an IoT big data clustering analysis algorithm based on K-means. First, using complex event relations and processing technology, IoT big data processing is transformed into the extraction and analysis of complex relational schemas, which reduces the processing complexity of IoT big data. The traditional K-means algorithm is then optimized to fit the demands of big data in RFID data networks, and a K-means cluster analysis is implemented on a Hadoop cloud cluster platform. In addition, building on the traditional clustering algorithm, a center point selection technique suited to RFID IoT data clustering is chosen. The results show that clustering efficiency improves to some extent. Finally, an RFID IoT clustering analysis prototype system is designed and implemented, which further tests the feasibility of the approach.

Big Data Clustering with Kernel k-Means: Resources, Time and Performance

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213018600060 ◽

2018 ◽

Vol 27 (04) ◽

pp. 1860006

Author(s):

Nikolaos Tsapanos ◽

Anastasios Tefas ◽

Nikolaos Nikolaidis ◽

Ioannis Pitas

Keyword(s):

Big Data ◽

Data Clustering ◽

Clustering Algorithm ◽

Learning Task ◽

Related Data ◽

Clustering Problem ◽

Processing Power ◽

Trade Offs ◽

Separable Kernel ◽

And Performance

Data clustering is an unsupervised learning task that has found many applications in various scientific fields. The goal is to find subgroups of closely related data samples (clusters) in a set of unlabeled data. A classic clustering algorithm is k-Means. It is very popular, but it cannot handle cases in which the clusters are not linearly separable. Kernel k-Means is a state-of-the-art clustering algorithm that employs the kernel trick to perform clustering in a higher-dimensional space, thus overcoming the limitations of classic k-Means regarding the non-linear separability of the input data. Big Data research, a field that has established itself in the last few years, involves performing tasks on extremely large amounts of data. To meet its challenges, several adaptations of Kernel k-Means have been proposed, each with different requirements in processing power and running time and with different trade-offs in performance. In this paper, we present several issues and techniques involving the use of Kernel k-Means for Big Data clustering and examine how each combination of components in a clustering framework fares in terms of resources, time and performance. We use experimental results to evaluate several combinations and provide a recommendation on how to approach a Big Data clustering problem.
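The kernel trick described in this abstract can be sketched in a few lines of plain Python. This is a minimal textbook version of Kernel k-Means with a Gaussian (RBF) kernel, not any of the Big Data adaptations evaluated in the paper; the function names, the `gamma` value and the toy data are illustrative assumptions.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_kmeans(points, init_labels, n_clusters, gamma=1.0, max_iter=50):
    """Refine an initial labelling with textbook kernel k-means.

    Distances to cluster means in the implicit feature space use only kernel
    values: ||phi(x) - m_c||^2 = K(x, x) - (2 / |c|) * sum_j K(x, j)
                                 + (1 / |c|^2) * sum_{j, l} K(j, l),
    where j and l range over the members of cluster c.
    """
    n = len(points)
    # Full n x n kernel matrix: the main memory cost on large data sets.
    K = [[rbf_kernel(points[i], points[j], gamma) for j in range(n)]
         for i in range(n)]
    labels = list(init_labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c]
                   for c in range(n_clusters)]
        # The third term depends only on the cluster, so compute it once.
        within = [
            sum(K[j][l] for j in m for l in m) / (len(m) ** 2) if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
    return labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
# Start from a deliberately mixed labelling and let the iteration repair it.
labels = kernel_kmeans(points, [0, 1, 0, 1, 0, 1], n_clusters=2, gamma=0.01)
```

The sketch stores the full kernel matrix, which grows quadratically with the number of samples; that cost is the kind of resource constraint that motivates the adaptations the abstract refers to. The result also depends on the initial labelling and on `gamma`, so a poor choice can leave the iteration stuck in a bad local optimum.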
