parallel clustering
Recently Published Documents


TOTAL DOCUMENTS: 125 (five years: 21)

H-INDEX: 15 (five years: 1)

2021, pp. 2431-2444
Author(s): Noor S. Sagheer, Suhad A. Yousif

Conventional clustering algorithms cannot cope with the rapid growth of data generated from different sources. Parallel clustering is one of the robust solutions to this problem. Apache Hadoop is an ecosystem that stores and processes data in a distributed and parallel fashion. In this paper, a parallel model is designed to run the k-means clustering algorithm on an Apache Hadoop cluster of three connected nodes: one server (name) node and two client (data) nodes. The aim is to reduce the time needed to process a large-scale healthcare insurance dataset of 11 GB, using the machine learning algorithms provided by the Mahout framework. The experimental results show that the proposed model processes large datasets efficiently: the parallel k-means algorithm outperforms the sequential k-means algorithm in execution time, clustering the 11 GB dataset in about 1.847 hours compared with 68.567 hours for the sequential version. We therefore deduce that as the number of nodes in the parallel system increases, the computation time of the proposed algorithm decreases.
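As a minimal sketch (in Python, not the authors' Hadoop/Mahout implementation), the snippet below runs one MapReduce-style iteration of parallel k-means: worker processes stand in for the data nodes, each assigning its partition of points to the nearest centroid and emitting per-cluster partial sums, while the driver plays the role of the reduce step that recomputes the centroids. All names, sizes, and parameters here are illustrative.

```python
# One MapReduce-style k-means iteration, with worker processes emulating data nodes.
from multiprocessing import Pool
import numpy as np

def map_partition(args):
    """Map step: assign each point in a partition to its nearest centroid
    and return per-cluster partial sums and counts."""
    points, centroids = args
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    k, dim = centroids.shape
    sums, counts = np.zeros((k, dim)), np.zeros(k)
    for j in range(k):
        mask = labels == j
        sums[j] = points[mask].sum(axis=0)
        counts[j] = mask.sum()
    return sums, counts

def kmeans_iteration(partitions, centroids, workers=2):
    """Reduce step: merge the partial sums from all partitions and
    recompute the centroids (empty clusters keep their old centroid)."""
    with Pool(workers) as pool:
        results = pool.map(map_partition, [(p, centroids) for p in partitions])
    total_sums = sum(s for s, _ in results)
    total_counts = sum(c for _, c in results)
    return np.where(total_counts[:, None] > 0,
                    total_sums / np.maximum(total_counts, 1)[:, None],
                    centroids)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(10_000, 4))
    partitions = np.array_split(data, 2)                        # two emulated data nodes
    centroids = data[rng.choice(len(data), 3, replace=False)]   # k = 3
    for _ in range(10):
        centroids = kmeans_iteration(partitions, centroids)
    print(centroids)
```

In a real deployment, the map step runs where the data blocks live (the two data nodes), so only the small per-cluster sums travel over the network, which is what makes the parallel version scale to datasets such as the 11 GB one described above.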


Author(s): Rajit Nair, Amit Bhagat

Clustering is one of the main processes through which big data is analyzed, but the sheer volume of the data makes it difficult to apply: big data is typically measured in petabytes and zettabytes, and building clusters over it carries a high computational cost. In this chapter, the authors show how clustering can be performed on big data and describe the different types of clustering approaches. The challenge is to obtain the cluster assignments within an acceptable time limit. The chapter also covers possible future directions for more advanced clustering algorithms, and it discusses both single-machine clustering and multiple-machine clustering, the latter of which includes parallel clustering.
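As a hedged sketch of the single-machine versus multiple-machine distinction (not code from the chapter), the snippet below clusters a dataset once on a single "machine", then shards it across several "machines", clusters each shard locally, and lets a coordinator cluster the local centroids weighted by how many points each represents. scikit-learn's KMeans and all sizes here are purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(loc=c, scale=0.3, size=(5000, 2))
                  for c in ((0, 0), (4, 4), (0, 4))])
k = 3

# Single-machine clustering: one model sees the whole dataset.
single = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)

# Multiple-machine clustering: each "machine" clusters only its own shard.
shards = np.array_split(data, 4)
local_centroids, local_sizes = [], []
for shard in shards:
    local = KMeans(n_clusters=k, n_init=10, random_state=0).fit(shard)
    local_centroids.append(local.cluster_centers_)
    local_sizes.append(np.bincount(local.labels_, minlength=k))

# Coordinator: cluster the per-shard centroids, weighted by shard-cluster size.
merged = KMeans(n_clusters=k, n_init=10, random_state=0).fit(
    np.vstack(local_centroids),
    sample_weight=np.concatenate(local_sizes))

print(single.cluster_centers_)
print(merged.cluster_centers_)
```

The trade-off this illustrates is the usual one: the multi-machine scheme only ever moves the small per-shard centroids to the coordinator, at the cost of an approximation relative to clustering the full dataset at once.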


2020, Vol 3
Author(s): Marco Rovere, Ziheng Chen, Antonio Di Pilato, Felice Pantaleo, Chris Seez

One of the challenges of high granularity calorimeters, such as that to be built to cover the endcap region in the CMS Phase-2 Upgrade for HL-LHC, is that the large number of channels causes a surge in the computing load when clustering numerous digitized energy deposits (hits) in the reconstruction stage. In this article, we propose a fast and fully parallelizable density-based clustering algorithm, optimized for high-occupancy scenarios in which the number of clusters is much larger than the average number of hits in a cluster. The algorithm uses a grid spatial index for fast querying of neighbors, and its timing scales linearly with the number of hits within the range considered. We also compare the performance of CPU and GPU implementations, demonstrating the power of algorithmic parallelization in the coming era of heterogeneous computing in high-energy physics.
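As a rough illustration of the grid spatial index idea (a Python sketch, not the authors' implementation), the snippet below bins hits into square cells whose side equals the search radius dc, so the neighbours of a hit within dc can be found by scanning only the 3x3 block of surrounding cells; the hit coordinates, weights, and dc value are made-up examples.

```python
from collections import defaultdict
import math

def build_grid(hits, cell_size):
    """Map each grid cell to the indices of the hits it contains."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(hits):
        grid[(int(x // cell_size), int(y // cell_size))].append(i)
    return grid

def local_density(hits, weights, grid, cell_size, dc):
    """Density of each hit = sum of weights of neighbours within radius dc,
    found by scanning only the 3x3 block of cells around the hit."""
    rho = [0.0] * len(hits)
    for i, (x, y) in enumerate(hits):
        cx, cy = int(x // cell_size), int(y // cell_size)
        for nx in (cx - 1, cx, cx + 1):
            for ny in (cy - 1, cy, cy + 1):
                for j in grid.get((nx, ny), []):
                    if math.hypot(x - hits[j][0], y - hits[j][1]) <= dc:
                        rho[i] += weights[j]
    return rho

# Usage: with cell_size == dc, every neighbour within dc is guaranteed to
# lie in one of the nine cells inspected, so each query is constant-time on
# average and the total cost grows linearly with the number of hits.
hits = [(0.1, 0.2), (0.15, 0.25), (2.0, 2.1), (2.05, 2.0)]
weights = [1.0, 0.5, 2.0, 1.5]
dc = 0.5
grid = build_grid(hits, dc)
print(local_density(hits, weights, grid, dc, dc))
```

Because each hit only touches a fixed number of cells, the neighbour queries are independent of one another, which is what makes the approach amenable to the CPU and GPU parallelization discussed in the article.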

