Parallelization of the K-Means++ Clustering Algorithm

2021 ◽  
Vol 26 (1) ◽  
pp. 59-66
Author(s):  
Sara Daoudi ◽  
Chakib Mustapha Anouar Zouaoui ◽  
Miloud Chikr El-Mezouar ◽  
Nasreddine Taleb

K-means++ is a clustering algorithm created to improve the selection of initial clusters in the K-means algorithm. The k-means++ algorithm selects the initial k centroids with probability proportional to each data point's distance to the already-chosen centroids. Its most notable drawback is that it runs sequentially, which slows clustering. In this paper, we develop a new parallel k-means++ algorithm on graphics processing units (GPUs), using the Open Computing Language (OpenCL) platform as the programming environment to perform the data-assignment phase in parallel, while Streaming SIMD Extensions (SSE) technology is used to perform the initialization step, selecting the initial centroids in parallel on the CPU. The focus is on optimizations targeted directly at this architecture to exploit as much of the available computing capability as possible. Our objective is to minimize runtime while preserving the quality of the serial implementation. Our results demonstrate that the implementation targeting hybrid parallel architectures (CPU & GPU) is the most appropriate for large data. We have achieved a throughput 152 times higher than that of the sequential implementation of k-means++.
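The seeding rule described above (each new centroid drawn with probability proportional to squared distance from the nearest existing centroid) can be sketched as a serial pure-Python reference; this is not the paper's SSE/OpenCL implementation, and `points` is assumed to be a list of numeric tuples:

```python
import random

def kmeanspp_seed(points, k, seed=0):
    """k-means++ seeding: pick the first centroid uniformly at random, then
    pick each remaining centroid with probability proportional to the squared
    distance from each point to its nearest already-chosen centroid."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # squared distance of every point to its nearest existing centroid
        d2 = [min(sum((p - c) ** 2 for p, c in zip(pt, ct)) for ct in centroids)
              for pt in points]
        # weighted sampling: a point far from all centroids is more likely
        r = rng.random() * sum(d2)
        acc = 0.0
        for pt, w in zip(points, d2):
            acc += w
            if acc >= r:
                centroids.append(pt)
                break
    return centroids
```

The inner distance computation over all points is independent per point, which is what makes this step amenable to the SIMD-style parallelism the paper applies on the CPU.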

Processes ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. 1199
Author(s):  
Ravie Chandren Muniyandi ◽  
Ali Maroosi

Long-timescale simulations of biological processes such as photosynthesis, or attempts to solve NP-hard problems such as the traveling salesman, knapsack, Hamiltonian path, and satisfiability problems, using membrane systems without appropriate parallelization can take hours or days. Graphics processing units (GPUs) provide a massively parallel mechanism for general-purpose computation. Previous studies mapped one membrane to one thread block on the GPU. This is disadvantageous because when the number of objects per membrane is small, the number of active threads is also small, decreasing performance. Moreover, when each membrane is assigned to one thread block, communication between membranes requires communication between thread blocks, which is a time-consuming process. Previous approaches have also not addressed the issue of GPU occupancy. This study presents a classification algorithm that manages dependent objects and membranes based on the communication rate in the defined weighted network and assigns them to sub-matrices. Dependent objects and membranes are thus allocated to the same threads and thread blocks, decreasing communication between threads and thread blocks and allowing the GPU to maintain the highest possible occupancy. The experimental results indicate that for 48 objects per membrane, the algorithm achieves a 93-fold speedup, compared with a 1.6-fold speedup for previous algorithms.
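The abstract does not spell out the classification algorithm; purely as an illustration of the idea (grouping heavily-communicating items into the same thread block), a hypothetical greedy merge over a weighted communication graph could look like this. The function name, the edge-dictionary encoding, and the greedy strategy are all assumptions, not the paper's method:

```python
def group_by_communication(weights, n_items, n_blocks):
    """Greedy illustration: start with one group per item and repeatedly
    merge the pair of groups with the highest total communication weight,
    until n_blocks groups remain. weights maps (i, j) with i < j to the
    communication rate between items i and j."""
    groups = [{i} for i in range(n_items)]

    def link(a, b):
        # total communication weight between two groups
        return sum(weights.get((min(x, y), max(x, y)), 0) for x in a for y in b)

    while len(groups) > n_blocks:
        i, j = max(((i, j) for i in range(len(groups))
                    for j in range(i + 1, len(groups))),
                   key=lambda ij: link(groups[ij[0]], groups[ij[1]]))
        groups[i] |= groups[j]
        del groups[j]
    return groups
```

Items placed in the same group would then be mapped to the same thread block, so high-rate communication stays within a block rather than crossing block boundaries.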


2011 ◽  
Author(s):  
Richard Beare ◽  
Daniel Micevski ◽  
Chris Share ◽  
Luke Parkinson ◽  
Phillip Ward ◽  
...  

There is great interest in the use of graphics processing units (GPUs) for general-purpose applications, because the highly parallel architectures used in GPUs offer the potential for huge performance increases. The use of GPUs in image-analysis applications has been under investigation for a number of years. This article describes modifications to the Insight Toolkit (ITK) that provide a simple architecture for transparent use of GPU-enabled filters, along with examples of how to write GPU-enabled filters using the NVIDIA CUDA tools. This work was performed between late 2009 and early 2010 and is being published as modifications to ITK 3.20. It is hoped that publication will help inform the development of more general GPU support in ITK 4.0 and facilitate experimentation by users requiring the functionality of 3.20 or wishing to pursue CUDA-based development.


Author(s):  
Nikitas Papangelopoulos ◽  
Dimitrios Vlachakis ◽  
Arianna Filntisi ◽  
Paraskevas Fakourelis ◽  
Louis Papageorgiou ◽  
...  

The exponential growth of available biological data in recent years, coupled with its increasing complexity, has made its analysis a computationally challenging process. Traditional central processing units (CPUs) are reaching their limit in processing power and are not designed primarily for multithreaded applications. Graphics processing units (GPUs), on the other hand, are affordable, scalable computing powerhouses that, thanks to the ever-increasing demand for higher-quality graphics, have yet to reach their limit. Typically, high-end CPUs have 8-16 cores, whereas GPUs can have more than 2,500 cores. GPUs are also, by design, highly parallel, multicore, and multithreaded, capable of handling thousands of threads performing the same calculation on different subsets of a large data set. This ability is what makes them perfectly suited to biological analysis tasks. Lately, this potential has been recognized by many bioinformatics researchers, and a huge variety of tools and algorithms have been ported to GPUs or designed from the ground up to maximize use of the available cores. Here, we present a comprehensive review of available bioinformatics tools, ranging from sequence and image analysis to protein structure prediction and systems biology, that use the NVIDIA Compute Unified Device Architecture (CUDA) framework for general-purpose computing on graphics processing units (GPGPU).


Author(s):  
Amitava Datta ◽  
Amardeep Kaur ◽  
Tobias Lauer ◽  
Sami Chabbouh

Abstract Finding clusters in high-dimensional data is a challenging research problem. Subspace clustering algorithms aim to find clusters in all possible subspaces of the dataset, where a subspace is a subset of the data's dimensions. But the exponential increase in the number of subspaces with the dimensionality of the data renders most of these algorithms inefficient as well as ineffective. Moreover, these algorithms have ingrained data dependencies in the clustering process, which makes parallelization difficult and inefficient. SUBSCALE is a recent subspace clustering algorithm that scales with the number of dimensions and contains independent processing steps which can be exploited through parallelism. In this paper, we aim to leverage the computational power of widely available multi-core processors to improve the runtime performance of the SUBSCALE algorithm. The experimental evaluation shows linear speedup. Moreover, we develop an approach using graphics processing units (GPUs) for fine-grained data parallelism to accelerate the computation further. First tests of the GPU implementation show very promising results.


2015 ◽  
Vol 57 (2) ◽  
Author(s):  
Markus Steinberger

Abstract In this paper, we present a series of scheduling approaches targeted at massively parallel architectures, which in combination allow a wider range of algorithms to be executed on modern graphics processors. First, we describe a new processing model that enables the efficient execution of dynamic, irregular workloads. Then, we present the currently fastest queuing algorithm for graphics processors, the most efficient dynamic memory allocator for massively parallel architectures, and the only autonomous scheduler for graphics processing units that can dynamically support different granularities of parallelism. Finally, we show how these scheduling approaches help to advance the state of the art in rendering, visualization, and procedural modeling.


Author(s):  
Mayank Bhura ◽  
Pranav H. Deshpande ◽  
K. Chandrasekaran

Usage of general-purpose graphics processing units (GPGPUs) in high-performance computing is increasing as heterogeneous systems continue to become dominant. CUDA has been the programming environment for nearly all such NVIDIA GPU-based GPGPU applications, but the framework runs only on NVIDIA GPUs; utilizing other computing devices requires reimplementation in another framework. OpenCL provides a vendor-neutral and open programming environment, with implementations available on CPUs, GPUs, and other types of accelerators; OpenCL can thus be regarded as a write-once, run-anywhere framework. Still, both frameworks have their own pros and cons. This chapter presents a comparison of the performance of the CUDA and OpenCL frameworks, using an algorithm that finds the sum of all possible triple products over a list of integers, implemented on GPUs.
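The chapter does not define the triple-product kernel precisely; assuming it means the sum of products over all unordered triples of distinct positions, a minimal CPU reference (useful for validating a GPU port in either framework) could look like this. The closed-form variant is a standard power-sum identity, included because it reduces the O(n³) loop nest to three parallel reductions, which is how one would typically map it to a GPU:

```python
from itertools import combinations

def triple_product_sum(xs):
    # Naive CPU reference: sum of a[i]*a[j]*a[k] over all triples i < j < k.
    # O(n^3); this is the loop nest a GPU kernel would parallelize directly.
    return sum(a * b * c for a, b, c in combinations(xs, 3))

def triple_product_sum_fast(xs):
    # Equivalent closed form via power sums (Newton's identity for e3):
    #   sum_{i<j<k} a_i a_j a_k = (s1^3 - 3*s1*s2 + 2*s3) / 6
    # Each power sum is a simple reduction, ideal for GPU parallelization.
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    s3 = sum(x ** 3 for x in xs)
    return (s1 ** 3 - 3 * s1 * s2 + 2 * s3) // 6
```

For integer inputs the numerator is always divisible by 6, so integer division is exact.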


Author(s):  
Kunjan Aggarwal ◽  
Mainak Chaudhuri

Data analysis and classification play a big role in understanding various real-life phenomena. Clustering helps analyze data with little or no prior knowledge about it. K-means clustering is a popular clustering algorithm with applications in computer vision, data mining, data visualization, etc. Due to continuously increasing data volumes, parallel computing is necessary to overcome the computational challenges involved in K-means clustering. We present the design and implementation of the K-means clustering algorithm on widely available graphics processing units (GPUs), which have the hardware architecture required to meet these parallelism needs. We analyze the scalability of our proposed methods with increasing numbers and dimensionality of data points as well as numbers of clusters. We also compare our results with the current best available implementations on GPUs and with a 24-way threaded parallel CPU implementation, achieving a consistent speedup of 6.5x over the parallel CPU implementation.
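The part of K-means that parallelizes most naturally on a GPU is the assignment phase, since each point's nearest-centroid search is independent of every other point's. A serial Python sketch of that phase (a reference for checking a GPU kernel, not the paper's implementation) might be:

```python
def assign_points(points, centroids):
    """K-means assignment phase: label each point with the index of its
    nearest centroid (squared Euclidean distance). Each point is handled
    independently, which maps naturally onto one GPU thread per point."""
    labels = []
    for pt in points:
        dists = [sum((p - c) ** 2 for p, c in zip(pt, ct)) for ct in centroids]
        labels.append(dists.index(min(dists)))
    return labels
```

A full K-means iteration alternates this step with recomputing each centroid as the mean of its assigned points; the update step requires a reduction per cluster, which is where GPU implementations typically differ most.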


Author(s):  
. Monika ◽  
Pardeep Kumar ◽  
Sanjay Tyagi

In a Cloud computing environment, Quality of Service (QoS) and cost are the key elements to be taken care of. Today, in the era of big data, data must be handled properly while satisfying requests. When handling requests involving large data or scientific applications, the flow of information must be sustained. In this paper, a brief introduction to workflow scheduling is given, and a detailed survey of various scheduling algorithms is performed using various parameters.

