Parallelization of the K-Means++ Clustering Algorithm

2021 ◽  
Vol 26 (1) ◽  
pp. 59-66
Author(s):  
Sara Daoudi ◽  
Chakib Mustapha Anouar Zouaoui ◽  
Miloud Chikr El-Mezouar ◽  
Nasreddine Taleb

K-means++ is a clustering algorithm created to improve the selection of initial clusters in the K-means algorithm. The k-means++ algorithm selects the initial k centroids with probability proportional to each data point's distance to the already-chosen centroids. Its most notable drawback is that it runs sequentially, which slows clustering. In this paper, we develop a new parallel k-means++ algorithm on graphics processing units (GPUs), using the Open Computing Language (OpenCL) platform as the programming environment to perform the data-assignment phase in parallel, while Streaming SIMD Extensions (SSE) technology is used to perform the initialization step, selecting the initial centroids in parallel on the CPU. The focus is on optimizations targeted directly at this architecture to exploit as much of the available computing capability as possible. Our objective is to minimize runtime while preserving the quality of the serial implementation. Our results demonstrate that the implementation targeting hybrid parallel architectures (CPU & GPU) is the most appropriate for large data. We have achieved a throughput 152 times higher than that of the sequential implementation of k-means++.
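The seeding rule described above (each new centroid drawn with probability proportional to squared distance from the nearest existing centroid) can be sketched as a serial pure-Python reference; this is not the paper's SSE/OpenCL implementation, and `points` is assumed to be a list of numeric tuples:

```python
import random

def kmeanspp_seed(points, k, seed=0):
    """k-means++ seeding: pick the first centroid uniformly at random, then
    pick each remaining centroid with probability proportional to the squared
    distance from each point to its nearest already-chosen centroid."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # squared distance of every point to its nearest existing centroid
        d2 = [min(sum((p - c) ** 2 for p, c in zip(pt, ct)) for ct in centroids)
              for pt in points]
        # weighted sampling: a point far from all centroids is more likely
        r = rng.random() * sum(d2)
        acc = 0.0
        for pt, w in zip(points, d2):
            acc += w
            if acc >= r:
                centroids.append(pt)
                break
    return centroids
```

The inner distance computation over all points is independent per point, which is what makes this step amenable to the SIMD-style parallelism the paper applies on the CPU.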

Processes ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. 1199
Author(s):  
Ravie Chandren Muniyandi ◽  
Ali Maroosi

Long-timescale simulations of biological processes such as photosynthesis, or attempts to solve NP-hard problems such as the traveling salesman, knapsack, Hamiltonian path, and satisfiability problems, using membrane systems without appropriate parallelization can take hours or days. Graphics processing units (GPUs) provide a massively parallel mechanism for general-purpose computation. Previous studies mapped one membrane to one thread block on the GPU. This is disadvantageous because when the number of objects per membrane is small, the number of active threads is also small, decreasing performance. Moreover, when each membrane is assigned to one thread block, communication between membranes requires communication between thread blocks, which is a time-consuming process. Previous approaches have also not addressed the issue of GPU occupancy. This study presents a classification algorithm that manages dependent objects and membranes based on the communication rate in the defined weighted network and assigns them to sub-matrices. Dependent objects and membranes are thus allocated to the same threads and thread blocks, decreasing communication between threads and thread blocks and allowing the GPU to maintain the highest possible occupancy. The experimental results indicate that for 48 objects per membrane, the algorithm achieves a 93-fold speedup, compared with a 1.6-fold speedup for previous algorithms.
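The abstract does not spell out the classification algorithm; purely as an illustration of the idea (grouping heavily-communicating items into the same thread block), a hypothetical greedy merge over a weighted communication graph could look like this. The function name, the edge-dictionary encoding, and the greedy strategy are all assumptions, not the paper's method:

```python
def group_by_communication(weights, n_items, n_blocks):
    """Greedy illustration: start with one group per item and repeatedly
    merge the pair of groups with the highest total communication weight,
    until n_blocks groups remain. weights maps (i, j) with i < j to the
    communication rate between items i and j."""
    groups = [{i} for i in range(n_items)]

    def link(a, b):
        # total communication weight between two groups
        return sum(weights.get((min(x, y), max(x, y)), 0) for x in a for y in b)

    while len(groups) > n_blocks:
        i, j = max(((i, j) for i in range(len(groups))
                    for j in range(i + 1, len(groups))),
                   key=lambda ij: link(groups[ij[0]], groups[ij[1]]))
        groups[i] |= groups[j]
        del groups[j]
    return groups
```

Items placed in the same group would then be mapped to the same thread block, so high-rate communication stays within a block rather than crossing block boundaries.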


2011 ◽  
Author(s):  
Richard Beare ◽  
Daniel Micevski ◽  
Chris Share ◽  
Luke Parkinson ◽  
Phillip Ward ◽  
...  

There is great interest in the use of graphics processing units (GPUs) for general-purpose applications, because the highly parallel architectures used in GPUs offer the potential for huge performance increases. The use of GPUs in image-analysis applications has been under investigation for a number of years. This article describes modifications to the Insight Toolkit (ITK) that provide a simple architecture for transparent use of GPU-enabled filters, along with examples of how to write GPU-enabled filters using the NVIDIA CUDA tools. This work was performed between late 2009 and early 2010 and is being published as modifications to ITK 3.20. It is hoped that publication will help inform the development of more general GPU support in ITK 4.0 and facilitate experimentation by users requiring the functionality of 3.20 or wishing to pursue CUDA-based development.


Author(s):  
Nikitas Papangelopoulos ◽  
Dimitrios Vlachakis ◽  
Arianna Filntisi ◽  
Paraskevas Fakourelis ◽  
Louis Papageorgiou ◽  
...  

The exponential growth of available biological data in recent years, coupled with its increasing complexity, has made its analysis a computationally challenging process. Traditional central processing units (CPUs) are reaching their limit in processing power and are not designed primarily for multithreaded applications. Graphics processing units (GPUs), on the other hand, are affordable, scalable computing powerhouses that, thanks to the ever-increasing demand for higher-quality graphics, have yet to reach their limit. Typically, high-end CPUs have 8-16 cores, whereas GPUs can have more than 2,500 cores. GPUs are also, by design, highly parallel, multicore, and multithreaded, capable of handling thousands of threads performing the same calculation on different subsets of a large data set. This ability is what makes them perfectly suited to biological analysis tasks. Lately, this potential has been recognized by many bioinformatics researchers, and a huge variety of tools and algorithms have been ported to GPUs or designed from the ground up to maximize use of the available cores. Here, we present a comprehensive review of available bioinformatics tools, ranging from sequence and image analysis to protein structure prediction and systems biology, that use the NVIDIA Compute Unified Device Architecture (CUDA) framework for general-purpose computing on graphics processing units (GPGPU).


Author(s):  
Amitava Datta ◽  
Amardeep Kaur ◽  
Tobias Lauer ◽  
Sami Chabbouh

Abstract Finding clusters in high-dimensional data is a challenging research problem. Subspace clustering algorithms aim to find clusters in all possible subspaces of the dataset, where a subspace is a subset of the data's dimensions. But the exponential increase in the number of subspaces with the dimensionality of the data renders most of these algorithms inefficient as well as ineffective. Moreover, these algorithms have ingrained data dependencies in the clustering process, which makes parallelization difficult and inefficient. SUBSCALE is a recent subspace clustering algorithm that scales with the number of dimensions and contains independent processing steps which can be exploited through parallelism. In this paper, we aim to leverage the computational power of widely available multi-core processors to improve the runtime performance of the SUBSCALE algorithm. The experimental evaluation shows linear speedup. Moreover, we develop an approach using graphics processing units (GPUs) for fine-grained data parallelism to accelerate the computation further. First tests of the GPU implementation show very promising results.


2015 ◽  
Vol 57 (2) ◽  
Author(s):  
Markus Steinberger

Abstract In this paper, we present a series of scheduling approaches targeted at massively parallel architectures, which in combination allow a wider range of algorithms to be executed on modern graphics processors. First, we describe a new processing model that enables the efficient execution of dynamic, irregular workloads. Then, we present the currently fastest queuing algorithm for graphics processors, the most efficient dynamic memory allocator for massively parallel architectures, and the only autonomous scheduler for graphics processing units that can dynamically support different granularities of parallelism. Finally, we show how these scheduling approaches help to advance the state of the art in rendering, visualization, and procedural modeling.


Author(s):  
Mayank Bhura ◽  
Pranav H. Deshpande ◽  
K. Chandrasekaran

Usage of general-purpose graphics processing units (GPGPUs) in high-performance computing is increasing as heterogeneous systems continue to become dominant. CUDA has been the programming environment for nearly all such NVIDIA GPU-based GPGPU applications, but the framework runs only on NVIDIA GPUs; utilizing other computing devices requires reimplementation in another framework. OpenCL provides a vendor-neutral and open programming environment, with implementations available on CPUs, GPUs, and other types of accelerators; OpenCL can thus be regarded as a write-once, run-anywhere framework. Still, both frameworks have their own pros and cons. This chapter presents a comparison of the performance of the CUDA and OpenCL frameworks, using an algorithm that finds the sum of all possible triple products over a list of integers, implemented on GPUs.
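The chapter does not define the triple-product kernel precisely; assuming it means the sum of products over all unordered triples of distinct positions, a minimal CPU reference (useful for validating a GPU port in either framework) could look like this. The closed-form variant is a standard power-sum identity, included because it reduces the O(n³) loop nest to three parallel reductions, which is how one would typically map it to a GPU:

```python
from itertools import combinations

def triple_product_sum(xs):
    # Naive CPU reference: sum of a[i]*a[j]*a[k] over all triples i < j < k.
    # O(n^3); this is the loop nest a GPU kernel would parallelize directly.
    return sum(a * b * c for a, b, c in combinations(xs, 3))

def triple_product_sum_fast(xs):
    # Equivalent closed form via power sums (Newton's identity for e3):
    #   sum_{i<j<k} a_i a_j a_k = (s1^3 - 3*s1*s2 + 2*s3) / 6
    # Each power sum is a simple reduction, ideal for GPU parallelization.
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    s3 = sum(x ** 3 for x in xs)
    return (s1 ** 3 - 3 * s1 * s2 + 2 * s3) // 6
```

For integer inputs the numerator is always divisible by 6, so integer division is exact.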


Author(s):  
Kunjan Aggarwal ◽  
Mainak Chaudhuri

Data analysis and classification play a big role in understanding various real-life phenomena. Clustering helps analyze data with little or no prior knowledge about it. K-means clustering is a popular clustering algorithm with applications in computer vision, data mining, data visualization, etc. Due to continuously increasing data volumes, parallel computing is necessary to overcome the computational challenges involved in K-means clustering. We present the design and implementation of the K-means clustering algorithm on widely available graphics processing units (GPUs), which have the hardware architecture required to meet these parallelism needs. We analyze the scalability of our proposed methods with increasing numbers and dimensionality of data points as well as numbers of clusters. We also compare our results with the current best available implementations on GPUs and with a 24-way threaded parallel CPU implementation, achieving a consistent speedup of 6.5x over the parallel CPU implementation.
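The part of K-means that parallelizes most naturally on a GPU is the assignment phase, since each point's nearest-centroid search is independent of every other point's. A serial Python sketch of that phase (a reference for checking a GPU kernel, not the paper's implementation) might be:

```python
def assign_points(points, centroids):
    """K-means assignment phase: label each point with the index of its
    nearest centroid (squared Euclidean distance). Each point is handled
    independently, which maps naturally onto one GPU thread per point."""
    labels = []
    for pt in points:
        dists = [sum((p - c) ** 2 for p, c in zip(pt, ct)) for ct in centroids]
        labels.append(dists.index(min(dists)))
    return labels
```

A full K-means iteration alternates this step with recomputing each centroid as the mean of its assigned points; the update step requires a reduction per cluster, which is where GPU implementations typically differ most.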


Author(s):  
. Monika ◽  
Pardeep Kumar ◽  
Sanjay Tyagi

In a Cloud computing environment, Quality of Service (QoS) and cost are the key elements to be taken care of. Today, in the era of big data, data must be handled properly while satisfying requests. When handling requests involving large data or scientific applications, the flow of information must be sustained. In this paper, a brief introduction to workflow scheduling is given, and a detailed survey of various scheduling algorithms is performed using various parameters.

