Chapter 12: Grid-Based Clustering Algorithms

2021 ◽  
Vol 11 (4) ◽  
pp. 319-330
Author(s):  
Artur Starczewski ◽  
Magdalena M. Scherer ◽  
Wojciech Książek ◽  
Maciej Dębski ◽  
Lipo Wang

Abstract: Data clustering is an important method used to discover naturally occurring structures in datasets. One of the most popular approaches is grid-based clustering. Methods of this kind are characterized by fast processing times and can discover clusters of arbitrary shape, which makes them suitable for many different applications. Researchers have created many variants of the grid-based approach. However, the key issue is choosing the right number of grid cells. This paper proposes a novel grid-based algorithm that determines the number of grid cells automatically. The method is based on the kdist function, which computes the distance between each element of a dataset and its kth nearest neighbor. Experimental results obtained for several different datasets confirm very good performance of the newly proposed method.
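The kdist function mentioned in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and a brute-force neighbor search; the function name `kdist` and the sample points are illustrative only. The abstract does not say how the paper converts kdist values into a number of grid cells, so that step is deliberately left out.

```python
import math


def kdist(points, k):
    """Return, for each point, the Euclidean distance to its k-th nearest neighbor.

    Brute-force O(n^2 log n) sketch; assumes Euclidean distance, which the
    abstract does not state explicitly.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, excluding p itself.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result


# Hypothetical sample data: three nearby points and one distant point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
values = kdist(points, k=2)
print(sorted(values))
```

Sorting the kdist values, as in the last line, is a common way to inspect them: points in dense regions have small values, while isolated points such as `(5.0, 5.0)` stand out with large ones.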


2018 ◽  
Vol 7 (3.4) ◽  
pp. 47 ◽  
Author(s):  
Akankshya Aparajita ◽  
Shrabanee Swagatika ◽  
Debabrata Singh

Clustering is an important procedure in data mining, in which the information in large datasets is transformed into meaningful and concise data. It involves activities such as pattern representation, the application of clustering algorithms and their validation, data abstraction, and finally the generation of results. Clustering algorithms fall into many categories, such as partition-based, hierarchical-based, density-based, and grid-based. Partition-based clustering is centroid-based. Hierarchical-based clustering is link-based. Density-based clustering focuses on areas of higher density in the dataset. Grid-based clustering relies on the size of the grid. In this paper, we discuss different clustering techniques and give a detailed review of the partition-based and hierarchical-based algorithms. Finally, we compare clustering algorithms on the basis of attributes such as time complexity, capacity for handling large datasets, scalability, and sensitivity to outliers and noise, and we discuss the results obtained on a particular dataset in a cloud computing environment.
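The remark that grid-based clustering relies on the size of the grid can be made concrete with a small sketch. This is a generic illustration, not an algorithm from any of the papers listed here: points are binned into square cells, cells holding at least `min_points` points are treated as dense, and adjacent dense cells are merged into one cluster. The names `grid_cluster`, `cell_size`, and `min_points` are assumptions made for the example.

```python
import math
from collections import defaultdict, deque
from itertools import product


def grid_cluster(points, cell_size, min_points=1):
    """Label points by merging adjacent dense grid cells; -1 marks noise."""
    # Map each point to the integer coordinates of the cell containing it.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(idx)

    # A cell is dense if it holds at least min_points points.
    dense = {key for key, members in cells.items() if len(members) >= min_points}

    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in sorted(dense):  # sorted for reproducible cluster ids
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:  # flood fill over neighboring dense cells
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster_id
            for offset in product((-1, 0, 1), repeat=len(cell)):
                neighbor = tuple(c + o for c, o in zip(cell, offset))
                if neighbor in dense and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        cluster_id += 1
    return labels


# Hypothetical sample data.
points = [(0.1, 0.1), (0.4, 0.3), (1.2, 0.2), (5.0, 5.0), (5.3, 5.1), (9.9, 0.0)]
print(grid_cluster(points, cell_size=1.0, min_points=2))
print(grid_cluster(points, cell_size=10.0, min_points=2))
```

With a cell size of 1.0 the sample data splits into two clusters plus two noise points, whereas a cell size of 10.0 places every point in a single cell and therefore a single cluster. The result depends strongly on the chosen grid resolution.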


2012 ◽  
Vol 263-266 ◽  
pp. 2234-2237 ◽  
Author(s):  
Amineh Amini ◽  
Teh Ying Wah

Clustering is one of the prominent classes of tasks in mining data streams. Among the various clustering algorithms that have been developed, density-based methods are able to discover clusters of arbitrary shape and to detect outliers. Recently, various algorithms have adopted density-based methods for clustering data streams. In this paper, we examine three notable algorithms from two groups, micro-clustering and grid-based: DenStream, D-Stream, and MR-Stream. We compare the algorithms in terms of algorithm performance and clustering quality metrics.


Author(s):  
Yanchang Zhao ◽  
Longbing Cao ◽  
Huaifeng Zhang ◽  
Chengqi Zhang

Clustering is one of the most important techniques in data mining. This chapter presents a survey of popular approaches for data clustering, including well-known clustering techniques, such as partitioning clustering, hierarchical clustering, density-based clustering and grid-based clustering, and recent advances in clustering, such as subspace clustering, text clustering and data stream clustering. The major challenges and future trends of data clustering will also be introduced in this chapter. The remainder of this chapter is organized as follows. The background of data clustering will be introduced in Section 2, including the definition of clustering, categories of clustering techniques, features of good clustering algorithms, and the validation of clustering. Section 3 will present main approaches for clustering, which range from the classic partitioning and hierarchical clustering to recent approaches of bi-clustering and semisupervised clustering. Challenges and future trends will be discussed in Section 4, followed by the conclusions in the last section.


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. For tagging, a POS tagger and the Porter stemmer are used. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, using a mean threshold. This paper incorporates many parameters for finding the similarity between words. To identify the disambiguated words, sense identification is performed for the adjectives and a comparison is made. The SemCor and machine learning datasets are employed. Compared with previous results for word sense disambiguation (WSD), our work shows a considerable improvement, reaching 91.2%.


2019 ◽  
Author(s):  
Elvar Jónsson ◽  
Asmus Ougaard Dohn ◽  
Hannes Jonsson

This work describes a general energy functional formulation of a polarizable embedding QM/MM scheme, as well as an implementation where a real-space Grid-based Projector Augmented Wave (GPAW) DFT method is coupled with a potential function for H₂O based on a Single Center Multipole Expansion (SCME) of the electrostatics, including anisotropic dipole and quadrupole polarizability.

