Introduction to Clustering

Author(s):  
Raymond Greenlaw ◽  
Sanpawat Kantabutra

This chapter provides the reader with an introduction to clustering algorithms and applications. A number of important well-known clustering methods are surveyed. The authors present a brief history of the development of the field of clustering, discuss various types of clustering, and mention some of the current research directions in the field of clustering. Algorithms are described for top-down and bottom-up hierarchical clustering, as are algorithms for K-Means clustering and for K-Medians clustering. The technique of representative points is also presented. Given the large data sets involved with clustering, the need to apply parallel computing to clustering arises, so they discuss issues related to parallel clustering as well. Throughout the chapter references are provided to works that contain a large number of experimental results. A comparison of the various clustering methods is given in tabular format. They conclude the chapter with a summary and an extensive list of references.

2013 ◽  
Vol 3 (2) ◽  
pp. 1-29 ◽  
Author(s):  
Raymond Greenlaw ◽  
Sanpawat Kantabutra

This article is a survey into clustering applications and algorithms. A number of important well-known clustering methods are discussed. The authors present a brief history of the development of the field of clustering, discuss various types of clustering, and mention some of the current research directions in the field of clustering. More specifically, top-down and bottom-up hierarchical clustering are described. Additionally, K-Means and K-Medians clustering algorithms are also shown. The concept of representative points is introduced and the technique of discovering them is presented. Immense data sets in clustering often necessitate parallel computation. The authors discuss issues involving parallel clustering as well. Clustering deals with a large number of experimental results. The authors provide references to these works throughout the article. A table for comparing various clustering methods is given in the end. The authors give a summary and an extensive list of references, including some of the latest works in the field, to conclude the article.


2021 ◽  
Author(s):  
Sebastiaan Valkiers ◽  
Max Van Houcke ◽  
Kris Laukens ◽  
Pieter Meysman

The T-cell receptor (TCR) determines the specificity of a T-cell towards an epitope. As of yet, the rules for antigen recognition remain largely undetermined. Current methods for grouping TCRs according to their epitope specificity remain limited in performance and scalability. Multiple methodologies have been developed, but all of them fail to efficiently cluster large data sets exceeding 1 million sequences. To account for this limitation, we developed clusTCR, a rapid TCR clustering alternative that efficiently scales up to millions of CDR3 amino acid sequences. Benchmarking comparisons revealed similar accuracy of clusTCR with other TCR clustering methods. clusTCR offers a drastic improvement in clustering speed, which allows clustering of millions of TCR sequences in just a few minutes through efficient similarity searching and sequence hashing.clusTCR was written in Python 3. It is available as an anaconda package (https://anaconda.org/svalkiers/clustcr) and on github (https://github.com/svalkiers/clusTCR).


Author(s):  
B. K. Tripathy ◽  
Hari Seetha ◽  
M. N. Murty

Data clustering plays a very important role in Data mining, machine learning and Image processing areas. As modern day databases have inherent uncertainties, many uncertainty-based data clustering algorithms have been developed in this direction. These algorithms are fuzzy c-means, rough c-means, intuitionistic fuzzy c-means and the means like rough fuzzy c-means, rough intuitionistic fuzzy c-means which base on hybrid models. Also, we find many variants of these algorithms which improve them in different directions like their Kernelised versions, possibilistic versions, and possibilistic Kernelised versions. However, all the above algorithms are not effective on big data for various reasons. So, researchers have been trying for the past few years to improve these algorithms in order they can be applied to cluster big data. The algorithms are relatively few in comparison to those for datasets of reasonable size. It is our aim in this chapter to present the uncertainty based clustering algorithms developed so far and proposes a few new algorithms which can be developed further.


Author(s):  
Helen K. Black ◽  
John T. Groce ◽  
Charles E. Harmon

Chapter One offers a brief history of the rise in awareness of the vast numbers of informal, family caregivers caring for aged, demented, and impaired loved ones in the home. The importance of informal caregivers to the healthcare system, both financially and emotionally, emerged in studies exploring the numbers of home caregivers and the nature of their care work. Early studies also focused on the sense of burden caregivers experienced due to caregiving. Since the 1980s, caregiving studies have been a constant in research, and have become increasingly complex in the use of large data sets and advanced technology to study the number of caregivers, their characteristics and labors, and the outcomes of caregiving on their emotional and physical health. Few studies have focused solely on the experience of caregiving in African-American elder male caregivers, and in the way we accomplish here.


2007 ◽  
Vol 17 (01) ◽  
pp. 71-103 ◽  
Author(s):  
NARGESS MEMARSADEGHI ◽  
DAVID M. MOUNT ◽  
NATHAN S. NETANYAHU ◽  
JACQUELINE LE MOIGNE

Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.


Sign in / Sign up

Export Citation Format

Share Document