Individual Movements and Geographical Data Mining. Clustering Algorithms for Highlighting Hotspots in Personal Navigation Routes

Author(s):  
Gabriella Schoier ◽  
Giuseppe Borruso
Author(s):  
P. Tamijiselvy ◽  
N. Kavitha ◽  
K. M. Keerthana ◽  
D. Menakha

The degree of aortic calcification has been shown to be a risk indicator for vascular events, including cardiovascular events. The proposed method is a fully automated data mining algorithm that segments and measures calcification in low-dose chest CT scans of smokers aged 50 to 70. Subjects with increased cardiovascular risk can thus be identified using data mining algorithms. This paper presents a method for the automatic detection of coronary artery calcifications in low-dose chest CT scans using effective clustering algorithms in three phases: pre-processing, segmentation, and clustering. The Fuzzy C-Means algorithm achieves an accuracy of 80.23%, demonstrating that Fuzzy C-Means can detect cardiovascular disease at an early stage.
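For context on the clustering step named above, the following is a minimal NumPy sketch of the standard Fuzzy C-Means algorithm applied to generic feature vectors. The pre-processing and segmentation phases, the CT data, and the reported 80.23% accuracy are specific to the paper and are not reproduced here; the toy intensity values at the end are purely illustrative.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal Fuzzy C-Means: X is (n_samples, n_features), c clusters, fuzzifier m."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                   # memberships sum to 1 per sample
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]  # membership-weighted centres
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        new_U = dist ** (-2.0 / (m - 1))                # standard FCM membership update
        new_U /= new_U.sum(axis=1, keepdims=True)
        if np.abs(new_U - U).max() < tol:
            U = new_U
            break
        U = new_U
    return centers, U

# Toy usage: separate 1-D intensity values into two fuzzy groups.
values = np.concatenate([np.random.normal(100, 10, 200), np.random.normal(400, 30, 50)])
centers, U = fuzzy_c_means(values.reshape(-1, 1), c=2)
labels = U.argmax(axis=1)                               # hard labels from fuzzy memberships
```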


2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al. ◽  

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One successful and pioneering clustering algorithm is the Minimum-Minimum Roughness algorithm (MMR), a top-down hierarchical clustering algorithm that can handle uncertainty in clustering categorical data. However, MMR tends to choose the attribute with fewer values and the leaf node with more objects, leading to undesirable clustering results. To overcome these shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.
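As background for the roughness criterion MMR is built on, here is a rough sketch based on the usual rough-set definitions (lower and upper approximations of the objects sharing an attribute value). It illustrates only the splitting-attribute selection; it is an assumption-laden reconstruction, not the IMMR refinement proposed in the paper.

```python
import numpy as np

def mean_roughness(data, i, j):
    """Mean roughness of attribute i with respect to attribute j (rough-set sense).
    data: 2-D array of categorical values, rows = objects, columns = attributes."""
    col_i, col_j = data[:, i], data[:, j]
    classes_j = [set(np.where(col_j == v)[0]) for v in np.unique(col_j)]
    roughness = []
    for v in np.unique(col_i):
        X = set(np.where(col_i == v)[0])
        lower = sum(len(c) for c in classes_j if c <= X)      # classes fully inside X
        upper = sum(len(c) for c in classes_j if c & X)       # classes overlapping X
        roughness.append(1.0 - lower / upper)
    return float(np.mean(roughness))

def mmr_splitting_attribute(data):
    """Pick the attribute whose minimum mean roughness (over all other
    attributes) is smallest -- the 'minimum-minimum roughness' choice."""
    n_attrs = data.shape[1]
    best_attr, best_score = None, np.inf
    for i in range(n_attrs):
        mr = min(mean_roughness(data, i, j) for j in range(n_attrs) if j != i)
        if mr < best_score:
            best_attr, best_score = i, mr
    return best_attr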


2015 ◽  
Vol 16 (SE) ◽  
pp. 133-138
Author(s):  
Mohammad Eiman Jamnezhad ◽  
Reza Fattahi

Clustering is one of the most significant research areas in the field of data mining and is considered an important tool in the fast-developing era of information explosion. Clustering systems are used more and more often in text mining, especially for analyzing texts and extracting the knowledge they contain. Data are grouped into clusters in such a way that data in the same group are similar and those in other groups are dissimilar; the aim is to maximize intra-class similarity and inter-class dissimilarity. Clustering is useful for obtaining interesting patterns and structures from a large set of data, and it can be applied in many areas, such as DNA analysis, marketing studies, web documents, and classification. This paper studies and compares three text-document clustering algorithms, namely k-means, k-medoids, and SOM, using the F-measure.
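Since the comparison above rests on the F-measure, here is a small sketch of one common definition of the clustering F-measure (each reference class is matched with the cluster giving its best F score, weighted by class size). The paper may use a slightly different variant; this is only an illustration.

```python
import numpy as np

def clustering_f_measure(labels_true, labels_pred):
    """Overall F-measure of a clustering against reference classes."""
    labels_true, labels_pred = np.asarray(labels_true), np.asarray(labels_pred)
    n, total = len(labels_true), 0.0
    for cls in np.unique(labels_true):
        in_class = labels_true == cls
        best_f = 0.0
        for clu in np.unique(labels_pred):
            in_cluster = labels_pred == clu
            overlap = np.sum(in_class & in_cluster)
            if overlap == 0:
                continue
            precision = overlap / in_cluster.sum()
            recall = overlap / in_class.sum()
            best_f = max(best_f, 2 * precision * recall / (precision + recall))
        total += in_class.sum() / n * best_f       # weight each class by its size
    return total
```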


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Lopamudra Dey ◽  
Sanjay Chakraborty

The significance and applications of clustering span various fields. Clustering is an unsupervised process in data mining, which is why properly evaluating the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measurement. Different types of indices are used to solve different types of problems, and the choice of index depends on the kind of data available. This paper first proposes a Canonical PSO based K-means clustering algorithm, analyses some important clustering indices (intercluster, intracluster), and then evaluates the effects of those indices on a real-time air pollution database and on the wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and hierarchical clustering algorithms. The paper also describes the nature of the clusters and compares the performances of these clustering algorithms according to the validity assessment, identifying which algorithm is most desirable for producing properly compact clusters on these particular real-life datasets. It deals with the behaviour of these clustering algorithms with respect to validation indices and presents the evaluation results in mathematical and graphical form.
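To make the intercluster/intracluster indices concrete, the sketch below computes two simple validity measures for a plain K-means partition: the mean within-cluster distance to the centroid and the minimum pairwise centroid distance. The PSO-based variants and the paper's specific datasets are not reproduced; the synthetic data is a stand-in for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

def intra_inter_indices(X, labels, centers):
    """Intra-cluster index: mean distance to the own centroid (lower is better).
    Inter-cluster index: minimum pairwise centroid distance (higher is better)."""
    intra = np.mean([np.linalg.norm(X[labels == k] - c, axis=1).mean()
                     for k, c in enumerate(centers)])
    inter = min(np.linalg.norm(a - b)
                for i, a in enumerate(centers) for b in centers[i + 1:])
    return intra, inter

# Example on synthetic stand-in data.
X = np.random.default_rng(0).normal(size=(300, 4))
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(intra_inter_indices(X, km.labels_, km.cluster_centers_))
```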


Author(s):  
Wilhelmiina Hämäläinen ◽  
Ville Kumpulainen ◽  
Maxim Mozgovoy

Clustering student data is a central task in educational data mining and in the design of intelligent learning tools. The problem is that there are thousands of clustering algorithms but no general guidelines about which method to choose. The optimal choice is, of course, problem- and data-dependent and can seldom be found without trying several methods. Still, the purposes of clustering students and the typical features of educational data make certain clustering methods more suitable or attractive. In this chapter, the authors evaluate the main clustering methods from this perspective and, based on the analysis, suggest the most promising clustering methods for different situations.


Author(s):  
Slawomir T. Wierzchon

Standard clustering algorithms employ fixed assumptions about data structure. For instance, the k-means algorithm is applicable to spherical and linearly separable data clouds, and when the data come from a multidimensional normal distribution, the so-called EM algorithm can be applied. In practice, however, the structure underlying a given set of observations is usually too complex to fit a single assumption. We can split these assumptions into manageable hypotheses, each justifying the use of a particular clustering algorithm, and then aggregate the partial results into a meaningful description of the data. Consensus clustering performs this task. In this article we clarify the idea of consensus clustering and present a conceptual frame for such a compound analysis. Next, the basic approaches to implementing a consensus procedure are given. Finally, some new directions in this field are mentioned.
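One widely used way to implement a consensus procedure is evidence accumulation: run a base clusterer several times, record how often each pair of objects lands in the same cluster, and cut the resulting co-association matrix. The sketch below is a minimal version of that idea using scikit-learn; the base clusterer, number of runs, and final cut are illustrative choices, not the article's specific procedure.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

def consensus_clustering(X, n_runs=20, k_range=(2, 6), final_k=3, seed=0):
    """Evidence-accumulation consensus: combine several K-means partitions through a
    co-association matrix, then cut that matrix with average-linkage clustering."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    coassoc = np.zeros((n, n))
    for run in range(n_runs):
        k = int(rng.integers(k_range[0], k_range[1] + 1))    # vary k across runs
        labels = KMeans(n_clusters=k, n_init=5, random_state=run).fit_predict(X)
        coassoc += (labels[:, None] == labels[None, :])      # 1 if co-clustered in this run
    coassoc /= n_runs
    # Older scikit-learn versions use affinity="precomputed" instead of metric=.
    final = AgglomerativeClustering(n_clusters=final_k,
                                    metric="precomputed", linkage="average")
    return final.fit_predict(1.0 - coassoc)                  # distance = 1 - co-association
```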


Author(s):  
Junjie Wu ◽  
Jian Chen ◽  
Hui Xiong

Cluster analysis (Jain & Dubes, 1988) provides insight into data by dividing the objects into groups (clusters) such that objects in a cluster are more similar to each other than to objects in other clusters. Cluster analysis has long played an important role in a wide variety of fields, such as psychology, bioinformatics, pattern recognition, information retrieval, machine learning, and data mining. Many clustering algorithms, such as K-means and the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), have been well established. A recent research focus in cluster analysis is to understand the strengths and weaknesses of various clustering algorithms with respect to data factors. Indeed, researchers have identified some data characteristics that may strongly affect cluster analysis, including high dimensionality and sparseness, large size, noise, types of attributes and data sets, and scales of attributes (Tan, Steinbach, & Kumar, 2005). However, further investigation is needed to reveal whether and how data distributions can affect the performance of clustering algorithms. Along this line, we study clustering algorithms by answering three questions: (1) What are the systematic differences between the distributions of the clusters produced by different clustering algorithms? (2) How can the distribution of the "true" cluster sizes affect the performance of clustering algorithms? (3) How should an appropriate clustering algorithm be chosen in practice? The answers to these questions can guide a better understanding and use of clustering methods. This is noteworthy, since 1) in theory, people have seldom realized that there are strong relationships between clustering algorithms and cluster size distributions, and 2) in practice, choosing an appropriate clustering algorithm is still a challenging task, especially after the boom of algorithms in the data mining area. This chapter thus tries to fill this void. To this end, we carefully select two widely used categories of clustering algorithms, K-means and Agglomerative Hierarchical Clustering (AHC), as representative algorithms for illustration. In the chapter, we first show that K-means tends to generate clusters with a relatively uniform distribution of cluster sizes. Then we demonstrate that UPGMA, one of the robust AHC methods, acts in the opposite way to K-means; that is, UPGMA tends to generate clusters with high variation in cluster sizes. Indeed, the experimental results indicate that the variations of the resulting cluster sizes produced by K-means and UPGMA, measured by the Coefficient of Variation (CV), fall in specific intervals, namely [0.3, 1.0] and [1.0, 2.5] respectively. Finally, we put K-means and UPGMA together for a further comparison and propose some rules for a better choice of clustering scheme from the data distribution point of view.
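The comparison above hinges on the Coefficient of Variation of cluster sizes. The following small sketch shows how that quantity can be computed for K-means and for UPGMA (average-linkage hierarchical clustering) on synthetic stand-in data; the intervals [0.3, 1.0] and [1.0, 2.5] come from the chapter's own experiments and will not be reproduced exactly by this toy example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist
from sklearn.cluster import KMeans

def cluster_size_cv(labels):
    """Coefficient of Variation (std / mean) of the cluster sizes."""
    sizes = np.bincount(labels - labels.min())
    sizes = sizes[sizes > 0]
    return sizes.std() / sizes.mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                       # stand-in data, not the chapter's datasets
k = 5
km_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
upgma_labels = fcluster(linkage(pdist(X), method="average"), t=k, criterion="maxclust")
print("CV of K-means cluster sizes:", cluster_size_cv(km_labels))
print("CV of UPGMA cluster sizes  :", cluster_size_cv(upgma_labels))
```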


Author(s):  
Jie Dong ◽  
Min-Feng Zhu ◽  
Yong-Huan Yun ◽  
Ai-Ping Lu ◽  
Ting-Jun Hou ◽  
...  

Background: With the increasing development of biotechnology and information technology, publicly available data in chemistry and biology are undergoing explosive growth. The wealth of information in these resources needs to be extracted and then transformed into useful knowledge by various data mining methods. A main computational challenge, however, is how to effectively represent or encode the molecular objects under investigation, such as chemicals, proteins, DNAs and even complicated interactions, when data mining methods are employed. To further explore these complicated data, an integrated toolkit to represent different types of molecular objects and support various data mining algorithms is urgently needed.

Results: We developed a freely available R/CRAN package, called BioMedR, for molecular representations of chemicals, proteins, DNAs and pairwise samples of their interactions. The current version of BioMedR can calculate 293 molecular descriptors and 13 kinds of molecular fingerprints for small molecules, 9920 protein descriptors based on protein sequences and six types of generalized scale-based descriptors for proteochemometric modeling, more than 6000 DNA descriptors from nucleotide sequences, and six types of interaction descriptors using three different combining strategies. Moreover, the package provides five similarity calculation methods and four powerful clustering algorithms, as well as several useful auxiliary tools, aiming at an integrated analysis pipeline for data acquisition, data checking, descriptor calculation and data modeling.

Conclusion: BioMedR provides a comprehensive and uniform R package to link different representations of molecular objects with each other and will benefit cheminformatics/bioinformatics and other biomedical users. It is available at: https://CRAN.R-project.org/package=BioMedR and https://github.com/wind22zhu/BioMedR/.

