Cluster Optimization for Boundary Points using Distributive Progressive Feature Selection Algorithm

Author(s):  
Ch. Raja Ramesh, et al.

A cluster is a group of data objects classified together because the objects are similar. Clustering is the process of finding homogeneous data items, such as patterns or documents, and grouping them together, while other groups may contain dissimilar items. Most clustering methods are either crisp or fuzzy, and member allocation to clusters is strictly based on similarity measures and membership functions. Both approaches have limitations in terms of membership: one insists that a sample must belong to a single cluster, while the other assigns membership only probabilistically. Finally, measures such as Quality and Purity are applied to assess how well the clusters are formed. However, there is a grey area in between, namely 'Boundary Points' and 'Moderately Far' points from the cluster centre. We took cluster quality [18], processing time, and the identification of relevant features as the basis for our problem statement and implemented zone-based clustering using the MapReduce concept. We implemented a process that finds far points across the different clusters, gathers them into a new cluster, and repeats until the number of clusters stabilizes. This process improves both cluster quality and processing time.
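A minimal sketch of the boundary-point idea described above: points lying unusually far from their own cluster centre are pulled out to seed an additional cluster, and the clustering is repeated until no such points remain. The use of k-means, the far_threshold parameter, and the stopping rule are illustrative assumptions, not the authors' exact zone-based MapReduce implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def zone_reclustering(X, k=3, far_threshold=1.5, max_rounds=10):
    """Repeatedly move 'moderately far' points into a new cluster
    until the number of clusters stabilises (illustrative sketch)."""
    for _ in range(max_rounds):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        # Distance of each point to its own cluster centre.
        dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
        # A point counts as 'far' if it lies beyond far_threshold times the
        # mean distance within its own cluster.
        cluster_mean = np.array([dist[km.labels_ == c].mean() for c in range(k)])
        far_mask = dist > far_threshold * cluster_mean[km.labels_]
        if not far_mask.any():
            break          # cluster quantity has stabilised
        k += 1             # the far points seed one additional cluster
    return km.labels_, k
```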

2013, Vol. 12 (5), pp. 3443-3451
Author(s):  
Rajesh Pasupuleti
Narsimha Gugulothu

Clustering analysis opens a new direction in data mining that has major impact in various domains including machine learning, pattern recognition, image processing, information retrieval and bioinformatics. Current clustering techniques do not adequately address some of these requirements and have failed to standardize clustering algorithms that support all real applications. Many clustering methods depend on user-specified parameters, and the initial seeds of clusters are selected randomly by the user. In this paper, we propose a new clustering method based on a linear approximation of the clustering function: rather than grouping data objects into clusters using the distance measures, similarity measures and statistical distributions of traditional clustering methods, we capture the overall behaviour of the clustering function, pick the initial seeds of clusters as points on the linear approximation line, and then perform the clustering operations. Experimental results show that clusters based on linear approximation yield good results in practice, illustrated with an example of business data. The paper also explains privacy-preserving clustering of sensitive data objects.
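A rough sketch of the seeding idea, assuming a two-dimensional dataset, a least-squares line as the linear approximation, and k-means for the subsequent grouping; the linear_seeds helper is hypothetical and the original method's seeding rule may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def linear_seeds(X, k):
    """Place k initial seeds evenly along the least-squares line through X (2-D)."""
    slope, intercept = np.polyfit(X[:, 0], X[:, 1], deg=1)
    xs = np.linspace(X[:, 0].min(), X[:, 0].max(), k)
    return np.column_stack([xs, slope * xs + intercept])

X = np.random.rand(200, 2)                 # placeholder data
seeds = linear_seeds(X, k=4)               # seeds taken from the approximation line
labels = KMeans(n_clusters=4, init=seeds, n_init=1).fit_predict(X)
```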


Author(s):  
Laura Macia

In this article I discuss cluster analysis as an exploratory tool to support the identification of associations within qualitative data. While not appropriate for all qualitative projects, cluster analysis can be particularly helpful in identifying patterns where numerous cases are studied. I use as illustration a research project on Latino grievances to offer a detailed explanation of the main steps in cluster analysis, providing specific considerations for its use with qualitative data. I specifically describe the issues of data transformation, the choice of clustering methods and similarity measures, the identification of a cluster solution, and the interpretation of the data in a qualitative context.
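As an illustration of the workflow just described (with entirely hypothetical data and an assumed Jaccard measure, not the choices made in the article), qualitative codes can be transformed into a binary case-by-code matrix, a dissimilarity computed between cases, and a hierarchical cluster solution cut:

```python
import pandas as pd
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

cases = {                                  # hypothetical coded interview data
    "case1": {"housing", "work"},
    "case2": {"housing", "discrimination"},
    "case3": {"work", "wages"},
    "case4": {"discrimination", "housing"},
}
codes = sorted(set().union(*cases.values()))
# Data transformation: binary case-by-code matrix.
matrix = pd.DataFrame(
    [[code in grievances for code in codes] for grievances in cases.values()],
    index=list(cases), columns=codes,
)
# Similarity measure and clustering method, then a two-cluster solution.
dist = pdist(matrix.values, metric="jaccard")
solution = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
```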


Author(s):  
Sri Andayani
Ady Ryansyah

Measuring document similarity is a time-consuming problem. The large number of documents and the large number of pages per document make similarity measurement a complicated task to carry out manually. In this research, a system that automatically measures the similarity between documents is built by implementing TF-IDF. Measurements are carried out by first creating a vector representation of the documents being compared; this representation contains the weight of each term in the documents. The similarity value is then calculated using cosine similarity. The finished system can compare documents in PDF or Word format, either using all the chapters of a report or only a few selected chapters that are considered significant. Based on the experiments, TF-IDF needs at least three documents to be available in the document collection being processed. The correlation test shows that, for documents in PDF format, there is a significant correlation between the number of characters in a document and the processing time.
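A minimal sketch of such a TF-IDF/cosine-similarity pipeline using scikit-learn; the toy documents and the use of scikit-learn are assumptions, and the original system additionally extracts text from PDF and Word files before this step.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [                                   # at least three documents in the collection
    "clustering of web documents",
    "document clustering with tf idf",
    "cosine similarity between term vectors",
]
tfidf = TfidfVectorizer().fit_transform(documents)   # weight of each term per document
similarity = cosine_similarity(tfidf[0], tfidf[1])   # compare document 0 with document 1
print(similarity[0, 0])
```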


2017, Vol. 14 (1)
Author(s):  
Zdeněk Šulc
Martin Matějka
Jiří Procházka
Hana Řezanková

This paper thoroughly examines three recently introduced modifications of the Gower coefficient, which were designed for data with mixed-type variables in hierarchical clustering. In contrast to the original Gower coefficient, which only recognizes whether two categories of a nominal variable match or not, the examined modifications offer three different approaches to measuring the similarity between categories. The examined dissimilarity measures are compared and evaluated regarding the quality of their clusters, measured by three internal indices (Dunn, silhouette, McClain), and regarding their classification abilities, measured by the Rand index. The comparison is performed on 810 generated datasets. In the analysis, the performance of the similarity measures is evaluated across different data characteristics (the number of variables, the number of categories, the distance of clusters, etc.) and different hierarchical clustering methods (average, complete, McQuitty and single linkage). As a result, two modifications are recommended for use in practice.
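For reference, a small sketch of the baseline being modified: the original Gower treatment of purely nominal data (simple matching), fed into average-linkage hierarchical clustering and scored with the silhouette index. The toy data and the scipy/scikit-learn calls are assumptions; the examined modifications replace the 0/1 matching term with graded between-category similarities.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.metrics import silhouette_score

X = np.array([["red", "small"], ["red", "large"],
              ["blue", "large"], ["blue", "small"]])   # nominal variables only
# Original Gower on nominal data: dissimilarity = share of non-matching variables.
D = np.array([[np.mean(a != b) for b in X] for a in X])
labels = fcluster(linkage(squareform(D), method="average"),
                  t=2, criterion="maxclust")
print(silhouette_score(D, labels, metric="precomputed"))   # internal index
```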


2021, Vol. 22 (2)
Author(s):  
Chiheb Eddine Ben Ncir

Overlapping clustering is an important challenge in unsupervised learning applications because it allows each data object to belong to more than one group. Several clustering methods have been proposed to meet this requirement by adapting the usual clustering approaches. Although these methods can detect non-disjoint partitionings, they fail when the data contain groups with arbitrary, non-spherical shapes. We propose in this work a new density-based overlapping clustering method, referred to as OC-DD, which is able to detect overlapping clusters even when they have non-spherical and complex shapes. The proposed method uses density and distances to detect dense regions in the data while allowing some data objects to belong to more than one group. Experiments performed on artificial and real multi-labeled datasets have shown the effectiveness of the proposed method compared to existing ones.
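A rough sketch of the density-plus-distance intuition, not the exact OC-DD rule: dense groups are found first (DBSCAN is assumed here purely for illustration) and a data object is then additionally assigned to every cluster that lies within a small overlap radius.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

def overlapping_labels(X, eps=0.3, min_samples=5, overlap_radius=0.3):
    """Return, for each data object, the set of cluster ids it belongs to
    (possibly several) -- an illustrative overlapping assignment."""
    base = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    clusters = [np.where(base == c)[0] for c in np.unique(base) if c != -1]
    D = pairwise_distances(X)
    return [
        {c for c, members in enumerate(clusters)
         if D[i, members].min() <= overlap_radius}
        for i in range(len(X))
    ]
```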


Author(s):  
Hanat Raji-lawal

Introduction: An anagram-solving task involves retrieving previously acquired knowledge, which makes it a suitable test of memory cognition. Automating this process can yield a very good memory-cognition test tool; the method behind this automation is an anagram orthographic similarity measure. Aim: The purpose of this research is to study existing anagram orthographic similarity measures and identify their strengths and weaknesses for further improvement. Materials and Methods: Experiments were carried out on the measures using real data, and their behaviour on different orthographic string sets was observed. Result: The experiments revealed that brute force has very poor processing time, while the sorting and neighbourhood-frequency measures have no issues with processing time. Conclusion: The research revealed that existing anagram orthographic similarity measures are not suitable for character-position verification or for evaluating syllabic complexity, both of which are essential measures of working memory capacity.
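To illustrate the contrast in processing time, here is a hypothetical comparison of a brute-force permutation check (factorial time) with a sorted-string check (O(n log n)); the exact measures studied in the paper may differ in detail.

```python
from itertools import permutations

def is_anagram_bruteforce(word, candidate):
    # Enumerates every permutation of `word` -- impractically slow for long words.
    return any("".join(p) == candidate for p in permutations(word))

def is_anagram_sorted(word, candidate):
    # Sorting both strings verifies letter content but discards letter positions,
    # which is why such measures cannot assess character-position errors.
    return sorted(word) == sorted(candidate)

assert is_anagram_bruteforce("tap", "pat")
assert is_anagram_sorted("listen", "silent")
```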


Author(s):  
Saliha Aouat

The author presents in this paper a new approach for indexing and content-based image retrieval based on the quad-tree structure. 3D objects are represented by their silhouettes and codified according to the filling rate of each quadrant at the different levels of the quad-tree subdivision. The author proposes a modified linear codification for silhouettes; this method improves processing time because, unlike traditional algorithms, its running time is not proportional to the number of pixels in the image. Since the same descriptor may characterize a set of different shapes, the author also proposes efficient similarity measures to distinguish different objects having the same index, so that the approach can be applied to the retrieval process.
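A hedged sketch of codifying a binary silhouette by the filling rate of each quadrant over successive quad-tree levels; the quadrant ordering, the number of levels, and the helper name quadtree_code are assumptions rather than the author's exact codification.

```python
import numpy as np

def quadtree_code(silhouette, levels=2):
    """Return the filling rate (fraction of object pixels) of every quadrant,
    level by level, for a square binary image."""
    code, blocks = [], [silhouette]
    for _ in range(levels):
        next_blocks = []
        for b in blocks:
            h, w = b.shape[0] // 2, b.shape[1] // 2
            quads = [b[:h, :w], b[:h, w:], b[h:, :w], b[h:, w:]]
            code.extend(q.mean() for q in quads)      # filling rate per quadrant
            next_blocks.extend(quads)
        blocks = next_blocks
    return np.round(code, 3)

silhouette = np.zeros((8, 8)); silhouette[2:6, 2:6] = 1   # toy object silhouette
print(quadtree_code(silhouette))
```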


Author(s):  
Dilip Singh Sisodia

Customized web services are offered to users by grouping them according to their access patterns. Clustering techniques are very useful in grouping users and analyzing web access patterns. Clustering can be object clustering performed on feature vectors or relational clustering performed on relational data. Relational clustering is preferred over object clustering for web users' sessions because of the high dimensionality and sparsity of web users' data. However, relational clustering of web users depends on the underlying dissimilarity measures used; therefore, the correct dissimilarity measure for matching relational web access patterns between user sessions is very important. In this chapter, the various dissimilarity measures used in relational clustering of web users' data are discussed. The concept of an augmented user session is also discussed to derive different augmented session dissimilarity measures. The discussed session dissimilarity measures are used with relational fuzzy clustering algorithms. The comparative performance of binary session similarity and augmented session similarity measures is evaluated using an intra-cluster and inter-cluster distance-based cluster quality ratio. The results suggest that the augmented session dissimilarity measures in general, and the intuitive augmented session (dis)similarity measure in particular, performed better than the other measures.
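As a small illustration of the simplest family of measures discussed here, a binary session dissimilarity can be computed by representing each session as a 0/1 vector over pages and building the pairwise (relational) matrix that a relational fuzzy clustering algorithm would consume; the Jaccard metric and the toy sessions are assumptions. Augmented measures would additionally weight pages, for example by time spent or access frequency.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

sessions = np.array([                  # rows: user sessions, columns: pages visited
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 1, 1, 1, 0],
], dtype=bool)
# Pairwise dissimilarities form the relational data fed to relational clustering.
relational_matrix = squareform(pdist(sessions, metric="jaccard"))
print(relational_matrix)
```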

