Featureless Data Clustering

Author(s):  
Wilson Wong

Feature-based semantic measurements have played a dominant role in conventional data clustering algorithms across many existing applications. However, the applicability of existing data clustering approaches to a wider range of applications is limited by issues such as the complexity of semantic computation, the long pre-processing time required for feature preparation, and the poor extensibility of semantic measurement due to non-incremental feature sources. This chapter first summarises commonly used clustering algorithms and feature-based semantic measurements, then highlights their shortcomings to motivate an adaptive clustering approach based on featureless semantic measurements. The chapter concludes with experiments demonstrating the performance and wide applicability of the proposed clustering approach.

Author(s):  
Deepthi P. Hudedagaddi ◽  
B. K. Tripathy

With the increasing volume of data, developing techniques to handle it has become the need of the hour. One such efficient technique is clustering, an area under vigorous development. The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data, and several data clustering algorithms have been developed to this end. Because real-world data are often uncertain and vague, uncertainty-based and hybrid clustering algorithms such as fuzzy c-means, intuitionistic fuzzy c-means, rough c-means, and rough intuitionistic fuzzy c-means are used. Depending on the application and the nature of the data, clustering algorithms that adapt to the need are employed; these are variations of existing techniques tailored to a particular scenario. The area of adaptive clustering algorithms is largely unexplored and hence offers wide scope for research. Adaptive clustering algorithms are useful in areas where conditions keep changing. Several adaptive fuzzy c-means clustering algorithms are detailed in this chapter.
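As a reference point for the adaptive variants discussed above, the following is a minimal sketch of the standard fuzzy c-means loop that they extend (plain NumPy; parameter names are illustrative, not taken from the chapter):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard FCM: alternate centroid and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m                              # fuzzified memberships
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1))              # classic FCM membership formula
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return centroids, U_new
        U = U_new
    return centroids, U
```

Adaptive variants typically modify the membership update or the distance term as the data change, rather than running this fixed batch loop.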


Author(s):  
Rajit Nair ◽  
Amit Bhagat

In big data, clustering is the process through which analysis is performed, but because the data are so large, applying a clustering approach is very difficult. Big data commonly means petabytes or zettabytes of data, and a high computation cost is needed to build clusters over it. In this chapter, the authors show how clustering can be performed on big data and describe the different types of clustering approaches. The challenge in clustering is to group the observations within a practical time limit. The chapter also covers possible future paths for more advanced clustering algorithms, and addresses both single-machine and multiple-machine clustering, including parallel clustering.
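To make the single-machine versus multiple-machine distinction concrete, here is a hedged sketch of parallel k-means in the style the chapter describes: each worker computes local assignment statistics over its partition, and a coordinator merges them. This is plain NumPy over in-memory chunks; in a real deployment each chunk would live on a separate machine.

```python
import numpy as np

def partial_stats(chunk, centroids):
    """Worker step: assign points to nearest centroid, return local sums/counts."""
    labels = np.argmin(np.linalg.norm(chunk[:, None] - centroids[None], axis=2), axis=1)
    sums = np.zeros_like(centroids)
    counts = np.zeros(len(centroids))
    np.add.at(sums, labels, chunk)               # accumulate points per cluster
    np.add.at(counts, labels, 1)
    return sums, counts

def parallel_kmeans(chunks, centroids, n_iter=20):
    """Coordinator step: merge per-worker statistics once per round."""
    for _ in range(n_iter):
        stats = [partial_stats(c, centroids) for c in chunks]   # distributed in practice
        sums = sum(s for s, _ in stats)
        counts = sum(c for _, c in stats)
        centroids = sums / np.maximum(counts, 1)[:, None]       # avoid divide-by-zero
    return centroids
```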


2019 ◽  
Vol 29 (1) ◽  
pp. 1496-1513 ◽  
Author(s):  
Omkaresh Kulkarni ◽  
Sudarson Jena ◽  
C. H. Sanjay

Abstract The recent advancements in information technology and the web tend to increase the volume of data used in day-to-day life. The result is a big data era, and big data analysis has become a key research issue due to its complexity. This paper presents a technique called FPWhale-MRF for big data clustering using the MapReduce framework (MRF), proposing two clustering algorithms. In FPWhale-MRF, the mapper function estimates the cluster centroids using the Fractional Tangential-Spherical Kernel clustering algorithm, developed by integrating fractional theory into a Tangential-Spherical Kernel clustering approach. The reducer combines the mapper outputs to find the optimal centroids using the proposed Particle-Whale (P-Whale) algorithm. The P-Whale algorithm combines the Whale Optimization Algorithm with Particle Swarm Optimization to improve clustering performance. Two datasets, namely the localization and skin segmentation datasets, are used for experimentation, and performance is evaluated using two metrics: clustering accuracy and DB-index. The maximum accuracy attained by the proposed FPWhale-MRF technique is 87.91% and 90% for the localization and skin segmentation datasets, respectively, demonstrating its effectiveness in big data clustering.
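The mapper/reducer split described above can be sketched as follows. Both core routines (FTSK clustering in the mapper and P-Whale refinement in the reducer) are the paper's own proposals, so the bodies below are hedged stand-ins that only show the data flow, not the published algorithms.

```python
import numpy as np

def mapper(partition, k, seed=0):
    # Stand-in for Fractional Tangential-Spherical Kernel clustering:
    # here it just samples k candidate centroids from the partition.
    rng = np.random.default_rng(seed)
    return partition[rng.choice(len(partition), k, replace=False)]

def reducer(candidate_sets, data):
    # Stand-in for P-Whale optimisation: here it just keeps the candidate
    # centroid set with the lowest total within-cluster distance.
    def cost(c):
        return np.linalg.norm(data[:, None] - c[None], axis=2).min(axis=1).sum()
    return min(candidate_sets, key=cost)

def fpwhale_mrf_skeleton(partitions, k):
    candidates = [mapper(p, k, seed=i) for i, p in enumerate(partitions)]  # map phase
    return reducer(candidates, np.vstack(partitions))                      # reduce phase
```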


Author(s):  
Satyanand Singh ◽  
Pragya Singh

Forensic speaker recognition (FSR) is the process of determining whether the source of a questioned voice recording (trace) is a specific individual (suspected speaker). The role of the forensic expert is to testify, if possible using a quantitative measure, to the value of the voice evidence; using this information as an aid in judgments and decisions is up to the judge and/or the jury. Most existing methods measure inter-utterance similarities directly from spectrum-based characteristics, so the resulting clusters may relate not to speakers but to different acoustic classes. This research addresses this deficiency by projecting language-independent utterances into a reference space equipped to cover the standard voice features underlying the entire utterance set. The resulting projection vectors naturally represent the language-independent voice-like relationships among all the utterances and are therefore more robust against non-speaker interference. A clustering approach is then proposed based on peak approximation to maximize the similarities between language-independent utterances within all clusters. This method uses the K-medoid, Fuzzy C-means, Gustafson-Kessel, and Gath-Geva algorithms to evaluate the cluster to which each utterance should be allocated, overcoming the disadvantage of traditional hierarchical clustering that the final outcome cannot be guaranteed to reach optimum recognition efficiency. The recognition efficiencies of the K-medoid, Fuzzy C-means, Gustafson-Kessel, and Gath-Geva clustering algorithms are 95.2%, 97.3%, 98.5%, and 99.7%, and the EERs are 3.62%, 2.91%, 2.82%, and 2.61%, respectively. The EER improvement of the Gath-Geva based FSR system compared with Gustafson-Kessel and Fuzzy C-means is 8.04% and 11.49%, respectively.
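For reference, the equal error rate (EER) quoted above is the operating point where the false-acceptance and false-rejection rates coincide. A minimal, illustrative computation from genuine and impostor score sets (not the authors' evaluation code) might look like:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Find the threshold where false acceptance ~= false rejection."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false acceptance
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejection
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2

# Synthetic example: higher scores mean "more likely the same speaker".
rng = np.random.default_rng(0)
print(equal_error_rate(rng.normal(2, 1, 1000), rng.normal(0, 1, 1000)))
```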


2019 ◽  
Vol 8 (2S11) ◽  
pp. 3687-3693

Clustering is a mining process in which a data set is categorized into various subclasses. Clustering is essential in classification, grouping, exploratory pattern analysis, image segmentation, and decision making. Big data can be described as very large data sets that are examined computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions. Big data is essential for many organisations but is in some cases very complex to store and time-consuming to process. One way of overcoming these issues is to develop clustering methods, although these suffer from high complexity. Data mining is a technique by which useful information is extracted, but conventional data mining models cannot be utilized for big data because of its inherent complexity. The main scope here is to introduce an overview of the divisions of data clustering for big data and to explain some of the related work. This survey concentrates on research into several clustering algorithms that operate on the elements of big data, and gives a short overview of clustering algorithms grouped under partitioning, hierarchical, grid-based, and model-based approaches. Clustering is a major data mining task used for analyzing big data; the survey also examines the problems of applying clustering patterns to big data and the new issues that big data raises.
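Three of the four algorithm families named above map directly onto stock library calls, which makes the grouping concrete (an illustrative scikit-learn snippet; grid-based methods such as STING have no standard scikit-learn implementation):

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(500, 2))   # toy 2-D data

part  = KMeans(n_clusters=3, n_init=10).fit_predict(X)        # partitioning
hier  = AgglomerativeClustering(n_clusters=3).fit_predict(X)  # hierarchical
model = GaussianMixture(n_components=3).fit_predict(X)        # model-based
```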


Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 786
Author(s):  
Yenny Villuendas-Rey ◽  
Eley Barroso-Cubas ◽  
Oscar Camacho-Nieto ◽  
Cornelio Yáñez-Márquez

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally obtain adequate parameter values for these three modified algorithms, with the purpose of applying them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of the data.
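The skeleton shared by ABC-, Firefly-, and Bat-style clustering is to encode each agent as a full set of candidate centroids and to move agents through that space guided by a clustering objective. The sketch below shows only that generic loop; the paper's actual modification for missing values and mixed features is not reproduced here.

```python
import numpy as np

def fitness(centroids, X):
    """Total distance of each point to its nearest centroid (to minimise)."""
    return np.linalg.norm(X[:, None] - centroids[None], axis=2).min(axis=1).sum()

def swarm_clustering(X, k, n_agents=20, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Each agent is one candidate solution: a set of k centroids.
    agents = X[rng.choice(len(X), (n_agents, k))].astype(float)
    for _ in range(n_iter):
        costs = [fitness(a, X) for a in agents]
        best = agents[int(np.argmin(costs))].copy()
        # Attraction toward the best agent plus random exploration:
        # the common move shared by firefly/bat-style metaheuristics.
        agents += 0.5 * (best - agents) + 0.1 * rng.normal(size=agents.shape)
    return min(agents, key=lambda a: fitness(a, X))
```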


Author(s):  
R. R. Gharieb ◽  
G. Gendy ◽  
H. Selim

In this paper, the standard hard C-means (HCM) clustering approach to image segmentation is modified by incorporating weighted membership Kullback–Leibler (KL) divergence and local data information into the HCM objective function. The membership KL divergence, used for fuzzification, measures the proximity between each cluster membership function of a pixel and the locally-smoothed value of the membership in the pixel's vicinity. The fuzzification weight is a function of the pixel-to-cluster-center distances. The pixel-to-cluster-center distance used is composed of the original pixel data distance plus a fraction of the distance generated from the locally-smoothed pixel data. It is shown that the resulting membership function of a pixel is proportional to the locally-smoothed membership function of that pixel multiplied by an exponential function of the negative pixel distance relative to the minimum distance given by the nearest cluster center. Therefore, by incorporating the locally-smoothed membership and data information in addition to the relative distance, which is more tolerant to additive noise than the absolute distance, the proposed algorithm has a threefold noise-handling process. The presented algorithm, named local data and membership KL divergence based fuzzy C-means (LDMKLFCM), is tested on synthetic and real-world noisy images, and its results are compared with those of several FCM-based clustering algorithms.
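A hedged sketch of the membership update just described: each pixel's membership is taken proportional to its locally-smoothed membership times an exponential of the negative relative distance, then normalised. The smoothing window, weight parameter, and array layout are assumptions for illustration, not the published formulation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def membership_update(dist, U, weight=1.0):
    """dist, U: (H, W, C) pixel-to-centroid distances and memberships."""
    # Locally-smoothed membership over each pixel's neighbourhood (3x3 assumed).
    U_smooth = np.stack([uniform_filter(U[..., c], size=3)
                         for c in range(U.shape[-1])], axis=-1)
    rel = dist - dist.min(axis=-1, keepdims=True)     # distance relative to nearest centre
    U_new = U_smooth * np.exp(-rel / weight)          # smoothed membership x exp(-rel. distance)
    return U_new / U_new.sum(axis=-1, keepdims=True)  # normalise memberships per pixel
```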


Author(s):  
Carol Rivas ◽  
Ikuko Tomomatsu ◽  
David Gough

Background: This special issue examines the relationship between disability, evidence, and policy.

Key points: Several themes cut across the included papers. Despite the development of models of disability that recognise its socially constructed nature, dis/ableism impedes the involvement of people with disability in evidence production and use. The resultant incomplete representations of disability are biased towards its deproblematisation. Existing data often homogenise the heterogeneous. Functioning and impairment categories used for surveys, research recruitment, and policy enactments exclude many. Existing data may crudely evidence some systematic inequalities, but the successful and appropriate development and enactment of disability policies requires more contextual data. Categories and labels drawn from a deficit model affect social constructions of identity, and have been used socially and politically to justify the disenfranchisement of people with disability. Well rehearsed within welfare systems, this results in disempowered and devalued objects of policy and, as described in one Brazilian paper, the systematic breakup of indigenous families. Several studies show the dangers of policy developed without evidence and impact assessments from and with the intended beneficiaries.

Conclusions and implications: There is a need to mitigate barriers to inclusive participation, to enable people with disability to collaborate as equals with other policy actors. The combined application of different policy models and ontologies, currently in tension, might better harness their respective strengths and encourage greater transparency and deliberation regarding the flaws inherent in each. Learning should be shared across minority groups.

