Featureless Data Clustering

Author(s):  
Wilson Wong

Feature-based semantic measurements have played a dominant role in conventional data clustering algorithms across many existing applications. However, the applicability of existing data clustering approaches to a wider range of applications is limited by issues such as the complexity of semantic computation, the long pre-processing time required for feature preparation, and the poor extensibility of semantic measurement due to non-incremental feature sources. This chapter first summarises commonly used clustering algorithms and feature-based semantic measurements, then highlights their shortcomings to motivate an adaptive clustering approach based on featureless semantic measurements. The chapter concludes with experiments demonstrating the performance and wide applicability of the proposed clustering approach.

Author(s):  
Deepthi P. Hudedagaddi ◽  
B. K. Tripathy

With the increasing volume of data, developing techniques to handle it has become the need of the hour. One such efficient technique is clustering, an area under vigorous development. The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data, and several data clustering algorithms have been developed to this end. Because real-world data are often uncertain and vague, uncertainty-based and hybrid clustering algorithms such as fuzzy c-means, intuitionistic fuzzy c-means, rough c-means, and rough intuitionistic fuzzy c-means are used. Depending on the application and the nature of the data, clustering algorithms that adapt to the need are employed; these are variations of existing techniques tailored to a particular scenario. The area of adaptive clustering algorithms is largely unexplored and hence offers wide scope for research. Adaptive clustering algorithms are useful in areas where conditions keep changing. Several adaptive fuzzy c-means clustering algorithms are detailed in this chapter.
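As a reference point for the adaptive variants discussed above, the following is a minimal sketch of the standard fuzzy c-means loop that they extend (plain NumPy; parameter names are illustrative, not taken from the chapter):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard FCM: alternate centroid and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)            # memberships sum to 1 per point
    for _ in range(n_iter):
        Um = U ** m                              # fuzzified memberships
        centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1))              # classic FCM membership formula
        U_new = inv / inv.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            return centroids, U_new
        U = U_new
    return centroids, U
```

Adaptive variants typically modify the membership update or the distance term as the data change, rather than running this fixed batch loop.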


Author(s):  
Rajit Nair ◽  
Amit Bhagat

In big data, clustering is the process through which analysis is performed, but because the data are so large, applying a clustering approach is very difficult. Big data commonly means petabytes or zettabytes of data, and a high computation cost is needed to build clusters over it. In this chapter, the authors show how clustering can be performed on big data and describe the different types of clustering approaches. The challenge in clustering is to group the observations within a practical time limit. The chapter also covers possible future paths for more advanced clustering algorithms, and addresses both single-machine and multiple-machine clustering, including parallel clustering.
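To make the single-machine versus multiple-machine distinction concrete, here is a hedged sketch of parallel k-means in the style the chapter describes: each worker computes local assignment statistics over its partition, and a coordinator merges them. This is plain NumPy over in-memory chunks; in a real deployment each chunk would live on a separate machine.

```python
import numpy as np

def partial_stats(chunk, centroids):
    """Worker step: assign points to nearest centroid, return local sums/counts."""
    labels = np.argmin(np.linalg.norm(chunk[:, None] - centroids[None], axis=2), axis=1)
    sums = np.zeros_like(centroids)
    counts = np.zeros(len(centroids))
    np.add.at(sums, labels, chunk)               # accumulate points per cluster
    np.add.at(counts, labels, 1)
    return sums, counts

def parallel_kmeans(chunks, centroids, n_iter=20):
    """Coordinator step: merge per-worker statistics once per round."""
    for _ in range(n_iter):
        stats = [partial_stats(c, centroids) for c in chunks]   # distributed in practice
        sums = sum(s for s, _ in stats)
        counts = sum(c for _, c in stats)
        centroids = sums / np.maximum(counts, 1)[:, None]       # avoid divide-by-zero
    return centroids
```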


2019 ◽  
Vol 29 (1) ◽  
pp. 1496-1513 ◽  
Author(s):  
Omkaresh Kulkarni ◽  
Sudarson Jena ◽  
C. H. Sanjay

Abstract The recent advancements in information technology and the web tend to increase the volume of data used in day-to-day life. The result is a big data era, and big data analysis has become a key research issue due to its complexity. This paper presents a technique called FPWhale-MRF for big data clustering using the MapReduce framework (MRF), proposing two clustering algorithms. In FPWhale-MRF, the mapper function estimates the cluster centroids using the Fractional Tangential-Spherical Kernel clustering algorithm, developed by integrating fractional theory into a Tangential-Spherical Kernel clustering approach. The reducer combines the mapper outputs to find the optimal centroids using the proposed Particle-Whale (P-Whale) algorithm. The P-Whale algorithm combines the Whale Optimization Algorithm with Particle Swarm Optimization to improve clustering performance. Two datasets, namely the localization and skin segmentation datasets, are used for experimentation, and performance is evaluated using two metrics: clustering accuracy and DB-index. The maximum accuracy attained by the proposed FPWhale-MRF technique is 87.91% and 90% for the localization and skin segmentation datasets, respectively, demonstrating its effectiveness in big data clustering.
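The mapper/reducer split described above can be sketched as follows. Both core routines (FTSK clustering in the mapper and P-Whale refinement in the reducer) are the paper's own proposals, so the bodies below are hedged stand-ins that only show the data flow, not the published algorithms.

```python
import numpy as np

def mapper(partition, k, seed=0):
    # Stand-in for Fractional Tangential-Spherical Kernel clustering:
    # here it just samples k candidate centroids from the partition.
    rng = np.random.default_rng(seed)
    return partition[rng.choice(len(partition), k, replace=False)]

def reducer(candidate_sets, data):
    # Stand-in for P-Whale optimisation: here it just keeps the candidate
    # centroid set with the lowest total within-cluster distance.
    def cost(c):
        return np.linalg.norm(data[:, None] - c[None], axis=2).min(axis=1).sum()
    return min(candidate_sets, key=cost)

def fpwhale_mrf_skeleton(partitions, k):
    candidates = [mapper(p, k, seed=i) for i, p in enumerate(partitions)]  # map phase
    return reducer(candidates, np.vstack(partitions))                      # reduce phase
```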


Author(s):  
Satyanand Singh ◽  
Pragya Singh

Forensic speaker recognition (FSR) is the process of determining whether the source of a questioned voice recording (trace) is a specific individual (suspected speaker). The role of the forensic expert is to testify, if possible using a quantitative measure, to the value of the voice evidence; using this information as an aid in judgments and decisions is up to the judge and/or the jury. Most existing methods measure inter-utterance similarities directly from spectrum-based characteristics, so the resulting clusters may relate not to speakers but to different acoustic classes. This research addresses this deficiency by projecting language-independent utterances into a reference space equipped to cover the standard voice features underlying the entire utterance set. The resulting projection vectors naturally represent the language-independent voice-like relationships among all the utterances and are therefore more robust against non-speaker interference. A clustering approach is then proposed based on peak approximation to maximize the similarities between language-independent utterances within all clusters. This method uses the K-medoid, Fuzzy C-means, Gustafson-Kessel, and Gath-Geva algorithms to evaluate the cluster to which each utterance should be allocated, overcoming the disadvantage of traditional hierarchical clustering that the final outcome cannot be guaranteed to reach optimum recognition efficiency. The recognition efficiencies of the K-medoid, Fuzzy C-means, Gustafson-Kessel, and Gath-Geva clustering algorithms are 95.2%, 97.3%, 98.5%, and 99.7%, and the EERs are 3.62%, 2.91%, 2.82%, and 2.61%, respectively. The EER improvement of the Gath-Geva based FSR system compared with Gustafson-Kessel and Fuzzy C-means is 8.04% and 11.49%, respectively.
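For reference, the equal error rate (EER) quoted above is the operating point where the false-acceptance and false-rejection rates coincide. A minimal, illustrative computation from genuine and impostor score sets (not the authors' evaluation code) might look like:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Find the threshold where false acceptance ~= false rejection."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false acceptance
    frr = np.array([(genuine < t).mean() for t in thresholds])    # false rejection
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2

# Synthetic example: higher scores mean "more likely the same speaker".
rng = np.random.default_rng(0)
print(equal_error_rate(rng.normal(2, 1, 1000), rng.normal(0, 1, 1000)))
```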


2019 ◽  
Vol 8 (2S11) ◽  
pp. 3687-3693

Clustering is a mining process in which a data set is categorized into various subclasses. Clustering is essential in classification, grouping, exploratory pattern analysis, image segmentation, and decision making. Big data can be described as very large data sets that are examined computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions. Big data is essential for many organisations but is in some cases very complex to store and time-consuming to process. One way of overcoming these issues is to develop clustering methods, although these suffer from high complexity. Data mining is a technique by which useful information is extracted, but conventional data mining models cannot be utilized for big data because of its inherent complexity. The main scope here is to introduce an overview of the divisions of data clustering for big data and to explain some of the related work. This survey concentrates on research into several clustering algorithms that operate on the elements of big data, and gives a short overview of clustering algorithms grouped under partitioning, hierarchical, grid-based, and model-based approaches. Clustering is a major data mining task used for analyzing big data; the survey also examines the problems of applying clustering patterns to big data and the new issues that big data raises.
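Three of the four algorithm families named above map directly onto stock library calls, which makes the grouping concrete (an illustrative scikit-learn snippet; grid-based methods such as STING have no standard scikit-learn implementation):

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(500, 2))   # toy 2-D data

part  = KMeans(n_clusters=3, n_init=10).fit_predict(X)        # partitioning
hier  = AgglomerativeClustering(n_clusters=3).fit_predict(X)  # hierarchical
model = GaussianMixture(n_components=3).fit_predict(X)        # model-based
```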


Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 786
Author(s):  
Yenny Villuendas-Rey ◽  
Eley Barroso-Cubas ◽  
Oscar Camacho-Nieto ◽  
Cornelio Yáñez-Márquez

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally obtain adequate parameter values for these three modified algorithms, with the purpose of applying them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of the data.
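The skeleton shared by ABC-, Firefly-, and Bat-style clustering is to encode each agent as a full set of candidate centroids and to move agents through that space guided by a clustering objective. The sketch below shows only that generic loop; the paper's actual modification for missing values and mixed features is not reproduced here.

```python
import numpy as np

def fitness(centroids, X):
    """Total distance of each point to its nearest centroid (to minimise)."""
    return np.linalg.norm(X[:, None] - centroids[None], axis=2).min(axis=1).sum()

def swarm_clustering(X, k, n_agents=20, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Each agent is one candidate solution: a set of k centroids.
    agents = X[rng.choice(len(X), (n_agents, k))].astype(float)
    for _ in range(n_iter):
        costs = [fitness(a, X) for a in agents]
        best = agents[int(np.argmin(costs))].copy()
        # Attraction toward the best agent plus random exploration:
        # the common move shared by firefly/bat-style metaheuristics.
        agents += 0.5 * (best - agents) + 0.1 * rng.normal(size=agents.shape)
    return min(agents, key=lambda a: fitness(a, X))
```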


Author(s):  
R. R. Gharieb ◽  
G. Gendy ◽  
H. Selim

In this paper, the standard hard C-means (HCM) clustering approach to image segmentation is modified by incorporating weighted membership Kullback–Leibler (KL) divergence and local data information into the HCM objective function. The membership KL divergence, used for fuzzification, measures the proximity between each cluster membership function of a pixel and the locally-smoothed value of the membership in the pixel's vicinity. The fuzzification weight is a function of the pixel-to-cluster-center distances. The pixel-to-cluster-center distance used is composed of the original pixel data distance plus a fraction of the distance generated from the locally-smoothed pixel data. It is shown that the resulting membership function of a pixel is proportional to the locally-smoothed membership function of that pixel multiplied by an exponential function of the negative pixel distance relative to the minimum distance given by the nearest cluster center. Therefore, by incorporating the locally-smoothed membership and data information in addition to the relative distance, which is more tolerant to additive noise than the absolute distance, the proposed algorithm has a threefold noise-handling process. The presented algorithm, named local data and membership KL divergence based fuzzy C-means (LDMKLFCM), is tested on synthetic and real-world noisy images, and its results are compared with those of several FCM-based clustering algorithms.
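A hedged sketch of the membership update just described: each pixel's membership is taken proportional to its locally-smoothed membership times an exponential of the negative relative distance, then normalised. The smoothing window, weight parameter, and array layout are assumptions for illustration, not the published formulation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def membership_update(dist, U, weight=1.0):
    """dist, U: (H, W, C) pixel-to-centroid distances and memberships."""
    # Locally-smoothed membership over each pixel's neighbourhood (3x3 assumed).
    U_smooth = np.stack([uniform_filter(U[..., c], size=3)
                         for c in range(U.shape[-1])], axis=-1)
    rel = dist - dist.min(axis=-1, keepdims=True)     # distance relative to nearest centre
    U_new = U_smooth * np.exp(-rel / weight)          # smoothed membership x exp(-rel. distance)
    return U_new / U_new.sum(axis=-1, keepdims=True)  # normalise memberships per pixel
```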


Author(s):  
Carol Rivas ◽  
Ikuko Tomomatsu ◽  
David Gough

Background: This special issue examines the relationship between disability, evidence, and policy.

Key points: Several themes cut across the included papers. Despite the development of models of disability that recognise its socially constructed nature, dis/ableism impedes the involvement of people with disability in evidence production and use. The resultant incomplete representations of disability are biased towards its deproblematisation. Existing data often homogenise the heterogeneous. Functioning and impairment categories used for surveys, research recruitment, and policy enactments exclude many. Existing data may crudely evidence some systematic inequalities, but the successful and appropriate development and enactment of disability policies requires more contextual data. Categories and labels drawn from a deficit model affect social constructions of identity, and have been used socially and politically to justify the disenfranchisement of people with disability. Well rehearsed within welfare systems, this results in disempowered and devalued objects of policy and, as described in one Brazilian paper, the systematic breakup of indigenous families. Several studies show the dangers of policy developed without evidence and impact assessments from and with the intended beneficiaries.

Conclusions and implications: There is a need to mitigate barriers to inclusive participation, to enable people with disability to collaborate as equals with other policy actors. The combined application of different policy models and ontologies, currently in tension, might better harness their respective strengths and encourage greater transparency and deliberation regarding the flaws inherent in each. Learning should be shared across minority groups.

