A STEP TOWARDS THE MAJORITY-BASED CLUSTERING VALIDATION DECISION FUSION METHOD

A variety of clustering validation indices (CVIs) have been proposed to validate the results of clustering analysis and to determine which clustering algorithm performs best. Different validation indices may be appropriate for different clustering algorithms or partition dissimilarity measures; however, the most suitable index to use in practice remains unknown. A single CVI is generally unable to handle the wide variability and scalability of the data and to cope successfully with all contexts. Therefore, one popular approach is to combine multiple CVIs and fuse their votes into a final decision. The aim of this work is to analyze the majority-based decision fusion method. The experimental work therefore consisted of designing and implementing the NbClust majority-based decision fusion method and then evaluating the performance of the CVIs with different clustering algorithms and dissimilarity measures in order to discover the best validation configuration. Moreover, the author proposed enhancing the standard majority-based decision fusion method with straightforward rules that maximize the efficiency of the validation procedure. The results showed that the enhanced method with an invasive validation configuration could cope with almost all data sets (99%) across different experimental factors (density, dimensionality, number of clusters, etc.).
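
The core of majority-based fusion is simple: each CVI proposes a number of clusters k, and the k with the most votes wins. The sketch below illustrates only that voting step under stated assumptions; it is not the author's enhanced method. The tie-breaking rule (prefer the smaller k) and the example votes are hypothetical.

```python
from collections import Counter

def majority_vote(proposals):
    """Return the number of clusters proposed by the most CVIs.

    proposals maps a CVI name to the k it selects. Ties are broken in
    favour of the smaller k (a hypothetical rule, not the paper's).
    """
    if not proposals:
        raise ValueError("at least one CVI vote is required")
    counts = Counter(proposals.values())
    best = max(counts.values())
    # Among the k values with the highest vote count, prefer the smallest.
    return min(k for k, c in counts.items() if c == best)

def vote_share(proposals):
    """Fraction of CVIs that agree with the majority decision."""
    k = majority_vote(proposals)
    return sum(1 for v in proposals.values() if v == k) / len(proposals)

if __name__ == "__main__":
    # Hypothetical votes from a few NbClust-style indices.
    votes = {"silhouette": 3, "ch": 3, "db": 4, "gap": 3, "dunn": 2}
    print(majority_vote(votes), vote_share(votes))  # 3 0.6
```

The vote share is a cheap confidence signal: a low share suggests the indices disagree and the decision deserves a closer look.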

Implementation of Energy HEED (ER-HEED) Protocol Using Super Cluster Head for WSN

Asian Journal of Computer Science and Technology ◽

10.51983/ajcst-2018.7.s1.1790 ◽

2018 ◽

Vol 7 (S1) ◽

pp. 119-122

Author(s):

G. Pattabirani ◽

K. Selvakumar

Keyword(s):

Energy Consumption ◽

Network Lifetime ◽

Energy Efficient ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Cluster Head ◽

Hybrid Energy ◽

Almost All ◽

Energy Efficient Clustering ◽

Super Cluster Head

Wireless sensor networks (WSNs) are used in almost all applications in developing environments, owing to their capabilities and ease of implementation across many applications. The most important criteria in a WSN are minimizing energy consumption and improving network lifetime. Clustering algorithms are considered one of the effective ways to improve network lifetime in WSNs. The Hybrid, Energy-Efficient and Distributed (HEED) clustering approach uses an energy-efficient clustering algorithm. This paper proposes an Enhanced Rotational HEED (ER-HEED) protocol that uses a super cluster head to minimize energy consumption and improve network lifetime. The proposed work is carried out in two stages: in the first stage, a super cluster head is introduced; in the second stage, the node with the maximum threshold is chosen as cluster head on rotation within the cluster. The results show that ER-HEED performs well compared with HEED and LEACH.
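
To make "the node with the maximum threshold is chosen as cluster head on rotation" concrete, here is a minimal sketch assuming the threshold is HEED's cluster-head probability, CHprob = max(Cprob * E_residual / E_max, p_min). The per-round energy costs and the choice to select heads purely by this threshold are hypothetical simplifications, not the ER-HEED protocol; the super cluster head stage is omitted.

```python
def heed_ch_prob(c_prob, e_residual, e_max, p_min=1e-4):
    """HEED cluster-head probability, floored at p_min."""
    return max(c_prob * e_residual / e_max, p_min)

def rotate_cluster_heads(energies, rounds, e_max, c_prob=0.05,
                         head_cost=0.2, member_cost=0.02):
    """Each round, pick the node with the maximum threshold as cluster head.

    Hypothetical energy model: the head spends head_cost and every other
    node spends member_cost per round. Because the head drains faster,
    leadership rotates among the nodes of the cluster.
    """
    energies = list(energies)
    heads = []
    for _ in range(rounds):
        thresholds = [heed_ch_prob(c_prob, e, e_max) for e in energies]
        head = max(range(len(energies)), key=thresholds.__getitem__)
        heads.append(head)
        for i in range(len(energies)):
            cost = head_cost if i == head else member_cost
            energies[i] = max(energies[i] - cost, 0.0)
    return heads, energies

if __name__ == "__main__":
    heads, left = rotate_cluster_heads([1.0, 0.9, 0.8], rounds=3, e_max=1.0)
    print(heads)  # [0, 1, 0]
```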

An Empirical Comparison of Latest Data Clustering Algorithms with State-of-the-Art

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v5.i2.pp410-415 ◽

2017 ◽

Vol 5 (2) ◽

pp. 410-415 ◽

Cited By ~ 4

Author(s):

Xianjin Shi ◽

Wanwan Wang ◽

Chongsheng Zhang

Keyword(s):

Data Clustering ◽

Spectral Clustering ◽

Clustering Algorithm ◽

State Of The Art ◽

Clustering Algorithms ◽

Density Peak ◽

The Past ◽

Overall Performance ◽

Public Datasets ◽

Clustering Validation

Over the past few decades, a great many data clustering algorithms have been developed, including K-Means, DBSCAN, Bi-Clustering and Spectral clustering. In recent years, two new data clustering algorithms have been proposed: affinity propagation (AP, 2007) and density peak based clustering (DP, 2014). In this work, we empirically compare the performance of these two latest data clustering algorithms with the state of the art, using 6 external and 2 internal clustering validation metrics. Our experimental results on 16 public datasets show that the two latest clustering algorithms, AP and DP, do not always outperform DBSCAN. Therefore, to find the best clustering algorithm for a specific dataset, AP, DP and DBSCAN should all be considered. Moreover, we find that the comparison of different clustering algorithms is closely related to the clustering evaluation metrics adopted. For instance, when the Silhouette clustering validation metric is used, the overall performance of K-Means is as good as that of AP and DP. This work offers a useful reference for researchers and engineers who need to select appropriate clustering algorithms for their specific applications.
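
Since the conclusions depend on the metric, it helps to see what the Silhouette internal metric actually computes. The sketch below is a plain-Python version of the standard definition, s(i) = (b(i) - a(i)) / max(a(i), b(i)), with Euclidean distance and singleton clusters scored 0; it is an illustration, not the evaluation code used in the study.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance.

    a(i): mean distance from point i to the other members of its cluster.
    b(i): smallest mean distance from point i to any other cluster.
    Points in singleton clusters get s(i) = 0 (the usual convention).
    """
    n = len(points)
    clusters = sorted(set(labels))
    if not 2 <= len(clusters) <= n - 1:
        raise ValueError("need between 2 and n-1 clusters")
    total = 0.0
    for i in range(n):
        sums = {c: 0.0 for c in clusters}
        sizes = {c: 0 for c in clusters}
        for j in range(n):
            if i != j:
                sums[labels[j]] += math.dist(points[i], points[j])
                sizes[labels[j]] += 1
        own = labels[i]
        if sizes[own] == 0:
            continue  # singleton cluster: s(i) = 0
        a = sums[own] / sizes[own]
        b = min(sums[c] / sizes[c] for c in clusters if c != own)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n
```

Well-separated, compact clusters score close to 1, while labelings that mix distant points score below 0.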

ClusterEnG: an interactive educational web resource for clustering and visualizing high-dimensional data

PeerJ Computer Science ◽

10.7717/peerj-cs.155 ◽

2018 ◽

Vol 4 ◽

pp. e155 ◽

Cited By ~ 3

Author(s):

Mohith Manjunath ◽

Yi Zhang ◽

Yeonsung Kim ◽

Steve H. Yeo ◽

Omar Sobh ◽

...

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Clustering Methods ◽

Web Interface ◽

Web Resource ◽

Interactive Visualizations ◽

Data Points ◽

Clustering Data ◽

Clustering Validation ◽

Intuitive Manner

Background: Clustering is one of the most common techniques in data analysis and seeks to group together data points that are similar in some measure. Although there are many computer programs available for performing clustering, a single web resource that provides several state-of-the-art clustering methods, interactive visualizations and evaluation of clustering results is lacking. Methods: ClusterEnG (acronym for Clustering Engine for Genomics) provides a web interface for clustering data and interactive visualizations including 3D views, data selection and zoom features. Eighteen clustering validation measures are also presented to aid the user in selecting a suitable algorithm for their dataset. ClusterEnG also aims at educating the user about the similarities and differences between various clustering algorithms and provides tutorials that demonstrate potential pitfalls of each algorithm. Conclusions: The web resource will be particularly useful to scientists who are not conversant with computing but want to understand the structure of their data in an intuitive manner. The validation measures facilitate the process of choosing a suitable clustering algorithm among the available options. ClusterEnG is part of a bigger project called KnowEnG (Knowledge Engine for Genomics) and is available at http://education.knoweng.org/clustereng.

A NOVEL DECISION FUSION METHOD BASED ON MULTI-SENSOR BEHAVIOR AND ITS APPLICATION FOR NETWORKED TARGET IDENTIFICATION

International Journal of Information Acquisition ◽

10.1142/s0219878907001277 ◽

2007 ◽

Vol 04 (03) ◽

pp. 185-192

Author(s):

ALI J. RASHIDI

Keyword(s):

Decision Making ◽

Target Detection ◽

Target Identification ◽

Decision Fusion ◽

The Other ◽

Final Decision ◽

Fusion Method ◽

Detection And Identification ◽

Long Time ◽

Behavior Based

In this paper, we propose a new decision fusion method based on the behaviors of multiple sensors, applied to target detection and identification in a network of distributed sensors. Each sensor has its own reliability, error rate and output data. Hence, in a processing and decision-making center that receives target data from different sensors and sources, the correctness and speed of the final decision depend on the data fusion method. The main purpose of this article is to extract, model and weight the long-time and temporary behavior functions of each data source and to use a precise and fast decision-making/fusion method. After the introduction, we review decision-level data fusion methods, such as voting schemes, rank-based methods and Bayesian inference. We then model the specific and functional features of each source in a distributed target detection and identification system using long-time and temporary behavior functions. On this basis, we introduce the behavior-based method as a new decision fusion method built on the long-time and temporary behaviors of the local decision makers. We observe that the results of the behavior-based method, which accounts for both the temporary and the long-time behaviors of the input decisions, are much closer to reality, and that its correctness in target identification is much higher than that of the other methods. Examples from target detection and identification systems comparing the new method with the other methods show that the behavior-based method has a distinctive capability in target detection, identification and producing a final decision without ambiguity.
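
As a rough illustration of how sensor behavior can change a fused decision, the sketch below applies reliability-weighted voting. It assumes that each sensor's weight is a simple blend, alpha * long-time reliability + (1 - alpha) * temporary reliability. This blending rule, the parameter alpha, and the target labels are hypothetical; this is not the paper's behavior functions.

```python
from collections import defaultdict

def behavior_weight(long_term, temporary, alpha=0.5):
    """Blend long-time and temporary reliability scores, both in [0, 1]."""
    return alpha * long_term + (1.0 - alpha) * temporary

def fuse_decisions(reports, alpha=0.5):
    """Weighted-vote decision fusion.

    reports: list of (label, long_term_reliability, temporary_reliability).
    Returns the label with the largest total weight and normalised scores.
    """
    scores = defaultdict(float)
    for label, long_term, temporary in reports:
        scores[label] += behavior_weight(long_term, temporary, alpha)
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all sensor weights are zero")
    best = max(scores, key=scores.get)
    return best, {label: s / total for label, s in scores.items()}

if __name__ == "__main__":
    reports = [("target_A", 0.95, 0.9), ("target_B", 0.4, 0.3), ("target_B", 0.5, 0.2)]
    print(fuse_decisions(reports)[0])  # target_A
```

A plain majority vote would pick target_B here (two sensors against one). The weighted vote picks target_A because the single dissenting sensor has a much better behavior record.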

Entropy-Based Multiview Data Clustering Analysis in the Era of Industry 4.0

Wireless Communications and Mobile Computing ◽

10.1155/2021/9963133 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Yi Gu ◽

Kang Li

Keyword(s):

Industry 4.0 ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Complex Data ◽

Single View ◽

The Face ◽

Fuzzy C Means Clustering ◽

Fuzzy Index ◽

Multiview Clustering ◽

Almost All

In the era of Industry 4.0, single-view clustering algorithms struggle to cope with complex data such as multiview data. In recent years, multiview clustering, an extension of traditional single-view clustering, has become increasingly popular. Although multiview clustering algorithms are more effective than single-view ones, almost all current multiview clustering algorithms have two weaknesses: (1) the current multiview collaborative clustering strategy lacks theoretical support, and (2) the weights of the views are simply averaged, so every view counts equally. To solve these problems, we used the Havrda-Charvat entropy and the fuzzy index to construct a new collaborative multiview fuzzy c-means clustering algorithm with fuzzy weighting, called Co-MVFCM. The experimental results show that Co-MVFCM achieves the best clustering performance among all the compared clustering algorithms.
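
Two building blocks named above can be shown in isolation. The first is the standard fuzzy c-means membership update with fuzzy index m, u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)). The second is the Havrda-Charvat entropy H_alpha(p) = (sum_i p_i^alpha - 1) / (2^(1-alpha) - 1). The sketch below does not reproduce Co-MVFCM's collaborative objective or its view weighting; it is only an illustration of these formulas.

```python
def fcm_memberships(distances, m=2.0):
    """Fuzzy c-means membership update for one data point.

    distances[i] is the distance from the point to cluster centre i;
    m > 1 is the fuzzy index: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)).
    """
    if m <= 1:
        raise ValueError("fuzzy index m must be > 1")
    zero = [i for i, d in enumerate(distances) if d == 0]
    if zero:
        # The point coincides with one or more centres: split membership there.
        return [1.0 / len(zero) if i in zero else 0.0 for i in range(len(distances))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in distances) for di in distances]

def havrda_charvat_entropy(probs, alpha=2.0):
    """Havrda-Charvat entropy (sum p_i^alpha - 1) / (2^(1-alpha) - 1).

    Requires alpha > 0 and alpha != 1; equals 1 for the uniform pair (0.5, 0.5).
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return (sum(p ** alpha for p in probs) - 1.0) / (2.0 ** (1.0 - alpha) - 1.0)

if __name__ == "__main__":
    u = fcm_memberships([1.0, 2.0], m=2.0)
    print(u, havrda_charvat_entropy(u))  # approximately [0.8, 0.2] and 0.64
```

Applied to a membership vector, the entropy measures how fuzzy a point's assignment is. Crisp memberships such as (1, 0) give 0.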

Handling WSD using Hierarchical Clustering Algorithm with sentences

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset1841120 ◽

2018 ◽

pp. 83-88

Author(s):

Mohana Priya K ◽

Pooja Ragavi S ◽

Krishna Priya G

Keyword(s):

Hierarchical Clustering ◽

Similarity Measure ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Cosine Similarity Measure ◽

Hierarchical Clustering Algorithm ◽

Multiple Levels ◽

Pos Tagger ◽

Sentence Clustering ◽

The Right

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes, and it is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels to improve accuracy. For tagging and stemming, a POS tagger and the Porter stemmer are used. The WordNet dictionary is used to determine similarity through the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, using a mean threshold. This paper incorporates many parameters for finding the similarity between words. To identify the disambiguated words, sense identification is performed for the adjectives and the results are compared. The SemCor and machine learning datasets are employed. Compared with previous WSD results, our work improves substantially, reaching 91.2%.
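
The cosine similarity measure and the idea of grouping by a mean threshold can be sketched with bag-of-words vectors. The sketch makes several assumptions: sentences are whitespace-tokenized, a pair is linked when its cosine similarity exceeds the mean over all pairs, and linked sentences are merged into connected groups. This is a simplified stand-in. It omits the paper's WordNet/Jiang-Conrath similarity and its hierarchical algorithm, and the example sentences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def group_by_mean_threshold(sentences):
    """Link pairs whose similarity exceeds the mean pairwise similarity,
    then return the connected groups of sentence indices."""
    bags = [Counter(s.lower().split()) for s in sentences]
    pairs = {(i, j): cosine(bags[i], bags[j])
             for i, j in combinations(range(len(bags)), 2)}
    threshold = sum(pairs.values()) / len(pairs) if pairs else 0.0
    parent = list(range(len(sentences)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), sim in pairs.items():
        if sim > threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(sentences)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

if __name__ == "__main__":
    docs = ["the bank raised interest rates", "interest rates at the bank rose",
            "we sat on the river bank", "the river bank was muddy"]
    print(group_by_mean_threshold(docs))  # [[0, 1], [2, 3]]
```

In this toy example the financial and river senses of "bank" fall into separate groups, which is the intuition behind using sentence clusters for disambiguation.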

DRSA: a non-hierarchical clustering algorithm using k-NN graph and its application in vegetation classification

Vegetation of Russia ◽

10.31111/vegrus/2015.27.125 ◽

2015 ◽

pp. 125-138 ◽

Cited By ~ 2

Author(s):

I. V. Goncharenko

Keyword(s):

Cluster Analysis ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Clustering Algorithms ◽

Protein Structures ◽

Hierarchical Cluster ◽

Vegetation Classification ◽

K Nearest Neighbor ◽

Neighbor Graph ◽

Nearest Neighbor Graph

In this article, we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later, the term “k-NN graph” and a few k-NN clustering algorithms appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an «excessive» graph, the so-called hypergraph, and then truncate it into subgraphs, simply partitioning and coarsening the hypergraph. We developed a different strategy, “upward” clustering, which forms (assembles sequentially) one cluster after another. Until now, graph-based cluster analysis has not been applied to the classification of vegetation datasets.
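
The sketch below illustrates the contrast between partitioning a large graph and assembling clusters one after another. It builds a Euclidean k-NN graph and then grows one cluster at a time from an unassigned seed, adding unassigned points that are mutual k-nearest neighbors of a current member. The mutual-neighbor growth rule and the seed order are hypothetical choices made for illustration. This is not the DRSA algorithm.

```python
import math

def knn_graph(points, k):
    """For each point, the set of indices of its k nearest neighbours."""
    n = len(points)
    graph = []
    for i in range(n):
        # Sort by (distance, index) so ties are broken deterministically.
        others = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        graph.append({j for _, j in others[:k]})
    return graph

def upward_clusters(points, k):
    """Assemble clusters sequentially from a k-NN graph (hypothetical sketch).

    A cluster starts at the lowest-index unassigned point and grows by
    absorbing unassigned mutual k-nearest neighbours of its members; when
    it can no longer grow, the next cluster is started.
    """
    graph = knn_graph(points, k)
    label = [None] * len(points)
    current = 0
    for seed in range(len(points)):
        if label[seed] is not None:
            continue
        label[seed] = current
        frontier = [seed]
        while frontier:
            u = frontier.pop()
            for v in graph[u]:
                if label[v] is None and u in graph[v]:  # mutual neighbours only
                    label[v] = current
                    frontier.append(v)
        current += 1
    return label

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(upward_clusters(pts, k=2))  # [0, 0, 0, 1, 1, 1]
```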

User Power Behavior Similarity Clustering Based on Unsupervised Extreme Learning Machine Algorithm

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) ◽

10.2174/2352096512666191004130655 ◽

2020 ◽

Vol 13 (5) ◽

pp. 641-649

Author(s):

Yuancheng Li ◽

Yaqi Cui ◽

Xiaolong Zhang

Keyword(s):

Extreme Learning Machine ◽

Clustering Algorithm ◽

Characteristic Curve ◽

Clustering Algorithms ◽

Data Sets ◽

Residential Areas ◽

Processing Power ◽

Learning Machine ◽

Advanced Metering ◽

Matlab Programming

Background: Advanced Metering Infrastructure (AMI) for the smart grid is growing rapidly, resulting in exponential growth of the data collected and transmitted by devices. Clustering these data can give electricity companies a better understanding of users' personalized and differentiated needs. Objective: Existing clustering algorithms for processing these data generally suffer from problems such as insufficient data utilization, high computational complexity and low accuracy of behavior recognition. Methods: To improve clustering accuracy, this paper proposes a new clustering method based on the electrical behavior of users. Starting with an analysis of user load characteristics, user electricity data samples were constructed. The daily load characteristic curve was extracted through an improved extreme learning machine clustering algorithm and effective index criteria. Moreover, clustering analysis was carried out for different users in industrial, commercial and residential areas. The improved extreme learning machine algorithm, called Unsupervised Extreme Learning Machine (US-ELM), is an extension and improvement of the original Extreme Learning Machine (ELM) that performs unsupervised clustering on the basis of the original ELM. Results: Experiments were run on four different data sets, and US-ELM was compared with other commonly used clustering algorithms implemented in MATLAB. The experimental results show that the US-ELM algorithm achieves higher accuracy in processing power data. Conclusion: The unsupervised ELM algorithm can greatly reduce time consumption and improve the effectiveness of clustering.
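
What ELM and its unsupervised extension share is a random, untrained hidden layer h(x) = g(Wx + b). The sketch below shows only that first stage, applied to daily load curves. The min-max normalization of each curve is a hypothetical preprocessing step, and the sampling range for W and b is an assumption. US-ELM's subsequent embedding step is not reproduced; it solves an eigenproblem involving a graph Laplacian before clustering. The load values in the usage example are made up.

```python
import math
import random

def sigmoid(z):
    """Logistic activation, split by sign to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def min_max_normalize(curve):
    """Scale one daily load curve to [0, 1] (hypothetical preprocessing)."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0 for _ in curve]
    return [(v - lo) / (hi - lo) for v in curve]

def elm_hidden_layer(samples, n_hidden, seed=0):
    """Random, untrained ELM hidden layer h(x) = sigmoid(W x + b).

    W and b are drawn once from U(-1, 1) (an assumed range) and never trained.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    return [
        [sigmoid(sum(w * x for w, x in zip(row, s)) + b) for row, b in zip(weights, biases)]
        for s in samples
    ]

if __name__ == "__main__":
    # Two hypothetical four-point daily load curves.
    curves = [min_max_normalize([3, 5, 9, 4]), min_max_normalize([1, 1, 2, 8])]
    print(elm_hidden_layer(curves, n_hidden=6))
```

Because the hidden layer is fixed, the only data-dependent computation in ELM-style methods happens after this mapping, which is where their speed advantage comes from.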

Pinball Loss Twin Support Vector Clustering

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3409264 ◽

2021 ◽

Vol 17 (2s) ◽

pp. 1-23

Author(s):

M. Tanveer ◽

Tarun Gupta ◽

Miten Shah ◽

Keyword(s):

Loss Function ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Structural Mri ◽

Twin Support Vector Machine ◽

Support Vector ◽

Support Vector Clustering ◽

Hinge Loss ◽

Pinball Loss ◽

Vector Clustering

Twin Support Vector Clustering (TWSVC) is a clustering algorithm inspired by the principles of the Twin Support Vector Machine (TWSVM). TWSVC has already outperformed other traditional plane-based clustering algorithms. However, TWSVC uses the hinge loss, which maximizes the shortest distance between clusters and hence suffers from noise sensitivity and low re-sampling stability. In this article, we propose Pinball loss Twin Support Vector Clustering (pinTSVC) as a clustering algorithm. The proposed pinTSVC model incorporates the pinball loss function into the plane clustering formulation. The pinball loss function introduces favorable properties such as noise insensitivity and re-sampling stability. The time complexity of pinTSVC remains equivalent to that of TWSVC. Extensive numerical experiments on noise-corrupted benchmark UCI and artificial datasets are provided. The results of the proposed pinTSVC model are compared with TWSVC, Twin Bounded Support Vector Clustering (TBSVC) and Fuzzy c-means clustering (FCM). Detailed and exhaustive comparisons demonstrate the better performance and generalization of pinTSVC on noise-corrupted datasets. Further experiments and analysis of the performance of the above-mentioned clustering algorithms on structural MRI (sMRI) images from the ADNI database, face clustering and facial expression clustering demonstrate the effectiveness and feasibility of the proposed pinTSVC model.
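
The difference between the two losses can be stated in a few lines. Let u be the margin-violation term, for example u = 1 - y f(x) in the SVM setting. The hinge loss max(0, u) ignores every point with u < 0. The pinball loss keeps a small penalty -tau * u on that side. As a result, the solution depends on more than the few closest points, and this is the source of the noise insensitivity described above. The sketch illustrates the loss functions only, not the pinTSVC optimization.

```python
def hinge_loss(u):
    """Hinge loss max(0, u), where u is the margin violation."""
    return max(0.0, u)

def pinball_loss(u, tau):
    """Pinball loss: u for u >= 0 and -tau * u for u < 0 (tau >= 0).

    With tau = 0 it reduces to the hinge loss.
    """
    return u if u >= 0 else -tau * u

if __name__ == "__main__":
    for u in (-0.5, 0.0, 2.0):
        print(u, hinge_loss(u), pinball_loss(u, tau=0.5))
```

For u = -0.5 the hinge loss is 0 while the pinball loss with tau = 0.5 is 0.25; for u >= 0 the two coincide.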

An Enhanced Spectral Clustering Algorithm with S-Distance

Symmetry ◽

10.3390/sym13040596 ◽

2021 ◽

Vol 13 (4) ◽

pp. 596

Author(s):

Krishna Kumar Sharma ◽

Ayan Seal ◽

Enrique Herrera-Viedma ◽

Ondrej Krejcar

Keyword(s):

Spectral Clustering ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Clustering Algorithms ◽

Rank Test ◽

Customer Churn ◽

Signed Rank ◽

Signed Rank Test ◽

Spectral Clustering Algorithm ◽

Industrial Databases

Calculating and monitoring customer churn metrics is important for companies to retain customers and earn more profit. In this study, a churn prediction framework is developed using modified spectral clustering (SC). The similarity measure plays a crucial role in clustering for predicting churn accurately from industrial data. The linear Euclidean distance in traditional SC is replaced by the non-linear S-distance (Sd), which is derived from the concept of S-divergence (SD). Several characteristics of Sd are discussed in this work. Experiments are conducted to validate the proposed clustering algorithm on four synthetic, eight UCI, two industrial and one telecommunications database related to customer churn. Three existing clustering algorithms (k-means, density-based spatial clustering of applications with noise and conventional SC) are also implemented on the above-mentioned 15 databases. The empirical outcomes show that the proposed clustering algorithm beats the three existing clustering algorithms in terms of Jaccard index, f-score, recall, precision and accuracy. Finally, we also test the significance of the clustering results with the Wilcoxon signed-rank test, the Wilcoxon rank-sum test and sign tests. The comparative study shows that the outcomes of the proposed algorithm are interesting, especially for clusters of arbitrary shape.
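
To show where a non-Euclidean distance enters spectral clustering, the sketch builds a Gaussian affinity matrix from an S-distance. The sketch assumes the S-divergence of positive definite matrices, S(X, Y) = log det((X + Y)/2) - (1/2) log det(XY), applied to strictly positive feature vectors treated as diagonal matrices. Under that assumption it reduces to a sum over coordinates of log((x_i + y_i)/2) - (1/2) log(x_i y_i). It also assumes Sd is the square root of S. The paper's exact definition of Sd and its choice of kernel width sigma may differ.

```python
import math

def s_divergence(x, y):
    """S-divergence of positive vectors viewed as diagonal matrices.

    Each term log((a + b) / 2) - 0.5 * log(a * b) is log(AM / GM) >= 0.
    """
    if any(v <= 0 for v in list(x) + list(y)):
        raise ValueError("S-divergence needs strictly positive entries")
    return sum(math.log((a + b) / 2.0) - 0.5 * math.log(a * b) for a, b in zip(x, y))

def s_distance(x, y):
    """Square root of the S-divergence; clamp tiny negative round-off to 0."""
    return math.sqrt(max(s_divergence(x, y), 0.0))

def affinity_matrix(points, sigma=1.0):
    """Gaussian affinities exp(-Sd^2 / (2 sigma^2)), zero diagonal, for SC."""
    n = len(points)
    return [
        [0.0 if i == j else math.exp(-s_distance(points[i], points[j]) ** 2 / (2.0 * sigma ** 2))
         for j in range(n)]
        for i in range(n)
    ]

if __name__ == "__main__":
    A = affinity_matrix([[1.0, 2.0], [2.0, 3.0], [5.0, 1.0]])
    for row in A:
        print([round(v, 3) for v in row])
```

The rest of spectral clustering (graph Laplacian, eigenvectors, k-means on the embedding) is unchanged; only the affinity computation depends on the chosen distance.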