Fault Diagnosis by Multisensor Data: A Data-Driven Approach Based on Spectral Clustering and Pairwise Constraints

This paper deals with clustering based on feature selection of multisensor data in high-dimensional space. Spectral clustering algorithms are efficient tools in signal processing for grouping datasets sampled by multisensor systems for fault diagnosis. The effectiveness of spectral clustering stems from constructing an embedding space based on an affinity matrix. This matrix shows the pairwise similarity of the data points. Clustering is then obtained by determining the spectral decomposition of the Laplacian graph. In the manufacturing field, clustering is an essential strategy for fault diagnosis. In this study, an enhanced spectral clustering approach is presented, which is augmented with pairwise constraints, and that results in efficient identification of fault scenarios. The effectiveness of the proposed approach is described using a real case study about a diesel injection control system for fault detection.

Download Full-text

A Low Dimensional Embedding Method for Combining Clusterings

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.201-203.2517 ◽

2011 ◽

Vol 201-203 ◽

pp. 2517-2520

Author(s):

Sen Xu ◽

Tian Zhou ◽

Hua Long Yu

Keyword(s):

Machine Learning ◽

Spectral Clustering ◽

Dimensional Space ◽

Clustering Algorithms ◽

Critical Problem ◽

Embedding Method ◽

Superior Result ◽

Multiple Clusterings ◽

Low Dimensional

Clustering combination has recently become a hotspot in machine learning, while its critical problem lies on how to combine multiple clusterings to yield a final superior result. In this paper, a low dimensional embedding method is proposed. It first obtains the low dimensional embeddings of hyperedges by performing spectral clustering algorithms and then obtains the low dimensional embeddings of objects indirectly by composition of mappings and finally performs K-means algorithm to cluster the objects according to their coordinates in the low dimensional space. Experimentally the proposed method is shown to perform well.

Download Full-text

An Enhanced Spectral Clustering Algorithm with S-Distance

Symmetry ◽

10.3390/sym13040596 ◽

2021 ◽

Vol 13 (4) ◽

pp. 596

Author(s):

Krishna Kumar Sharma ◽

Ayan Seal ◽

Enrique Herrera-Viedma ◽

Ondrej Krejcar

Keyword(s):

Spectral Clustering ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Clustering Algorithms ◽

Rank Test ◽

Customer Churn ◽

Signed Rank ◽

Signed Rank Test ◽

Spectral Clustering Algorithm ◽

Industrial Databases

Calculating and monitoring customer churn metrics is important for companies to retain customers and earn more profit in business. In this study, a churn prediction framework is developed by modified spectral clustering (SC). However, the similarity measure plays an imperative role in clustering for predicting churn with better accuracy by analyzing industrial data. The linear Euclidean distance in the traditional SC is replaced by the non-linear S-distance (Sd). The Sd is deduced from the concept of S-divergence (SD). Several characteristics of Sd are discussed in this work. Assays are conducted to endorse the proposed clustering algorithm on four synthetics, eight UCI, two industrial databases and one telecommunications database related to customer churn. Three existing clustering algorithms—k-means, density-based spatial clustering of applications with noise and conventional SC—are also implemented on the above-mentioned 15 databases. The empirical outcomes show that the proposed clustering algorithm beats three existing clustering algorithms in terms of its Jaccard index, f-score, recall, precision and accuracy. Finally, we also test the significance of the clustering results by the Wilcoxon’s signed-rank test, Wilcoxon’s rank-sum test, and sign tests. The relative study shows that the outcomes of the proposed algorithm are interesting, especially in the case of clusters of arbitrary shape.

Download Full-text

A data-driven approach for fault diagnosis in HVAC chiller systems

2015 IEEE Conference on Control Applications (CCA) ◽

10.1109/cca.2015.7320737 ◽

2015 ◽

Cited By ~ 4

Author(s):

Alessandro Beghi ◽

Riccardo Brignoli ◽

Luca Cecchinato ◽

Gabriele Menegazzo ◽

Mirco Rampazzo

Keyword(s):

Fault Diagnosis ◽

Data Driven ◽

Data Driven Approach

Download Full-text

Research on Spectral Clustering

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.687-691.1350 ◽

2014 ◽

Vol 687-691 ◽

pp. 1350-1353

Author(s):

Li Li Fu ◽

Yong Li Liu ◽

Li Jing Hao

Keyword(s):

Spectral Clustering ◽

Clustering Algorithm ◽

Theoretical Foundation ◽

Clustering Algorithms ◽

Spectral Graph Theory ◽

Graph Partition ◽

Mining Areas ◽

Spectral Graph ◽

Definition Of ◽

Spectral Clustering Algorithm

Spectral clustering algorithm is a kind of clustering algorithm based on spectral graph theory. As spectral clustering has deep theoretical foundation as well as the advantage in dealing with non-convex distribution, it has received much attention in machine learning and data mining areas. The algorithm is easy to implement, and outperforms traditional clustering algorithms such as K-means algorithm. This paper aims to give some intuitions on spectral clustering. We describe different graph partition criteria, the definition of spectral clustering, and clustering steps, etc. Finally, in order to solve the disadvantage of spectral clustering, some improvements are introduced briefly.

Download Full-text

T-SNE visualization of large-scale neural recordings

10.1101/087395 ◽

2016 ◽

Cited By ~ 5

Author(s):

George Dimitriadis ◽

Joana Neto ◽

Adam R. Kampff

Keyword(s):

Large Scale ◽

New Technologies ◽

Dimensional Space ◽

Clustering Algorithms ◽

Brain Regions ◽

Data Sets ◽

Neural Recordings ◽

Sorting Problem ◽

Feature Spaces ◽

High Degree

AbstractElectrophysiology is entering the era of ‘Big Data’. Multiple probes, each with hundreds to thousands of individual electrodes, are now capable of simultaneously recording from many brain regions. The major challenge confronting these new technologies is transforming the raw data into physiologically meaningful signals, i.e. single unit spikes. Sorting the spike events of individual neurons from a spatiotemporally dense sampling of the extracellular electric field is a problem that has attracted much attention [22, 23], but is still far from solved. Current methods still rely on human input and thus become unfeasible as the size of the data sets grow exponentially.Here we introduce the t-student stochastic neighbor embedding (t-sne) dimensionality reduction method [27] as a visualization tool in the spike sorting process. T-sne embeds the n-dimensional extracellular spikes (n = number of features by which each spike is decomposed) into a low (usually two) dimensional space. We show that such embeddings, even starting from different feature spaces, form obvious clusters of spikes that can be easily visualized and manually delineated with a high degree of precision. We propose that these clusters represent single units and test this assertion by applying our algorithm on labeled data sets both from hybrid [23] and paired juxtacellular/extracellular recordings [15]. We have released a graphical user interface (gui) written in python as a tool for the manual clustering of the t-sne embedded spikes and as a tool for an informed overview and fast manual curration of results from other clustering algorithms. Furthermore, the generated visualizations offer evidence in favor of the use of probes with higher density and smaller electrodes. They also graphically demonstrate the diverse nature of the sorting problem when spikes are recorded with different methods and arise from regions with different background spiking statistics.

Download Full-text

Subspace Clustering of High Dimensional Data Using Differential Evolution

Nature-Inspired Algorithms for Big Data Frameworks - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-5852-1.ch003 ◽

2019 ◽

pp. 47-74 ◽

Cited By ~ 1

Author(s):

Parul Agarwal ◽

Shikha Mehta

Keyword(s):

Differential Evolution ◽

Distance Measure ◽

Dimensional Space ◽

Clustering Algorithms ◽

High Dimensional Data ◽

Subspace Clustering ◽

High Dimensional ◽

Dbscan Clustering ◽

Evolution Algorithms ◽

Self Adaptive

Subspace clustering approaches cluster high dimensional data in different subspaces. It means grouping the data with different relevant subsets of dimensions. This technique has become very effective as a distance measure becomes ineffective in a high dimensional space. This chapter presents a novel evolutionary approach to a bottom up subspace clustering SUBSPACE_DE which is scalable to high dimensional data. SUBSPACE_DE uses a self-adaptive DBSCAN algorithm to perform clustering in data instances of each attribute and maximal subspaces. Self-adaptive DBSCAN clustering algorithms accept input from differential evolution algorithms. The proposed SUBSPACE_DE algorithm is tested on 14 datasets, both real and synthetic. It is compared with 11 existing subspace clustering algorithms. Evaluation metrics such as F1_Measure and accuracy are used. Performance analysis of the proposed algorithms is considerably better on a success rate ratio ranking in both accuracy and F1_Measure. SUBSPACE_DE also has potential scalability on high dimensional datasets.

Download Full-text

Projected Clustering for Biological Data Analysis

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch247 ◽

2011 ◽

pp. 1617-1622

Author(s):

Ping Deng ◽

Qingkai Ma ◽

Weili Wu

Keyword(s):

Nearest Neighbor ◽

Dimensional Space ◽

Clustering Algorithms ◽

Biological Data ◽

High Dimensional ◽

Projected Clustering ◽

Cluster Data ◽

Biological Data Analysis ◽

Data Points ◽

Entire Dataset

Clustering can be considered as the most important unsupervised learning problem. It has been discussed thoroughly by both statistics and database communities due to its numerous applications in problems such as classification, machine learning, and data mining. A summary of clustering techniques can be found in (Berkhin, 2002). Most known clustering algorithms such as DBSCAN (Easter, Kriegel, Sander, & Xu, 1996) and CURE (Guha, Rastogi, & Shim, 1998) cluster data points based on full dimensions. When the dimensional space grows higher, the above algorithms lose their efficiency and accuracy because of the so-called “curse of dimensionality”. It is shown in (Beyer, Goldstein, Ramakrishnan, & Shaft, 1999) that computing the distance based on full dimensions is not meaningful in high dimensional space since the distance of a point to its nearest neighbor approaches the distance to its farthest neighbor as dimensionality increases. Actually, natural clusters might exist in subspaces. Data points in different clusters may be correlated with respect to different subsets of dimensions. In order to solve this problem, feature selection (Kohavi & Sommerfield, 1995) and dimension reduction (Raymer, Punch, Goodman, Kuhn, & Jain, 2000) have been proposed to find the closely correlated dimensions for all the data and the clusters in such dimensions. Although both methods reduce the dimensionality of the space before clustering, the case where clusters may exist in different subspaces of full dimensions is not handled well. Projected clustering has been proposed recently to effectively deal with high dimensionalities. Finding clusters and their relevant dimensions are the objectives of projected clustering algorithms. Instead of projecting the entire dataset on the same subspace, projected clustering focuses on finding specific projection for each cluster such that the similarity is reserved as much as possible.

Download Full-text

Knowledge-based and Data-driven Approach based Fault Diagnosis for Power-Electronics Energy Conversion System

2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm) ◽

10.1109/smartgridcomm.2019.8909719 ◽

2019 ◽

Cited By ~ 1

Author(s):

Chuang Liu ◽

Lei Kou ◽

Guo-wei Cai ◽

Jia-ning Zhou ◽

Yi-qun Meng ◽

...

Keyword(s):

Fault Diagnosis ◽

Energy Conversion ◽

Power Electronics ◽

Data Driven ◽

Knowledge Based ◽

Conversion System ◽

Data Driven Approach ◽

Energy Conversion System

Download Full-text

Research and development of spectral clustering algorithms

International Journal of Collaborative Intelligence ◽

10.1504/ijci.2016.084114 ◽

2016 ◽

Vol 1 (4) ◽

pp. 275

Author(s):

Ling Ding

Keyword(s):

Research And Development ◽

Spectral Clustering ◽

Clustering Algorithms

Download Full-text

A pareto ensemble based spectral clustering framework

Complex & Intelligent Systems ◽

10.1007/s40747-020-00215-7 ◽

2020 ◽

Author(s):

Juanjuan Luo ◽

Huadong Ma ◽

Dongqing Zhou

Keyword(s):

Phase I ◽

Phase Ii ◽

Spectral Clustering ◽

Clustering Algorithms ◽

Divide And Conquer ◽

Nonzero Entry ◽

Similarity Matrix ◽

Diversity Preservation ◽

Two Phases ◽

Matrix Construction

Abstract Similarity matrix has a significant effect on the performance of the spectral clustering, and how to determine the neighborhood in the similarity matrix effectively is one of its main difficulties. In this paper, a “divide and conquer” strategy is proposed to model the similarity matrix construction task by adopting Multiobjective evolutionary algorithm (MOEA). The whole procedure is divided into two phases, phase I aims to determine the nonzero entries of the similarity matrix, and Phase II aims to determine the value of the nonzero entries of the similarity matrix. In phase I, the main contribution is that we model the task as a biobjective dynamic optimization problem, which optimizes the diversity and the similarity at the same time. It makes each individual determine one nonzero entry for each sample, and the encoding length decreases to O(N) in contrast with the non-ensemble multiobjective spectral clustering. In addition, a specific initialization operator and diversity preservation strategy are proposed during this phase. In phase II, three ensemble strategies are designed to determine the value of the nonzero value of the similarity matrix. Furthermore, this Pareto ensemble framework is extended to semi-supervised clustering by transforming the semi-supervised information to constraints. In contrast with the previous multiobjective evolutionary-based spectral clustering algorithms, the proposed Pareto ensemble-based framework makes a balance between time cost and the clustering accuracy, which is demonstrated in the experiments section.

Download Full-text