scholarly journals Fault Diagnosis by Multisensor Data: A Data-Driven Approach Based on Spectral Clustering and Pairwise Constraints

Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7065
Author(s):  
Massimo Pacella ◽  
Gabriele Papadia

This paper deals with clustering based on feature selection of multisensor data in high-dimensional space. Spectral clustering algorithms are efficient tools in signal processing for grouping datasets sampled by multisensor systems for fault diagnosis. The effectiveness of spectral clustering stems from constructing an embedding space based on an affinity matrix. This matrix shows the pairwise similarity of the data points. Clustering is then obtained by determining the spectral decomposition of the Laplacian graph. In the manufacturing field, clustering is an essential strategy for fault diagnosis. In this study, an enhanced spectral clustering approach is presented, which is augmented with pairwise constraints, and that results in efficient identification of fault scenarios. The effectiveness of the proposed approach is described using a real case study about a diesel injection control system for fault detection.

2011 ◽  
Vol 201-203 ◽  
pp. 2517-2520
Author(s):  
Sen Xu ◽  
Tian Zhou ◽  
Hua Long Yu

Clustering combination has recently become a hotspot in machine learning, while its critical problem lies on how to combine multiple clusterings to yield a final superior result. In this paper, a low dimensional embedding method is proposed. It first obtains the low dimensional embeddings of hyperedges by performing spectral clustering algorithms and then obtains the low dimensional embeddings of objects indirectly by composition of mappings and finally performs K-means algorithm to cluster the objects according to their coordinates in the low dimensional space. Experimentally the proposed method is shown to perform well.


Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 596
Author(s):  
Krishna Kumar Sharma ◽  
Ayan Seal ◽  
Enrique Herrera-Viedma ◽  
Ondrej Krejcar

Calculating and monitoring customer churn metrics is important for companies to retain customers and earn more profit in business. In this study, a churn prediction framework is developed by modified spectral clustering (SC). However, the similarity measure plays an imperative role in clustering for predicting churn with better accuracy by analyzing industrial data. The linear Euclidean distance in the traditional SC is replaced by the non-linear S-distance (Sd). The Sd is deduced from the concept of S-divergence (SD). Several characteristics of Sd are discussed in this work. Assays are conducted to endorse the proposed clustering algorithm on four synthetics, eight UCI, two industrial databases and one telecommunications database related to customer churn. Three existing clustering algorithms—k-means, density-based spatial clustering of applications with noise and conventional SC—are also implemented on the above-mentioned 15 databases. The empirical outcomes show that the proposed clustering algorithm beats three existing clustering algorithms in terms of its Jaccard index, f-score, recall, precision and accuracy. Finally, we also test the significance of the clustering results by the Wilcoxon’s signed-rank test, Wilcoxon’s rank-sum test, and sign tests. The relative study shows that the outcomes of the proposed algorithm are interesting, especially in the case of clusters of arbitrary shape.


Author(s):  
Alessandro Beghi ◽  
Riccardo Brignoli ◽  
Luca Cecchinato ◽  
Gabriele Menegazzo ◽  
Mirco Rampazzo

2014 ◽  
Vol 687-691 ◽  
pp. 1350-1353
Author(s):  
Li Li Fu ◽  
Yong Li Liu ◽  
Li Jing Hao

Spectral clustering algorithm is a kind of clustering algorithm based on spectral graph theory. As spectral clustering has deep theoretical foundation as well as the advantage in dealing with non-convex distribution, it has received much attention in machine learning and data mining areas. The algorithm is easy to implement, and outperforms traditional clustering algorithms such as K-means algorithm. This paper aims to give some intuitions on spectral clustering. We describe different graph partition criteria, the definition of spectral clustering, and clustering steps, etc. Finally, in order to solve the disadvantage of spectral clustering, some improvements are introduced briefly.


2016 ◽  
Author(s):  
George Dimitriadis ◽  
Joana Neto ◽  
Adam R. Kampff

AbstractElectrophysiology is entering the era of ‘Big Data’. Multiple probes, each with hundreds to thousands of individual electrodes, are now capable of simultaneously recording from many brain regions. The major challenge confronting these new technologies is transforming the raw data into physiologically meaningful signals, i.e. single unit spikes. Sorting the spike events of individual neurons from a spatiotemporally dense sampling of the extracellular electric field is a problem that has attracted much attention [22, 23], but is still far from solved. Current methods still rely on human input and thus become unfeasible as the size of the data sets grow exponentially.Here we introduce the t-student stochastic neighbor embedding (t-sne) dimensionality reduction method [27] as a visualization tool in the spike sorting process. T-sne embeds the n-dimensional extracellular spikes (n = number of features by which each spike is decomposed) into a low (usually two) dimensional space. We show that such embeddings, even starting from different feature spaces, form obvious clusters of spikes that can be easily visualized and manually delineated with a high degree of precision. We propose that these clusters represent single units and test this assertion by applying our algorithm on labeled data sets both from hybrid [23] and paired juxtacellular/extracellular recordings [15]. We have released a graphical user interface (gui) written in python as a tool for the manual clustering of the t-sne embedded spikes and as a tool for an informed overview and fast manual curration of results from other clustering algorithms. Furthermore, the generated visualizations offer evidence in favor of the use of probes with higher density and smaller electrodes. They also graphically demonstrate the diverse nature of the sorting problem when spikes are recorded with different methods and arise from regions with different background spiking statistics.


Author(s):  
Parul Agarwal ◽  
Shikha Mehta

Subspace clustering approaches cluster high dimensional data in different subspaces. It means grouping the data with different relevant subsets of dimensions. This technique has become very effective as a distance measure becomes ineffective in a high dimensional space. This chapter presents a novel evolutionary approach to a bottom up subspace clustering SUBSPACE_DE which is scalable to high dimensional data. SUBSPACE_DE uses a self-adaptive DBSCAN algorithm to perform clustering in data instances of each attribute and maximal subspaces. Self-adaptive DBSCAN clustering algorithms accept input from differential evolution algorithms. The proposed SUBSPACE_DE algorithm is tested on 14 datasets, both real and synthetic. It is compared with 11 existing subspace clustering algorithms. Evaluation metrics such as F1_Measure and accuracy are used. Performance analysis of the proposed algorithms is considerably better on a success rate ratio ranking in both accuracy and F1_Measure. SUBSPACE_DE also has potential scalability on high dimensional datasets.


Author(s):  
Ping Deng ◽  
Qingkai Ma ◽  
Weili Wu

Clustering can be considered as the most important unsupervised learning problem. It has been discussed thoroughly by both statistics and database communities due to its numerous applications in problems such as classification, machine learning, and data mining. A summary of clustering techniques can be found in (Berkhin, 2002). Most known clustering algorithms such as DBSCAN (Easter, Kriegel, Sander, & Xu, 1996) and CURE (Guha, Rastogi, & Shim, 1998) cluster data points based on full dimensions. When the dimensional space grows higher, the above algorithms lose their efficiency and accuracy because of the so-called “curse of dimensionality”. It is shown in (Beyer, Goldstein, Ramakrishnan, & Shaft, 1999) that computing the distance based on full dimensions is not meaningful in high dimensional space since the distance of a point to its nearest neighbor approaches the distance to its farthest neighbor as dimensionality increases. Actually, natural clusters might exist in subspaces. Data points in different clusters may be correlated with respect to different subsets of dimensions. In order to solve this problem, feature selection (Kohavi & Sommerfield, 1995) and dimension reduction (Raymer, Punch, Goodman, Kuhn, & Jain, 2000) have been proposed to find the closely correlated dimensions for all the data and the clusters in such dimensions. Although both methods reduce the dimensionality of the space before clustering, the case where clusters may exist in different subspaces of full dimensions is not handled well. Projected clustering has been proposed recently to effectively deal with high dimensionalities. Finding clusters and their relevant dimensions are the objectives of projected clustering algorithms. Instead of projecting the entire dataset on the same subspace, projected clustering focuses on finding specific projection for each cluster such that the similarity is reserved as much as possible.


Author(s):  
Juanjuan Luo ◽  
Huadong Ma ◽  
Dongqing Zhou

Abstract Similarity matrix has a significant effect on the performance of the spectral clustering, and how to determine the neighborhood in the similarity matrix effectively is one of its main difficulties. In this paper, a “divide and conquer” strategy is proposed to model the similarity matrix construction task by adopting Multiobjective evolutionary algorithm (MOEA). The whole procedure is divided into two phases, phase I aims to determine the nonzero entries of the similarity matrix, and Phase II aims to determine the value of the nonzero entries of the similarity matrix. In phase I, the main contribution is that we model the task as a biobjective dynamic optimization problem, which optimizes the diversity and the similarity at the same time. It makes each individual determine one nonzero entry for each sample, and the encoding length decreases to O(N) in contrast with the non-ensemble multiobjective spectral clustering. In addition, a specific initialization operator and diversity preservation strategy are proposed during this phase. In phase II, three ensemble strategies are designed to determine the value of the nonzero value of the similarity matrix. Furthermore, this Pareto ensemble framework is extended to semi-supervised clustering by transforming the semi-supervised information to constraints. In contrast with the previous multiobjective evolutionary-based spectral clustering algorithms, the proposed Pareto ensemble-based framework makes a balance between time cost and the clustering accuracy, which is demonstrated in the experiments section.


Sign in / Sign up

Export Citation Format

Share Document