SVDD-Based Pattern Denoising

The support vector data description (SVDD) is one of the best-known one-class support vector learning methods. It uses balls defined on the feature space to distinguish a set of normal data from all other possible abnormal objects. The major concern of this letter is to extend the main idea of SVDD to pattern denoising. Combining the geodesic projection to the spherical decision boundary resulting from the SVDD with a solution of the preimage problem, we propose a new method for pattern denoising. We first solve SVDD for the training data and then, for each noisy test pattern, obtain its denoised feature by moving its feature vector along the geodesic on the manifold to the nearest decision boundary of the SVDD ball. Finally, we find the location of the denoised pattern by obtaining the pre-image of the denoised feature. The applicability of the proposed method is illustrated on a number of toy and real-world data sets.
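For reference, the feature-space distance that SVDD thresholds can be written out as below. This is the standard SVDD formulation, not something taken from the letter itself. The symbols (kernel k, feature map φ, multipliers α_i, centre a, radius R, training patterns x_i, test pattern z) are notation assumed here, and the geodesic projection and pre-image steps are not reproduced.

```latex
\begin{align}
a &= \sum_{i} \alpha_i \,\phi(x_i), \qquad \alpha_i \ge 0,\quad \sum_{i} \alpha_i = 1, \\
d^2(z) &= \lVert \phi(z) - a \rVert^2
        = k(z,z) - 2\sum_{i} \alpha_i\, k(x_i, z) + \sum_{i,j} \alpha_i \alpha_j\, k(x_i, x_j), \\
z \text{ is treated as normal} &\iff d^2(z) \le R^2 .
\end{align}
```

Under this reading, a test pattern with d²(z) > R² has its feature vector outside the ball, which is presumably the case in which the projection to the decision boundary changes the feature.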


Towards Application of One-Class Classification Methods to Medical Data

The Scientific World JOURNAL ◽

10.1155/2014/730712 ◽

2014 ◽

Vol 2014 ◽

pp. 1-7 ◽

Cited By ~ 10

Author(s):

Itziar Irigoien ◽

Basilio Sierra ◽

Concepción Arenas

Keyword(s):

State Of The Art ◽

Gaussian Mixture ◽

Support Vector ◽

Support Vector Data Description ◽

Data Sets ◽

Biomedical Data ◽

Vector Data ◽

Target Class ◽

Tumor Recognition ◽

One Class Classification

In the problem of one-class classification (OCC), one of the classes, the target class, has to be distinguished from all other possible objects, considered as nontargets. This situation arises in many biomedical problems, for example, in diagnosis, image-based tumor recognition, or analysis of electrocardiogram data. In this paper an approach to OCC based on a typicality test is experimentally compared with reference state-of-the-art OCC techniques (Gaussian, mixture of Gaussians, naive Parzen, Parzen, and support vector data description) using biomedical data sets. We evaluate the ability of the procedures using twelve experimental data sets with not necessarily continuous data. As there are few benchmark data sets for one-class classification, all data sets considered in the evaluation have multiple classes. Each class in turn is considered as the target class and the units in the other classes are considered as new units to be classified. The results of the comparison show the good performance of the typicality approach, which is applicable to high-dimensional data; it is worth mentioning that it can be used for any kind of data (continuous, discrete, or nominal), whereas applying state-of-the-art approaches is not straightforward when nominal variables are present.
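The evaluation protocol described above (each class in turn as the target, all other classes as new units) can be sketched as follows. This is a minimal illustration, not the paper's procedure: the names `occ_splits`, `fit_centroid_ball`, and `accepts` are hypothetical, the toy data is invented, and the centroid-ball model is only a stand-in for whichever one-class method is being compared. It is neither the typicality test nor SVDD.

```python
from collections import defaultdict
from math import dist


def occ_splits(samples, labels):
    """Yield (target_label, target_units, nontarget_units), one class at a time."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    for target in by_class:
        nontargets = [x for y, xs in by_class.items() if y != target for x in xs]
        yield target, by_class[target], nontargets


def fit_centroid_ball(target_units, keep=0.9):
    """Hypothetical stand-in one-class model: a centroid plus a distance
    threshold chosen so that roughly `keep` of the training targets fall inside."""
    dim = len(target_units[0])
    centroid = [sum(x[d] for x in target_units) / len(target_units) for d in range(dim)]
    dists = sorted(dist(x, centroid) for x in target_units)
    radius = dists[min(len(dists) - 1, int(keep * len(dists)))]
    return centroid, radius


def accepts(model, x):
    """Return True if x is accepted as a target unit by the stand-in model."""
    centroid, radius = model
    return dist(x, centroid) <= radius


# Invented two-class toy data, used only to exercise the protocol.
samples = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]

results = {}
for target, targets, nontargets in occ_splits(samples, labels):
    model = fit_centroid_ball(targets, keep=1.0)
    rejected = sum(not accepts(model, x) for x in nontargets)
    results[target] = rejected / len(nontargets)  # fraction of nontargets rejected
```

The sketch only measures how many nontarget units are rejected; a fuller evaluation would also hold out some target units to measure how many targets are wrongly rejected.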


A STUDY ON COMBINING IMAGE REPRESENTATIONS FOR IMAGE CLASSIFICATION AND RETRIEVAL

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001404003459 ◽

2004 ◽

Vol 18 (05) ◽

pp. 867-890 ◽

Cited By ~ 27

Author(s):

CARMEN LAI ◽

DAVID M. J. TAX ◽

ROBERT P. W. DUIN ◽

ELŻBIETA PĘKALSKA ◽

PAVEL PACLÍK

Keyword(s):

Image Classification ◽

Feature Space ◽

Support Vector ◽

Support Vector Data Description ◽

Vector Data ◽

Retrieval Performance ◽

Mahalanobis Distances ◽

Similar Images ◽

Image Representations ◽

Cloud Of Points

A flexible description of images is offered by a cloud of points in a feature space. In the context of image retrieval such clouds can be represented in a number of ways. Two approaches are considered here. The first approach is based on the assumption of a normal distribution, hence homogeneous clouds, while the second one focuses on the boundary description, which is more suitable for multimodal clouds. The images are then compared either by using the Mahalanobis distance or by the support vector data description (SVDD), respectively. The paper investigates some possibilities of combining the image clouds based on the idea that responses of several cloud descriptions may convey a pattern specific to semantically similar images. A ranking of image dissimilarities is used as a comparison for two image databases targeting image classification and retrieval problems. We show that combining the SVDD descriptions improves the retrieval performance with respect to ranking, in contrast to the Mahalanobis case. Surprisingly, it turns out that the ranking of the Mahalanobis distances also works well for inhomogeneous images.


THE USE OF MACHINE LEARNING METHODS FOR BINARY CLASSIFICATION OF THE WORKING CONDITION OF BEARINGS USING THE SIGNALS OF VIBRATION ACCELERATION

Bulletin of National Technical University KhPI Series System Analysis Control and Information Technologies ◽

10.20998/2079-0023.2021.02.03 ◽

2021 ◽

pp. 15-22

Author(s):

Ruslan Babudzhan ◽

Konstantyn Isaienkov ◽

Danilo Krasiy ◽

Oleksii Vodka ◽

Ivan Zadorozhny ◽

...

Keyword(s):

Machine Learning ◽

Binary Classification ◽

Fractal Dimensions ◽

Feature Space ◽

Training Data ◽

Supervised Machine Learning ◽

Support Vector ◽

Data Sets ◽

Vibration Acceleration ◽

K Nearest Neighbors

The paper investigates the relationship between the vibration acceleration of bearings and their operational state. To determine these dependencies, a testbench was built and 112 experiments were carried out with different bearings: 100 bearings that developed an internal defect during operation and 12 bearings without a defect. From the obtained records, a dataset was formed, which was used to build classifiers. The dataset is freely available. A method for classifying new and used bearings was proposed, which consists in searching for dependencies and regularities of the signal using descriptive functions: statistical, entropy, fractal dimensions and others. In addition to processing the signal itself, the frequency domain of the bearing operation signal was also used to complement the feature space. The paper considered the possibility of generalizing the classification for its application on those signals that were not obtained in the course of laboratory experiments. An extraneous dataset was found in the public domain. This dataset was used to determine how accurate a classifier was when it was trained and tested on significantly different signals. Training and validation were carried out using the bootstrapping method to eradicate the effect of randomness, given the small amount of training data available. To estimate the quality of the classifiers, the F1-measure was used as the main metric due to the imbalance of the data sets. The following supervised machine learning methods were chosen as classifier models: logistic regression, support vector machine, random forest, and K nearest neighbors. The results are presented in the form of plots of density distribution and diagrams.
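As a rough illustration of why the F1-measure and bootstrapping fit an imbalanced, small data set, the sketch below computes F1 from binary labels and resamples the evaluated predictions with replacement to obtain a distribution of scores. The function names and the toy labels are assumptions made here. The paper bootstraps training and validation, whereas this simplified sketch resamples only an already-computed set of predictions.

```python
import random


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is zero."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def bootstrap_f1(y_true, y_pred, n_rounds=200, seed=0):
    """Resample (label, prediction) pairs with replacement and score each resample."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(f1_score([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return scores


# Invented imbalanced toy labels: four positives, one negative.
y_true = [1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 1]

point_estimate = f1_score(y_true, y_pred)  # TP=3, FP=1, FN=1
scores = bootstrap_f1(y_true, y_pred)
```

The spread of `scores` gives a sense of how much the metric varies with the sample, which is the kind of information a density plot of results would show.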


A pruned support vector data description-based outlier detection method: Applied to robust process monitoring

Transactions of the Institute of Measurement and Control ◽

10.1177/0142331220905951 ◽

2020 ◽

Vol 42 (11) ◽

pp. 2113-2126 ◽

Cited By ~ 2

Author(s):

Ping Yuan ◽

Zhizhong Mao ◽

Biao Wang

Keyword(s):

Process Monitoring ◽

Support Vector ◽

Support Vector Data Description ◽

Data Sets ◽

Vector Data ◽

Training Set ◽

Data Set ◽

Data Description ◽

One Class Classifier ◽

Comparative Results

Support vector data description (SVDD) is a boundary-based one-class classifier that has been widely used for process monitoring in recent years. However, in applications where databases are often contaminated by outliers, the performance of SVDD deteriorates, leading to a low detection rate. To this end, this paper proposes a pruned SVDD model in order to improve its robustness. In contrast to other robust SVDD models that are developed at the algorithmic level, we prune the basic SVDD at the data level. The rationale is to exclude as many outlier examples as possible from the final training set. Specifically, three different SVDD models are constructed successively with different training sets. The first model is used to extract target points by rejecting more suspect outlier examples. The second model is constructed using those extracted target points, and is used to recover some false outlier examples labeled by the first model. We build the third (final) model with the final training set consisting of the target examples from the first model and the false outlier examples recovered by the second model. We validate our proposed method on 20 benchmark data sets and the TE data set. Comparative results show that our pruned model could improve the robustness of SVDD more efficiently.


KERNEL WHITENING FOR ONE-CLASS CLASSIFICATION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800140300240x ◽

2003 ◽

Vol 17 (03) ◽

pp. 333-347 ◽

Cited By ~ 37

Author(s):

DAVID M. J. TAX ◽

PIOTR JUSZCZAK

Keyword(s):

Feature Space ◽

Support Vector ◽

Support Vector Data Description ◽

Good Representation ◽

Vector Data ◽

Data Description ◽

Unit Variance ◽

Good Distinction ◽

One Class Classification ◽

Target Data

In one-class classification one tries to describe a class of target data and to distinguish it from all other possible outlier objects. Obvious applications are areas where outliers are very diverse or very difficult or expensive to measure, such as in machine diagnostics or in medical applications. In order to have a good distinction between the target objects and the outliers, a good representation of the data is essential. The performance of many one-class classifiers critically depends on the scaling of the data and is often harmed by data distributions in (nonlinear) subspaces. This paper presents a simple preprocessing method which actively tries to map the data to a spherically symmetric cluster and is almost insensitive to data distributed in subspaces. It uses techniques from Kernel PCA to rescale the data in a kernel feature space to unit variance. The transformed data can then be described very well by the Support Vector Data Description, which basically fits a hypersphere around the data. The paper presents the methods and some preliminary experimental results.


Helicopter main reduction planetary gear fault diagnosis method based on SVDD

International Journal of Applied Electromagnetics and Mechanics ◽

10.3233/jae-209316 ◽

2020 ◽

Vol 64 (1-4) ◽

pp. 137-145

Author(s):

Yubin Xia ◽

Dakai Liang ◽

Guo Zheng ◽

Jingling Wang ◽

Jie Zeng

Keyword(s):

Fault Diagnosis ◽

Planetary Gear ◽

Gaussian Kernel ◽

Support Vector ◽

Support Vector Data Description ◽

Vector Data ◽

Energy Characteristics ◽

Gear Fault ◽

Channel Information ◽

Diagnosis Method

To address the irregularity of the fault characteristics of the helicopter main reducer planetary gear, a fault diagnosis method based on support vector data description (SVDD) is proposed. The working condition of the helicopter is complex and changeable, and the fault characteristics of the planetary gear also show irregularity as the working conditions change. It is impossible to diagnose the fault from the regularity of a single fault feature, so an SVDD method based on a Gaussian kernel function is used. The energy characteristics and fault characteristics of the running-state signal of the helicopter main reducer are combined and vector-quantized to characterize the planetary gear, and the multi-channel information is coupled at the same time, so that the operational state of the planetary gear can be characterized accurately.


MK-FSVM-SVDD: A Multiple Kernel-based Fuzzy SVM Model for Predicting DNA-binding Proteins via Support Vector Data Description

Current Bioinformatics ◽

10.2174/1574893615999200607173829 ◽

2020 ◽

Vol 15 ◽

Author(s):

Yi Zou ◽

Hongjie Wu ◽

Xiaoyi Guo ◽

Li Peng ◽

Yijie Ding ◽

...

Keyword(s):

Dna Binding ◽

Binding Proteins ◽

Detection Efficiency ◽

Dna Binding Proteins ◽

Support Vector ◽

Support Vector Data Description ◽

Vector Data ◽

Data Description ◽

Multiple Kernel ◽

Svm Model

Background: Detecting DNA-binding proteins (DBPs) based on biological and chemical methods is time consuming and expensive. Objective: In recent years, the rise of computational biology methods based on Machine Learning (ML) has greatly improved the detection efficiency of DBPs. Method: In this study, a Multiple Kernel-based Fuzzy SVM Model with Support Vector Data Description (MK-FSVM-SVDD) is proposed to predict DBPs. Firstly, six features are extracted from the protein sequence. Secondly, multiple kernels are constructed from these sequence features. Then, the multiple kernels are integrated by Centered Kernel Alignment-based Multiple Kernel Learning (CKA-MKL). Next, fuzzy membership scores of the training samples are calculated with Support Vector Data Description (SVDD). Finally, FSVM is trained and employed to detect new DBPs. Results: Our model is tested on several benchmark datasets. Compared with other methods, MK-FSVM-SVDD achieves the best Matthews Correlation Coefficient (MCC) on PDB186 (0.7250) and PDB2272 (0.5476). Conclusion: We can conclude that MK-FSVM-SVDD is more suitable than a common SVM as the classifier for identifying DNA-binding proteins.
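The Matthews Correlation Coefficient reported above is computed from the four cells of a binary confusion matrix. A minimal sketch follows; the function name `mcc` and the example counts are illustrative assumptions and are unrelated to the PDB186 or PDB2272 results.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal is empty and the ratio is undefined.
    """
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented counts covering the three reference cases.
perfect = mcc(tp=50, tn=50, fp=0, fn=0)      # every prediction correct
inverted = mcc(tp=0, tn=0, fp=50, fn=50)     # every prediction wrong
chance = mcc(tp=25, tn=25, fp=25, fn=25)     # no better than random guessing
```

MCC ranges from -1 to 1 and uses all four cells, which makes it a common choice when the positive and negative classes differ in size.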


Sparsity‐aware support vector data description reinforced by expectation maximization

Expert Systems ◽

10.1111/exsy.12794 ◽

2021 ◽

Author(s):

Mahdie Eghdami ◽

Hadi Sadoghi Yazdi ◽

Neshat Salehi

Keyword(s):

Expectation Maximization ◽

Support Vector ◽

Support Vector Data Description ◽

Vector Data ◽

Data Description


A novel key performance indicator oriented process monitoring method based on multiple information extraction and support vector data description

The Canadian Journal of Chemical Engineering ◽

10.1002/cjce.24227 ◽

2021 ◽

Author(s):

Xueyi Zhang ◽

Liang Ma ◽

Kaixiang Peng

Keyword(s):

Information Extraction ◽

Process Monitoring ◽

Performance Indicator ◽

Support Vector ◽

Support Vector Data Description ◽

Key Performance Indicator ◽

Vector Data ◽

Monitoring Method ◽

Data Description


MULFE: Multi-Label Learning via Label-Specific Feature Space Ensemble

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3451392 ◽

2021 ◽

Vol 16 (1) ◽

pp. 1-24

Author(s):

Yaojin Lin ◽

Qinghua Hu ◽

Jinghua Liu ◽

Xingquan Zhu ◽

Xindong Wu

Keyword(s):

Empirical Studies ◽

Feature Space ◽

Training Data ◽

Data Sets ◽

Learning Framework ◽

Feature Spaces ◽

Public Data ◽

Margin Distribution ◽

Label Correlations ◽

Label Correlation

In multi-label learning, label correlations commonly exist in the data. Such correlation not only provides useful information, but also imposes significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to explore label-specific features from the training data and to use features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the creation of the feature embedding space is only based on a single label, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), takes into consideration label-specific features, label correlation, and the weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label’s negative and positive instances, MULFE first creates features customized to each label. After that, MULFE utilizes the label correlation to optimize the margin distribution of the base classifiers, which are induced by the related label-specific feature spaces. By combining multiple label-specific features, label correlation based weighting, and ensemble learning, MULFE achieves the maximum margin multi-label classification goal through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.
