Dimensionality Reduction Method of Training Sample Set for SVDD Based on Statistical Information

2012 ◽  
Vol 220-223 ◽  
pp. 2097-2101 ◽  
Author(s):  
Mian Wu ◽  
Bin Chen ◽  
Bao Cheng Gao ◽  
Xiao Bin Cheng ◽  
Zhao Li Yan

To address the high time complexity of support vector data description (SVDD) training, a low-complexity SVDD algorithm is proposed that introduces a statistical information grid (STING). The algorithm applies STING division to the sample space using the distribution characteristics of support vectors and the kernel distances between samples and the sphere’s centre. Based on the position information of the sample points, it rejects non-support vectors and obtains a simplified sample set. The results show that the proposed method reduces training scale and time without decreasing classification accuracy.
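The kernel distance between a sample and the sphere's centre can be illustrated with the standard SVDD expression d²(z) = K(z,z) − 2 Σᵢ αᵢ K(xᵢ,z) + Σᵢⱼ αᵢ αⱼ K(xᵢ,xⱼ). The following is a minimal sketch only, not the STING-based algorithm of the paper: the Gaussian kernel, the toy points, the function names, and the uniform weights `alpha` are all assumptions made for illustration (in a trained SVDD the weights come from solving the dual problem).

```python
import math

def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_distance_sq(z, samples, alpha, sigma=1.0):
    """Squared kernel distance from z to the centre a = sum_i alpha_i * phi(x_i)."""
    k_zz = gaussian_kernel(z, z, sigma)
    cross = sum(a_i * gaussian_kernel(x_i, z, sigma)
                for a_i, x_i in zip(alpha, samples))
    centre_norm = sum(a_i * a_j * gaussian_kernel(x_i, x_j, sigma)
                      for a_i, x_i in zip(alpha, samples)
                      for a_j, x_j in zip(alpha, samples))
    return k_zz - 2.0 * cross + centre_norm

# Hypothetical toy data: four points on a unit square.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Assumption: uniform weights stand in for the trained SVDD multipliers.
alpha = [1.0 / len(samples)] * len(samples)

d_in = kernel_distance_sq((0.5, 0.5), samples, alpha)   # point inside the cluster
d_out = kernel_distance_sq((5.0, 5.0), samples, alpha)  # point far from the cluster
```

A sample far from the training data gets a larger kernel distance than one in the middle of it, which is the quantity a reduction scheme can use to decide which samples are unlikely to be support vectors.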

2015 ◽  
Vol 2015 ◽  
pp. 1-10
Author(s):  
Hui Yi ◽  
Zehui Mao ◽  
Bin Jiang ◽  
Cuimei Bo ◽  
Yufang Liu ◽  
...  

Faulty samples are much harder to acquire than normal samples, especially in complicated systems. This leads to incomplete training sample types and, in turn, to a decrease in diagnostic accuracy. In this paper, the relationship between sample-type incompleteness and classifier-based diagnostic accuracy is discussed first. Then, a support vector data description-based approach that takes the effects of sample-type incompleteness into consideration is proposed to refine the construction of fault regions and increase diagnostic accuracy when sample types are incomplete. The effectiveness of the proposed method was validated on both a Gaussian distributed dataset and a practical dataset. Satisfactory results have been obtained.


2014 ◽  
Vol 716-717 ◽  
pp. 860-863
Author(s):  
Xiao Yu Zhang ◽  
Zhen Wei Wei ◽  
Xiao Lin

To address the high dimensionality of sample sets in intrusion detection, an optimized support vector data description (SVDD) method based on particle swarm optimization (PSO) is proposed and applied to detecting network anomalies. The method uses PSO to eliminate superfluous parameters in SVDD and to reduce the dimensionality of the data; it then builds a hypersphere model to detect network intrusion data and output the detection results. Simulation experiments on the standard KDD CUP 99 detection data set show that, compared with traditional SVDD, the method effectively improves the detection rate with less computation.


2020 ◽  
Vol 64 (1-4) ◽  
pp. 137-145
Author(s):  
Yubin Xia ◽  
Dakai Liang ◽  
Guo Zheng ◽  
Jingling Wang ◽  
Jie Zeng

To address the irregularity of the fault characteristics of the planetary gear in a helicopter main reducer, a fault diagnosis method based on support vector data description (SVDD) is proposed. The working conditions of a helicopter are complex and changeable, and the fault characteristics of the planetary gear vary irregularly with them. Because the fault cannot be diagnosed from the regularity of a single fault feature, an SVDD method with a Gaussian kernel function is used. The energy characteristics and fault characteristics of the running-state signal of the helicopter main reducer are concatenated and vector-quantized to characterize the planetary gear, and multi-channel information is coupled at the same time, so that the operational state of the planetary gear is characterized accurately.


2020 ◽  
Vol 15 ◽  
Author(s):  
Yi Zou ◽  
Hongjie Wu ◽  
Xiaoyi Guo ◽  
Li Peng ◽  
Yijie Ding ◽  
...  

Background: Detecting DNA-binding proteins (DBPs) based on biological and chemical methods is time-consuming and expensive. Objective: In recent years, the rise of computational biology methods based on Machine Learning (ML) has greatly improved the detection efficiency of DBPs. Method: In this study, a Multiple Kernel-based Fuzzy SVM Model with Support Vector Data Description (MK-FSVM-SVDD) is proposed to predict DBPs. Firstly, six features are extracted from the protein sequence. Secondly, multiple kernels are constructed from these sequence features. Then, the multiple kernels are integrated by Centered Kernel Alignment-based Multiple Kernel Learning (CKA-MKL). Next, fuzzy membership scores of the training samples are calculated with Support Vector Data Description (SVDD). Finally, the FSVM is trained and employed to detect new DBPs. Results: Our model is tested on several benchmark datasets. Compared with other methods, MK-FSVM-SVDD achieves the best Matthews Correlation Coefficient (MCC) on PDB186 (0.7250) and PDB2272 (0.5476). Conclusion: We can conclude that MK-FSVM-SVDD is more suitable than a common SVM as the classifier for identifying DNA-binding proteins.
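For readers unfamiliar with the reported metric, the Matthews Correlation Coefficient can be computed from a binary confusion matrix as MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below shows only this standard formula; the function name and the confusion counts are hypothetical and are not taken from the PDB186 or PDB2272 experiments. Returning 0 when the denominator is zero is a common convention, assumed here.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined MCC is reported as 0.
        return 0.0
    return numerator / denominator

# Hypothetical counts, for illustration only.
perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)    # every prediction correct
chance = matthews_corrcoef(tp=25, tn=25, fp=25, fn=25)   # no better than chance
inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)   # every prediction wrong
```

MCC ranges from −1 to 1 and stays informative when the two classes are imbalanced, which is why it is often preferred over plain accuracy for benchmarks of this kind.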


2021 ◽  
Author(s):  
JianXi Yang ◽  
Fei Yang ◽  
Likai Zhang ◽  
Ren Li ◽  
Shixin Jiang ◽  
...  

2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Itziar Irigoien ◽  
Basilio Sierra ◽  
Concepción Arenas

In the problem of one-class classification (OCC), one of the classes, the target class, has to be distinguished from all other possible objects, considered as nontargets. This situation arises in many biomedical problems, for example, in diagnosis, image-based tumor recognition, or analysis of electrocardiogram data. In this paper an approach to OCC based on a typicality test is experimentally compared with reference state-of-the-art OCC techniques (Gaussian, mixture of Gaussians, naive Parzen, Parzen, and support vector data description) using biomedical data sets. We evaluate the ability of the procedures using twelve experimental data sets with not necessarily continuous data. As there are few benchmark data sets for one-class classification, all data sets considered in the evaluation have multiple classes. Each class in turn is considered as the target class and the units in the other classes are considered as new units to be classified. The results of the comparison show the good performance of the typicality approach, which is available for high-dimensional data; it is worth mentioning that it can be used for any kind of data (continuous, discrete, or nominal), whereas the application of state-of-the-art approaches is not straightforward when nominal variables are present.
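The evaluation protocol described above, in which each class of a multi-class data set takes a turn as the target class, can be sketched as follows. This is an illustrative reading of the protocol, not the authors' code: the function name `one_class_splits` and the toy labels are hypothetical, and how the target units are further divided for training is left open because the abstract does not say.

```python
def one_class_splits(labels):
    """Yield (target, target_idx, nontarget_idx), taking each class in turn as target."""
    for target in sorted(set(labels)):
        target_idx = [i for i, y in enumerate(labels) if y == target]
        nontarget_idx = [i for i, y in enumerate(labels) if y != target]
        yield target, target_idx, nontarget_idx

# Hypothetical labels of a small three-class data set.
labels = ["a", "b", "a", "c", "b"]
splits = list(one_class_splits(labels))
```

Each split supplies the units that define the target class and the units from the remaining classes that are presented to the one-class model as new units to be classified.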

