EVALUATE DISSIMILARITY OF SAMPLES IN FEATURE SPACE FOR IMPROVING KPCA

Author(s):  
Xu Yong ◽  
David Zhang ◽  
Jian Yang ◽  
Jin Zhong ◽  
Jingyu Yang

Since, in the feature space, each eigenvector is a linear combination of all the samples in the training sample set, the computational efficiency of KPCA-based feature extraction falls as the training sample set grows. In this paper, we propose a novel KPCA-based feature extraction method that assumes an eigenvector can be approximately expressed as a linear combination of a subset of the training samples ("nodes"). The new method selects maximally dissimilar samples as nodes, so that the eigenvector retains as much information about the training sample set as possible. By using the feature-space distance between training samples to evaluate their dissimilarity, we devise a simple and efficient algorithm to identify the nodes and to produce a sparse KPCA. Experimental results show that the proposed method also obtains high classification accuracy.
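
The following sketch illustrates the general idea under stated assumptions (an RBF kernel, a greedy farthest-point rule for picking maximally dissimilar nodes, and test-kernel centring omitted for brevity); it is not the paper's exact algorithm.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * d2)

def select_nodes(X, n_nodes, gamma=1.0):
    """Greedily pick maximally dissimilar samples ("nodes") using the
    kernel-induced feature-space distance
    d(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y)."""
    K = rbf_kernel(X, X, gamma)
    diag = np.diag(K)
    D2 = diag[:, None] + diag[None, :] - 2 * K   # pairwise squared distances
    nodes = [0]                                  # start from an arbitrary sample
    for _ in range(n_nodes - 1):
        # next node maximises its minimum distance to the nodes chosen so far
        min_d = D2[:, nodes].min(axis=1)
        min_d[nodes] = -np.inf
        nodes.append(int(np.argmax(min_d)))
    return np.array(nodes)

def sparse_kpca(X, n_nodes=50, n_components=10, gamma=1.0):
    """Sparse KPCA: eigenvectors are expanded over the selected nodes only."""
    idx = select_nodes(X, n_nodes, gamma)
    Kn = rbf_kernel(X[idx], X[idx], gamma)       # node-vs-node kernel matrix
    n = Kn.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    w, V = np.linalg.eigh(H @ Kn @ H)
    order = np.argsort(w)[::-1][:n_components]
    alphas = V[:, order] / np.sqrt(np.maximum(w[order], 1e-12))

    def transform(Z):
        # project samples onto the sparse components (centring omitted here)
        return rbf_kernel(Z, X[idx], gamma) @ alphas
    return transform
```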

2013 ◽  
Vol 347-350 ◽  
pp. 2241-2245
Author(s):  
Xiao Yuan Jing ◽  
Xiang Long Ge ◽  
Yong Fang Yao ◽  
Feng Nan Yu

When the number of labeled training samples is very small, little sample information is available and the recognition rates of traditional image recognition methods are unsatisfactory. However, other databases often contain related information that is helpful for feature extraction, so transfer learning can be used to take full advantage of the data in those databases. In this paper, we employ the idea of transferring samples and propose a feature extraction approach based on sample set reconstruction. We realize the approach by reconstructing the training sample set using the difference information among the samples of other databases. Experimental results on three widely used face databases (AR, FERET, and CAS-PEAL) demonstrate the efficacy of the proposed approach in terms of classification performance.
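
One plausible reading of this reconstruction step is sketched below under explicit assumptions (intra-class difference vectors computed around class means of an auxiliary database and added to the small target set); the paper's exact reconstruction may differ.

```python
import numpy as np

def difference_vectors(aux_X, aux_y):
    """Intra-class difference vectors from an auxiliary (source) database."""
    diffs = []
    for c in np.unique(aux_y):
        Xc = aux_X[aux_y == c]
        diffs.append(Xc - Xc.mean(axis=0))   # variation around the class mean
    return np.vstack(diffs)

def reconstruct_training_set(X, y, aux_X, aux_y, n_virtual=3, seed=None):
    """Enlarge a small labelled target set by transferring difference
    information: each target sample is perturbed with randomly chosen
    intra-class difference vectors from the auxiliary database."""
    rng = np.random.default_rng(seed)
    D = difference_vectors(aux_X, aux_y)
    new_X, new_y = [X], [np.asarray(y)]
    for xi, yi in zip(X, y):
        picks = rng.choice(len(D), size=n_virtual, replace=False)
        new_X.append(xi + D[picks])          # virtual samples keep the original label
        new_y.append(np.full(n_virtual, yi))
    return np.vstack(new_X), np.concatenate(new_y)
```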


2018 ◽  
Vol 11 (2) ◽  
pp. 95 ◽  
Author(s):  
Fransisca J Pontoh ◽  
Jayanti Yusmah Sari ◽  
Amil A Ilham ◽  
Ingrid Nurtanio

Dorsal hand vein recognition is one of the most recent multispectral biometric technologies used for person identification and authentication. Compared with other biometric systems, dorsal hand vein biometrics has become popular because of its advantages: the pattern is difficult to duplicate, hygienic to capture, stable over time, and convenient to acquire. The most challenging phase in a biometric system is the feature extraction phase. In this research, a feature extraction method called Local Line Binary Pattern (LLBP) is explored and implemented. We applied this method to 300 dorsal hand vein images obtained from 50 persons using a low-cost infrared webcam. In the recognition step, an adapted fuzzy k-NN classifier is used to evaluate whether the proposed approach is feasible and effective for dorsal hand vein recognition. The experimental results showed that the LLBP method is reliable for feature extraction in dorsal hand vein recognition, with a recognition accuracy of up to 98%.
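
As a rough illustration of the LLBP operator as it is commonly described in the literature (the line length, bit weighting, and border handling below are assumptions, not the paper's exact settings):

```python
import numpy as np

def llbp(image, line_len=13):
    """Local Line Binary Pattern (minimal sketch): for each pixel, the
    neighbours along a horizontal and a vertical line are thresholded
    against the centre pixel and binomially weighted; the magnitude map
    combines both directions."""
    img = image.astype(np.float64)
    h, w = img.shape
    r = line_len // 2
    out_h = np.zeros_like(img)
    out_v = np.zeros_like(img)
    offsets = [o for o in range(-r, r + 1) if o != 0]
    for y in range(r, h - r):
        for x in range(r, w - r):
            c = img[y, x]
            hb = [1 if img[y, x + o] >= c else 0 for o in offsets]
            vb = [1 if img[y + o, x] >= c else 0 for o in offsets]
            out_h[y, x] = sum(b * 2**i for i, b in enumerate(hb))
            out_v[y, x] = sum(b * 2**i for i, b in enumerate(vb))
    return np.sqrt(out_h**2 + out_v**2)   # LLBP magnitude feature map
```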


Computation ◽  
2019 ◽  
Vol 7 (3) ◽  
pp. 39 ◽  
Author(s):  
Laura Sani ◽  
Riccardo Pecori ◽  
Monica Mordonini ◽  
Stefano Cagnoni

The so-called Relevance Index (RI) metrics are a set of recently introduced indicators based on information theory principles that can be used to analyze complex systems by detecting the main interacting structures within them. Such structures can be described as subsets of the variables describing the system status that are strongly statistically correlated with one another and mostly independent of the rest of the system. The goal of the work described in this paper is to apply the same principles to pattern recognition and to check whether the RI metrics can also identify, in a high-dimensional feature space, attribute subsets from which new features can be built and effectively used for classification. Preliminary results indicating that this is possible were obtained by using the RI metrics in a supervised way, i.e., by separately applying them to homogeneous datasets comprising data instances that all belong to the same class, and iterating the procedure over all classes under consideration. In this work, we checked whether this is also possible in a totally unsupervised way, i.e., by considering all available data at the same time, independently of the class to which they belong, under the hypothesis that the peculiarities of the variable sets identified by the RI metrics correspond to the peculiarities by which data belonging to a certain class are distinguishable from data belonging to different classes. The results obtained in experiments with some publicly available real-world datasets show that, especially when coupled with tree-based classifiers, an RI metrics-based unsupervised feature extraction method can perform comparably to or better than other classical supervised or unsupervised feature selection or extraction methods.
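
A hedged sketch of one member of this family of indicators is given below: for a variable subset S, it computes the ratio of the integration of S (how strongly the variables in S depend on each other) to the mutual information between S and the rest of the system, estimated from empirical entropies of discrete data. This is only a simplified illustration; the full RI metrics also involve normalisation against a homogeneous reference system, which is omitted here.

```python
import numpy as np

def entropy(columns):
    """Empirical joint entropy (in bits) of one or more discrete columns."""
    _, counts = np.unique(columns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def relevance_index(data, subset):
    """Integration of the subset S divided by the mutual information
    between S and the rest of the system (simplified RI-style score)."""
    S = list(subset)
    rest = [j for j in range(data.shape[1]) if j not in S]
    H_S = entropy(data[:, S])
    integration = sum(entropy(data[:, [j]]) for j in S) - H_S
    mutual = H_S + entropy(data[:, rest]) - entropy(data)   # M(S; rest)
    return integration / mutual if mutual > 0 else np.inf

# Example: score every pair of variables of a discrete dataset
# from itertools import combinations
# scores = {pair: relevance_index(data, pair)
#           for pair in combinations(range(data.shape[1]), 2)}
```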


2021 ◽  
Vol 13 (3) ◽  
pp. 34-46
Author(s):  
Shiqi Wu ◽  
Bo Wang ◽  
Jianxiang Zhao ◽  
Mengnan Zhao ◽  
Kun Zhong ◽  
...  

Source camera identification, which aims to identify the camera that captured an image, is quite important in the field of forensics. A problem that cannot be ignored is that existing methods become unreliable, or even fail, when only a small training sample set is available. To solve this problem, a virtual sample generation-based method combined with ensemble learning is proposed in this paper. After constructing subsets of LBP features, the authors generate virtual samples based on the mega-trend-diffusion (MTD) method, which calculates the diffusion range of the samples according to trend diffusion theory and then randomly generates virtual samples from a uniform distribution within this range. On the classifier side, an ensemble learning scheme is proposed in which multiple SVM-based classifiers are trained to improve the accuracy of image source identification. The experimental results demonstrate that the proposed method achieves higher average accuracy than the state of the art when only a small number of samples is used as the training sample set.
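
A minimal sketch of MTD-style virtual sample generation, assuming the commonly cited per-feature diffusion bounds and uniform sampling within them (feature-subset construction and the SVM ensemble are omitted):

```python
import numpy as np

def mtd_bounds(x, eps=1e-20):
    """Mega-trend-diffusion range [a, b] for one feature (hedged sketch
    following the commonly cited MTD formulation)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    u = (lo + hi) / 2.0                      # centre of the observed range
    n_l = max(np.sum(x < u), 1)              # counts below / above the centre
    n_u = max(np.sum(x > u), 1)
    skew_l = n_l / (n_l + n_u)
    skew_u = n_u / (n_l + n_u)
    var = x.var(ddof=1) if len(x) > 1 else 0.0
    a = u - skew_l * np.sqrt(-2.0 * (var / n_l) * np.log(eps))
    b = u + skew_u * np.sqrt(-2.0 * (var / n_u) * np.log(eps))
    return min(a, lo), max(b, hi)

def generate_virtual_samples(X, n_virtual, seed=None):
    """Draw virtual samples uniformly within the per-feature MTD ranges."""
    rng = np.random.default_rng(seed)
    bounds = np.array([mtd_bounds(X[:, j]) for j in range(X.shape[1])])
    return rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_virtual, X.shape[1]))
```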


2020 ◽  
Vol 26 (6) ◽  
pp. 734-746
Author(s):  
Mariusz Topolski

The vast majority of medical problems are characterised by the relatively high dimensionality of the feature space, which becomes problematic for many classic pattern recognition algorithms due to the well-known curse of dimensionality. This creates the need to develop methods of space reduction, divided into strategies for the selection and the extraction of features. The most commonly used tool of the second group is PCA, which, unlike selection methods, does not select a subset of the original feature set but performs a mathematical transformation of it into a lower-dimensional form. However, a natural downside of this algorithm is that it ignores the class context available in supervised learning tasks. This work proposes a feature extraction algorithm based on the PCA approach that tries not only to reduce the feature space but also to separate the class distributions in the available learning set. The problem addressed by this work was the creation of a feature extraction method describing the prognosis for chronic lymphocytic leukemia of type B-CLL that is at least as good as, or better than, other extraction methods. This goal was accomplished for both binary and three-class cases; five machine learning algorithms were applied to verify extraction quality. The obtained results were compared using the paired-samples Wilcoxon test.
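
To make the idea concrete, here is a minimal, hypothetical illustration of a class-aware, PCA-style extraction (explicitly not the paper's algorithm): the projection directions are the leading eigenvectors of the total covariance plus a weighted between-class scatter, so the transform both preserves variance and pulls class means apart.

```python
import numpy as np

def class_aware_pca(X, y, n_components=2, beta=1.0):
    """PCA-style projection that also rewards between-class separation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean = X.mean(axis=0)
    Xc = X - mean
    St = Xc.T @ Xc / len(X)                       # total covariance (PCA part)
    Sb = np.zeros_like(St)                        # between-class scatter
    for c in np.unique(y):
        Xk = X[y == c]
        d = (Xk.mean(axis=0) - mean)[:, None]
        Sb += (len(Xk) / len(X)) * (d @ d.T)
    w, V = np.linalg.eigh(St + beta * Sb)
    W = V[:, np.argsort(w)[::-1][:n_components]]  # top eigenvectors
    return (X - mean) @ W, W
```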


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Rong Wang ◽  
Cong Tian ◽  
Lin Yan

The Internet of Things (IoT), cloud, and fog computing paradigms provide a powerful large-scale computing infrastructure for a variety of data- and computation-intensive applications. These cutting-edge computing infrastructures, however, remain vulnerable to serious security and privacy risks. Among the most important countermeasures against cybersecurity threats are intrusion detection and prevention systems, which monitor devices, networks, and systems for malicious activity and policy violations. These systems range from antivirus software to hierarchical systems that monitor the traffic of whole backbone networks. At the moment, the primary defensive solutions are based on malware feature extraction. Most known feature extraction algorithms use byte N-gram patterns or binary strings to represent log files or other static information. In this article, a new feature extraction method is proposed in which the information taken from program files is expressed using word embeddings (GloVe). As a result, the corresponding vector space model (VSM) incorporates more information about unknown programs. We use a convolutional neural network (CNN) to analyze the feature maps represented by the word embeddings and apply a softmax layer to estimate the probability that a program is malicious. Finally, we consider a program malicious if this probability is greater than 0.5; otherwise, it is considered benign. Experimental results show that our approach achieves an accuracy higher than 98%.
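
A minimal sketch of this kind of pipeline in PyTorch is shown below; the network shape, filter widths, and other hyperparameters are illustrative assumptions, not the configuration reported in the article.

```python
import torch
import torch.nn as nn

class MalwareTextCNN(nn.Module):
    """Token sequences from a program file are mapped to embeddings
    (e.g. pretrained GloVe-style vectors), convolved with several filter
    widths, max-pooled, and passed to a softmax over {benign, malicious}."""
    def __init__(self, vocab_size, embed_dim=100, num_filters=64,
                 kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        x = self.embedding(tokens).transpose(1, 2)   # (batch, embed_dim, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        logits = self.fc(torch.cat(pooled, dim=1))
        return torch.softmax(logits, dim=1)          # P(benign), P(malicious)

# A program is flagged as malicious if P(malicious) > 0.5:
# probs = model(token_batch); is_malicious = probs[:, 1] > 0.5
```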

