Random Projection, Margins, Kernels, and Feature-Selection

Author(s):  
Avrim Blum
2015 ◽  
Vol 781 ◽  
pp. 125-128 ◽  
Author(s):  
Yonchanok Khaokaew ◽  
Tanapat Anusas-Amornkul ◽  
Koonlachat Meesublak

In recent years, anomaly-based intrusion detection techniques have been developed continuously, and the support vector machine (SVM) is one such technique. However, an SVM requires considerable training time and storage when the number of features is large. In this paper, a hybrid feature selection method, combining Correlation-based Feature Selection and Motif Discovery using Random Projection, is proposed to reduce the number of features from 41 to 3 on the KDD'99 dataset. It is compared with a regular SVM that uses all 41 features. The results show that the accuracy remains high at 98% while the training time is reduced by almost half relative to the regular SVM.


2014 ◽  
Vol 2014 ◽  
pp. 1-14 ◽  
Author(s):  
Mohammad Amin Shayegan ◽  
Saeed Aghabozorgi ◽  
Ram Gopal Raj

Dimensionality reduction (feature selection) is an important step in pattern recognition systems. Although there are various conventional approaches to feature selection, such as Principal Component Analysis, Random Projection, and Linear Discriminant Analysis, selecting optimal, effective, and robust features is usually a difficult task. In this paper, a new two-stage approach to dimensionality reduction is proposed. The method is based on one-dimensional and two-dimensional spectrum diagrams of the standard deviation and the minimum-to-maximum distributions of the initial feature vector elements. The proposed algorithm is validated in an OCR application using two large standard benchmark handwritten OCR datasets, MNIST and Hoda. Initially, a 133-element feature vector was assembled from the features most commonly used in the literature. The size of this initial feature vector was then reduced to 59.40% (79 elements) for the MNIST dataset and to 43.61% (58 elements) for the Hoda dataset. Meanwhile, the accuracy of the OCR systems improved by 2.95% for the MNIST dataset and by 4.71% for the Hoda dataset. The results show an improvement in the precision of the system compared with the rival approaches, Principal Component Analysis and Random Projection. The proposed technique can also be useful for generating decision rules in a pattern recognition system that uses rule-based classifiers.


2017 ◽  
Vol 2017 ◽  
pp. 1-12
Author(s):  
Sang-Pil Kim ◽  
Myeong-Sun Gil ◽  
Hajin Kim ◽  
Mi-Jung Choi ◽  
Yang-Sae Moon ◽  
...  

Recently, the risk of information disclosure has been increasing significantly. Accordingly, privacy-preserving data mining (PPDM) is being actively studied to obtain accurate mining results while preserving data privacy. We focus here on secure similar document detection (SSDD), which identifies similar documents held by two parties while neither party discloses its own sensitive documents to the other. In this paper, we propose an efficient two-step protocol that exploits feature selection as a lower-dimensional transformation, and we present discriminative feature selections to maximize the performance of the protocol. The proposed protocol consists of two steps: the filtering step and the postprocessing step. For the feature selection, we first consider the simplest one, random projection (RP), and propose its two-step solution, SSDD-RP. We then present two discriminative feature selections and their solutions: SSDD-LF, which selects a few dimensions that are locally frequent in the current querying vector, and SSDD-GF, which selects dimensions that are globally frequent in the set of all document vectors. We finally propose a hybrid solution, SSDD-HF, which takes advantage of both SSDD-LF and SSDD-GF. We empirically show that the proposed two-step protocol outperforms the previous one-step protocol by three or four orders of magnitude.
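The filter-then-postprocess idea in the abstract above can be illustrated with a minimal, non-secure sketch. It is not the authors' protocol: the cryptographic layer is omitted entirely, Euclidean distance is assumed as the similarity measure, and "random projection" is assumed to mean picking a random subset of dimensions. All function names (`select_random_dims`, `sq_dist`, `two_step_search`) are hypothetical. The sketch relies on one certain fact: a squared distance summed over a subset of dimensions never exceeds the squared distance over all dimensions, so the filtering step cannot discard a true match.

```python
import random


def select_random_dims(n_dims, k, seed=0):
    """Pick k distinct dimensions at random (assumed reading of 'RP')."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_dims), k))


def sq_dist(u, v, dims=None):
    """Squared Euclidean distance, optionally restricted to some dimensions."""
    if dims is None:
        dims = range(len(u))
    return sum((u[i] - v[i]) ** 2 for i in dims)


def two_step_search(query, docs, dims, tol):
    """Return (candidates, matches) as lists of document indices.

    Step 1 (filtering): compare only on the selected dimensions. The
    partial distance is a lower bound on the full distance, so any
    document rejected here cannot be within tol of the query.
    Step 2 (postprocessing): check the survivors with the full distance.
    """
    tol_sq = tol ** 2
    candidates = [j for j, d in enumerate(docs)
                  if sq_dist(query, d, dims) <= tol_sq]
    matches = [j for j in candidates
               if sq_dist(query, docs[j]) <= tol_sq]
    return candidates, matches


if __name__ == "__main__":
    docs = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 3, 0], [5, 5, 5, 5]]
    query = [0, 0, 0, 0]
    dims = select_random_dims(n_dims=4, k=2, seed=1)
    print(two_step_search(query, docs, dims, tol=1.5))
```

The saving comes from step 1 touching only a few dimensions per document, so the expensive full comparison in step 2 runs only on the survivors. A more discriminative choice of dimensions, such as the locally or globally frequent ones the abstract describes, would leave fewer survivors.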


2019 ◽  
Vol 30 (5) ◽  
pp. 1581-1586 ◽  
Author(s):  
Qi Wang ◽  
Jia Wan ◽  
Feiping Nie ◽  
Bo Liu ◽  
Chenggang Yan ◽  
...  

Author(s):  
Lindsey M. Kitchell ◽  
Francisco J. Parada ◽  
Brandi L. Emerick ◽  
Tom A. Busey

2012 ◽  
Vol 19 (2) ◽  
pp. 97-111 ◽  
Author(s):  
Muhammad Ahmad ◽  
Syungyoung Lee ◽  
Ihsan Ul Haq ◽  
Qaisar Mushtaq

Author(s):  
Manpreet Kaur ◽  
Chamkaur Singh

Educational Data Mining (EDM) is an emerging research area that helps educational institutions improve the performance of their students. Feature Selection (FS) algorithms remove irrelevant data from the educational dataset and hence increase the performance of the classifiers used in EDM techniques. This paper presents an analysis of the performance of feature selection algorithms on a student data set. The paper also defines several problems in its problem formulation, which are left to be resolved in future work. Furthermore, the paper attempts to play a positive role in improving education quality and to guide new researchers in making academic interventions.


2012 ◽  
Vol 57 (3) ◽  
pp. 829-835 ◽  
Author(s):  
Z. Głowacz ◽  
J. Kozik

The paper describes a procedure for the automatic selection of symptoms that accompany a break in the armature winding coils of a synchronous motor. This procedure, called feature selection, chooses from the full set of features describing the problem a subset that best distinguishes between healthy and damaged states. The amplitudes of the spectral components of the motor current signals were used as features. The full spectra of the current signals are treated as multidimensional feature spaces, and their subspaces are tested. Particular subspaces are chosen with the aid of a genetic algorithm, and their quality is evaluated using the Mahalanobis distance measure. The algorithm searches for the subspaces for which this distance is greatest. The algorithm is very efficient and, as confirmed by research, leads to good results. The proposed technique is successfully applied in many other fields of science and technology, including medical diagnostics.
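The subset-scoring step described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the genetic algorithm is replaced by an exhaustive search over subsets of a fixed size (feasible only for small feature counts), the Mahalanobis distance is taken between the two class means using a pooled covariance matrix, and all names (`mean_vec`, `pooled_cov`, `solve`, `mahalanobis`, `best_subset`) and the toy data are hypothetical.

```python
from itertools import combinations


def mean_vec(rows, dims):
    """Mean of each selected feature over the given samples."""
    return [sum(r[d] for r in rows) / len(rows) for d in dims]


def pooled_cov(a, b, dims):
    """Pooled covariance matrix of two classes on the selected features."""
    k = len(dims)
    cov = [[0.0] * k for _ in range(k)]
    for rows in (a, b):
        m = mean_vec(rows, dims)
        for r in rows:
            c = [r[d] - m[i] for i, d in enumerate(dims)]
            for i in range(k):
                for j in range(k):
                    cov[i][j] += c[i] * c[j]
    dof = len(a) + len(b) - 2
    return [[x / dof for x in row] for row in cov]


def solve(mat, vec):
    """Solve mat * x = vec by Gaussian elimination with partial pivoting.

    Assumes mat is non-singular; a singular covariance matrix would
    raise ZeroDivisionError here.
    """
    n = len(vec)
    aug = [row[:] + [vec[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / aug[i][i]
    return x


def mahalanobis(a, b, dims):
    """Mahalanobis distance between class means on the selected features."""
    diff = [x - y for x, y in zip(mean_vec(a, dims), mean_vec(b, dims))]
    z = solve(pooled_cov(a, b, dims), diff)
    return sum(d * w for d, w in zip(diff, z)) ** 0.5


def best_subset(a, b, n_features, size):
    """Exhaustive stand-in for the genetic search: maximize the distance."""
    return max(combinations(range(n_features), size),
               key=lambda dims: mahalanobis(a, b, dims))


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, features 1 and 2 do not.
    healthy = [[1.0, 5.0, 2.0], [1.2, 4.0, 2.5], [0.8, 6.0, 1.5], [1.1, 5.5, 2.2]]
    damaged = [[3.0, 5.2, 2.1], [3.2, 4.1, 2.4], [2.8, 5.9, 1.6], [3.1, 5.4, 2.3]]
    print(best_subset(healthy, damaged, n_features=3, size=1))
```

With full current spectra the number of candidate subsets grows combinatorially, which is presumably why the paper turns to a genetic algorithm. The function `mahalanobis` would then serve as the fitness function, with each individual encoding one subset of features.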

