A new approach to feature selection

Author(s): Matthias Scherf

2019, Vol 26 (2), pp. 221-243
Author(s): Samir Elloumi

Textual Feature Selection (TFS) aims to extract the parts or segments of a text that are most relevant with respect to the information it expresses. The selected features are useful for automatic indexing, summarization, document categorization, knowledge discovery, and so on. Given the huge amount of electronic textual data published daily, many challenges related to both semantics and processing efficiency must be addressed. In this paper, we propose a new approach to TFS grounded in Formal Concept Analysis. Specifically, we propose to extract textual features by exploring the regularities in a formal context in which isolated points exist. We introduce the notion of N-composite isolated points: a set of N words treated as a single textual feature. We show that a small value of N (between 1 and 3) yields significant textual features compared with existing approaches, even when the initial formal context is not completely covered.
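The approach builds on Formal Concept Analysis. There, a formal context is a binary relation between objects and attributes. A formal concept is a pair (A, B) with A' = B and B' = A under the two derivation operators.

The sketch below illustrates only that background. It assumes a toy context in which objects are sentences and attributes are words. It does not implement the authors' isolated points or N-composite features, and all names and data are hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical formal context: object (e.g. a sentence) -> set of attributes (words).
Context = Dict[str, FrozenSet[str]]


def common_attributes(context: Context, objects: Set[str]) -> FrozenSet[str]:
    """Derivation A -> A': the attributes shared by every object in A."""
    if not objects:
        # By convention the empty object set derives to all attributes.
        return frozenset().union(*context.values())
    return frozenset.intersection(*(context[o] for o in objects))


def common_objects(context: Context, attributes: Set[str]) -> FrozenSet[str]:
    """Derivation B -> B': the objects that have every attribute in B."""
    return frozenset(o for o, attrs in context.items() if set(attributes) <= attrs)


def is_formal_concept(context: Context, objects: Set[str], attributes: Set[str]) -> bool:
    """(A, B) is a formal concept iff A' == B and B' == A."""
    return (common_attributes(context, objects) == frozenset(attributes)
            and common_objects(context, attributes) == frozenset(objects))


# Usage on a hypothetical toy context (not data from the paper).
ctx: Context = {
    "s1": frozenset({"feature", "selection", "text"}),
    "s2": frozenset({"feature", "selection"}),
    "s3": frozenset({"concept", "lattice"}),
}
shared = common_attributes(ctx, {"s1", "s2"})  # frozenset({'feature', 'selection'})
```

Here ({s1, s2}, {feature, selection}) is a formal concept. ({s1}, {feature}) is not, because s1 also carries the attributes "selection" and "text".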


2012, Vol 38 (3), pp. 222-233
Author(s): Yen-Liang Chen, Yu-Ting Chiu

A vector space model (VSM) composed of selected important features is a common way to represent documents, including patent documents. Patent documents have some special characteristics that make it difficult to apply traditional feature selection methods directly: (a) it is difficult to find common terms for patent documents in different categories; and (b) the class label of a patent document is hierarchical rather than flat. Hence, in this article we propose a new approach that includes a hierarchical feature selection (HFS) algorithm, which can be used to select more representative features with greater discriminative ability to represent a set of patent documents with hierarchical class labels. The performance of the proposed method is evaluated on two document sets containing 2400 and 9600 patent documents, with candidate terms extracted from their titles and abstracts. The experimental results reveal that a VSM whose features are selected by a proportional selection process gives better coverage, while a VSM whose features are selected by a weighted-summed selection process gives higher accuracy.
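The following is a minimal sketch of the representation step only. It assumes the feature list has already been chosen and that documents are weighted by raw term frequency. The HFS algorithm and the proportional and weighted-summed selection processes are not reproduced here. Under those assumptions, a document can be mapped onto the selected features and compared with another by cosine similarity. The feature names and documents are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import List, Sequence


def to_vector(tokens: Sequence[str], features: Sequence[str]) -> List[int]:
    """Term-frequency vector of a document over a fixed, pre-selected feature list.

    Tokens that are not among the selected features are simply ignored.
    """
    counts = Counter(tokens)
    return [counts[f] for f in features]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical selected features and documents (not from the paper's data sets).
FEATURES = ["battery", "electrode", "antenna"]
doc_a = to_vector("battery electrode battery cell".split(), FEATURES)  # [2, 1, 0]
doc_b = to_vector("antenna signal electrode".split(), FEATURES)        # [0, 1, 1]
similarity = cosine(doc_a, doc_b)                                       # 1 / sqrt(10)
```

Restricting the vector to the selected features is what keeps the VSM compact. Terms such as "cell" and "signal" contribute nothing because they were not selected.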


2004, Vol 13 (04), pp. 791-800
Author(s): Holger Fröhlich, Olivier Chapelle, Bernhard Schölkopf

The problem of feature selection is a difficult combinatorial task in Machine Learning and of high practical relevance, e.g. in bioinformatics. Genetic Algorithms (GAs) offer a natural way to solve this problem. In this paper we present a special Genetic Algorithm, which especially takes into account the existing bounds on the generalization error for Support Vector Machines (SVMs). This new approach is compared to the traditional method of performing cross-validation and to other existing algorithms for feature selection.
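The abstract does not spell out the genetic operators, so what follows is a generic illustration only. It is a minimal genetic algorithm over feature bitmasks, assuming tournament selection, uniform crossover, bit-flip mutation and elitism.

The fitness function is a hypothetical stand-in. In the paper, fitness is tied to bounds on the SVM generalization error, which this sketch does not reproduce. All names and parameter values are illustrative.

```python
import random
from typing import Callable, List, Tuple

Mask = Tuple[int, ...]  # 1 = feature kept, 0 = feature dropped


def genetic_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Generic GA over feature bitmasks; returns the best mask found and its fitness."""
    rng = random.Random(seed)
    pop: List[Mask] = [tuple(rng.randint(0, 1) for _ in range(n_features))
                       for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament() -> Mask:
        # Binary tournament: the fitter of two randomly chosen individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children: List[Mask] = [best]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover: each bit is taken from either parent with probability 1/2.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            # Bit-flip mutation: each bit flips independently with probability mutation_rate.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(tuple(child))
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)


# Hypothetical fitness: +1 for keeping a "relevant" feature, -1 for any other kept feature.
RELEVANT = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    return sum(1 if i in RELEVANT else -1 for i, bit in enumerate(mask) if bit)


best_mask, best_score = genetic_feature_selection(8, toy_fitness)
```

Elitism guarantees that the best fitness never decreases from one generation to the next. In a real setting, the expensive part is the fitness evaluation, for example training an SVM or evaluating an error bound per mask.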


1973, Vol 5 (4), pp. 335-352
Author(s): Josef Kittler, Peter C. Young

2017, pp. 108-115
Author(s): Є.В. Бодянський, І.Г. Перова, Г.В. Стойка

Feature selection is one of the most complicated and topical tasks in data mining. Existing approaches to solving it rely on non-mathematical, representativeness-based hypotheses. We propose a new approach for evaluating the information content of medical features, based on an optimal combination of feature selection and feature extraction methods. This approach makes it possible to produce an optimally reduced number of features, each with a linguistic interpretation. A hybrid feature selection/extraction system is proposed. The system is numerically simple and can perform feature selection/extraction for any number of features, using the standard method of principal component analysis and computing the distance between the first principal component and each medical feature.
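The abstract describes the core computation only in outline: run a standard principal component analysis, then measure the distance between the first principal component and each medical feature. The sketch below makes three assumptions that the abstract does not confirm:

- features are z-scored first;
- the first component is found by power iteration on the correlation matrix;
- the distance is 1 − |r|, where r is the Pearson correlation between a feature and the first-component score.

The function names and data are hypothetical.

```python
from math import sqrt
from typing import List, Sequence


def standardize(column: Sequence[float]) -> List[float]:
    """Z-scores using the population standard deviation (assumes a non-constant column)."""
    n = len(column)
    mean = sum(column) / n
    std = sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / std for x in column]


def first_principal_component(cols: List[List[float]], iters: int = 500) -> List[float]:
    """Leading eigenvector of the correlation matrix of z-scored columns, by power iteration."""
    p, n = len(cols), len(cols[0])
    corr = [[sum(a * b for a, b in zip(cols[i], cols[j])) / n for j in range(p)]
            for i in range(p)]
    w = [1.0] * p  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        v = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    return w


def feature_distances(rows: List[List[float]]) -> List[float]:
    """Hypothetical distance 1 - |corr(feature, PC1 score)| for each feature (column)."""
    cols = [standardize(c) for c in zip(*rows)]
    w = first_principal_component(cols)
    scores = [sum(w[j] * cols[j][i] for j in range(len(cols))) for i in range(len(rows))]
    z_scores = standardize(scores)
    n = len(rows)
    # For two z-scored vectors, the Pearson correlation is the mean of their products.
    return [1.0 - abs(sum(a * b for a, b in zip(col, z_scores)) / n) for col in cols]


# Hypothetical data: rows are patients, columns are features.
# Feature 1 is exactly twice feature 0; feature 2 is uncorrelated with both.
ROWS = [[1.0, 2.0, 2.0],
        [2.0, 4.0, 0.0],
        [3.0, 6.0, 0.0],
        [4.0, 8.0, 2.0]]
distances = feature_distances(ROWS)  # approximately [0.0, 0.0, 1.0]
```

The first two features dominate the first component, so their distances are close to 0. The uncorrelated third feature ends up at a distance close to 1. The abstract does not specify how these distances are then turned into a reduced feature set.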


Author(s): Mohammad Ali Ghaderi, Nasser Yazdani, Behzad Moshiri, Maryam Tayefeh Mahmoudi
