Semi-Supervised Feature Selection with Adaptive Discriminant Analysis

BETTER ALTERNATIVES FOR STEPWISE DISCRIMINANT ANALYSIS

Acta Universitatis Lodziensis Folia oeconomica ◽

10.18778/0208-6018.311.02 ◽

2015 ◽

Vol 1 (311) ◽

Author(s):

Katarzyna Stąpor

Keyword(s):

Feature Selection ◽

Discriminant Analysis ◽

Tabu Search ◽

Stepwise Discriminant Analysis ◽

Selection Methods ◽

Discrimination Power ◽

Statistical Software ◽

Software Packages ◽

Benchmark Datasets

Discriminant analysis can best be defined as a technique that allows an individual to be classified into one of several distinct populations on the basis of a set of measurements. Stepwise discriminant analysis (SDA) is concerned with selecting the most important variables whilst retaining the highest discrimination power possible. Selecting a smaller number of variables is often necessary for a variety of reasons. In existing statistical software packages, SDA is based on classic feature selection methods. Many problems with such stepwise procedures have been identified. In this work, a new method based on the tabu search metaheuristic is presented together with experimental results on selected benchmark datasets. The results are promising.
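
To make the idea of tabu search for variable selection concrete, here is a minimal Python sketch. It is not the method from the paper, whose details are not given in the abstract. It assumes a generic move set (flip one feature in or out), a fixed tabu tenure, and a simple aspiration rule. The function name `tabu_search_select`, its parameters, and the toy scoring function are all hypothetical; in practice the score might be a discrimination criterion such as cross-validated accuracy.

```python
from typing import Callable, Dict, FrozenSet, Tuple


def tabu_search_select(
    n_features: int,
    score: Callable[[FrozenSet[int]], float],
    n_iter: int = 50,
    tabu_tenure: int = 3,
) -> Tuple[FrozenSet[int], float]:
    """Maximise score(subset) using single-feature flips and a tabu list."""
    current: FrozenSet[int] = frozenset()
    best, best_score = current, score(current)
    # feature index -> first iteration at which it may be flipped again
    tabu_until: Dict[int, int] = {}

    for it in range(n_iter):
        candidates = []
        for j in range(n_features):
            neighbour = current ^ frozenset([j])  # flip feature j in or out
            s = score(neighbour)
            is_tabu = tabu_until.get(j, 0) > it
            # Aspiration: a tabu move is allowed if it beats the best so far.
            if not is_tabu or s > best_score:
                candidates.append((s, j, neighbour))
        if not candidates:
            continue
        # Take the best admissible move even if it worsens the current score;
        # this is what lets the search leave local optima.
        s, j, neighbour = max(candidates, key=lambda c: (c[0], -c[1]))
        current = neighbour
        tabu_until[j] = it + 1 + tabu_tenure
        if s > best_score:
            best, best_score = neighbour, s
    return best, best_score


def toy_score(subset: FrozenSet[int]) -> float:
    """Hypothetical criterion: three useful features, a cost per feature kept."""
    gain = {0: 3.0, 2: 2.0, 4: 1.5}
    return sum(gain.get(j, 0.0) for j in subset) - 0.5 * len(subset)


best_subset, best_value = tabu_search_select(6, toy_score, n_iter=20)
```

Unlike forward or backward stepwise selection, the search keeps moving after it reaches a local optimum, and the tabu list stops it from immediately undoing recent moves.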


An Optimal Categorization of Feature Selection Methods for Knowledge Discovery

Data Mining ◽

10.4018/978-1-4666-2455-9.ch005 ◽

2013 ◽

pp. 92-106

Author(s):

Harleen Kaur ◽

Ritu Chauhan ◽

M. Alam

Keyword(s):

Data Mining ◽

Feature Selection ◽

Discriminant Analysis ◽

Medical Data ◽

Stepwise Discriminant Analysis ◽

Selection Methods ◽

Medical Databases ◽

Active Research ◽

Potential Improvement ◽

Large Effort

The continuous availability of massive experimental medical data has given impetus to a large effort in developing mathematical, statistical and computational intelligence techniques to infer models from medical databases. Feature selection has been an active research area in the pattern recognition, statistics, and data mining communities. However, there have been relatively few studies on preprocessing data used as input for data mining systems in medical data. In this chapter, the authors focus on several feature selection methods and their effectiveness in preprocessing input medical data. They evaluate several feature selection algorithms, such as Mutual Information Feature Selection (MIFS), Fast Correlation-Based Filter (FCBF) and Stepwise Discriminant Analysis (STEPDISC), with the naive Bayes and linear discriminant analysis machine learning techniques. The experimental analysis of feature selection techniques in medical databases has enabled the authors to find a small number of informative features, leading to potential improvement in medical diagnosis by reducing the size of the data set, eliminating irrelevant features, and decreasing the processing time.
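
As an illustration of one of the algorithms named above, the following Python sketch implements the greedy MIFS criterion in its commonly cited form: at each step, pick the feature f that maximises I(f; C) - beta * sum of I(f; s) over already-selected features s. It assumes discrete features and uses plain empirical frequencies. The function names, the default `beta`, and the toy data are illustrative assumptions and are not taken from the chapter.

```python
from collections import Counter
from math import log2
from typing import List, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log2( p(a,b) / (p(a) p(b)) ), written with raw counts
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mifs(features: List[Sequence], labels: Sequence, k: int,
         beta: float = 0.5) -> List[int]:
    """Greedily select k feature indices, trading relevance against redundancy."""
    relevance = [mutual_information(f, labels) for f in features]
    selected: List[int] = []
    while len(selected) < min(k, len(features)):
        remaining = [j for j in range(len(features)) if j not in selected]

        def criterion(j: int) -> float:
            redundancy = sum(
                mutual_information(features[j], features[s]) for s in selected
            )
            return relevance[j] - beta * redundancy

        selected.append(max(remaining, key=criterion))
    return selected


# Toy data: a four-class label encoded by two binary features.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # high bit of the label
    [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy of the first feature (redundant)
    [0, 0, 1, 1, 0, 0, 1, 1],  # low bit of the label
    [0, 1, 0, 1, 0, 1, 0, 1],  # unrelated to the label
]
chosen = mifs(features, labels, k=2)
```

In this toy example the redundancy penalty causes the second pick to skip the duplicated feature and take the complementary one.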


Experimental identification of hard data sets for classification and feature selection methods with insights on method selection

Data & Knowledge Engineering ◽

10.1016/j.datak.2018.09.002 ◽

2018 ◽

Vol 118 ◽

pp. 41-51 ◽

Cited By ~ 3

Author(s):

Cuiju Luan ◽

Guozhu Dong

Keyword(s):

Feature Selection ◽

Data Sets ◽

Selection Methods ◽

Hard Data ◽

Experimental Identification ◽

Method Selection


CLASSIFYING TEMPORAL MICROARRAY DATA BY SELECTING INFORMATIVE GENES

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720013410060 ◽

2013 ◽

Vol 11 (03) ◽

pp. 1341006

Author(s):

QIANG LOU ◽

ZORAN OBRADOVIC

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Microarray Data ◽

Data Sets ◽

Temporal Data ◽

Expression Data ◽

Selection Methods ◽

Temporal Gene Expression ◽

Single Matrix

In order to more accurately predict an individual's health status, in clinical applications it is often important to perform analysis of high-dimensional gene expression data that varies with time. A major challenge in predicting from such temporal microarray data is that the number of biomarkers used as features is typically much larger than the number of labeled subjects. One way to address this challenge is to perform feature selection as a preprocessing step and then apply a classification method on selected features. However, traditional feature selection methods cannot handle multivariate temporal data without applying techniques that flatten temporal data into a single matrix in advance. In this study, a feature selection filter that can directly select informative features from temporal gene expression data is proposed. In our approach, we measure the distance between multivariate temporal data from two subjects. Based on this distance, we define the objective function of temporal margin based feature selection to maximize each subject's temporal margin in its own relevant subspace. The experimental results on synthetic and two real flu data sets provide evidence that our method outperforms the alternatives, which flatten the temporal data in advance.


Robust Feature Selection on Incomplete Data

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/443 ◽

2018 ◽

Cited By ~ 1

Author(s):

Wei Zheng ◽

Xiaofeng Zhu ◽

Yonghua Zhu ◽

Shichao Zhang

Keyword(s):

Feature Selection ◽

Incomplete Data ◽

High Dimensional ◽

Data Sets ◽

Selection Methods ◽

Limited Ability ◽

Training Samples ◽

Indicator Matrix ◽

Selection Framework ◽

Incomplete Datasets

Feature selection is an indispensable preprocessing procedure for high-dimensional data analysis, but previous feature selection methods usually ignore sample diversity (i.e., every sample makes an individual contribution to model construction) and have limited ability to deal with incomplete datasets in which some training samples have unobserved data. To address these issues, in this paper we first propose a robust feature selection framework to relieve the influence of outliers, and then introduce an indicator matrix to keep unobserved data out of the numerical computation of feature selection, so that both our proposed feature selection framework and existing feature selection frameworks can conduct feature selection on incomplete data sets. We further propose a new optimization algorithm to optimize the resulting objective function and prove that our algorithm converges fast. Experimental results on both real and artificial incomplete data sets demonstrate that our proposed method outperforms the feature selection methods under comparison in terms of clustering performance.
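
The indicator-matrix idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's objective function: missing entries are assumed to be encoded as `None`, and a plain per-feature variance score stands in for the actual feature selection criterion. The helper names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]


def indicator_matrix(X: Matrix) -> List[List[int]]:
    """M[i][j] = 1 if X[i][j] is observed, 0 if it is missing (None)."""
    return [[0 if v is None else 1 for v in row] for row in X]


def masked_variances(X: Matrix, M: List[List[int]]) -> List[float]:
    """Per-feature variance computed only over entries with M[i][j] == 1."""
    scores = []
    for j in range(len(X[0])):
        observed = [row[j] for row, mask in zip(X, M) if mask[j] == 1]
        if len(observed) < 2:
            scores.append(0.0)  # too few observations to score this feature
            continue
        mean = sum(observed) / len(observed)
        scores.append(sum((v - mean) ** 2 for v in observed) / len(observed))
    return scores


X = [
    [1.0, None, 5.0],
    [3.0, 2.0, None],
    [5.0, 2.0, 5.0],
]
M = indicator_matrix(X)
scores = masked_variances(X, M)
```

Because the mask removes unobserved entries from every sum, no imputed value ever influences the score, which is the property the abstract attributes to the indicator matrix.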


A Preference Model on Adaptive Affinity Propagation

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v8i3.pp1805-1813 ◽

2018 ◽

Vol 8 (3) ◽

pp. 1805 ◽

Cited By ~ 1

Author(s):

Rina Refianti ◽

Achmad Benny Mutiara ◽

Asep Juarna ◽

Adang Suhendra

Keyword(s):

Data Clustering ◽

Message Passing ◽

Clustering Algorithms ◽

Experimental Results ◽

Data Sets ◽

Similarity Matrix ◽

Preference Model ◽

New Model ◽

Data Points ◽

Scanning Algorithm

In recent years, two new data clustering algorithms have been proposed. One of them is Affinity Propagation (AP). AP is a new data clustering technique that uses iterative message passing and considers all data points as potential exemplars. Two important inputs of AP are a similarity matrix (SM) of the data and the parameter "preference" p. Although the original AP algorithm has shown much success in data clustering, it still suffers from one limitation: it is not easy to determine the value of the parameter "preference" p that results in an optimal clustering solution. To resolve this limitation, we propose a new model of the parameter "preference" p, i.e., it is modeled based on the similarity distribution. Given the SM and p, the Modified Adaptive AP (MAAP) procedure is run. The MAAP procedure omits the adaptive p-scanning algorithm used in the original Adaptive-AP (AAP) procedure. Experimental results on random non-partition and partition data sets show that (i) the proposed algorithm, MAAP-DDP, is slower than the original AP on the random non-partition dataset, and (ii) on the random 4-partition dataset and real datasets, the proposed algorithm succeeds in identifying clusters according to the number of the dataset's true labels, with execution times comparable to those of the original AP. In addition, the MAAP-DDP algorithm proves more feasible and effective than the original AAP procedure.


An Optimal Categorization of Feature Selection Methods for Knowledge Discovery

Visual Analytics and Interactive Technologies ◽

10.4018/978-1-60960-102-7.ch006 ◽

2011 ◽

pp. 94-108 ◽

Cited By ~ 4

Author(s):

Harleen Kaur ◽

Ritu Chauhan ◽

M. Alam

Keyword(s):

Data Mining ◽

Feature Selection ◽

Discriminant Analysis ◽

Medical Data ◽

Stepwise Discriminant Analysis ◽

Selection Methods ◽

Medical Databases ◽

Active Research ◽

Potential Improvement ◽

Large Effort

The continuous availability of massive experimental medical data has given impetus to a large effort in developing mathematical, statistical and computational intelligence techniques to infer models from medical databases. Feature selection has been an active research area in the pattern recognition, statistics, and data mining communities. However, there have been relatively few studies on preprocessing data used as input for data mining systems in medical data. In this chapter, the authors focus on several feature selection methods and their effectiveness in preprocessing input medical data. They evaluate several feature selection algorithms, such as Mutual Information Feature Selection (MIFS), Fast Correlation-Based Filter (FCBF) and Stepwise Discriminant Analysis (STEPDISC), with the naive Bayes and linear discriminant analysis machine learning techniques. The experimental analysis of feature selection techniques in medical databases has enabled the authors to find a small number of informative features, leading to potential improvement in medical diagnosis by reducing the size of the data set, eliminating irrelevant features, and decreasing the processing time.


SPARSITY SCORE: A NOVEL GRAPH-PRESERVING FEATURE SELECTION METHOD

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001414500098 ◽

2014 ◽

Vol 28 (04) ◽

pp. 1450009 ◽

Cited By ~ 16

Author(s):

MINGXIA LIU ◽

DAOQIANG ZHANG

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Original Data ◽

Selection Method ◽

Compact Representation ◽

Data Sets ◽

Selection Methods ◽

Clustering And Classification ◽

Filter Type ◽

Selection Framework

As thousands of features are available in many pattern recognition and machine learning applications, feature selection remains an important task to find the most compact representation of the original data. In the literature, although a number of feature selection methods have been developed, most of them focus on optimizing specific objective functions. In this paper, we first propose a general graph-preserving feature selection framework where graphs to be preserved vary in specific definitions, and show that a number of existing filter-type feature selection algorithms can be unified within this framework. Then, based on the proposed framework, a new filter-type feature selection method called sparsity score (SS) is proposed. This method aims to preserve the structure of a pre-defined l1 graph that is proven robust to data noise. Here, the modified sparse representation based on an l1-norm minimization problem is used to determine the graph adjacency structure and corresponding affinity weight matrix simultaneously. Furthermore, a variant of SS called supervised SS (SuSS) is also proposed, where the l1 graph to be preserved is constructed by using only data points from the same class. Experimental results of clustering and classification tasks on a series of benchmark data sets show that the proposed methods can achieve better performance than conventional filter-type feature selection methods.


Multi-Label Feature Selection Method Based on Dynamic Weight

10.21203/rs.3.rs-604646/v1 ◽

2021 ◽

Author(s):

Ping Zhang ◽

Jiyao Sheng ◽

Wanfu Gao ◽

Juncheng Hu ◽

Yonghao Li

Keyword(s):

Feature Selection ◽

Dynamic Change ◽

Feature Selection Method ◽

Selection Method ◽

Data Sets ◽

Selection Methods ◽

Real World Data ◽

Amount Of Information ◽

The Difference ◽

Classification Information

Multi-label feature selection attracts considerable attention in multi-label learning. Information-theory-based multi-label feature selection methods aim to select the most informative features and reduce the uncertain amount of information of labels. Previous methods regard the uncertain amount of information of labels as constant. In fact, as the classification information of the label set is captured by features, the remaining uncertainty of each label changes dynamically. In this paper, we categorize labels into two groups: one contains the labels with little remaining uncertainty, which means that most of the classification information with respect to these labels has been obtained by the already-selected features; the other contains the labels with extensive remaining uncertainty, which means that the classification information of these labels is neglected by the already-selected features. Feature selection aims to select new features that are highly relevant to the labels in the second group. Existing methods do not distinguish between the two label groups and ignore the dynamically changing amount of information of labels. To this end, a Relevancy Ratio is designed to capture the dynamically changing amount of information of each label under the condition of the already-selected features. Afterwards, a Weighted Feature Relevancy is defined to evaluate the candidate features. Finally, a new multi-label Feature Selection method based on Weighted Feature Relevancy (WFRFS) is proposed. The experiments yield encouraging results for WFRFS in comparison with six multi-label feature selection methods on thirteen real-world data sets.


Clustering Based Classification and Analysis of Data

International Journal of Computer and Communication Technology ◽

10.47893/ijcct.2014.1259 ◽

2014 ◽

pp. 280-283

Author(s):

NEERAJ SAHU ◽

D. S. RAJPUT ◽

R. S. THAKUR ◽

G. S. THAKUR

Keyword(s):

Feature Selection ◽

Document Classification ◽

Experimental Results ◽

Data Sets ◽

Group Data ◽

Document Collection ◽

Text Preprocessing

This paper presents clustering-based document classification and analysis of data. The proposed approach is based on both unsupervised and supervised document classification. It comprises the following steps: document collection, text preprocessing, feature selection, indexing, clustering, and results analysis. The Twenty Newsgroups data sets [20] are used in the experiments. The experimental results are analyzed using the SAS 9.0 analytical software. The experimental results show that the proposed approach delivers superior performance.
