An Efficient Ensemble Learning Method for Gene Microarray Classification

The gene microarray analysis and classification have demonstrated an effective way for the effective diagnosis of diseases and cancers. However, it has been also revealed that the basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. On the other hand, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using RotBoost ensemble methodology. This method is a combination of Rotation Forest and AdaBoost techniques which in turn preserve both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, 5 different feature selection algorithms are considered. To assess the efficiency of the RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results have revealed that the combination of the fast correlation-based feature selection method with ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers which outperform not only the classifiers produced by the conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.

Download Full-text

Support Vector Machine as Feature Selection Method in Classifier Ensembles

International Journal of Modern Education and Computer Science ◽

10.5815/ijmecs.2014.04.01 ◽

2014 ◽

Vol 6 (4) ◽

pp. 1-8

Author(s):

Jasmina Đ. Novakovic

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Classifier Ensembles

Download Full-text

A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-190134 ◽

2021 ◽

Vol 24 (4) ◽

pp. 289-301

Author(s):

B. Venkatesh ◽

J. Anuradha

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Classification Accuracy ◽

Performance Metrics ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Svm Classifier ◽

Binary Particle Swarm Optimization ◽

Selection Methods

In Microarray Data, it is complicated to achieve more classification accuracy due to the presence of high dimensions, irrelevant and noisy data. And also It had more gene expression data and fewer samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features need to extract, this can be achieved by applying the feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases, filter and wrapper phase in filter phase ensemble technique is used for aggregating the feature ranks of the Relief, minimum redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods. This paper uses the Fuzzy Gaussian membership function ordering for aggregating the ranks. In wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used for selecting the optimal features, and the RBF Kernel-based Support Vector Machine (SVM) classifier is used as an evaluator. The performance of the proposed model are compared with state of art feature selection methods using five benchmark datasets. For evaluation various performance metrics such as Accuracy, Recall, Precision, and F1-Score are used. Furthermore, the experimental results show that the performance of the proposed method outperforms the other feature selection methods.

Download Full-text

Design of Text Categorization System Based on SVM

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.532-533.1191 ◽

2012 ◽

Vol 532-533 ◽

pp. 1191-1195 ◽

Cited By ~ 1

Author(s):

Zhen Yan Liu ◽

Wei Ping Wang ◽

Yong Wang

Keyword(s):

Feature Extraction ◽

Feature Selection ◽

Text Categorization ◽

Feature Selection Method ◽

Extraction Methods ◽

Support Vector ◽

Text Representation ◽

Text Feature ◽

Categorization System ◽

Classifier Training

This paper introduces the design of a text categorization system based on Support Vector Machine (SVM). It analyzes the high dimensional characteristic of text data, the reason why SVM is suitable for text categorization. According to system data flow this system is constructed. This system consists of three subsystems which are text representation, classifier training and text classification. The core of this system is the classifier training, but text representation directly influences the currency of classifier and the performance of the system. Text feature vector space can be built by different kinds of feature selection and feature extraction methods. No research can indicate which one is the best method, so many feature selection and feature extraction methods are all developed in this system. For a specific classification task every feature selection method and every feature extraction method will be tested, and then a set of the best methods will be adopted.

Download Full-text

Feature Selection Method Based on Mutual Information and Support Vector Machine

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800142150021x ◽

2021 ◽

pp. 2150021

Author(s):

Gang Liu ◽

Chunlei Yang ◽

Sen Liu ◽

Chunbao Xiao ◽

Bin Song

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Mutual Information ◽

Classification Accuracy ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Svm Classifier ◽

Standard Data ◽

Feature Dimension

A feature selection method based on mutual information and support vector machine (SVM) is proposed in order to eliminate redundant feature and improve classification accuracy. First, local correlation between features and overall correlation is calculated by mutual information. The correlation reflects the information inclusion relationship between features, so the features are evaluated and redundant features are eliminated with analyzing the correlation. Subsequently, the concept of mean impact value (MIV) is defined and the influence degree of input variables on output variables for SVM network based on MIV is calculated. The importance weights of the features described with MIV are sorted by descending order. Finally, the SVM classifier is used to implement feature selection according to the classification accuracy of feature combination which takes MIV order of feature as a reference. The simulation experiments are carried out with three standard data sets of UCI, and the results show that this method can not only effectively reduce the feature dimension and high classification accuracy, but also ensure good robustness.

Download Full-text

A Hybrid Feature Selection Method Based on Symmetrical Uncertainty and Support Vector Machine for High-Dimensional Data Classification

Intelligent Information and Database Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-319-54472-4_67 ◽

2017 ◽

pp. 721-727 ◽

Cited By ~ 2

Author(s):

Yongjun Piao ◽

Keun Ho Ryu

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

High Dimensional Data ◽

Feature Selection Method ◽

Data Classification ◽

Selection Method ◽

High Dimensional ◽

Support Vector ◽

Symmetrical Uncertainty

Download Full-text

A Machine Learning Approach to EEG-based Prediction of Human Affective States Using Recursive Feature Elimination Method

MATEC Web of Conferences ◽

10.1051/matecconf/202133504001 ◽

2021 ◽

Vol 335 ◽

pp. 04001

Author(s):

Didar Dadebayev ◽

Goh Wei Wei ◽

Tan Ee Xion

Keyword(s):

Feature Selection ◽

Fractal Dimension ◽

Emotion Recognition ◽

Affective Computing ◽

Feature Selection Method ◽

Recursive Feature Elimination ◽

Support Vector ◽

Emotional States ◽

Affective States ◽

Emotion Classification

Emotion recognition, as a branch of affective computing, has attracted great attention in the last decades as it can enable more natural brain-computer interface systems. Electroencephalography (EEG) has proven to be an effective modality for emotion recognition, with which user affective states can be tracked and recorded, especially for primitive emotional events such as arousal and valence. Although brain signals have been shown to correlate with emotional states, the effectiveness of proposed models is somewhat limited. The challenge is improving accuracy, while appropriate extraction of valuable features might be a key to success. This study proposes a framework based on incorporating fractal dimension features and recursive feature elimination approach to enhance the accuracy of EEG-based emotion recognition. The fractal dimension and spectrum-based features to be extracted and used for more accurate emotional state recognition. Recursive Feature Elimination will be used as a feature selection method, whereas the classification of emotions will be performed by the Support Vector Machine (SVM) algorithm. The proposed framework will be tested with a widely used public database, and results are expected to demonstrate higher accuracy and robustness compared to other studies. The contributions of this study are primarily about the improvement of the EEG-based emotion classification accuracy. There is a potential restriction of how generic the results can be as different EEG dataset might yield different results for the same framework. Therefore, experimenting with different EEG dataset and testing alternative feature selection schemes can be very interesting for future work.

Download Full-text

Feature Selection Method for L1-norm Twin Support Vector Regression

10.1109/iccais52680.2021.9624538 ◽

2021 ◽

Author(s):

Fan Wang ◽

Qing Wu ◽

Yanlin Fu

Keyword(s):

Feature Selection ◽

Support Vector Regression ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

L1 Norm ◽

Twin Support Vector Regression

Download Full-text

Computer-Aided Detection of Polyps in CT Colonography by Means of Feature Selection and Massive-Training Support Vector Regression

Machine Learning in Computer-Aided Diagnosis - Advances in Bioinformatics and Biomedical Engineering ◽

10.4018/978-1-4666-0059-1.ch009 ◽

2012 ◽

pp. 178-201

Author(s):

Jian-Wu Xu ◽

Kenji Suzuki

Keyword(s):

Feature Selection ◽

Support Vector Regression ◽

Ct Colonography ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Computer Aided Detection ◽

Training Time ◽

Computer Aided ◽

Wilks Lambda

One of the major challenges in current Computer-Aided Detection (CADe) of polyps in CT Colonography (CTC) is to improve the specificity without sacrificing the sensitivity. If a large number of False Positive (FP) detections of polyps are produced by the scheme, radiologists might lose their confidence in the use of CADe. In this chapter, the authors used a nonlinear regression model operating on image voxels and a nonlinear classification model with extracted image features based on Support Vector Machines (SVMs). They investigated the feasibility of a Support Vector Regression (SVR) in the massive-training framework, and the authors developed a Massive-Training SVR (MTSVR) in order to reduce the long training time associated with the Massive-Training Artificial Neural Network (MTANN) for reduction of FPs in CADe of polyps in CTC. In addition, the authors proposed a feature selection method directly coupled with an SVM classifier to maximize the CADe system performance. They compared the proposed feature selection method with the conventional stepwise feature selection based on Wilks’ lambda with a linear discriminant analysis classifier. The FP reduction system based on the proposed feature selection method was able to achieve a 96.0% by-polyp sensitivity with an FP rate of 4.1 per patient. The performance is better than that of the stepwise feature selection based on Wilks’ lambda (which yielded the same sensitivity with 18.0 FPs/patient). To test the performance of the proposed MTSVR, the authors compared it with the original MTANN in the distinction between actual polyps and various types of FPs in terms of the training time reduction and FP reduction performance. The authors’ CTC database consisted of 240 CTC datasets obtained from 120 patients in the supine and prone positions. With MTSVR, they reduced the training time by a factor of 190, while achieving a performance (by-polyp sensitivity of 94.7% with 2.5 FPs/patient) comparable to that of the original MTANN (which has the same sensitivity with 2.6 FPs/patient).

Download Full-text

Discriminative Feature Selection Based on Imbalance SVDD for Fault Detection of Semiconductor Manufacturing Processes

Journal of Circuits System and Computers ◽

10.1142/s0218126616501437 ◽

2016 ◽

Vol 25 (11) ◽

pp. 1650143 ◽

Cited By ~ 5

Author(s):

Jian Wang ◽

Jian Feng ◽

Zhiyan Han

Keyword(s):

Feature Selection ◽

Fault Detection ◽

Semiconductor Industry ◽

Feature Selection Method ◽

Class Imbalance ◽

Support Vector ◽

Support Vector Data Description ◽

The Real ◽

Discriminative Feature ◽

New Feature

Feature selection has become a key step of fault detection. Unfortunately, the class imbalance in the modern semiconductor industry makes feature selection quite challenging. This paper analyzes the challenges and indicates the limitations of the traditional supervised and unsupervised feature selection methods. To cope with the limitations, a new feature selection method named imbalanced support vector data description-radius-recursive feature selection (ISVDD-radius-RFE) is proposed. When selecting features, the ISVDD-radius-RFE has three advantages: (1) ISVDD-radius-RFE is designed to find the most representative feature by finding the real shape of normal samples. (2) ISVDD-radius-RFE can represent the real shape of normal samples more correctly by introducing the discriminant information from fault samples. (3) ISVDD-radius-RFE is optimized for fault detection where the imbalance data is common. The kernel ISVDD-radius-RFE is also described in this paper. The proposed method is demonstrated through its application in the banana set and SECOM dataset. The experimental results confirm ISVDD-radius-RFE and kernel ISVDD-radius-RFE improve the performance of fault detection.

Download Full-text

An Embedded Feature Selection and Multi-Class Classification Method for Detection of the Progression from Mild Cognitive Impairment to Alzheimer's Disease

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2020.2888 ◽

2020 ◽

Vol 10 (2) ◽

pp. 370-379 ◽

Cited By ~ 1

Author(s):

Jie Cai ◽

Lingjing Hu ◽

Zhou Liu ◽

Ke Zhou ◽

Huailing Zhang

Keyword(s):

Alzheimer’S Disease ◽

Alzheimer's Disease ◽

Feature Selection ◽

Cognitive Impairment ◽

Mild Cognitive Impairment ◽

Treatment Plan ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Feature Subset

Background: Mild cognitive impairment (MCI) patients are a high-risk group for Alzheimer's disease (AD). Each year, the diagnosed of 10–15% of MCI patients are converted to AD (MCI converters, MCI_C), while some MCI patients remain relatively stable, and unconverted (MCI stable, MCI_S). MCI patients are considered the most suitable population for early intervention treatment for dementia, and magnetic resonance imaging (MRI) is clinically the most recommended means of imaging examination. Therefore, using MRI image features to reliably predict the conversion from MCI to AD can help physicians carry out an effective treatment plan for patients in advance so to prevent or slow down the development of dementia. Methods: We proposed an embedded feature selection method based on the least squares loss function and within-class scatter to select the optimal feature subset. The optimal subsets of features were used for binary classification (AD, MCI_C, MCI_S, normal control (NC) in pairs) based on a support vector machine (SVM), and the optimal 3-class features were used for 3-class classification (AD, MCI_C, MCI_S, NC in triples) based on one-versus-one SVMs (OVOSVMs). To ensure the insensitivity of the results to the random train/test division, a 10-fold cross-validation has been repeated for each classification. Results: Using our method for feature selection, only 7 features were selected from the original 90 features. With using the optimal subset in the SVM, we classified MCI_C from MCI_S with an accuracy, sensitivity, and specificity of 71.17%, 68.33% and 73.97%, respectively. In comparison, in the 3-class classification (AD vs. MCI_C vs. MCI_S) with OVOSVMs, our method selected 24 features, and the classification accuracy was 81.9%. The feature selection results were verified to be identical to the conclusions of the clinical diagnosis. Our feature selection method achieved the best performance, comparing with the existing methods using lasso and fused lasso for feature selection. Conclusion: The results of this study demonstrate the potential of the proposed approach for predicting the conversion from MCI to AD by identifying the affected brain regions undergoing this conversion.

Download Full-text