Feature Selection in High Dimension

Author(s):  
Sébastien Gadat

Variable selection for classification is a crucial problem in image analysis. Images are generally described by a large number of features (pixels, edges, …), yet it is difficult to obtain enough samples to draw reliable inferences for classification using all of those features. In this chapter, the authors describe some simple and effective feature selection methods based on a filter strategy. They also present more sophisticated methods, based on a margin criterion or on stochastic approximation techniques, that achieve strong classification performance with a very small proportion of the variables. Most of these “wrapper” methods are dedicated to a particular classifier, except the Optimal Feature Weighting algorithm (denoted OFW in the sequel), which is a meta-algorithm and works with any classifier. A large part of the chapter is dedicated to the description of OFW and hybrid OFW algorithms. The authors also illustrate several other methods on practical examples of face detection problems.
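As a rough illustration of what a filter strategy can look like, the sketch below ranks features by a Fisher-style score (squared difference of class means over the sum of class variances) and keeps the top k. The choice of score, the function names `fisher_scores` and `select_top_k`, and the toy data are assumptions made for this example; the abstract does not say which filter criteria the chapter uses.

```python
from statistics import mean, pvariance

def fisher_scores(X, y):
    """Score each feature by between-class separation over within-class spread."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = pvariance(pos) + pvariance(neg)
        # Small constant avoids division by zero for constant features.
        scores.append((mean(pos) - mean(neg)) ** 2 / (spread + 1e-12))
    return scores

def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features, best first."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
top = select_top_k(X, y, 1)
```

A filter of this kind scores each feature independently of any classifier, which is what makes it cheap compared with wrapper methods that retrain a classifier for each candidate subset.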

IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Mohammed Qaraad ◽  
Souad Amjad ◽  
Ibrahim I.M. Manhrawy ◽  
Hanaa Fathi ◽  
Bayoumi A. Hassan ◽  
...  

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 2910
Author(s):  
Kei Suzuki ◽  
Tipporn Laohakangvalvit ◽  
Ryota Matsubara ◽  
Midori Sugaya

In human emotion estimation using an electroencephalogram (EEG) and heart rate variability (HRV), there are two main issues as far as we know. The first is that measurement devices for physiological signals are expensive and not easy to wear. The second is that unnecessary physiological indexes have not been removed, which is likely to decrease the accuracy of machine learning models. In this study, we used a single-channel EEG sensor and a photoplethysmography (PPG) sensor, which are inexpensive and easy to wear. We collected data from 25 participants (18 males and 7 females) and used a deep learning algorithm to construct an emotion classification model based on the Arousal–Valence space, using several feature combinations obtained from physiological indexes selected according to our criteria, including our proposed feature selection methods. We then verified accuracy by applying stratified 10-fold cross-validation to the constructed models. The results showed that model accuracies reach 90% to 99% when the proposed feature selection methods are applied, which suggests that a small number of physiological indexes, even from inexpensive sensors, can be used to construct an accurate emotion classification model if an appropriate feature selection method is applied. Our results contribute to emotion classification models that are more accurate, less costly, and less time consuming, and that have the potential to be applied in various areas.
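For readers unfamiliar with stratified cross-validation, the following is a minimal sketch of how samples can be split into folds that keep class proportions similar. It deals each class's indices round-robin across folds without shuffling; the function name `stratified_folds` and the labels are hypothetical, and this is not the authors' evaluation code.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign each sample index to one of k folds, keeping class proportions similar."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the indices of each class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

# Hypothetical labels: 6 samples of one class, 4 of another, split into 2 folds.
labels = ["high"] * 6 + ["low"] * 4
folds = stratified_folds(labels, 2)
```

Each fold then serves once as the held-out test set while the remaining folds are used for training; with k = 10 this gives the 10-fold scheme mentioned in the abstract.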


2020 ◽  
Author(s):  
Raquel Candido ◽  
Rafael Lama ◽  
Natália Chiari ◽  
Marcello Nogueira-Barbosa ◽  
Paulo Azevedo Marques ◽  
...  

Non-traumatic Vertebral Compression Fractures (VCFs) are generally caused by osteoporosis (benign VCFs) or metastatic cancer (malignant VCFs), and the success of medical treatment strongly depends on a fast and correct classification of VCFs. Recently, methods for computer-aided diagnosis (CAD) based on machine learning have been proposed for classifying VCFs. In this work, we investigate the problem of clustering images of VCFs and the impact of feature selection by genetic algorithms, comparing the purity of the clustering (i) with all features and (ii) with feature selection. The analysis of the clusters helps to understand the results of classifiers and the difficulties an expert faces in differentiating images of different classes. The results indicate that feature selection improved the separability and purity of the clusters. Feature selection also helps to understand which attributes are most important for analysing images of vertebral bodies.
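Cluster purity, the measure used above to compare the two clusterings, is commonly defined as the fraction of samples that belong to the majority class of their cluster. A small sketch under that standard definition follows; the function name `purity` and the toy assignments are made up for illustration.

```python
from collections import Counter

def purity(cluster_ids, class_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    members = {}
    for cluster, label in zip(cluster_ids, class_labels):
        members.setdefault(cluster, Counter())[label] += 1
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(class_labels)

# Hypothetical assignment of six images to two clusters.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["benign", "benign", "malignant", "malignant", "malignant", "benign"]
score = purity(clusters, classes)
```

A purity of 1.0 means every cluster contains a single class; values near the proportion of the largest class indicate clusters that carry little class information.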


Author(s):  
Maofu Liu ◽  
Huijun Hu

The shape feature of an image can be described by its Zernike moments. In this chapter, the authors point out a problem: a high-dimensional Zernike-moment shape feature vector can describe more detail of the original image, but its many elements cause trouble for the subsequent image analysis phases. The low-dimensional Zernike-moment shape feature vector should therefore be improved and optimized so that it describes more detail of the original image. To this end, an optimization algorithm based on evolutionary computation is designed and implemented in this chapter. The experimental results demonstrate the feasibility of the optimization algorithm.


2017 ◽  
Vol 2017 ◽  
pp. 1-10 ◽  
Author(s):  
Khairan D. Rajab

Phishing is a serious web threat that involves mimicking authentic websites to deceive users and obtain their financial information. Phishing has caused financial damage to various online stakeholders, on the order of hundreds of millions; hence it is essential to minimize this risk. Classifying websites into “phishy” and legitimate types is a primary task in data mining that security experts and decision makers are hoping to improve, particularly with respect to the detection rate and the reliability of the results. One way to ensure the reliability of the results and to enhance performance is to identify a set of relevant features early on, so that the data dimensionality is reduced and irrelevant features are discarded. To increase the reliability of preprocessing, this article proposes a new feature selection method that combines the scores of multiple known methods to minimize discrepancies in feature selection results. The proposed method has been applied to the problem of website phishing classification to show its pros and cons in identifying relevant features. Results on a security dataset reveal that the proposed preprocessing method was able to derive new feature datasets which, when mined, generate highly competitive classifiers in terms of detection rate compared with results obtained from other feature selection methods.
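One plausible way to combine the scores of several feature selection methods is to rescale each method's scores to a common range and average them per feature. The sketch below does exactly that with min-max normalization; this combination rule, the function names, and the numbers are assumptions for illustration, since the abstract does not specify how the article merges the scores.

```python
def min_max(scores):
    """Rescale a list of scores to [0, 1]; constant lists map to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def combine_scores(score_lists):
    """Average the normalized scores that several methods give each feature."""
    normalized = [min_max(scores) for scores in score_lists]
    n_features = len(score_lists[0])
    return [sum(col[j] for col in normalized) / len(normalized)
            for j in range(n_features)]

# Hypothetical scores from two selection methods on three features.
method_a = [0.0, 8.0, 4.0]
method_b = [1.0, 2.0, 5.0]
combined = combine_scores([method_a, method_b])
ranking = sorted(range(len(combined)), key=lambda j: combined[j], reverse=True)
```

Here the two methods disagree on the best feature (feature 1 versus feature 2), and averaging the normalized scores produces a single ranking that reflects both.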


Blood ◽  
1980 ◽  
Vol 56 (4) ◽  
pp. 696-700 ◽  
Author(s):  
GB Howe ◽  
KV Swettenham ◽  
HL Currey

Most methods of measuring neutrophil motility provide information mainly about the performance of a small proportion of the fastest moving cells. Application of a computer-linked image analysis technique, using the “Quantimet,” provides a convenient, automated method of measuring the motility of the whole cell population. This makes it possible to test whether changes in motility represent a homogeneous alteration affecting all cells or a change in the numbers or performance of a subset of cells. In this study the neutrophils from patients with uncomplicated rheumatoid arthritis were found to perform similarly to normals, while cells from patients with Felty's syndrome were markedly slower. This was an overall, homogeneous slowing of the whole cell population, not due to a loss of fast moving cells.


2010 ◽  
Vol 47 (02) ◽  
pp. 572-585 ◽  
Author(s):  
Netta Cohen ◽  
Jonathan Jordan ◽  
Margaritis Voliotis

We consider a preferential duplication model for growing random graphs, extending previous models of duplication graphs by selecting the vertex to be duplicated with probability proportional to its degree. We show that a special case of this model can be analysed using the same stochastic approximation as for vertex-reinforced random walks, and show that ‘trapping’ behaviour can occur, such that the descendants of a particular group of initial vertices come to dominate the graph.
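To make the growth rule concrete, here is a small simulation sketch in which the vertex to duplicate is drawn with probability proportional to its degree. It assumes full duplication (the copy inherits every neighbour of the chosen vertex and is not linked to the original) and a single starting edge; both are assumptions of this example, and the paper's exact duplication rule may differ. The name `duplicate_step` is hypothetical.

```python
import random

def duplicate_step(adj, rng):
    """Copy one vertex chosen with probability proportional to its degree."""
    vertices = list(adj)
    weights = [len(adj[v]) for v in vertices]
    chosen = rng.choices(vertices, weights=weights, k=1)[0]
    new_vertex = len(adj)  # vertices are labelled 0, 1, 2, ...
    # Assumption: the copy inherits every neighbour of the chosen vertex.
    adj[new_vertex] = set(adj[chosen])
    for neighbour in adj[chosen]:
        adj[neighbour].add(new_vertex)
    return chosen

rng = random.Random(0)
adj = {0: {1}, 1: {0}}  # start from a single edge
for _ in range(20):
    duplicate_step(adj, rng)
```

Tracking which initial vertex each copy descends from would be one way to observe, in simulation, whether the descendants of a particular group come to dominate the graph.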


2020 ◽  
Vol 14 (3) ◽  
pp. 269-279
Author(s):  
Hayet Djellali ◽  
Nacira Ghoualmi-Zine ◽  
Souad Guessoum

This paper investigates feature selection methods based on a hybrid architecture that combines Adapted Fast Correlation Based Feature selection with Support Vector Machine Recursive Feature Elimination (AFCBF-SVMRFE). AFCBF-SVMRFE has three stages and is composed of the SVMRFE embedded method together with correlation-based feature selection. The first stage is a relevance analysis, the second is a redundancy analysis, and the third is a performance evaluation and feature restoration stage. Experiments show that the proposed method, tested with two classifiers, Support Vector Machine (SVM) and K nearest neighbors (KNN), provides the best accuracy on various datasets. The SVM classifier outperforms the KNN classifier on these data. AFCBF-SVMRFE outperforms the FCBF multivariate filter, SVMRFE, particle swarm optimization (PSO), and artificial bee colony (ABC).

