A Research Travelogue on Feature Sub Set Selection Algorithms

2019 ◽  
Vol 8 (3) ◽  
pp. 35-37
Author(s):  
R. Ravikumar ◽  
M. Babu Reddy

In machine learning as the dimensionality of the data rises, the amount of data required to provide a reliable analysis grows exponentially. To perform dimensionality reduction on high-dimensional micro array data, many different feature selection and feature extraction methods exist and they are being widely used. All these methods aim to remove redundant and irrelevant features so that classification of new instances will be more accurate. Analyzing microarrays can be difficult due to the size of the data they provide. In addition the complicated relations among the different genes make analysis more difficult and removing excess features can improve the quality of the results. Feature selection has been an active and fruitful field of research area in pattern recognition, machine learning, statistics and data mining communities. The main objective of this paper is feature selection is to choose a subset of input variables by eliminating features.

2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
Zena M. Hira ◽  
Duncan F. Gillies

We summarise various ways of performing dimensionality reduction on high-dimensional microarray data. Many different feature selection and feature extraction methods exist and they are being widely used. All these methods aim to remove redundant and irrelevant features so that classification of new instances will be more accurate. A popular source of data is microarrays, a biological platform for gathering gene expressions. Analysing microarrays can be difficult due to the size of the data they provide. In addition the complicated relations among the different genes make analysis more difficult and removing excess features can improve the quality of the results. We present some of the most popular methods for selecting significant features and provide a comparison between them. Their advantages and disadvantages are outlined in order to provide a clearer idea of when to use each one of them for saving computational time and resources.


Author(s):  
Manpreet Kaur ◽  
Chamkaur Singh

Educational Data Mining (EDM) is an emerging research area help the educational institutions to improve the performance of their students. Feature Selection (FS) algorithms remove irrelevant data from the educational dataset and hence increases the performance of classifiers used in EDM techniques. This paper present an analysis of the performance of feature selection algorithms on student data set. .In this papers the different problems that are defined in problem formulation. All these problems are resolved in future. Furthermore the paper is an attempt of playing a positive role in the improvement of education quality, as well as guides new researchers in making academic intervention.


Sensors ◽  
2019 ◽  
Vol 19 (4) ◽  
pp. 916 ◽  
Author(s):  
Wen Cao ◽  
Chunmei Liu ◽  
Pengfei Jia

Aroma plays a significant role in the quality of citrus fruits and processed products. The detection and analysis of citrus volatiles can be measured by an electronic nose (E-nose); in this paper, an E-nose is employed to classify the juice which is stored for different days. Feature extraction and classification are two important requirements for an E-nose. During the training process, a classifier can optimize its own parameters to achieve a better classification accuracy but cannot decide its input data which is treated by feature extraction methods, so the classification result is not always ideal. Label consistent KSVD (L-KSVD) is a novel technique which can extract the feature and classify the data at the same time, and such an operation can improve the classification accuracy. We propose an enhanced L-KSVD called E-LCKSVD for E-nose in this paper. During E-LCKSVD, we introduce a kernel function to the traditional L-KSVD and present a new initialization technique of its dictionary; finally, the weighted coefficients of different parts of its object function is studied, and enhanced quantum-behaved particle swarm optimization (EQPSO) is employed to optimize these coefficients. During the experimental section, we firstly find the classification accuracy of KSVD, and L-KSVD is improved with the help of the kernel function; this can prove that their ability of dealing nonlinear data is improved. Then, we compare the results of different dictionary initialization techniques and prove our proposed method is better. Finally, we find the optimal value of the weighted coefficients of the object function of E-LCKSVD that can make E-nose reach a better performance.


2021 ◽  
Vol 11 (15) ◽  
pp. 6983
Author(s):  
Maritza Mera-Gaona ◽  
Diego M. López ◽  
Rubiel Vargas-Canas

Identifying relevant data to support the automatic analysis of electroencephalograms (EEG) has become a challenge. Although there are many proposals to support the diagnosis of neurological pathologies, the current challenge is to improve the reliability of the tools to classify or detect abnormalities. In this study, we used an ensemble feature selection approach to integrate the advantages of several feature selection algorithms to improve the identification of the characteristics with high power of differentiation in the classification of normal and abnormal EEG signals. Discrimination was evaluated using several classifiers, i.e., decision tree, logistic regression, random forest, and Support Vecctor Machine (SVM); furthermore, performance was assessed by accuracy, specificity, and sensitivity metrics. The evaluation results showed that Ensemble Feature Selection (EFS) is a helpful tool to select relevant features from the EEGs. Thus, the stability calculated for the EFS method proposed was almost perfect in most of the cases evaluated. Moreover, the assessed classifiers evidenced that the models improved in performance when trained with the EFS approach’s features. In addition, the classifier of epileptiform events built using the features selected by the EFS method achieved an accuracy, sensitivity, and specificity of 97.64%, 96.78%, and 97.95%, respectively; finally, the stability of the EFS method evidenced a reliable subset of relevant features. Moreover, the accuracy, sensitivity, and specificity of the EEG detector are equal to or greater than the values reported in the literature.


MethodsX ◽  
2021 ◽  
Vol 8 ◽  
pp. 101166
Author(s):  
Timothy J. Fawcett ◽  
Chad S. Cooper ◽  
Ryan J. Longenecker ◽  
Joseph P. Walton

The article aims to develop a model for forecasting the characteristics of traffic flows in real-time based on the classification of applications using machine learning methods to ensure the quality of service. It is shown that the model can forecast the mean rate and frequency of packet arrival for the entire flow of each class separately. The prediction is based on information about the previous flows of this class and the first 15 packets of the active flow. Thus, the Random Forest Regression method reduces the prediction error by approximately 1.5 times compared to the standard mean estimate for transmitted packets issued at the switch interface.


2021 ◽  
Vol 6 (22) ◽  
pp. 51-59
Author(s):  
Mustazzihim Suhaidi ◽  
Rabiah Abdul Kadir ◽  
Sabrina Tiun

Extracting features from input data is vital for successful classification and machine learning tasks. Classification is the process of declaring an object into one of the predefined categories. Many different feature selection and feature extraction methods exist, and they are being widely used. Feature extraction, obviously, is a transformation of large input data into a low dimensional feature vector, which is an input to classification or a machine learning algorithm. The task of feature extraction has major challenges, which will be discussed in this paper. The challenge is to learn and extract knowledge from text datasets to make correct decisions. The objective of this paper is to give an overview of methods used in feature extraction for various applications, with a dataset containing a collection of texts taken from social media.


Author(s):  
Ana Maria Mihaela Iordache ◽  
Codruța Cornelia Dura ◽  
Cristina Coculescu ◽  
Claudia Isac ◽  
Ana Preda

Our study addresses the issue of telework adoption by countries in the European Union and draws up a few feasible scenarios aimed at improving telework’s degree of adaptability in Romania. We employed the dataset from the 2020 Eurofound survey on Living, Working and COVID-19 (Round 2) in order to extract ten relevant determinants of teleworking on the basis of 24,123 valid answers provided by respondents aged 18 and over: the availability of work equipment; the degree of satisfaction with the experience of working from home; the risks related to potential contamination with SARS-CoV-2 virus; the employees’ openness to adhering to working-from-home patterns; the possibility of maintaining work–life balance objectives while teleworking; the level of satisfaction on the amount and the quality of work submitted, etc. Our methodology entailed the employment of SAS Enterprise Guide software to perform a cluster analysis resulting in a preliminary classification of the EU countries with respect to the degree that they have been able to adapt to telework. Further on, in order to refine this taxonomy, a multilayer perceptron neural network with ten input variables in the initial layer, six neurons in the intermediate layer, and three neurons in the final layer was successfully trained. The results of our research demonstrate the existence of significant disparities in terms of telework adaptability, such as: low to moderate levels of adaptability (detected in countries such as Greece, Croatia, Portugal, Spain, Lithuania, Latvia, Poland, Italy); fair levels of adaptability (encountered in France, Slovakia, the Czech Republic, Hungary, Slovenia, or Romania); and high levels of adaptability (exhibited by intensely digitalized economies such Denmark, Sweden, Finland, Germany, Ireland, the Netherlands, Belgium, etc.).


2021 ◽  
Vol 15 ◽  
Author(s):  
Jesús Leonardo López-Hernández ◽  
Israel González-Carrasco ◽  
José Luis López-Cuadrado ◽  
Belén Ruiz-Mezcua

Nowadays, the recognition of emotions in people with sensory disabilities still represents a challenge due to the difficulty of generalizing and modeling the set of brain signals. In recent years, the technology that has been used to study a person’s behavior and emotions based on brain signals is the brain–computer interface (BCI). Although previous works have already proposed the classification of emotions in people with sensory disabilities using machine learning techniques, a model of recognition of emotions in people with visual disabilities has not yet been evaluated. Consequently, in this work, the authors present a twofold framework focused on people with visual disabilities. Firstly, auditory stimuli have been used, and a component of acquisition and extraction of brain signals has been defined. Secondly, analysis techniques for the modeling of emotions have been developed, and machine learning models for the classification of emotions have been defined. Based on the results, the algorithm with the best performance in the validation is random forest (RF), with an accuracy of 85 and 88% in the classification for negative and positive emotions, respectively. According to the results, the framework is able to classify positive and negative emotions, but the experimentation performed also shows that the framework performance depends on the number of features in the dataset and the quality of the Electroencephalogram (EEG) signals is a determining factor.


Sign in / Sign up

Export Citation Format

Share Document