A literature review of feature selection techniques and applications: Review of feature selection in data mining

The method presented in this paper is novel as a natural combination of two mutually dependent steps. Feature selection is the key first step in our classification system, which was employed during the 2010 International RSCTC data mining (bioinformatics) Challenge. The second step may be implemented using any suitable classifier, such as linear regression, support vector machines or neural networks. We conducted leave-one-out (LOO) experiments with several feature selection techniques and classifiers. Based on the LOO evaluations, we decided to use feature selection with the separation-type Wilcoxon-based criterion for all final submissions. The method was tested successfully during the RSCTC data mining Challenge, where we achieved the top score in the Basic track.
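The abstract does not spell out the exact form of the Wilcoxon-based criterion, so the following is only a minimal sketch of one common variant: each feature is scored by how far its normalized Wilcoxon rank-sum (Mann-Whitney U) statistic lies from 0.5, and the top-scoring features are kept. The function names `average_ranks`, `wilcoxon_separation` and `select_top_k` are hypothetical, and binary labels coded as 0/1 are assumed.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_separation(feature: Sequence[float], labels: Sequence[int]) -> float:
    """Score in [0, 0.5]: 0 means no separation, 0.5 means perfect separation."""
    ranks = average_ranks(feature)
    n1 = sum(1 for y in labels if y == 1)
    n0 = len(labels) - n1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum - n1 * (n1 + 1) / 2.0  # Mann-Whitney U for class 1
    return abs(u / (n1 * n0) - 0.5)


def select_top_k(X: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> List[int]:
    """Return the column indices of the k features with the highest score."""
    n_features = len(X[0])
    scores = [
        wilcoxon_separation([row[j] for row in X], labels)
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    X = [[1, 5, 0.3], [2, 1, 0.1], [3, 4, 0.9],
         [10, 2, 0.2], [11, 3, 0.8], [12, 6, 0.4]]
    y = [0, 0, 0, 1, 1, 1]
    print(select_top_k(X, y, k=1))
```

In a leave-one-out evaluation, a selection step like this would normally be repeated inside each fold on the training samples only, so that the held-out sample does not influence which features are chosen.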


A Survey on Phishing Detection and The Importance of Feature Selection In Data Mining Classification Algorithms

Issue 4 - Journal of Science and Technology ◽

10.46243/jst.2020.v5.i6.pp11-18 ◽

2020 ◽

pp. 11-18

Keyword(s):

Data Mining ◽

Feature Selection ◽

Support Vector ◽

Classification Algorithms ◽

End User ◽

Preparation Methods ◽

Survey Paper ◽

Vector Machines ◽

Feature Selection Techniques ◽

Phishing Detection

In this era of the Internet, information security is a pressing concern. One of the main threats is the phishing attack, a form of email or website fraud that targets a genuine webpage or email and compromises it without the consent of the end user. Various techniques help to classify whether a website or an email is legitimate or fake. The major contributors to the detection of phishing fraud are classification algorithms, feature selection techniques or dataset preparation methods, and feature extraction, which plays an important role in both detection and prevention of these attacks. This survey paper studies the effect of all these contributors and the approaches applied in recent papers. The classification algorithms implemented include Decision Tree, Random Forest, Support Vector Machines, Logistic Regression, Lazy K Star, Naive Bayes and J48.


FEATURE SELECTION FOR OPTIMIZATION OF WAVELET PACKET DECOMPOSITION IN RELIABILITY ANALYSIS OF SYSTEMS

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213013600117 ◽

2013 ◽

Vol 22 (05) ◽

pp. 1360011 ◽

Cited By ~ 4

Author(s):

RANDALL WALD ◽

TAGHI M. KHOSHGOFTAAR ◽

JOHN C. SLOAN

Keyword(s):

Data Mining ◽

Feature Selection ◽

Wavelet Packet ◽

Vibration Signal ◽

Machine Learning Algorithms ◽

Wavelet Packet Decomposition ◽

Time Frequency ◽

Speed Up ◽

Frequency Domain Techniques ◽

Feature Selection Techniques

One of the most important types of signal found in the area of machine condition monitoring/prognostic health monitoring (MCM/PHM) is the vibration signal, a type of waveform. Many time-frequency domain techniques have been proposed to interpret such signals, including wavelet packet decomposition (WPD). Previous work has shown how to extend the WPD algorithm to operate on streaming signals, but the number of output variables becomes exponential in the number of levels of decomposition, hindering data mining in limited-memory environments. Feature selection techniques, well understood in other areas of data mining, can be used to greatly reduce the number of output variables and speed up the machine learning algorithms. This paper presents a case study comparing two versions of WPD both with and without feature selection, demonstrating that removing most of the features produced by the WPD does not impair its performance within the context of MCM/PHM.
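The exponential growth mentioned above can be illustrated with a small sketch. It assumes a Haar filter pair and a signal whose length is divisible by 2 raised to the number of levels; the paper's actual wavelet, its streaming extension and its feature definitions are not given in the abstract, so `haar_split`, `wavelet_packet_leaves` and `band_energies` are hypothetical names used only for illustration.

```python
import math
from typing import List, Sequence


def haar_split(signal: Sequence[float]) -> tuple:
    """One Haar analysis step: return (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_packet_leaves(signal: Sequence[float], levels: int) -> List[List[float]]:
    """Full wavelet packet tree: split every node at every level."""
    nodes = [list(signal)]
    for _ in range(levels):
        next_nodes = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes  # the node count doubles at each level
    return nodes


def band_energies(signal: Sequence[float], levels: int) -> List[float]:
    """One energy value per leaf node, i.e. 2**levels candidate features."""
    return [sum(c * c for c in node) for node in wavelet_packet_leaves(signal, levels)]


if __name__ == "__main__":
    sig = [math.sin(0.7 * n) for n in range(16)]
    for levels in (1, 2, 3):
        print(levels, len(band_energies(sig, levels)))
```

Because every node is split at every level, a decomposition of depth L yields 2**L leaf nodes, which is why a feature selection step that keeps only a few of them can reduce memory use considerably.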


Applications of Feature Selection and Regression Techniques in Materials Design

Advances in Chemical and Materials Engineering - Computational Approaches to Materials Design ◽

10.4018/978-1-5225-0290-6.ch008 ◽

2016 ◽

pp. 224-251 ◽

Cited By ~ 2

Author(s):

Partha Dey ◽

Joe Bible ◽

Swati Dey ◽

Somnath Datta

Keyword(s):

Data Mining ◽

Feature Selection ◽

Soft Computing ◽

Material Property ◽

Target Material ◽

Materials Design ◽

Hidden Knowledge ◽

Regression Techniques ◽

Noisy Output ◽

Feature Selection Techniques

Feature selection is considered an important preprocessing step for data mining and soft computing, whereas regression is a collection of methods for optimally assessing the signal in a noisy output. Both seek to arrive at the dependence and relation between different attributes and a target material property. In the present chapter a range of regression and feature selection techniques are discussed, and the kind of results that can be obtained with each of them is illustrated with the help of a dataset on steel. The different methods are capable of abstracting data in different forms, thus revealing hidden knowledge from different perspectives. Choosing the most appropriate method depends on the application at hand and the kind of objective that one is looking for.


Review On Feature Selection Techniques in Data Mining

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v5i11.187191 ◽

2017 ◽

Vol 5 (11) ◽

pp. 187-191

Author(s):

S. Ramadass ◽

M. Gunasekaran

Keyword(s):

Data Mining ◽

Feature Selection ◽

Feature Selection Techniques


Success/Failure Prediction of Noninvasive Mechanical Ventilation in Intensive Care Units

Methods of Information in Medicine ◽

10.3414/me14-01-0015 ◽

2016 ◽

Vol 55 (03) ◽

pp. 234-241 ◽

Cited By ~ 6

Author(s):

Félix Martín-González ◽

Javier González-Robledo ◽

Fernando Sánchez-Hernández ◽

María Moreno-García

Keyword(s):

Data Mining ◽

Feature Selection ◽

Intensive Care ◽

Intensive Care Units ◽

Influential Factors ◽

Selection Methods ◽

Noninvasive Mechanical Ventilation ◽

Mining Methods ◽

The One ◽

Feature Selection Techniques

Summary. Objectives: This paper addresses the problem of decision-making in relation to the administration of noninvasive mechanical ventilation (NIMV) in intensive care units. Methods: Data mining methods were employed to find out the factors influencing the success/failure of NIMV and to predict its results in future patients. These artificial intelligence-based methods have not been applied in this field in spite of the good results obtained in other medical areas. Results: Feature selection methods provided the most influential variables in the success/failure of NIMV, such as NIMV hours, PaCO2 at the start, PaO2/FiO2 ratio at the start, hematocrit at the start or PaO2/FiO2 ratio after two hours. These methods were also used in the preprocessing step with the aim of improving the results of the classifiers. The algorithms provided the best results when the dataset used as input was the one containing the attributes selected with the CFS method. Conclusions: Data mining methods can be successfully applied to determine the most influential factors in the success/failure of NIMV and also to predict NIMV results in future patients. The results provided by classifiers can be improved by preprocessing the data with feature selection techniques.
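The abstract does not expand "CFS". Assuming it refers to correlation-based feature selection, the subset score that method is usually described with can be sketched as below. The function name `cfs_merit` and its argument names are hypothetical, and the correlation values in the usage lines are made up purely to show the behaviour of the formula.

```python
import math


def cfs_merit(k: int, mean_feature_class_corr: float, mean_feature_feature_corr: float) -> float:
    """Merit of a k-feature subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean feature-class correlation of the subset and
    r_ff is the mean pairwise correlation among its features.
    """
    numerator = k * mean_feature_class_corr
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return numerator / denominator


if __name__ == "__main__":
    # Four predictive, mutually uncorrelated features score higher ...
    print(cfs_merit(4, 0.5, 0.0))
    # ... than four equally predictive but fully redundant ones.
    print(cfs_merit(4, 0.5, 1.0))
```

The score rewards subsets whose features correlate with the class while penalizing redundancy among the features themselves, which is the property that makes this kind of filter useful as a preprocessing step before training classifiers.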


Trends and Opportunities in Health Analytics as a Service and Implications for Use in Low Resource Settings: A Literature Review Abstract (Preprint)

10.2196/preprints.15737 ◽

2019 ◽

Author(s):

Meghana Bastwadkar ◽

Carolyn McGregor ◽

S Balaji

Keyword(s):

Data Mining ◽

Cloud Computing ◽

Big Data ◽

Intensive Care ◽

Literature Review ◽

Health Monitoring ◽

Data Analytics ◽

Neonatal Intensive Care ◽

Big Data Analytics ◽

Healthcare Facilities

BACKGROUND: This paper presents a systematic literature review of existing remote health monitoring systems with special reference to neonatal intensive care units (NICU). Articles on NICU clinical decision support systems (CDSSs) which used cloud computing and big data analytics were surveyed. OBJECTIVE: The aim of this study is to review technologies used to provide NICU CDSS. The literature review highlights the gaps within frameworks providing the Health Analytics as a Service (HAaaS) paradigm for big data analytics. METHODS: Literature searches were performed in Google Scholar, IEEE Digital Library, JMIR Medical Informatics, JMIR Human Factors and JMIR mHealth, and only English articles published in or after 2015 were included. The overall search strategy was to retrieve articles that included terms related to “health analytics” and “as a service” or “internet of things” / “IoT” and “neonatal intensive care unit” / “NICU”. Titles and abstracts were reviewed to assess relevance. RESULTS: In total, 17 full papers met all criteria and were selected for full review. In most cases, bedside medical devices such as pulse oximeters were used as the sensor device. Data acquisition techniques were highly diverse; however, in most cases the same physiological data (heart rate, respiratory rate, blood pressure, blood oxygen saturation) were acquired. In most cases data analytics involved data mining classification techniques, fuzzy logic-NICU decision support systems (DSS), etc., whereas big data analytics involving Artemis cloud data analysis used the CRISP-TDM and STDM temporal data mining techniques to support clinical research studies. In most scenarios both real-time and retrospective analytics were performed. Most of the research was performed within small and medium-sized urban hospitals, so there is wide scope for research within rural and remote hospitals with NICU setups. Creating a HAaaS approach in which data acquisition and data analytics are not tightly coupled remains an open research area. The reviewed articles describe architectures and base technologies for neonatal health monitoring with an IoT approach. CONCLUSIONS: The current work supports implementation of the expanded Artemis cloud as a commercial offering to healthcare facilities in Canada and worldwide to provide cloud computing services to critical care. However, no work to date has been completed for low-resource settings within healthcare facilities in India, which leaves scope for research. All the big data analytics frameworks reviewed in this study have tight coupling of components within the framework, so there is a need for a framework with functional decoupling of components.


On the value of filter feature selection techniques in homogeneous ensembles effort estimation

Journal of Software Evolution and Process ◽

10.1002/smr.2343 ◽

2021 ◽

Author(s):

Mohamed Hosni ◽

Ali Idri ◽

Alain Abran

Keyword(s):

Feature Selection ◽

Effort Estimation ◽

Feature Selection Techniques


Arabic Named Entity Recognition on Social Media based on feature selection techniques using SVM-RFE

2020 Fourth International Conference On Intelligent Computing in Data Sciences (ICDS) ◽

10.1109/icds50568.2020.9268762 ◽

2020 ◽

Author(s):

Brahim AIT BEN ALI ◽

Soukaina MIHI ◽

Ismail EL BAZI ◽

Nabil LAACHFOUBI

Keyword(s):

Social Media ◽

Feature Selection ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity ◽

Feature Selection Techniques


A lazy feature selection method for multi-label classification

Intelligent Data Analysis ◽

10.3233/ida-194878 ◽

2021 ◽

Vol 25 (1) ◽

pp. 21-34

Author(s):

Rafael B. Pereira ◽

Alexandre Plastino ◽

Bianca Zadrozny ◽

Luiz H.C. Merschmann

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Feature Selection Method ◽

Selection Method ◽

Video Classification ◽

Classification Problems ◽

Class Label ◽

New Feature ◽

Feature Selection Techniques ◽

Biomolecular Analysis

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. More specifically, feature selection methods have been developed to allow the identification of relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and specific for the multi-label context. Experimental results show that the proposed technique is competitive when compared to multi-label feature selection techniques currently used in the literature, and is clearly more scalable, in a scenario where there is an increasing amount of data.
