Feature Selection for Text Classification Using Machine Learning Approaches

Companies seek ways to retain their professional employees in order to reduce recruiting and training costs. Predicting whether a particular employee is likely to leave helps the company take preventive action. Unlike physical systems, human resource problems cannot be described by a scientific-analytical formula, so machine learning approaches are well suited to this task. This paper presents a three-stage (pre-processing, processing, post-processing) framework for attrition prediction, with an IBM HR dataset as the case study. Because the dataset contains several features, the “max-out” feature selection method is proposed for dimension reduction in the pre-processing stage and applied to the IBM HR dataset. The coefficient of each feature in the logistic regression model indicates the importance of that feature in attrition prediction. The results show an improvement in the F1-score when the “max-out” feature selection method is used. Finally, the validity of the parameters is checked by training the model on multiple bootstrap datasets. The mean and standard deviation of the parameters are then analyzed to assess the confidence in the model’s parameters and their stability. The small standard deviation of the parameters indicates that the model is stable and more likely to generalize well.
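
The bootstrap stability check described in this abstract can be illustrated with a minimal sketch. It does not reproduce the paper's “max-out” method or the IBM HR dataset: the toy data, the plain gradient-descent logistic regression, and all function names (fit_logistic, bootstrap_coefficients, make_toy_data) are assumptions made for illustration only.

```python
import math
import random
import statistics


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=200):
    # Batch gradient descent on the mean log-loss.
    # Returns the feature weights only (the bias is fitted but not returned).
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def bootstrap_coefficients(X, y, n_boot=20, seed=0):
    # Refit the model on resampled-with-replacement datasets and
    # summarise each coefficient by its mean and standard deviation.
    rng = random.Random(seed)
    n = len(X)
    fits = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        fits.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    columns = list(zip(*fits))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    return means, stds


def make_toy_data(n=200, seed=1):
    # Hypothetical data: feature 0 drives the label, feature 1 is pure noise.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([x1, x2])
        y.append(1 if x1 + rng.gauss(0, 0.5) > 0 else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    means, stds = bootstrap_coefficients(X, y)
    for j, (m, s) in enumerate(zip(means, stds)):
        print(f"feature {j}: mean={m:.3f} std={s:.3f}")
```

A coefficient whose standard deviation is small relative to its mean across resamples is the kind of stable parameter the abstract refers to.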

Optimal feature selection for machine learning based intrusion detection system by exploiting attribute dependence

Materials Today Proceedings ◽

10.1016/j.matpr.2021.04.643 ◽

2021 ◽

Author(s):

Ghanshyam Prasad Dubey ◽

Dr. Rakesh Kumar Bhujade

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Optimal Feature Selection ◽

Selection For ◽

Optimal Feature

Feature Selection for Unsupervised Machine Learning of Accelerometer Data Physical Activity Clusters – A Systematic Review

Gait & Posture ◽

10.1016/j.gaitpost.2021.08.007 ◽

2021 ◽

Author(s):

Petra J. Jones ◽

Mike Catt ◽

Melanie J. Davies ◽

Charlotte L. Edwardson ◽

Evgeny M. Mirkes ◽

...

Keyword(s):

Physical Activity ◽

Machine Learning ◽

Systematic Review ◽

Feature Selection ◽

Accelerometer Data ◽

Unsupervised Machine Learning ◽

Selection For

Incorporate Syntactic Information for Short Text Classification

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.268-270.697 ◽

2011 ◽

Vol 268-270 ◽

pp. 697-700

Author(s):

Rui Xue Duan ◽

Xiao Jie Wang ◽

Wen Feng Li

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Environment ◽

Text Classification ◽

The Internet ◽

Selection Methods ◽

Text Documents ◽

Short Text ◽

Syntactic Information ◽

Dependency Relations

As the volume of short text documents on the Internet grows tremendously, organizing short texts well has become increasingly urgent. However, traditional feature selection methods are not suitable for short text. In this paper, we propose a method that incorporates syntactic information for short text. It emphasizes features that have more dependency relations with other words. The SVM classifier and the Weka machine learning environment are used in our experiments. The experimental results show that, by incorporating syntactic information in short text, we obtain more powerful features than with traditional feature selection methods such as DF and CHI. The precision of short text classification improved from 86.2% to 90.8%.
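
For context, the two baseline criteria named in this abstract, DF (document frequency) and CHI (the chi-square statistic), can be sketched as follows. This is not the paper's syntactic method; the toy corpus and the function names are hypothetical, and the chi-square form used is the common 2x2 contingency-table version for one term and one class.

```python
from collections import Counter


def document_frequency(docs):
    # DF: number of documents in which each term appears at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def chi_square(docs, labels, term, target):
    # 2x2 contingency counts for (term present?) x (document in target class?).
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, target class
        elif has_term:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term or class is constant, so there is no association to measure
    return n * (a * d - c * b) ** 2 / denom


# Hypothetical four-document corpus.
docs = [
    ["cheap", "pills", "now"],
    ["cheap", "offer", "now"],
    ["meeting", "now"],
    ["project", "meeting"],
]
labels = ["spam", "spam", "ham", "ham"]

if __name__ == "__main__":
    print(document_frequency(docs).most_common(3))
    print(chi_square(docs, labels, "cheap", "spam"))
```

Both criteria score terms in isolation, which is one reason they can struggle on short texts where each document contains very few terms.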

Feature Selection for Machine Learning-Based Early Detection of Distributed Cyber Attacks

2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech) ◽

10.1109/dasc/picom/datacom/cyberscitec.2018.00040 ◽

2018 ◽

Cited By ~ 9

Author(s):

Yaokai Feng ◽

Hitoshi Akiyama ◽

Liang Lu ◽

Kouichi Sakurai

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Early Detection ◽

Cyber Attacks ◽

Selection For

Application of all relevant feature selection for failure analysis of parameter-induced simulation crashes in climate models

Geoscientific Model Development Discussions ◽

10.5194/gmdd-8-5419-2015 ◽

2015 ◽

Vol 8 (7) ◽

pp. 5419-5435 ◽

Cited By ~ 1

Author(s):

W. Paja ◽

M. Wrzesień ◽

R. Niemiec ◽

W. R. Rudnicki

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Climate Models ◽

Original Study ◽

Relative Importance ◽

Relevant Feature ◽

Machine Learning Methods ◽

Selection For ◽

Robust Prediction ◽

Physical Components

Abstract. Climate models are extremely complex pieces of software. They reflect the best knowledge of the physical components of the climate; nevertheless, they contain several parameters that are too weakly constrained by observations and can potentially lead to a simulation crash. A recent study by Lucas et al. (2013) showed that machine learning methods can be used to predict which combinations of parameters can lead to a simulation crash, and hence which processes described by these parameters need refined analysis. In the current study we reanalyse the dataset used in that research using a different methodology. We confirm the main conclusion of the original study concerning the suitability of machine learning for predicting crashes. We show that only three of the eight parameters indicated in the original study as relevant for crash prediction are indeed strongly relevant, three others are relevant but redundant, and two are not relevant at all. We also show that the variance due to the split of data between training and validation sets has a large influence on both the accuracy of predictions and the relative importance of variables; hence only a cross-validated approach can deliver robust prediction of performance and relevance of variables.
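
The point about split-induced variance can be illustrated with a small k-fold cross-validation sketch. It is not the study's methodology or data: the one-dimensional toy problem, the midpoint-threshold classifier, and every function name here are assumptions chosen only to show per-fold scores being collected instead of relying on a single split.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    # Shuffle the indices once, then deal them into k disjoint folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, k=5, seed=0):
    # Train on k-1 folds, score accuracy on the held-out fold, repeat for each fold.
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


def fit_midpoint(X, y):
    # Hypothetical 1-D classifier: threshold halfway between the class means.
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    return (statistics.mean(pos) + statistics.mean(neg)) / 2


def predict_midpoint(threshold, x):
    return 1 if x > threshold else 0


if __name__ == "__main__":
    X = [float(i) for i in range(20)]
    y = [1 if x >= 10 else 0 for x in X]
    scores = cross_validate(X, y, fit_midpoint, predict_midpoint, k=5)
    print("per-fold accuracy:", scores)
    print("mean:", statistics.mean(scores), "stdev:", statistics.stdev(scores))
```

Reporting the mean together with the spread of the per-fold scores makes the dependence on the particular split visible, which a single train/validation split hides.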

Detection of Economy-Related Turkish Tweets Based on Machine Learning Approaches

10.4018/978-1-7998-8413-2.ch008 ◽

2022 ◽

pp. 171-195

Author(s):

Jale Bektaş

Keyword(s):

Machine Learning ◽

Text Mining ◽

Text Classification ◽

Integration Method ◽

Classification Problem ◽

Feature Representation ◽

Learning Approaches ◽

Machine Learning Methods ◽

Linguistic Approach ◽

Turkish Language

Conducting NLP for Turkish is considerably harder than for other Latin-alphabet languages such as English. In this study, text mining techniques are used to build a pre-processing framework in which TF-IDF values are calculated, in accordance with a linguistic approach, on 7,731 tweets shared by 13 famous economists in Turkey and retrieved from Twitter. The classification results of four common machine learning methods (SVM, Naive Bayes, LR, and an integration of LR with SVM) are then compared. The features represented by TF-IDF are tested with different N-grams. The findings show that success in a text classification problem depends on the feature representation method, and that SVM outperforms the other ML methods with unigram feature representation. The best results are obtained with the integration of SVM with LR, with an accuracy of 82.9%. These results show that these methodologies are satisfactory for the Turkish language.
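
A minimal sketch of TF-IDF weighting over N-grams follows. The abstract does not state which TF-IDF variant or tokenization the study used, so the formula here (relative term frequency times log(N / df)), the whitespace-style token lists, and the function names are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    # Contiguous n-token windows; n=1 gives unigrams, n=2 bigrams, and so on.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    # One {ngram: weight} dict per document, using tf = count / doc length
    # and idf = log(N / df). An n-gram found in every document gets weight 0.
    grams = [Counter(ngrams(tokens, n)) for tokens in docs]
    df = Counter()
    for counts in grams:
        df.update(counts.keys())
    total = len(docs)
    weights = []
    for counts in grams:
        length = sum(counts.values())
        weights.append(
            {g: (c / length) * math.log(total / df[g]) for g, c in counts.items()}
        )
    return weights


if __name__ == "__main__":
    # Hypothetical tokenized documents.
    docs = [["rates", "rise", "again"], ["rates", "fall", "again"]]
    print(tfidf(docs, n=1))
    print(tfidf(docs, n=2))
```

Changing n changes the feature space that the classifier sees, which is the comparison across N-grams that the abstract describes.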

A novel machine learning based feature selection for motor imagery EEG signal classification in Internet of medical things environment

Future Generation Computer Systems ◽

10.1016/j.future.2019.01.048 ◽

2019 ◽

Vol 98 ◽

pp. 419-434 ◽

Cited By ~ 15

Author(s):

Rajdeep Chatterjee ◽

Tanmoy Maitra ◽

SK Hafizul Islam ◽

Mohammad Mehedi Hassan ◽

Atif Alamri ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Motor Imagery ◽

Signal Classification ◽

EEG Signal ◽

Internet Of Medical Things ◽

EEG Signal Classification ◽

Selection For

Computational prediction of implantation outcome after embryo transfer

Health Informatics Journal ◽

10.1177/1460458219892138 ◽

2019 ◽

Vol 26 (3) ◽

pp. 1810-1826 ◽

Cited By ~ 3

Author(s):

Behnaz Raef ◽

Masoud Maleki ◽

Reza Ferdousi

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Prediction Model ◽

Embryo Transfer ◽

Area Under The Curve ◽

Computational Prediction ◽

Support Vector ◽

Human Menopausal Gonadotropin ◽

Optimum Number ◽

Learning Approaches

The aim of this study is to develop a computational prediction model for implantation outcome after an embryo transfer cycle. Data on 500 patients and 1360 transferred embryos, including cleavage and blastocyst stages and fresh or frozen embryos, were collected from April 2016 to February 2018. A dataset containing 82 attributes and a target label (indicating positive and negative implantation outcomes) was constructed. Six dominant machine learning approaches were examined based on their performance in predicting embryo transfer outcomes. Feature selection procedures were also used to identify effective predictive factors and to determine the optimum number of features based on classifier performance. The results revealed that random forest was the best classifier (accuracy = 90.40% and area under the curve = 93.74%) with optimum features based on a 10-fold cross-validation test. According to the Support Vector Machine-Feature Selection algorithm, the ideal number of features is 78. Follicle stimulating hormone/human menopausal gonadotropin dosage for ovarian stimulation was the most important predictive factor across all examined embryo transfer features. The proposed machine learning-based prediction model could predict embryo transfer outcome and implantation of embryos with high accuracy, before the start of an embryo transfer cycle.
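
The area under the curve reported in this abstract is, for a binary classifier, the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition is given below. The function name and the example scores are hypothetical, and nothing here reproduces the study's data or models.

```python
def roc_auc(labels, scores):
    # Pairwise (Mann-Whitney) form of the area under the ROC curve:
    # the fraction of positive/negative pairs ranked correctly, ties counting half.
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and classifier scores.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Unlike accuracy, this measure does not depend on a decision threshold, which is why the two figures are usually reported side by side.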

Dissimilarity based feature selection for text classification

Proceedings of the International Conference & Workshop on Emerging Trends in Technology - ICWET '11 ◽

10.1145/1980022.1980129 ◽

2011 ◽

Cited By ~ 1

Author(s):

S. Manjunath ◽

B. S. Harish ◽

D. S. Guru

Keyword(s):

Feature Selection ◽

Text Classification ◽

Selection For
