An Improved Machine Learning-Based Employees Attrition Prediction Framework with Emphasis on Feature Selection

Companies always seek ways to make their professional employees stay with them to reduce extra recruiting and training costs. Predicting whether a particular employee may leave or not will help the company to make preventive decisions. Unlike physical systems, human resource problems cannot be described by a scientific-analytical formula. Therefore, machine learning approaches are the best tools for this aim. This paper presents a three-stage (pre-processing, processing, post-processing) framework for attrition prediction. An IBM HR dataset is chosen as the case study. Since there are several features in the dataset, the “max-out” feature selection method is proposed for dimension reduction in the pre-processing stage. This method is implemented for the IBM HR dataset. The coefficient of each feature in the logistic regression model shows the importance of the feature in attrition prediction. The results show improvement in the F1-score performance measure due to the “max-out” feature selection method. Finally, the validity of parameters is checked by training the model for multiple bootstrap datasets. Then, the average and standard deviation of parameters are analyzed to check the confidence value of the model’s parameters and their stability. The small standard deviation of parameters indicates that the model is stable and is more likely to generalize well.

Download Full-text

Flight State Identification of a Self-Sensing Wing via an Improved Feature Selection Method and Machine Learning Approaches

Sensors ◽

10.3390/s18051379 ◽

2018 ◽

Vol 18 (5) ◽

pp. 1379 ◽

Cited By ~ 4

Author(s):

Xi Chen ◽

Fotis Kopsaftopoulos ◽

Qi Wu ◽

He Ren ◽

Fu-Kuo Chang

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Learning Approaches ◽

State Identification

Download Full-text

Machine learning approaches for classification of colorectal cancer with and without feature selection method on Microarray Data

Gene Reports ◽

10.1016/j.genrep.2021.101419 ◽

2021 ◽

pp. 101419

Author(s):

Elham Nazari ◽

Mehran Aghemiri ◽

Amir Avan ◽

Amin Mehrabian ◽

Hamed Tabesh

Keyword(s):

Colorectal Cancer ◽

Machine Learning ◽

Feature Selection ◽

Microarray Data ◽

Feature Selection Method ◽

Selection Method ◽

Learning Approaches

Download Full-text

NIMG-46. RADIOGENOMIC FEATURES PREDICT CLINICALLY RELEVANT GENOME-WIDE ALTERATION SIGNATURES IN GLIOBLASTOMA

Neuro-Oncology ◽

10.1093/neuonc/noaa215.659 ◽

2020 ◽

Vol 22 (Supplement_2) ◽

pp. ii158-ii158

Author(s):

Nicholas Nuechterlein ◽

Beibin Li ◽

James Fink ◽

David Haynor ◽

Eric Holland ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Molecular Subtypes ◽

Feature Selection Method ◽

Selection Method ◽

Versus Group ◽

Mri Features ◽

Genome Wide ◽

Group 2 ◽

Group 1

Abstract BACKGROUND Previously, we have shown that combined whole-exome sequencing (WES) and genome-wide somatic copy number alteration (SCNA) information can separate IDH1/2-wildtype glioblastoma into two prognostic molecular subtypes (Group 1 and Group 2) and that these subtypes cannot be distinguished by epigenetic or clinical features. However, the potential for radiographic features to discriminate between these molecular subtypes has not been established. METHODS Radiogenomic features (n=35,400) were extracted from 46 multiparametric, pre-operative magnetic resonance imaging (MRI) of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive, all of whom have corresponding WES and SCNA data in The Cancer Genome Atlas. We developed a novel feature selection method that leverages the structure of extracted radiogenomic MRI features to mitigate the dimensionality challenge posed by the disparity between the number of features and patients in our cohort. Seven traditional machine learning classifiers were trained to distinguish Group 1 versus Group 2 using our feature selection method. Our feature selection was compared to lasso feature selection, recursive feature elimination, and variance thresholding. RESULTS We are able to classify Group 1 versus Group 2 glioblastomas with a cross-validated area under the curve (AUC) score of 0.82 using ridge logistic regression and our proposed feature selection method, which reduces the size of our feature set from 35,400 to 288. An interrogation of the selected features suggests that features describing contours in the T2 abnormality region on the FLAIR MRI modality may best distinguish these two groups from one another. CONCLUSIONS We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups. This algorithm may be applied to future prospective studies to assess the utility of MRI as a surrogate for costly prognostic genomic studies.

Download Full-text

Radiogenomic modeling predicts survival-associated prognostic groups in glioblastoma

Neuro-Oncology Advances ◽

10.1093/noajnl/vdab004 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Nicholas Nuechterlein ◽

Beibin Li ◽

Abdullah Feroze ◽

Eric C Holland ◽

Linda Shapiro ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Molecular Subtypes ◽

Feature Selection Method ◽

Area Under The Curve ◽

Selection Method ◽

Recursive Feature Elimination ◽

Signal Abnormality ◽

Mri Features ◽

Mri Scans

Abstract Background Combined whole-exome sequencing (WES) and somatic copy number alteration (SCNA) information can separate isocitrate dehydrogenase (IDH)1/2-wildtype glioblastoma into two prognostic molecular subtypes, which cannot be distinguished by epigenetic or clinical features. The potential for radiographic features to discriminate between these molecular subtypes has yet to be established. Methods Radiologic features (n = 35 340) were extracted from 46 multisequence, pre-operative magnetic resonance imaging (MRI) scans of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive (TCIA), all of whom have corresponding WES/SCNA data. We developed a novel feature selection method that leverages the structure of extracted MRI features to mitigate the dimensionality challenge posed by the disparity between a large number of features and the limited patients in our cohort. Six traditional machine learning classifiers were trained to distinguish molecular subtypes using our feature selection method, which was compared to least absolute shrinkage and selection operator (LASSO) feature selection, recursive feature elimination, and variance thresholding. Results We were able to classify glioblastomas into two prognostic subgroups with a cross-validated area under the curve score of 0.80 (±0.03) using ridge logistic regression on the 15-dimensional principle component analysis (PCA) embedding of the features selected by our novel feature selection method. An interrogation of the selected features suggested that features describing contours in the T2 signal abnormality region on the T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI sequence may best distinguish these two groups from one another. Conclusions We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups.

Download Full-text

New feature selection method based on neural network and machine learning

2016 IEEE International Multidisciplinary Conference on Engineering Technology (IMCET) ◽

10.1109/imcet.2016.7777431 ◽

2016 ◽

Cited By ~ 4

Author(s):

Nicole Challita ◽

Mohamad Khalil ◽

Pierre Beauseroy

Keyword(s):

Neural Network ◽

Machine Learning ◽

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

New Feature

Download Full-text

A Machine Learning Framework for Intrusion Detection System in IoT Networks Using an Ensemble Feature Selection Method

10.1109/iemcon53756.2021.9623082 ◽

2021 ◽

Author(s):

Ge Guo

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Feature Selection Method ◽

Selection Method ◽

Learning Framework

Download Full-text

Random Forests Followed by Computed ABC Analysis as a Feature Selection Method for Machine Learning in Biomedical Data

Studies in Classification, Data Analysis, and Knowledge Organization - Advanced Studies in Classification and Data Science ◽

10.1007/978-981-15-3311-2_5 ◽

2020 ◽

pp. 57-69

Author(s):

Jörn Lötsch ◽

Alfred Ultsch

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forests ◽

Feature Selection Method ◽

Selection Method ◽

Biomedical Data ◽

Abc Analysis

Download Full-text

A new feature selection method based on machine learning technique for air quality dataset

Journal of Statistics and Management Systems ◽

10.1080/09720510.2019.1609726 ◽

2019 ◽

Vol 22 (4) ◽

pp. 697-705 ◽

Cited By ~ 9

Author(s):

Jasleen Kaur Sethi ◽

Mamta Mittal

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Air Quality ◽

Feature Selection Method ◽

Selection Method ◽

Machine Learning Technique ◽

Learning Technique ◽

New Feature

Download Full-text

Diagnostic Performance of 2D and 3D T2WI-Based Radiomics Features With Machine Learning Algorithms to Distinguish Solid Solitary Pulmonary Lesion

Frontiers in Oncology ◽

10.3389/fonc.2021.683587 ◽

2021 ◽

Vol 11 ◽

Author(s):

Qi Wan ◽

Jiaxuan Zhou ◽

Xiaoying Xia ◽

Jianfeng Hu ◽

Peng Wang ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Diagnostic Performance ◽

Feature Selection Method ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Approaches ◽

Selection Methods ◽

Linear Discriminant ◽

2D And 3D

ObjectiveTo evaluate the performance of 2D and 3D radiomics features with different machine learning approaches to classify SPLs based on magnetic resonance(MR) T2 weighted imaging (T2WI).Material and MethodsA total of 132 patients with pathologically confirmed SPLs were examined and randomly divided into training (n = 92) and test datasets (n = 40). A total of 1692 3D and 1231 2D radiomics features per patient were extracted. Both radiomics features and clinical data were evaluated. A total of 1260 classification models, comprising 3 normalization methods, 2 dimension reduction algorithms, 3 feature selection methods, and 10 classifiers with 7 different feature numbers (confined to 3–9), were compared. The ten-fold cross-validation on the training dataset was applied to choose the candidate final model. The area under the receiver operating characteristic curve (AUC), precision-recall plot, and Matthews Correlation Coefficient were used to evaluate the performance of machine learning approaches.ResultsThe 3D features were significantly superior to 2D features, showing much more machine learning combinations with AUC greater than 0.7 in both validation and test groups (129 vs. 11). The feature selection method Analysis of Variance(ANOVA), Recursive Feature Elimination(RFE) and the classifier Logistic Regression(LR), Linear Discriminant Analysis(LDA), Support Vector Machine(SVM), Gaussian Process(GP) had relatively better performance. The best performance of 3D radiomics features in the test dataset (AUC = 0.824, AUC-PR = 0.927, MCC = 0.514) was higher than that of 2D features (AUC = 0.740, AUC-PR = 0.846, MCC = 0.404). The joint 3D and 2D features (AUC=0.813, AUC-PR = 0.926, MCC = 0.563) showed similar results as 3D features. Incorporating clinical features with 3D and 2D radiomics features slightly improved the AUC to 0.836 (AUC-PR = 0.918, MCC = 0.620) and 0.780 (AUC-PR = 0.900, MCC = 0.574), respectively.ConclusionsAfter algorithm optimization, 2D feature-based radiomics models yield favorable results in differentiating malignant and benign SPLs, but 3D features are still preferred because of the availability of more machine learning algorithmic combinations with better performance. Feature selection methods ANOVA and RFE, and classifier LR, LDA, SVM and GP are more likely to demonstrate better diagnostic performance for 3D features in the current study.

Download Full-text