Machine Learning Based FaaS Cloud Region Selection Method

Companies seek ways to retain their professional employees in order to reduce extra recruiting and training costs. Predicting whether a particular employee may leave helps the company make preventive decisions. Unlike physical systems, human resource problems cannot be described by a scientific-analytical formula. Therefore, machine learning approaches are the best tools for this aim. This paper presents a three-stage (pre-processing, processing, post-processing) framework for attrition prediction, with an IBM HR dataset as the case study. Since the dataset has several features, the “max-out” feature selection method is proposed for dimension reduction in the pre-processing stage and is implemented for the IBM HR dataset. The coefficient of each feature in the logistic regression model shows the importance of that feature in attrition prediction. The results show that the “max-out” feature selection method improves the F1-score. Finally, the validity of the parameters is checked by training the model on multiple bootstrap datasets; the average and standard deviation of the parameters are then analyzed to assess confidence in the model’s parameters and their stability. The small standard deviation of the parameters indicates that the model is stable and more likely to generalize well.
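
The bootstrap check described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic stand-ins rather than the IBM HR dataset, the “max-out” selection step is not reproduced because the abstract does not define it, and the helper names `fit_logistic` and `bootstrap_coefficients` are hypothetical.

```python
import math
import random
import statistics


def fit_logistic(X, y, lr=0.2, epochs=200):
    """Fit logistic regression by batch gradient descent.

    Returns [bias, w_1, ..., w_d].
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def bootstrap_coefficients(X, y, n_boot=20, seed=0):
    """Refit on n_boot resamples drawn with replacement; return per-parameter mean and std."""
    rng = random.Random(seed)
    n = len(X)
    fits = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        fits.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    per_param = list(zip(*fits))
    means = [statistics.mean(p) for p in per_param]
    stds = [statistics.stdev(p) for p in per_param]
    return means, stds


# Synthetic stand-in data (NOT the IBM HR dataset):
# feature 0 drives the label, feature 1 is pure noise.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(120)]
y = [1 if row[0] + 0.3 * data_rng.gauss(0, 1) > 0 else 0 for row in X]

means, stds = bootstrap_coefficients(X, y)
for name, m, s in zip(["bias", "feature_0", "feature_1"], means, stds):
    print(f"{name}: mean={m:.3f} std={s:.3f}")
```

A coefficient whose standard deviation across resamples is small relative to its mean is one that the model estimates consistently, which is the sense of stability the abstract refers to.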

NIMG-46. RADIOGENOMIC FEATURES PREDICT CLINICALLY RELEVANT GENOME-WIDE ALTERATION SIGNATURES IN GLIOBLASTOMA

Neuro-Oncology ◽

10.1093/neuonc/noaa215.659 ◽

2020 ◽

Vol 22 (Supplement_2) ◽

pp. ii158-ii158

Author(s):

Nicholas Nuechterlein ◽

Beibin Li ◽

James Fink ◽

David Haynor ◽

Eric Holland ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Molecular Subtypes ◽

Feature Selection Method ◽

Selection Method ◽

Versus Group ◽

Mri Features ◽

Genome Wide ◽

Group 2 ◽

Group 1

Abstract BACKGROUND Previously, we have shown that combined whole-exome sequencing (WES) and genome-wide somatic copy number alteration (SCNA) information can separate IDH1/2-wildtype glioblastoma into two prognostic molecular subtypes (Group 1 and Group 2) and that these subtypes cannot be distinguished by epigenetic or clinical features. However, the potential for radiographic features to discriminate between these molecular subtypes has not been established. METHODS Radiogenomic features (n=35,400) were extracted from 46 multiparametric, pre-operative magnetic resonance imaging (MRI) scans of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive, all of whom have corresponding WES and SCNA data in The Cancer Genome Atlas. We developed a novel feature selection method that leverages the structure of extracted radiogenomic MRI features to mitigate the dimensionality challenge posed by the disparity between the number of features and patients in our cohort. Seven traditional machine learning classifiers were trained to distinguish Group 1 versus Group 2 using our feature selection method. Our feature selection was compared to lasso feature selection, recursive feature elimination, and variance thresholding. RESULTS We are able to classify Group 1 versus Group 2 glioblastomas with a cross-validated area under the curve (AUC) score of 0.82 using ridge logistic regression and our proposed feature selection method, which reduces the size of our feature set from 35,400 to 288. An interrogation of the selected features suggests that features describing contours in the T2 abnormality region on the FLAIR MRI modality may best distinguish these two groups from one another. CONCLUSIONS We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups. This algorithm may be applied to future prospective studies to assess the utility of MRI as a surrogate for costly prognostic genomic studies.

Radiogenomic modeling predicts survival-associated prognostic groups in glioblastoma

Neuro-Oncology Advances ◽

10.1093/noajnl/vdab004 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Nicholas Nuechterlein ◽

Beibin Li ◽

Abdullah Feroze ◽

Eric C Holland ◽

Linda Shapiro ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Molecular Subtypes ◽

Feature Selection Method ◽

Area Under The Curve ◽

Selection Method ◽

Recursive Feature Elimination ◽

Signal Abnormality ◽

Mri Features ◽

Mri Scans

Abstract Background Combined whole-exome sequencing (WES) and somatic copy number alteration (SCNA) information can separate isocitrate dehydrogenase (IDH)1/2-wildtype glioblastoma into two prognostic molecular subtypes, which cannot be distinguished by epigenetic or clinical features. The potential for radiographic features to discriminate between these molecular subtypes has yet to be established. Methods Radiologic features (n = 35 340) were extracted from 46 multisequence, pre-operative magnetic resonance imaging (MRI) scans of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive (TCIA), all of whom have corresponding WES/SCNA data. We developed a novel feature selection method that leverages the structure of extracted MRI features to mitigate the dimensionality challenge posed by the disparity between a large number of features and the limited patients in our cohort. Six traditional machine learning classifiers were trained to distinguish molecular subtypes using our feature selection method, which was compared to least absolute shrinkage and selection operator (LASSO) feature selection, recursive feature elimination, and variance thresholding. Results We were able to classify glioblastomas into two prognostic subgroups with a cross-validated area under the curve score of 0.80 (±0.03) using ridge logistic regression on the 15-dimensional principal component analysis (PCA) embedding of the features selected by our novel feature selection method. An interrogation of the selected features suggested that features describing contours in the T2 signal abnormality region on the T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI sequence may best distinguish these two groups from one another. Conclusions We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups.
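
Of the baseline methods this abstract names, variance thresholding is the simplest to illustrate. The sketch below shows that baseline only. It is not the authors' novel selection method, which the abstract does not specify, and the toy matrix, the threshold value, and the helper names `variance_threshold` and `select_columns` are assumptions made for illustration.

```python
import statistics


def variance_threshold(X, threshold=0.0):
    """Return the indices of columns whose population variance exceeds threshold."""
    columns = list(zip(*X))
    return [j for j, col in enumerate(columns) if statistics.pvariance(col) > threshold]


def select_columns(X, keep):
    """Keep only the listed column indices in every row."""
    return [[row[j] for j in keep] for row in X]


# Toy matrix (rows = patients, columns = features); values are illustrative only.
X = [
    [1.0, 0.0, 5.0],
    [1.0, 1.0, 5.1],
    [1.0, 0.0, 4.9],
    [1.0, 1.0, 5.0],
]

keep = variance_threshold(X, threshold=0.01)
X_reduced = select_columns(X, keep)
print(keep)
print(X_reduced)
```

The filter ignores the class labels entirely, so it can only discard near-constant features. That limitation is one reason it serves as a baseline rather than a complete answer when features vastly outnumber patients.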

An Intelligent MAC Protocol Selection Method based on Machine Learning in Wireless Sensor Networks

KSII Transactions on Internet and Information Systems ◽

10.3837/tiis.2018.11.014 ◽

2018 ◽

Vol 12 (11) ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Wireless Sensor Networks ◽

Sensor Networks ◽

Mac Protocol ◽

Selection Method ◽

Wireless Sensor ◽

Protocol Selection

New feature selection method based on neural network and machine learning

2016 IEEE International Multidisciplinary Conference on Engineering Technology (IMCET) ◽

10.1109/imcet.2016.7777431 ◽

2016 ◽

Cited By ~ 4

Author(s):

Nicole Challita ◽

Mohamad Khalil ◽

Pierre Beauseroy

Keyword(s):

Neural Network ◽

Machine Learning ◽

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

New Feature

A Machine Learning Framework for Intrusion Detection System in IoT Networks Using an Ensemble Feature Selection Method

10.1109/iemcon53756.2021.9623082 ◽

2021 ◽

Author(s):

Ge Guo

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Feature Selection Method ◽

Selection Method ◽

Learning Framework

Random Forests Followed by Computed ABC Analysis as a Feature Selection Method for Machine Learning in Biomedical Data

Studies in Classification, Data Analysis, and Knowledge Organization - Advanced Studies in Classification and Data Science ◽

10.1007/978-981-15-3311-2_5 ◽

2020 ◽

pp. 57-69

Author(s):

Jörn Lötsch ◽

Alfred Ultsch

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forests ◽

Feature Selection Method ◽

Selection Method ◽

Biomedical Data ◽

Abc Analysis

Bio-informatics and psychiatric epidemiology

Practical Psychiatric Epidemiology ◽

10.1093/med/9780198735564.003.0021 ◽

2020 ◽

pp. 359-372

Author(s):

Nicola Voyle ◽

Maximilian Kerz ◽

Steven Kiddle ◽

Richard Dobson

Keyword(s):

Machine Learning ◽

Data Cleaning ◽

Epidemiological Studies ◽

Selection Method ◽

Psychiatric Epidemiology ◽

Large Datasets ◽

Data Exploration ◽

Feature Identification ◽

Data Formats ◽

Method Selection

This chapter highlights the methodologies which are increasingly being applied to large datasets or ‘big data’, with an emphasis on bio-informatics. The first stage of any analysis is to collect data from a well-designed study. The chapter begins by looking at the raw data that arises from epidemiological studies and highlighting the first stages in creating clean data that can be used to draw informative conclusions through analysis. The remainder of the chapter covers data formats, data exploration, data cleaning, missing data (i.e. the lack of data for a variable in an observation), reproducibility, classification versus regression, feature identification and selection, method selection (e.g. supervised versus unsupervised machine learning), training a classifier, and drawing conclusions from modelling.

A new feature selection method based on machine learning technique for air quality dataset

Journal of Statistics and Management Systems ◽

10.1080/09720510.2019.1609726 ◽

2019 ◽

Vol 22 (4) ◽

pp. 697-705 ◽

Cited By ~ 9

Author(s):

Jasleen Kaur Sethi ◽

Mamta Mittal

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Air Quality ◽

Feature Selection Method ◽

Selection Method ◽

Machine Learning Technique ◽

Learning Technique ◽

New Feature

The Prediction of Diatom Abundance by Comparison of Various Machine Learning Methods

Mathematical Problems in Engineering ◽

10.1155/2019/5749746 ◽

2019 ◽

Vol 2019 ◽

pp. 1-13

Author(s):

Yuna Shin ◽

Heesuk Lee ◽

Young-Joo Lee ◽

Dae Keun Seo ◽

Bomi Jeong ◽

...

Keyword(s):

Machine Learning ◽

Variable Selection ◽

Roc Curve ◽

Model Evaluation ◽

Selection Method ◽

The Other ◽

Evaluation Tool ◽

Evaluation Tools ◽

Variable Selection Method ◽

Boruta Algorithm

This study adopts two approaches to analyze the occurrence of algae at Haman Weir on the Nakdong River: one is a traditional statistical method, such as logistic regression, while the other is machine learning techniques, such as kNN, ANN, RF, Bagging, Boosting, and SVM. To compare the performance of the models, this study measured accuracy, specificity, sensitivity, and AUC, which are representative model evaluation tools. The ROC curve plots sensitivity against (1-specificity), and the AUC, the area under the ROC curve, summarizes sensitivity and specificity in a single value. This measure has two advantages over other evaluation tools. First, it is scale-invariant: it measures how well the model ranks its predictions rather than their absolute values. Second, it is classification-threshold-invariant: because the ROC curve traces sensitivity and (1-specificity) across all thresholds, the AUC does not depend on any single threshold. For these two reasons, we chose AUC as the final model evaluation tool. Variable selection was conducted using the Boruta algorithm, and we compared models with and without the variable selection method to identify the better model. The Boruta algorithm suggested PO4-P, DO, BOD, NH3-N, Susp, pH, TOC, Temp, TN, and TP as significant explanatory variables. Among the models without the variable selection method, RF had the highest accuracy and ANN the highest AUC. In conclusion, ANN with the variable selection method showed the best performance among the models with and without variable selection.
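
The two invariance properties of AUC mentioned in this abstract can be checked with a small sketch. It computes AUC from its rank interpretation, the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, which equals the area under the ROC curve. The labels and scores are made-up illustrative values, not data from the study, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative values only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Rescaling the scores preserves their order, so the AUC is unchanged.
rescaled = [10 * s + 3 for s in scores]

print(roc_auc(labels, scores))    # 0.75
print(roc_auc(labels, rescaled))  # 0.75
```

No threshold appears anywhere in the computation, which is what threshold-invariance means in practice. The pairwise loop is quadratic in the number of samples, so a sort-based variant would be preferable for large datasets.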
