Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method

Iranian Journal of Pediatric Hematology & Oncology ◽

10.18502/ijpho.v11i2.5838 ◽

2021 ◽

Author(s):

Razieh Sheikhpour ◽

Roohallah Fazli ◽

Sanaz Mehrabani

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Microarray Data ◽

Lymphoblastic Leukemia ◽

Feature Selection Method ◽

Selection Method ◽

Learning Methods ◽

Machine Learning Methods ◽

Acute Myeloid ◽

Sparse Feature Selection

Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identifying candidate genes from microarray data is important for cancer diagnosis. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expression levels of 7129 genes from 25 patients with acute myeloid leukemia (AML) and 47 patients with acute lymphoblastic leukemia (ALL), obtained by microarray technology, were used. The important genes were then identified using a sparse feature selection method, and AML and ALL tissues were diagnosed with machine learning methods: support vector machine (SVM), Gaussian kernel density estimation based classifier (GKDEC), k-nearest neighbor (KNN), and linear discriminant classifier (LDC). Results: The GKDEC and LDC classifiers diagnosed ALL and AML with an accuracy of 100% using 8 genes selected from the microarray data by the sparse feature selection method. Moreover, the KNN classifier using 6 genes and the SVM classifier using 7 genes diagnosed AML and ALL with accuracies of 91.18% and 94.12%, respectively. The gene with the description “Paired-box protein PAX2 (PAX2) gene, exon 11 and complete CDs” was determined to be the most important gene in the diagnosis of ALL and AML. Conclusion: The experimental results of the current study showed that AML and ALL can be diagnosed with high accuracy using sparse feature selection and machine learning methods. Investigating the expression of the genes selected in this study may be helpful in the diagnosis of ALL and AML.
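To make the "classify on a few selected genes" step concrete, here is a minimal sketch of a k-nearest neighbor vote restricted to a subset of gene columns. It is only an illustration: the abstract does not describe the sparse selection algorithm itself, so the selected indices, the toy expression values, and the function name `knn_predict` are all hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, selected, k=3):
    """Majority vote among the k nearest training samples,
    measuring Euclidean distance on the selected gene columns only."""
    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in selected))

    # Sort training samples by distance to the query and keep the k closest.
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: 4 "genes" per sample; columns 1 and 3 are noise.
train_X = [
    [5.0, 0.1, 1.0, 9.0],
    [4.8, 7.0, 1.2, 0.3],
    [1.0, 3.0, 4.9, 5.0],
    [1.2, 8.0, 5.1, 0.1],
]
train_y = ["ALL", "ALL", "AML", "AML"]
selected_genes = [0, 2]  # assumed output of some feature selection step

print(knn_predict(train_X, train_y, [4.9, 9.9, 1.1, 4.0], selected_genes))
```

Restricting the distance to the selected columns is what lets a handful of informative genes dominate the vote while thousands of uninformative ones are ignored.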

Prediction of Concrete Compressive Strength and Slump by Machine Learning Methods

Advances in Civil Engineering ◽

10.1155/2019/3069046 ◽

2019 ◽

Vol 2019 ◽

pp. 1-11 ◽

Cited By ~ 6

Author(s):

M. Timur Cihan

Keyword(s):

Machine Learning ◽

Compressive Strength ◽

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Regression Method ◽

Concrete Compressive Strength ◽

Learning Methods ◽

Acceptable Error ◽

Machine Learning Methods

Machine learning methods have been successfully applied to many engineering disciplines. Prediction of the concrete compressive strength (fc) and slump (S) is important in terms of the desirability of concrete and its sustainability. The goals of this study were (i) to determine the most successful normalization technique for the datasets, (ii) to select the best regression method for predicting the fc and S outputs, (iii) to obtain the best subset with the ReliefF feature selection method, and (iv) to compare the regression results for the original and selected subsets. Experimental results demonstrate that the decimal scaling and min-max normalization techniques are the most successful methods for predicting the compressive strength and slump outputs, respectively. According to evaluation metrics such as the correlation coefficient, root mean squared error, and mean absolute error, the fuzzy logic method makes better predictions than any other regression method. Moreover, when the number of input variables was reduced from seven to four by the ReliefF feature selection method, the prediction accuracy remained within the acceptable error rate.
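The two normalization techniques named in the abstract can be sketched as follows. This uses the standard textbook definitions (min-max maps values to [0, 1]; decimal scaling divides by the smallest power of ten that brings every absolute value below 1), which may differ in detail from the paper's implementation. The function names and sample values are hypothetical.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def decimal_scale(values):
    """Divide by 10**j, with j the smallest integer so that max |v| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]

# Hypothetical input column, e.g. a mix quantity.
column = [12, 250, 999]
print(min_max_scale(column))
print(decimal_scale(column))
```

Min-max scaling depends on the observed minimum and maximum, while decimal scaling depends only on the order of magnitude of the largest value, so the two can rank differently across datasets.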

Classification of lung cancer using ensemble-based feature selection and machine learning methods

Molecular BioSystems ◽

10.1039/c4mb00659c ◽

2015 ◽

Vol 11 (3) ◽

pp. 791-800 ◽

Cited By ~ 53

Author(s):

Zhihua Cai ◽

Dong Xu ◽

Qing Zhang ◽

Jiexia Zhang ◽

Sai-Ming Ngai ◽

...

Keyword(s):

Machine Learning ◽

Lung Cancer ◽

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Learning Methods ◽

Machine Learning Methods

The ensemble-based feature selection method presents the merit of acquisition of more informative and compact features than those obtained by individual methods.

Machine learning approaches for classification of colorectal cancer with and without feature selection method on Microarray Data

Gene Reports ◽

10.1016/j.genrep.2021.101419 ◽

2021 ◽

pp. 101419

Author(s):

Elham Nazari ◽

Mehran Aghemiri ◽

Amir Avan ◽

Amin Mehrabian ◽

Hamed Tabesh

Keyword(s):

Colorectal Cancer ◽

Machine Learning ◽

Feature Selection ◽

Microarray Data ◽

Feature Selection Method ◽

Selection Method ◽

Learning Approaches

Improved microarray data analysis using feature selection methods with machine learning methods

2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2016.7822748 ◽

2016 ◽

Author(s):

Jing Sun ◽

Kalpdrum Passi ◽

Chakresh Kumar Jain

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Data Analysis ◽

Microarray Data ◽

Microarray Data Analysis ◽

Selection Methods ◽

Learning Methods ◽

Machine Learning Methods

A fuzzy gaussian rank aggregation ensemble feature selection method for microarray data

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-190134 ◽

2021 ◽

Vol 24 (4) ◽

pp. 289-301

Author(s):

B. Venkatesh ◽

J. Anuradha

Keyword(s):

Feature Selection ◽

Microarray Data ◽

Classification Accuracy ◽

Performance Metrics ◽

Feature Selection Method ◽

Selection Method ◽

Support Vector ◽

Svm Classifier ◽

Binary Particle Swarm Optimization ◽

Selection Methods

In microarray data, high classification accuracy is difficult to achieve because of high dimensionality and irrelevant, noisy data; such datasets also contain many gene expression measurements but few samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases, a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks of the Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods, using Fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) is used to select the optimal features, and the RBF kernel-based Support Vector Machine (SVM) classifier is used as an evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets, using performance metrics such as Accuracy, Recall, Precision, and F1-Score. The experimental results show that the proposed method outperforms the other feature selection methods.
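One way to picture the filter phase is the sketch below, which turns each filter's rank into a Gaussian membership value and sums the values across filters. This is an assumed aggregation rule for illustration, not the paper's exact formula: the membership function `exp(-(r - 1)^2 / (2 * sigma^2))`, the `sigma` parameter, the function names, and the toy ranks are all hypothetical.

```python
import math

def gaussian_membership(rank, sigma=1.0):
    """Membership is 1.0 for rank 1 and decays smoothly for worse ranks."""
    return math.exp(-((rank - 1) ** 2) / (2 * sigma ** 2))

def aggregate_ranks(rank_lists, sigma=1.0):
    """Sum memberships per feature across filters; best feature first."""
    scores = {}
    for ranks in rank_lists.values():
        for feature, rank in ranks.items():
            scores[feature] = scores.get(feature, 0.0) + gaussian_membership(rank, sigma)
    return sorted(scores, key=lambda f: scores[f], reverse=True)

# Hypothetical ranks (1 = best) from three filter methods.
rank_lists = {
    "relief": {"g1": 1, "g2": 2, "g3": 3},
    "mrmr":   {"g1": 2, "g2": 1, "g3": 3},
    "fc":     {"g1": 1, "g2": 3, "g3": 2},
}
print(aggregate_ranks(rank_lists))
```

Compared with averaging raw ranks, a Gaussian membership rewards features that any filter places near the top and compresses differences among poorly ranked features. The aggregated ordering would then be handed to the wrapper phase.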

An Improved Machine Learning-Based Employees Attrition Prediction Framework with Emphasis on Feature Selection

Mathematics ◽

10.3390/math9111226 ◽

2021 ◽

Vol 9 (11) ◽

pp. 1226

Author(s):

Saeed Najafi-Zangeneh ◽

Naser Shams-Gharneh ◽

Ali Arjomandi-Nezhad ◽

Sarfaraz Hashemkhani Zolfani

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Standard Deviation ◽

Analytical Formula ◽

Feature Selection Method ◽

Selection Method ◽

Performance Measure ◽

Learning Approaches ◽

Training Costs ◽

Professional Employees

Companies always seek ways to make their professional employees stay with them to reduce extra recruiting and training costs. Predicting whether a particular employee may leave or not will help the company to make preventive decisions. Unlike physical systems, human resource problems cannot be described by a scientific-analytical formula. Therefore, machine learning approaches are the best tools for this aim. This paper presents a three-stage (pre-processing, processing, post-processing) framework for attrition prediction. An IBM HR dataset is chosen as the case study. Since there are several features in the dataset, the “max-out” feature selection method is proposed for dimension reduction in the pre-processing stage. This method is implemented for the IBM HR dataset. The coefficient of each feature in the logistic regression model shows the importance of the feature in attrition prediction. The results show improvement in the F1-score performance measure due to the “max-out” feature selection method. Finally, the validity of parameters is checked by training the model for multiple bootstrap datasets. Then, the average and standard deviation of parameters are analyzed to check the confidence value of the model’s parameters and their stability. The small standard deviation of parameters indicates that the model is stable and is more likely to generalize well.
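The bootstrap stability check described at the end of the abstract can be sketched as follows: refit a logistic regression on resampled copies of the data, then look at the mean and standard deviation of a coefficient. This is a minimal illustration with a single feature, plain gradient descent, and made-up data. It does not use the IBM HR dataset or the "max-out" method, and every name and hyperparameter here is an assumption.

```python
import math
import random
import statistics

def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit y ~ sigmoid(w*x + b) by batch gradient descent; returns (w, b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += (p - y)
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def bootstrap_coefficients(xs, ys, n_boot=30, seed=0):
    """Refit on n_boot resamples drawn with replacement;
    return the mean and sample standard deviation of the slope w."""
    rng = random.Random(seed)
    n = len(xs)
    weights = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        w, _ = fit_logistic([xs[i] for i in idx], [ys[i] for i in idx])
        weights.append(w)
    return statistics.mean(weights), statistics.stdev(weights)

# Hypothetical toy data: larger x makes the positive class more likely.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, -0.3, 0.3]
ys = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0]

mean_w, std_w = bootstrap_coefficients(xs, ys)
print(f"slope mean={mean_w:.3f}, std={std_w:.3f}")
```

A standard deviation that is small relative to the mean suggests the coefficient's sign and magnitude do not hinge on which samples happened to be drawn, which is the stability argument the abstract makes.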

NIMG-46. RADIOGENOMIC FEATURES PREDICT CLINICALLY RELEVANT GENOME-WIDE ALTERATION SIGNATURES IN GLIOBLASTOMA

Neuro-Oncology ◽

10.1093/neuonc/noaa215.659 ◽

2020 ◽

Vol 22 (Supplement_2) ◽

pp. ii158-ii158

Author(s):

Nicholas Nuechterlein ◽

Beibin Li ◽

James Fink ◽

David Haynor ◽

Eric Holland ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Molecular Subtypes ◽

Feature Selection Method ◽

Selection Method ◽

Versus Group ◽

Mri Features ◽

Genome Wide ◽

Group 2 ◽

Group 1

Abstract BACKGROUND Previously, we have shown that combined whole-exome sequencing (WES) and genome-wide somatic copy number alteration (SCNA) information can separate IDH1/2-wildtype glioblastoma into two prognostic molecular subtypes (Group 1 and Group 2) and that these subtypes cannot be distinguished by epigenetic or clinical features. However, the potential for radiographic features to discriminate between these molecular subtypes has not been established. METHODS Radiogenomic features (n=35,400) were extracted from 46 multiparametric, pre-operative magnetic resonance imaging (MRI) of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive, all of whom have corresponding WES and SCNA data in The Cancer Genome Atlas. We developed a novel feature selection method that leverages the structure of extracted radiogenomic MRI features to mitigate the dimensionality challenge posed by the disparity between the number of features and patients in our cohort. Seven traditional machine learning classifiers were trained to distinguish Group 1 versus Group 2 using our feature selection method. Our feature selection was compared to lasso feature selection, recursive feature elimination, and variance thresholding. RESULTS We are able to classify Group 1 versus Group 2 glioblastomas with a cross-validated area under the curve (AUC) score of 0.82 using ridge logistic regression and our proposed feature selection method, which reduces the size of our feature set from 35,400 to 288. An interrogation of the selected features suggests that features describing contours in the T2 abnormality region on the FLAIR MRI modality may best distinguish these two groups from one another. CONCLUSIONS We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups. This algorithm may be applied to future prospective studies to assess the utility of MRI as a surrogate for costly prognostic genomic studies.
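Of the baselines the abstract compares against, variance thresholding is the simplest, and a minimal sketch of it follows. It keeps only the feature columns whose population variance exceeds a threshold. The function name, the threshold value, and the toy matrix are hypothetical, and the abstract's own proposed selection method is not shown because it is not described in enough detail.

```python
import statistics

def variance_threshold(rows, threshold=0.0):
    """Return indices of columns whose population variance exceeds threshold."""
    n_features = len(rows[0])
    keep = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        if statistics.pvariance(column) > threshold:
            keep.append(j)
    return keep

# Hypothetical feature matrix: 4 patients x 3 features.
# Column 0 is constant, so it cannot help separate any two groups.
rows = [
    [1, 0, 3],
    [1, 1, 4],
    [1, 0, 5],
    [1, 1, 6],
]
print(variance_threshold(rows))        # threshold 0.0 drops constant columns
print(variance_threshold(rows, 0.5))   # a stricter, arbitrary threshold
```

Because this filter never looks at the class labels, it can only discard uninformative columns. With tens of thousands of features and 46 patients, that is typically a first pass rather than a complete selection strategy.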

Radiogenomic modeling predicts survival-associated prognostic groups in glioblastoma

Neuro-Oncology Advances ◽

10.1093/noajnl/vdab004 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Nicholas Nuechterlein ◽

Beibin Li ◽

Abdullah Feroze ◽

Eric C Holland ◽

Linda Shapiro ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Molecular Subtypes ◽

Feature Selection Method ◽

Area Under The Curve ◽

Selection Method ◽

Recursive Feature Elimination ◽

Signal Abnormality ◽

Mri Features ◽

Mri Scans

Abstract Background Combined whole-exome sequencing (WES) and somatic copy number alteration (SCNA) information can separate isocitrate dehydrogenase (IDH)1/2-wildtype glioblastoma into two prognostic molecular subtypes, which cannot be distinguished by epigenetic or clinical features. The potential for radiographic features to discriminate between these molecular subtypes has yet to be established. Methods Radiologic features (n = 35 340) were extracted from 46 multisequence, pre-operative magnetic resonance imaging (MRI) scans of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive (TCIA), all of whom have corresponding WES/SCNA data. We developed a novel feature selection method that leverages the structure of extracted MRI features to mitigate the dimensionality challenge posed by the disparity between a large number of features and the limited patients in our cohort. Six traditional machine learning classifiers were trained to distinguish molecular subtypes using our feature selection method, which was compared to least absolute shrinkage and selection operator (LASSO) feature selection, recursive feature elimination, and variance thresholding. Results We were able to classify glioblastomas into two prognostic subgroups with a cross-validated area under the curve score of 0.80 (±0.03) using ridge logistic regression on the 15-dimensional principal component analysis (PCA) embedding of the features selected by our novel feature selection method. An interrogation of the selected features suggested that features describing contours in the T2 signal abnormality region on the T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI sequence may best distinguish these two groups from one another. Conclusions We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly-defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups.
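The area under the curve score reported above has a simple pairwise interpretation, sketched below: it is the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The labels and scores are made-up values, and the function name `auc_score` is hypothetical. The cross-validation loop, PCA embedding, and ridge logistic regression from the abstract are not reproduced here.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.
    labels: 0/1 class labels; scores: classifier scores, higher = more positive."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical held-out fold: two samples per subgroup.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(labels, scores))  # 3 of 4 pairs are ordered correctly -> 0.75
```

In a cross-validated setting, a score like this would be computed on each held-out fold and then averaged, which is presumably how a figure with a spread such as ±0.03 arises.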