Feature Selection Combined with Random Subspace Ensemble for Gene Expression Based Diagnosis of Malignancies

Multivariate Feature Selection using Random Subspace Classifiers for Gene Expression Data

2007 IEEE 7th International Symposium on BioInformatics and BioEngineering ◽

10.1109/bibe.2007.4375685 ◽

2007 ◽

Cited By ~ 1

Author(s):

Vidya P. Kamath ◽

Lawrence O. Hall ◽

Timothy J. Yeatman ◽

Steven. A Eschrich

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Random Subspace ◽

Expression Data

Download Full-text

Multi-Round Random Subspace Feature Selection for Incomplete Gene Expression Data

2019 IEEE Congress on Evolutionary Computation (CEC) ◽

10.1109/cec.2019.8790237 ◽

2019 ◽

Cited By ~ 2

Author(s):

Will Pearson ◽

Cao Truong Tran ◽

Mengjie Zhang ◽

Bing Xue

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Random Subspace ◽

Expression Data ◽

Selection For

Download Full-text

Feature Selection for Gene Expression Data Analysis – A Review

International Journal of Psychosocial Rehabilitation ◽

10.37200/ijpr/v24i5/pr2020695 ◽

2020 ◽

Vol 24 (5) ◽

pp. 6955-6964

Author(s):

Dr. Prema R

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Data Analysis ◽

Gene Expression Data ◽

Expression Data ◽

Gene Expression Data Analysis ◽

Selection For

Download Full-text

An Integrated Feature Selection Algorithm for Cancer Classification using Gene Expression Data

Combinatorial Chemistry & High Throughput Screening ◽

10.2174/1386207322666181220124756 ◽

2019 ◽

Vol 21 (9) ◽

pp. 631-645 ◽

Cited By ~ 5

Author(s):

Saeed Ahmed ◽

Muhammad Kabir ◽

Zakir Ali ◽

Muhammad Arif ◽

Farman Ali ◽

...

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Classification Accuracy ◽

Early Stage ◽

Small Sample Size ◽

Feature Selection Method ◽

Small Sample ◽

Expression Data ◽

Base Function

Aim and Objective: Cancer is a dangerous disease worldwide, caused by somatic mutations in the genome. Diagnosis of this deadly disease at an early stage is exceptionally new clinical application of microarray data. In DNA microarray technology, gene expression data have a high dimension with small sample size. Therefore, the development of efficient and robust feature selection methods is indispensable that identify a small set of genes to achieve better classification performance. Materials and Methods: In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) and Multi-Objective Evolutionary Algorithm (MOEA) approaches which select the highly informative genes. The hybrid model with Redial base function neural network (RBFNN) classifier has been evaluated on 11 benchmark gene expression datasets by employing a 10-fold cross-validation test. Results: The experimental results are compared with seven conventional-based feature selection and other methods in the literature, which shows that our approach owned the obvious merits in the aspect of classification accuracy ratio and some genes selected by extensive comparing with other methods. Conclusion: Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy for six out of eleven datasets with a minimal sized predictive gene subset.

Download Full-text

Lung adenocarcinoma and lung squamous cell carcinoma cancer classification, biomarker identification, and gene expression analysis using overlapping feature selection methods

Scientific Reports ◽

10.1038/s41598-021-92725-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Joe W. Chen ◽

Joseph Dhahbi

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Expression Analysis ◽

Gene Expression Analysis ◽

Biomarker Discovery ◽

Lung Squamous Cell Carcinoma ◽

Cancer Classification ◽

Selection Methods ◽

Biomarker Identification ◽

Overlapping Method

AbstractLung cancer is one of the deadliest cancers in the world. Two of the most common subtypes, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), have drastically different biological signatures, yet they are often treated similarly and classified together as non-small cell lung cancer (NSCLC). LUAD and LUSC biomarkers are scarce, and their distinct biological mechanisms have yet to be elucidated. To detect biologically relevant markers, many studies have attempted to improve traditional machine learning algorithms or develop novel algorithms for biomarker discovery. However, few have used overlapping machine learning or feature selection methods for cancer classification, biomarker identification, or gene expression analysis. This study proposes to use overlapping traditional feature selection or feature reduction techniques for cancer classification and biomarker discovery. The genes selected by the overlapping method were then verified using random forest. The classification statistics of the overlapping method were compared to those of the traditional feature selection methods. The identified biomarkers were validated in an external dataset using AUC and ROC analysis. Gene expression analysis was then performed to further investigate biological differences between LUAD and LUSC. Overall, our method achieved classification results comparable to, if not better than, the traditional algorithms. It also identified multiple known biomarkers, and five potentially novel biomarkers with high discriminating values between LUAD and LUSC. Many of the biomarkers also exhibit significant prognostic potential, particularly in LUAD. Our study also unraveled distinct biological pathways between LUAD and LUSC.

Download Full-text

Multi-view feature selection for identifying gene markers: a diversified biological data driven approach

BMC Bioinformatics ◽

10.1186/s12859-020-03810-0 ◽

2020 ◽

Vol 21 (S18) ◽

Author(s):

Sudipta Acharya ◽

Laizhong Cui ◽

Yi Pan

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Selection ◽

Marker Gene ◽

Biological Data ◽

Protein Interaction Data ◽

Marker Genes ◽

Data Sets ◽

Gene Markers ◽

Multi Objective

Abstract Background In recent years, to investigate challenging bioinformatics problems, the utilization of multiple genomic and proteomic sources has become immensely popular among researchers. One such issue is feature or gene selection and identifying relevant and non-redundant marker genes from high dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm exploiting knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases with applications in specific epidemiology for a particular population. Results In the current article, we design the feature selection and marker gene detection as a multi-view multi-objective clustering problem. Regarding that, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce gene space without/minimally compromising the sample classification efficiency and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. Conclusion A thorough comparative analysis has been performed with five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. Obtained results reveal the supremacy of the proposed method. Reported results are also validated through a proper biological significance test and heatmap plotting.

Download Full-text

Memory Based Cuckoo Search Algorithm for Feature Selection of Gene Expression Dataset

Informatics in Medicine Unlocked ◽

10.1016/j.imu.2021.100572 ◽

2021 ◽

pp. 100572

Author(s):

Malek Alzaqebah ◽

Khaoula Briki ◽

Nashat Alrefai ◽

Sami Brini ◽

Sana Jawarneh ◽

...

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Search Algorithm ◽

Cuckoo Search ◽

Cuckoo Search Algorithm ◽

Gene Expression Dataset ◽

Selection Of

Download Full-text

Improving the Performance of Principal Components for Classification of Gene Expression Data Through Feature Selection

Studies in Classification, Data Analysis, and Knowledge Organization - Data Science and Classification ◽

10.1007/3-540-34416-0_35 ◽

2006 ◽

pp. 325-332

Author(s):

Edgar Acuña ◽

Jaime Porras

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Expression Data ◽

Principal Components ◽

Expression Data

Download Full-text

Ensemble Feature Selection from Cancer Gene Expression Data using Mutual Information and Recursive Feature Elimination

2020 Third International Conference on Advances in Electronics, Computers and Communications (ICAECC) ◽

10.1109/icaecc50550.2020.9339518 ◽

2020 ◽

Author(s):

Nimrita Koul ◽

Sunilkumar S Manvi

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Mutual Information ◽

Gene Expression Data ◽

Recursive Feature Elimination ◽

Cancer Gene ◽

Expression Data

Download Full-text

A Robust Gene selection Method for Microarray-based Cancer Classification

Cancer Informatics ◽

10.4137/cin.s3794 ◽

2010 ◽

Vol 9 ◽

pp. CIN.S3794 ◽

Cited By ~ 21

Author(s):

Xiaosheng Wang ◽

Osamu Gotoh

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Selection ◽

Information Gain ◽

Expression Profiles ◽

Feature Selection Method ◽

Gene Expression Profiles ◽

Molecular Classification ◽

Selection Method ◽

Chi Square

Gene selection is of vital importance in molecular classification of cancer using high-dimensional gene expression data. Because of the distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and robust feature selection methods is extremely crucial. We investigated the properties of one feature selection approach proposed in our previous work, which was the generalization of the feature selection method based on the depended degree of attribute in rough sets. We compared the feature selection method with the established methods: the depended degree, chi-square, information gain, Relief-F and symmetric uncertainty, and analyzed its properties through a series of classification experiments. The results revealed that our method was superior to the canonical depended degree of attribute based method in robustness and applicability. Moreover, the method was comparable to the other four commonly used methods. More importantly, the method can exhibit the inherent classification difficulty with respect to different gene expression datasets, indicating the inherent biology of specific cancers.

Download Full-text