HFS-SLPEE: A Novel Hierarchical Feature Selection and Second Learning Probability Error Ensemble Model for Precision Cancer Diagnosis

The emergence of high-throughput RNA-seq data has offered unprecedented opportunities for cancer diagnosis. However, most existing approaches for cancer diagnosis struggle to capture the highly nonlinear and complex associations in biological data. In this study, we propose a novel hierarchical feature selection and second learning probability error ensemble model (named HFS-SLPEE) for precision cancer diagnosis. Specifically, we first integrated protein-coding gene expression profiles, non-coding RNA expression profiles, and DNA methylation data to provide rich information. We then designed a novel hierarchical feature selection method, which takes CpG-gene biological associations into account and selects a compact set of superior features. Next, we used four individual classifiers with significant differences and apparent complementarity to build the heterogeneous classifiers. Lastly, we developed a second learning probability error ensemble model called SLPEE, which learns from new data consisting of the classifier-predicted class probability values and the actual labels, thereby self-correcting diagnosis errors. Benchmarking comparisons on TCGA showed that HFS-SLPEE performs better than the state-of-the-art approaches. Moreover, we analyzed 10 groups of selected features in depth and found several novel HFS-SLPEE-predicted epigenomics and epigenetics biomarkers for breast invasive carcinoma (BRCA) (e.g., TSLP and ADAMTS9-AS2), lung adenocarcinoma (LUAD) (e.g., HBA1 and CTB-43E15.1), and kidney renal clear cell carcinoma (KIRC) (e.g., IRX2 and BMPR1B-AS1).
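The idea of a second learner trained on classifier-predicted class probabilities can be illustrated with a generic stacking sketch. This is not the authors' SLPEE model: the logistic-regression meta-learner, the toy probability values, and the function names fit_meta_learner and predict are all assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta_learner(prob_rows, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression on base-classifier probabilities
    using plain batch gradient descent (hypothetical meta-learner)."""
    n_features = len(prob_rows[0])
    n = len(prob_rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, y in zip(prob_rows, labels):
            err = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict(weights, bias, row):
    score = sum(w * x for w, x in zip(weights, row)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0

# Toy data (assumed): each row holds P(class = 1) from four base classifiers.
# The fourth classifier is deliberately unreliable, so a majority vote would
# be dragged down by it, while a second-stage learner can compensate.
prob_rows = [
    [0.9, 0.8, 0.7, 0.2],
    [0.8, 0.9, 0.6, 0.3],
    [0.7, 0.6, 0.9, 0.1],
    [0.2, 0.1, 0.3, 0.8],
    [0.3, 0.2, 0.1, 0.9],
    [0.1, 0.3, 0.2, 0.7],
]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = fit_meta_learner(prob_rows, labels)
print([predict(weights, bias, row) for row in prob_rows])
```

In practice the probabilities fed to the second stage would come from held-out (for example, cross-validated) predictions, so that the meta-learner does not simply memorize base-classifier overfitting.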

A Robust Gene Selection Method for Microarray-based Cancer Classification

Cancer Informatics ◽

10.4137/cin.s3794 ◽

2010 ◽

Vol 9 ◽

pp. CIN.S3794 ◽

Cited By ~ 21

Author(s):

Xiaosheng Wang ◽

Osamu Gotoh

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Selection ◽

Information Gain ◽

Expression Profiles ◽

Feature Selection Method ◽

Gene Expression Profiles ◽

Molecular Classification ◽

Selection Method ◽

Chi Square

Gene selection is of vital importance in molecular classification of cancer using high-dimensional gene expression data. Because of the distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and robust feature selection methods is crucial. We investigated the properties of one feature selection approach proposed in our previous work, which was the generalization of the feature selection method based on the depended degree of attribute in rough sets. We compared the feature selection method with the established methods (the depended degree, chi-square, information gain, Relief-F, and symmetric uncertainty) and analyzed its properties through a series of classification experiments. The results revealed that our method was superior to the canonical depended degree of attribute based method in robustness and applicability. Moreover, the method was comparable to the other four commonly used methods. More importantly, the method can reveal the inherent classification difficulty of different gene expression datasets, reflecting the inherent biology of specific cancers.
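Of the baseline methods named above, information gain is the simplest to show concretely. The sketch below ranks discretized genes by information gain with respect to the class label. The gene names, the "high"/"low" discretization, and the sample labels are assumed toy values, and this is not the authors' rough-set method.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(labels) minus the expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy, already discretized expression levels (assumed values).
labels = ["tumor", "tumor", "normal", "normal"]
gene_a = ["high", "high", "low", "low"]   # separates the classes perfectly
gene_b = ["high", "low", "high", "low"]   # carries no class information

scores = {
    "gene_a": information_gain(gene_a, labels),
    "gene_b": information_gain(gene_b, labels),
}
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranking)
```

Continuous expression values would need to be discretized first; the choice of discretization is itself a design decision that affects the ranking.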

A New Feature Selection Method for Enhancing Cancer Diagnosis Based on DNA Microarray

2020 37th National Radio Science Conference (NRSC) ◽

10.1109/nrsc49500.2020.9235095 ◽

2020 ◽

Author(s):

Mostafa Atlam ◽

Hanaa Torkey ◽

Hanaa Salem ◽

Nawal El-Fishawy

Keyword(s):

Feature Selection ◽

Dna Microarray ◽

Cancer Diagnosis ◽

Feature Selection Method ◽

Selection Method ◽

New Feature

A Systematic Framework for Drug Repositioning from Integrated Omics and Drug Phenotype Profiles Using Pathway-Drug Network

BioMed Research International ◽

10.1155/2016/7147039 ◽

2016 ◽

Vol 2016 ◽

pp. 1-17 ◽

Cited By ~ 21

Author(s):

Erkhembayar Jadamba ◽

Miyoung Shin

Keyword(s):

Breast Cancer ◽

Drug Repositioning ◽

Expression Profiles ◽

A Priori ◽

Gene Expression Profiles ◽

Biological Data ◽

Genomic Knowledge ◽

High Level ◽

Systematic Framework ◽

Phenotype Expression

Drug repositioning offers new clinical indications for old drugs. Recently, many computational approaches have been developed to repurpose marketed drugs in human diseases by mining various kinds of biological data, including disease expression profiles, pathways, drug phenotype expression profiles, and chemical structure data. However, despite encouraging results, a comprehensive and efficient computational drug repositioning approach is needed that includes the high-level integration of available resources. In this study, we propose a systematic framework employing experimental genomic knowledge and pharmaceutical knowledge to reposition drugs for a specific disease. Specifically, we first obtain experimental genomic knowledge from disease gene expression profiles and pharmaceutical knowledge from drug phenotype expression profiles and construct a pathway-drug network representing a priori known associations between drugs and pathways. To discover promising candidates for drug repositioning, we initialize node labels for the pathway-drug network using identified disease pathways and known drugs associated with the phenotype of interest and perform network propagation in a semisupervised manner. To evaluate our method, we conducted experiments to reposition 1309 drugs based on four different breast cancer datasets and verified the results of promising candidate drugs for breast cancer by a two-step validation procedure. Our experimental results showed that the proposed framework is a useful approach for discovering promising candidates for breast cancer treatment.
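Semisupervised propagation of seed labels over a pathway-drug network can be sketched with a generic label-propagation update, f <- alpha * S f + (1 - alpha) * y, where S is the symmetrically normalized adjacency matrix. The abstract does not give the exact update rule, so this form, the value of alpha, and the node names P1, P2, D1, D2, D3 are assumptions for illustration only.

```python
def propagate(adjacency, seeds, alpha=0.5, iterations=50):
    """Iterate f <- alpha * S f + (1 - alpha) * y on an undirected graph,
    with S[n][m] = 1 / sqrt(deg(n) * deg(m)) for each edge (n, m)."""
    nodes = list(adjacency)
    degree = {n: len(adjacency[n]) for n in nodes}
    scores = {n: seeds.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            spread = sum(scores[m] / (degree[n] * degree[m]) ** 0.5
                         for m in adjacency[n])
            updated[n] = alpha * spread + (1 - alpha) * seeds.get(n, 0.0)
        scores = updated
    return scores

# Hypothetical bipartite network: pathways P1, P2 and drugs D1, D2, D3.
adjacency = {
    "P1": ["D1", "D2"],
    "P2": ["D3"],
    "D1": ["P1"],
    "D2": ["P1"],
    "D3": ["P2"],
}
# Seeds: one identified disease pathway and one known drug for the phenotype.
seeds = {"P1": 1.0, "D1": 1.0}

scores = propagate(adjacency, seeds)
candidates = sorted((d for d in scores if d.startswith("D") and d not in seeds),
                    key=scores.get, reverse=True)
print(candidates)
```

Unlabeled drugs that share pathways with the seeds (here D2) receive a positive score, while drugs connected only to unrelated pathways (here D3) stay at zero.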

Learning and Feature Selection Using the Set Covering Machine with Data-Dependent Rays on Gene Expression Profiles

Artificial Neural Networks in Pattern Recognition - Lecture Notes in Computer Science ◽

10.1007/11829898_26 ◽

2006 ◽

pp. 286-297

Author(s):

Hans A. Kestler ◽

Wolfgang Lindner ◽

André Müller

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Set Covering

Improved Feature Selection by Incorporating Gene Similarity into the LASSO

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/jkdb.2012010101 ◽

2012 ◽

Vol 3 (1) ◽

pp. 1-22 ◽

Cited By ~ 1

Author(s):

Christopher E. Gillies ◽

Xiaoli Gao ◽

Nilesh V. Patel ◽

Mohammad-Reza Siadat ◽

George D. Wilson

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Personalized Medicine ◽

Objective Function ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Genetic Profile ◽

Data Set ◽

Coordinate Descent Algorithm ◽

Gene Similarity

Personalized medicine customizes treatments to a patient’s genetic profile and has the potential to revolutionize medical practice. An important process used in personalized medicine is gene expression profiling. Analyzing gene expression profiles is difficult because there are usually few patients and thousands of genes, leading to the curse of dimensionality. To combat this problem, researchers suggest using prior knowledge to enhance feature selection for supervised learning algorithms. The authors propose an enhancement to the LASSO, a shrinkage and selection technique that induces parameter sparsity by penalizing a model’s objective function. Their enhancement gives preference to the selection of genes that are involved in similar biological processes. The authors’ modified LASSO selects similar genes by penalizing interaction terms between genes. They devise a coordinate descent algorithm to minimize the corresponding objective function. To evaluate their method, the authors created simulation data on which they compared their model to the standard LASSO model and an interaction LASSO model. The authors’ model outperformed both the standard and interaction LASSO models in terms of detecting important genes and gene interactions for a reasonable number of training samples. They also demonstrated the performance of their method on a real gene expression data set from lung cancer cell lines.
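For orientation, coordinate descent for the standard LASSO can be sketched as follows. The sketch minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 with the usual soft-thresholding update. It does not include the authors' gene-similarity penalty on interaction terms, and the function names and toy data are assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Cyclic coordinate descent for the standard LASSO objective.
    Assumes no column of X is all zeros."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for row, target in zip(X, y):
                partial = sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += row[j] * (target - partial)
                z += row[j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta

# Toy data (assumed): y depends only on the first of two orthogonal features.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]

beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)  # first coefficient shrunk from 2.0 to 1.5, second exactly 0.0
```

The example shows both effects of the L1 penalty: the relevant coefficient is shrunk by lam, and the irrelevant one is set exactly to zero, which is what makes the LASSO usable as a feature selector.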

Kidney transplant classification with gene expression profiles using L1 feature selection ensemble classifier based on data clustering

2017 International Conference on Advanced Computer Science and Information Systems (ICACSIS) ◽

10.1109/icacsis.2017.8355040 ◽

2017 ◽

Author(s):

M Octaviano Pratama

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Kidney Transplant ◽

Data Clustering ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Ensemble Classifier

A feature selection method based on multiple kernel learning with expression profiles of different types

BioData Mining ◽

10.1186/s13040-017-0124-x ◽

2017 ◽

Vol 10 (1) ◽

Cited By ~ 13

Author(s):

Wei Du ◽

Zhongbo Cao ◽

Tianci Song ◽

Ying Li ◽

Yanchun Liang

Keyword(s):

Feature Selection ◽

Expression Profiles ◽

Multiple Kernel Learning ◽

Feature Selection Method ◽

Selection Method ◽

Kernel Learning ◽

Multiple Kernel ◽

Different Types

Intelligent decision support system for breast cancer diagnosis by gene expression profiles

2016 33rd National Radio Science Conference (NRSC) ◽

10.1109/nrsc.2016.7450870 ◽

2016 ◽

Cited By ~ 1

Author(s):

Hanaa Salem ◽

Gamal Attiya ◽

Nawal El-Fishawy

Keyword(s):

Breast Cancer ◽

Gene Expression ◽

Decision Support ◽

Cancer Diagnosis ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Breast Cancer Diagnosis ◽

Intelligent Decision Support System ◽

Intelligent Decision ◽

Intelligent Decision Support

Improved cancer biomarkers identification using network-constrained infinite latent feature selection

PLoS ONE ◽

10.1371/journal.pone.0246668 ◽

2021 ◽

Vol 16 (2) ◽

pp. e0246668

Author(s):

Lihua Cai ◽

Honglong Wu ◽

Ke Zhou

Keyword(s):

Feature Selection ◽

Expression Profiles ◽

Biological Significance ◽

Feature Selection Method ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Functional Enrichment ◽

Gene Sets ◽

Significant Gene

Identifying biomarkers that are associated with different types of cancer is an important goal in the field of bioinformatics. Different research groups have analyzed the expression profiles of many genes and found certain genetic patterns that can promote the improvement of targeted therapies, but the significance of some genes is still ambiguous. More reliable and effective biomarker identification methods are therefore needed to detect candidate cancer-related genes. In this paper, we propose a novel method that combines the infinite latent feature selection (ILFS) method with the functional interaction (FIs) network to rank the biomarkers. We applied the proposed method to the expression data of five cancer types. The experiments indicated that our network-constrained ILFS (NCILFS) provides an improved prediction of the diagnosis of the samples and locates many more known oncogenes than the original ILFS and some other existing methods. We also performed functional enrichment analysis by inspecting the over-represented gene ontology (GO) biological process (BP) terms and applying the gene set enrichment analysis (GSEA) method on selected biomarkers for each feature selection method. The enrichment analysis reports show that our network-constrained ILFS can produce more biologically significant gene sets than other methods. The results suggest that network-constrained ILFS can identify cancer-related genes with a higher discriminative power and biological significance.

Identification of Novel COVID-19 Biomarkers by Multiple Feature Selection Strategies

Computational and Mathematical Methods in Medicine ◽

10.1155/2021/2203636 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Shuai Zhang ◽

Renliang Qu ◽

Pengyan Wang ◽

Shenghan Wang

Keyword(s):

Feature Selection ◽

Expression Profiles ◽

Feature Selection Method ◽

Principal Component ◽

Enrichment Analysis ◽

Functional Enrichment ◽

Nucleic Acid Detection ◽

Support Vector ◽

New Biomarkers ◽

Optimal Feature

Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has resulted in a global pandemic since its first report in December 2019. So far, SARS-CoV-2 nucleic acid detection has been deemed the gold standard of COVID-19 diagnosis. However, this detection method often leads to false negatives and thus to missed COVID-19 diagnoses. Therefore, it is urgent to find new biomarkers to increase the accuracy of COVID-19 diagnosis. To explore new biomarkers of COVID-19 in this study, expression profiles were first obtained from the GEO database. On this basis, 500 feature genes were screened by the minimum-redundancy maximum-relevancy (mRMR) feature selection method. Afterwards, the incremental feature selection (IFS) method was used to choose the best-performing classifier among support vector machine (SVM) classifiers built on different numbers of feature genes. The corresponding 66 feature genes were set as the optimal feature genes. Lastly, the optimal feature genes were subjected to GO functional enrichment analysis, principal component analysis (PCA), and protein-protein interaction (PPI) network analysis. All in all, it was posited that the 66 feature genes could effectively distinguish COVID-19-positive from COVID-19-negative samples and work as new biomarkers of the disease.
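The incremental feature selection step can be sketched as a loop over the top-k prefixes of a ranked feature list, keeping the prefix whose classifier scores best. In the sketch below the ranking is assumed to be given (for example by mRMR), a leave-one-out nearest-centroid classifier stands in for the SVM used in the study, and the data and function names are hypothetical.

```python
def nearest_centroid_loo_accuracy(samples, labels, feature_idx):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the given feature indices."""
    correct = 0
    for i in range(len(samples)):
        sums, counts = {}, {}
        for j, (row, y) in enumerate(zip(samples, labels)):
            if j == i:
                continue  # hold out sample i
            vec = sums.setdefault(y, [0.0] * len(feature_idx))
            for pos, f in enumerate(feature_idx):
                vec[pos] += row[f]
            counts[y] = counts.get(y, 0) + 1

        def distance(y):
            # Squared Euclidean distance from sample i to the centroid of class y.
            return sum((samples[i][f] - sums[y][pos] / counts[y]) ** 2
                       for pos, f in enumerate(feature_idx))

        if min(sums, key=distance) == labels[i]:
            correct += 1
    return correct / len(samples)

def incremental_feature_selection(samples, labels, ranked_features):
    """Evaluate the top-k ranked features for k = 1..N and keep the smallest
    k that reaches the best score."""
    best_k, best_score = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        score = nearest_centroid_loo_accuracy(samples, labels, ranked_features[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score

# Toy expression matrix (assumed): three features per sample.
samples = [
    [1.0, 0.6, 0.3], [0.9, 0.6, 0.7], [0.4, 1.0, 0.5],   # positive
    [0.0, 0.4, 0.6], [0.1, 0.4, 0.2], [0.6, 0.0, 0.5],   # negative
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
ranked_features = [0, 1, 2]  # assumed output of an upstream ranker

selected, score = incremental_feature_selection(samples, labels, ranked_features)
print(selected, score)
```

Using a strict improvement test means ties are resolved in favor of the smaller feature set, which matches the goal of a compact biomarker panel.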
