A Comparative Analysis of Feature Selection Methods for Biomarker Discovery in Study of Toxicant-Treated Atlantic Cod (Gadus Morhua) Liver

Abstract: Lung cancer is one of the deadliest cancers in the world. Two of the most common subtypes, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), have drastically different biological signatures, yet they are often treated similarly and classified together as non-small cell lung cancer (NSCLC). Biomarkers for LUAD and LUSC are scarce, and their distinct biological mechanisms have yet to be elucidated. To detect biologically relevant markers, many studies have attempted to improve traditional machine learning algorithms or develop novel algorithms for biomarker discovery. However, few have used overlapping machine learning or feature selection methods for cancer classification, biomarker identification, or gene expression analysis. This study proposes using overlapping traditional feature selection or feature reduction techniques for cancer classification and biomarker discovery. The genes selected by the overlapping method were then verified using random forest, and the classification statistics of the overlapping method were compared with those of the traditional feature selection methods. The identified biomarkers were validated in an external dataset using ROC curve and AUC analysis. Gene expression analysis was then performed to further investigate biological differences between LUAD and LUSC. Overall, our method achieved classification results comparable to, if not better than, those of the traditional algorithms. It also identified multiple known biomarkers and five potentially novel biomarkers with high discriminating value between LUAD and LUSC. Many of these biomarkers also exhibit significant prognostic potential, particularly in LUAD. Our study also revealed distinct biological pathways in LUAD and LUSC.
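The abstract does not specify how the overlap between methods is computed. The sketch below is a minimal Python illustration, assuming that "overlapping" means keeping only the genes that appear in the top-k list of every feature selection method, and that each candidate biomarker's discriminating value is summarized by a rank-based ROC AUC. The method names, scores, expression values, and the helpers `top_k`, `overlap_genes`, and `auc` are hypothetical, and the random forest verification step is omitted.

```python
def top_k(scores, k):
    """Return the names of the k highest-scoring features (ties broken by name)."""
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return set(ranked[:k])


def overlap_genes(method_scores, k):
    """Keep only features that appear in the top-k list of every method."""
    top_sets = [top_k(scores, k) for scores in method_scores.values()]
    return set.intersection(*top_sets) if top_sets else set()


def auc(positive, negative):
    """ROC AUC of one feature: the probability that a randomly chosen positive
    sample has a higher value than a randomly chosen negative sample, with ties
    counted as 0.5 (the normalized Mann-Whitney U statistic)."""
    wins = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive) * len(negative))


# Hypothetical per-gene importance scores (higher = more important) from three methods.
method_scores = {
    "random_forest": {"GENE_A": 0.90, "GENE_B": 0.70, "GENE_C": 0.20, "GENE_D": 0.60},
    "lasso": {"GENE_A": 0.80, "GENE_B": 0.10, "GENE_C": 0.50, "GENE_D": 0.70},
    "t_test": {"GENE_A": 0.95, "GENE_B": 0.60, "GENE_C": 0.10, "GENE_D": 0.80},
}
selected = overlap_genes(method_scores, k=3)

# Hypothetical expression of one selected gene in LUAD (positive) and LUSC (negative) samples.
gene_a_auc = auc([5.1, 6.3, 7.0], [2.0, 3.5, 6.3])
print(sorted(selected), round(gene_a_auc, 3))  # prints ['GENE_A', 'GENE_D'] 0.833
```

A strict intersection is the most conservative choice; a looser variant would keep genes selected by at least m of the methods.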

A Critical Assessment of Feature Selection Methods for Biomarker Discovery in Clinical Proteomics

Molecular & Cellular Proteomics, Vol 12 (1), pp. 263-276, 2012
DOI: 10.1074/mcp.m112.022566
Cited by ~73

Author(s): Christin Christin, Huub C. J. Hoefsloot, Age K. Smilde, B. Hoekman, Frank Suits, ...

Keyword(s): Feature Selection, Biomarker Discovery, Critical Assessment, Clinical Proteomics, Selection Methods

A Comparative Study of Feature Selection Methods for Biomarker Discovery

2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018
DOI: 10.1109/bibm.2018.8621267
Cited by ~2

Author(s): Zahra Mungloo-Dilmohamud, Gary Marigliano, Yasmina Jaufeerally-Fakim, Carlos Pena-Reyes

Keyword(s): Feature Selection, Comparative Study, Biomarker Discovery, Selection Methods

Comparative Analysis of Feature Selection Methods to Identify Biomarkers in a Stroke-Related Dataset

2019 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2019
DOI: 10.1109/cibcb.2019.8791457
Cited by ~1

Author(s): Thomas Clifford, Justin Bruce, Tayo Obafemi-Ajayi, John Matta

Keyword(s): Feature Selection, Comparative Analysis, Selection Methods

Machine Learning for Web Intrusion Detection: A Comparative Analysis of Feature Selection Methods mRMR and PFI

Artificial Intelligence and Soft Computing - Lecture Notes in Computer Science, pp. 535-546, 2020
DOI: 10.1007/978-3-030-61401-0_50

Author(s): Thiago José Lucas, Carlos Alexandre Carvalho Tojeiro, Rafael Gonçalves Pires, Kelton Augusto Pontara da Costa, João Paulo Papa

Keyword(s): Machine Learning, Feature Selection, Comparative Analysis, Intrusion Detection, Selection Methods

Biomarker discovery in inflammatory bowel diseases using network-based feature selection

2019
DOI: 10.1101/662197
Cited by ~1

Author(s): Mostafa Abbas, John Matta, Thanh Le, Halima Bensmail, Tayo Obafemi-Ajayi, ...

Keyword(s): Feature Selection, Random Forest, Biomarker Discovery, Hybrid Approach, Cost Effective, Machine Learning Techniques, Integrative Approach, Selection Methods, Network Analyses, Metagenomics Data

Abstract: Reliable identification of inflammatory biomarkers from metagenomics data is a promising direction for developing non-invasive, cost-effective, and rapid clinical tests for early diagnosis of inflammatory bowel disease (IBD). We present an integrative approach to Network-Based Biomarker Discovery (NBBD) that combines network analysis methods for prioritizing potential biomarkers with machine learning techniques for assessing the discriminative power of the prioritized biomarkers. Using a large dataset of metagenomics biopsy samples from pediatric patients with new-onset IBD, we compare the performance of Random Forest (RF) classifiers trained on features selected by three approaches: a representative set of traditional feature selection methods; the NBBD framework, configured with five different tools for inferring networks from metagenomics data and nine different methods for prioritizing biomarkers; and a hybrid approach combining the best traditional and NBBD-based feature selection. We also examine how the performance of the predictive models for IBD diagnosis varies with the size of the data used for biomarker identification. Our results show that (i) NBBD is competitive with some state-of-the-art feature selection methods, including Random Forest Feature Importance (RFFI) scores; and (ii) NBBD is especially effective at reliably identifying IBD biomarkers when few data samples are available for biomarker discovery.
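The abstract does not detail how networks are inferred or how biomarkers are prioritized. As a hypothetical illustration of the general network-based idea only (not NBBD's actual network inference tools or prioritization methods), the sketch below links two taxa when the absolute Pearson correlation of their abundances across samples reaches a threshold, then ranks taxa by degree centrality. The threshold, taxon names, abundance values, and function names are assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def build_network(abundance, threshold):
    """Adjacency sets linking taxa whose |correlation| across samples >= threshold."""
    adjacency = {taxon: set() for taxon in abundance}
    for a, b in combinations(abundance, 2):
        if abs(pearson(abundance[a], abundance[b])) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def rank_by_degree(adjacency, k):
    """Return the k taxa with the most neighbors (ties broken alphabetically)."""
    return sorted(adjacency, key=lambda t: (-len(adjacency[t]), t))[:k]


# Hypothetical abundances of five taxa across five samples.
abundance = {
    "taxon_1": [1, 2, 3, 4, 5],
    "taxon_2": [2, 4, 6, 8, 10],
    "taxon_3": [5, 4, 3, 2, 1],
    "taxon_4": [1, 3, 2, 5, 4],
    "taxon_5": [3, 3, 3, 3, 3],
}
network = build_network(abundance, threshold=0.9)
prioritized = rank_by_degree(network, k=3)
print(prioritized)  # prints ['taxon_1', 'taxon_2', 'taxon_3']
```

The prioritized taxa would then serve as input features for a classifier such as the Random Forest models described in the abstract.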

A Comparative Analysis of Feature Selection Methods and Associated Machine Learning Algorithms on Wisconsin Breast Cancer Dataset (WBCD)

Advances in Intelligent Systems and Computing - Proceedings of International Conference on ICT for Sustainable Development, pp. 215-224, 2016
DOI: 10.1007/978-981-10-0129-1_23
Cited by ~3

Author(s): Nileshkumar Modi, Kaushar Ghanchi

Keyword(s): Breast Cancer, Machine Learning, Feature Selection, Comparative Analysis, Learning Algorithms, Machine Learning Algorithms, Breast Cancer Dataset, Selection Methods, Cancer Dataset

The Comparative Analysis of Single-Objective and Multi-objective Evolutionary Feature Selection Methods

Advances in Intelligent Systems and Computing - Proceedings of the 13th International Conference on Ubiquitous Information Management and Communication (IMCOM) 2019, pp. 975-985, 2019
DOI: 10.1007/978-3-030-19063-7_76

Author(s): Syed Imran Ali, Sungyoung Lee

Keyword(s): Feature Selection, Comparative Analysis, Selection Methods, Multi Objective, Single Objective

Advances in Current Diabetes Proteomics: From the Perspectives of Label-free Quantification and Biomarker Selection

Current Drug Targets, Vol 21 (1), pp. 34-54, 2019
DOI: 10.2174/1389450120666190821160207

Author(s): Jianbo Fu, Yongchao Luo, Minjie Mou, Hongning Zhang, Jing Tang, ...

Keyword(s): Diabetes Mellitus, Feature Selection, Biomarker Discovery, Label Free, Selection Methods, Advantages And Disadvantages, Label Free Quantification, Marker Selection, Pubmed Database, Free Quantification

Background: Because of its prevalence and its negative impacts on both the economy and society, diabetes mellitus (DM) has emerged as a worldwide concern. In light of this, label-free quantification (LFQ) proteomics and diabetic marker selection methods have been applied to elucidate the mechanisms underlying insulin resistance, explore novel protein biomarkers, and discover innovative therapeutic protein targets. Objective: This manuscript reviews and analyzes recent computational advances in label-free quantification and diabetic marker selection in diabetes proteomics. Methods: Web of Science, PubMed, and Google Scholar were searched for literature on label-free quantification, computational advances, feature selection, and diabetes proteomics. Results: We systematically review the computational advances in label-free quantification and diabetic marker selection methods that have been applied to understand the pathological mechanisms of DM. First, popular quantification measurements and proteomic quantification software tools that have been applied in diabetes studies are comprehensively discussed. Second, a number of popular data manipulation methods, including transformation, pretreatment (centering, scaling, and normalization), and missing value imputation, together with a variety of popular feature selection techniques applied to diabetes proteomic data, are reviewed with an objective evaluation of their advantages and disadvantages. Finally, guidelines for the efficient use of computation-based LFQ technology and feature selection methods in diabetes proteomics are proposed. Conclusion: In summary, this review provides guidelines for researchers engaging in proteomic biomarker discovery; by properly applying these computational advances, more reliable therapeutic targets will be found in the field of diabetes mellitus.
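The review names transformation, pretreatment (centering, scaling, normalization), and missing value imputation as common preprocessing steps but does not prescribe an order or specific methods. A minimal sketch, assuming half-minimum imputation of missing intensities, a log2 transform, and autoscaling (mean-centering followed by division by the standard deviation), applied per protein across samples. These method choices, the function name `preprocess`, and the toy intensities are illustrative assumptions; many other imputation and normalization methods exist.

```python
import math
import statistics


def preprocess(intensities):
    """Impute, log2-transform, and autoscale one protein's intensities across samples.

    `None` marks a missing value. Requires at least two samples (for the
    standard deviation) and positive intensities (for the log transform).
    """
    observed = [v for v in intensities if v is not None]
    if not observed:
        raise ValueError("protein has no observed intensities")
    # Half-minimum imputation: fill gaps with half the smallest observed intensity.
    fill = min(observed) / 2
    imputed = [fill if v is None else v for v in intensities]
    # Log2 transform to reduce the right skew typical of intensity data.
    logged = [math.log2(v) for v in imputed]
    # Autoscaling: center on the mean, then divide by the sample standard deviation.
    mean = statistics.mean(logged)
    sd = statistics.stdev(logged)
    if sd == 0:
        return [0.0 for _ in logged]
    return [(v - mean) / sd for v in logged]


# Hypothetical intensities of one protein in four samples; the second is missing.
z = preprocess([8.0, None, 32.0, 16.0])
print([round(v, 3) for v in z])  # prints [-0.387, -1.162, 1.162, 0.387]
```

Imputing before the log transform keeps the filled value on the same intensity scale as the observed data.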

MetaFS: Performance assessment of biomarker discovery in metaproteomics

Briefings in Bioinformatics, 2020
DOI: 10.1093/bib/bbaa105

Author(s): Jing Tang, Minjie Mou, Yunxia Wang, Yongchao Luo, Feng Zhu

Keyword(s): Feature Selection, Biomarker Discovery, Comprehensive Evaluation, Feature Selection Method, Selection Method, Selection Methods, Online Tool, Differential Proteins, Reduction Methods, Overall Performance

Abstract: Metaproteomics suffers from problems of dimensionality and sparsity. Data reduction methods can help identify the relevant subset of significant differential features while reducing data redundancy, and feature selection (FS) methods have been applied to obtain this significant differential subset. A variety of FS methods have been developed for metaproteomic studies. However, because FS performance depends heavily on the data characteristics of a given study, the FS method must be chosen carefully to obtain reproducible differential proteins. Moreover, each FS method should be evaluated against comprehensive criteria, because a single criterion is not sufficient to reflect its overall performance. We therefore developed an online tool, MetaFS, which provides 13 FS methods and evaluates them comprehensively using four widely accepted, independent criteria. The function and reliability of MetaFS were systematically tested and validated in two case studies. In sum, MetaFS could be a useful tool for identifying the best-performing FS method for selecting potential biomarkers in microbiome studies. The online tool is freely available at https://idrblab.org/metafs/.
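The abstract does not name MetaFS's four evaluation criteria, so they are not reproduced here. As a hypothetical illustration of one criterion often used to compare FS methods, the sketch below scores selection stability: each FS method is run on several resamples of the data, and the mean pairwise Jaccard similarity of the selected protein sets is reported (1.0 means the same proteins are selected every time). The protein names and selections are made up.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(selections):
    """Mean pairwise Jaccard similarity across the feature sets that one FS
    method selected on different resamples of the data."""
    pairs = list(combinations(selections, 2))
    if not pairs:
        raise ValueError("need at least two selections")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical selections from two FS methods, each run on three resamples.
method_x = [{"P1", "P2", "P3"}, {"P1", "P2", "P3"}, {"P1", "P2", "P4"}]
method_y = [{"P1", "P5", "P6"}, {"P2", "P3", "P7"}, {"P1", "P4", "P8"}]
print(round(stability(method_x), 3), round(stability(method_y), 3))  # prints 0.667 0.067
```

A method can score well on classification accuracy yet poorly on stability, which is one reason a single criterion may not reflect overall performance.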
