A Comparative Analysis of Feature Selection Methods for Biomarker Discovery in Study of Toxicant-Treated Atlantic Cod (Gadus Morhua) Liver

Author(s):  
Xiaokang Zhang ◽  
Inge Jonassen
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Joe W. Chen ◽  
Joseph Dhahbi

Abstract Lung cancer is one of the deadliest cancers in the world. Two of the most common subtypes, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC), have drastically different biological signatures, yet they are often treated similarly and classified together as non-small cell lung cancer (NSCLC). LUAD and LUSC biomarkers are scarce, and their distinct biological mechanisms have yet to be elucidated. To detect biologically relevant markers, many studies have attempted to improve traditional machine learning algorithms or develop novel algorithms for biomarker discovery. However, few have used overlapping machine learning or feature selection methods for cancer classification, biomarker identification, or gene expression analysis. This study proposes to use overlapping traditional feature selection or feature reduction techniques for cancer classification and biomarker discovery. The genes selected by the overlapping method were then verified using random forest. The classification statistics of the overlapping method were compared to those of the traditional feature selection methods. The identified biomarkers were validated in an external dataset using AUC and ROC analysis. Gene expression analysis was then performed to further investigate biological differences between LUAD and LUSC. Overall, our method achieved classification results comparable to, if not better than, the traditional algorithms. It also identified multiple known biomarkers, and five potentially novel biomarkers with high discriminating values between LUAD and LUSC. Many of the biomarkers also exhibit significant prognostic potential, particularly in LUAD. Our study also unraveled distinct biological pathways between LUAD and LUSC.
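
The idea of keeping only the features that several selection methods agree on can be illustrated with a minimal sketch. The abstract does not name the individual methods, so the two filters used here (an absolute Welch t-statistic and a rank-based separation score) are assumptions chosen for illustration, as are the function names and the toy data; the random forest verification step is not reproduced.

```python
import random
from statistics import mean, variance


def t_score(a, b):
    """Absolute Welch t-statistic between two samples (assumed filter 1)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_score(a, b):
    """Distance of a pairwise AUC estimate from 0.5 (assumed filter 2)."""
    wins = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    return abs(wins / (len(a) * len(b)) - 0.5)


def top_k(X, y, score, k):
    """Return the names of the k features with the highest score."""
    scored = {}
    for name, values in X.items():
        a = [v for v, label in zip(values, y) if label == 0]
        b = [v for v, label in zip(values, y) if label == 1]
        scored[name] = score(a, b)
    return set(sorted(scored, key=scored.get, reverse=True)[:k])


def overlapping_features(X, y, k):
    """Keep only the features that both filters place in their top k."""
    return top_k(X, y, t_score, k) & top_k(X, y, rank_score, k)


# Hypothetical toy data: 8 noise features and 2 features shifted by class.
rng = random.Random(0)
y = [0] * 20 + [1] * 20
X = {f"noise_{i}": [rng.gauss(0, 1) for _ in y] for i in range(8)}
X["marker_a"] = [rng.gauss(0, 1) + 5 * label for label in y]
X["marker_b"] = [rng.gauss(0, 1) - 5 * label for label in y]

selected = overlapping_features(X, y, k=2)
print(sorted(selected))
```

In a real analysis the overlap would be taken across whichever selection or reduction methods the study uses, and the resulting gene set would then be passed to a classifier for verification.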


2012 ◽  
Vol 12 (1) ◽  
pp. 263-276 ◽  
Author(s):  
Christin Christin ◽  
Huub C. J. Hoefsloot ◽  
Age K. Smilde ◽  
B. Hoekman ◽  
Frank Suits ◽  
...  

Author(s):  
Thiago José Lucas ◽  
Carlos Alexandre Carvalho Tojeiro ◽  
Rafael Gonçalves Pires ◽  
Kelton Augusto Pontara da Costa ◽  
João Paulo Papa

2019 ◽  
Author(s):  
Mostafa Abbas ◽  
John Matta ◽  
Thanh Le ◽  
Halima Bensmail ◽  
Tayo Obafemi-Ajayi ◽  
...  

ABSTRACT Reliable identification of inflammatory biomarkers from metagenomics data is a promising direction for developing non-invasive, cost-effective, and rapid clinical tests for early diagnosis of IBD. We present an integrative approach to Network-Based Biomarker Discovery (NBBD), which integrates network analysis methods for prioritizing potential biomarkers with machine learning techniques for assessing the discriminative power of the prioritized biomarkers. Using a large dataset of new-onset pediatric IBD metagenomics biopsy samples, we compare the performance of Random Forest (RF) classifiers trained on features selected using a representative set of traditional feature selection methods against the NBBD framework, configured using five different tools for inferring networks from metagenomics data and nine different methods for prioritizing biomarkers, as well as a hybrid approach combining the best traditional and NBBD-based feature selection. We also examine how the performance of the predictive models for IBD diagnosis varies as a function of the size of the data used for biomarker identification. Our results show that (i) NBBD is competitive with some of the state-of-the-art feature selection methods, including Random Forest Feature Importance (RFFI) scores; and (ii) NBBD is especially effective in reliably identifying IBD biomarkers when the number of data samples available for biomarker discovery is small.


2019 ◽  
Vol 21 (1) ◽  
pp. 34-54
Author(s):  
Jianbo Fu ◽  
Yongchao Luo ◽  
Minjie Mou ◽  
Hongning Zhang ◽  
Jing Tang ◽  
...  

Background: Due to its prevalence and negative impacts on both the economy and society, diabetes mellitus (DM) has emerged as a worldwide concern. In light of this, label-free quantification (LFQ) proteomics and diabetic marker selection methods have been applied to elucidate the underlying mechanisms associated with insulin resistance, explore novel protein biomarkers, and discover innovative therapeutic protein targets. Objective: The purpose of this manuscript is to review and analyze recent computational advances and developments in label-free quantification and diabetic marker selection in diabetes proteomics. Methods: The Web of Science database, the PubMed database, and Google Scholar were searched for label-free quantification, computational advances, feature selection, and diabetes proteomics. Results: In this study, we systematically review the computational advances in label-free quantification and diabetic marker selection methods that have been applied to understand the pathological mechanisms of DM. First, the popular quantification measurements and proteomic quantification software tools that have been applied in diabetes studies are comprehensively discussed. Second, a number of popular manipulation methods, including transformation, pretreatment (centering, scaling, and normalization), and missing value imputation, together with a variety of popular feature selection techniques applied to diabetes proteomic data, are reviewed with an objective evaluation of their advantages and disadvantages. Finally, guidelines for the efficient use of computation-based LFQ technology and feature selection methods in diabetes proteomics are proposed. Conclusion: In summary, this review provides guidelines for researchers who will engage in proteomics biomarker discovery; by properly applying these proteomic computational advances, more reliable therapeutic targets will be found in the field of diabetes mellitus.
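
The manipulation steps listed in this abstract (transformation, missing value imputation, centering, and scaling) can be sketched for a single protein's intensity column as follows. The specific choices here (log2 transformation, median imputation, autoscaling to zero mean and unit variance), the function name `pretreat`, and the example intensities are assumptions made for illustration; the review itself compares many alternatives.

```python
import math
from statistics import mean, median, pstdev


def pretreat(column):
    """Log2-transform, impute missing values with the median, then autoscale.

    `column` holds positive intensities; None marks a missing value.
    """
    # Transformation: log2 compresses the dynamic range of intensities.
    logged = [math.log2(v) if v is not None else None for v in column]
    # Imputation: replace missing entries with the median of observed ones.
    fill = median(v for v in logged if v is not None)
    filled = [fill if v is None else v for v in logged]
    # Centering and scaling: subtract the mean, divide by the std deviation.
    mu, sd = mean(filled), pstdev(filled)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in filled]


# Hypothetical intensities for one protein across five samples.
intensities = [1024.0, 2048.0, None, 4096.0, 512.0]
scaled = pretreat(intensities)
print(scaled)
```

Applying the same function to every protein column puts all features on a comparable scale before feature selection, which matters for methods that are sensitive to feature variance.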


Author(s):  
Jing Tang ◽  
Minjie Mou ◽  
Yunxia Wang ◽  
Yongchao Luo ◽  
Feng Zhu

Abstract Metaproteomics suffers from the issues of dimensionality and sparsity. Data reduction methods can identify the relevant subset of significant differential features and reduce data redundancy. Feature selection (FS) methods are applied to obtain this significant differential subset. So far, a variety of feature selection methods have been developed for metaproteomic studies. However, because the performance of an FS method depends heavily on the data characteristics of a given study, a suitable feature selection method must be carefully chosen to obtain reproducible differential proteins. Moreover, it is critical to evaluate the performance of each FS method according to comprehensive criteria, because a single criterion is not sufficient to reflect the overall performance of an FS method. Therefore, we developed an online tool named MetaFS, which provides 13 types of FS methods and conducts a comprehensive evaluation of the complex FS methods using four widely accepted and independent criteria. Furthermore, the function and reliability of MetaFS were systematically tested and validated via two case studies. In sum, MetaFS could be a valuable tool for discovering the overall well-performing FS method for selecting potential biomarkers in microbiome studies. The online tool is freely available at https://idrblab.org/metafs/.

