Finite multi-dimensional generalized Gamma Mixture Model Learning for feature selection

2021 ◽  
pp. 147-173
Author(s):  
Basim Alghabashi ◽  
Mohamed Al Mashrgy ◽  
Muhammad Azam ◽  
Nizar Bouguila
2019 ◽  
Vol 87 ◽  
pp. 269-284 ◽  
Author(s):  
Chi Liu ◽  
Heng-Chao Li ◽  
Kun Fu ◽  
Fan Zhang ◽  
Mihai Datcu ◽  
...  

BMC Genomics ◽  
2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Da Xu ◽  
Jialin Zhang ◽  
Hanxiao Xu ◽  
Yusen Zhang ◽  
Wei Chen ◽  
...  

Abstract Background: The small number of samples and the curse of dimensionality hamper the application of deep learning techniques to disease classification. Additionally, the performance of clustering-based feature selection algorithms is still far from satisfactory because of the limitations of the unsupervised learning methods they use. To enhance interpretability and overcome this problem, we developed a novel feature selection algorithm. Meanwhile, complex genomic data pose great challenges for the identification of biomarkers and therapeutic targets, and some current feature selection methods suffer from low sensitivity and specificity in this field. Results: In this article, we designed a multi-scale clustering-based feature selection algorithm named MCBFS, which simultaneously performs feature selection and model learning for genomic data analysis. The experimental results demonstrated that MCBFS is robust and effective in comparisons with seven benchmark and six state-of-the-art supervised methods on eight data sets. The visualization results and the statistical test showed that MCBFS can capture the informative genes and improve the interpretability and visualization of tumor gene expression and single-cell sequencing data. Additionally, we developed a general framework named McbfsNW, which uses gene expression data and protein interaction data to identify robust biomarkers and therapeutic targets for the diagnosis and therapy of diseases. The framework incorporates the MCBFS algorithm, a network recognition ensemble algorithm, and a feature selection wrapper. McbfsNW has been applied to the lung adenocarcinoma (LUAD) data sets. The preliminary results demonstrated that the identified biomarkers attain higher prediction performance on the independent LUAD data set, and we also constructed a drug-target network that may benefit LUAD therapy.
Conclusions: The proposed feature selection method is robust and effective for gene selection, classification, and visualization. The McbfsNW framework is practical and helpful for identifying biomarkers and targets in genomic data. We believe the same methods and principles can be extended to other kinds of data sets.


Author(s):  
Hermina Petric Maretic ◽  
Mireille El Gheche ◽  
Pascal Frossard

2015 ◽  
Vol 26 (2) ◽  
pp. 400-408 ◽  
Author(s):  
Thanh Minh Nguyen ◽  
Q. M. Jonathan Wu ◽  
Hui Zhang

2020 ◽  
Vol 10 (5) ◽  
pp. 1033-1039
Author(s):  
Huihong Duan ◽  
Xu Wang ◽  
Xingyi He ◽  
Yonggang He ◽  
Litao Song ◽  
...  

Background: In computer-aided diagnosis (CAD) systems for pulmonary nodules, feature selection plays an important role in reducing the false positive rate and improving system accuracy. Existing feature selection techniques can damage the diversity of features when distinguishing malignant from benign pulmonary nodules. To address this problem, this study developed a novel feature selection algorithm to improve the accuracy of traditional computer-aided differential diagnosis of benign and malignant pulmonary nodules. Method: First, we divided the extracted nodule features into several groups using a Gaussian mixture model (GMM). Second, we applied the Relief and sequential forward selection (SFS) algorithms to find a locally optimal feature subset for each group. Next, we used the optimum-path forest (OPF) classifier with each subset to obtain classification results. Finally, the locally optimal feature subset with the highest area under the curve (AUC) among all groups was added to the final selected set. Results: On pulmonary nodules collected from computed tomography (CT) scans and tested with two sets of samples, we achieved an average accuracy of 89.5%, sensitivity of 87.1%, and specificity of 90.9% on the first set, and 90.1%, 88.7%, and 92.1% on the second set. The areas under the receiver operating characteristic (ROC) curves for the two sample sets were 95.2% and 96.3%, respectively. Conclusions: This study shows that the proposed method is promising for improving the performance of computer-aided diagnosis systems in classifying benign and malignant pulmonary nodules.
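
The four steps of the Method can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: each feature is described to the GMM by its mean and standard deviation, a k-nearest-neighbour classifier stands in for the OPF classifier (which scikit-learn does not provide), AUC is estimated by 5-fold cross-validation, and the data are synthetic. All function names and parameters (`relief_weights`, `forward_select`, `grouped_selection`, `keep_relief`, `max_features`) are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def relief_weights(X, y):
    """Basic binary Relief: a feature gains weight when it separates a sample
    from its nearest miss and loses weight when it differs on the nearest hit."""
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span           # scale every feature to [0, 1]
    w = np.zeros(X.shape[1])
    for i in range(len(Xs)):
        dist = np.abs(Xs - Xs[i]).sum(axis=1)
        dist[i] = np.inf                       # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))
        miss = np.argmin(np.where(~same, dist, np.inf))
        w += np.abs(Xs[i] - Xs[miss]) - np.abs(Xs[i] - Xs[hit])
    return w / len(Xs)


def cv_auc(X, y, clf):
    """Mean 5-fold cross-validated ROC AUC."""
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()


def forward_select(X, y, candidates, clf, max_features):
    """Greedy SFS: add the feature that raises AUC most; stop when none does."""
    chosen, best = [], -np.inf
    remaining = list(candidates)
    while remaining and len(chosen) < max_features:
        scored = [(cv_auc(X[:, chosen + [f]], y, clf), f) for f in remaining]
        score, f = max(scored)
        if score <= best:
            break
        chosen.append(f)
        remaining.remove(f)
        best = score
    return chosen, best


def grouped_selection(X, y, n_groups=3, keep_relief=4, max_features=2, seed=0):
    clf = KNeighborsClassifier(n_neighbors=5)  # assumption: stand-in for OPF
    # Step 1: cluster the features with a GMM. Describing each feature by its
    # mean and standard deviation is an assumption made for this sketch.
    descriptors = np.column_stack([X.mean(axis=0), X.std(axis=0)])
    gmm = GaussianMixture(n_components=n_groups, covariance_type="diag",
                          random_state=seed)
    groups = gmm.fit_predict(descriptors)
    results = []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        # Step 2a: Relief ranks the features in this group; keep the top few.
        order = np.argsort(relief_weights(X[:, idx], y))[::-1]
        candidates = [int(i) for i in idx[order[:keep_relief]]]
        # Steps 2b and 3: SFS scored by the classifier's cross-validated AUC.
        chosen, auc = forward_select(X, y, candidates, clf, max_features)
        results.append((auc, chosen))
    # Step 4: keep the local subset with the highest AUC.
    best_auc, best_subset = max(results)
    return best_subset, best_auc, results


# Synthetic stand-in data; the study itself used features extracted from CT scans.
X, y = make_classification(n_samples=120, n_features=12, n_informative=4,
                           n_redundant=2, random_state=0)
subset, auc, per_group = grouped_selection(X, y)
print("selected feature indices:", subset, "cross-validated AUC: %.3f" % auc)
```

Taking only the single best group's subset is one reading of the final step; an implementation that merges subsets from several groups would change only the last lines of `grouped_selection`.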

