ILRC: A Hybrid Biomarker Discovery Algorithm Based on Improved L1 Regularization and Clustering in Microarray Data

Author(s):  
Kun Yu ◽  
Weidong Xie ◽  
Linjie Wang ◽  
Wei Li

Abstract Background: Finding significant genes or proteins in gene chip data for disease diagnosis and drug development is an important task, and the challenge comes from the curse of dimensionality. It is of great significance to use machine learning methods to find important features in the data and build an accurate classification model. Results: The proposed method proved superior to a published advanced hybrid feature selection method and to traditional feature selection methods on different public microarray data sets. In addition, results on a cleft lip and palate data set with known biomarkers, provided by the cooperating hospital, show that our method selects these biomarkers preferentially compared with other methods. Method: In this paper, a feature selection algorithm, ILRC, based on clustering and improved L1 regularization (ILR) is proposed. In this method, the features are first clustered, and the redundant features in the sub-clusters are deleted. Then all the remaining features are iteratively evaluated using ILR, and the final result is output according to a reordering by cumulative weight. Conclusion: The proposed method can effectively remove redundant features. The algorithm's output has high stability and classification accuracy and may identify potential biomarkers.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Kun Yu ◽  
Weidong Xie ◽  
Linjie Wang ◽  
Wei Li

Abstract Background: Finding significant genes or proteins in gene chip data for disease diagnosis and drug development is an important task. However, the challenge comes from the curse of dimensionality. It is of great significance to use machine learning methods to find important features in the data and build an accurate classification model. Results: The proposed method proved superior to a published advanced hybrid feature selection method and to traditional feature selection methods on different public microarray data sets. In addition, the biomarkers selected by our method match those provided by the cooperating hospital for a set of clinical cleft lip and palate data. Method: In this paper, a feature selection algorithm, ILRC, based on clustering and improved L1 regularization (ILR) is proposed. The features are first clustered, and the redundant features in the sub-clusters are deleted. Then all the remaining features are iteratively evaluated using ILR. The final result is given by reordering the features according to their cumulative weight. Conclusion: The proposed method can effectively remove redundant features. The algorithm's output has high stability and classification accuracy, and it may identify potential biomarkers.
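The two stages named in the abstract (cluster and prune redundant features, then rank the survivors by weights accumulated over repeated L1-regularized fits) can be illustrated with a minimal sketch. The abstract does not specify the clustering rule or how ILR improves on plain L1 regularization, so everything below is an assumption: a greedy correlation threshold stands in for the clustering step, ordinary lasso fitted on random subsamples stands in for ILR, and the function names, parameters, and toy data are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def drop_redundant(columns, y, threshold=0.9):
    """Assumed stand-in for the clustering step: visit features from most to
    least label-correlated and keep one only if it is not highly correlated
    with a feature already kept (one representative per correlation cluster)."""
    order = sorted(range(len(columns)), key=lambda j: -abs(pearson(columns[j], y)))
    kept = []
    for j in order:
        if all(abs(pearson(columns[j], columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


def standardise(col):
    """Centre to zero mean and scale to unit variance (zeros if constant)."""
    n = len(col)
    m = sum(col) / n
    sd = (sum((v - m) ** 2 for v in col) / n) ** 0.5
    return [(v - m) / sd for v in col] if sd > 0 else [0.0] * n


def soft_threshold(x, t):
    """Shrink x towards zero by t; values within [-t, t] become exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_cd(columns, y, lam, sweeps=100):
    """Plain lasso (not the paper's ILR) fitted by coordinate descent."""
    n = len(y)
    mean_y = sum(y) / n
    resid = [v - mean_y for v in y]  # residual while all weights are zero
    cols = [standardise(c) for c in columns]
    w = [0.0] * len(cols)
    for _ in range(sweeps):
        for j, c in enumerate(cols):
            norm = sum(v * v for v in c)
            if norm == 0.0:
                continue
            # inner product of feature j with the residual that excludes j
            rho = sum(cv * (rv + w[j] * cv) for cv, rv in zip(c, resid))
            new_w = soft_threshold(rho, lam * n) / norm
            delta = new_w - w[j]
            if delta != 0.0:
                resid = [rv - delta * cv for rv, cv in zip(resid, c)]
                w[j] = new_w
    return w


def ilrc_sketch(X, y, threshold=0.9, lam=0.05, rounds=20, frac=0.8, seed=0):
    """Prune redundant features, then rank the rest by cumulative |weight|."""
    columns = [list(col) for col in zip(*X)]
    kept = drop_redundant(columns, y, threshold)
    rng = random.Random(seed)
    total = {j: 0.0 for j in kept}
    n = len(y)
    for _ in range(rounds):
        rows = rng.sample(range(n), max(2, int(frac * n)))
        sub_cols = [[columns[j][i] for i in rows] for j in kept]
        weights = lasso_cd(sub_cols, [y[i] for i in rows], lam)
        for j, wj in zip(kept, weights):
            total[j] += abs(wj)
    return sorted(kept, key=lambda j: -total[j])


# Toy data (made up): feature 1 duplicates feature 0, feature 2 is uninformative.
n = 40
f0 = [i - 19.5 for i in range(n)]
f1 = [2 * v for v in f0]
f2 = [(-1) ** i for i in range(n)]
X = [[f0[i], f1[i], f2[i]] for i in range(n)]
y = [1 if i >= 20 else 0 for i in range(n)]
ranking = ilrc_sketch(X, y)
```

On this toy input only one of the two duplicated features survives pruning, and it is ranked ahead of the uninformative feature. Accumulating weights over subsamples rather than trusting a single fit is one plausible reading of the stability claim in the abstract.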


2021 ◽  
Author(s):  
Kun Yu ◽  
Weidong Xie ◽  
Linjie Wang ◽  
Wei Li

Abstract Background: Finding significant genes or proteins in gene chip data for disease diagnosis and drug development is an important task, and the challenge comes from the curse of dimensionality. It is of great significance to use machine learning methods to find important features in the data and build an accurate classification model. Results: The proposed method proved superior to a published advanced hybrid feature selection method and to traditional feature selection methods on different public microarray data sets. In addition, results on a cleft lip and palate data set with known biomarkers, provided by the cooperating hospital, show that our method selects these biomarkers preferentially compared with other methods. Method: In this paper, a feature selection algorithm, ILRC, based on clustering and improved L1 regularization (ILR) is proposed. In this method, the features are first clustered, and the redundant features in the sub-clusters are deleted. Then all the remaining features are iteratively evaluated using ILR, and the final result is output according to a reordering by cumulative weight. Conclusion: The proposed method can effectively remove redundant features. The algorithm's output has high stability and classification accuracy and may identify potential biomarkers.


Author(s):  
B. Venkatesh ◽  
J. Anuradha

In microarray data, high classification accuracy is difficult to achieve because of high dimensionality and irrelevant, noisy data. Such data sets also contain many gene expression measurements but few samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method. The proposed method has two phases: a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks produced by the Relief, minimum Redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter feature selection methods, using fuzzy Gaussian membership function ordering. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) selects the optimal features, with an RBF kernel-based Support Vector Machine (SVM) classifier as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets, using Accuracy, Recall, Precision, and F1-Score as performance metrics. The experimental results show that the proposed method outperforms the other feature selection methods.
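The filter-phase aggregation can be sketched as follows. The abstract does not give the exact membership formula, so this is only one plausible reading: each filter's rank is mapped to a Gaussian membership value centred on rank 1, and features are ordered by their mean membership across filters. The function names, the `sigma` parameter, and the gene labels are hypothetical.

```python
import math


def gaussian_membership(rank, sigma):
    """Membership in the 'top-ranked' fuzzy set: 1.0 at rank 1, decaying after."""
    return math.exp(-((rank - 1) ** 2) / (2 * sigma ** 2))


def aggregate_ranks(rankings, sigma):
    """rankings maps a filter name to its feature list, best feature first.
    Every filter is assumed to rank the same set of features."""
    totals = {}
    for ordered in rankings.values():
        for position, feature in enumerate(ordered, start=1):
            totals[feature] = totals.get(feature, 0.0) + gaussian_membership(position, sigma)
    scores = {f: s / len(rankings) for f, s in totals.items()}
    # highest mean membership first; ties broken alphabetically
    return sorted(scores, key=lambda f: (-scores[f], f))


# Made-up rankings from three filters over four hypothetical genes.
rankings = {
    "relief": ["g3", "g1", "g2", "g4"],
    "mrmr": ["g1", "g3", "g4", "g2"],
    "fc": ["g1", "g2", "g3", "g4"],
}
consensus = aggregate_ranks(rankings, sigma=2.0)
print(consensus)  # ['g1', 'g3', 'g2', 'g4']
```

Compared with averaging raw ranks, a membership function of this shape lets `sigma` control how quickly a feature loses credit as it falls down a filter's list. The wrapper phase (IBPSO with an SVM evaluator) would then search over the top of this consensus ordering.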


2021 ◽  
pp. 1063293X2110160
Author(s):  
Dinesh Morkonda Gunasekaran ◽  
Prabha Dhandayudam

Breast cancer is now commonly diagnosed in women. Feature selection is an important step in constructing a classification framework. We propose a multi-filter union (MFU) feature selection method for breast cancer data sets. The method selects the important features in the dataset using a union model based on the random forest algorithm and the logistic regression (LG) algorithm. The data analysis is evaluated using the optimal feature subset of the selected dataset. Experiments are run on the Wisconsin diagnostic breast cancer data set and then on a real data set from a women's health care center. The proposed approach shows high performance and efficiency compared with existing feature selection algorithms.
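The union step can be illustrated with a minimal sketch. The abstract does not say how many features each selector contributes or how scores are thresholded, so the top-k rule, the function names, and the scores below are all assumptions made for illustration; in practice the scores would come from a fitted random forest and a fitted logistic regression.

```python
def top_k(scores, k):
    """Names of the k highest-scoring features (ties broken by name)."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ordered[:k]]


def multi_filter_union(score_sets, k):
    """Union of the top-k features chosen by each selector."""
    selected = set()
    for scores in score_sets:
        selected.update(top_k(scores, k))
    return sorted(selected)


# Made-up scores for four hypothetical features.
rf_importance = {"f1": 0.40, "f2": 0.10, "f3": 0.35, "f4": 0.15}
lr_abs_coef = {"f1": 1.2, "f2": 0.9, "f3": 0.2, "f4": 0.1}
chosen = multi_filter_union([rf_importance, lr_abs_coef], k=2)
print(chosen)  # ['f1', 'f2', 'f3']
```

Taking the union rather than the intersection keeps a feature that only one of the two models considers important, at the cost of a larger subset.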


2010 ◽  
Vol 44-47 ◽  
pp. 1130-1134
Author(s):  
Sheng Li ◽  
Pei Lin Zhang ◽  
Bing Li

Feature selection is a key step in hydraulic system fault diagnosis. Some of the collected features are unrelated to the classification model, and some are highly correlated with other features. Such features are harmful when building a classification model. To solve this problem, genetic algorithm-partial least squares (GA-PLS) is proposed for selecting representative, optimal features. The K-nearest neighbor (KNN) algorithm is used to diagnose and classify hydraulic system faults. To demonstrate the performance of GA-PLS, the original data of a model engineering hydraulic system are used, and the results of GA-PLS are compared with those obtained using all features and using GA alone. The experimental results show that the proposed feature selection method can diagnose and classify hydraulic system faults more efficiently while using fewer features.


2009 ◽  
Vol 2009 ◽  
pp. 1-16 ◽  
Author(s):  
Nirmalya Bandyopadhyay ◽  
Tamer Kahveci ◽  
Steve Goodison ◽  
Y. Sun ◽  
Sanjay Ranka

Classification of cancers based on gene expression produces better accuracy than classification based on clinical markers. Feature selection improves the accuracy of these classification algorithms by reducing the chance of overfitting caused by the large number of features. We develop a new feature selection method called Biological Pathway-based Feature Selection (BPFS) for microarray data. Unlike most existing methods, our method integrates signaling and gene regulatory pathways with gene expression data to minimize the chance of overfitting and to improve the test accuracy. Thus, BPFS selects a biologically meaningful feature set that is minimally redundant. Our experiments on published breast cancer datasets demonstrate that all of the top 20 genes found by our method are associated with cancer. Furthermore, the classification accuracy of our signature is up to 18% better than that of van 't Veer's 70-gene signature, and up to 8% better than that of the best published feature selection method, I-RELIEF.


2021 ◽  
Author(s):  
Marta Ferreira ◽  
Pierre Lovinfosse ◽  
Johanne Hermesse ◽  
Marjolein Decuypere ◽  
Caroline Rousseau ◽  
...  

Abstract Background: Feature reproducibility and model generalizability are currently among the most important limitations when integrating radiomics into the clinic. Radiomic features are sensitive to imaging acquisition protocols, reconstruction algorithms and parameters, as well as to the different steps of the usual radiomics workflow. We propose a framework for comparing the reproducibility of different pre-processing steps in PET/CT radiomic analysis in the prediction of disease-free survival (DFS) across multiple scanners/centers. Results: We evaluated and compared the prediction performance of several models that differ in i) the type of intensity discretization, ii) the feature selection method, and iii) the feature type, i.e., original or tumour-to-liver ratio radiomic features (OR or TLR). We trained our models using data from one scanner/center and tested them on two external scanners/centers. Our results show low reproducibility in predictions across scanners and discretization methods. Despite this, TLR-based models were generally more robust than OR-based models. Maximum relevance minimum redundancy (MRMR) forward feature selection with Pearson correlation was the feature selection method with the best mean area under the precision-recall curve when combined with TLR features from all discretization bin numbers (D_All_FBN), for two of the four classifiers. Conclusion: We evaluated and compared the prediction performance of several models in a data set of 158 patients with locally advanced cervical cancer (LACC) from three distinct scanners. In our cohort of LACC patients, pre-processing of radiomic features in [18F]FDG PET affects DFS prediction performance across scanners, and combining the D_All_FBN TLR approach with the MRMR forward Pearson feature selection method might help increase the robustness of radiomic studies.
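MRMR forward selection with Pearson correlation, as named in the abstract, can be sketched as a greedy loop. The abstract does not state which MRMR variant was used, so this sketch assumes the common difference form (relevance minus mean redundancy); the function names, feature names, and toy values are hypothetical.

```python
def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0


def mrmr_forward(features, target, k):
    """Greedy forward selection. At each step pick the candidate that
    maximises |r(candidate, target)| minus its mean |r| with the features
    already selected (difference form, assumed here)."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected = []
    candidates = list(features)
    while candidates and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (made up): the target depends on two uncorrelated features,
# and "a_dup" is a near copy of "a".
a = [-1, -1, 1, 1, -1, -1, 1, 1]
b = [-1, 1, -1, 1, -1, 1, -1, 1]
a_dup = [-1, -1, 1, 1, -1, -1, 1.2, 1]
target = [2 * x + z for x, z in zip(a, b)]
picked = mrmr_forward({"a": a, "a_dup": a_dup, "b": b}, target, k=2)
print(picked)  # ['a', 'b']
```

Although `a_dup` is far more correlated with the target than `b`, it is skipped in the second step because it is almost perfectly correlated with the already selected `a`; this is the redundancy penalty that distinguishes MRMR from ranking by relevance alone.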

