The ant colony algorithm for feature selection in high-dimension gene expression data for disease classification

2007 ◽  
Vol 24 (4) ◽  
pp. 413-426 ◽  
Author(s):  
K. R. Robbins ◽  
W. Zhang ◽  
J. K. Bertrand ◽  
R. Rekaya


2019 ◽  
Vol 8 (2) ◽  
pp. 4763-4769

Many fatal diseases can spread to various parts of the body, so it is important to detect such anomalies early in order to limit their spread. Examining the characteristics of genes provides insight for disease classification, as genes play a vital role in influencing how an organism appears, behaves, and survives in an environment. The detection of abnormal genes can be modelled efficiently using statistical methods and machine learning approaches, and gene expression data derived from a microarray can support this computation. Microarray technology, a recent advance in molecular biology, allows the hybridization of DNA samples to be read as values reflecting the level of gene expression in the genome. We propose selecting a subset of features from the large volume of data retrieved from gene expression profiles using the Boruta feature selection algorithm. A comparative study of various supervised classification algorithms is then carried out to categorize the genes in this subset as normal or deviant, in order to identify the most appropriate algorithm for classifying gene expression data. This could make the identification of abnormal genes faster and easier in the future.
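The shadow-feature idea behind Boruta can be illustrated with a minimal sketch. This is not the authors' pipeline: to stay dependency-free it swaps random-forest importance for absolute Pearson correlation with the class label, and it replaces Boruta's statistical test with a simple hit-rate cutoff. The function names, the `min_hit_rate` parameter, and the toy data are all assumptions made for illustration.

```python
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; a stand-in here for random-forest importance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def shadow_feature_selection(columns, labels, n_rounds=50, min_hit_rate=0.9, seed=0):
    """Return indices of features that beat the best shadow feature in most rounds.

    columns: one list of expression values per gene (feature).
    labels:  one 0/1 class label per sample.
    """
    rng = random.Random(seed)
    importances = [abs_correlation(col, labels) for col in columns]
    hits = [0] * len(columns)
    for _ in range(n_rounds):
        # A shadow feature is a real column with its values shuffled across
        # samples, which destroys any genuine association with the labels.
        shadow_scores = []
        for col in columns:
            shadow = col[:]
            rng.shuffle(shadow)
            shadow_scores.append(abs_correlation(shadow, labels))
        threshold = max(shadow_scores)
        for j, score in enumerate(importances):
            if score > threshold:
                hits[j] += 1
    return [j for j, h in enumerate(hits) if h / n_rounds >= min_hit_rate]


# Toy data (hypothetical): gene 0 tracks the label, genes 1-5 are pure noise.
data_rng = random.Random(1)
labels = [0] * 10 + [1] * 10
informative = [3.0 * y + data_rng.gauss(0, 0.5) for y in labels]
noise = [[data_rng.gauss(0, 1) for _ in labels] for _ in range(5)]
selected = shadow_feature_selection([informative] + noise, labels)
print(selected)  # the informative gene (index 0) is expected to be among these
```

The selected subset would then be passed to each candidate classifier for the comparative study the abstract describes.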


2019 ◽  
Vol 21 (9) ◽  
pp. 631-645 ◽  
Author(s):  
Saeed Ahmed ◽  
Muhammad Kabir ◽  
Zakir Ali ◽  
Muhammad Arif ◽  
Farman Ali ◽  
...  

Aim and Objective: Cancer is a dangerous disease worldwide, caused by somatic mutations in the genome. Diagnosing this deadly disease at an early stage is a new clinical application of microarray data. In DNA microarray technology, gene expression data have a high dimension with a small sample size. It is therefore indispensable to develop efficient and robust feature selection methods that identify a small set of genes and achieve better classification performance. Materials and Methods: In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) and Multi-Objective Evolutionary Algorithm (MOEA) approaches to select highly informative genes. The hybrid model with a radial basis function neural network (RBFNN) classifier was evaluated on 11 benchmark gene expression datasets using 10-fold cross-validation. Results: The experimental results were compared with seven conventional feature selection methods and with other methods in the literature; the comparison shows that our approach has clear advantages in classification accuracy and in the number of genes selected. Conclusion: Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy for six out of eleven datasets with a minimal-sized predictive gene subset.
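The CFS half of the hybrid can be sketched as follows, assuming the standard CFS merit score, k·r_cf / sqrt(k + k(k−1)·r_ff), which the abstract does not spell out. Here r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation of a subset of k features. The MOEA search is replaced by a plain greedy forward search, and the correlation values are invented for illustration, so this is not the paper's CFS-MOEA algorithm.

```python
from itertools import combinations
from math import sqrt


def cfs_merit(subset, class_corr, feature_corr):
    """CFS merit of a feature subset: rewards relevance, penalizes redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(abs(class_corr[i]) for i in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(feature_corr[i][j]) for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(class_corr, feature_corr):
    """Greedy forward search (a stand-in for the MOEA): add the feature that
    raises the merit most, and stop when no addition improves it."""
    selected, best = [], 0.0
    remaining = set(range(len(class_corr)))
    while remaining:
        cand, cand_merit = max(
            ((j, cfs_merit(selected + [j], class_corr, feature_corr))
             for j in sorted(remaining)),
            key=lambda pair: pair[1],
        )
        if cand_merit <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_merit
    return selected, best


# Hypothetical correlations: genes 0 and 1 are nearly redundant with each other.
class_corr = [0.6, 0.5, 0.4]
feature_corr = [
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
subset, merit = greedy_cfs(class_corr, feature_corr)
print(subset, round(merit, 4))  # [0, 2] 0.6742
```

Gene 1 is skipped even though it correlates more strongly with the class than gene 2, because it adds little beyond gene 0. This is the redundancy penalty that makes CFS favor small gene subsets.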


2020 ◽  
Vol 11 ◽  
Author(s):  
Shuhei Kimura ◽  
Ryo Fukutomi ◽  
Masato Tokuhisa ◽  
Mariko Okada

Several researchers have focused on random-forest-based inference methods because of their excellent performance. Some of these inference methods also have a useful ability to analyze both time-series and static gene expression data. However, they are only of use in ranking all of the candidate regulations by assigning them confidence values. None have been capable of detecting the regulations that actually affect a gene of interest. In this study, we propose a method to remove unpromising candidate regulations by combining the random-forest-based inference method with a series of feature selection methods. In addition to detecting unpromising regulations, our proposed method uses outputs from the feature selection methods to adjust the confidence values of all of the candidate regulations that have been computed by the random-forest-based inference method. Numerical experiments showed that the combined application with the feature selection methods improved the performance of the random-forest-based inference method on 99 of the 100 trials performed on the artificial problems. However, the improvement tends to be small, since our combined method succeeded in removing only 19% of the candidate regulations at most. The combined application with the feature selection methods moreover makes the computational cost higher. While a bigger improvement at a lower computational cost would be ideal, we see no impediments to our investigation, given that our aim is to extract as much useful information as possible from a limited amount of gene expression data.
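One way to picture the combination step is the sketch below. The abstract does not state how the confidence values are adjusted, so the voting rule here is purely an assumption: a candidate regulation kept by no feature selection method is removed, and the rest have their confidence scaled by the fraction of methods that kept them. The function name and gene names are hypothetical.

```python
def adjust_confidences(confidences, selections):
    """Combine random-forest confidence values with feature selection outputs.

    confidences: dict mapping candidate regulator -> confidence value.
    selections:  list of sets, one per feature selection method, each holding
                 the regulators that method kept.
    Assumed rule (not from the paper): drop regulators kept by no method and
    scale the others by the fraction of methods that kept them.
    """
    if not selections:
        return dict(confidences)
    adjusted = {}
    for regulator, confidence in confidences.items():
        votes = sum(regulator in kept for kept in selections)
        if votes == 0:
            continue  # treated as an unpromising candidate regulation
        adjusted[regulator] = confidence * votes / len(selections)
    return adjusted


# Hypothetical confidence values for three candidate regulators of one gene.
rf_confidences = {"geneA": 0.5, "geneB": 0.3, "geneC": 0.2}
kept_by_method = [{"geneA", "geneB"}, {"geneA"}]
result = adjust_confidences(rf_confidences, kept_by_method)
print(result)  # geneC is removed; geneB's confidence is halved
```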

