Cancer Data Classification by Quantum Inspired Immune Clone Optimization-based Optimal Feature Selection using Gene Expression Data: Deep Learning Approach (Preprint)

2020 ◽  
Author(s):  
Nageswara Rao Eluri

Gene selection is a fundamental process in bioinformatics, because cancer classification accuracy depends on the selected genes, which give biological relevance to the classification problem. Accurate classification of diverse tumor types is in high demand for cancer diagnosis. However, existing cancer classification methods are mostly clinically based, so their diagnostic capability is limited. Gene expression data are now used to address significant problems in cancer diagnosis, and researchers have introduced many ways to use them to diagnose cancer appropriately and effectively. This paper aims to develop a cancer data classification model using gene expression data. Initially, five benchmark gene expression datasets, i.e., “Colon cancer, diffuse B-cell Lymphoma, Leukaemia, Wisconsin Diagnostic Breast Cancer and Wisconsin Breast Cancer Data”, are collected for the experiment. The proposed classification model involves three main phases: “(a) Feature extraction, (b) Optimal Feature Selection, and (c) Classification”. After data pre-processing, feature extraction is performed on the collected gene expression data using first-order and second-order statistical measures. To reduce the length of the feature vectors, optimal feature selection is performed with a new meta-heuristic algorithm termed the Quantum Inspired Immune Clone Optimization Algorithm (QICO). Once the relevant features are selected, classification is performed by a deep learning model, the Recurrent Neural Network (RNN). Moreover, the number of hidden neurons of the RNN is optimized by the same QICO. Optimal feature selection and classification are performed to select the most suitable features and thus maximize classification accuracy. Finally, the experimental analysis reveals that the proposed QICO-based feature selection outperforms other heuristic-based feature selection methods and that the optimized RNN outperforms other machine learning algorithms.

2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Nageswara Rao Eluri ◽  
Gangadhara Rao Kancharla ◽  
Suresh Dara ◽  
Venkatesulu Dondeti

Purpose: Gene selection is considered a fundamental process in the bioinformatics field. Existing cancer classification methods are mostly clinically based, and their diagnostic capability is limited. Nowadays, significant problems of cancer diagnosis are addressed using gene expression data, and researchers have introduced many ways to diagnose cancer appropriately and effectively. This paper aims to develop a cancer data classification model using gene expression data. Design/methodology/approach: The proposed classification model involves three main phases: “(1) Feature extraction, (2) Optimal Feature Selection and (3) Classification”. Initially, five benchmark gene expression datasets are collected. Feature extraction is then performed on the collected gene expression data. To reduce the length of the feature vectors, optimal feature selection is performed with a new meta-heuristic algorithm termed the quantum-inspired immune clone optimization algorithm (QICO). Once the relevant features are selected, classification is performed by a deep learning model, the recurrent neural network (RNN). Finally, the experimental analysis reveals that the proposed QICO-based feature selection model outperforms other heuristic-based feature selection methods and that the optimized RNN outperforms other machine learning methods. Findings: The proposed QICO-RNN achieves the best outcomes at every learning percentage. At a learning percentage of 85, the accuracy of the proposed QICO-RNN was 3.2% higher than RNN, 4.3% higher than RF, 3.8% higher than NB and 2.1% higher than KNN for Dataset 1. For Dataset 2, at a learning percentage of 35, the accuracy of the proposed QICO-RNN was 13.3% higher than RNN, 8.9% higher than RF and 14.8% higher than NB and KNN. Hence, the developed QICO algorithm performs well in accurately classifying cancer data using gene expression data. Originality/value: This paper introduces a new optimal feature selection model using QICO and a QICO-based RNN for effective classification of cancer data using gene expression data. It is the first work to use QICO-based optimal feature selection together with QICO-RNN for this task.


2019 ◽  
Vol 21 (9) ◽  
pp. 631-645 ◽  
Author(s):  
Saeed Ahmed ◽  
Muhammad Kabir ◽  
Zakir Ali ◽  
Muhammad Arif ◽  
Farman Ali ◽  
...  

Aim and Objective: Cancer is a dangerous disease worldwide, caused by somatic mutations in the genome. Diagnosing this deadly disease at an early stage is an exceptionally new clinical application of microarray data. In DNA microarray technology, gene expression data have a high dimension with a small sample size. Therefore, it is indispensable to develop efficient and robust feature selection methods that identify a small set of genes to achieve better classification performance. Materials and Methods: In this study, we developed a hybrid feature selection method that integrates correlation-based feature selection (CFS) and Multi-Objective Evolutionary Algorithm (MOEA) approaches to select highly informative genes. The hybrid model with a radial basis function neural network (RBFNN) classifier has been evaluated on 11 benchmark gene expression datasets using a 10-fold cross-validation test. Results: The experimental results are compared with seven conventional feature selection methods and with other methods in the literature; the comparison shows that our approach has clear advantages in classification accuracy and in the number of genes selected. Conclusion: Our proposed CFS-MOEA algorithm attained up to 100% classification accuracy for six out of eleven datasets with a minimal-sized predictive gene subset.
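
As a rough illustration of the CFS component, the sketch below computes the standard CFS merit heuristic, which rewards subsets whose features correlate with the class but not with each other. The abstract does not say which CFS variant or correlation measure the authors used, so the function name `cfs_merit` and its parameters are hypothetical and the code is only a minimal sketch.

```python
from math import sqrt


def cfs_merit(k: int, mean_feature_class_corr: float,
              mean_feature_feature_corr: float) -> float:
    """Standard CFS merit of a subset of k features (higher is better).

    mean_feature_class_corr:   average |correlation| between each feature and the class
    mean_feature_feature_corr: average |correlation| between pairs of features
    Both averages are assumed to be precomputed by the caller.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    denominator = sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return k * mean_feature_class_corr / denominator


# Four equally relevant genes: uncorrelated ones add information,
# fully redundant ones add nothing over a single gene.
independent = cfs_merit(4, 0.5, 0.0)
redundant = cfs_merit(4, 0.5, 1.0)
single = cfs_merit(1, 0.5, 0.0)
```

In this toy comparison the redundant subset scores the same as a single gene, which is the behaviour that lets a search procedure such as an MOEA prefer small, non-redundant subsets.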


2018 ◽  
Vol 21 (6) ◽  
pp. 420-430 ◽  
Author(s):  
Shuaiqun Wang ◽  
Wei Kong ◽  
Aorigele ◽  
Jin Deng ◽  
Shangce Gao ◽  
...  

Aims and Objective: Redundant information in microarray gene expression data makes cancer classification difficult. Hence, it is very important for researchers to find appropriate ways to select informative genes for better identification of cancer. This study presents a hybrid feature selection method, mRMR-ICA, which combines minimum redundancy maximum relevance (mRMR) with the imperialist competitive algorithm (ICA) for cancer classification. Materials and Methods: The presented algorithm, mRMR-ICA, uses mRMR as a preprocessing step to delete redundant genes and to provide a smaller dataset on which ICA performs feature selection. A support vector machine (SVM) is used to evaluate the classification accuracy of the selected genes. The fitness function includes the classification accuracy and the number of selected genes. Results: Ten benchmark microarray gene expression datasets are used to test the performance of mRMR-ICA. Both the accuracy of cancer classification and the number of informative genes are improved for mRMR-ICA compared with the original ICA and other evolutionary algorithms. Conclusion: The comparison results demonstrate that mRMR-ICA can effectively delete redundant genes, so that the algorithm selects fewer informative genes while achieving better classification results. It can also shorten calculation time and improve efficiency.
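
The abstract states only that the fitness function includes classification accuracy and the number of selected genes. One common way to combine the two is a weighted sum, sketched below; the weighted-sum form, the name `subset_fitness` and the default `weight` are assumptions for illustration, not the authors' published formula.

```python
from typing import Sequence


def subset_fitness(accuracy: float, mask: Sequence[int], weight: float = 0.9) -> float:
    """Score a candidate gene subset (higher is better).

    accuracy: classifier accuracy in [0, 1] obtained with the selected genes
    mask:     0/1 flags, one per gene; 1 means the gene is selected
    weight:   hypothetical trade-off between accuracy and subset size
    """
    if not mask:
        raise ValueError("mask must contain at least one gene")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    selected_fraction = sum(mask) / len(mask)
    # Reward accuracy, and reward leaving genes out of the subset.
    return weight * accuracy + (1.0 - weight) * (1.0 - selected_fraction)


# At equal accuracy, the subset with fewer genes scores higher.
small = subset_fitness(0.90, [1, 0, 0, 0])
large = subset_fitness(0.90, [1, 1, 1, 0])
```

In a wrapper setting, `accuracy` would come from training and validating the classifier (an SVM in this study) on the genes flagged in `mask`.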


2019 ◽  
Vol 41 (11) ◽  
pp. 1301-1313 ◽  
Author(s):  
Lokeswari Venkataramana ◽  
Shomona Gracia Jacob ◽  
Rajavel Ramadoss ◽  
Dodda Saisuma ◽  
Dommaraju Haritha ◽  
...  

PLoS ONE ◽  
2009 ◽  
Vol 4 (12) ◽  
pp. e8250 ◽  
Author(s):  
Qingzhong Liu ◽  
Andrew H. Sung ◽  
Zhongxue Chen ◽  
Jianzhong Liu ◽  
Xudong Huang ◽  
...  

2005 ◽  
Vol 03 (05) ◽  
pp. 1107-1136 ◽  
Author(s):  
LIANG GOH ◽  
NIKOLA KASABOV

This paper introduces a novel generic approach for classification problems with the objective of achieving maximum classification accuracy with a minimum number of selected features. The method is illustrated with several case studies of gene expression data. Our approach integrates filter and wrapper gene selection methods with an added objective of selecting a small set of non-redundant genes that are most relevant for classification, with the provision of bins for genes to be swapped in the search for their biological relevance. It is capable of selecting relatively few marker genes while giving comparable or better leave-one-out cross-validation accuracy when compared with gene ranking selection approaches. Additionally, gene profiles can be extracted from the evolving connectionist system, which provides a set of rules that can be further developed into expert systems. The approach uses an integration of Pearson correlation coefficient and signal-to-noise ratio methods with an adaptive evolving classifier applied through the leave-one-out method for validation. Datasets of gene expression from four case studies are used to illustrate the method. The results show that the proposed approach leads to an improved feature selection process in terms of reducing the number of variables required and an increase in classification accuracy.
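
One plausible way to combine the two filter measures named in the abstract is sketched below: genes are ranked by a two-class signal-to-noise ratio, and a gene is skipped when its Pearson correlation with an already chosen gene is too high. The abstract does not describe how the two measures are integrated, so this greedy scheme, the function names and the `max_corr` threshold are all assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def snr(values, labels):
    """Two-class signal-to-noise ratio: |mean0 - mean1| / (std0 + std1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    denom = stdev(a) + stdev(b)
    return abs(mean(a) - mean(b)) / denom if denom > 0 else 0.0


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def select_genes(genes, labels, k, max_corr=0.9):
    """Greedily pick up to k high-SNR genes that are not mutually redundant."""
    ranked = sorted(genes, key=lambda g: snr(genes[g], labels), reverse=True)
    chosen = []
    for g in ranked:
        if all(abs(pearson(genes[g], genes[c])) < max_corr for c in chosen):
            chosen.append(g)
        if len(chosen) == k:
            break
    return chosen


# Toy data (made up): g2 closely tracks g1, g3 is only weakly informative.
labels = [0, 0, 0, 1, 1, 1]
genes = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "g2": [2.0, 2.5, 1.7, 6.0, 6.3, 5.7],
    "g3": [0.5, 2.5, 1.5, 1.0, 3.0, 2.0],
}
picked = select_genes(genes, labels, k=2)
```

On this toy data `g2` has the second-highest signal-to-noise ratio but is skipped as redundant with `g1`, so the second slot goes to `g3`.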


Energies ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 1238
Author(s):  
Supanat Chamchuen ◽  
Apirat Siritaratiwat ◽  
Pradit Fuangfoo ◽  
Puripong Suthisopapan ◽  
Pirat Khunkitti

Power quality disturbance (PQD) is an important issue in electrical distribution systems that needs to be detected promptly and identified to prevent the degradation of system reliability. This work proposes a PQD classification using a novel algorithm, comprised of the artificial bee colony (ABC) and the particle swarm optimization (PSO) algorithms, called “adaptive ABC-PSO” as the feature selection algorithm. The proposed adaptive technique is applied to a combination of ABC and PSO algorithms, and then used as the feature selection algorithm. A discrete wavelet transform is used as the feature extraction method, and a probabilistic neural network is used as the classifier. We found that the highest classification accuracy (99.31%) could be achieved through nine optimally selected features out of all 72 extracted features. Moreover, the proposed PQD classification system demonstrated high performance in a noisy environment, as well as in the real distribution system. Compared with previous studies, the PQD classification accuracy obtained using adaptive ABC-PSO as the optimal feature selection algorithm is in the high range; therefore, the adaptive ABC-PSO algorithm can be used to classify the PQD in a practical electrical distribution system.
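
To make the feature extraction step concrete, the sketch below derives per-level energy features from a discrete wavelet transform. The abstract does not name the mother wavelet, the number of decomposition levels or the statistics behind the 72 features, so the Haar wavelet, the energy features and the function names here are assumptions chosen for simplicity.

```python
from math import sqrt


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def dwt_energy_features(signal, levels):
    """Detail-coefficient energy at each level, then the final approximation energy."""
    if levels < 1 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        features.append(sum(d * d for d in detail))
    features.append(sum(a * a for a in current))
    return features


# A flat toy signal with one spike, standing in for a sampled disturbance.
toy_signal = [1.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0]
feats = dwt_energy_features(toy_signal, levels=2)
```

Because the Haar transform used here is orthonormal, the features sum to the total energy of the input signal, which is a convenient sanity check. A feature selection algorithm would then choose a subset of such features before classification.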


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Gaia Griguolo ◽  
Maria Vittoria Dieci ◽  
Laia Paré ◽  
Federica Miglietta ◽  
Daniele Giulio Generali ◽  
...  

Abstract: Little is known regarding the interaction between immune microenvironment and tumor biology in hormone receptor (HR)+/HER2− breast cancer (BC). We here assess pretreatment gene-expression data from 66 HR+/HER2− early BCs from the LETLOB trial and show that non-luminal tumors (HER2-enriched, Basal-like) present higher tumor-infiltrating lymphocyte levels than luminal tumors. Moreover, significant differences in immune infiltrate composition, assessed by CIBERSORT, were observed: non-luminal tumors showed a more proinflammatory antitumor immune infiltrate composition than luminal ones.

