L1 Least Square for Cancer Diagnosis using Gene Expression Data

Author(s):  
Xiyi Hang
2009 ◽  
Vol 6 (2) ◽  
pp. 165-190 ◽  
Author(s):  
Mou'ath Hourani ◽  
Emary El

Gene expression data often contain missing expression values. Because effective clustering analysis, like many other algorithms for gene expression data analysis, requires a complete matrix of gene array values, choosing the most effective missing value estimation method is necessary. In this paper, the most commonly used imputation methods from the literature are critically reviewed and analyzed to explain the proper use and weaknesses of each published method and to record observations on it. From this analysis, we conclude that the Local Least Square (LLS) and Support Vector Regression (SVR) algorithms achieve the best performance. SVR can be considered a complement to LLS, especially when applied to noisy data. However, both algorithms suffer from deficiencies in choosing the number of selected genes (K) and the appropriate kernel function. To overcome these drawbacks, a new method is needed that chooses these parameters automatically and has an acceptable computational complexity.
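To make the role of the parameter K concrete, the following is a minimal sketch of local least squares imputation for a single gene. It is an illustration under stated assumptions, not the procedure of any specific method reviewed in the paper: the function name `lls_impute_one`, the use of Euclidean distance to rank neighbours, and the restriction to neighbours with no missing values are all choices made here for brevity.

```python
import numpy as np


def lls_impute_one(data, gene, k=5):
    """Fill the NaN entries of row `gene` using its k nearest complete rows.

    `data` is a genes x samples matrix with NaN marking missing values.
    (Hypothetical helper: the name and signature are assumptions.)
    """
    target = data[gene]
    missing = np.isnan(target)
    if not missing.any():
        return target.copy()
    observed = ~missing

    # Candidate neighbours: rows with no missing values (assumption for simplicity).
    complete = np.where(~np.isnan(data).any(axis=1))[0]

    # Rank candidates by Euclidean distance on the columns observed in the target.
    dists = np.linalg.norm(data[complete][:, observed] - target[observed], axis=1)
    nearest = complete[np.argsort(dists)[:k]]

    # Express the target's observed values as a linear combination of the
    # neighbours' values in the same columns (ordinary least squares).
    A = data[nearest][:, observed]   # shape (k, n_observed)
    B = data[nearest][:, missing]    # shape (k, n_missing)
    coeffs, *_ = np.linalg.lstsq(A.T, target[observed], rcond=None)

    # Apply the same combination to the neighbours' values in the missing columns.
    filled = target.copy()
    filled[missing] = B.T @ coeffs
    return filled


# Toy example: the third gene is 2 * gene0 + 1 * gene1, with its last value missing.
data = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 0.0, 1.0, 2.0],
    [4.0, 5.0, 6.0, 9.0, np.nan],
])
print(lls_impute_one(data, gene=2, k=2))
```

The estimate depends directly on `k`: too few neighbours give an unstable fit, while too many bring in dissimilar genes, which is why the abstract singles out the choice of K as a weakness.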


2009 ◽  
Vol 2009 ◽  
pp. 1-6 ◽  
Author(s):  
Xiyi Hang ◽  
Fang-Xiang Wu

Personalized drug design requires classifying cancer patients as accurately as possible. With advances in genome sequencing and microarray technology, a large amount of gene expression data has been, and will continue to be, produced from patients with various cancers. Such cancer-alerted gene expression data allows us to classify tumors at the genome-wide level. However, cancer-alerted gene expression datasets typically have far more genes (features) than samples (patients), which poses a challenge for tumor classification. In this paper, a new method is proposed for cancer diagnosis using gene expression data by casting the classification problem as finding sparse representations of test samples with respect to training samples. The sparse representation is computed by the l1-regularized least square method. To investigate its performance, the proposed method is applied to six tumor gene expression datasets and compared with various support vector machine (SVM) methods. The experimental results show that the performance of the proposed method is comparable with or better than that of SVMs. In addition, the proposed method is more efficient than SVMs because it needs no model selection.
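The idea of classifying by sparse representation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which solver or decision rule is used, so the iterative soft-thresholding (ISTA) solver, the class-wise residual rule, the regularization weight `lam`, and the function names are all assumptions made here.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def l1_least_squares(A, y, lam=0.01, n_iter=500):
    """Approximately minimise 0.5 * ||A x - y||_2^2 + lam * ||x||_1 with ISTA."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x


def src_predict(train, labels, test_sample, lam=0.01):
    """Assign `test_sample` to the class whose training samples reconstruct it best.

    `train` is samples x genes; `labels` holds one class label per training sample.
    (Hypothetical helper: the residual-based rule is an assumption.)
    """
    labels = np.asarray(labels)
    # Each column of A is one unit-norm training sample.
    A = train.T / np.linalg.norm(train, axis=1)
    y = test_sample / np.linalg.norm(test_sample)
    x = l1_least_squares(A, y, lam)

    residuals = {}
    for c in np.unique(labels):
        x_c = np.where(labels == c, x, 0.0)  # keep only the coefficients of class c
        residuals[c] = np.linalg.norm(y - A @ x_c)
    return min(residuals, key=residuals.get)


# Toy data: two classes that occupy different "genes".
train = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.9, 0.1],
])
labels = [0, 0, 1, 1]
print(src_predict(train, labels, np.array([0.0, 0.0, 1.0, 0.05])))
```

In this sketch nothing is fitted ahead of time: each test sample is coded directly against the training matrix, so apart from the single weight `lam` there is no kernel or margin parameter to select.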


2004 ◽  
Vol 11 (2-3) ◽  
pp. 227-242 ◽  
Author(s):  
Balaji Krishnapuram ◽  
Lawrence Carin ◽  
Alexander J. Hartemink

2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Damiano Verda ◽  
Stefano Parodi ◽  
Enrico Ferrari ◽  
Marco Muselli

Background: Logic Learning Machine (LLM) is an innovative method of supervised analysis capable of constructing models based on simple and intelligible rules. In this investigation the performance of LLM in classifying patients with cancer was evaluated using a set of eight publicly available gene expression databases for cancer diagnosis. LLM accuracy was assessed by summary ROC curve (sROC) analysis and estimated by the area under an sROC curve (sAUC). Its performance was compared in cross validation with that of standard supervised methods, namely: decision tree, artificial neural network, support vector machine (SVM) and k-nearest neighbor classifier. Results: LLM showed an excellent accuracy (sAUC = 0.99, 95%CI: 0.98–1.0) and outperformed any other method except SVM. Conclusions: LLM is a new powerful tool for the analysis of gene expression data for cancer diagnosis. Simple rules generated by LLM could contribute to a better understanding of cancer biology, potentially addressing therapeutic approaches.


2020 ◽  
Author(s):  
Nageswara Rao Eluri

Gene selection is considered a fundamental process in bioinformatics, because cancer classification accuracy depends heavily on the selected genes, which give biological relevance to the classification problem. Accurate classification of diverse tumor types is in great demand in cancer diagnosis. However, existing approaches to cancer classification are mostly clinically based, so their diagnostic capability is limited. Many of the significant problems of cancer diagnosis are now addressed using gene expression data, which has allowed researchers to introduce many ways of diagnosing cancer appropriately and effectively. This paper develops a cancer data classification model using gene expression data. Initially, five benchmark gene expression datasets, i.e., “Colon cancer, diffuse B-cell Lymphoma, Leukaemia, Wisconsin Diagnostic Breast Cancer and Wisconsin Breast Cancer Data”, are collected for the experiment. The proposed classification model involves three main phases: “(a) Feature extraction, (b) Optimal Feature Selection, and (c) Classification”. After data pre-processing, features are extracted from the collected gene expression data using first-order and second-order statistical measures. To reduce the length of the feature vectors, optimal feature selection is performed using a new meta-heuristic algorithm termed the Quantum Inspired Immune Clone Optimization Algorithm (QICO). Once the relevant features are selected, classification is performed by a deep learning model, the Recurrent Neural Network (RNN). Moreover, the number of hidden neurons of the RNN is optimized by the same QICO. Optimal feature selection and classification are performed to select the most suitable features and thus maximize classification accuracy. Finally, the experimental analysis reveals that QICO-based feature selection outperforms other heuristic-based feature selection and that the optimized RNN outperforms other machine learning algorithms.

