An Improved Fuzzy Based Missing Value Estimation in DNA Microarray Validated by Gene Ranking

Most gene expression data analysis algorithms require a complete gene expression matrix with no missing values, so methods that impute missing values accurately are needed. A number of imputation algorithms already exist. This work starts with a microarray dataset containing multiple missing values. We first apply a modified version of the existing fuzzy-theory-based method LRFDVImpute to impute multiple missing values in time-series gene expression data, and then validate the imputation with a genetic algorithm (GA) based gene ranking methodology alongside standard statistical validation techniques such as RMSE. To the best of our knowledge, gene ranking has not previously been used to validate missing value estimation. The proposed method was first tested on the widely used Spellman dataset, where the results show that error margins are drastically reduced compared with some previous works, which indirectly validates the statistical significance of the proposed method. It was then applied to four other two-class benchmark datasets, namely the Colorectal Cancer tumours dataset (GDS4382), the Breast Cancer dataset (GSE349-350), the Prostate Cancer dataset, and DLBCL-FL (Leukaemia), for both missing value estimation and gene ranking. The results show that the proposed method can reach 100% classification accuracy with very few dominant genes, which indirectly validates the biological significance of the proposed method.
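The RMSE validation mentioned in this abstract is usually computed by hiding entries whose true values are known, imputing them, and comparing the two sets. The following is a minimal sketch of that comparison only; the function names are hypothetical, and the normalized variant is a common convention in the imputation literature rather than something stated in the abstract.

```python
import math


def rmse(true_values, imputed_values):
    """Root mean squared error between held-out true values and their estimates."""
    if len(true_values) != len(imputed_values) or not true_values:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))


def normalized_rmse(true_values, imputed_values):
    """RMSE divided by the standard deviation of the true values.

    Assumption: this normalization is one common convention, not
    necessarily the one used by the authors.
    """
    mean = sum(true_values) / len(true_values)
    variance = sum((t - mean) ** 2 for t in true_values) / len(true_values)
    if variance == 0:
        raise ValueError("true values have zero variance")
    return rmse(true_values, imputed_values) / math.sqrt(variance)


# Hypothetical held-out entries and their imputed estimates.
held_out = [1.0, 2.0, 3.0, 4.0]
estimates = [1.5, 2.0, 2.5, 4.0]
error = rmse(held_out, estimates)
relative_error = normalized_rmse(held_out, estimates)
```

A lower value means the imputed entries are closer to the hidden true values; the normalized form makes errors comparable across datasets with different expression ranges.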


A novel interpolation based missing value estimation method to predict missing values in microarray gene expression data

2012 International Conference on Communications, Devices and Intelligent Systems (CODIS) ◽

10.1109/codis.2012.6422202 ◽

2012 ◽

Cited By ~ 3

Author(s):

Shilpi Bose ◽

Chandra Das ◽

Sourav Dutta ◽

Samiran Chattopadhyay

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Missing Values ◽

Estimation Method ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Value Estimation ◽

Missing Value Estimation ◽

Microarray Gene


Missing value estimation for DNA microarray gene expression data with principal curves

2010 International Conference on Bioinformatics and Biomedical Technology ◽

10.1109/icbbt.2010.5478964 ◽

2010 ◽

Author(s):

Jinlong Shi ◽

Zhigang Luo

Keyword(s):

Gene Expression ◽

DNA Microarray ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Principal Curves ◽

Microarray Gene Expression ◽

Value Estimation ◽

Missing Value Estimation ◽

Microarray Gene


A weighted Local Least Squares Imputation method for missing value estimation in microarray gene expression data

International Journal of Data Mining and Bioinformatics ◽

10.1504/ijdmb.2010.033524 ◽

2010 ◽

Vol 4 (3) ◽

pp. 331 ◽

Cited By ~ 11

Author(s):

Wai Ki Ching ◽

Min Li ◽

Nam Kiu Tsing ◽

Ching Wan Tai ◽

Tuen Wai Ng ◽

...

Keyword(s):

Gene Expression ◽

Least Squares ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Imputation Method ◽

Expression Data ◽

Microarray Gene Expression ◽

Value Estimation ◽

Missing Value Estimation ◽

Microarray Gene


Microarray missing values imputation methods: Critical analysis review

Computer Science and Information Systems ◽

10.2298/csis0902165h ◽

2009 ◽

Vol 6 (2) ◽

pp. 165-190 ◽

Cited By ~ 6

Author(s):

Mou'ath Hourani ◽

Emary El

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Missing Values ◽

Gene Array ◽

Estimation Method ◽

Least Square ◽

Support Vector ◽

Expression Data ◽

Imputation Methods ◽

Value Estimation

Gene expression data often contain missing expression values. Because effective clustering analysis, like many other algorithms for gene expression data analysis, requires a complete matrix of gene array values, choosing the most effective missing value estimation method is necessary. In this paper, the most commonly used imputation methods from the literature are critically reviewed and analyzed to explain the proper use and weaknesses of each published method and to record observations on it. From this analysis, we conclude that the Local Least Square (LLS) and Support Vector Regression (SVR) algorithms achieve the best performance. SVR can be considered a complementary algorithm to LLS, especially when applied to noisy data. However, both algorithms have deficiencies in choosing the number of selected genes (K) and the appropriate kernel function. To overcome these drawbacks, a new method is needed that chooses the function's parameters automatically and has an appropriate computational complexity.


Missing value estimation for DNA microarray gene expression data: local least squares imputation

Bioinformatics ◽

10.1093/bioinformatics/btk053 ◽

2006 ◽

Vol 22 (11) ◽

pp. 1410-1411 ◽

Cited By ~ 13

Author(s):

H. Kim ◽

G. H. Golub ◽

H. Park

Keyword(s):

Gene Expression ◽

Least Squares ◽

DNA Microarray ◽

Gene Expression Data ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Value Estimation ◽

Missing Value Estimation ◽

Microarray Gene


Iterated Local Least Squares Microarray Missing Value Imputation

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720006002302 ◽

2006 ◽

Vol 04 (05) ◽

pp. 935-957 ◽

Cited By ~ 51

Author(s):

Zhipeng Cai ◽

Maysam Heydari ◽

Guohui Lin

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Least Squares ◽

Gene Expression Data ◽

Missing Values ◽

Target Genes ◽

Accurate Estimation ◽

Expression Data ◽

Microarray Gene Expression ◽

Missing Value

Microarray gene expression data often contain multiple missing values for various reasons. However, most gene expression data analysis algorithms require complete expression data, so accurate estimation of the missing values is critical to further data analysis. In this paper, an Iterated Local Least Squares Imputation (ILLSimpute) method is proposed for estimating missing values. ILLSimpute has two distinguishing features. First, it does not fix a common number of coherent genes for the target genes, but defines coherent genes as those within a distance threshold of the target gene. Second, the values estimated in one iteration are used for missing value estimation in the next iteration, and the method terminates after a set number of iterations or once the imputed values converge. Experimental results on six real microarray datasets showed that ILLSimpute performed at least as well as, and most of the time much better than, the five most recent imputation methods.
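The two features described in this abstract (threshold-based selection of coherent genes, and reuse of the previous iteration's estimates) can be illustrated with a short sketch. This is not the ILLSimpute implementation: for brevity the local least-squares regression step is replaced by a plain average over the coherent genes, and the function name, the row-mean initial fill, the Euclidean distance, and the stopping tolerance are all assumptions made for illustration.

```python
import math


def iterated_threshold_impute(matrix, threshold, max_iter=10, tol=1e-6):
    """Fill None entries of a genes x samples matrix (list of lists).

    Simplification: each estimate is the mean over coherent genes,
    standing in for the least-squares regression of the actual method.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    missing = [(i, j) for i in range(n_rows) for j in range(n_cols)
               if matrix[i][j] is None]

    # Initial fill: mean of the observed values in the same row.
    filled = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        row_mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([row_mean if v is None else v for v in row])

    for _ in range(max_iter):
        updated = [row[:] for row in filled]
        max_change = 0.0
        for i, j in missing:
            # Coherent genes: rows within the distance threshold of the
            # target gene, measured on every column except the missing one.
            # Distances use the previous iteration's filled matrix.
            neighbour_values = []
            for k in range(n_rows):
                if k == i:
                    continue
                dist = math.sqrt(sum((filled[i][c] - filled[k][c]) ** 2
                                     for c in range(n_cols) if c != j))
                if dist <= threshold:
                    neighbour_values.append(filled[k][j])
            if neighbour_values:
                updated[i][j] = sum(neighbour_values) / len(neighbour_values)
            max_change = max(max_change, abs(updated[i][j] - filled[i][j]))
        filled = updated
        if max_change < tol:  # imputed values have converged
            break
    return filled


# Hypothetical toy matrix: gene 0 is missing its third measurement.
expression = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [9.0, 9.0, 9.0],
]
completed = iterated_threshold_impute(expression, threshold=0.5)
```

In the toy matrix the last gene lies far outside the threshold, so only the two nearby genes contribute to the estimate. The number of contributing genes therefore varies per target gene instead of being fixed in advance.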


Missing value imputation for gene expression data by tailored nearest neighbors

Statistical Applications in Genetics and Molecular Biology ◽

10.1515/sagmb-2015-0098 ◽

2017 ◽

Vol 16 (2) ◽

Cited By ~ 5

Author(s):

Shahla Faisal ◽

Gerhard Tutz

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Missing Values ◽

Human Cancer ◽

High Dimensional Data ◽

Nearest Neighbors ◽

High Dimensional ◽

Expression Data ◽

Missing Value ◽

Missing Value Imputation

Abstract: High dimensional data such as gene expression and RNA-sequence data often contain missing values. Subsequent analysis and results based on these incomplete data can suffer strongly from the presence of the missing values. Several approaches to the imputation of missing values in gene expression data have been developed, but the task is difficult because of the high dimensionality (number of genes) of the data. Here an imputation procedure is proposed that uses weighted nearest neighbors. Instead of using nearest neighbors defined by a distance that includes all genes, the distance is computed only for genes that are apt to contribute to the accuracy of the imputed values. The method aims to avoid the curse of dimensionality, which typically occurs when local methods such as nearest neighbors are applied in high dimensional settings. The proposed weighted nearest neighbors algorithm is compared to existing missing value imputation techniques such as mean imputation, KNNimpute and the recently proposed imputation by random forests. We use RNA-sequence and microarray data from studies on human cancer to compare the performance of the methods. The results from simulations as well as real studies show that the weighted distance procedure can successfully handle missing values in high dimensional data structures where the number of predictors is larger than the number of samples. The method typically outperforms the considered competitors.
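The idea of computing neighbor distances only over genes likely to help the estimate can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: selecting genes by absolute Pearson correlation with the target gene, the Gaussian kernel weights, and the names `weighted_nn_impute`, `n_genes`, and `bandwidth` are all hypothetical choices made here.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def weighted_nn_impute(data, row, col, n_genes=2, bandwidth=1.0):
    """Estimate data[row][col] in a samples x genes matrix (None = missing)."""
    # Donor samples: every other sample with an observed target gene.
    donors = [r for r in range(len(data))
              if r != row and data[r][col] is not None]

    # Score each candidate gene by |correlation| with the target gene
    # (assumed selection rule), using donors where both are observed.
    scores = []
    for g in range(len(data[0])):
        if g == col or data[row][g] is None:
            continue
        pairs = [(data[r][g], data[r][col]) for r in donors
                 if data[r][g] is not None]
        if len(pairs) < 3:
            continue
        xs, ys = zip(*pairs)
        scores.append((abs(pearson(xs, ys)), g))
    selected = [g for _, g in sorted(scores, reverse=True)[:n_genes]]

    # Kernel-weighted average of donor values; distances use only the
    # selected genes, not the full gene set.
    numerator = denominator = 0.0
    for r in donors:
        shared = [g for g in selected if data[r][g] is not None]
        if not shared:
            continue
        dist = math.sqrt(sum((data[row][g] - data[r][g]) ** 2
                             for g in shared) / len(shared))
        weight = math.exp(-0.5 * (dist / bandwidth) ** 2)
        numerator += weight * data[r][col]
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no usable donor samples")
    return numerator / denominator


# Hypothetical toy data: gene 0 tracks the target gene 2, gene 1 does not.
samples = [
    [1.0, 5.0, None],
    [1.0, 0.0, 10.0],
    [1.1, 9.0, 11.0],
    [5.0, 5.0, 50.0],
    [5.2, 4.0, 52.0],
]
estimate = weighted_nn_impute(samples, row=0, col=2, n_genes=1, bandwidth=0.5)
```

Because only the informative gene enters the distance, the two samples that resemble the target on that gene dominate the weighted average, while the uninformative gene cannot distort the neighborhood.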
