Biclustering Gene Expression Data Using Genetic Simulated Annealing Algorithm

DNA microarray technology produces gene expression matrices that inevitably contain missing entries, often because of poor experimental procedures. Predicting these missing values is essential, since most algorithms for analysing gene expression data require a complete matrix. To address this issue, the present study proposes a biclustering-based Genetic Simulated Annealing (Genetic SA) algorithm to predict the missing entries in gene expression data. Biclustering is used because it is considered essential for clustering gene expression data. The performance evaluation shows that the proposed Genetic SA predicts the missing entries more accurately than existing methods.

Author(s): Kohbalan Moorthy, Aws Naser Jaber, Mohd Arfian Ismail, Ferda Ernawan, Mohd Saberi Mohamad, ...

2019
Author(s): Pei-Yau Lung, Xiaodong Pang, Yan Li, Jinfeng Zhang

Reusability is part of the FAIR data principles, which aim to make data Findable, Accessible, Interoperable, and Reusable. One current effort to increase the reusability of public genomics data focuses on including quality metadata with the data. When necessary metadata are missing, most researchers will consider the data useless. In this study, we develop a framework to predict the missing metadata of gene expression datasets to maximize their reusability. We propose a new metric called Proportion of Cases Accurately Predicted (PCAP), which is optimized in our specifically designed machine learning pipeline. The new approach performed better than pipelines using common metrics such as F1-score in terms of maximizing the reusability of data with missing values. We also found that different variables might need to be predicted using different machine learning methods and/or different data processing protocols. Using differential gene expression analysis as an example, we show that when missing variables are accurately predicted, the corresponding gene expression data can be reliably used in downstream analyses.


2009, Vol 6 (2), pp. 165-190
Author(s): Mou'ath Hourani, Emary El

Gene expression data often contain missing expression values. Because many algorithms for gene expression data analysis, including clustering, require a complete matrix of gene array values, choosing the most effective missing value estimation method is necessary. In this paper, the most commonly used imputation methods from the literature are critically reviewed and analyzed to explain the proper use and weaknesses of each published method and to record observations on each. From this analysis, we conclude that the Local Least Squares (LLS) and Support Vector Regression (SVR) algorithms achieved the best performance. SVR can be considered a complementary algorithm to LLS, especially when applied to noisy data. However, both algorithms have drawbacks: the value of the Number of Selected Genes (K) and an appropriate kernel function must be chosen. To overcome these drawbacks, a new method is needed that chooses these parameters automatically while keeping computational complexity acceptable.
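
To make the role of K concrete, the following is a minimal sketch in the spirit of local-least-squares imputation, not the implementation evaluated in the reviewed papers. The function name `lls_impute`, the use of Euclidean distance to pick neighbours, and the restriction to neighbour genes with no missing values are all assumptions made for illustration; it also assumes every gene has at least one observed value.

```python
import numpy as np

def lls_impute(X, k):
    """Fill NaNs in X (genes x samples) using the k most similar complete genes.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    # Candidate neighbours: genes (rows) with no missing values (assumption).
    complete = X[~np.isnan(X).any(axis=1)]
    if len(complete) < k:
        raise ValueError("need at least k genes without missing values")
    for i in np.flatnonzero(np.isnan(X).any(axis=1)):
        miss = np.isnan(X[i])
        obs = ~miss
        # Rank complete genes by Euclidean distance over the observed columns.
        dist = np.linalg.norm(complete[:, obs] - X[i, obs], axis=1)
        nbrs = complete[np.argsort(dist)[:k]]
        # Least squares: express the target's observed values as a linear
        # combination of the neighbours' values in the same columns.
        coef, *_ = np.linalg.lstsq(nbrs[:, obs].T, X[i, obs], rcond=None)
        # Apply the same combination to the neighbours' values in the
        # columns where the target gene is missing.
        filled[i, miss] = nbrs[:, miss].T @ coef
    return filled

# Toy matrix: the third gene is the sum of the first two, last value missing.
X = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 1.0, 0.0, 1.0],
    [3.0, 3.0, 3.0, np.nan],
])
estimate = lls_impute(X, k=2)[2, 3]
```

In this sketch K is passed in by hand as `k`, which is exactly the parameter the review identifies as hard to choose; a different `k` changes which genes enter the regression and therefore the estimate.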

