On integrating multi-experiment microarray data

With the extensive use of microarray technology as a potential prognostic and diagnostic tool, the comparison and reproducibility of results obtained from different platforms is of interest. Integrating such datasets can yield more informative results that span numerous datasets and microarray platforms. We developed a novel integration technique for microarray gene-expression data derived from different studies for the purpose of two-way Bayesian partition modelling, which estimates co-expression profiles within subsets of genes and between biological samples or experimental conditions. The suggested methodology transforms disparate gene-expression data onto a common probability scale to obtain inter-study-validated gene signatures. We evaluated the performance of our model using artificial data. Finally, we applied our model to six publicly available cancer gene-expression datasets and compared our results with well-known integrative microarray data methods. Our study shows that the suggested framework can relieve the limited sample size problem while reporting high accuracies by integrating multi-experiment data.


A Novel Deep Learning Method for Identification of Cancer Genes From Gene Expression Dataset

Machine Learning and Deep Learning in Real-Time Applications - Advances in Computer and Electrical Engineering ◽

10.4018/978-1-7998-3095-5.ch006 ◽

2020 ◽

pp. 129-144

Author(s):

Pyingkodi Maran ◽

Shanthi S. ◽

Thenmozhi K. ◽

Hemalatha D. ◽

Nanthini K.

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Research Area ◽

Microarray Gene Expression Data ◽

Biological Information ◽

Expression Data ◽

Cancer Genes ◽

Experimental Conditions ◽

Microarray Gene Expression ◽

Maximum Similarity

Computational biology is the research area that contributes to the analysis of biological information. Selecting the subset of cancer-related genes is one of the most promising clinical applications of gene expression data. Since a gene can take part in various biological pathways that in turn can be active only under specific experimental conditions, the stacked denoising auto-encoder (SDAE) and the genetic algorithm were combined to perform biclustering of cancer genes from high-dimensional microarray gene expression data. The Genetic-SDAE proved superior to recently proposed biclustering methods and better at determining the maximum similarity of a set of biclusters of gene expression data, with lower mean squared residue (MSR) and higher gene variance. This work also assesses the results with respect to the discovered genes and finds that the extracted biclusters are supported by biological evidence, such as enrichment of gene functions and biological processes.


A Novel Coherence Measure for Discovering Scaling Biclusters From Gene Expression Data

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720009004370 ◽

2009 ◽

Vol 07 (05) ◽

pp. 853-868 ◽

Cited By ~ 22

Author(s):

Anirban Mukhopadhyay ◽

Ujjwal Maulik ◽

Sanghamitra Bandyopadhyay

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Biological Significance ◽

Real Life ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Coherence Measure ◽

Experimental Conditions ◽

Microarray Gene Expression ◽

Input Dataset

Biclustering methods are used to identify a subset of genes that are co-regulated in a subset of experimental conditions in microarray gene expression data. Many biclustering algorithms rely on optimizing mean squared residue to discover biclusters from a gene expression dataset. Recently it has been proved that mean squared residue is only good at capturing constant and shifting biclusters; scaling biclusters cannot be detected using this metric. In this article, a new coherence measure called scaling mean squared residue (SMSR) is proposed. It has been proved theoretically that the proposed measure is able to detect scaling patterns effectively and that it is invariant to local or global scaling of the input dataset. The effectiveness of the proposed coherence measure in detecting scaling patterns has been demonstrated experimentally on artificial and real-life benchmark gene expression datasets. Moreover, biological significance tests have been conducted to show that the biclusters identified using the proposed measure are composed of functionally enriched sets of genes.
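The limitation of mean squared residue described in this abstract can be illustrated with a minimal sketch. It assumes the standard definition of mean squared residue (each entry minus its row mean and column mean, plus the overall mean, squared and averaged); the function name and the two toy biclusters are illustrative and do not come from the article. The proposed SMSR measure is not reproduced here.

```python
def mean_squared_residue(rows):
    """Mean squared residue of a bicluster given as a list of equal-length rows."""
    n_rows, n_cols = len(rows), len(rows[0])
    row_means = [sum(r) / n_cols for r in rows]
    col_means = [sum(rows[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = rows[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy data (hypothetical): every row is the base profile [1, 2, 4]
# with an added offset (shifting) or a multiplied factor (scaling).
shifting = [[1, 2, 4], [3, 4, 6], [6, 7, 9]]
scaling = [[1, 2, 4], [2, 4, 8], [3, 6, 12]]

print(mean_squared_residue(shifting))  # essentially zero
print(mean_squared_residue(scaling))   # clearly above zero
```

The shifting bicluster scores essentially zero, so a residue-minimizing search accepts it as coherent, while the perfectly coherent scaling bicluster is penalized. That gap is what a scaling-aware measure is meant to close.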


ArraySolver: An Algorithm for Colour-Coded Graphical Display and Wilcoxon Signed-Rank Statistics for Comparing Microarray Gene Expression Data

Comparative and Functional Genomics ◽

10.1002/cfg.369 ◽

2004 ◽

Vol 5 (1) ◽

pp. 39-47 ◽

Cited By ~ 5

Author(s):

Haseeb Ahmad Khan

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Microarray Data ◽

Microarray Gene Expression Data ◽

Rank Test ◽

Graphical Display ◽

Expression Data ◽

Microarray Gene Expression ◽

Signed Rank ◽

Signed Rank Test

The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-group comparison of microarray data is still lacking, and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for transferring data from one platform to another. Various statistical methods, including the t-test, analysis of variance, Pearson test and Mann–Whitney U test, have been reported for comparing microarray data, whereas the utilization of the Wilcoxon signed-rank test, which is an appropriate test for two-group comparison of gene expression data, has largely been neglected in microarray studies. The aim of this investigation was to build an integrated tool, ArraySolver, for colour-coded graphical display and comparison of gene expression data using the Wilcoxon signed-rank test. The results of software validation showed similar outputs with ArraySolver and SPSS for large datasets, whereas ArraySolver appeared to be more accurate for 25 or fewer pairs (n ≤ 25), suggesting its potential application in analysing molecular signatures that usually contain small numbers of genes. The main advantages of ArraySolver are easy data selection, convenient report format, accurate statistics and the familiar Excel platform.
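For readers unfamiliar with the test named in this abstract, the following is a minimal sketch of the Wilcoxon signed-rank statistic for paired measurements. It is a textbook illustration, not ArraySolver's implementation: it assumes zero differences are dropped and tied absolute differences receive average ranks, and the function name and sample values are hypothetical. It returns only the rank sums, not a p-value.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W+, W-): rank sums of the positive and negative paired differences."""
    # Drop pairs with no difference, then order the rest by absolute size.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical expression values for five genes under two conditions.
condition_a = [12, 15, 9, 20, 11]
condition_b = [10, 12, 11, 12, 11]
print(wilcoxon_signed_rank(condition_a, condition_b))  # (8.5, 1.5)
```

The smaller of the two rank sums is usually taken as the test statistic and compared against an exact table for small n or a normal approximation for larger n.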


A Hybrid Method of Feature Extraction for Tumor Classification Using Microarray Gene Expression Data

International Journal of Computer Science and Informatics ◽

10.47893/ijcsi.2011.1005 ◽

2011 ◽

pp. 23-27

Author(s):

Sitanshu Sekhar Sahu ◽

G. Panda ◽

Ramchandra Barik

Keyword(s):

Gene Expression ◽

Feature Extraction ◽

Gene Expression Data ◽

Microarray Data ◽

Microarray Gene Expression Data ◽

Small Sample ◽

Expression Data ◽

Microarray Gene Expression ◽

Score Statistics ◽

Microarray Gene

Classification of disease phenotypes using microarray gene expression data faces a critical challenge because the data are high-dimensional and the sample sizes are small. Hence there is a need to develop efficient dimension reduction techniques to improve class prediction performance. In this paper we present a hybrid feature extraction method to combat the dimensionality problem by combining F-score statistics with an autoregressive (AR) model. The F-score statistics preselect the discriminant genes from the raw microarray data, and this reduced set is then modeled by the AR method to extract the relevant information. A low-complexity radial basis function neural network (RBFNN) is also introduced to efficiently classify the microarray data. An exhaustive simulation study on six standard datasets shows the potential of the proposed method, with the advantage of reduced computational complexity.
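The gene preselection step can be sketched as follows. This assumes one commonly used two-class F-score definition (squared deviations of the class means from the overall mean, divided by the sum of the within-class sample variances); the abstract does not state which variant the authors use, and the function name and expression values are hypothetical. The AR modelling and RBFNN stages are not shown.

```python
def f_score(pos, neg):
    """Two-class F-score of one gene; larger values mean better class separation.

    Assumes each class has at least two samples and nonzero total variance.
    """
    all_values = pos + neg
    mean_all = sum(all_values) / len(all_values)
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    # Unbiased sample variances within each class.
    var_pos = sum((v - mean_pos) ** 2 for v in pos) / (len(pos) - 1)
    var_neg = sum((v - mean_neg) ** 2 for v in neg) / (len(neg) - 1)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    return between / (var_pos + var_neg)


# Hypothetical expression values: gene_a separates the classes, gene_b does not.
genes = {
    "gene_a": ([1, 2, 3], [7, 8, 9]),
    "gene_b": ([1, 5, 9], [2, 5, 8]),
}
ranked = sorted(genes, key=lambda name: f_score(*genes[name]), reverse=True)
print(ranked)  # ['gene_a', 'gene_b']
```

Keeping only the top-ranked genes shrinks the feature space before any model is fitted, which is the role the abstract assigns to the F-score stage.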


Hybrid Genetic Algorithm and Simulated Annealing for Clustering Microarray Gene Expression data

Journal of Physics Conference Series ◽

10.1088/1742-6596/1767/1/012034 ◽

2021 ◽

Vol 1767 (1) ◽

pp. 012034

Author(s):

M Pandi ◽

T Sivakumar ◽

N Senthil Madasamy ◽

N Sadhasivam

Keyword(s):

Gene Expression ◽

Genetic Algorithm ◽

Simulated Annealing ◽

Gene Expression Data ◽

Hybrid Genetic Algorithm ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Microarray Gene


A class imbalance-aware Relief algorithm for the classification of tumors using microarray gene expression data

Computational Biology and Chemistry ◽

10.1016/j.compbiolchem.2019.03.017 ◽

2019 ◽

Vol 80 ◽

pp. 121-127 ◽

Cited By ~ 3

Author(s):

Yuanyu He ◽

Junhai Zhou ◽

Yaping Lin ◽

Tuanfei Zhu

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Class Imbalance ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

Relief Algorithm ◽

Classification Of Tumors ◽

Microarray Gene


Cox Survival Analysis of Microarray Gene Expression Data Using Correlation Principal Component Regression

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1153 ◽

2007 ◽

Vol 6 (1) ◽

Cited By ~ 4

Author(s):

Qiang Zhao ◽

Jianguo Sun

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Principal Component Regression ◽

Predictive Ability ◽

Principal Component ◽

Microarray Gene Expression Data ◽

Expression Data ◽

Microarray Gene Expression ◽

New Approach ◽

Microarray Gene

Statistical analysis of microarray gene expression data has recently attracted a great deal of attention. One problem of interest is to relate genes to survival outcomes of patients, with the purpose of building regression models that predict future patients' survival from their gene expression data. For this, several authors have discussed the use of the proportional hazards or Cox model after reducing the dimension of the gene expression data. This paper presents a new approach to Cox survival analysis of microarray gene expression data that focuses on the models' predictive ability. The method modifies correlation principal component regression (Sun, 1995) to handle the censoring problem of survival data. Results based on simulated data and a set of publicly available data on diffuse large B-cell lymphoma show that the proposed method works well in terms of robustness and predictive ability in comparison with some existing partial least squares approaches. The new approach is also simpler and easier to implement.
