scholarly journals Biclustering Analysis Using Plaid Model on Gene Expression Data of Colon Cancer

2021 ◽  
Vol 50 (5) ◽  
pp. 101-114
Author(s):  
Titin Siswantining ◽  
Achmad Eriza Aminanto ◽  
Devvi Sarwinda ◽  
Olivia Swasti

Unlike other typical clustering analysis, which considers column only, biclustering analysis processes a matrix into sub-matrices based on rows and columns simultaneously. One method of bicluster analysis uses the probabilistic model, like the plaid model, that provides overlapping bicluster. The plaid model calculates the value of an element given from a particular sub-matrix for each cell; thus, the value can be seen as the number of contributions of a particular bicluster. The algorithm begins with preparing the input data as a matrix, then an initial model is assessed and makes a residual matrix from the model. After that, we determine bicluster candidates, which are evaluated for its effect parameters and bicluster membership parameters. Finally, the bicluster candidate is pruned to give the optimal bicluster. We implemented the algorithm on gene expression dataset of colon cancer, where the rows and columns contain observations and types of genes, respectively. We carried out in six distinct scenarios in which each scenario uses different model parameters and threshold values. We measured the results using Jaccard index and coherence variance. Our experiments show that biclustering analysis on a model with mean, row, and column effects of colon cancer data output low coherence variance.

Symmetry ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 154 ◽  
Author(s):  
Ho Sun Shon ◽  
Erdenebileg Batbaatar ◽  
Kyoung Ok Kim ◽  
Eun Jong Cha ◽  
Kyung-Ah Kim

Recently, large-scale bioinformatics and genomic data have been generated using advanced biotechnology methods, thus increasing the importance of analyzing such data. Numerous data mining methods have been developed to process genomic data in the field of bioinformatics. We extracted significant genes for the prognosis prediction of 1157 patients using gene expression data from patients with kidney cancer. We then proposed an end-to-end, cost-sensitive hybrid deep learning (COST-HDL) approach with a cost-sensitive loss function for classification tasks on imbalanced kidney cancer data. Here, we combined the deep symmetric auto encoder; the decoder is symmetric to the encoder in terms of layer structure, with reconstruction loss for non-linear feature extraction and neural network with balanced classification loss for prognosis prediction to address data imbalance problems. Combined clinical data from patients with kidney cancer and gene data were used to determine the optimal classification model and estimate classification accuracy by sample type, primary diagnosis, tumor stage, and vital status as risk factors representing the state of patients. Experimental results showed that the COST-HDL approach was more efficient with gene expression data for kidney cancer prognosis than other conventional machine learning and data mining techniques. These results could be applied to extract features from gene biomarkers for prognosis prediction of kidney cancer and prevention and early diagnosis.


2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Anna Pačínková ◽  
Vlad Popovici

The dysfunction of the DNA mismatch repair system results in microsatellite instability (MSI). MSI plays a central role in the development of multiple human cancers. In colon cancer, despite being associated with resistance to 5-fluorouracil treatment, MSI is a favourable prognostic marker. In gastric and endometrial cancers, its prognostic value is not so well established. Nevertheless, recognising the MSI tumours may be important for predicting the therapeutic effect of immune checkpoint inhibitors. Several gene expression signatures were trained on microarray data sets to understand the regulatory mechanisms underlying microsatellite instability in colorectal cancer. A wealth of expression data already exists in the form of microarray data sets. However, the RNA-seq has become a routine for transcriptome analysis. A new MSI gene expression signature presented here is the first to be valid across two different platforms, microarrays and RNA-seq. In the case of colon cancer, its estimated performance was (i) AUC = 0.94, 95% CI = (0.90 – 0.97) on RNA-seq and (ii) AUC = 0.95, 95% CI = (0.92 – 0.97) on microarray. The 25-gene expression signature was also validated in two independent microarray colon cancer data sets. Despite being derived from colorectal cancer, the signature maintained good performance on RNA-seq and microarray gastric cancer data sets (AUC = 0.90, 95% CI = (0.85 – 0.94) and AUC = 0.83, 95% CI = (0.69 – 0.97), respectively). Furthermore, this classifier retained high concordance even when classifying RNA-seq endometrial cancers (AUC = 0.71, 95% CI = (0.62 – 0.81). These results indicate that the new signature was able to remove the platform-specific differences while preserving the underlying biological differences between MSI/MSS phenotypes in colon cancer samples.


2003 ◽  
Vol 19 (9) ◽  
pp. 1079-1089 ◽  
Author(s):  
G. Getz ◽  
H. Gal ◽  
I. Kela ◽  
D. A. Notterman ◽  
E. Domany

Genes ◽  
2018 ◽  
Vol 9 (8) ◽  
pp. 397 ◽  
Author(s):  
Wen-Hui Wang ◽  
Ting-Yan Xie ◽  
Guang-Lei Xie ◽  
Zhong-Lu Ren ◽  
Jin-Ming Li

Identifying molecular subtypes of colorectal cancer (CRC) may allow for more rational, patient-specific treatment. Various studies have identified molecular subtypes for CRC using gene expression data, but they are inconsistent and further research is necessary. From a methodological point of view, a progressive approach is needed to identify molecular subtypes in human colon cancer using gene expression data. We propose an approach to identify the molecular subtypes of colon cancer that integrates denoising by the Bayesian robust principal component analysis (BRPCA) algorithm, hierarchical clustering by the directed bubble hierarchical tree (DBHT) algorithm, and feature gene selection by an improved differential evolution based feature selection method (DEFSW) algorithm. In this approach, the normal samples being completely and exclusively clustered into one class is considered to be the standard of reasonable clustering subtypes, and the feature selection pays attention to imbalances of samples among subtypes. With this approach, we identified the molecular subtypes of colon cancer on the mRNA gene expression dataset of 153 colon cancer samples and 19 normal control samples of the Cancer Genome Atlas (TCGA) project. The colon cancer was clustered into 7 subtypes with 44 feature genes. Our approach could identify finer subtypes of colon cancer with fewer feature genes than the other two recent studies and exhibits a generic methodology that might be applied to identify the subtypes of other cancers.


2009 ◽  
Vol 27 (5) ◽  
pp. 911-914 ◽  
Author(s):  
Cungui CHENG ◽  
Wei XIONG ◽  
Yumei TIAN

2018 ◽  
Vol 19 (4) ◽  
pp. 444-465
Author(s):  
William Chad Young ◽  
Ka Yee Yeung ◽  
Adrian E Raftery

Gene regulatory network reconstruction is an essential task of genomics in order to further our understanding of how genes interact dynamically with each other. The most readily available data, however, are from steady-state observations. These data are not as informative about the relational dynamics between genes as knockout or over-expression experiments, which attempt to control the expression of individual genes. We develop a new framework for network inference using samples from the equilibrium distribution of a vector autoregressive (VAR) time-series model which can be applied to steady-state gene expression data. We explore the theoretical aspects of our method and apply the method to synthetic gene expression data generated using GeneNetWeaver.


Sign in / Sign up

Export Citation Format

Share Document