FACTOR ANALYSIS FOR CROSS-PLATFORM TUMOR CLASSIFICATION BASED ON GENE EXPRESSION PROFILES

2010 ◽  
Vol 19 (01) ◽  
pp. 243-258 ◽  
Author(s):  
SHU-LIN WANG ◽  
JIE GUI ◽  
XUELING LI

Previous studies of tumor classification based on feature extraction from gene expression profiles (GEP) have proven effective, but some of these methods lack biomedical interpretability. To address this problem, we proposed a novel feature extraction method whose results are biomedically interpretable and help provide insight into the structure of gene expression datasets. The method first applies the rank sum test to roughly select a set of informative genes and then uses factor analysis to extract latent factors for tumor classification. Experiments on three pairs of cross-platform tumor datasets indicated that the proposed method clearly improves cross-platform classification performance, and that only a few latent factors, which can represent a large number of informative genes, achieve very high predictive accuracy on the test set. The results also suggested that a classification model trained on one dataset can successfully predict another dataset of the same tumor subtype obtained on a different experimental platform.
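The first stage of this method, screening genes with a rank sum test, can be illustrated with a minimal sketch. It assumes two classes labeled 0 and 1, uses the normal approximation to the Wilcoxon rank sum statistic without a tie correction to the variance, and keeps a fixed number of genes with the smallest p-values; the function names and the top_k cutoff are hypothetical, and the subsequent factor analysis step is not shown.

```python
import math
from typing import Dict, List, Sequence


def rank_with_ties(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_p_value(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank sum p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = rank_with_ties(list(group_a) + list(group_b))
    w = sum(ranks[:n1])  # rank sum of group A
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def select_informative_genes(expr: Dict[str, List[float]], labels: List[int], top_k: int) -> List[str]:
    """Keep the top_k genes with the smallest rank sum p-values (hypothetical helper)."""
    scored = []
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scored.append((rank_sum_p_value(a, b), gene))
    scored.sort()
    return [gene for _, gene in scored[:top_k]]
```

For example, with expr = {"gene_x": [1, 2, 3, 10, 11, 12], "gene_y": [5, 1, 6, 2, 7, 3]} and labels = [0, 0, 0, 1, 1, 1], gene_x (which separates the classes perfectly) is ranked ahead of gene_y. The selected genes would then be passed to factor analysis.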

2020 ◽  
Author(s):  
Rui Zhang ◽  
Chen Chen ◽  
Qi Li ◽  
Jialu Fu ◽  
Dong Zhang ◽  
...  

Abstract Background: Immune-related genes (IRGs) play a crucial role in the initiation and progression of cholangiocarcinoma (CCA). However, immune signatures have rarely been used to predict the prognosis of CCA. The aim of this study was to construct a novel model that predicts survival in CCA based on IRG expression data. Methods: The gene expression profiles and clinical data of CCA patients from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) database were integrated to establish and validate prognostic IRG signatures. Differentially expressed immune-related genes were screened. Univariate and multivariate Cox regression analyses were performed to identify prognostic IRGs, and a risk model predicting outcomes was constructed. Receiver operating characteristic (ROC) and Kaplan-Meier curves were then plotted to examine the predictive accuracy of the model, and a nomogram was constructed based on the IRG signature combined with other clinical characteristics. Finally, CIBERSORT was used to analyze the association between immune cell infiltration and the risk score. Results: We identified 223 IRGs that were significantly dysregulated in patients with CCA, among which five IRGs (AVPR1B, CST4, TDGF1, RAET1E and IL9R) were identified as robust indicators of overall survival (OS), and a prognostic model was built based on this IRG signature. High-risk patients had worse OS in both the training and validation cohorts, with areas under the ROC curve of 0.898 and 0.846, respectively. The nomogram showed that the immune risk score contributed many more points than the other clinicopathological variables, with a C-index of 0.819 (95% CI, 0.727-0.911).
Finally, we found that the IRG signature was positively correlated with the proportions of CD8+ T cells, neutrophils and gamma delta T cells, and negatively correlated with that of resting memory CD4+ T cells. Conclusions: We established and validated an effective five-IRG prediction model for CCA, which could accurately classify patients into groups at low and high risk of poor prognosis.
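The risk-model and evaluation steps in this abstract can be sketched under common conventions that the abstract does not spell out: a patient's risk score is assumed to be the linear Cox predictor (sum of coefficient times expression over the signature genes), patients are assumed to be split at the median score, and Harrell's C-index measures how often the higher-risk patient of a comparable pair fails first. The gene names and coefficients used below are hypothetical placeholders, not the study's fitted values.

```python
from statistics import median
from typing import Dict, List, Sequence


def risk_score(expression: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Linear risk score: sum of (Cox coefficient * expression) over signature genes."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def median_risk_groups(scores: Sequence[float]) -> List[str]:
    """Label patients above the median score 'high' and the rest 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def harrell_c_index(times: Sequence[float], events: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of comparable pairs in which the higher-risk patient fails first.

    A pair (i, j) is comparable when patient i had an observed event and patient j
    was still under observation after i's event time. Ties in risk score count as
    half-concordant; pairs tied in time are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical two-gene signature, for illustration only.
coefficients = {"gene_1": 0.8, "gene_2": -0.5}
patients = [{"gene_1": 2.0, "gene_2": 1.0}, {"gene_1": 0.5, "gene_2": 2.0}]
scores = [risk_score(p, coefficients) for p in patients]
```

A C-index of 1.0 means every comparable pair is ordered correctly by the risk score, and 0.5 corresponds to random ordering; the reported value of 0.819 lies between these.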


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kota Fujisawa ◽  
Mamoru Shimo ◽  
Y.-H. Taguchi ◽  
Shinya Ikematsu ◽  
Ryota Miyata

Abstract Coronavirus disease 2019 (COVID-19) is raging worldwide. This potentially fatal infectious disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, the complete mechanism of COVID-19 is not well understood. Therefore, we analyzed gene expression profiles of COVID-19 patients to identify disease-related genes through an innovative machine learning method that enables a data-driven strategy for gene selection from a data set with a small number of samples and many candidates. Principal-component-analysis-based unsupervised feature extraction (PCAUFE) was applied to the RNA expression profiles of 16 COVID-19 patients and 18 healthy control subjects. From 60,683 candidate probes, the analysis identified 123 genes, including immune-related genes, as critical for COVID-19 progression. The 123 genes were enriched in binding sites for the transcription factors NFKB1 and RELA, which are involved in various biological phenomena such as immune response and cell survival: the primary mediator of canonical nuclear factor-kappa B (NF-κB) activity is the heterodimer RelA-p50. The genes were also enriched in the histone modification H3K36me3, and they largely overlapped the target genes of NFKB1 and RELA. We found that the overlapping genes were downregulated in COVID-19 patients. These results suggest that canonical NF-κB activity was suppressed by H3K36me3 in COVID-19 patient blood.
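As we understand the general PCAUFE recipe, PCA assigns each gene a score on a principal component, genes are assumed to follow a Gaussian distribution on that component, and genes whose scores are outliers (small chi-squared p-values after Benjamini-Hochberg correction) are selected. The minimal sketch below follows that outline under simplifying assumptions: it always uses PC1 (found by power iteration), whereas in practice one would pick the component whose sample loadings separate patients from controls, and the 0.01 threshold and all function names are illustrative rather than taken from the paper.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # rows are genes, columns are samples


def center_columns(x: Matrix) -> Matrix:
    """Subtract each sample's (column's) mean over all genes."""
    n_genes = len(x)
    means = [sum(row[j] for row in x) / n_genes for j in range(len(x[0]))]
    return [[value - means[j] for j, value in enumerate(row)] for row in x]


def leading_sample_loading(x: Matrix, iterations: int = 500) -> List[float]:
    """Power iteration for the top eigenvector of X^T X (one weight per sample)."""
    n_samples = len(x[0])
    gram = [[sum(row[a] * row[b] for row in x) for b in range(n_samples)]
            for a in range(n_samples)]
    v = [1.0 + 0.1 * j for j in range(n_samples)]  # non-uniform start vector
    for _ in range(iterations):
        w = [sum(gram[a][b] * v[b] for b in range(n_samples)) for a in range(n_samples)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pca_unsupervised_selection(x: Matrix, threshold: float = 0.01) -> List[int]:
    """Return indices of genes whose PC1 scores are outliers under a Gaussian null."""
    xc = center_columns(x)
    v = leading_sample_loading(xc)
    scores = [sum(a * b for a, b in zip(row, v)) for row in xc]  # gene scores on PC1
    # Column centering makes the scores sum to zero, so the RMS serves as the null SD.
    sd = math.sqrt(sum(s * s for s in scores) / len(scores))
    # Two-sided Gaussian tail, i.e. the chi-squared (1 df) tail of (score / sd)^2.
    p_values = [math.erfc(abs(s / sd) / math.sqrt(2)) for s in scores]
    adjusted = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(adjusted) if q < threshold]
```

Because the selection is driven only by the outlying gene scores, no class labels enter the procedure, which is what makes it usable when there are few samples and many candidate probes.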


2019 ◽  
Author(s):  
Necla Koçhan ◽  
Gözde Yazgı Tütüncü ◽  
Göknur Giner

Abstract Background and Objective: Recent developments in next-generation sequencing (NGS) based on RNA sequencing (RNA-Seq) allow researchers to measure the expression levels of thousands of genes across multiple samples simultaneously. To analyze such data sets, many classification models have been proposed in the literature. Most existing classifiers assume that genes are independent; however, this is not a realistic assumption for real RNA-Seq classification problems. For this reason, other classification methods that incorporate the dependence structure between genes into the model have been proposed. qtQDA, proposed by Koçhan et al. [1], is one such classifier; it estimates the covariance matrix with the maximum likelihood estimator. Methods: In this study, we use another approach, based on the local dependence function, to estimate the covariance matrix used in the qtQDA classification model. We investigate the impact of different covariance estimates on RNA-Seq data classification. Results: The performance of the qtQDA classifier with two different covariance matrix estimates is compared on two real RNA-Seq data sets in terms of classification error rate. The results show that the local dependence function approach yields a better estimate of the covariance matrix and improves the performance of the qtQDA classifier. Conclusion: Incorporating an accurate covariance matrix into the classification model is a crucial step, particularly for cancer prediction. The local covariance matrix estimate allows researchers to classify cancer patients more accurately based on gene expression profiles. R code for the local dependence function is available at https://github.com/Necla/LocalDependence.
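To show where the covariance estimate enters the classifier, here is a minimal sketch of a quadratic discriminant analysis (QDA) step that uses the maximum likelihood covariance, the estimate the local dependence approach would replace. It assumes the expression data have already been mapped to approximately Gaussian values (as we understand it, qtQDA applies a quantile transformation to the counts first); that transformation and the local dependence function itself are not reproduced here, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mle_mean_cov(rows: Sequence[Vector]) -> Tuple[Vector, Matrix]:
    """Sample mean and maximum likelihood covariance (divides by n, not n - 1)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[k] for r in rows) / n for k in range(d)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / n
            for b in range(d)] for a in range(d)]
    return mean, cov


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def qda_score(x: Vector, mean: Vector, cov: Matrix, prior: float) -> float:
    """Gaussian log discriminant: -0.5 log|S| - 0.5 (x-m)^T S^-1 (x-m) + log prior."""
    low = cholesky(cov)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    y: Vector = []
    for i in range(len(diff)):  # forward substitution solves L y = x - m
        y.append((diff[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    mahalanobis = sum(v * v for v in y)  # equals (x-m)^T S^-1 (x-m)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(low)))
    return -0.5 * log_det - 0.5 * mahalanobis + math.log(prior)


def fit_qda(xs: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Tuple[Vector, Matrix, float]]:
    """Per-class mean, MLE covariance, and class prior."""
    params = {}
    for c in sorted(set(labels)):
        rows = [x for x, lab in zip(xs, labels) if lab == c]
        mean, cov = mle_mean_cov(rows)
        params[c] = (mean, cov, len(rows) / len(xs))
    return params


def predict_qda(x: Vector, params: Dict[str, Tuple[Vector, Matrix, float]]) -> str:
    """Assign x to the class with the largest discriminant score."""
    return max(params, key=lambda c: qda_score(x, *params[c]))
```

Swapping in a different covariance estimator only changes what mle_mean_cov returns for each class; the discriminant in qda_score stays the same, which is why the quality of the covariance estimate directly affects classification error.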


2015 ◽  
Vol 148 (2) ◽  
pp. 460-472 ◽  
Author(s):  
A. Francina Webster ◽  
Paul Zumbo ◽  
Jennifer Fostel ◽  
Jorge Gandara ◽  
Susan D. Hester ◽  
...  
