Generation of patterns from gene expression data by assigning confidence to differentially expressed genes

2000 ◽  
Vol 16 (8) ◽  
pp. 685-698 ◽  
Author(s):  
E. Manduchi ◽  
G. R. Grant ◽  
S. E. McKenzie ◽  
G. C. Overton ◽  
S. Surrey ◽  
...  
Blood ◽  
2013 ◽  
Vol 122 (21) ◽  
pp. 2779-2779 ◽  
Author(s):  
Andrea Pellagatti ◽  
Moritz Gerstung ◽  
Elli Papaemmanuil ◽  
Luca Malcovati ◽  
Aristoteles Giagounidis ◽  
...  

Abstract: A particular profile of gene expression can reflect an underlying molecular abnormality in malignancy. Distinct gene expression profiles and deregulated gene pathways can be driven by specific gene mutations and may shed light on the biology of the disease and lead to the identification of new therapeutic targets. We selected 143 cases from our large-scale gene expression profiling (GEP) dataset on bone marrow CD34+ cells from patients with myelodysplastic syndromes (MDS), for which matching genotyping data were obtained using next-generation sequencing of a comprehensive list of 111 genes involved in myeloid malignancies (including the spliceosomal genes SF3B1, SRSF2, U2AF1 and ZRSR2, as well as TET2, ASXL1 and many others). The GEP data were then correlated with the mutational status to identify significantly differentially expressed genes associated with each of the most common gene mutations found in MDS. The expression levels of the mutated genes analyzed were generally lower in patients carrying a mutation than in patients wild-type for that gene (e.g., SF3B1, ASXL1 and TP53), with the exception of RUNX1, for which patients carrying a mutation showed higher expression levels than patients without a mutation. Principal components analysis showed that the main directions of gene expression changes (principal components) tend to coincide with some of the common gene mutations, including SF3B1, SRSF2 and TP53. SF3B1 and STAG2 were the mutated genes with the highest number of associated significantly differentially expressed genes, including ABCB7, differentially expressed in association with SF3B1 mutation, and SULT2A1, differentially expressed in association with STAG2 mutation.
We found distinct differentially expressed genes associated with the four most common splicing gene mutations (SF3B1, SRSF2, U2AF1 and ZRSR2) in MDS, suggesting that different phenotypes associated with these mutations may be driven by different effects on gene expression and that the target genes may differ. We have also evaluated the prognostic impact of the GEP data in comparison with that of the genotype data and, importantly, found a larger contribution of gene expression data to predicting progression-free survival compared with mutation-based multivariate survival models. In summary, this analysis correlating gene expression data with genotype data has revealed that the mutational status shapes the gene expression landscape. We have identified deregulated genes associated with the most common gene mutations in MDS and found that the prognostic power of gene expression data is greater than that of mutation data. AP and MG contributed equally to this work. JB and PJC are co-senior authors. Disclosures: No relevant conflicts of interest to declare.
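The abstract does not say which statistical test was used to associate expression with mutational status. As a minimal sketch of the general idea only, the example below ranks genes by Welch's t-statistic between mutated and wild-type samples. The choice of test, the function names (welch_t, rank_genes) and the placeholder gene names and values are all assumptions made for illustration, not the authors' method or data.

```python
import math
from statistics import mean, variance


def welch_t(mutated, wild_type):
    """Welch's t-statistic for mean expression: mutated minus wild-type."""
    # Standard error without assuming equal variances in the two groups.
    se = math.sqrt(variance(mutated) / len(mutated)
                   + variance(wild_type) / len(wild_type))
    return (mean(mutated) - mean(wild_type)) / se


def rank_genes(expression, is_mutated):
    """Rank genes by |t|, largest first.

    expression: dict mapping a gene name to its per-sample expression values.
    is_mutated: per-sample booleans for ONE mutation of interest.
    """
    scores = {}
    for gene, values in expression.items():
        mut = [v for v, m in zip(values, is_mutated) if m]
        wt = [v for v, m in zip(values, is_mutated) if not m]
        scores[gene] = welch_t(mut, wt)
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: six samples, the first three carry the mutation.
expression = {
    "GENE_A": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # lower in mutated samples
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no clear difference
}
is_mutated = [True, True, True, False, False, False]
ranked = rank_genes(expression, is_mutated)
```

A negative statistic corresponds to lower expression in mutation carriers, the direction the abstract reports for most mutated genes. A real analysis of thousands of genes would also need multiple-testing correction, which this sketch omits.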


2020 ◽  
Vol 15 (4) ◽  
pp. 359-367
Author(s):  
Yong-Jing Hao ◽  
Mi-Xiao Hou ◽  
Ying-Lian Gao ◽  
Jin-Xing Liu ◽  
Xiang-Zhen Kong

Background: Non-negative Matrix Factorization (NMF) has been extensively applied to gene expression data. However, most NMF-based methods have single-layer structures, which may perform poorly on complex data. Deep learning, with its carefully designed hierarchical structure, has shown significant advantages in learning data features. Objective: To discover differentially expressed genes in gene expression data and to obtain better sample clustering results, which can provide a reference for the prevention and treatment of cancer. Method: In this paper, we apply a deep NMF method called Deep Semi-NMF to integrated gene expression data. In each layer, the coefficient matrix is directly decomposed into the basis matrix and coefficient matrix of the next layer. We apply this factorization model to The Cancer Genome Atlas (TCGA) genomic data. Results: The experimental results demonstrate the superiority of the Deep Semi-NMF method in identifying differentially expressed genes and clustering samples. Conclusion: The Deep Semi-NMF model decomposes a matrix into multiple factor matrices whose product approximates the original matrix. It can also improve the clustering performance of samples while uncovering more accurate key genes for disease treatment.
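The layer-wise scheme described under Method can be written compactly. The notation here is assumed for illustration and does not come from the abstract: X is the data matrix, Z_i and H_i are the basis and coefficient matrices of layer i, and m is the number of layers. As in semi-NMF generally, only the coefficient matrices are constrained to be non-negative.

```latex
\begin{aligned}
X &\approx Z_1 H_1, \\
H_1 &\approx Z_2 H_2, \\
&\;\;\vdots \\
H_{m-1} &\approx Z_m H_m, \\
\text{so that}\quad X &\approx Z_1 Z_2 \cdots Z_m H_m,
\qquad H_i \ge 0 \ \text{for } i = 1, \dots, m.
\end{aligned}
```

Each H_i gives a representation of the samples at a different level of the hierarchy, which is why the last layer's coefficient matrix is a natural input for sample clustering.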


2015 ◽  
Vol 14 (1) ◽  
pp. 2146-2155 ◽  
Author(s):  
L.F. Ning ◽  
Y.Q. Yu ◽  
E.T. GuoJi ◽  
C.G. Kou ◽  
Y.H. Wu ◽  
...  

2003 ◽  
Vol 12 (2) ◽  
pp. 159-162 ◽  
Author(s):  
Chiara Romualdi ◽  
Stefania Bortoluzzi ◽  
Fabio d’Alessi ◽  
Gian Antonio Danieli

Here we present a novel web tool for the statistical analysis of gene expression data in multiple tag sampling experiments. Differentially expressed genes are detected by using six different test statistics. Result tables, linked to the GenBank, UniGene, or LocusLink database, can be browsed or searched in different ways. Software is freely available at the site: http://telethon.bio.unipd.it/bioinfo/IDEG6_form/ , together with additional information on statistical methodologies.


2019 ◽  
Vol 20 (S22) ◽  
Author(s):  
Chun-Mei Feng ◽  
Yong Xu ◽  
Mi-Xiao Hou ◽  
Ling-Yun Dai ◽  
Jun-Liang Shang

Abstract. Background: In recent years, identification of differentially expressed genes and sample clustering have become hot topics in bioinformatics. Principal Component Analysis (PCA) is widely used on gene expression data. However, it has two limitations. First, the geometric structure hidden in the data, e.g., the pairwise distance between data points, has not been explored, although this information can facilitate sample clustering. Second, the Principal Components (PCs) determined by PCA are dense, which makes them hard to interpret, even though only a few genes are related to the cancer. Identifying a handful of differentially expressed genes and finding new cancer biomarkers is of great significance for the early diagnosis and treatment of cancer. Results: In this study, a new method, gLSPCA, is proposed to integrate both a graph Laplacian and a sparsity constraint into PCA. On the one hand, gLSPCA improves clustering accuracy by exploring the internal geometric structure of the data; on the other hand, it identifies differentially expressed genes by imposing a sparsity constraint on the PCs. Conclusions: Experiments with gLSPCA and comparisons with existing methods, including Z-SPCA, GPower, PathSPCA, SPCArt and gLPCA, are performed on real datasets of both pancreatic cancer (PAAD) and head & neck squamous carcinoma (HNSC). The results demonstrate that gLSPCA is effective in identifying differentially expressed genes and in sample clustering. In addition, the applications of gLSPCA to these datasets provide several new clues for the exploration of causative factors of PAAD and HNSC.
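The abstract names two ingredients, a graph Laplacian built from the geometry of the samples and a sparsity constraint on the PCs, but not how either is constructed. The sketch below shows one generic version of each: an unnormalized Laplacian of a symmetric k-nearest-neighbour graph, and L1-style soft-thresholding of a loading vector. Both choices, and the names knn_graph_laplacian and soft_threshold, are assumptions for illustration; this is not the gLSPCA algorithm itself.

```python
import math


def knn_graph_laplacian(samples, k=2):
    """Unnormalized Laplacian L = D - W of a symmetric k-NN graph."""
    n = len(samples)
    # Pairwise Euclidean distances between samples.
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The k nearest neighbours of sample i, excluding itself.
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: dist[i][j])[:k]
        for j in neighbours:
            W[i][j] = W[j][i] = 1.0  # connect if either is a neighbour
    degree = [sum(row) for row in W]
    return [[(degree[i] if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def soft_threshold(loadings, lam):
    """Shrink every loading toward zero by lam; small ones become zero."""
    return [math.copysign(max(abs(v) - lam, 0.0), v) for v in loadings]


# Hypothetical toy data: two well-separated pairs of samples.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
L = knn_graph_laplacian(samples, k=1)

# Hypothetical dense loading vector over four genes.
sparse_pc = soft_threshold([0.9, -0.05, 0.2, -0.6], lam=0.1)
```

In the toy example the Laplacian is block-diagonal, reflecting the two sample groups, and the second loading is shrunk to zero, so the corresponding gene drops out of the component. This is the sense in which sparsity makes the PCs easier to interpret.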

