Development and Validation of Predictive Indices for a Continuous Outcome Using Gene Expression Profiles

2010 ◽  
Vol 9 ◽  
pp. CIN.S3805 ◽  
Author(s):  
Yingdong Zhao ◽  
Richard Simon

There have been relatively few publications using linear regression models to predict a continuous response based on microarray expression profiles. Standard linear regression methods are problematic when the number of predictor variables exceeds the number of cases. We have evaluated three linear regression algorithms that can be used for the prediction of a continuous response based on high dimensional gene expression data. The three algorithms are the least angle regression (LAR), the least absolute shrinkage and selection operator (LASSO), and the averaged linear regression method (ALM). All methods are tested using simulations based on a real gene expression dataset and analyses of two sets of real gene expression data and using an unbiased complete cross validation approach. Our results show that the LASSO algorithm often provides a model with somewhat lower prediction error than the LAR method, but both of them perform more efficiently than the ALM predictor. We have developed a plug-in for BRB-ArrayTools that implements the LAR and the LASSO algorithms with complete cross-validation.
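
The idea of complete cross-validation for a penalized regression can be sketched as follows. This is a minimal illustration, not the BRB-ArrayTools plug-in and not the authors' code: it covers only the LASSO (fitted here by plain coordinate descent, not by LAR), and the function names, the penalty grid and the fold counts are all assumptions chosen for the example. The point it shows is that the penalty is re-selected inside every training fold, so the held-out cases never influence model building.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t (the LASSO soft-thresholding step)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_fit(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam*||b||_1."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residuals with all coefficients at zero
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta

def predict(model, row):
    intercept, beta = model
    return intercept + sum(b * v for b, v in zip(beta, row))

def make_folds(n, k, rng):
    """Randomly partition indices 0..n-1 into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cv_error(X, y, lam, folds):
    """Mean squared prediction error at a fixed penalty over the given folds."""
    total = 0.0
    for test_idx in folds:
        held = set(test_idx)
        train = [i for i in range(len(y)) if i not in held]
        model = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
        total += sum((y[i] - predict(model, X[i])) ** 2 for i in test_idx)
    return total / len(y)

def complete_cv(X, y, lambdas, k_outer=5, k_inner=3, seed=0):
    """Outer CV in which the penalty is re-selected inside every training fold."""
    rng = random.Random(seed)
    total = 0.0
    for test_idx in make_folds(len(y), k_outer, rng):
        held = set(test_idx)
        train = [i for i in range(len(y)) if i not in held]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        inner = make_folds(len(ytr), k_inner, rng)
        best = min(lambdas, key=lambda lam: cv_error(Xtr, ytr, lam, inner))
        model = lasso_fit(Xtr, ytr, best)
        total += sum((y[i] - predict(model, X[i])) ** 2 for i in test_idx)
    return total / len(y)
```

Calling `complete_cv(X, y, [0.05, 0.2, 1.0])` on a matrix with more columns than rows returns an estimate of the mean squared prediction error. Choosing the penalty once on all cases and then cross-validating only the final fit would tend to bias that estimate downward.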


2017 ◽  
Vol 16 ◽  
pp. 117693511772851 ◽  
Author(s):  
Baishali Bandyopadhyay ◽  
Veda Chanda ◽  
Yupeng Wang

Background: Constructing gene co-expression networks from cancer expression data is important for investigating the genetic mechanisms underlying cancer. However, correlation coefficients or linear regression models are not able to model sophisticated relationships among gene expression profiles. Here, we address the 3-way interaction that 2 genes’ expression levels are clustered in different space locations under the control of a third gene’s expression levels. Results: We present xSyn, a software tool for identifying such 3-way interactions from cancer gene expression data based on an optimization procedure involving the usage of UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and synergy. The effectiveness is demonstrated by application to 2 real gene expression data sets. Conclusions: xSyn is a useful tool for decoding the complex relationships among gene expression profiles. xSyn is available at http://www.bdxconsult.com/xSyn.html .



Processes ◽  
2019 ◽  
Vol 7 (5) ◽  
pp. 301
Author(s):  
Muying Wang ◽  
Satoshi Fukuyama ◽  
Yoshihiro Kawaoka ◽  
Jason E. Shoemaker

Motivation: Immune cell dynamics is a critical factor in disease-associated pathology (immunopathology) that also impacts the levels of mRNAs in diseased tissue. Deconvolution algorithms attempt to infer cell quantities in a tissue/organ sample based on gene expression profiles and are often evaluated using artificial, non-complex samples. Their accuracy in estimating cell counts from temporal tissue gene expression data remains poorly characterized and has never been assessed in diseased lung. Further, how to remove the effects of cell migration on transcript counts to improve discovery of disease factors is an open question. Results: Four cell count inference (i.e., deconvolution) tools are evaluated using microarray data from influenza-infected lung sampled at several time points post-infection. The analysis finds that inferred cell quantities are accurate only for select cell types and there is a tendency for algorithms to have a good relative fit (R²) but a poor absolute fit (normalized mean squared error; NMSE), which suggests systemic biases exist. Nonetheless, using cell fraction estimates to adjust gene expression data, we show that genes associated with influenza virus replication and increased infection pathology are more likely to be identified as significant than when applying traditional statistical tests.
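
A small numerical sketch shows how a good relative fit can coexist with a poor absolute fit. The abstract does not give the exact definitions, so two assumptions are made here: R² is taken as the squared Pearson correlation, and NMSE as the mean squared error divided by the variance of the true values. The data are invented for illustration.

```python
def r_squared(truth, pred):
    """Squared Pearson correlation; blind to scale and offset errors."""
    n = len(truth)
    mt, mp = sum(truth) / n, sum(pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(truth, pred))
    vt = sum((t - mt) ** 2 for t in truth)
    vp = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (vt * vp)

def nmse(truth, pred):
    """Mean squared error divided by the variance of the true values (assumed normalization)."""
    n = len(truth)
    mt = sum(truth) / n
    mse = sum((t - p) ** 2 for t, p in zip(truth, pred)) / n
    variance = sum((t - mt) ** 2 for t in truth) / n
    return mse / variance

# Hypothetical cell counts: the estimate tracks the truth perfectly
# but is off by a constant factor and offset (a systematic bias).
truth = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [2.0 * t + 1.0 for t in truth]

print(r_squared(truth, pred))  # 1.0
print(nmse(truth, pred))       # 9.0
```

Under these definitions the biased estimate has a perfect R² while its squared error is nine times the variance of the true counts, which is the pattern the abstract describes.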



2015 ◽  
Vol 11 (1) ◽  
pp. 86-96 ◽  
Author(s):  
Aakash Chavan Ravindranath ◽  
Nolen Perualila-Tan ◽  
Adetayo Kasim ◽  
Georgios Drakakis ◽  
Sonia Liggi ◽  
...  

Integrating gene expression profiles with certain proteins can improve our understanding of the fundamental mechanisms in protein–ligand binding.



Author(s):  
Crescenzio Gallo

The possible applications of modeling and simulation in the field of bioinformatics are very extensive, ranging from understanding basic metabolic paths to exploring genetic variability. Experimental results carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expressions.



eLife ◽  
2017 ◽  
Vol 6 ◽  
Author(s):  
Julien Racle ◽  
Kaat de Jonge ◽  
Petra Baumgaertner ◽  
Daniel E Speiser ◽  
David Gfeller

Immune cells infiltrating tumors can have important impact on tumor progression and response to therapy. We present an efficient algorithm to simultaneously estimate the fraction of cancer and immune cell types from bulk tumor gene expression data. Our method integrates novel gene expression profiles from each major non-malignant cell type found in tumors, renormalization based on cell-type-specific mRNA content, and the ability to consider uncharacterized and possibly highly variable cell types. Feasibility is demonstrated by validation with flow cytometry, immunohistochemistry and single-cell RNA-Seq analyses of human melanoma and colorectal tumor specimens. Altogether, our work not only improves accuracy but also broadens the scope of absolute cell fraction predictions from tumor gene expression data, and provides a unique novel experimental benchmark for immunogenomics analyses in cancer research (http://epic.gfellerlab.org).
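
One simple reading of "renormalization based on cell-type-specific mRNA content" is sketched below. It is not taken from the EPIC source code: the function name, the cell-type labels and the mRNA-per-cell values are hypothetical. It assumes the deconvolution step yields the proportion of mRNA contributed by each cell type, and converts those proportions to cell fractions by dividing by each type's relative mRNA content and rescaling to sum to one.

```python
def mrna_to_cell_fractions(mrna_props, mrna_per_cell):
    """Convert mRNA proportions to cell fractions.

    mrna_props: share of total mRNA attributed to each cell type.
    mrna_per_cell: relative mRNA content of one cell of each type
                   (hypothetical values in this example).
    """
    raw = {cell: mrna_props[cell] / mrna_per_cell[cell] for cell in mrna_props}
    total = sum(raw.values())
    return {cell: value / total for cell, value in raw.items()}

# Two cell types contribute equal mRNA, but the second type carries
# twice as much mRNA per cell, so it accounts for fewer cells.
fractions = mrna_to_cell_fractions(
    {"T cell": 0.5, "cancer cell": 0.5},
    {"T cell": 1.0, "cancer cell": 2.0},
)
print(fractions)
```

Without this correction, cell types with high mRNA content per cell would be over-counted relative to types with low content.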



2015 ◽  
Vol 13 (06) ◽  
pp. 1550019 ◽  
Author(s):  
Alexei A. Sharov ◽  
David Schlessinger ◽  
Minoru S. H. Ko

We have developed ExAtlas, an on-line software tool for meta-analysis and visualization of gene expression data. In contrast to existing software tools, ExAtlas compares multi-component data sets and generates results for all combinations (e.g. all gene expression profiles versus all Gene Ontology annotations). ExAtlas handles both users’ own data and data extracted semi-automatically from the public repository (GEO/NCBI database). ExAtlas provides a variety of tools for meta-analyses: (1) standard meta-analysis (fixed effects, random effects, z-score, and Fisher’s methods); (2) analyses of global correlations between gene expression data sets; (3) gene set enrichment; (4) gene set overlap; (5) gene association by expression profile; (6) gene specificity; and (7) statistical analysis (ANOVA, pairwise comparison, and PCA). ExAtlas produces graphical outputs, including heatmaps, scatter-plots, bar-charts, and three-dimensional images. Some of the most widely used public data sets (e.g. GNF/BioGPS, Gene Ontology, KEGG, GAD phenotypes, BrainScan, ENCODE ChIP-seq, and protein–protein interaction) are pre-loaded and can be used for functional annotations.
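
Two of the standard meta-analysis methods named above can be sketched in a few lines. This is a textbook illustration under stated assumptions, not ExAtlas code: the function names are invented, the fixed-effects estimate uses inverse-variance weights, and Fisher's method uses the closed-form chi-square tail that holds for even degrees of freedom.

```python
import math

def fixed_effect(estimates, std_errors):
    """Inverse-variance weighted (fixed-effects) pooled estimate and its standard error."""
    weights = [1.0 / (se * se) for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1.0 / sum(weights))

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k degrees of freedom."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # half of the chi-square statistic
    # For 2k degrees of freedom the upper tail is exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Two hypothetical studies reporting the same log-ratio with equal precision.
print(fixed_effect([1.0, 3.0], [1.0, 1.0]))
# Two independent p-values of 0.05 combine to roughly 0.0175.
print(fisher_combined_p([0.05, 0.05]))
```

The fixed-effects estimate assumes every data set measures the same underlying effect. Fisher's method needs only independent p-values, so it can be applied when effect sizes are not comparable across data sets.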



Cells ◽  
2019 ◽  
Vol 8 (7) ◽  
pp. 675 ◽  
Author(s):  
Xia ◽  
Liu ◽  
Zhang ◽  
Guo

High-throughput technologies generate a tremendous amount of expression data on mRNA, miRNA and protein levels. Mining and visualizing this large amount of expression data requires sophisticated computational skills. An easy-to-use web server for the visualization of gene expression profiles could greatly facilitate data exploration and hypothesis generation for biologists. Here, we curated and normalized the gene expression data on mRNA, miRNA and protein levels in 23315, 9009 and 9244 samples, respectively, from 40 tissues (The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx)) and 1594 cell lines (Cancer Cell Line Encyclopedia (CCLE) and MD Anderson Cell Lines Project (MCLP)). Then, we constructed the Gene Expression Display Server (GEDS), a web-based tool for quantification, comparison and visualization of gene expression data. GEDS integrates multiscale expression data and provides multiple types of figures and tables to satisfy several kinds of user requirements. The comprehensive expression profiles plotted in the one-stop GEDS platform greatly facilitate experimental biologists utilizing big data for better experimental design and analysis. GEDS is freely available on http://bioinfo.life.hust.edu.cn/web/GEDS/.



Blood ◽  
2013 ◽  
Vol 122 (21) ◽  
pp. 2779-2779 ◽  
Author(s):  
Andrea Pellagatti ◽  
Moritz Gerstung ◽  
Elli Papaemmanuil ◽  
Luca Malcovati ◽  
Aristoteles Giagounidis ◽  
...  

A particular profile of gene expression can reflect an underlying molecular abnormality in malignancy. Distinct gene expression profiles and deregulated gene pathways can be driven by specific gene mutations and may shed light on the biology of the disease and lead to the identification of new therapeutic targets. We selected 143 cases from our large-scale gene expression profiling (GEP) dataset on bone marrow CD34+ cells from patients with myelodysplastic syndromes (MDS), for which matching genotyping data were obtained using next-generation sequencing of a comprehensive list of 111 genes involved in myeloid malignancies (including the spliceosomal genes SF3B1, SRSF2, U2AF1 and ZRSR2, as well as TET2, ASXL1 and many others). The GEP data were then correlated with the mutational status to identify significantly differentially expressed genes associated with each of the most common gene mutations found in MDS. The expression levels of the mutated genes analyzed were generally lower in patients carrying a mutation than in patients wild-type for that gene (e.g. SF3B1, ASXL1 and TP53), with the exception of RUNX1 for which patients carrying a mutation showed higher expression levels than patients without mutation. Principal components analysis showed that the main directions of gene expression changes (principal components) tend to coincide with some of the common gene mutations, including SF3B1, SRSF2 and TP53. SF3B1 and STAG2 were the mutated genes showing the highest number of associated significantly differentially expressed genes, including ABCB7 as differentially expressed in association with SF3B1 mutation and SULT2A1 in association with STAG2 mutation.
We found distinct differentially expressed genes associated with the four most common splicing gene mutations (SF3B1, SRSF2, U2AF1 and ZRSR2) in MDS, suggesting that different phenotypes associated with these mutations may be driven by different effects on gene expression and that the target gene may be different. We have also evaluated the prognostic impact of the GEP data in comparison with that of the genotype data and importantly we have found a larger contribution of gene expression data in predicting progression free survival compared to mutation-based multivariate survival models. In summary, this analysis correlating gene expression data with genotype data has revealed that the mutational status shapes the gene expression landscape. We have identified deregulated genes associated with the most common gene mutations in MDS and found that the prognostic power of gene expression data is greater than the prognostic power provided by mutation data. AP and MG contributed equally to this work. JB and PJC are co-senior authors. Disclosures: No relevant conflicts of interest to declare.



2009 ◽  
Vol 45 (2-3) ◽  
pp. 163-171 ◽  
Author(s):  
Marco Muselli ◽  
Massimiliano Costacurta ◽  
Francesca Ruffino


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Bárbara Andrade Barbosa ◽  
Saskia D. van Asten ◽  
Ji Won Oh ◽  
Arantza Farina-Sarasqueta ◽  
Joanne Verheij ◽  
...  

Deconvolution of bulk gene expression profiles into the cellular components is pivotal to portraying a tissue’s complex cellular make-up, such as the tumor microenvironment. However, the inherently variable nature of gene expression requires a comprehensive statistical model and reliable prior knowledge of individual cell types that can be obtained from single-cell RNA sequencing. We introduce BLADE (Bayesian Log-normAl Deconvolution), a unified Bayesian framework to estimate both cellular composition and gene expression profiles for each cell type. Unlike previous comprehensive statistical approaches, BLADE can handle > 20 types of cells owing to efficient variational inference. In an intensive evaluation with > 700 simulated and real datasets, BLADE demonstrated enhanced robustness against gene expression variability and better completeness than conventional methods, in particular in reconstructing gene expression profiles of each cell type. In summary, BLADE is a powerful tool to unravel heterogeneous cellular activity in complex biological systems from standard bulk gene expression data.


