Evaluation of Genotype-Based Gene Expression Model Performance: A Cross-Framework and Cross-Dataset Study

Genes ◽  
2021 ◽  
Vol 12 (10) ◽  
pp. 1531
Author(s):  
Vânia Tavares ◽  
Joana Monteiro ◽  
Evangelos Vassos ◽  
Jonathan Coleman ◽  
Diana Prata

Predicting gene expression from genotype data is valuable for studying inaccessible tissues such as the brain. Herein we present eGenScore, a polygenic/poly-variation method, and compare it with PrediXcan, a method based on regularized linear regression using elastic nets. Although both methods aim to predict gene expression from genotype, they differ in important methodological respects. We compared how well expression quantitative trait loci (eQTL) models predict gene expression in the frontal cortex across frameworks (eGenScore vs. PrediXcan) and training datasets (BrainEAC, which is brain-specific, vs. GTEx, which has data across multiple tissues). In addition to internal five-fold cross-validation, we externally validated the gene expression models using the CommonMind Consortium database. Our results showed that (1) PrediXcan outperforms eGenScore regardless of the training database used; and (2) when using PrediXcan, the performance of the eQTL models in frontal cortex is higher when trained with GTEx than with BrainEAC.
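
The internal five-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not the eGenScore or PrediXcan implementation: the model below is a generic weighted sum of allele dosages with per-variant least-squares weights, chosen only to keep the example dependency-free. The data are synthetic, and all names (`k_fold_indices`, `marginal_slope`, `pearson_r2`, `cross_validated_r2`) are hypothetical.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def marginal_slope(x, y):
    """Least-squares slope of expression y on one variant's dosages x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0  # monomorphic variant in this training split
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def pearson_r2(a, b):
    """Squared Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab * sab / (saa * sbb)


def cross_validated_r2(dosages, expression, k=5, seed=0):
    """Fit weights on k-1 folds, score the held-out fold, repeat k times."""
    n_variants = len(dosages[0])
    scores = []
    for test in k_fold_indices(len(expression), k, seed):
        held_out = set(test)
        train = [i for i in range(len(expression)) if i not in held_out]
        y_train = [expression[i] for i in train]
        # Assumption: one marginal weight per variant (no regularization).
        weights = [
            marginal_slope([dosages[i][j] for i in train], y_train)
            for j in range(n_variants)
        ]
        predicted = [sum(w * d for w, d in zip(weights, dosages[i])) for i in test]
        observed = [expression[i] for i in test]
        scores.append(pearson_r2(predicted, observed))
    return scores


# Synthetic example: 200 samples, 10 variants, 3 of them with a true effect.
rng = random.Random(42)
n_samples, n_variants = 200, 10
dosages = [[rng.choice([0, 1, 2]) for _ in range(n_variants)] for _ in range(n_samples)]
true_effects = [0.8, -0.5, 0.3] + [0.0] * (n_variants - 3)
expression = [
    sum(e * d for e, d in zip(true_effects, row)) + rng.gauss(0, 1)
    for row in dosages
]

fold_r2 = cross_validated_r2(dosages, expression)
print("per-fold R^2:", [round(r, 3) for r in fold_r2])
print("mean R^2:", round(mean(fold_r2), 3))
```

An elastic-net model of the kind the abstract attributes to PrediXcan would replace the marginal weights with jointly fitted, regularized coefficients; the fold-splitting and held-out scoring steps would stay the same.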

BIOMAT 2011 ◽  
2012 ◽  
pp. 153-177
Author(s):  
N. A. BARBOSA ◽  
H DÍAZ ◽  
A. RAMIREZ

2020 ◽  
Vol 106 (5) ◽  
pp. 1132-1133
Author(s):  
D. Adkins ◽  
J. Ley ◽  
N. LaFranzo ◽  
J. Hiken ◽  
I. Schillebeeckx ◽  
...  

2019 ◽  
Vol 9 (10) ◽  
Author(s):  
Marco Bolis ◽  
Mineko Terao ◽  
Linda Pattini ◽  
Enrico Garattini ◽  
Maddalena Fratelli

2014 ◽  
Vol 11 (2) ◽  
pp. 1-14 ◽  
Author(s):  
Markus List ◽  
Anne-Christin Hauschild ◽  
Qihua Tan ◽  
Torben A. Kruse ◽  
Jan Baumbach ◽  
...  

Summary: Selecting the most promising treatment strategy for breast cancer crucially depends on determining the correct subtype. In recent years, gene expression profiling has been investigated as an alternative to histochemical methods. Since databases like TCGA provide easy and unrestricted access to gene expression data for hundreds of patients, the challenge is to extract a minimal optimal set of genes with good prognostic properties from a large bulk of genes that each make a moderate contribution to classification. Several studies have successfully applied machine learning algorithms to solve this so-called gene selection problem. However, more diverse data from other OMICS technologies are available, including methylation. We hypothesize that combining methylation and gene expression data could lead to a largely improved classification model, since the resulting model will reflect differences not only on the transcriptomic level but also on the epigenetic level. We compared random forest-derived classification models based on gene expression data alone and on methylation data alone with a model based on the combined features and with a model based on the gold standard PAM50. We obtained bootstrap errors of 10-20% and classification errors of 1-50%, depending on breast cancer subtype and model. The gene expression model was clearly superior to the methylation model, which was also reflected in the combined model, which mainly selected features from gene expression data. However, the methylation model was able to identify unique features not considered relevant by the gene expression model, which might provide deeper insights into breast cancer subtype differentiation on an epigenetic level.

