bcGST - an interactive bias-correction method to identify over-represented gene-sets in boutique arrays

2017 ◽  
Author(s):  
Kevin YX Wang ◽  
Alexander M Menzies ◽  
Ines P Silva ◽  
James S Wilmott ◽  
Yibing Yan ◽  
...  

Abstract Motivation: Gene annotation and pathway databases such as Gene Ontology and Kyoto Encyclopedia of Genes and Genomes are important tools in Gene Set Test (GST) that describe gene biological functions and associated pathways. GST aims to establish an association between a gene set of interest and an annotation. Importantly, GST tests for over-representation of genes in an annotation term. One implicit assumption of GST is that the gene expression platform captures the complete genome or a very large proportion of it. However, this assumption is satisfied neither for the increasingly popular boutique array nor for the custom-designed gene expression profiling platform. Specifically, conventional GST is no longer appropriate due to the gene set selection bias induced during the construction of these platforms. Results: We propose bcGST, a bias-corrected Gene Set Test that introduces bias correction terms in the contingency table used to calculate Fisher’s Exact Test (FET). The adjustment method works by estimating the proportion of genes captured on the array with respect to the genome, in order to assist filtration of annotation terms that would otherwise be falsely included or excluded. We illustrate the practicality of bcGST and its stability through multiple differential gene expression analyses in melanoma and TCGA cancer studies. Availability: The bcGST method is made available as a Shiny web application at http://shiny.maths.usyd.edu.au/bcGST/ Contact: [email protected]
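The conventional over-representation test that the abstract takes as its starting point can be sketched as a one-sided Fisher's Exact Test, which is the upper tail of a hypergeometric distribution. The sketch below is not the bcGST correction, whose terms the abstract does not spell out; it only illustrates, under assumed counts, how strongly the p-value depends on whether the background is the array or the whole genome. The function names and all numbers are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric: N genes in the background,
    K of them annotated to the term, n genes selected."""
    upper = min(K, n)
    total = comb(N, n)
    # comb(a, b) returns 0 when b > a, so impossible tables contribute nothing.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def over_representation_p(hits, selected, term_size, background):
    """One-sided Fisher's Exact Test p-value for over-representation.

    hits       -- selected genes that carry the annotation
    selected   -- total selected (e.g. differentially expressed) genes
    term_size  -- annotated genes in the background
    background -- total genes in the background
    """
    return hypergeom_upper_tail(hits, term_size, selected, background)


# Hypothetical counts: 40 selected genes, 8 of which carry the annotation term.
# Background = boutique array (500 genes, 50 annotated to the term).
p_array = over_representation_p(hits=8, selected=40, term_size=50, background=500)
# Same counts, but wrongly treating a 20,000-gene genome as the background.
p_genome = over_representation_p(hits=8, selected=40, term_size=50, background=20000)

print(f"array background:  p = {p_array:.3g}")
print(f"genome background: p = {p_genome:.3g}")
```

With these assumed counts the genome-wide background makes the term look far more enriched than the array background does, which is the kind of discrepancy a bias correction has to account for.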


2020 ◽  
Vol 8 (1) ◽  
pp. 102 ◽  
Author(s):  
Tangcheng Li ◽  
Liying Yu ◽  
Bo Song ◽  
Yue Song ◽  
Ling Li ◽  
...  

Cataloging an accurate functional gene set for the Symbiodiniaceae species is crucial for addressing biological questions of dinoflagellate symbiosis with corals and other invertebrates. To improve the gene models of Fugacium kawagutii, we conducted high-throughput chromosome conformation capture (Hi-C) for the genome and Illumina combined with PacBio sequencing for the transcriptome to achieve a new genome assembly and gene prediction. A 0.937-Gbp assembly of F. kawagutii was obtained, with an N50 > 13 Mbp and the longest scaffold of 121 Mbp capped with telomere motif at both ends. Gene annotation produced 45,192 protein-coding genes, among which 11,984 are new compared to previous versions of the genome. The newly identified genes are mainly enriched in 38 KEGG pathways including N-Glycan biosynthesis, mRNA surveillance pathway, cell cycle, autophagy, mitophagy, and fatty acid synthesis, which are important for symbiosis, nutrition, and reproduction. The newly identified genes also included those encoding O-methyltransferase (O-MT), 3-dehydroquinate synthase, homologous-pairing protein 2-like (HOP2) and meiosis protein 2 (MEI2), which function in mycosporine-like amino acids (MAAs) biosynthesis and sexual reproduction, respectively. The improved version of the gene set (Fugka_Geneset _V3) raised transcriptomic read mapping rate from 33% to 54% and BUSCO match from 29% to 55%. Further differential gene expression analysis yielded a set of stably expressed genes under variable trace metal conditions, of which 115 with annotated functions have recently been found to be stably expressed under three other conditions, thus further developing the “core gene set” of F. kawagutii. This improved genome will prove useful for future Symbiodiniaceae transcriptomic, gene structure, and gene expression studies, and the refined “core gene set” will be a valuable resource from which to develop reference genes for gene expression studies.



2016 ◽  
Vol 101 (3) ◽  
pp. 1034-1043 ◽  
Author(s):  
Aidan Flynn ◽  
Trisha Dwight ◽  
Jessica Harris ◽  
Diana Benn ◽  
Li Zhou ◽  
...  

Abstract Context: Pheochromocytomas and paragangliomas (PPGLs) are heritable neoplasms that can be classified into gene-expression subtypes corresponding to their underlying specific genetic drivers. Objective: This study aimed to develop a diagnostic and research tool (Pheo-type) capable of classifying PPGL tumors into gene-expression subtypes that could be used to guide and interpret genetic testing, determine surveillance programs, and aid in elucidation of PPGL biology. Design: A compendium of published microarray data representing 205 PPGL tumors was used for the selection of subtype-specific genes that were then translated to the Nanostring gene-expression platform. A support vector machine was trained on the microarray dataset and then tested on an independent Nanostring dataset representing 38 familial and sporadic cases of PPGL of known genotype (RET, NF1, TMEM127, MAX, HRAS, VHL, and SDHx). Different classifier models involving between three and six subtypes were compared for their discrimination potential. Results: A gene set of 46 genes and six endogenous controls was selected representing six known PPGL subtypes; RTK1–3 (RET, NF1, TMEM127, and HRAS), MAX-like, VHL, and SDHx. Of 38 test cases, 34 (90%) were correctly predicted to six subtypes based on the known genotype to gene-expression subtype association. Removal of the RTK2 subtype from training, characterized by an admixture of tumor and normal adrenal cortex, improved the classification accuracy (35/38). Consolidation of RTK and pseudohypoxic PPGL subtypes to four- and then three-class architectures improved the classification accuracy for clinical application. Conclusions: The Pheo-type gene-expression assay is a reliable method for predicting PPGL genotype using routine diagnostic tumor samples.



2015 ◽  
Vol 6 ◽  
Author(s):  
Holger Spiegel ◽  
Alexander Boes ◽  
Nadja Voepel ◽  
Veronique Beiss ◽  
Gueven Edgue ◽  
...  




2020 ◽  
Vol 22 (12) ◽  
pp. 1742-1756 ◽  
Author(s):  
Radia M Johnson ◽  
Heidi S Phillips ◽  
Carlos Bais ◽  
Cameron W Brennan ◽  
Timothy F Cloughesy ◽  
...  

Abstract Background: We aimed to develop a gene expression–based prognostic signature for isocitrate dehydrogenase (IDH) wild-type glioblastoma using clinical trial datasets representative of glioblastoma clinical trial populations. Methods: Samples were collected from newly diagnosed patients with IDH wild-type glioblastoma in the ARTE, TAMIGA, EORTC 26101 (referred to as “ATE”), AVAglio, and GLARIUS trials, or treated at UCLA. Transcriptional profiling was achieved with the NanoString gene expression platform. To identify genes prognostic for overall survival (OS), we built an elastic net penalized Cox proportional hazards regression model using the discovery ATE dataset. For validation in independent datasets (AVAglio, GLARIUS, UCLA), we combined elastic net–selected genes into a robust z-score signature (ATE score) to overcome gene expression platform differences between discovery and validation cohorts. Results: NanoString data were available from 512 patients in the ATE dataset. Elastic net identified a prognostic signature of 9 genes (CHEK1, GPR17, IGF2BP3, MGMT, MTHFD1L, PTRH2, SOX11, S100A9, and TFRC). Translating weighted elastic net scores to the ATE score conserved the prognostic value of the genes. The ATE score was prognostic for OS in the ATE dataset (P < 0.0001), as expected, and in the validation cohorts (AVAglio, P < 0.0001; GLARIUS, P = 0.02; UCLA, P = 0.004). The ATE score remained prognostic following adjustment for O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status and corticosteroid use at baseline. A positive correlation between ATE score and proneural/proliferative subtypes was observed in patients with MGMT non-methylated promoter status. Conclusions: The ATE score showed prognostic value and may enable clinical trial stratification for IDH wild-type glioblastoma.
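The abstract does not say how the ATE score is computed, so the following is only a sketch of one common way to build a robust z-score signature: each gene is centred on its within-cohort median and scaled by its median absolute deviation (MAD), and the signed per-gene values are averaged per sample. Standardising within each cohort is what makes such a score comparable across platforms. The gene labels, signs, and expression values are hypothetical.

```python
from statistics import median


def robust_z(values):
    """Median/MAD standardisation of one gene across the samples of a cohort."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    if scale == 0:
        raise ValueError("MAD is zero; gene has no spread in this cohort")
    return [(v - med) / scale for v in values]


def signature_score(expression, signs):
    """Average of signed robust z-scores per sample.

    expression -- dict: gene -> list of per-sample expression values
    signs      -- dict: gene -> +1 (higher is worse) or -1 (higher is better)
    """
    z = {gene: robust_z(vals) for gene, vals in expression.items()}
    n_samples = len(next(iter(expression.values())))
    return [
        sum(signs[gene] * z[gene][i] for gene in expression) / len(expression)
        for i in range(n_samples)
    ]


# Hypothetical two-gene signature measured in five samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],
}
signs = {"GENE_A": +1, "GENE_B": -1}
scores = signature_score(expression, signs)
print(scores)
```

Because every gene is rescaled against its own cohort, the score depends on the relative ordering and spread of samples rather than on platform-specific intensity units.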


