Variation-preserving normalization unveils blind spots in gene expression profiling

2017 ◽  
Vol 7 (1) ◽  
Author(s):  
Carlos P. Roca ◽  
Susana I. L. Gomes ◽  
Mónica J. B. Amorim ◽  
Janeck J. Scott-Fordsmand

Abstract RNA-Seq and gene expression microarrays provide comprehensive profiles of gene activity, but lack of reproducibility has hindered their application. A key challenge in the data analysis is the normalization of gene expression levels, which is currently performed following the implicit assumption that most genes are not differentially expressed. Here, we present a mathematical approach to normalization that makes no assumption of this sort. We have found that variation in gene expression is much larger than currently believed, and that it can be measured with available assays. Our results also explain, at least partially, the reproducibility problems encountered in transcriptomics studies. We expect that this improvement in detection will help efforts to realize the full potential of gene expression profiling, especially in analyses of cellular processes involving complex modulations of gene expression.
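The assumption the abstract refers to can be illustrated with a minimal sketch. It shows conventional median-centering, not the authors' method, which the abstract does not describe. The function name `median_center` and all expression values are hypothetical, chosen only so that most genes change in the treated sample.

```python
from statistics import median

def median_center(sample):
    """Subtract the sample median from every log2 expression value."""
    m = median(sample)
    return [x - m for x in sample]

# Hypothetical log2 expression levels for five genes (illustrative values only).
control = [5.0, 6.0, 7.0, 8.0, 9.0]
# Assumed ground truth: four of the five genes rise by 1 log2 unit, the last is unchanged.
treated = [6.0, 7.0, 8.0, 9.0, 9.0]

# Change per gene before and after median-centering each sample.
true_change = [t - c for t, c in zip(treated, control)]
norm_change = [t - c for t, c in zip(median_center(treated), median_center(control))]

print("true change:      ", true_change)
print("after normalizing:", norm_change)
```

In this toy case the median shifts along with the majority of genes, so centering removes the four real changes and makes the one unchanged gene look down-regulated. This is the kind of distortion that can arise when the "most genes are not differentially expressed" assumption does not hold.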


2015 ◽  
Author(s):  
Carlos P. Roca ◽  
Susana I. L. Gomes ◽  
Mónica J. B. Amorim ◽  
Janeck J. Scott-Fordsmand

RNA-Seq and gene expression microarrays provide comprehensive profiles of gene activity, but lack of reproducibility has hindered their application. A key challenge in the data analysis is the normalization of gene expression levels, which is currently performed following an implicit assumption that most genes are not differentially expressed. Here, we present a mathematical approach to normalization that makes no assumption of this sort. We have found that variation in gene expression is much greater than currently believed, and that it can be measured with available technologies. Our results also explain, at least partially, the problems encountered in transcriptomics studies. We expect this improvement in detection to help efforts to realize the full potential of gene expression profiling, especially in analyses of cellular processes involving complex modulations of gene expression.





Blood ◽  
2013 ◽  
Vol 122 (21) ◽  
pp. 534-534
Author(s):  
Venkata D Yellapantula ◽  
Christopher Murray ◽  
Winnie Liang ◽  
Daniel Auclair ◽  
Joan Levy ◽  
...  

Abstract The Multiple Myeloma Research Consortium (MMRC) has characterized over 300 patient samples using a variety of platforms as part of the Multiple Myeloma Genomics Initiative (MMGI). This large study includes a subset of 84 patients who were screened for somatic mutations using whole genome sequencing (WGS) or whole exome sequencing (WES) in combination with mRNA sequencing. This represents one of the first cohorts of myeloma patients with matched genome and transcriptome sequencing results. Given the historic value of microarray-based gene expression profiling (GEP), this cohort provides a unique opportunity to compare gene expression measurements from the two platforms, as Affymetrix U133Plus2.0-based GEP was performed on 42 of these samples. As part of the MMGI study, the Broad Institute has completed the genome sequencing, using WGS and WES, for 213 patients. A list of 9 frequently mutated genes, including NRAS, KRAS, TP53, PNRC1, MAGED1, FAM46C, DIS3, CCND1 and ALOX12B, was identified initially. Given the potential for RNAseq data to be used to define gene expression levels and to identify mutations in expressed genes, we tested the feasibility of mutation calling on RNAseq alone. We independently called mutations on the entire transcriptome of the 84 patients and used a filtering method to eliminate likely germline variants in the absence of a matched normal control. We looked for point mutation concordance between the calls identified by RNA-Seq alone and the variants previously reported through exome sequencing in the 9 frequently mutated genes. Out of the 66 SNVs identified by these criteria using WGS or WES, 55 (84%) were detected using RNA-Seq. Of the remaining 11 loci, 7 (10%) were not detectably expressed, and in 4 (6%) cases the mutation was not detectable even though there was ample coverage.
It is unclear if the last 6% represent false positives in the genome calls or the preferential expression of the wild-type allele. To interrogate the utility of RNAseq-based GEP in myeloma, we independently recapitulated many of the common GEP measurements. First, we independently used the 84 samples to define cutoffs for the implementation of the TC classification method. We compared our independent assignment of the 42 samples with matched gene expression array data to their existing microarray assignments. This resulted in 40/42 (95%) samples being classified into identical TC classes. The two discordant samples, MMRC0312 and MMRC0387, classified as TC class “none” by expression arrays, were classified as other classes by RNAseq. MMRC0312 exhibited high CCND3 expression using RNA-Seq and was assigned to the ‘6p21’ class. MMRC0387 exhibited elevated CCND1 expression and was classified as ‘D1’ using RNAseq. For the indexes, we showed a strong correlation for the proliferation index (R² = 0.971) and the NFKB index (R² = 0.961) but only a moderate correlation for the 70-gene index (R² = 0.761). The decreased correlation in the 70-gene index is clearly due to the large number of probesets used, which are associated with genes that are clearly not expressed by RNA-seq. One additional advantage of RNAseq over microarray-based gene expression measurements is the potential to detect fusion transcripts. We have applied fusion transcript detection to this cohort of patients and 69 human myeloma cell lines, which were also screened by RNAseq and WES as part of the MMGI study. The most common fusion transcript detected is the @IGH-MMSET fusion characteristic of t(4;14). The next most common fusion we identified appears to be a promoter replacement event where the highly expressed gene, FCHSD2, is fused to multiple partners including known myeloma-related genes, MMSET and MYC, and genes previously unreported in myeloma, CARNS1 and NCF2.
Additional structural rearrangements involving FCHSD2 are also predicted based on the high frequency of copy number abnormalities encompassing the 5′ region of this gene as detected by comparative genomic hybridization in the MMGI study. This study should provide the basis for the migration of myeloma gene expression profiling from microarrays to RNA sequencing-based approaches. In the future, RNA sequencing has the potential to provide novel classification schemes that leverage the multitude of measurements that can be made from this single assay. Disclosures: Levy: MMRC: Employment.



2006 ◽  
Vol 28 (1) ◽  
pp. 24-32 ◽  
Author(s):  
Tapan S. Mehta ◽  
Stanislav O. Zakharkin ◽  
Gary L. Gadbury ◽  
David B. Allison

Gene expression microarrays have been the vanguard of new analytic approaches in high-dimensional biology. Draft sequences of several genomes coupled with new technologies allow study of the influences and responses of entire genomes rather than isolated genes. This has opened a new realm of highly dimensional biology where questions involve multiplicity at unprecedented scales: thousands of genetic polymorphisms, gene expression levels, protein measurements, genetic sequences, or any combination of these and their interactions. Such situations demand creative approaches to the processes of inference, estimation, prediction, classification, and study design. Although bench scientists intuitively grasp the need for flexibility in the inferential process, the elaboration of formal supporting statistical frameworks is just at the very start. Here, we will discuss some of the unique statistical challenges facing investigators studying high-dimensional biology, describe some approaches being developed by statistical scientists, and offer an epistemological framework for the validation of proffered statistical procedures. A key theme will be the challenge in providing methods that a statistician judges to be sound and a biologist finds informative. The shift from family-wise error rate control to false discovery rate estimation and to assessment of ranking and other forms of stability will be portrayed as illustrative of approaches to this challenge.
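The shift from family-wise error rate control to false discovery rate approaches can be sketched with two standard procedures. The abstract does not name specific methods; Bonferroni correction and the Benjamini-Hochberg step-up procedure are used here only as common representatives of each approach, and the p-values are made up for illustration.

```python
def bonferroni_reject(pvalues, alpha=0.05):
    """Family-wise error rate control: reject only if p <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg_reject(pvalues, alpha=0.05):
    """False discovery rate control via the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below k.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            reject[idx] = True
    return reject

# Hypothetical p-values for five genes (illustrative values only).
pvalues = [0.001, 0.008, 0.012, 0.03, 0.2]

print("Bonferroni:        ", bonferroni_reject(pvalues))
print("Benjamini-Hochberg:", benjamini_hochberg_reject(pvalues))
```

On these made-up values, Bonferroni rejects two hypotheses while Benjamini-Hochberg rejects four. The example shows why a less conservative error criterion is attractive when thousands of genes are tested at once.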



2017 ◽  
Vol 32 (5) ◽  
pp. 817 ◽  
Author(s):  
Kyuwhan Jung ◽  
InSong Koh ◽  
Jeong-Hyun Kim ◽  
Hyun Sub Cheong ◽  
Taejin Park ◽  
...  


Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 5337-5337
Author(s):  
Xiangnan Jiang ◽  
Wanhui Yan ◽  
Yifeng Sun ◽  
Qinghua Xu ◽  
Xiaoyan Zhou ◽  
...  

Introduction Diffuse large B-cell lymphoma (DLBCL) is a heterogeneous group of diseases with distinct molecular subtypes. The most established subtyping algorithm, the Cell-of-Origin (COO) model, categorizes DLBCL into activated B-cell (ABC) and germinal center B-cell (GCB)-like subgroups through gene expression profiling. COO subtyping is mandatory for every newly diagnosed DLBCL patient, as it is critical for determining the therapeutic and surveillance strategies. We evaluated a newly developed assay using 32-gene expression profiling to determine the COO of DLBCL with formalin-fixed paraffin-embedded (FFPE) tissue. Methods The DLBCL-COO Test is a qPCR-based 32-gene expression assay for COO determination in FFPE samples. Biopsies from DLBCL patients with paired FFPE and fresh tissue were identified, and COO was assigned based on the immunohistochemistry (IHC) algorithm (Hans algorithm), the DLBCL-COO qPCR assay and global gene expression profiling with RNA-seq, respectively. The global gene expression profiling with RNA-seq was taken as the "gold standard" for reference. Clinical information, including survival data, was collected. Results 160 cases of DLBCL with evaluable COO assignments by IHC, the DLBCL-COO 32-gene assay and global gene expression profiling with RNA-seq were identified. Compared with the 77.5% concordance between the IHC algorithm and the gold standard, there was 91.9% concordance between the DLBCL-COO 32-gene assay and the gold standard (P = 0.005). The 72 patients assigned to the ABC subtype and the 14 patients assigned to the Type-3 subtype demonstrated significantly inferior overall survival compared with the 42 patients assigned to the GCB subtype using the DLBCL-COO assay (P = 0.023). However, COO based on the IHC algorithm failed to provide predictive value regarding overall survival (P = 0.09). Conclusions The DLBCL-COO assay provides flexibility and accuracy in DLBCL subtype characterization.
These subtype distinctions should help guide disease prognosis and treatment options within DLBCL clinical practice. Disclosures Sun: Canhelp Genomics: Employment. Xu: Canhelp Genomics: Employment.



Author(s):  
Naoko Yamaguchi ◽  
Junhua Xiao ◽  
Deven Narke ◽  
Devin Shaheen ◽  
Xianming Lin ◽  
...  

Background: Elevated intracardiac pressure due to heart failure induces electrical and structural remodeling in the left atrium (LA) that begets atrial myopathy and arrhythmias. The underlying molecular pathways that drive atrial remodeling during cardiac pressure overload are poorly defined. The purpose of this study is to characterize the response of the ETV1 signaling axis in the LA during cardiac pressure overload in humans and mouse models and explore the role of ETV1 in atrial electrical and structural remodeling. Methods: We performed gene expression profiling in 265 left atrial samples from patients who underwent cardiac surgery. Comparative gene expression profiling was performed between two murine models of cardiac pressure overload, transverse aortic constriction (TAC) banding and Angiotensin II (AngII) infusion, and a genetic model of Etv1 cardiomyocyte-selective knockout (Etv1 f/f Mlc2a Cre/+). Results: Using the Cleveland Clinic biobank of human LA specimens, we found that ETV1 expression is decreased in patients with reduced ejection fraction. Consistent with its role as an important mediator of the Neuregulin-1 (NRG1) signaling pathway and activator of rapid conduction gene programming, we identified a direct correlation between ETV1 expression level and NRG1, ERBB4, SCN5A, and GJA5 levels in human LA samples. In a similar fashion to heart failure patients, we showed that left atrial ETV1 expression is downregulated at the RNA and protein levels in murine pressure overload models. Comparative analysis of LA RNA-seq datasets from TAC and AngII treated mice showed a high Pearson correlation, reflecting a highly ordered process by which the LA undergoes electrical and structural remodeling. Cardiac pressure overload produced a consistent downregulation of ErbB4, Etv1, Scn5a, and Gja5 and upregulation of profibrotic gene programming, which includes Tgfbr1/2, Igf1, and numerous collagen genes.
Etv1 f/f Mlc2a Cre/+ mice displayed atrial conduction disease and arrhythmias. Correspondingly, the LA from Etv1 f/f Mlc2a Cre/+ mice showed downregulation of rapid conduction genes and upregulation of profibrotic gene programming, whereas analysis of a gain-of-function ETV1 RNA-seq dataset from neonatal rat ventricular myocytes transduced with Etv1 showed reciprocal changes. Conclusions: ETV1 is downregulated in the LA during cardiac pressure overload, contributing to both electrical and structural remodeling.



2001 ◽  
Vol 11 (7) ◽  
pp. 1256-1261
Author(s):  
Patrick P. Zarrinkar ◽  
James K. Mainquist ◽  
Matthew Zamora ◽  
David Stern ◽  
John B. Welsh ◽  
...  

Gene expression profiling using DNA arrays is rapidly becoming an essential tool for research and drug discovery and may soon play a central role in disease diagnosis. Although it is possible to make significant discoveries on the basis of a relatively small number of expression profiles, the full potential of this technology is best realized through more extensive collections of expression measurements. The generation of large numbers of expression profiles can be a time-consuming and labor-intensive process with current one-at-a-time technology. We have developed the ability to obtain expression profiles in a highly parallel yet straightforward format using glass wafers that contain 49 individual high-density oligonucleotide arrays. This arrays of arrays concept is generalizable and can be adapted readily to other types of arrays, including spotted cDNA microarrays. It is also scalable for use with hundreds and even thousands of smaller arrays on a single piece of glass. Using the arrays of arrays approach and parallel preparation of hybridization samples in 96-well plates, we were able to determine the patterns of gene expression in 27 ovarian carcinomas and 4 normal ovarian tissue samples, along with a number of control samples, in a single experiment. This new approach significantly increases the ease, efficiency, and throughput of microarray-based experiments and makes possible new applications of expression profiling that are currently impractical.



BMC Genomics ◽  
2014 ◽  
Vol 15 (1) ◽  
pp. 1108 ◽  
Author(s):  
Alexander V Tyakht ◽  
Elena N Ilina ◽  
Dmitry G Alexeev ◽  
Dmitry S Ischenko ◽  
Alexey Y Gorbachev ◽  
...  

