Exploratory Factor Analysis of Pathway Copy Number Data with an Application Towards the Integration with Gene Expression Data

2011 ◽  
Vol 18 (5) ◽  
pp. 729-741 ◽  
Author(s):  
Wessel N. Van Wieringen ◽  
Mark A. Van De Wiel
BMC Cancer ◽  
2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Maria Moksnes Bjaanæs ◽  
Gro Nilsen ◽  
Ann Rita Halvorsen ◽  
Hege G. Russnes ◽  
Steinar Solberg ◽  
...  

Abstract Background Genetic alterations are common in non-small cell lung cancer (NSCLC), and DNA mutations and translocations are targets for therapy. Copy number aberrations occur frequently in NSCLC tumors and may influence gene expression and further alter signaling pathways. In this study we aimed to characterize the genomic architecture of NSCLC tumors and to identify genomic differences between tumors stratified by histology and mutation status. Furthermore, we sought to integrate DNA copy number data with mRNA expression to find genes with expression putatively regulated by copy number aberrations and the oncogenic pathways associated with these affected genes. Methods Copy number data were obtained from 190 resected early-stage NSCLC tumors and gene expression data were available from 113 of the adenocarcinomas. Clinical and histopathological data were known, and EGFR-, KRAS- and TP53 mutation status was determined. Allele-specific copy number profiles were calculated using ASCAT, and regional copy number aberrations were subsequently obtained and analyzed jointly with the gene expression data. Results The NSCLC tumor tissue displayed overall complex DNA copy number profiles with numerous recurrent aberrations. Despite histological differences, tissue samples from squamous cell carcinomas and adenocarcinomas had remarkably similar copy number patterns. The TP53-mutated lung adenocarcinomas displayed a highly aberrant genome, with significantly altered copy number profiles including gains, losses and focal complex events. The EGFR-mutant lung adenocarcinomas had specific arm-wise aberrations, particularly at chromosome 7p and 9q. A large number of genes displayed correlation between copy number and expression level, and the PI(3)K-mTOR pathway was highly enriched for such genes. Conclusions The genomic architecture in NSCLC tumors is complex, and particularly TP53-mutated lung adenocarcinomas displayed highly aberrant copy number profiles. 
We suggest always including TP53 mutation status when studying copy number aberrations in NSCLC tumors. Copy number may further impact gene expression and alter cellular signaling pathways.
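
The abstract above reports that many genes show correlation between copy number and expression level, but it does not state how that correlation was computed. The following is a minimal sketch of one common approach, assuming a per-gene Pearson correlation across samples and an illustrative cutoff of r >= 0.5. The function names, the cutoff, and the toy data are all hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # constant profile: correlation is undefined
    return sxy / sqrt(sxx * syy)


def correlated_genes(copy_number, expression, min_r=0.5):
    """Return {gene: r} for genes present in both inputs with r >= min_r.

    min_r is an assumed, illustrative cutoff, not a value from the study.
    """
    hits = {}
    for gene, cn_profile in copy_number.items():
        if gene not in expression:
            continue
        r = pearson(cn_profile, expression[gene])
        if r >= min_r:  # a NaN correlation compares False and is skipped
            hits[gene] = r
    return hits


# Toy data: one value per sample, samples in the same order in both tables.
copy_number = {"GENE_A": [2, 3, 4, 2, 5], "GENE_B": [2, 2, 3, 1, 2]}
expression = {
    "GENE_A": [1.0, 1.6, 2.1, 0.9, 2.7],
    "GENE_B": [1.2, 0.7, 0.9, 1.1, 1.0],
}
hits = correlated_genes(copy_number, expression)
```

In this toy example only GENE_A passes the cutoff. A real analysis would also correct for multiple testing before passing the resulting gene list to a pathway enrichment step.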


2021 ◽  
Author(s):  
Maria Moksnes Bjaanaes ◽  
Gro Nilsen ◽  
Ann Rita Halvorsen ◽  
Hege G. Russnes ◽  
Steinar Solberg ◽  
...  

Abstract Background: Genetic alterations are common in non-small cell lung cancer (NSCLC), and DNA mutations and translocations are targets for therapy. Copy number aberrations occur frequently in NSCLC tumors and may influence gene expression and further alter signaling pathways. In this study we aimed to characterize the genomic architecture of NSCLC tumors and to identify genomic differences between tumors stratified by histology and mutation status. Furthermore, we sought to integrate DNA copy number data with mRNA expression to find genes with expression putatively regulated by copy number aberrations and the oncogenic pathways associated with these affected genes. Methods: Copy number data were obtained from 190 resected early-stage NSCLC tumors and gene expression data were available from 113 of the adenocarcinomas. Clinical and histopathological data were known, and EGFR-, KRAS- and TP53 mutation status was determined. Allele-specific copy number profiles were calculated using ASCAT, and regional copy number aberrations were subsequently obtained and analyzed jointly with the gene expression data. Results: The NSCLC tumor tissue displayed overall complex DNA copy number profiles with numerous recurrent aberrations. Despite histological differences, tissue samples from squamous cell carcinomas and adenocarcinomas had remarkably similar copy number patterns. The TP53-mutated lung adenocarcinomas displayed a highly aberrant genome, with significantly altered copy number profiles including gains, losses and focal complex events. The EGFR-mutant lung adenocarcinomas had specific arm-wise aberrations, particularly at chromosome 7p and 9q. A large number of genes displayed correlation between copy number and expression level, and the PI(3)K-mTOR pathway was highly enriched for such genes. Conclusions: The genomic architecture in NSCLC tumors is complex, and particularly TP53-mutated lung adenocarcinomas displayed highly aberrant copy number profiles. 
We suggest always including TP53 mutation status when studying copy number aberrations in NSCLC tumors. Copy number may further impact gene expression and alter cellular signaling pathways.


2012 ◽  
Vol 51 (7) ◽  
pp. 696-706 ◽  
Author(s):  
Marieke L. Kuijjer ◽  
Halfdan Rydbeck ◽  
Stine H. Kresse ◽  
Emilie P. Buddingh ◽  
Ana B. Lid ◽  
...  

Author(s):  
Jayakishan Meher ◽  
Ram Chandra Barik ◽  
Madhab Ranjan Panigrahi ◽  
Saroj Kumar Pradhan ◽  
Gananath Dash

2015 ◽  
Author(s):  
Andrew Anand Brown ◽  
Zhihao Ding ◽  
Ana Viñuela ◽  
Dan Glass ◽  
Leopold Parts ◽  
...  

Statistical factor analysis methods have previously been used to remove noise components from high dimensional data prior to genetic association mapping, and in a guided fashion to summarise biologically relevant sources of variation. Here we show how the derived factors summarising pathway expression can be used to analyse the relationships between expression, heritability and ageing. We used skin gene expression data from 647 twins from the MuTHER Consortium and applied factor analysis to concisely summarise patterns of gene expression, both to remove broad confounding influences and to produce concise pathway-level phenotypes. We derived 930 "pathway phenotypes" which summarised patterns of variation across 186 KEGG pathways (five phenotypes per pathway). We identified 69 significant associations of age with phenotype from 57 distinct KEGG pathways at a stringent Bonferroni threshold (P<5.38E-5). These phenotypes are more heritable (h^2=0.32) than gene expression levels. On average, expression levels of 16% of genes within these pathways are associated with age. Several significant pathways relate to metabolising sugars and fatty acids, others to insulin signalling. We have demonstrated that factor analysis methods combined with biological knowledge can produce more reliable phenotypes with less stochastic noise than the individual gene expression levels, which increases our power to discover biologically relevant associations. These phenotypes could also be applied to discover associations with other environmental factors.
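
The Bonferroni threshold quoted above can be reproduced from the numbers in the abstract. This is a short sketch, assuming a family-wise significance level of 0.05. The abstract does not state that level, but 0.05 divided by the 930 tests matches the reported cutoff. The helper name `bonferroni_significant` is hypothetical.

```python
n_pathways = 186            # KEGG pathways, from the abstract
phenotypes_per_pathway = 5  # factors kept per pathway, from the abstract
alpha = 0.05                # assumed family-wise error rate (not stated in the abstract)

n_tests = n_pathways * phenotypes_per_pathway  # 930 pathway phenotypes
threshold = alpha / n_tests                    # about 5.38e-5


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value that passes the Bonferroni cutoff alpha / number of tests."""
    cutoff = alpha / len(p_values)
    return [p < cutoff for p in p_values]
```

For example, `bonferroni_significant([0.001, 0.04, 0.2])` uses a cutoff of 0.05 / 3 and flags only the first value.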


2012 ◽  
Author(s):  
Marieke L. Kuijjer ◽  
Halfdan Rydbeck ◽  
Stine H. Kresse ◽  
Emilie P. Buddingh ◽  
Helene Roelofs ◽  
...  

Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 3465-3465
Author(s):  
Daphne R. Friedman ◽  
Joseph R. Nevins

Abstract 3465 Background: Chronic lymphocytic leukemia (CLL), aggressive B-cell non-Hodgkin lymphomas (NHL), and multiple myeloma (MM) are B-cell malignancies that display biological and clinical heterogeneity. Current investigations into the genetics and biology of these related disorders are using next generation whole genome or exome sequencing. The relatively high cost of these techniques has driven an experimental design in which a small group of samples is initially studied, specific genetic lesions are identified, and then larger cohorts are evaluated for those specific aberrations. Given the biological heterogeneity that is found in each of these disorders, such an approach could skew the direction of research towards results found in a small subset of patients. To determine the extent of genomic heterogeneity within and similarities between CLL, NHL, and MM, and their biologic and clinical relevance, we evaluated publicly available gene expression and single nucleotide polymorphism (SNP) array data from the NCBI Gene Expression Omnibus. Methods: We analyzed 893, 881, and 1744 unique gene expression data files that represent CLL, NHL, and MM, respectively. The gene expression data files represented 15, 11, and 10 distinct data sets, respectively. Prognostic, clinical outcome, and copy number variation data were available for a subset of the samples from each malignancy. Gene expression data were initially normalized using RMA and MAS5 algorithms and batch effect was eliminated using Bayesian Factor Regression Modeling. SNP array data were normalized using Chromosome Copy Number Analysis Tool and amplifications and deletions were identified with circular binary segmentation. Analyses were carried out using Bioconductor packages and the statistical environment R. 
Results: After elimination of batch effect, we evaluated the data using random subsampling and unsupervised hierarchical clustering to determine the lowest number of samples required to capture genomic heterogeneity. For CLL and NHL, there was no plateau reached for the number of groups defined by hierarchical clustering up through the total number of samples, indicating that more samples than were available in this study are needed to fully document biological and genomic variability. For MM, there was a plateau reached at approximately 1200 samples. We then used unsupervised hierarchical clustering of the entire dataset for each malignancy to define groups of CLL, NHL, and MM based on their raw gene expression data. To evaluate the biological meaning of the groups defined by this process, we used tools such as Gene Set Enrichment Analysis (GSEA) and oncogenic pathway predictions (ScoreSignatures). Groups within each malignancy that were defined using raw gene expression data had differences in biological pathways involving receptor signaling, cell cycle, and stem cell properties. Notably, similarities in biological annotation were seen between groups that represented the different malignancies. Although prognostic data were not available for all the datasets, there appeared to be no differences in clinical prognostic markers between the genomic-defined groups. However, there were statistically significant differences in molecular prognostic data between these groups. In addition, specific regions of DNA copy number variation were enriched within the different genomic-defined groups. Together, these data highlight the biologic distinctions between groups that are defined by raw gene expression data. For datasets in which clinical outcome data were available, we found that genomic-defined groups had different outcomes such as time to first therapy or overall survival. 
However, the groups did not appear to predict response to chemotherapy or chemo-immunotherapy. Conclusions: CLL, NHL, and MM are heterogeneous malignancies, and very large numbers of patients must be studied to fully capture the genomic and biologic diversity that is present. Despite this limitation, evaluation of existing data reveals that subgroups of these disorders are defined by their underlying biology, overlap in biological processes, and are clinically relevant. These results have implications for future “omics” related research. Disclosures: No relevant conflicts of interest to declare.

