OPTIMIZATION OF BETWEEN GROUP ANALYSIS OF GENE EXPRESSION DISEASE CLASS PREDICTION

BIOMAT 2005 ◽  
2006 ◽  
Author(s):  
FLORENT BATY ◽  
MICHEL P. BIHL ◽  
MARTIN BRUTSCHE ◽  
AEDÍN C. CULHANE ◽  
GUY PERRIÈRE
2015 ◽  
Vol 16 (1) ◽  
Author(s):  
Putri W. Novianti ◽  
Victor L. Jong ◽  
Kit C. B. Roes ◽  
Marinus J. C. Eijkemans

2021 ◽  
Vol 39 (15_suppl) ◽  
pp. 559-559
Author(s):  
Nina D'Abreo ◽  
Abhinav Rohatgi ◽  
Douglas Kanter Marks ◽  
Heather Kling ◽  
Josien Haan ◽  
...  

Background: Lymphovascular invasion (LVI), the passage of carcinoma cells through lymphatic and blood vessels, is an important early step in metastasis; however, LVI is excluded from most breast cancer (BC) clinical risk assessments. Previous studies assessed the prognostic value of LVI to estimate clinical outcomes. To gain understanding of the molecular basis of LVI, we evaluated differentially expressed genes (DEGs) between tumors with LVI versus those without LVI, stratified by the 70-gene signature (MammaPrint/MP) and 80-gene molecular subtyping signature (BluePrint/BP). Methods: The prospective, observational FLEX Study (NCT03053193) includes stage I-III BC patients who receive MP/BP testing and consent to full transcriptome and clinical data collection. Patients with LVI (n=581) and without LVI (n=600, randomly selected), enrolled from 2017 to present, were included. LVI was assessed by local pathology laboratories. Differential gene expression analysis of 44k Agilent microarray data was performed with the R limma package. DEGs were compared within all samples, BP Luminal subtype, MP risk groups (Low Risk [LR]/Luminal A and High Risk [HR]/Luminal B), and by lymph node (LN) status. DEGs with FDR<0.05 were considered significant. Results: Of tumors with LVI (LVI+), 66% were MP HR; notably, 51% of tumors without LVI (LVI-) were MP HR. LVI was associated with larger T stage, LN involvement, high grade, negative ER status by IHC, and younger patient age (LVI+ vs. LVI-, p<0.05 for all comparisons). Patient ethnicity, obesity, and tumor type did not differ by LVI status; however, prevalence of type 2 diabetes trended higher in patients with LVI+ HR tumors (21%), compared with LVI- HR (15%, p=0.09) and LVI+ LR (11%, p=0.004). There were significant transcriptomic differences between LVI+ and LVI- tumors, with most DEGs evident in the Luminal B subset. DEGs in LVI+, LN-negative (LN-) tumors overlapped substantially with the overall Luminal group analysis.
Functional enrichment analysis showed dysregulation of cell cycle, extracellular matrix (ECM) organization, cell adhesion, and cytokine receptor pathways. Gene sets related to insulin growth factor pathways were also enriched in LVI+ tumors. Conclusions: DEGs associated with LVI were primarily found in MP HR Luminal, LN-negative tumors; enrichment analysis suggested dysregulation of ECM organization and cell adhesion pathways, consistent with previous reports. DEGs were not associated with LVI presence in LN+ tumors, suggesting that LVI assessment may be less relevant in LN+ breast cancer. Future studies will assess clinical outcomes, as well as LVI-associated gene expression in BP Basal- and HER2-type tumors. However, the current analysis indicates few DEGs in LVI+ MP LR tumors; thus, the potential prognostic information gained from LVI-associated gene expression is likely already captured by the MP and BP signatures. Clinical trial information: NCT03053193.
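The FDR<0.05 cutoff mentioned in the Methods can be illustrated with a minimal sketch. It assumes Benjamini-Hochberg adjustment, which is the default in limma's `topTable`; the abstract does not state which FDR procedure was actually applied. The function name and the p-values below are hypothetical and serve only as an illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up per-gene p-values (assumption, not data from the study).
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
adjusted = benjamini_hochberg(p_values)

# Genes whose adjusted p-value passes the FDR < 0.05 threshold.
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print(significant)
```

Note that a raw p-value below 0.05 (such as 0.039 here) does not necessarily survive the adjustment, which is why the number of DEGs depends on how many genes are tested together.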


Author(s):  
Lei Yu ◽  
Huan Liu

The advent of gene expression microarray technology enables the simultaneous measurement of expression levels for thousands or tens of thousands of genes in a single experiment (Schena, et al., 1995). Analysis of gene expression microarray data presents unprecedented opportunities and challenges for data mining in areas such as gene clustering (Eisen, et al., 1998; Tamayo, et al., 1999), sample clustering and class discovery (Alon, et al., 1999; Golub, et al., 1999), sample class prediction (Golub, et al., 1999; Wu, et al., 2003), and gene selection (Xing, Jordan, & Karp, 2001; Yu & Liu, 2004). This article introduces the basic concepts of gene expression microarray data and describes relevant data-mining tasks. It briefly reviews the state-of-the-art methods for each data-mining task and identifies emerging challenges and future research directions in microarray data analysis.


2021 ◽  
Author(s):  
Ali Foroughi pour ◽  
Brian White ◽  
Jonghanne Park ◽  
Todd Sheridan ◽  
Jeffrey Chuang

Abstract Convolutional neural networks (CNNs) are revolutionizing digital pathology by enabling machine learning-based classification of a variety of phenotypes from hematoxylin and eosin (H&E) whole slide images (WSIs), but the interpretation of CNNs remains difficult. Most studies have considered interpretability in a post hoc fashion, e.g. by presenting example regions with strongly predicted class labels. However, such an approach does not explain the biological features that contribute to correct predictions. To address this problem, here we investigate the interpretability of H&E-derived CNN features (the feature weights in the final layer of a transfer-learning-based architecture), which we show can be construed as abstract morphological genes (“mones”) with strong independent associations to biological phenotypes. We observe that many mones are specific to individual cancer types, while others are found in multiple cancers especially from related tissue types. We also observe that mone-mone correlations are strong and robustly preserved across related cancers. Importantly, linear mone-based classifiers can very accurately separate 38 distinct classes (19 tumor types and their adjacent normals, AUC=97.1%±2.8% for each class prediction), and linear classifiers are also highly effective for universal tumor detection (AUC=99.2%±0.12%). This linearity provides evidence that individual mones or correlated mone clusters may be associated with interpretable histopathological features or other patient characteristics. In particular, the statistical similarity of mones to gene expression values allows integrative mone analysis via expression-based bioinformatics approaches. We observe strong correlations between individual mones and individual gene expression values, notably mones associated with collagen gene expression in ovarian cancer. 
Mone-expression comparisons also indicate that immunoglobulin expression can be identified using mones in colon adenocarcinoma and that immune activity can be identified across multiple cancer types, and we verify these findings by expert histopathological review. Our work demonstrates that mones provide a morphological H&E decomposition that can be effectively associated with diverse phenotypes, analogous to the interpretability of transcription via gene expression values.


2020 ◽  
Vol 21 (13) ◽  
pp. 4629
Author(s):  
Sylwia Szpak-Ulczok ◽  
Aleksandra Pfeifer ◽  
Dagmara Rusinek ◽  
Malgorzata Oczko-Wojciechowska ◽  
Malgorzata Kowalska ◽  
...  

Molecular mechanisms of distant metastases (M1) in papillary thyroid cancer (PTC) are poorly understood. We attempted to analyze the gene expression profile in PTC primary tumors to seek the genes associated with M1 status and characterize their molecular function. One hundred and twenty-three patients, including 36 M1 cases, were subjected to transcriptome oligonucleotide microarray analyses (set A: U133; set B: HG 1.0 ST) at transcript and gene group level (limma, gene set enrichment analysis (GSEA)). An additional independent set of 63 PTCs, including 9 M1 cases, was used to validate results by qPCR. The analysis on dataset A detected eleven transcripts showing significant differences in expression between metastatic and non-metastatic PTC. These genes were validated on microarray dataset B. The differential expression was positively confirmed for only two genes: IGFBP3 (most significant) and ECM1. However, when analyzed on an independent dataset by qPCR, the IGFBP3 gene showed no differences in expression. Gene group analysis showed differences mainly among immune-related transcripts, indicating the potential influence of tumor immune infiltration or signal within the primary tumor. The differences in gene expression profile between metastatic and non-metastatic PTC, if they exist, are subtle and potentially detectable only in large datasets.


2011 ◽  
pp. 877-884
Author(s):  
Amira Djebbari ◽  
Aedín C. Culhane ◽  
Alice J. Armstrong ◽  
John Quackenbush

Biological systems can be viewed as information management systems, with a basic instruction set stored in each cell’s DNA as “genes.” For most genes, their information is enabled when they are transcribed into RNA which is subsequently translated into the proteins that form much of a cell’s machinery. Although details of the process for individual genes are known, more complex interactions between elements are yet to be discovered. What we do know is that diseases can result if there are changes in the genes themselves, in the proteins they encode, or if RNAs or proteins are made at the wrong time or in the wrong quantities. Recent advances in biotechnology led to the development of DNA microarrays, which quantitatively measure the expression of thousands of genes simultaneously and provide a snapshot of a cell’s response to a particular condition. Finding patterns of gene expression that provide insight into biological endpoints offers great opportunities for revolutionizing diagnostic and prognostic medicine and providing mechanistic insight in data-driven research in the life sciences, an area with a great need for advances, given the urgency associated with diseases. However, microarray data analysis presents a number of challenges, from noisy data to the curse of dimensionality (large number of features, small number of instances) to problems with no clear solutions (e.g. real world mappings of genes to traits or diseases that are not yet known). Finding patterns of gene expression in microarray data poses problems of class discovery, comparison, prediction, and network analysis which are often approached with AI methods. Many of these methods have been successfully applied to microarray data analysis in a variety of applications ranging from clustering of yeast gene expression patterns (Eisen et al., 1998) to classification of different types of leukemia (Golub et al., 1999). Unsupervised learning methods (e.g. hierarchical clustering) explore clusters in data and have been used for class discovery of distinct forms of diffuse large B-cell lymphoma (Alizadeh et al., 2000). Supervised learning methods (e.g. artificial neural networks) utilize a previously determined mapping between biological samples and classes (i.e. labels) to generate models for class prediction. A k-nearest neighbor (k-NN) approach was used to train a gene expression classifier of different forms of brain tumors and its predictions were able to distinguish biopsy samples with different prognosis suggesting that microarray profiles can predict clinical outcome and direct treatment (Nutt et al., 2003). Bayesian networks constructed from microarray data hold promise for elucidating the underlying biological mechanisms of disease (Friedman et al., 2000).
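The k-nearest neighbor idea mentioned above can be sketched in a few lines. This is a minimal illustration, not the classifier of Nutt et al. (2003): the function name, the two-gene expression vectors, and the class labels "A" and "B" are all hypothetical, and Euclidean distance with a simple majority vote is an assumed choice.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict a class label by majority vote among the k nearest samples."""
    # Pair each training sample's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Count the labels of the k closest samples and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy expression profiles (two genes per sample); values are made up.
train_x = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_x, train_y, (1.0, 1.0)))
print(knn_predict(train_x, train_y, (3.0, 3.0)))
```

With real microarray data the feature vectors would have thousands of dimensions and far fewer samples, so gene selection or dimensionality reduction would normally precede a distance-based classifier like this one.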

