Prediction of treatment (tx)-induced fatigue in breast cancer (BC) patients (pts) using machine learning on genome-wide association (GWAS) data in the prospective CANTO cohort.

2019 ◽  
Vol 37 (15_suppl) ◽  
pp. 11515-11515
Author(s):  
Sangkyu Lee ◽  
Joseph O. Deasy ◽  
Jung Hun Oh ◽  
Antonio Di Meglio ◽  
Sandrine Boyault ◽  
...  

Background: Many BC survivors report fatigue. The relevant genomic correlates of fatigue after BC are not well understood. We applied a previously validated machine learning methodology (Oh 2017) to GWAS data to identify biological correlates of fatigue induced after tx. Methods: We analyzed 3825 BC pts with GWAS data (Illumina InfiniumExome24 v 1.1) from the CANTO study (NCT01993498). The outcome of this study was post-tx fatigue 1 year after the end of primary chemotherapy/radiotherapy/surgery using the EORTC C30 fatigue subscale (overall fatigue) and the EORTC FA 12 fatigue domains (physical/emotional/cognitive). For each domain, we limited the study group to those with zero baseline fatigue and defined severe fatigue change as a score increase above the third quartile. We tested univariate correlations between severe fatigue in each domain and 496539 SNPs as well as relevant clinical variables. The machine learning prediction model based on preconditioning random forest regression (PRFR) (Oh et al., 2017) was then built using the SNPs with ancestry-adjusted univariate p-value < 0.001 and clinical variables with Bonferroni-adjusted p-value < 0.05. The model was validated in a holdout subset of the cohort. Gene set enrichment analysis (GSEA) was performed using MetaCore to identify key biological correlates relevant to tx-induced fatigue. Results: Distinct results were found by fatigue domain (table). GSEA showed that the cognitive fatigue model SNPs included biomarkers for cognitive disorders (p = 1.6 × 10^-12) and glutamatergic synaptic transmission (p = 1.6 × 10^-8). Conclusions: A SNP-based model had differential performance by fatigue domain, with a potential genetic role in risk and biology for tx-induced cognitive fatigue. Further research to explore biomarkers of tx-induced fatigue is needed. [Table: see text]
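The outcome definition in the Methods above (restrict to zero baseline fatigue, then flag a score increase above the third quartile) can be illustrated with a minimal Python sketch. The function name, the toy scores, and the choice of the "inclusive" quantile method are assumptions made for illustration; the abstract does not state how the quartile was computed.

```python
from statistics import quantiles


def severe_fatigue_labels(baseline, followup):
    """Flag severe fatigue change among patients with zero baseline fatigue."""
    # Keep only patients whose baseline score is zero, as described in the abstract.
    changes = [f - b for b, f in zip(baseline, followup) if b == 0]
    # Third quartile of the score increase (quantile method is an assumption).
    q3 = quantiles(changes, n=4, method="inclusive")[2]
    # Severe change = increase strictly above the third quartile.
    return changes, q3, [c > q3 for c in changes]


# Invented toy scores, not CANTO data.
baseline = [0, 0, 0, 0, 0, 11, 0, 0, 0]
followup = [0, 11, 22, 33, 44, 55, 11, 22, 67]
changes, q3, labels = severe_fatigue_labels(baseline, followup)
print(q3, sum(labels))
```

The patient with a non-zero baseline is dropped before the quartile is computed, so the threshold reflects only the restricted study group.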

2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Chao Guo ◽  
Ya-yue Gao ◽  
Qian-qian Ju ◽  
Min Wang ◽  
Chun-xia Zhang ◽  
...  

Abstract Background Neither the transcriptomic signature of polycythemia vera (PV) nor mRNA markers for clinical variables (thrombosis, leukemic transformation, survival, etc.) have been fully elucidated. We attempted to reveal and validate crucial co-expression modules and marker mRNAs correlating with PV by weighted gene co-expression network analysis (WGCNA). Material and methods The GSE57793/26014/61629 datasets were downloaded from the Gene Expression Omnibus (GEO) database and integrated into one fused dataset. Using R and the ‘WGCNA’ package, the PV-specific co-expression module was identified, and its pathway enrichment profile was obtained by over-representation analysis (ORA). Protein–protein interaction (PPI) network and hub gene analysis identified MAPK14 as our target gene. The distribution of MAPK14 expression in different disease/mutation types was then depicted based on external independent datasets. Genome-scale correlation analysis revealed the association of MAPK14 and JAK/STAT family genes. Gene set enrichment analysis (GSEA) was then performed to detect the activated and suppressed pathways associated with MAPK14 expression. Moreover, the GSE47018 dataset was used to compare clinical variables (thrombosis, leukemic transformation, survival, etc.) between MAPK14-high and MAPK14-low groups. Results An integrated dataset including 177 samples (83 PV, 35 ET, 17 PMF and 42 normal donors) was input into WGCNA. The ‘tan’ module was identified as the PV-specific module (R2 = 0.56, p = 8e−16), the genes of which were predominantly enriched in pro-inflammatory pathways (Toll-like receptor (TLR)/TNF signaling, etc.). MAPK14 was identified as the top hub gene in the PV-related PPI network, with the highest betweenness. External datasets validated that MAPK14 expression was significantly higher in PV than in essential thrombocytosis (ET)/primary myelofibrosis (PMF) patients and normal donors. 
JAK2 homozygous mutation carriers had higher levels of MAPK14 than carriers of other mutation types. The expression of JAK/STAT family genes significantly correlated with MAPK14, which also contributed to the activation of oxidative phosphorylation, interferon-alpha (IFNα) response and PI3K-Akt-mTOR signaling, etc. Moreover, the MAPK14-high group had more adverse clinical outcomes (splenectomy, thrombosis, disease aggressiveness) and inferior survival compared with the MAPK14-low group. Conclusion MAPK14 over-expression was identified as a transcriptomic feature of PV, which was also related to inferior clinical outcomes. The results provide novel insights into biomarkers and therapeutic targets for PV.


2016 ◽  
Vol 36 (suppl_1) ◽  
Author(s):  
James T McParland ◽  
Evanthia Pashos ◽  
Daniel J Rader ◽  
Marina Cuchel

Aim: The ATP-binding cassette transporter 1 (ABCA1) is a membrane protein well known for its role in cholesterol efflux and HDL formation. Recently, ABCA1 has been implicated as playing a key role in other processes, such as insulin secretion and inflammatory response. We sought to further investigate these potential roles through a quantitative proteomics approach. Specifically, we hypothesized that we could detect differential protein signatures in the plasma of Tangier patients that correspond to pathways involved in diabetes and inflammation. Methods: We used SOMAscan® technology (SomaLogic, Boulder, CO, USA) to analyze plasma collected from 5 Tangier disease patients (homozygotes or compound heterozygotes for functional ABCA1 mutations) and 7 normolipidemic controls. We tested for differences in the levels of approximately 1,000 plasma proteins using a nonparametric test (KS). We then performed Ingenuity Canonical Pathway analysis to examine if proteins linked to diabetes and inflammation pathways were significantly more likely to be differentially abundant in the plasma. We corroborated the results using Gene Set Enrichment Analysis (GSEA). Results: We found an enrichment in differentially abundant proteins involved in type II diabetes mellitus signaling (p-value = 0.0002) and inflammatory pathways, such as granulocyte adhesion and diapedesis (p-value = 2.2 × 10^-12). These results were also corroborated by GSEA, where gene sets corresponding to GO biological processes such as immune response (p-value = 0.008) and inflammatory response (p-value = 0.032) ranked at the top of the enrichment results. Conclusions: The results from this pilot study support the concept that ABCA1 is implicated in pathways affecting immune and inflammatory response and type II diabetes.
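The per-protein comparison described in the Methods above can be sketched as follows, assuming that "KS" refers to the two-sample Kolmogorov-Smirnov test. The sketch computes only the test statistic (the largest gap between the two empirical distribution functions), not a p-value; the function name and the protein levels are invented for illustration, with group sizes of 5 and 7 chosen to mirror the study design.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so check each one.
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Invented levels of one protein in 5 patients and 7 controls (not study data).
patients = [1.0, 2.0, 3.0, 4.0, 5.0]
controls = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
print(ks_statistic(patients, controls))
```

A statistic of 1.0 means the two groups do not overlap at all; values near 0 mean the distributions are nearly indistinguishable.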


Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. 355-355 ◽  
Author(s):  
Kristine Misund ◽  
Niamh Keane ◽  
Yan W Asmann ◽  
Scott Van Wier ◽  
Daniel Riggs ◽  
...  

Abstract MYC expression is frequently dysregulated in multiple myeloma (MM). In one comprehensive study, MYC structural variations (SV) were found in nearly half of MM cases (Affer et al. Leukemia 2014). The prevalence was higher in hyperdiploid (HRD) tumors (65%) compared to non-hyperdiploid (NHRD) tumors (36%). The large amount of tumor DNA required for all of the genomic studies performed may have biased the samples analyzed (e.g., to those with higher tumor burdens). To validate the findings of recurrent MYC SV in another dataset, we analyzed the CoMMpass data. We analyzed long-insert whole genome sequencing (WGS) data from diagnostic samples in 420 patients from the IA7 release (dbGAP phs000748) for SV. The results of clinical data, processed WES (SNV), RNASeq (gene expression) and WGS (copy number) data from IA8 (http://research.themmrf.org) were used to calculate survival, RAS and NFKB pathway mutation (WES), TC class (RNA), NFKB index (RNA) and hyperdiploid index (WGS). MYC SVs were identified in 38% of tumors. They were present in 53% HRD and 28% NHRD, and by TC in 55% D1, 43% D2, 36% MMSET, 26% MAF, 13% CCND. Juxtaposition of an Ig enhancer (IgH, IgK, IgL) close to MYC was the most common MYC SV, representing ~40% of the MYC SV. Other enhancers identified have mostly been reported previously, with the most frequent being NSMCE2 (12% of SV) and TXNDC5 (5% of SV). Intrachromosomal SV (deletions, inversions, duplications) not associated with any known enhancer were also frequent (18% of SV). As expected, MYC expression was higher in tumors with MYC SV compared to those without (~2.4 fold, p-value<0.0001). We used Gene Set Enrichment Analysis (GSEA) to identify activated pathways that might substitute for MYC in HRD patients without MYC SV and observed significant activation of the NFkB pathway. 
Interestingly, examining patients with RAS/MAPK pathway (NRAS, KRAS, BRAF, FGFR3 - abbreviated RAS) mutations (identified in 50% of all patients) the same pattern was observed: in the absence of RAS mutation, there was a significantly higher NFkB index. In general, the HRD tumors seem to have either activation of RAS and/or MYC, or activation of NFkB (Figure). There was no difference in overall survival in patients with versus those without MYC SV. We developed a clinically applicable sequencing platform to identify MYC SV, which cannot be reliably identified by FISH. We sought to validate this targeted capture approach, where in addition to the coding exons of 81 interesting genes described previously (M3P), we also pulled down the region surrounding MYC (2 Mb), IgH (0.5 Mb), IgK (50 kb) and IgL (100 kb) allowing us to additionally identify SV in MYC and Ig loci. Using this approach we identified IgH translocations in 29/30 samples with translocations previously identified by FISH (97%). Moreover, we identified MYC SV in 19/22 patients with SV previously identified by mate-pair WGS (86%). Importantly, sequencing identified the precise translocation breakpoint, and identity of the enhancer dysregulating MYC, which may be important variables. In one informative patient two different MYC SV were present at diagnosis, only one of which was still present following a partial response to four cycles of lenalidomide and dexamethasone. This suggests that the two MYC SV are in different subclones, one of which was much more sensitive to the treatment. Interestingly, the enhancer dysregulating MYC in the sensitive subclone harbors 5 strong Ikaros binding sites identified by ChIPseq, suggesting one intriguing mechanism for sensitivity to lenalidomide. To summarize, we verified in a large dataset that MYC expression is frequently dysregulated by SV in MM (38%), and the RAS/MAPK (50%) and NFKB (23%) pathways are frequently activated by mutations. 
Surprisingly, given their generally good prognosis, nearly half of HRD tumors seem to be MYC driven, while this is true for only a quarter of NHRD tumors. HRD tumors not driven by MYC or RAS appear to be driven by NFKB. Remarkably, the pathways most commonly activated by mutation in MM (CCND, MYC, RAS, NFKB) are common to many cancers and have been studied extensively individually. To understand their clinical impact in MM we have developed a comprehensive custom capture sequencing panel that identified 97% of IgH translocations, 86% of MYC SV, as well as SNV and CNV of 81 recurrently mutated genes. It will be important to include such a comprehensive genetic analysis to complement clinical trials in the future. Figure. Disclosures Stewart: Bristol Myers Squibb: Consultancy; Celgene: Consultancy; Takeda Oncology: Consultancy; Janssen Pharmaceuticals: Consultancy.


2020 ◽  
Vol 40 (6) ◽  
Author(s):  
Xi-Juan Zhang ◽  
Zhong-Hua Cui ◽  
Yan Dong ◽  
Xiu-Wen Liang ◽  
Yan-Xin Zhao ◽  
...  

Abstract Osteoporosis (OP) is a significant and debilitating comorbidity of chronic obstructive pulmonary disease (COPD). We hypothesize that genetic variance identified with OP may also play roles in COPD. We have conducted a large-scale relation data analysis to explore the genes implicated with either OP or COPD, or both. Each gene linked to OP but not to COPD was further explored in a mega-analysis and partial mega-analysis of 15 independently collected COPD RNA expression datasets, followed by gene set enrichment analysis (GSEA) and literature-based pathway analysis to explore their functional links to COPD. A multiple linear regression (MLR) model was built to study the possible influence of sample size, population region, and study date on the gene expression data in COPD. In the first step of the analysis, we identified 918 genes associated with COPD, 581 with OP, and a significant overlap (P < 2.30e-140; 210 overlapping genes). Partial mega-analysis showed that one OP gene, GPNMB, presented significantly increased expression in COPD patients (P-value = 0.0018; log fold change = 0.83). GPNMB was enriched in multiple COPD pathways and plays a role as a hub gene in multiple vicious COPD pathways that include the genes MMP9 and MYC. GPNMB could be a novel gene that plays roles in both COPD and OP. Partial mega-analysis is valuable in identifying case-specific genes for COPD.


2021 ◽  
Vol 22 (3) ◽  
pp. 1423
Author(s):  
Yuri D’Alessandra ◽  
Mattia Chiesa ◽  
Vera Vigorelli ◽  
Veronica Ricci ◽  
Erica Rurali ◽  
...  

Hematopoietic stem/progenitor cells (HSPCs) participate in cardiovascular (CV) homeostasis and generate different types of blood cells including lymphoid and myeloid cells. Diabetes mellitus (DM) is characterized by a chronic increase of pro-inflammatory mediators, which play an important role in the development of CV disease, and by increased susceptibility to infections. Here, we aimed to evaluate the impact of DM on the transcriptional profile of HSPCs derived from bone marrow (BM). Total RNA of BM-derived CD34+ stem cells purified from sternal biopsies of patients undergoing coronary bypass surgery with or without DM (CAD and CAD-DM patients) was sequenced. The results showed 10566 expressed genes, of which 79% were protein-coding genes and 21% non-coding RNA. We identified 139 differentially expressed genes (p-value < 0.05 and |log2 FC| > 0.5) between the CAD and CAD-DM patient groups. Gene Set Enrichment Analysis (GSEA), based on Gene Ontology biological processes (GO-BP) terms, led to the identification of fourteen overrepresented biological categories in CAD-DM samples. Most of the biological processes were related to lymphocyte activation, chemotaxis, peptidase activity, and innate immune response. Specifically, HSPCs from CAD-DM patients displayed reduced expression of genes coding for proteins regulating antibacterial and antiviral host defense as well as macrophage differentiation and lymphocyte emigration, proliferation, and differentiation. However, within the same biological processes, a consistent number of inflammatory genes coding for chemokines and cytokines were up-regulated. Our findings suggest that DM induces transcriptional alterations in HSPCs, which are potentially responsible for progeny dysfunction.
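The selection rule quoted above (p-value < 0.05 and |log2 FC| > 0.5) amounts to a two-condition filter over per-gene statistics. The following Python sketch shows that filter under stated assumptions: the function name, the placeholder gene labels, and the numbers are hypothetical, and the abstract does not say which software produced the statistics.

```python
def select_degs(results, p_cutoff=0.05, lfc_cutoff=0.5):
    """Return genes passing both the p-value and the |log2 fold change| cutoffs."""
    return [
        gene
        for gene, (p_value, log2_fc) in results.items()
        if p_value < p_cutoff and abs(log2_fc) > lfc_cutoff
    ]


# Hypothetical per-gene results: gene -> (p-value, log2 fold change).
results = {
    "GENE_A": (0.001, 1.20),   # passes both cutoffs (up-regulated)
    "GENE_B": (0.030, -0.80),  # passes both cutoffs (down-regulated)
    "GENE_C": (0.200, 2.00),   # fails the p-value cutoff
    "GENE_D": (0.010, 0.30),   # fails the fold-change cutoff
}
print(select_degs(results))
```

Taking the absolute value of the log2 fold change keeps both up- and down-regulated genes, which matches the mixed direction of change reported in the abstract.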


Genes ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 948
Author(s):  
Maria Oczkowicz ◽  
Tomasz Szmatoła ◽  
Małgorzata Świątkiewicz

It has been known for many years that excessive consumption of saturated fats has proatherogenic properties, contrary to unsaturated fats. However, the molecular mechanism underlying these effects is not fully understood. In this paper, we aimed to identify differentially expressed genes (DEGs) using RNA-sequencing, following feeding pigs with different sources of fat. After comparison of adipose samples from three dietary groups (rapeseed oil (n = 6), beef tallow (n = 5), coconut oil (n = 5)), we identified 29 DEGs (adjusted p-value < 0.05, fold change > 1.3) between beef tallow and rapeseed oil and 2 genes between coconut oil and rapeseed oil groups. No differentially expressed genes were observed between coconut oil and beef tallow groups. Almost all 29 DEGs between rapeseed oil and beef tallow groups are connected to neurodegenerative or cardiovascular diseases, or cancer (e.g., PLAU, CYBB, NCF2, ZNF217, CHAC1, CTCFL). Functional analysis of these genes revealed that they are associated with fluid shear stress response, complement and coagulation cascade, ROS signaling, neurogenesis, and regulation of protein binding and protein catabolic processes. Furthermore, gene set enrichment analysis (GSEA) of the whole datasets from all three comparisons suggests that both beef tallow and coconut oil may trigger changes in the expression level of genes crucial in the pathogenesis of civilization diseases.


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 3448-3448
Author(s):  
Harumi Kato ◽  
Kazuhito Yamamoto ◽  
Kennosuke Karube ◽  
Miyuki Katayama ◽  
Shinobu Tsuzuki ◽  
...  

Abstract 3448 Age-related EBV-associated B-cell lymphoproliferative disorder (AR-EBLPD) is classified as a subtype of diffuse large B-cell lymphoma (DLBCL) according to the WHO classification. However, molecular genetic characterization of AR-EBLPD remains largely unknown. We studied expression profiles of 5 AR-EBLPD and 8 EB-negative DLBCL samples using the Agilent 44K human oligonucleotide microarray. Total RNA was extracted from fresh-frozen tumor samples. Each microarray slide was converted into datasets using the Agilent Micro Array Scanner and Feature extractions. Data was standardized with Z-scores. Differences in mRNA expression levels between two sample groups were calculated using a two-sided t-test. A total of 1973 probes showed a p-value less than 0.05 with less than a 25% false discovery rate (FDR). These probes included 1688 genes. The number of probes showing high expression in AR-EBLPD and EB-negative DLBCL was 804 (693 genes) and 1169 (995 genes), respectively. First, we selected the top 300 differentially expressed genes. Genes highly expressed in AR-EBLPD included IL6, TNFAIP3, HOPX, and SLAMF1. IL6 is known as a gene encoding a cytokine which functions in inflammation and the maturation of B lymphocytes, and TNFAIP3 is known as a negative regulatory gene of the NF-kB pathway. HOPX and SLAMF1 are reported as genes related to lymphocyte function or the immune system (Schwartzberg et al. Nature immunology 2009, Hawiger et al. Nature immunology 2011). For better characterization, we next performed Gene Ontology Analysis using the WEB-based GEne SeT AnaLysis Toolkit and found that categories of external stimulus and inflammatory responses were enriched in AR-EBLPD. The Kyoto Encyclopedia of Genes and Genomes (KEGG)-signaling analyses showed that pathways of the NOD-like receptor (p-value =1.30e-06), JAK-STAT (p-value =9.01e-06), and Toll-like receptor (p-value =0.0002) were characteristic of AR-EBLPD. 
These results implied that inflammation would be prominent in AR-EBLPD cases. For validation, we next performed Gene Set Enrichment Analysis (GSEA) using all the database of KEGG pathways (186 gene sets). Dominant gene sets in AR-EBLPD included the cytokine-cytokine receptor interaction [Normalized Enrichment Score (NES) =2.66, p-value<0.001], NOD-like receptor pathway (NES =2.26, p-value<0.001), TOLL-like receptor pathway (NES =2.14, p-value<0.001), and JAK-STAT pathway (NES =1.79, p-value<0.001). Since all the pathways were related to the NF-kB pathway, inflammatory responses were suggested to activate the NF-kB pathway or vice versa. For confirmation, we finally performed GSEA using gene sets of the NF-kB pathway, which were obtained from a gene set reported by an NIH group (Puente et al. Nature 2011) and 30 gene sets in the GSEA database, and found that the gene sets of the NF-kB pathway were enriched in AR-EBLPD (Figure 1). Our results suggested that the inflammatory and immune-related genes were enriched in AR-EBLPD and that activation of the genes may be associated with NF-kB activation. Aberrant immune and inflammatory responses could define the clinical presentations of AR-EBLPD cases. (Figure 1) Gene Set Enrichment Analysis of 5 AR-EBLPD and 8 EB-negative DLBCL samples. The NF-kB signature reported from an NIH group (Puente et al. Nature 2011) was enriched in AR-EBLPD [Normalized Enrichment Score (NES) =2.20, p-value<0.001]. Disclosures: No relevant conflicts of interest to declare.
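The enrichment scores reported above rest on a running-sum statistic computed over a ranked gene list. The sketch below shows an unweighted, Kolmogorov-Smirnov-like version of that statistic in Python, as an illustration only: the GSEA software typically weights hits by their ranking metric and derives the normalized enrichment score (NES) and p-value from permutations, neither of which is reproduced here. The function name and gene labels are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score over a ranked gene list."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list only partially")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up at genes in the set, step down at all other genes.
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        # Track the largest deviation from zero, keeping its sign.
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking, most up-regulated in one group first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranked, {"g1", "g2"}))  # set concentrated at the top
print(enrichment_score(ranked, {"g5", "g6"}))  # set concentrated at the bottom
```

A positive score indicates that the gene set clusters toward the top of the ranking and a negative score toward the bottom, which is why the sign of an enrichment score is read as the direction of enrichment.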


2019 ◽  
Vol 2019 ◽  
pp. 1-10
Author(s):  
Jayeon Lim ◽  
SoYoun Bang ◽  
Jiyeon Kim ◽  
Cheolyong Park ◽  
JunSang Cho ◽  
...  

As a large amount of genetic data is accumulated, an effective analytical method and a meaningful interpretation are required. Recently, various methods of machine learning have emerged to process genetic data. In addition, machine learning analysis tools using statistical models have been proposed. In this study, we propose adding an integrated layer to the deep learning structure, which would enable the effective analysis of genetic data and the discovery of significant biomarkers of diseases. We conducted a simulation study in order to compare the proposed method with metalogistic regression and meta-SVM methods. The objective function with lasso penalty is used for parameter estimation, and the Youden J index is used for model comparison. The simulation results indicate that the proposed method is more robust to the variance of the data than metalogistic regression and meta-SVM methods. We also conducted an analysis of real data (breast cancer data (TCGA)). Based on the results of gene set enrichment analysis, we found that the TCGA multiple omics data involve significantly enriched pathways which contain information related to breast cancer. Therefore, it is expected that the proposed method will be helpful in discovering biomarkers.
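The Youden J index used above for model comparison is defined as sensitivity + specificity - 1. A minimal Python sketch follows; the function name and the toy labels are assumptions for illustration, and the abstract does not describe how predictions were thresholded.

```python
def youden_j(y_true, y_pred):
    """Youden J index = sensitivity + specificity - 1 for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# Invented labels and predictions, not results from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(youden_j(y_true, y_pred))
```

The index ranges from -1 to 1, with 0 corresponding to a classifier no better than chance and 1 to perfect separation, so a higher J favors one model over another at equal class balance assumptions.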


2020 ◽  
Vol 48 (10) ◽  
pp. 030006052096127
Author(s):  
Zuo Zhou ◽  
Bing Wang

Objective To identify male infertility-related long non-coding (lnc)RNAs and an lncRNA-related competing endogenous (ce)RNA network. Methods Expression data including 13 normospermic and eight teratozoospermic samples from postmortem donors were downloaded from the GEO database (GSE6872). The limma R package was used to discriminate dysregulated lncRNA and messenger (m)RNA profiles. Gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses of differentially expressed (DE) mRNAs were performed using the clusterProfiler R package. The ceRNA network of dysregulated genes was visualized by Cytoscape. Results A total of 101 DE lncRNAs and 1722 mRNAs were identified as male infertility-specific RNAs with thresholds of |log2FoldChange| >2.0 and adjusted P-value <0.05. GO and KEGG pathways were analyzed for DE mRNAs. Gene set enrichment analysis revealed that DE genes were enriched in embryonic skeletal system development and cytokine–cytokine receptor interactions. A ceRNA network was constructed with 26 key lncRNAs, 33 microRNAs, and 133 mRNAs. DE lncRNAs in male sterility were mainly associated with transferring phosphorus-containing groups and complexes of histone methyltransferases, methyltransferases, PcG proteins, and serine/threonine protein kinases. Conclusion This provides a novel perspective for studying lncRNA-related ceRNA networks in male infertility and may assist in identifying new potential biomarkers for diagnostic purposes.


2021 ◽  
Author(s):  
Jing Tang ◽  
Hongqaun Ye ◽  
Wan qi

Abstract Background: Tumor infiltration, which is known to be associated with the initiation and progression of various cancers, is a potential therapeutic target for melanoma, an aggressive skin cancer. Methods: The single-sample gene set enrichment analysis (ssGSEA) algorithm was applied to assess the relative expression of 24 types of immune cells from a public database. First, the differentially expressed immune cells between melanomas and normal samples were identified. Next, multiple machine learning algorithms were applied to evaluate the efficiency of immune cells in the diagnosis of melanoma. In addition, feature selection in the machine learning methods was used to identify the most important prognostic immune cells for developing a biomarker to predict the prognosis of melanoma. Results: By comparing the expression of immune cells in tumors and normal controls, we built immune diagnostic models in the training dataset, which can accurately distinguish melanoma patients from normal controls (LR AUC = 0.965, RF AUC = 0.99, SVM AUC = 0.963, LASSO AUC = 0.964 and NNET AUC = 0.989). These diagnostic models were also validated in three external datasets and showed over 90% sensitivity and specificity in distinguishing melanomas from normal samples. Moreover, we developed a robust immune cell biomarker that could estimate the prognosis of melanoma. This biomarker was further validated in internal and external datasets. Next, we constructed a nomogram combining the biomarker risk score and clinical characteristics, which showed good accuracy in predicting 3- and 5-year survival. The decision curve of the nomogram model showed a higher net benefit than tumor stage. In addition, melanoma patients were divided into high- and low-risk subgroups using the risk score system. The high-risk group had a significantly shorter survival time than the low-risk subgroup. 
Gene Set Enrichment Analysis (GSEA) revealed that complement, epithelial-mesenchymal transition, inflammatory response, and other pathways were significantly activated in the high-risk group. Conclusions: We constructed immune cell-related diagnostic and prognostic models, which could provide new clinical applications for diagnosing melanoma and predicting the survival of melanoma patients.

