Supervised Hierarchical Autoencoders for Multi-Omics Integration in Cancer Survival Models

2021 ◽  
Author(s):  
David Wissel ◽  
Daniel Rowson ◽  
Valentina Boeva

With the increasing amount of high-throughput sequencing data becoming available, the proper integration of differently sized and heterogeneous molecular and clinical groups of variables has become crucial in cancer survival models. Because multi-omics integration is difficult, the Cox Proportional-Hazards (Cox PH) model using clinical data has remained one of the best-performing methods [Herrmann et al., 2021]. This motivates the need for new models that can successfully perform multi-omics integration in survival models and outperform the Cox PH model. Furthermore, there is a strong need to make multi-omics models sparser and more interpretable to encourage their use in clinical settings. We developed a neural architecture, termed Supervised Hierarchical Autoencoder (SHAE), based on supervised autoencoders and Sparse-Group-Lasso regularization. Our new method performed competitively with the best-performing statistical models used for multi-omics survival analysis. Moreover, it outperformed the Cox PH model using clinical data across all 17 cancers from The Cancer Genome Atlas (TCGA) considered in our work. We further showed that surrogate linear models for SHAE trained on a subset of multi-omics groups achieved competitive performance at consistently high sparsity levels, enabling their use in clinics. Alternatively, surrogate models can act as a feature selection step, permitting improved performance in arbitrary downstream survival models. Code for the reproduction of our results is available on GitHub.
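The Sparse-Group-Lasso penalty named above has two parts. A group-level ℓ2 term can switch off an entire group of variables at once, and an elementwise ℓ1 term sparsifies within groups. The following is a minimal sketch of the commonly used form λ[(1-α) Σ_g √p_g ‖w_g‖₂ + α ‖w‖₁]. It assumes one group per omics block, where p_g is the group size. It is not SHAE's implementation, and the function and group names are hypothetical.

```python
import math


def sparse_group_lasso_penalty(weights, lam, alpha):
    """Sparse-group-lasso penalty.

    lam * [(1 - alpha) * sum_g sqrt(p_g) * ||w_g||_2 + alpha * ||w||_1]

    weights: dict mapping a (hypothetical) group name, e.g. one omics block,
             to the list of coefficients belonging to that group.
    alpha:   mixing weight in [0, 1]; 0 gives pure group lasso, 1 pure lasso.
    """
    group_term = 0.0
    l1_term = 0.0
    for coefs in weights.values():
        p_g = len(coefs)
        # Group l2 norm, scaled by sqrt(group size) so large groups are not favoured.
        group_term += math.sqrt(p_g) * math.sqrt(sum(c * c for c in coefs))
        # Elementwise l1 norm over the same coefficients.
        l1_term += sum(abs(c) for c in coefs)
    return lam * ((1.0 - alpha) * group_term + alpha * l1_term)


# A fully zeroed group (here "mrna") contributes nothing to either term.
penalty = sparse_group_lasso_penalty(
    {"clinical": [3.0, 4.0], "mrna": [0.0, 0.0, 0.0]}, lam=1.0, alpha=0.5
)
```

Because the group term is non-differentiable only when a whole group is zero, optimizing with this penalty tends to drop entire omics blocks. This is the sparsity pattern the abstract relies on for its interpretable surrogate models.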

2020 ◽  
Vol 38 (15_suppl) ◽  
pp. 6082-6082
Author(s):  
Luyang Zhao ◽  
Dai Yibo ◽  
Yuanjin Hu ◽  
Zhiqi Wang ◽  
Chengcheng Li ◽  
...  

6082 Background: Endometrial cancers have been categorized into four genomic classes by The Cancer Genome Atlas Research Network (TCGA) through comprehensive genomic analysis. However, TCGA molecular subtypes are hard to use in the clinic because of their high cost, and a simplified approach based on the POLE and TP53 genes cannot fully differentiate the four subtypes. Therefore, more convenient and reliable biomarkers need to be identified for clinical practice. Methods: Whole-exome sequencing and RNA sequencing data for 515 patients with endometrial carcinomas were downloaded from TCGA. Mutations in 48 genes of homologous recombination repair (HR) signaling were defined as HR mutation. Associations of HR mutation with survival and RNA expression were analyzed. Gene set enrichment analysis (GSEA) was used to investigate the gene signaling. Results: HR mutation was associated with prolonged disease-specific survival (DSS) (HR, 0.39; 95% CI, 0.22-0.71; P = 0.002), progression-free survival (PFS) (HR, 0.46; 95% CI, 0.31-0.68; P < 0.001) and overall survival (OS) (HR, 0.45; 95% CI, 0.28-0.72; P = 0.001) in endometrial cancers. HR mutation was associated with clinical characteristics, including histological type (P < 0.05). In the multivariable Cox proportional hazards regression model including FIGO 2008 stage, histological type, tumor grade, TCGA subtype, TP53 mutation, and POLE mutation, the association between HR mutation and PFS remained significant (HR, 0.48; 95% CI, 0.27-0.86; P < 0.05), indicating that HR mutation is an independent prognostic factor for PFS. HR mutations were associated with a higher tumor mutation burden. GSEA suggested that HR mutation was associated with increased expression of genes related to activated T cells, immune cytolytic activity, and IFN-γ release. In microsatellite-stable (MSS) endometrial cancers, HR mutation was still associated with longer PFS (HR, 0.57; 95% CI, 0.34-0.98; P = 0.04), suggesting that HR mutation may help predict the effect of immunotherapy in MSS endometrial carcinoma.
Conclusions: HR mutation was associated with a favorable prognosis through an increased T-cell signature. Identification of HR mutation by genomic profiling provides a potentially novel and convenient approach for predicting prognosis in endometrial cancer patients, independent of the four TCGA subtypes, and offers a basis for screening patients who may benefit from immune checkpoint blockers (ICBs) in endometrial cancer in the future.
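The reported hazard ratios, confidence intervals and P values are linked by the Wald test that Cox models use. The sketch below back-calculates an approximate Wald z statistic and two-sided P value from a published 95% CI. It assumes the interval was built as exp(β ± 1.96·SE) on the log-hazard scale, which is standard for Cox regression. Because the CI limits are rounded in print, the result is only approximate. The function name `wald_from_ci` is hypothetical.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.96):
    """Approximate Wald z and two-sided P value for a hazard ratio.

    Assumes the 95% CI was computed on the log scale as exp(log(hr) +/- z_crit * se),
    so the standard error can be recovered from the width of the log-CI.
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# DSS estimate from the abstract: HR 0.39, 95% CI 0.22-0.71.
z_dss, p_dss = wald_from_ci(0.39, 0.22, 0.71)
```

For the DSS estimate this gives z ≈ -3.15 and P ≈ 0.0016, consistent with the reported P = 0.002. The PFS estimate (HR 0.46, 95% CI 0.31-0.68) gives P well below 0.001, as reported.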


Biomedicines ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. 453
Author(s):  
Yu-Han Wang ◽  
Shih-Ching Chang ◽  
Muhamad Ansar ◽  
Chin-Sheng Hung ◽  
Ruo-Kai Lin

Colorectal cancer (CRC) arises from chromosomal instability, resulting from aberrant hypermethylation in tumor suppressor genes. This study identified hypermethylated genes in CRC and investigated how they affect clinical outcomes. Methylation levels of specific genes were analyzed in The Cancer Genome Atlas dataset and in 20 breast cancer, 16 esophageal cancer, 33 lung cancer, 15 uterine cancer, 504 CRC, and 9 colon polyp tissues and 102 CRC plasma samples from a Taiwanese cohort. In the Asian cohort, Eps15 homology domain-containing protein 3 (EHD3) showed twofold higher methylation in 44.4% of patients with colonic polyps, 37.3% of plasma samples from CRC patients, and 72.6% of CRC tissues, which was associated with vascular invasion and high microsatellite instability. Furthermore, EHD3 hypermethylation was detected in other gastrointestinal cancers. In the Asian CRC cohort, low EHD3 mRNA expression was found in 45.1% of patients and was associated with lymph node metastasis. Multivariate Cox proportional-hazards survival analysis revealed that hypermethylation in women and low mRNA expression were associated with overall survival. In the Western CRC cohort, EHD3 hypermethylation was also associated with overall survival and with lower response rates to chemotherapy and antimetabolites. In conclusion, EHD3 hypermethylation contributes to the development of CRC in both Asian and Western populations.


2021 ◽  
Vol 11 (8) ◽  
pp. 1306-1312
Author(s):  
Li Song ◽  
Ningchao Du ◽  
Haitao Luo ◽  
Furong Li

This study aimed to identify the association of protein-coding and long non-coding RNA genes with immunotherapy response in melanoma. Based on RNA sequencing data of melanoma specimens, the expression levels of protein-coding and long non-coding RNA genes were quantified with the Kallisto RNA-seq quantification method, and differentially expressed genes were detected with DESeq2. Cox proportional hazards regression was used to evaluate the effects of gene expression on survival. Comparing 14 patients who responded to the drug with 11 who did not, 18 protein-coding genes and 14 long non-coding RNAs were differentially expressed (fold change > 2 and P < 0.01 after multiple-testing correction), and the differentially expressed coding genes were significantly enriched in the cell adhesion process (P < 0.01). Survival analysis showed that 18 coding genes and 14 long non-coding RNA genes had significant effects on patient survival (P < 0.01). In this study, magnetic nanoparticles were used to extract genomic DNA and total RNA, owing to their paramagnetism and biocompatibility, before high-throughput transcriptome sequencing was performed. The method has the advantages of eliminating hazardous reagents such as phenol and chloroform, replacing inorganic coatings such as silica with organic oil, and shortening reaction time. Protein-coding and long non-coding RNA genes, as well as magnetic nanoparticles, may serve as potential cancer immune biomarker targets for developing future oncological treatments.
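The selection rule above (fold change > 2 and corrected P < 0.01) can be sketched as a filter over per-gene results. DESeq2 itself is an R package. The sketch below only illustrates the thresholding step, under two assumptions. First, "after correction" is taken to mean Benjamini-Hochberg adjustment, which is DESeq2's default for adjusted P values. Second, "fold change > 2" is taken in either direction (|log2 FC| > 1). The input tuples and function names are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de_genes(results, fc_cutoff=2.0, padj_cutoff=0.01):
    """Keep genes with |log2 fold change| > log2(fc_cutoff) and BH-adjusted P < padj_cutoff.

    results: list of (gene, fold_change, pvalue) tuples (hypothetical layout),
             where fold_change is a positive ratio (responders / non-responders).
    """
    padj = benjamini_hochberg([p for _, _, p in results])
    log_cut = math.log2(fc_cutoff)
    return [
        gene
        for (gene, fc, _), q in zip(results, padj)
        if abs(math.log2(fc)) > log_cut and q < padj_cutoff
    ]


# Toy input: A and C pass both thresholds; B fails the fold change, D the P value.
hits = select_de_genes([("A", 4.0, 1e-4), ("B", 1.5, 1e-4), ("C", 0.25, 2e-4), ("D", 3.0, 0.5)])
```

Filtering on |log2 FC| treats a fourfold decrease (0.25) and a fourfold increase (4.0) symmetrically, which is why gene C passes.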


2019 ◽  
Author(s):  
Wikum Dinalankara ◽  
Qian Ke ◽  
Donald Geman ◽  
Luigi Marchionni

Abstract Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample-level analysis, and is applicable to many different omics platforms. The divergence package, available for R through the Bioconductor repository, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high-throughput sequencing data from The Cancer Genome Atlas.
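The univariate digitization idea can be illustrated with a simplified sketch. Each feature's baseline range is taken as an empirical quantile interval of the baseline samples. A new sample is then coded -1, 0 or +1 depending on whether each value falls below, inside or above that range. The binary code is 1 for any divergence and 0 otherwise. The 5%/95% quantiles and all function names are assumptions for illustration. This is not the divergence package's API or its exact baseline-support estimation.

```python
import statistics


def baseline_bounds(baseline, low_pct=5, high_pct=95):
    """Per-feature (lower, upper) bounds from baseline samples.

    baseline: list of samples, each a list of feature values (at least 2 samples).
    low_pct/high_pct: integer percentiles defining the assumed baseline interval.
    """
    bounds = []
    for j in range(len(baseline[0])):
        values = [sample[j] for sample in baseline]
        # 99 cut points; cuts[k - 1] is the k-th percentile.
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        bounds.append((cuts[low_pct - 1], cuts[high_pct - 1]))
    return bounds


def ternary_code(sample, bounds):
    """-1 below the baseline interval, 0 inside it, +1 above it."""
    return [-1 if x < lo else (1 if x > hi else 0) for x, (lo, hi) in zip(sample, bounds)]


def binary_code(sample, bounds):
    """1 if a feature diverges from the baseline in either direction, else 0."""
    return [abs(c) for c in ternary_code(sample, bounds)]


# Toy baseline: 21 samples, two features.
baseline = [[float(i), 10.0 + i] for i in range(21)]
bounds = baseline_bounds(baseline)
codes = ternary_code([25.0, 5.0], bounds)
```

Once the bounds are fixed, each sample is coded independently of the others. This is what makes the sample-level analysis mentioned above possible.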


2021 ◽  
Author(s):  
Wei Yan ◽  
Dan-dan Wang ◽  
He-da Zhang ◽  
Jinny Huang ◽  
Jun-Chen Hou ◽  
...  

Abstract Background: The structural maintenance of chromosome (SMC) gene family, comprising 6 members, is involved in a wide spectrum of biological functions in many types of human cancers. However, there is little research on the expression profile and prognostic value of SMC genes in hepatocellular carcinoma (HCC). Based on updated public resources and integrative bioinformatics analysis, we sought to determine the value of SMC gene expression in predicting the risk of developing HCC. Methods and materials: The expression data of SMC family members were obtained from The Cancer Genome Atlas (TCGA). The prognostic values of SMC members and clinical features were identified. A gene set enrichment analysis (GSEA) was conducted to explore the mechanism underlying the involvement of SMC members in liver cancer. The associations between tumor-infiltrating immune cells (TIICs) and the SMC family members were evaluated using the Tumor Immune Estimation Resource (TIMER) database. Results: Our analysis demonstrated that mRNA downregulation of SMC genes was a common alteration in HCC patients. SMC1A, SMC2, SMC3, SMC4, and SMC6 were upregulated in HCC. Upregulation of SMC2, SMC3, and SMC4, along with clinical stage, was associated with a poor HCC prognosis based on univariate and multivariate Cox proportional hazards regression analyses. SMC2, SMC3, and SMC4 were also related to tumor purity and immune infiltration levels in HCC. The GSEA results indicated that SMC members participate in multiple biological processes underlying tumorigenesis. Conclusion: This study comprehensively analyzed the expression of SMC gene family members in patients with HCC. These results can provide insights for further investigation of SMC family members as potential targets in HCC and suggest that SMC inhibitors targeting SMC2, SMC3, and SMC4 may be an effective strategy for HCC therapy.


2018 ◽  
Vol 28 (5) ◽  
pp. 1523-1539
Author(s):  
Simon Bussy ◽  
Agathe Guilloux ◽  
Stéphane Gaïffas ◽  
Anne-Sophie Jannot

We introduce a supervised learning mixture model for censored durations (C-mix) to simultaneously detect subgroups of patients with different prognoses and order them based on their risk. Our method is applicable in a high-dimensional setting, i.e. with a large number of biomedical covariates. Indeed, we penalize the negative log-likelihood by the Elastic-Net, which leads to a sparse parameterization of the model and automatically pinpoints the relevant covariates for survival prediction. Inference is achieved using an efficient Quasi-Newton Expectation Maximization algorithm, for which we provide convergence properties. The statistical performance of the method is examined in an extensive Monte Carlo simulation study and finally illustrated on three publicly available genetic cancer datasets with high-dimensional covariates. We show that our approach outperforms the state-of-the-art survival models in this context, namely both the CURE and Cox proportional hazards models penalized by the Elastic-Net, in terms of C-index, AUC(t) and survival prediction. Thus, we propose a powerful tool for personalized medicine in oncology.
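The C-index used above to compare survival models can be illustrated with Harrell's version. Consider usable pairs in which the subject with the shorter observed time had an event. The C-index is the fraction of those pairs in which that subject also received the higher risk score, with tied risk scores counting one half. The sketch below makes two simplifications: it ignores pairs with tied observed times, and it uses an O(n²) loop. It is not the evaluation code used in the paper, and the function name is hypothetical.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data.

    times:  observed times (event or censoring).
    events: 1 if the event was observed, 0 if censored.
    risks:  predicted risk scores (higher = expected to fail sooner).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a usable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Perfectly ordered risks give 1.0; a constant score gives 0.5.
c_perfect = harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1])
```

A value of 0.5 corresponds to random ordering, so the C-index reports how well a model ranks patients by risk. It does not measure calibration of predicted survival times.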


2020 ◽  
Vol 49 (4) ◽  
pp. 1366-1377 ◽  
Author(s):  
Xiaoyan Wang ◽  
Rohit P Ojha ◽  
Sonia Partap ◽  
Kimberly J Johnson

Abstract Background Differences in access, delivery and utilisation of health care may impact childhood and adolescent cancer survival. We evaluated whether insurance coverage impacts survival among US children and adolescents with cancer diagnoses, overall and by age group, and explored potential mechanisms. Methods Data from 58 421 children (aged ≤14 years) and adolescents (15–19 years), diagnosed with cancer from 2004 to 2010, were obtained from the National Cancer Database. We examined associations between insurance status at initial diagnosis or treatment and diagnosis stage, any treatment received, and mortality using logistic regression, Cox proportional hazards (PH) regression, restricted mean survival time (RMST) and mediation analyses. Results Relative to privately insured individuals, the hazard of death (all-cause) was increased and survival months were decreased in those with Medicaid [hazard ratio (HR) = 1.27, 95% confidence interval (CI): 1.22 to 1.33; and −1.73 months, 95% CI: −2.07 to −1.38] and no insurance (HR = 1.32, 95% CI: 1.20 to 1.46; and −2.13 months, 95% CI: −2.91 to −1.34). The HR for Medicaid vs. private insurance was larger (P for interaction < 0.001) in adolescents (HR = 1.52, 95% CI: 1.41 to 1.64) than in children (HR = 1.16, 95% CI: 1.10 to 1.23). Despite statistical evidence of violation of the PH assumption, RMST results supported all interpretations. Earlier diagnosis for staged cancers in the Medicaid and uninsured populations accounted for an estimated 13% and 19% of the survival deficit, respectively, vs. the privately insured population. Any treatment received did not account for insurance-associated survival differences in children and adolescents with cancer. Conclusions Children and adolescents without private insurance had a higher risk of death and shorter survival within 5 years following cancer diagnosis. Additional research is needed to understand underlying mechanisms.
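Restricted mean survival time is the area under the survival curve up to a horizon τ, RMST(τ) = ∫₀^τ S(t) dt. A difference in RMST between groups is therefore expressed in time units (months above), and it stays interpretable when the proportional hazards assumption fails. The following is a minimal, unadjusted sketch that estimates S(t) with Kaplan–Meier and integrates the resulting step function. The study's covariate adjustment and variance estimation are not reproduced, and the function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) steps at distinct event times."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects (events and censorings) sharing this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, surv))
        n_at_risk -= removed  # everyone observed at t leaves the risk set afterwards
    return steps


def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM step curve on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)
    return area


# Without censoring, RMST(tau) equals the mean of min(T, tau): (1 + 2 + 2.5 + 2.5) / 4.
value = rmst([1, 2, 3, 4], [1, 1, 1, 1], tau=2.5)
```

In a two-group comparison, computing `rmst` separately for each insurance group and subtracting gives an unadjusted analogue of the survival-month differences reported above.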


Author(s):  
Josje D. Schoufour ◽  
Alyt Oppewal ◽  
Hanne J.K. van der Maarl ◽  
Heidi Hermans ◽  
Heleen M. Evenhuis ◽  
...  

Abstract We studied the association between multimorbidity, polypharmacy, and mortality in 1,050 older adults (50+) with intellectual disability (ID). Multimorbidity (presence of ≥ 4 chronic health conditions) and polypharmacy (presence of ≥ 5 chronic medication prescriptions) were assessed at baseline. Multimorbidity included a wide range of disorders, including hearing impairment, thyroid dysfunction, autism, and cancer. Mortality data were collected during a 5-year follow-up period. Cox proportional hazards models were used to determine the independent associations of multimorbidity and polypharmacy with survival. Models were adjusted for age, sex, level of ID, and the presence of Down syndrome. We observed that people classified as having multimorbidity or polypharmacy at baseline were 2.60 (95% CI = 1.86–3.66) and 2.32 (95% CI = 1.70–3.16) times more likely to die during the follow-up period, respectively, independent of age, sex, level of ID, and the presence of Down syndrome. Although slightly attenuated, we found similar hazard ratios when the model for multimorbidity was adjusted for polypharmacy and vice versa. We showed for the first time that multimorbidity and polypharmacy are strong predictors of mortality in people with ID. Awareness of and screening for these conditions are important so that existing treatments can be started as soon as possible. Future research is required to develop interventions for older people with ID, aiming to reduce the incidence of polypharmacy and multimorbidity.


Neurosurgery ◽  
2019 ◽  
Vol 87 (1) ◽  
pp. 63-70
Author(s):  
Haruhisa Fukuda ◽  
Daisuke Sato ◽  
Yoriko Kato ◽  
Wataro Tsuruta ◽  
Masahiro Katsumata ◽  
...  

Abstract BACKGROUND Flow diverters (FDs) have marked the beginning of innovations in the endovascular treatment of large unruptured intracranial aneurysms, but no multi-institutional studies have been conducted on these devices from both the clinical and economic perspectives. OBJECTIVE To compare retreatment rates and healthcare expenditures between FDs and conventional coiling-based treatments in all eligible cases in Japan. METHODS We identified patients who had undergone endovascular treatments during the study period (October 2015-March 2018) from a national-level claims database. The outcome measures were retreatment rates and 1-year total healthcare expenditures, which were compared among patients who had undergone FD, coiling, and stent-assisted coiling (SAC) treatments. The coiling and SAC groups were further categorized according to the number of coils used. Retreatment rates were analyzed using Cox proportional hazards models, and total expenditures were analyzed using multilevel mixed-effects generalized linear models. RESULTS The study sample comprised 512 FD patients, 1499 coiling patients, and 711 SAC patients. The coiling groups with ≥10 coils and ≥9 coils had significantly higher retreatment rates than the FD group, with hazard ratios (95% CI) of 2.75 (1.30-5.82) and 2.52 (1.24-5.09), respectively. In addition, the coiling group with ≥10 coils and the SAC group with ≥10 coils had significantly higher 1-year expenditures than the FD group, with cost ratios (95% CI) of 1.30 (1.13-1.49) and 1.31 (1.15-1.50), respectively. CONCLUSION In this national-level study, FDs demonstrated significantly lower retreatment rates and total expenditures than conventional coiling with ≥ 9 coils.


Genetics ◽  
2019 ◽  
Vol 213 (4) ◽  
pp. 1209-1224 ◽  
Author(s):  
Juho A. J. Kontio ◽  
Mikko J. Sillanpää

Gaussian process (GP)-based automatic relevance determination (ARD) is known to be an efficient technique for identifying determinants of gene-by-gene interactions important to trait variation. However, the estimation of GP models is feasible only for low-dimensional datasets (∼200 variables), which severely limits application of the GP-based ARD method to high-throughput sequencing data. In this paper, we provide a nonparametric prescreening method that preserves virtually all the major benefits of the GP-based ARD method and extends its scalability to the typical high-dimensional datasets used in practice. In several simulated test scenarios, the proposed method compared favorably with existing nonparametric dimension reduction/prescreening methods suitable for higher-order interaction searches. As a real-data example, the proposed method was applied to a high-throughput dataset downloaded from The Cancer Genome Atlas (TCGA) with measured expression levels of 16,976 genes (after preprocessing) from patients diagnosed with acute myeloid leukemia.
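ARD works by giving each input variable its own length scale in the GP covariance function. A variable with a very large fitted length scale barely changes the kernel value, so it can be judged irrelevant. The following is a minimal squared-exponential ARD kernel that illustrates only that mechanism. It is not the prescreening method proposed in the paper, and the function name and parameter values are hypothetical.

```python
import math


def ard_se_kernel(x, y, lengthscales, variance=1.0):
    """Squared-exponential covariance with one length scale per input (ARD).

    k(x, y) = variance * exp(-0.5 * sum_d ((x_d - y_d) / l_d) ** 2)
    A large l_d makes dimension d nearly irrelevant to the covariance.
    """
    scaled_sq_dist = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscales))
    return variance * math.exp(-0.5 * scaled_sq_dist)


# Moving 5 units along an irrelevant dimension (length scale 1e6) leaves k near 1;
# the same move along a relevant dimension (length scale 1) drives k toward 0.
k_irrelevant = ard_se_kernel([0.0, 0.0], [0.0, 5.0], [1.0, 1e6])
k_relevant = ard_se_kernel([0.0, 0.0], [0.0, 5.0], [1.0, 1.0])
```

Relevance is often summarized as the inverse length scale 1/l_d. Fitting one such parameter per variable is part of what makes full GP-based ARD expensive at the scale of thousands of genes.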

