Pan-cancer machine learning predictors of primary site of origin and molecular subtype

2018 ◽  
Author(s):  
William F. Flynn ◽  
Sandeep Namburi ◽  
Carolyn A. Paisie ◽  
Honey V. Reddi ◽  
Sheng Li ◽  
...  

Abstract
Background. The American Cancer Society estimates that approximately 5% of all metastatic tumors have no defined primary site (tissue) of origin and are classified as cancers of unknown primary (CUPs). The current standard of care for CUP patients depends on immunohistochemistry (IHC) based approaches to identify the primary site. The addition of post-mortem evaluation to IHC based tests helps to reveal the identity of the primary site for only 25% of the CUPs, emphasizing the acute need for better methods of determining the site of origin. CUP patients are therefore given generic chemotherapeutic agents, resulting in poor prognosis. When the tissue of origin is known, patients can be given site-specific therapy with significant improvement in clinical outcome. Similarly, identifying the primary site of origin of metastatic cancer is of great importance for designing treatment. Identification of the primary site of origin is an important first step but may not be sufficient information for optimal treatment of the patient. Recent studies, primarily from The Cancer Genome Atlas (TCGA) project but also from others, have revealed molecular subtypes in several cancer types with distinct clinical outcomes. The molecular subtype captures the fundamental mechanisms driving the cancer and provides information that is essential for the optimal treatment of a cancer. Thus, along with primary site of origin, molecular subtype of a tumor is emerging as a criterion for personalized medicine and patient entry into clinical trials. However, there is no comprehensive toolset available for precise identification of tissue of origin or molecular subtype for precision medicine and translational research.
Methods and Findings. We posited that metastatic tumors will harbor the gene expression profiles of the primary site of origin of the cancer. Therefore, we decided to learn the molecular characteristics of the primary tumors using the large number of cancer genome profiles available from the TCGA project. Our predictors were trained for 33 cancer types and for the 11 cancers where there are established molecular subtypes. We estimated the accuracy of several machine learning models using cross-validation methods. Extensive testing on independent test sets revealed that the predictors had a median sensitivity and specificity of 97.2% and 99.9%, respectively, without losing classification of any tumor. Subtype classifiers achieved median sensitivity of 87.7% and specificity of 94.5% via cross-validation and presented median sensitivity of 79.6% and specificity of 94.6% in two external datasets of 1,999 total samples. Importantly, these external data show that our classifiers can robustly predict the primary site of origin from external microarray data, metastatic cancer data, and patient-derived xenograft (PDX) data.
Conclusion. We have demonstrated the utility of gene expression profiles to solve the important clinical challenge of identifying the primary site of origin and the molecular subtype of cancers based on machine learning algorithms. We show, for the first time to our knowledge, that our pan-cancer classifiers can predict multiple cancers’ primary site of origin from metastatic samples. The predictors will be made available as open source software, freely available for academic non-commercial use.
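The median sensitivity and specificity quoted in this abstract are summaries over per-class values. The following is a minimal sketch of how such a summary could be computed with one-vs-rest counts; it is not the authors' pipeline. The function name, the toy labels, and the choice to score only classes present in the true labels are assumptions made for illustration.

```python
from statistics import median

def per_class_sensitivity_specificity(y_true, y_pred):
    """Return {label: (sensitivity, specificity)} from one-vs-rest counts.

    Only labels that occur in y_true are scored, so every sensitivity
    has a non-zero denominator (an assumption of this sketch).
    """
    results = {}
    pairs = list(zip(y_true, y_pred))
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        tn = sum(t != label and p != label for t, p in pairs)
        sensitivity = tp / (tp + fn)
        # Specificity is undefined when every sample belongs to this class.
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        results[label] = (sensitivity, specificity)
    return results

# Hypothetical toy labels (cancer-type codes used only as example strings).
y_true = ["BRCA", "BRCA", "LUAD", "LUAD", "COAD", "COAD"]
y_pred = ["BRCA", "BRCA", "LUAD", "COAD", "COAD", "COAD"]

metrics = per_class_sensitivity_specificity(y_true, y_pred)
median_sensitivity = median(sens for sens, _ in metrics.values())
median_specificity = median(spec for _, spec in metrics.values())
```

Reporting the median across classes, as the abstract does, keeps a few rare or hard classes from dominating the summary, but it can also hide them; per-class values are worth inspecting alongside it.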

Cancers ◽  
2021 ◽  
Vol 13 (15) ◽  
pp. 3768
Author(s):  
Vijayachitra Modhukur ◽  
Shakshi Sharma ◽  
Mainak Mondal ◽  
Ankita Lawarde ◽  
Keiu Kask ◽  
...  

Metastatic cancers account for up to 90% of cancer-related deaths. The clear differentiation of metastatic cancers from primary cancers is crucial for cancer type identification and developing targeted treatment for each cancer type. DNA methylation patterns are suggested to be an intriguing target for cancer prediction and are also considered to be an important mediator for the transition to metastatic cancer. In the present study, we used 24 cancer types and 9303 methylome samples downloaded from publicly available data repositories, including The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). We constructed machine learning classifiers to discriminate metastatic, primary, and non-cancerous methylome samples. We applied support vector machines (SVM), Naive Bayes (NB), extreme gradient boosting (XGBoost), and random forest (RF) machine learning models to classify the cancer types based on their tissue of origin. RF outperformed the other classifiers, with an average accuracy of 99%. Moreover, we applied local interpretable model-agnostic explanations (LIME) to explain important methylation biomarkers to classify cancer types.


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Julien Laffaire ◽  
Anna Luisa Di Stefano ◽  
Olivier Chinot ◽  
Ahmed Idbaih ◽  
Jaime Gallego Perez-Larraya ◽  
...  

Background. We performed a retrospective study to assess whether the initial molecular characteristics of glioblastomas (GBMs) were associated with the response to the bevacizumab/irinotecan chemotherapy regimen given at recurrence. Results. Comparison of the genomic and gene expression profiles of the responders (n=12) and nonresponders (n=13) demonstrated only slight differences and could not identify any robust biomarkers associated with the response. In contrast, a significant association was observed between GBM molecular subtypes and response rates. GBMs assigned to molecular subtype IGS-18 and to the classical subtype had a lower response rate than those assigned to other subtypes. In an independent series of 33 patients, neither EGFR amplification nor CDKN2A deletion (which are frequent in IGS-18 and classical GBMs) was significantly associated with the response rate, suggesting that these two alterations are unlikely to explain the lower response rate of these GBM molecular subtypes. Conclusion. Despite its limited sample size, the present study suggests that comparing the initial molecular profiles of responders and nonresponders might not be an effective strategy to identify biomarkers of the response to bevacizumab given at recurrence. Yet it suggests that the response rate might differ among GBM molecular subtypes.


2021 ◽  
Author(s):  
Qian Yan ◽  
Baoqian Ye ◽  
Boqing Wang ◽  
Wenjiang Zheng ◽  
Xiongwen Wang

Abstract The purpose of this study is to analyze the DNA methylation and gene expression profiles of immune-related CpG sites to identify the molecular subtypes and CpG sites related to the prognosis of HCC. In this study, the DNA methylation and gene expression datasets were downloaded from The Cancer Genome Atlas database, together with immune-related genes downloaded from the Immunology Database and Analysis Portal, to explore the prognostic molecular subtypes of HCC. By performing consistent clustering analysis on 830 immune-related CpG sites, we identified seven subgroups with significant differences in overall survival. Finally, 16 classifiers of immune-related CpG sites were constructed and used in the testing set to verify the prognosis of DNA methylation subgroups, and the results were consistent with the training set. Using the TIMER database, we compared the 16 immune-related CpG sites with the abundance of six types of infiltrating immune cells and found that most were positively correlated with the level of infiltration of multiple immune cells in HCC. This study screened potential immune-related prognostic methylation sites and established a new prognosis model of HCC based on DNA methylation molecular subtype, which may help in the early diagnosis of HCC and in developing more effective personalized treatments.


Author(s):  
ShiJian Ding ◽  
Hao Li ◽  
Yu-Hang Zhang ◽  
XianChao Zhou ◽  
KaiYan Feng ◽  
...  

There are many types of cancers. Although they share some hallmarks, such as proliferation and metastasis, they still differ in many respects. They grow in different organs or tissues. Does each cancer have a unique gene expression pattern that makes it different from other cancer types? Since the Cancer Genome Atlas (TCGA) project, there have been more and more pan-cancer studies. Researchers want to obtain robust gene expression signatures from pan-cancer patients, but heterogeneity causes large variance among cancer patients. The sample size needed for robust results would be too large to recruit. In this study, we tried another approach to obtain robust pan-cancer biomarkers by using cell line data to reduce the variance. We applied several advanced computational methods to analyze the Cancer Cell Line Encyclopedia (CCLE) gene expression profiles, which included 988 cell lines from 20 cancer types. Two feature selection methods, Boruta and max-relevance and min-redundancy, were applied to the cell line gene expression data one after the other, generating a feature list. This list was fed into an incremental feature selection method, incorporating one classification algorithm, to extract biomarkers and to construct optimal classifiers and decision rules. The optimal classifiers provided good performance and can be useful tools to identify cell lines from different cancer types, whereas the biomarkers (e.g. NCKAP1, TNFRSF12A, LAMB2, FKBP9, PFN2, TOM1L1) and rules identified in this work may provide a meaningful and precise reference for differentiating multiple types of cancer and contribute to the personalized treatment of tumors.
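Incremental feature selection, as described in this abstract, takes an already ranked feature list and grows the feature set one feature at a time, keeping the prefix that gives the best classifier performance. The sketch below illustrates that loop under stated assumptions: the ranking is supplied by hand (in the study it comes from Boruta and max-relevance and min-redundancy), the classifier is a simple nearest-centroid stand-in (the abstract does not name the algorithm here), and performance is leave-one-out accuracy. All function names and the toy data are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x (squared Euclidean)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    def distance(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sorted(sums), key=distance)

def leave_one_out_accuracy(X, y):
    """Fraction of samples predicted correctly when each is held out in turn."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        correct += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return correct / len(X)

def incremental_feature_selection(X, y, ranked_features):
    """Evaluate the top-k ranked features for k = 1..N and keep the best prefix.

    Ties are resolved in favour of the smaller feature set.
    """
    best_k, best_accuracy = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        columns = ranked_features[:k]
        X_k = [[row[c] for c in columns] for row in X]
        accuracy = leave_one_out_accuracy(X_k, y)
        if accuracy > best_accuracy:
            best_k, best_accuracy = k, accuracy
    return ranked_features[:best_k], best_accuracy

# Hypothetical toy data: column 0 separates the classes, columns 1 and 2 are noise.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 1.0, 0.9],
    [0.9, 3.0, 0.4],
    [3.0, 4.0, 0.3],
    [3.1, 2.0, 0.8],
    [2.9, 5.5, 0.5],
]
y = ["A", "A", "A", "B", "B", "B"]
ranked = [0, 2, 1]  # assumed output of an upstream ranking step

selected, accuracy = incremental_feature_selection(X, y, ranked)
```

Because the loop only ever evaluates prefixes of the ranking, its cost grows linearly with the number of ranked features, which is what makes the approach practical for expression data with thousands of genes.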


2021 ◽  
Vol 20 ◽  
pp. 117693512110024
Author(s):  
Jason D Wells ◽  
Jacqueline R Griffin ◽  
Todd W Miller

Motivation: Despite increasing understanding of the molecular characteristics of cancer, chemotherapy success rates remain low for many cancer types. Studies have attempted to identify patient and tumor characteristics that predict sensitivity or resistance to different types of conventional chemotherapies, yet a concise model that predicts chemosensitivity based on gene expression profiles across cancer types remains to be formulated. We attempted to generate pan-cancer models predictive of chemosensitivity and chemoresistance. Such models may increase the likelihood of identifying the type of chemotherapy most likely to be effective for a given patient based on the overall gene expression of their tumor. Results: Gene expression and drug sensitivity data from solid tumor cell lines were used to build predictive models for 11 individual chemotherapy drugs. Models were validated using datasets from solid tumors from patients. For all drug models, accuracy ranged from 0.81 to 0.93 when applied to all relevant cancer types in the testing dataset. When considering how well the models predicted chemosensitivity or chemoresistance within individual cancer types in the testing dataset, accuracy was as high as 0.98. Cell line–derived pan-cancer models were able to statistically significantly predict sensitivity in human tumors in some instances; for example, a pan-cancer model predicting sensitivity in patients with bladder cancer treated with cisplatin was able to significantly segregate sensitive and resistant patients based on recurrence-free survival times (P = .048) and in patients with pancreatic cancer treated with gemcitabine (P = .038). These models can predict chemosensitivity and chemoresistance across cancer types with clinically useful levels of accuracy.


2021 ◽  
Vol 23 (Supplement_6) ◽  
pp. vi139-vi139
Author(s):  
Jan Lost ◽  
Tej Verma ◽  
Niklas Tillmanns ◽  
W R Brim ◽  
Harry Subramanian ◽  
...  

Abstract PURPOSE Identifying molecular subtypes in gliomas has prognostic and therapeutic value, traditionally after invasive neurosurgical tumor resection or biopsy. Recent advances using artificial intelligence (AI) show promise in using pre-therapy imaging for predicting molecular subtype. We performed a systematic review of recent literature on AI methods used to predict molecular subtypes of gliomas. METHODS Literature review conforming to PRISMA guidelines was performed for publications prior to February 2021 using 4 databases: Ovid Embase, Ovid MEDLINE, Cochrane trials (CENTRAL), and Web of Science core-collection. Keywords included: artificial intelligence, machine learning, deep learning, radiomics, magnetic resonance imaging, glioma, and glioblastoma. Non-machine learning and non-human studies were excluded. Screening was performed using Covidence software. Bias analysis was done using TRIPOD guidelines. RESULTS 11,727 abstracts were retrieved. After applying initial screening exclusion criteria, 1,135 full text reviews were performed, with 82 papers remaining for data extraction. 57% used retrospective single center hospital data, 31.6% used TCIA and BRATS, and 11.4% analyzed multicenter hospital data. An average of 146 patients (range 34-462 patients) were included. Algorithms predicting IDH status comprised 51.8% of studies, MGMT 18.1%, and 1p19q 6.0%. Machine learning methods were used in 71.4%, deep learning in 27.4%, and 1.2% directly compared both methods. The most common machine learning algorithm was the support vector machine (43.3%), and the most common deep learning algorithm was the convolutional neural network (68.4%). Mean prediction accuracy was 76.6%. CONCLUSION Machine learning is the predominant method for image-based prediction of glioma molecular subtypes. Major limitations include limited datasets (60.2% with under 150 patients) and thus limited generalizability of findings. We recommend using larger annotated datasets for AI network training and testing in order to create more robust AI algorithms, which will provide better prediction accuracy on real-world clinical datasets and provide tools that can be translated to clinical practice.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5285 ◽  
Author(s):  
Mei Sze Tan ◽  
Siow-Wee Chang ◽  
Phaik Leng Cheah ◽  
Hwa Jen Yap

Although most cervical cancer cases are reported to be closely related to Human Papillomavirus (HPV) infection, there is a need to study genes that stand up differentially in the final actualization of cervical cancers following HPV infection. In this study, we proposed an integrative machine learning approach to analyse multiple gene expression profiles in cervical cancer in order to identify a set of genetic markers that are associated with and may eventually aid in the diagnosis or prognosis of cervical cancers. The proposed integrative analysis is composed of three steps: (i) gene expression analysis of individual datasets; (ii) meta-analysis of multiple datasets; and (iii) feature selection and machine learning analysis. As a result, 21 gene expressions were identified through the integrative machine learning analysis, which included seven supervised methods and one unsupervised method. A functional analysis with GSEA (Gene Set Enrichment Analysis) was performed on the selected 21-gene expression set and showed significant enrichment in a potential nine-gene expression signature, namely PEG3, SPON1, BTD and RPLP2 (upregulated genes) and PRDX3, COPB2, LSM3, SLC5A3 and AS1B (downregulated genes).


Cancers ◽  
2019 ◽  
Vol 11 (5) ◽  
pp. 723 ◽  
Author(s):  
Roberta Noberini ◽  
Camilla Restellini ◽  
Evelyn Oliva Savoia ◽  
Francesco Raimondi ◽  
Lavinia Ghiani ◽  
...  

Aberrations in histone post-translational modifications (PTMs), as well as in the histone modifying enzymes (HMEs) that catalyze their deposition and removal, have been reported in many tumors and many epigenetic inhibitors are currently under investigation for cancer treatment. Therefore, profiling epigenetic features in cancer could have important implications for the discovery of both biomarkers for patient stratification and novel epigenetic targets. In this study, we employed mass spectrometry-based approaches to comprehensively profile histone H3 PTMs in a panel of normal and tumoral tissues for different cancer types, identifying various changes, some of which appear to be a consequence of the increased proliferation rate of tumors, while others are cell-cycle independent. Histone PTM changes found in tumors partially correlate with alterations of the gene expression profiles of HMEs obtained from publicly available data and are generally lost in culture conditions. Through this analysis, we identified tumor- and subtype-specific histone PTM changes, but also widespread changes in the levels of histone H3 K9me3 and K14ac marks. In particular, H3K14ac showed a cell-cycle independent decrease in all the seven tumor/tumor subtype models tested and could represent a novel epigenetic hallmark of cancer.

