Identification of 12 cancer types through genome deep learning

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Yingshuai Sun ◽  
Sitao Zhu ◽  
Kailong Ma ◽  
Weiqing Liu ◽  
Yao Yue ◽  
...  

Abstract Cancer is a major cause of death worldwide, and an early diagnosis is required for a favorable prognosis. Histological examination is the gold standard for cancer identification; however, a large amount of inter-observer variability exists in histological diagnosis. Numerous studies have shown that cancer genesis is accompanied by an accumulation of harmful mutations, which makes it possible to identify cancer from genomic information. We have proposed a method, GDL (genome deep learning), to study the relationship between genomic variations and traits based on deep neural networks. We analyzed WES (whole-exome sequencing) mutation files of 6,083 samples from 12 cancer types obtained from TCGA (The Cancer Genome Atlas) and WES data of 1,991 healthy samples from the 1000 Genomes Project. Based on GDL, we constructed 12 specific models, each distinguishing one type of cancer from healthy tissue; a total-specific model that distinguishes healthy from cancer tissue; and a mixture model that distinguishes among all 12 types of cancer. The accuracies of the specific, mixture and total-specific models are 97.47%, 70.08% and 94.70%, respectively. We developed an efficient method for the identification of cancer based on genomic information that offers a new direction for disease diagnosis.

2019 ◽  
Author(s):  
Yingshuai Sun ◽  
Sitao Zhu ◽  
Kailong Ma ◽  
Weiqing Liu ◽  
Yao Yue ◽  
...  

Abstract Motivation: Cancer is a major cause of death worldwide, and an early diagnosis is required for a favorable prognosis. Histological examination is the gold standard for cancer identification; however, there is a large amount of inter-observer variability in histological diagnosis. Numerous studies have shown that cancer genesis is accompanied by an accumulation of harmful mutations within patients’ genomes, which makes it possible to identify cancer from genomic information. We have proposed a method, GDL (genome deep learning), to study the relationship between genomic variations and traits based on deep neural networks with multiple hidden layers and nonlinear transformations. Result: We analyzed 6,083 samples from 12 cancer types obtained from TCGA (The Cancer Genome Atlas) and 1,991 healthy samples from the 1000 Genomes Project (Genomes Project, et al., 2010). Based on GDL, we constructed 12 specific models to distinguish between certain types of cancer and healthy tissues, a specific model that can identify healthy vs diseased tissues, and a mixture model to distinguish between all 12 types of cancer. We present the success obtained with GDL when applied to the challenging problem of cancer identification based on genomic variations and demonstrate state-of-the-art results (97%, 70.08% and 94.70%). The mixture model achieved a comparable performance. With the development of new molecular and sequencing technologies, we can now collect circulating tumor DNA (ctDNA) from blood and monitor the cancer risk in real time, and using our model, we can also target cancerous tissue that may develop in the future. We developed a new and efficient method for the identification of cancer based on genomic information that offers a new direction for disease diagnosis while providing a new method to predict traits based on that information.
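To make "deep neural networks with multiple hidden layers and nonlinear transformations" concrete, the following is a minimal sketch of a forward pass only. It is not the authors' GDL architecture: the input encoding (one binary mutated/not-mutated flag per gene), the layer sizes, the ReLU and softmax choices, and all names are assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def relu(values):
    """Nonlinear transformation applied after each hidden layer."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into class probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Random (untrained) weights, scaled by the input width."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(features, layers):
    """Hidden layers use ReLU; the last layer uses softmax."""
    hidden = features
    for weights, bias in layers[:-1]:
        hidden = relu(dense(hidden, weights, bias))
    weights, bias = layers[-1]
    return softmax(dense(hidden, weights, bias))


rng = random.Random(0)
N_GENES, N_CLASSES = 20, 12  # hypothetical sizes; 12 mirrors the cancer types
layers = [init_layer(N_GENES, 16, rng),
          init_layer(16, 8, rng),
          init_layer(8, N_CLASSES, rng)]

# Hypothetical sample: 1.0 where a gene carries a mutation, else 0.0.
sample = [float(rng.random() < 0.1) for _ in range(N_GENES)]
probs = forward(sample, layers)
predicted_class = max(range(N_CLASSES), key=probs.__getitem__)
```

With twelve output units this corresponds loosely to the mixture model; a two-unit output layer would be the analogue of one of the cancer-versus-healthy models. Training (loss, backpropagation, regularization) is deliberately left out.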


2021 ◽  
Vol 11 ◽  
Author(s):  
Yi-Hong Liu ◽  
Yu-Lian Chen ◽  
Ting-Yu Lai ◽  
Ying-Chieh Ko ◽  
Yu-Fu Chou ◽  
...  

Background: Partial epithelial-mesenchymal transition (p-EMT) is a distinct clinicopathological feature prevalent in oral cavity tumors of The Cancer Genome Atlas. Located at the invasion front, p-EMT cells require additional support from the tumor stroma for collective cell migration, including track clearing, extracellular matrix remodeling and immune evasion. The pathological roles of otherwise nonmalignant cancer-associated fibroblasts (CAFs) in cancer progression are emerging. Methods: Gene set enrichment analysis was used to reveal differentially enriched genes and molecular pathways in OC3 and TW2.6 xenograft tissues, representing mesenchymal and p-EMT tumors, respectively. R packages for genomic data science were used for statistical evaluation and data visualization. Immunohistochemistry and Alcian blue staining were conducted to validate the bioinformatic results. Univariate and multivariate Cox proportional hazards models were fitted to identify covariates significantly associated with overall survival in clinical datasets. Kaplan–Meier curves of estimated overall survival were compared for statistical difference using the log-rank test. Results: Compared to mesenchymal OC3 cells, tumor stroma derived from p-EMT TW2.6 cells was significantly enriched in microvessel density, tumor-excluded macrophages, inflammatory CAFs, and extracellular hyaluronan deposition. By translating these results to clinical transcriptomic datasets of oral cancer specimens, including the Puram single-cell RNA-seq cohort comprising ~6000 cells, we identified the expression of stromal TGFBI and HYAL1 as independent poor and protective biomarkers, respectively, for 40 Taiwanese oral cancer tissues that were all derived from betel quid users. In The Cancer Genome Atlas, TGFBI was a poor marker not only for head and neck cancer but also for six additional cancer types, and HYAL1 was a good indicator for four tumor cohorts, suggesting common stromal effects across different cancer types. Conclusions: As the tumor stroma coevolves with cancer progression, the cellular origins of molecular markers identified from conventional whole tissue mRNA-based analyses should be cautiously interpreted. By incorporating disease-matched xenograft tissue and single-cell RNA-seq results, we suggest that TGFBI and HYAL1, primarily expressed by stromal CAFs and endothelial cells, respectively, could serve as robust prognostic biomarkers for oral cancer control.
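The Kaplan–Meier curves mentioned in the Methods can be illustrated with a small product-limit estimator. This is a sketch under stated assumptions, not the study's analysis (the abstract says R packages were used): the function name `kaplan_meier` and the follow-up data are hypothetical, and the log-rank test and Cox models are not shown.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  -- follow-up duration for each patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Patients censored at t still count as at risk at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up in months; 0 marks a censored patient.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Splitting patients by a marker (for example high versus low expression) and calling the function once per group gives the two curves that a log-rank test would then compare.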


Cancers ◽  
2021 ◽  
Vol 13 (15) ◽  
pp. 3811
Author(s):  
Hyun-Jong Jang ◽  
In-Hye Song ◽  
Sung-Hak Lee

Histomorphologic types of gastric cancer (GC) have significant prognostic values that should be considered during treatment planning. Because the thorough quantitative review of a tissue slide is a laborious task for pathologists, deep learning (DL) can be a useful tool to support pathologic workflow. In the present study, a fully automated approach was applied to distinguish differentiated/undifferentiated and non-mucinous/mucinous tumor types in GC tissue whole-slide images from The Cancer Genome Atlas (TCGA) stomach adenocarcinoma dataset (TCGA-STAD). By classifying small patches of tissue images into differentiated/undifferentiated and non-mucinous/mucinous tumor tissues, the relative proportion of GC tissue subtypes can be easily quantified. Furthermore, the distribution of different tissue subtypes can be clearly visualized. The patch-level areas under the curves for the receiver operating characteristic curves for the differentiated/undifferentiated and non-mucinous/mucinous classifiers were 0.932 and 0.979, respectively. We also validated the classifiers on our own GC datasets and confirmed that the generalizability of the classifiers is excellent. The results indicate that the DL-based tissue classifier could be a useful tool for the quantitative analysis of cancer tissue slides. By combining DL-based classifiers for various molecular and morphologic variations in tissue slides, the heterogeneity of tumor tissues can be unveiled more efficiently.
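The two quantities this abstract reports or relies on, a patch-level ROC AUC and the relative proportion of a tissue subtype on a slide, can be sketched as follows. The labels, scores, 0.5 threshold, and function names are all hypothetical; the study's classifiers and data are not reproduced here.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive patch scores higher
    than a random negative patch (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def subtype_proportion(scores, threshold=0.5):
    """Fraction of patches on a slide called positive for the subtype."""
    calls = [s >= threshold for s in scores]
    return sum(calls) / len(calls)


# Hypothetical patches: label 1 = undifferentiated, 0 = differentiated.
patch_labels = [1, 1, 1, 0, 0, 0, 1, 0]
patch_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

auc = roc_auc(patch_labels, patch_scores)
proportion = subtype_proportion(patch_scores)
```

The pairwise loop is quadratic in the number of patches, which is fine for a toy example; a rank-based formulation would be the usual choice for whole-slide images with many thousands of patches.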


2021 ◽  
Author(s):  
Xiao-Cheng Wang ◽  
Ya Liu ◽  
Fei-Wu Long ◽  
Liang-Ren Liu ◽  
Chuan-Wen Fan

Background: The relationship between long noncoding RNAs (lncRNAs) and the mRNA stemness index (mRNAsi) in colorectal cancer (CRC) is still unclear. Materials & methods: The mRNAsi, mRNAsi-related lncRNAs and their clinical significance were analyzed by bioinformatic approaches in The Cancer Genome Atlas (TCGA)-COREAD dataset. Results: mRNAsi was negatively related to pathological features but positively related to overall survival and recurrence-free survival in CRC. A prognostic signature of five mRNAsi-related lncRNAs was further developed and proved to be an independent prognostic factor for overall survival in CRC patients, because the five lncRNAs are involved in several pathways of cancer stem cells and malignant cancer cell phenotypes. Conclusion: The present study highlights the potential roles of mRNAsi-related lncRNAs as alternative prognostic markers.


2021 ◽  
Author(s):  
Mengjun Zhang ◽  
Hao Li ◽  
Yuan Liu ◽  
Siyu Hou ◽  
Ping Cui ◽  
...  

Abstract Background: The purpose of this study was to determine the value of MAFK as a biomarker of cervical cancer prognosis and to explore its methylation and possible cellular signaling pathways. Methods: We analyzed the cervical cancer data of The Cancer Genome Atlas (TCGA) through bioinformatics, including MAFK expression, methylation, prognosis and genome enrichment analysis. Results: MAFK expression was higher in cervical cancer tissues and was negatively correlated with the methylation levels of five CpG sites. MAFK is an independent prognostic factor of cervical cancer and is involved in the Nod-like receptor signaling pathway. CMap analysis screened four drug candidates for cervical cancer treatment. Conclusions: We confirmed that MAFK is a novel prognostic biomarker for cervical cancer and aberrant methylation may also affect MAFK expression and carcinogenesis. This study provides a new molecular target for the prognostic evaluation and treatment of cervical cancer.


2021 ◽  
Author(s):  
Jun Du ◽  
Jinguo Wang

Abstract Background: The expression and molecular mechanism of cysteine rich transmembrane module containing 1 (CYSTM1) in human tumor cells remain unclear. The aim of this study was to determine whether CYSTM1 could be used as a potential prognostic biomarker for hepatocellular carcinoma (HCC). Methods: We first demonstrated the relationship between CYSTM1 expression and HCC in various public databases. Secondly, Kaplan–Meier analysis and a Cox proportional hazards regression model were used to evaluate the relationship between the expression of CYSTM1 and the survival of HCC patients whose data were downloaded from The Cancer Genome Atlas (TCGA) database. Finally, we used the expression data of CYSTM1 in the TCGA database to predict CYSTM1-related signaling pathways through bioinformatics analysis. Results: The expression level of CYSTM1 in HCC tissues was significantly correlated with T stage (p = 0.039). In addition, Kaplan–Meier analysis showed that the expression of CYSTM1 was significantly associated with poor prognosis in patients with early-stage HCC (p = 0.003). Multivariate analysis indicated that CYSTM1 is a potential predictor of poor prognosis in HCC patients (p = 0.036). The results of biosynthesis analysis demonstrated that the data set of CYSTM1 high expression was mainly enriched in neurodegeneration and oxidative phosphorylation pathways. Conclusion: CYSTM1 is an effective biomarker for the prognosis of patients with early-stage HCC and may play a key role in the occurrence and progression of HCC.


2020 ◽  
Vol 21 (17) ◽  
pp. 6087
Author(s):  
Yunzhen Wei ◽  
Limeng Zhou ◽  
Yingzhang Huang ◽  
Dianjing Guo

Long noncoding RNA (lncRNA)/microRNA (miRNA)/mRNA triplets contribute to cancer biology. However, identifying significant triplets remains a major challenge for cancer research, and the dynamic changes among the factors of the triplets are poorly understood. Here, by integrating target information and expression datasets, we proposed a novel computational framework to identify triplets termed “lncRNA-perturbated triplets”. We applied the framework to five cancer datasets in The Cancer Genome Atlas (TCGA) project and identified 109 triplets. We showed that the paired miRNAs and mRNAs were widely perturbated by lncRNAs in different cancer types. LncRNA perturbators and lncRNA-perturbated mRNAs showed significantly higher evolutionary conservation than other lncRNAs and mRNAs. Importantly, the lncRNA-perturbated triplets exhibited high cancer specificity. The pan-cancer perturbator OIP5-AS1 had a higher expression level than the cancer-specific perturbators. These lncRNA perturbators were significantly enriched in known cancer-related pathways. Furthermore, among the 25 lncRNAs in the 109 triplets, lncRNA SNHG7 was identified as a stable potential biomarker in lung adenocarcinoma (LUAD) by combining the TCGA dataset and two independent GEO datasets. Results from cell transfection also indicated that overexpression of lncRNA SNHG7 and TUG1 enhanced the expression of the corresponding mRNAs PNMA2 and CDC7 in LUAD. Our study provides a systematic dissection of lncRNA-perturbated triplets and facilitates our understanding of the molecular roles of lncRNAs in cancers.


NAR Cancer ◽  
2020 ◽  
Vol 2 (1) ◽  
Author(s):  
Julianne K David ◽  
Sean K Maden ◽  
Benjamin R Weeder ◽  
Reid F Thompson ◽  
Abhinav Nellore

Abstract This study probes the distribution of putatively cancer-specific junctions across a broad set of publicly available non-cancer human RNA sequencing (RNA-seq) datasets. We compared cancer and non-cancer RNA-seq data from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression (GTEx) Project and the Sequence Read Archive. We found that (i) averaging across cancer types, 80.6% of exon–exon junctions thought to be cancer-specific based on comparison with tissue-matched samples (σ = 13.0%) are in fact present in other adult non-cancer tissues throughout the body; (ii) 30.8% of junctions not present in any GTEx or TCGA normal tissues are shared by multiple samples within at least one cancer type cohort, and 87.4% of these distinguish between different cancer types; and (iii) many of these junctions not found in GTEx or TCGA normal tissues (15.4% on average, σ = 2.4%) are also found in embryological and other developmentally associated cells. These findings refine the meaning of RNA splicing event novelty, particularly with respect to the human neoepitope repertoire. Ultimately, cancer-specific exon–exon junctions may have a substantial causal relationship with the biology of disease.
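The comparison described here reduces to set operations on junction identifiers, which the sketch below illustrates. The tuple encoding of a junction, every coordinate, and the function name are assumptions made for illustration; the study's actual pipeline and filtering criteria are not given in the abstract.

```python
def fraction_found_elsewhere(candidates, reference):
    """Share of candidate junctions that also occur in the reference set."""
    if not candidates:
        return 0.0
    return len(candidates & reference) / len(candidates)


# A junction is keyed as (chromosome, donor position, acceptor position, strand).
# All coordinates below are made up for illustration.
tumor = {
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 450, "+"),
    ("chr2", 50, 90, "-"),
    ("chr3", 10, 75, "+"),
    ("chr7", 500, 800, "-"),
}
matched_normal = {("chr1", 100, 200, "+")}
other_tissues = {
    ("chr1", 300, 450, "+"),
    ("chr2", 50, 90, "-"),
    ("chr3", 10, 75, "+"),
    ("chrX", 5, 25, "+"),
}

# Junctions that look cancer-specific when only the matched tissue is checked.
candidates = tumor - matched_normal
# How many of those turn up in other non-cancer tissues after all.
shared_fraction = fraction_found_elsewhere(candidates, other_tissues)
# Junctions absent from every non-cancer set considered.
unexplained = candidates - other_tissues
```

In this toy data three of the four apparent cancer-specific junctions appear in other non-cancer tissues, which is the kind of effect the 80.6% figure in the abstract summarizes at scale.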


mSystems ◽  
2018 ◽  
Vol 3 (5) ◽  
Author(s):  
Sara R. Selitsky ◽  
David Marron ◽  
Lisle E. Mose ◽  
Joel S. Parker ◽  
Dirk P. Dittmer

ABSTRACT Epstein-Barr virus (EBV) is convincingly associated with gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. To test the hypothesis that there are additional cancer types with high prevalence of EBV, we determined EBV viral expression in all the Cancer Genome Atlas Project (TCGA) mRNA sequencing (mRNA-seq) samples (n = 10,396) from 32 different tumor types. We found that EBV was present in gastric adenocarcinoma and lymphoma, as expected, and was also present in >5% of samples in 10 additional tumor types. For most samples, EBV transcript levels were low, which suggests that EBV was likely present due to infected infiltrating B cells. In order to determine if there was a difference in the B-cell populations, we assembled B-cell receptors for each sample and found B-cell receptor abundance (P ≤ 1.4 × 10⁻²⁰) and diversity (P ≤ 8.3 × 10⁻²⁷) were significantly higher in EBV-positive samples. Moreover, diversity was independent of B-cell abundance, suggesting that the presence of EBV was associated with an increased and altered B-cell population.
IMPORTANCE Around 20% of human cancers are associated with viruses. Epstein-Barr virus (EBV) contributes to gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. We assessed the prevalence of EBV in RNA-seq from 32 tumor types in the Cancer Genome Atlas Project (TCGA) and found EBV to be present in >5% of samples in 12 tumor types. EBV infects epithelial cells and B cells and in B cells causes proliferation. We hypothesized that the low expression of EBV in most of the tumor types was due to infiltration of B cells into the tumor. The increase in B-cell abundance and diversity in subjects where EBV was detected in the tumors strengthens this hypothesis. Overall, we found that EBV was associated with an increased and altered immune response. This result is not evidence of causality, but a potential novel biomarker for tumor immune status.


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Ruobing Wang ◽  
Yan Jiao ◽  
Yanqing Li ◽  
Siyang Ye ◽  
Guoqiang Pan ◽  
...  

Liver cancer is a devastating disease with a poor prognosis. Although the survival rate of patients with liver cancer has improved in the past decades, recurrence and metastasis remain obstacles. Inositol polyphosphate-5-phosphatase K (INPP5K) belongs to the family of phosphoinositide 5-phosphatases (PI 5-phosphatases), which have been reported to be associated with cell migration, polarity, adhesion, and cell invasion, especially in cancers. However, there have been few studies on the correlation between INPP5K and liver cancer. In this study, we explored the prognostic significance of INPP5K in liver cancer through bioinformatics analysis of data collected from The Cancer Genome Atlas (TCGA) database. Chi-square and Fisher exact tests were used to evaluate the relationship between INPP5K expression and clinical characteristics. Our results showed that low INPP5K expression was correlated with poor outcomes in liver cancer patients. Univariate and multivariate Cox analyses demonstrated that low INPP5K mRNA expression was significantly associated with shorter overall survival (OS) and relapse-free survival (RFS), and might serve as a useful biomarker and prognostic factor for liver cancer. In conclusion, low INPP5K mRNA expression is an independent risk factor for poor prognosis in liver cancer.
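As an illustration of the Fisher exact test named above, the sketch below computes a two-sided p-value for a 2×2 table of expression group against a binary clinical characteristic. The counts and the function name are hypothetical and do not come from the study; the two-sided rule used (sum every table no more probable than the observed one) is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    With the margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probability of every table that is
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Hypothetical counts: rows = low / high expression,
# columns = advanced / early stage.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

For these counts the p-value is 400/11440, about 0.035. The exact test is the usual choice when some expected cell counts are small; the chi-square test is its large-sample counterpart.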

