Machine learning analysis of TCGA cancer data

2021 ◽  
Vol 7 ◽  
pp. e584
Author(s):  
Jose Liñares-Blanco ◽  
Alejandro Pazos ◽  
Carlos Fernandez-Lozano

In recent years, machine learning (ML) researchers have shifted their focus towards biological problems that are difficult to analyse with standard approaches. Large initiatives such as The Cancer Genome Atlas (TCGA) have made omic data available for training these algorithms. This review surveys the state of the art by covering the main works that have applied ML to TCGA data. First, the principal discoveries made by the TCGA consortium are presented. With this foundation in place, we turn to the main objective of the study: identifying and discussing the works that have used TCGA data to train different ML approaches. After reviewing more than 100 papers, we classified them according to three pillars: the type of tumour, the type of algorithm and the biological problem predicted. One conclusion is that a high density of studies relies on two major algorithms, Random Forest and Support Vector Machines. We also observe a rise in the use of deep artificial neural networks and, notably, an increase in integrative models for multi-omic data analysis. The different biological conditions are a consequence of molecular homeostasis, driven by protein-coding regions, regulatory elements and the surrounding environment. A large number of works use gene expression data, which researchers have preferred when training the different models. The biological problems addressed fall into five types: prognosis prediction, tumour subtypes, microsatellite instability (MSI), immunological aspects and certain pathways of interest. A clear trend was detected in which conditions were predicted for which type of tumour: a greater number of works focused on the BRCA cohort, while works on survival, for example, centred on the GBM cohort because of its large number of events. This review examines in depth the works and methodologies used to study TCGA cancer data, and it is intended to serve as a basis for future research in this field.

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Dejun Wu ◽  
Zhenhua Yin ◽  
Yisheng Ji ◽  
Lin Li ◽  
Yunxin Li ◽  
...  

Abstract: LncRNAs play a pivotal role in tumorigenesis and development. However, the potential involvement of lncRNAs in colon adenocarcinoma (COAD) needs to be further explored. All the data used in this study were obtained from The Cancer Genome Atlas database, and all analyses were conducted using R software. Based on the seven prognosis-related lncRNAs finally selected, we developed an effective prognostic model (training cohort, 1 year: AUC = 0.70, 95% CI = 0.57–0.78; 3 years: AUC = 0.71, 95% CI = 0.6–0.8; 5 years: AUC = 0.76, 95% CI = 0.66–0.87; validation cohort, 1 year: AUC = 0.70, 95% CI = 0.58–0.8; 3 years: AUC = 0.73, 95% CI = 0.63–0.82; 5 years: AUC = 0.68, 95% CI = 0.5–0.85). The VEGF and Notch pathways were analyzed by GSEA, and low immune and stromal scores were found in high-risk patients (immune score, cor = −0.15, P < 0.001; stromal score, cor = −0.18, P < 0.001), which may partially explain the poor prognosis of patients in the high-risk group. We screened lncRNAs that are significantly associated with the survival of patients with COAD and possibly participate in autophagy regulation. This study may provide direction for future research.


2021 ◽  
Vol 22 (11) ◽  
pp. 6091
Author(s):  
Kristina Daniunaite ◽  
Arnas Bakavicius ◽  
Kristina Zukauskaite ◽  
Ieva Rauluseviciute ◽  
Juozas Rimantas Lazutka ◽  
...  

The molecular diversity of prostate cancer (PCa) has been demonstrated by recent genome-wide studies, proposing a significant number of different molecular markers. However, only a few of them have been transferred into clinical practice so far. The present study aimed to identify and validate novel DNA methylation biomarkers for PCa diagnosis and prognosis. Microarray-based methylome data of well-characterized cancerous and noncancerous prostate tissue (NPT) pairs was used for the initial screening. Ten protein-coding genes were selected for validation in a set of 151 PCa, 51 NPT, as well as 17 benign prostatic hyperplasia samples. The Prostate Cancer Dataset (PRAD) of The Cancer Genome Atlas (TCGA) was utilized for independent validation of our findings. Methylation frequencies of ADAMTS12, CCDC181, FILIP1L, NAALAD2, PRKCB, and ZMIZ1 were up to 91% in our study. PCa specific methylation of ADAMTS12, CCDC181, NAALAD2, and PRKCB was demonstrated by qualitative and quantitative means (all p < 0.05). In agreement with PRAD, promoter methylation of these four genes was associated with the transcript down-regulation in the Lithuanian cohort (all p < 0.05). Methylation of ADAMTS12, NAALAD2, and PRKCB was independently predictive for biochemical disease recurrence, while NAALAD2 and PRKCB increased the prognostic power of multivariate models (all p < 0.01). The present study identified methylation of ADAMTS12, NAALAD2, and PRKCB as novel diagnostic and prognostic PCa biomarkers that might guide treatment decisions in clinical practice.


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Meiwei Mu ◽  
Yi Tang ◽  
Zheng Yang ◽  
Yuling Qiu ◽  
Xiaohong Li ◽  
...  

Objective. To explore the expression of immune-related lncRNAs in colon adenocarcinoma and determine how these lncRNAs influence the development and prognosis of colon adenocarcinoma. Method. Transcriptome data of colon adenocarcinoma were downloaded from The Cancer Genome Atlas (TCGA), and the gene sets “IMMUNE RESPONSE” and “IMMUNE SYSTEM PROCESS” were retrieved from the Molecular Signatures Database (MSigDB). The expression of the immune-related genes (immune-related mRNAs) was extracted, and the immune-related lncRNAs were then identified from these data. Clinical traits were combined with the immune-related lncRNAs so that prognosis-related lncRNAs could be identified by Cox regression. A multivariate Cox regression model was built to calculate risk scores. Relationships between clinical traits and immune-related lncRNAs were also calculated. Result. Transcriptome sequencing data of tissue samples from 480 colorectal adenocarcinoma patients and 41 normal controls were obtained from the TCGA database. A total of 918 immune-related lncRNAs were screened. Cox regression showed that 34 immune-related lncRNAs were associated with colon adenocarcinoma prognosis. Seven lncRNAs were independent risk factors. Conclusion. This study revealed that some lncRNAs can affect the development and prognosis of colon adenocarcinoma. It may provide new theoretical evidence on the molecular mechanisms for future research and molecularly targeted therapy of colon adenocarcinoma.
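
The risk-score step described in the abstract above can be illustrated with a minimal sketch. In a fitted Cox model, the risk score is usually taken as the linear predictor, that is, the sum of each coefficient multiplied by the corresponding expression value. The lncRNA names, coefficients and expression values below are hypothetical placeholders, not results from the study, and the median split into high- and low-risk groups is an assumed convention that the abstract does not state.

```python
from statistics import median

# Hypothetical Cox coefficients (log hazard ratios) for three made-up lncRNAs.
# These are illustrative values only, not taken from any study.
COEFFICIENTS = {"lncRNA_A": 0.8, "lncRNA_B": -0.5, "lncRNA_C": 0.3}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Linear predictor: sum of coefficient * expression over the signature."""
    return sum(coef * expression[name] for name, coef in coefficients.items())


def split_by_median(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values for four patients.
patients = {
    "P1": {"lncRNA_A": 2.0, "lncRNA_B": 1.0, "lncRNA_C": 0.5},
    "P2": {"lncRNA_A": 0.5, "lncRNA_B": 2.0, "lncRNA_C": 1.0},
    "P3": {"lncRNA_A": 1.5, "lncRNA_B": 0.5, "lncRNA_C": 2.0},
    "P4": {"lncRNA_A": 0.2, "lncRNA_B": 1.5, "lncRNA_C": 0.1},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

A positive coefficient raises the score as expression increases, while a negative one lowers it, so the sign of each coefficient indicates whether a lncRNA acts as a risk or a protective factor in the model.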


2021 ◽  
Author(s):  
Mengjun Zhang ◽  
Hao Li ◽  
Yuan Liu ◽  
Siyu Hou ◽  
Ping Cui ◽  
...  

Abstract Background: The purpose of this study was to determine the value of MAFK as a biomarker of cervical cancer prognosis and to explore its methylation and possible cellular signaling pathways. Methods: We analyzed the cervical cancer data of The Cancer Genome Atlas (TCGA) through bioinformatics, including MAFK expression, methylation, prognosis and genome enrichment analysis. Results: MAFK expression was higher in cervical cancer tissues and was negatively correlated with the methylation levels of five CpG sites. MAFK is an independent prognostic factor of cervical cancer and is involved in the Nod-like receptor signaling pathway. CMap analysis screened four drug candidates for cervical cancer treatment. Conclusions: We confirmed that MAFK is a novel prognostic biomarker for cervical cancer and aberrant methylation may also affect MAFK expression and carcinogenesis. This study provides a new molecular target for the prognostic evaluation and treatment of cervical cancer.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e10884
Author(s):  
Xin Yu ◽  
Qian Yang ◽  
Dong Wang ◽  
Zhaoyang Li ◽  
Nianhang Chen ◽  
...  

Applying the knowledge that methyltransferases and demethylases can modify adjacent cytosine-phosphate-guanine (CpG) sites in the same DNA strand, we found that combining multiple CpGs into a single block may improve cancer diagnosis. However, survival prediction remains a challenge. In this study, we developed a pipeline named “stacked ensemble of machine learning models for methylation-correlated blocks” (EnMCB) that combined Cox regression, support vector regression (SVR), and elastic-net models to construct signatures based on DNA methylation-correlated blocks for lung adenocarcinoma (LUAD) survival prediction. We used methylation profiles from The Cancer Genome Atlas (TCGA) as the training set, and profiles from the Gene Expression Omnibus (GEO) as validation and testing sets. First, we partitioned the genome into blocks of tightly co-methylated CpG sites, which we termed methylation-correlated blocks (MCBs). After partitioning and feature selection, we observed different diagnostic capacities for predicting patient survival across the models. We combined the multiple models into a single stacking ensemble model. The stacking ensemble model based on the top-ranked block had an area under the receiver operating characteristic curve of 0.622 in the TCGA training set, 0.773 in the validation set, and 0.698 in the testing set. When stratified by clinicopathological risk factors, the risk score predicted by the top-ranked MCB was an independent prognostic factor. Our results showed that our pipeline was a reliable tool that may facilitate MCB selection and survival prediction.
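
To make the reported area under the receiver operating characteristic curve (AUC) concrete, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. It assumes binary outcome labels (for example, an event within a fixed time horizon) and ignores censoring, so it is a simplification and not the procedure used by the EnMCB pipeline. The labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    labels: iterable of 0/1 outcomes; scores: predicted risk, higher = riskier.
    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical outcomes and risk scores for six patients.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading values such as 0.622 or 0.773.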


Author(s):  
A. B.M. Shawkat Ali

From the beginning, machine learning methodology, which is the origin of artificial intelligence, has been rapidly spreading in the different research communities with successful outcomes. This chapter aims to introduce system analysts and designers to a comparatively new statistical supervised machine learning algorithm called the support vector machine (SVM). We explain two useful areas of SVM, classification and regression, with basic mathematical formulation and simple demonstrations to make SVM easier to understand. Prospects and challenges of future research in this emerging area are also described. Future research on SVM will provide improved, quality access to users. Therefore, developing an automated SVM system with state-of-the-art technologies is of paramount importance, and this chapter links an important step in the system analysis and design perspective to this evolving research arena.
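
As a small illustration of the two uses of SVM mentioned above, the sketch below evaluates the standard loss functions for a linear model: the hinge loss for classification and the epsilon-insensitive loss for regression. It does not train a model. The weights, bias and inputs are hypothetical values chosen for the example, and the function names are not taken from the chapter.

```python
def decision_value(w, b, x):
    """Linear decision function f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_loss(w, b, x, y):
    """Classification loss for a label y in {-1, +1}.

    Zero when the point is on the correct side with margin at least 1.
    """
    return max(0.0, 1.0 - y * decision_value(w, b, x))


def epsilon_insensitive_loss(w, b, x, target, epsilon=0.1):
    """Regression loss: errors smaller than epsilon are ignored."""
    return max(0.0, abs(decision_value(w, b, x) - target) - epsilon)


# Hypothetical model parameters.
w, b = [1.0, -1.0], 0.0

loss_confident = hinge_loss(w, b, [2.0, 0.0], 1)       # correct, outside the margin
loss_in_margin = hinge_loss(w, b, [0.5, 0.0], 1)       # correct, inside the margin
loss_wrong_side = hinge_loss(w, b, [0.5, 0.0], -1)     # misclassified
loss_within_tube = epsilon_insensitive_loss(w, b, [2.0, 0.0], 2.05)
loss_outside_tube = epsilon_insensitive_loss(w, b, [2.0, 0.0], 2.5)
```

Both losses are zero over a whole region rather than at a single point, which is why only the training points on or beyond the margin (the support vectors) influence the fitted model.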


2020 ◽  
Vol 16 (25) ◽  
pp. 1921-1930
Author(s):  
Zhou Xu ◽  
Lin Zhuang ◽  
Xiaoyin Wang ◽  
Qianrong Li ◽  
Yan Sang ◽  
...  

Aim: To explore FBXW7 protein-coding transcript isoform (α, β and γ) expression, their functions and prognostic value in ovarian serous cystadenocarcinoma (OSC). Materials & methods: FBXW7 transcript data were collected from The Cancer Genome Atlas and the Genotype-Tissue Expression project. IOSE, A2780 and SKOV3 cells were used for in vitro and in vivo studies. Results: FBXW7α and FBXW7γ are dominant protein-coding transcripts that were downregulated in OSC. FBXW7γ overexpression reduced the protein expression of c-Myc, Notch1 and Yap1 and suppressed OSC cell growth in vitro and in vivo. FBXW7γ expression was an independent indicator of longer disease-specific survival (HR: 0.588; 95% CI: 0.449–0.770) and progression-free survival (HR: 0.708; 95% CI: 0.562–0.892). Conclusion: FBXW7γ is tumor-suppressive and might be the only prognosis-related FBXW7 transcript in OSC.


2018 ◽  
Vol 8 (8) ◽  
pp. 1280 ◽  
Author(s):  
Yong Kim ◽  
Youngdoo Son ◽  
Wonjoon Kim ◽  
Byungki Jin ◽  
Myung Yun

Sitting on a chair in an awkward posture or sitting for a long period of time is a risk factor for musculoskeletal disorders. A postural habit, once formed, cannot be changed easily. It is important to form a proper postural habit from childhood, as lumbar disease caused by improper posture during childhood is most likely to recur. Thus, there is a need for a monitoring system that classifies children’s sitting postures. The purpose of this paper is to develop a system for classifying children’s sitting postures using machine learning algorithms. The convolutional neural network (CNN) algorithm was used in addition to the conventional algorithms: Naïve Bayes classifier (NB), decision tree (DT), neural network (NN), multinomial logistic regression (MLR), and support vector machine (SVM). To collect data for classifying sitting postures, a sensing cushion was developed by mounting a pressure sensor mat (8 × 8) inside a children’s chair seat cushion. Ten children participated, and sensor data were collected while they held a static position in each of the five prescribed postures. CNN achieved the highest accuracy among the algorithms compared. A comprehensive posture monitoring system is expected to be established through future research on improving the classification algorithm and providing an effective feedback system.


Author(s):  
Xudong Tang ◽  
Mengyan Zhang ◽  
Liang Sun ◽  
Fengyan Xu ◽  
Xin Peng ◽  
...  

Long non-coding RNAs (lncRNAs) play key roles in tumors and function not only as important molecular markers for cancer prognosis, but also as molecular characteristics at the pan-cancer level. Because pancreatic cancer has a poor prognosis, accurate assessment of prognosis is a key issue in developing treatment plans. Here we analyzed pancreatic cancer data from The Cancer Genome Atlas and The Genotype-Tissue Expression database using Cox regression and lasso regression, both on a combination of the two databases and on The Cancer Genome Atlas database alone (Cancer Genome Atlas Research Network et al., 2013). A prognostic risk score model with significant correlation with pancreatic cancer survival was constructed, and two lncRNAs were investigated. Additional analysis of 33 cancers using the two lncRNAs showed that lncRNA TSPOAP1-AS1 was a prognostic marker of seven cancers, among which pancreatic cancer was the most significant, and lncRNA MIR600HG was a prognostic marker of ovarian cancer and pancreatic cancer. LncRNA TSPOAP1-AS1 is associated with clinical stage and tumor mutation burden in some cancers, as well as with a strong degree of immune infiltration in many cancers, while a strong correlation between lncRNA MIR600HG and microsatellite instability was observed in several cancers. The results of this study further our understanding of the different functions of lncRNAs in cancer and may aid in the clinical application of lncRNAs as prognostic factors for cancer.

