MEpurity: estimating tumor purity using DNA methylation data

Bioinformatics ◽

10.1093/bioinformatics/btz555 ◽

2019 ◽

Author(s):

Bowen Liu ◽

Xiaofei Yang ◽

Tingjie Wang ◽

Jiadong Lin ◽

Yongyong Kang ◽

...

Keyword(s):

Fundamental Property ◽

Cancer Cell Line ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Estimation Methods ◽

Normal Sample ◽

Cancer Data ◽

Tumor Purity ◽

Cancer Genome Atlas ◽

Beta Mixture Model

Abstract. Motivation: Tumor purity is a fundamental property of each cancer sample and affects downstream investigations. Current tumor purity estimation methods either require a matched normal sample or report moderately high tumor purity even on normal samples. It is therefore critical to develop a computational approach that estimates tumor purity with sufficient precision from a tumor-only sample. Results: In this study, we developed MEpurity, a beta mixture model-based algorithm that estimates tumor purity from tumor-only Illumina Infinium 450k methylation microarray data. We applied MEpurity to both The Cancer Genome Atlas (TCGA) cancer data and cancer cell line data, demonstrating that MEpurity reports low tumor purity on normal samples and results on tumor samples comparable with those of other state-of-the-art methods. Availability and implementation: MEpurity is a C++ program available at https://github.com/xjtu-omics/MEpurity. Supplementary information: Supplementary data are available at Bioinformatics online.
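
The abstract names a beta mixture model without spelling it out. As a rough illustration only, the Python sketch below fits a generic two-component beta mixture (low vs. high methylation) to one sample's methylation beta values by expectation-maximization, with a weighted method-of-moments update for the shape parameters. This is an assumption-laden stand-in, not MEpurity's model: the helper names, the starting shapes (2, 8) and (8, 2) and the iteration count are hypothetical, and the step from fitted components to a purity estimate is not shown because the abstract does not describe it.

```python
import math

# Hypothetical helpers: a generic beta-mixture EM sketch, not MEpurity's algorithm.


def beta_pdf(x, a, b):
    """Beta(a, b) density, computed in log space for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def moments_to_shapes(mean, var):
    """Method of moments: map a mean/variance pair to Beta shape parameters."""
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def fit_beta_mixture(values, n_iter=200, eps=1e-6):
    """Fit a two-component beta mixture (low vs. high methylation) by EM."""
    # Clip exact 0/1 so that log(x) and log(1 - x) stay finite.
    xs = [min(max(v, eps), 1 - eps) for v in values]
    params = [(2.0, 8.0), (8.0, 2.0)]  # hypothetical starting shapes
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each CpG belongs to each component.
        resp = []
        for x in xs:
            dens = [w * beta_pdf(x, a, b) for w, (a, b) in zip(weights, params)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update mixing weights and moment-matched shape parameters.
        for k in range(2):
            r = [row[k] for row in resp]
            nk = sum(r)
            if nk < 1e-8:
                continue  # empty component: keep its previous parameters
            weights[k] = nk / len(xs)
            mean = sum(ri * x for ri, x in zip(r, xs)) / nk
            var = sum(ri * (x - mean) ** 2 for ri, x in zip(r, xs)) / nk
            # Keep the variance inside the range a Beta distribution allows.
            var = min(max(var, 1e-6), 0.999 * mean * (1 - mean))
            params[k] = moments_to_shapes(mean, var)
    return weights, params
```

`fit_beta_mixture(values)` returns the mixing weights and the `(a, b)` shapes of the two components; each component's mean is `a / (a + b)`. Moment matching keeps the M-step closed-form, at the cost of being less exact than a full maximum-likelihood update.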

Ordino: a visual cancer analysis tool for ranking and exploring genes, cell lines and tissue samples

Bioinformatics ◽

10.1093/bioinformatics/btz009 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3140-3142 ◽

Cited By ~ 7

Author(s):

Marc Streit ◽

Samuel Gratzl ◽

Holger Stitz ◽

Andreas Wernitznig ◽

Thomas Zichner ◽

...

Keyword(s):

Cell Lines ◽

Cancer Genomics ◽

Cancer Cell Line ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Analysis Tool ◽

Tissue Samples ◽

Web Based ◽

Prioritization Process ◽

Cancer Genome Atlas

Abstract. Summary: Ordino is a web-based analysis tool for cancer genomics that allows users to flexibly rank, filter and explore genes, cell lines and tissue samples based on pre-loaded data, including The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia, as well as on manually uploaded information. Interactive tabular data visualization that facilitates the user-driven prioritization process forms a core component of Ordino. Detail views of selected items complement the exploration. Findings can be stored, shared and reproduced via the integrated session management. Availability and implementation: Ordino is publicly available at https://ordino.caleydoapp.org. The source code is released at https://github.com/Caleydo/ordino under the Mozilla Public License 2.0. Supplementary information: Supplementary data are available at Bioinformatics online.

NExUS: Bayesian simultaneous network estimation across unequal sample sizes

Bioinformatics ◽

10.1093/bioinformatics/btz636 ◽

2019 ◽

Vol 36 (3) ◽

pp. 798-804

Author(s):

Priyam Das ◽

Christine B Peterson ◽

Kim-Anh Do ◽

Rehan Akbani ◽

Veerabhadran Baladandayuthapani

Keyword(s):

The Cancer Genome Atlas ◽

Supplementary Information ◽

Estimation Methods ◽

Sample Sizes ◽

Multiple Networks ◽

Proteomic Data ◽

Network Similarity ◽

Cancer Genome Atlas ◽

Network Estimation ◽

Unequal Sample Sizes

Abstract. Motivation: Network-based analyses of high-throughput genomics data provide a holistic, systems-level understanding of various biological mechanisms for a common population. However, when multiple networks are estimated across heterogeneous sub-populations, varying sample sizes pose a challenge for estimation and inference, because apparent network differences may be driven by differences in statistical power. We are particularly interested in addressing this challenge for proteomic networks of related cancers, as the number of subjects available for rare cancer (sub-)types is often limited. Results: We develop NExUS (Network Estimation across Unequal Sample sizes), a Bayesian method that enables joint learning of multiple networks while avoiding an artefactual relationship between sample size and network sparsity. We demonstrate through simulations that NExUS outperforms existing network estimation methods in this context, and we apply it to learn network similarity and shared pathway activity for groups of cancers with related origins represented in The Cancer Genome Atlas (TCGA) proteomic data. Availability and implementation: The NExUS source code is freely available for download at https://github.com/priyamdas2/NExUS. Supplementary information: Supplementary data are available at Bioinformatics online.

The Prognostic Significance of MAFK and its Methylation in Cervical Cancer

10.21203/rs.3.rs-582069/v1 ◽

2021 ◽

Author(s):

Mengjun Zhang ◽

Hao Li ◽

Yuan Liu ◽

Siyu Hou ◽

Ping Cui ◽

...

Keyword(s):

Cervical Cancer ◽

Prognostic Significance ◽

Enrichment Analysis ◽

The Cancer Genome Atlas ◽

Cancer Prognosis ◽

Aberrant Methylation ◽

Cancer Data ◽

Drug Candidates ◽

Cancer Genome Atlas ◽

Cancer Tissues

Abstract. Background: The purpose of this study was to determine the value of MAFK as a biomarker of cervical cancer prognosis and to explore its methylation and possible cellular signaling pathways. Methods: We analyzed the cervical cancer data of The Cancer Genome Atlas (TCGA) through bioinformatics, including MAFK expression, methylation, prognosis and genome enrichment analysis. Results: MAFK expression was higher in cervical cancer tissues and was negatively correlated with the methylation levels of five CpG sites. MAFK is an independent prognostic factor of cervical cancer and is involved in the Nod-like receptor signaling pathway. CMap analysis screened four drug candidates for cervical cancer treatment. Conclusions: We confirmed that MAFK is a novel prognostic biomarker for cervical cancer and that aberrant methylation may also affect MAFK expression and carcinogenesis. This study provides a new molecular target for the prognostic evaluation and treatment of cervical cancer.
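
The abstract reports a negative correlation between MAFK expression and the methylation of individual CpG sites but does not name the coefficient used. A minimal sketch, assuming Spearman rank correlation across samples (our assumption, not stated in the source), computes it for one CpG site; the function names and the toy inputs are hypothetical.

```python
# Hypothetical helpers: Spearman correlation is an assumed choice, not the study's stated method.


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant input")
    return cov / (sx * sy)


# Hypothetical toy data: methylation beta values of one CpG site and
# expression values for the same five samples.
methylation = [0.1, 0.3, 0.5, 0.7, 0.9]
expression = [9.0, 7.0, 5.0, 3.0, 1.0]
rho = spearman(methylation, expression)  # close to -1 for this perfectly inverse toy data
```

Rank-based correlation is shown because it does not assume a linear relationship between beta values and expression; a real analysis would also attach a p-value and correct across the CpG sites tested.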

Rapid advancement in cancer genomic big data in the pursuit of precision oncology

Medical Journal of Indonesia ◽

10.13181/mji.rev.204250 ◽

2021 ◽

Author(s):

Tiara Bunga Mayang Permata ◽

Sri Mutya Sekarutami ◽

Endang Nuryadi ◽

Angela Giselvania ◽

Soehartati Gondhowiardjo

Keyword(s):

Big Data ◽

Open Access ◽

Cancer Cell ◽

Cancer Cell Line ◽

Genomic Data ◽

The Cancer Genome Atlas ◽

Clinical Samples ◽

Precision Oncology ◽

Cancer Data ◽

User Friendly

In the current big data era, massive amounts of cancer genomic data are openly accessible from anywhere in the world. They are obtained from popular platforms such as The Cancer Genome Atlas, which provides genetic information from clinical samples, and the Cancer Cell Line Encyclopedia, which offers genomic data on cancer cell lines. User-friendly analysis tools are also emerging, such as the Tumor Immune Estimation Resource (TIMER), which enables comprehensive analysis of tumor-infiltrating immune cells. In clinical practice, clinical sequencing has been recommended for patients with cancer in many countries. Despite its many challenges, it enables the application of precision medicine, especially in medical oncology. This review discusses several efforts toward achieving precision oncology and applying big data in Indonesia. The use of open-access genomic data in writing research articles is also described.

IsoformSwitchAnalyzeR: analysis of changes in genome-wide patterns of alternative splicing and its functional consequences

Bioinformatics ◽

10.1093/bioinformatics/btz247 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4469-4471 ◽

Cited By ~ 21

Author(s):

Kristoffer Vitting-Seerup ◽

Albin Sandelin

Keyword(s):

Alternative Splicing ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Rna Seq ◽

Genome Wide ◽

Functional Consequences ◽

Cancer Genome Atlas ◽

Health And Disease ◽

Splicing Patterns

Abstract. Summary: Alternative splicing is an important mechanism in health and disease. Recent work highlights the importance of investigating genome-wide changes in splicing patterns and their functional consequences. Current computational methods support such analysis only on a gene-by-gene basis. We therefore extended the IsoformSwitchAnalyzeR R library to enable analysis of genome-wide changes in specific types of alternative splicing and of the predicted functional consequences of the resulting isoform switches. As a case study, we analyzed RNA-seq data from The Cancer Genome Atlas and found systematic changes in alternative splicing and in the consequences of the associated isoform switches. Availability and implementation: Windows, Linux and Mac OS: http://bioconductor.org/packages/IsoformSwitchAnalyzeR. Supplementary information: Supplementary data are available at Bioinformatics online.
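
IsoformSwitchAnalyzeR describes isoform usage in terms of the isoform fraction (IF), an isoform's expression divided by its gene's total expression, and the change in that fraction between conditions (dIF). The Python sketch below only illustrates that arithmetic for a single gene under stated assumptions: it is not the package's R API, it omits the statistical testing the package performs, and the helper names and the 0.1 cutoff are illustrative.

```python
# Hypothetical helpers illustrating IF/dIF arithmetic; not IsoformSwitchAnalyzeR's API.


def isoform_fractions(isoform_expr):
    """Isoform fraction (IF): isoform expression divided by total gene expression."""
    total = sum(isoform_expr.values())
    if total == 0:
        return {iso: 0.0 for iso in isoform_expr}
    return {iso: value / total for iso, value in isoform_expr.items()}


def delta_if(cond1, cond2):
    """dIF per isoform: IF in condition 2 minus IF in condition 1."""
    if1, if2 = isoform_fractions(cond1), isoform_fractions(cond2)
    return {iso: if2[iso] - if1[iso] for iso in cond1}


def candidate_switches(cond1, cond2, min_dif=0.1):
    """Pairs (gaining isoform, losing isoform) whose |dIF| both reach min_dif."""
    d = delta_if(cond1, cond2)
    gained = [iso for iso, v in d.items() if v >= min_dif]
    lost = [iso for iso, v in d.items() if v <= -min_dif]
    return [(up, down) for up in gained for down in lost]


# Hypothetical expression of two isoforms of one gene in normal vs. tumor samples.
normal = {"tx1": 80.0, "tx2": 20.0}
tumor = {"tx1": 30.0, "tx2": 70.0}
print(delta_if(normal, tumor))            # tx1 loses about 0.5, tx2 gains about 0.5
print(candidate_switches(normal, tumor))  # [('tx2', 'tx1')]
```

Working with fractions rather than raw expression separates a change in which isoform is used from a change in the gene's overall expression level.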

The Biological Function Delineated Across Pan-Cancer Levels Through lncRNA-Based Prognostic Risk Assessment Factors for Pancreatic Cancer

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.694652 ◽

2021 ◽

Vol 9 ◽

Author(s):

Xudong Tang ◽

Mengyan Zhang ◽

Liang Sun ◽

Fengyan Xu ◽

Xin Peng ◽

...

Keyword(s):

Pancreatic Cancer ◽

Prognostic Marker ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Research Network ◽

Molecular Characteristics ◽

Cancer Data ◽

Cancer Genome Atlas ◽

Pan Cancer ◽

Genome Atlas

Long non-coding RNAs (lncRNAs) play key roles in tumors, functioning not only as important molecular markers for cancer prognosis but also as molecular characteristics at the pan-cancer level. Because pancreatic cancer has a poor prognosis, accurate prognostic assessment is key to developing treatment plans for it. Here we analyzed pancreatic cancer data from The Cancer Genome Atlas and the Genotype-Tissue Expression (GTEx) database using Cox regression and lasso regression, both on the combined data from the two databases and on The Cancer Genome Atlas data alone (Cancer Genome Atlas Research Network et al., 2013). A prognostic risk score model significantly correlated with pancreatic cancer survival was constructed, and two lncRNAs were investigated. Additional analysis of 33 cancers using the two lncRNAs showed that lncRNA TSPOAP1-AS1 was a prognostic marker in seven cancers, most significantly in pancreatic cancer, and that lncRNA MIR600HG was a prognostic marker of ovarian and pancreatic cancer. LncRNA TSPOAP1-AS1 is associated with clinical stage and tumor mutation burden in some cancers and with a strong degree of immune infiltration in many cancers, while lncRNA MIR600HG was strongly correlated with microsatellite instability in several cancers. The results of this study further our understanding of the different functions of lncRNAs in cancer and may aid the clinical application of lncRNAs as prognostic factors for cancer.
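
Cox-based prognostic risk score models are commonly built as a linear combination of expression values weighted by the fitted regression coefficients, with patients split into high- and low-risk groups at the median score. The abstract gives neither the coefficients nor the cutoff, so the sketch below assumes that common construction; the lncRNA names `lnc_A`/`lnc_B` and their coefficient values are hypothetical placeholders, not results from the study.

```python
from statistics import median

# Hypothetical sketch of a linear Cox risk score with a median split.


def risk_scores(samples, coefs):
    """Linear risk score per sample: sum of coefficient * expression."""
    return [sum(coef * sample[gene] for gene, coef in coefs.items()) for sample in samples]


def split_by_median(scores):
    """Label each sample 'high' or 'low' risk relative to the median score."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


# Hypothetical coefficients for two placeholder lncRNAs (not the study's values).
coefs = {"lnc_A": 0.8, "lnc_B": -0.5}
samples = [
    {"lnc_A": 1.0, "lnc_B": 2.0},
    {"lnc_A": 3.0, "lnc_B": 1.0},
    {"lnc_A": 2.0, "lnc_B": 2.0},
]
print(split_by_median(risk_scores(samples, coefs)))  # ['low', 'high', 'low']
```

A positive coefficient marks an lncRNA whose higher expression raises the estimated hazard; a negative one marks a protective lncRNA.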

The Analysis of Gene Expression Data Incorporating Tumor Purity Information

Frontiers in Genetics ◽

10.3389/fgene.2021.642759 ◽

2021 ◽

Vol 12 ◽

Author(s):

Seungjun Ahn ◽

Tyler Grimes ◽

Somnath Datta

Keyword(s):

Gene Expression ◽

Tumor Cells ◽

Gene Expression Data ◽

The Cancer Genome Atlas ◽

Data Sets ◽

Expression Data ◽

Tumor Purity ◽

Robust Model ◽

Differential Network ◽

Cancer Genome Atlas

The tumor microenvironment is composed of tumor cells, stromal cells, immune cells, blood vessels, and other associated non-cancerous cells. Gene expression measurements on tumor samples are therefore an average over the cells in the microenvironment. However, research questions often concern the tumor cells rather than the surrounding non-tumor tissue. Previous studies have suggested that tumor purity (TP), the proportion of tumor cells in a solid tumor sample, has a confounding effect on differential expression (DE) analysis of high vs. low survival groups. We investigate three ways of incorporating TP information into two statistical methods for analyzing gene expression data, namely differential network (DN) analysis and DE analysis. Analysis 1 ignores the TP information completely, Analysis 2 uses a truncated sample from which the low-TP samples are removed, and Analysis 3 uses TP as a covariate in the underlying statistical models. We use three gene expression data sets related to three different cancers from The Cancer Genome Atlas (TCGA) for our investigation. In all three cancer datasets, the networks from Analysis 2 show a greater amount of differential connectivity than those from Analysis 1. Similarly, Analysis 1 identified more differentially expressed genes than Analysis 2. Results of DN and DE analyses using Analysis 3 were mostly consistent with those of Analysis 1 across the three cancers. However, Analysis 3 identified additional cancer-related genes in both DN and DE analyses. Our findings suggest that using TP as a covariate in a linear model is appropriate for DE analysis, but a more robust model is needed for DN analysis. Because true DN or DE patterns are not known for the empirical datasets, simulated datasets could be used to study the statistical properties of these methods in future studies.
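
The three TP strategies can be illustrated for a single gene with ordinary least squares. This sketch rests on assumptions: it uses a plain linear model with a 0/1 survival-group indicator rather than the DE and DN methods the authors used, and the function names and the 0.6 purity threshold for Analysis 2 are hypothetical.

```python
import numpy as np

# Hypothetical sketch: plain OLS stands in for the study's DE/DN methods.


def group_coefficient(y, group, covariate=None):
    """OLS coefficient of a 0/1 group indicator, optionally adjusted for one covariate."""
    columns = [np.ones_like(y), group]  # intercept + group indicator
    if covariate is not None:
        columns.append(covariate)
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def three_analyses(y, group, purity, min_purity=0.6):
    """Group effect on one gene's expression under the three TP strategies."""
    y, group, purity = (np.asarray(a, dtype=float) for a in (y, group, purity))
    a1 = group_coefficient(y, group)  # Analysis 1: ignore TP
    keep = purity >= min_purity
    a2 = group_coefficient(y[keep], group[keep])  # Analysis 2: drop low-TP samples
    a3 = group_coefficient(y, group, covariate=purity)  # Analysis 3: TP as covariate
    return a1, a2, a3
```

When purity differs systematically between the two groups, Analysis 1 absorbs that difference into the group effect, while Analysis 3 separates it out as long as the purity effect is roughly linear; Analysis 2 reduces the imbalance at the cost of discarding samples.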

HotSpot3D web server: an integrated resource for mutation analysis in protein 3D structures

Bioinformatics ◽

10.1093/bioinformatics/btaa258 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3944-3946 ◽

Cited By ~ 2

Author(s):

Shanyu Chen ◽

Xiaoyu He ◽

Ruilin Li ◽

Xiaohong Duan ◽

Beifang Niu

Keyword(s):

Mutation Analysis ◽

Web Server ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Supplementary Data ◽

3D Structures ◽

One Stop ◽

Cancer Genome Atlas ◽

Genome Atlas

Abstract. Motivation: HotSpot3D is a widely used software tool for identifying mutation hotspots on the 3D structures of proteins. To further assist users, we developed a new HotSpot3D web server that makes this software more versatile, convenient and interactive. Results: The HotSpot3D web server performs data pre-processing, clustering, visualization and log viewing in one stop. Users can interactively explore each cluster and easily re-visualize the mutational clusters in their browsers. We also provide a database that allows users to search and visualize proximal mutations from 33 cancers in The Cancer Genome Atlas. Availability and implementation: http://niulab.scgrid.cn/HotSpot3D/. Supplementary information: Supplementary data are available at Bioinformatics online.
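
As a rough picture of what clustering mutations on a protein structure involves, the sketch below links mutated residues whose coordinates lie within a distance cutoff and reports the connected groups, using a union-find structure. This single-linkage grouping is a simplified stand-in, not HotSpot3D's clustering algorithm; the 10 Å cutoff, residue labels and coordinates are hypothetical.

```python
import math

# Hypothetical single-linkage stand-in; not HotSpot3D's actual clustering method.


def proximity_clusters(residues, cutoff=10.0):
    """Connected groups of residues linked by chains of pairs within `cutoff` (angstroms)."""
    names = list(residues)
    parent = {n: n for n in names}

    def find(n):
        # Union-find root lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(residues[a], residues[b]) <= cutoff:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical 3D coordinates (angstroms) of four mutated residues.
coords = {"m1": (0.0, 0.0, 0.0), "m2": (5.0, 0.0, 0.0), "m3": (12.0, 0.0, 0.0), "m4": (40.0, 0.0, 0.0)}
print(proximity_clusters(coords))  # [['m1', 'm2', 'm3'], ['m4']]
```

Note that m1 and m3 are 12 Å apart yet land in the same group because both lie within the cutoff of m2; that chaining is characteristic of single linkage.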

Modelling cancer progression using Mutual Hazard Networks

Bioinformatics ◽

10.1093/bioinformatics/btz513 ◽

2019 ◽

Vol 36 (1) ◽

pp. 241-249 ◽

Cited By ~ 2

Author(s):

Rudolf Schill ◽

Stefan Solbrig ◽

Tilo Wettig ◽

Rainer Spang

Keyword(s):

Cancer Progression ◽

Learning Algorithm ◽

Directed Acyclic Graphs ◽

The Cancer Genome Atlas ◽

Supplementary Information ◽

Cross Sectional ◽

Acyclic Graphs ◽

Cancer Genome Atlas ◽

Occurrence State ◽

Occurrence Patterns

Abstract. Motivation: Cancer progresses by accumulating genomic events, such as mutations and copy number alterations, whose chronological order is key to understanding the disease but difficult to observe. Instead, cancer progression models use co-occurrence patterns in cross-sectional data to infer epistatic interactions between events and thereby uncover their most likely order of occurrence. State-of-the-art progression models, however, are limited by mathematical tractability: they allow events to interact only in directed acyclic graphs, to promote but not inhibit subsequent events, or to be mutually exclusive in distinct groups that cannot overlap. Results: Here we propose Mutual Hazard Networks (MHN), a new machine learning algorithm that infers cyclic progression models from cross-sectional data. MHN model events by their spontaneous rate of fixation and by the multiplicative effects they exert on the rates of subsequent events. MHN compared favourably with acyclic models in cross-validated model fit on the four datasets tested. Applied to the glioblastoma dataset from The Cancer Genome Atlas, MHN proposed a novel interaction consistent with consecutive biopsies: IDH1 mutations are early events that promote the subsequent fixation of TP53 mutations. Availability and implementation: Implementation and data are available at https://github.com/RudiSchill/MHN. Supplementary information: Supplementary data are available at Bioinformatics online.
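
The description of MHN above (a spontaneous rate per event, multiplied by the effects of events already present) corresponds to a rate of the form lambda_i(x) = Theta_ii * prod over present events j of Theta_ij, where Theta_ij > 1 promotes event i and Theta_ij < 1 inhibits it. A minimal Python sketch of that rate for one state follows; the 2x2 parameter matrix is hypothetical and illustrative, not fitted to any data, and only mirrors the qualitative IDH1-promotes-TP53 direction reported in the abstract.

```python
# Hypothetical sketch of an MHN-style event rate; theta values are illustrative only.


def mhn_rate(theta, state, i):
    """Rate at which event i occurs next, given which events are already present.

    theta[i][i] is the baseline (spontaneous) rate of event i, and theta[i][j]
    is the multiplicative effect of an already-present event j on event i.
    """
    if state[i]:
        return 0.0  # an event that has already occurred cannot occur again
    rate = theta[i][i]
    for j, present in enumerate(state):
        if present and j != i:
            rate *= theta[i][j]  # >1 promotes, <1 inhibits, 1 has no effect
    return rate


# Events: index 0 = IDH1 mutation, index 1 = TP53 mutation (hypothetical values).
theta = [[0.5, 1.0],
         [3.0, 0.2]]
print(mhn_rate(theta, [0, 0], 1))  # baseline TP53 rate
print(mhn_rate(theta, [1, 0], 1))  # TP53 rate after IDH1, tripled by theta[1][0]
```

Because every present event multiplies the rate independently, the model can encode both promoting and inhibiting effects, and cycles of mutual influence, without the acyclic-graph restriction mentioned in the abstract.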

Machine learning analysis of TCGA cancer data

PeerJ Computer Science ◽

10.7717/peerj-cs.584 ◽

2021 ◽

Vol 7 ◽

pp. e584

Author(s):

Jose Liñares-Blanco ◽

Alejandro Pazos ◽

Carlos Fernandez-Lozano

Keyword(s):

Machine Learning ◽

Regulatory Elements ◽

The Cancer Genome Atlas ◽

Future Research ◽

Support Vector ◽

Clear Trend ◽

Protein Coding ◽

Cancer Data ◽

Cancer Genome Atlas ◽

Omic Data

In recent years, machine learning (ML) researchers have turned their focus towards biological problems that are difficult to analyse with standard approaches. Large initiatives such as The Cancer Genome Atlas (TCGA) have enabled the use of omic data for training these algorithms. To survey the state of the art, this review covers the main works that have applied ML to TCGA data. First, the principal discoveries made by the TCGA consortium are presented. With these foundations established, we turn to the main objective of this study: the identification and discussion of works that have used TCGA data to train different ML approaches. After reviewing more than 100 papers, we classified them according to three pillars: the type of tumour, the type of algorithm and the biological problem predicted. One of our conclusions is that studies are concentrated on two major algorithms: Random Forest and Support Vector Machines. We also observe a rise in the use of deep artificial neural networks. The increase in integrative models for multi-omic data analysis is also worth emphasizing. Different biological conditions are a consequence of molecular homeostasis, driven by protein-coding regions, regulatory elements and the surrounding environment. Notably, a large number of works use gene expression data, which researchers have preferred when training the different models. The biological problems addressed were classified into five types: prognosis prediction, tumour subtypes, microsatellite instability (MSI), immunological aspects and certain pathways of interest. A clear trend was detected in the prediction of these conditions according to the type of tumour. For this reason, a greater number of works have focused on the BRCA cohort, while works specific to survival, for example, centred on the GBM cohort because of its large number of events. Throughout this review, the works and methodologies used to study TCGA cancer data are examined in depth. Finally, we intend this work to serve as a basis for future research in this field.
