Using association signal annotations to boost similarity network fusion

Abstract. Motivation: Recent technological developments have made it possible to generate many kinds of omics data, creating opportunities to address problems such as disease subtyping and disease mapping by analyzing more comprehensive omics data jointly. Among the many data-integration methods developed, similarity network fusion (SNF) has shown great potential for identifying new disease subtypes by separating similar subjects using multi-omics data. SNF fuses similarity networks, built from pairwise patient similarity measures on different types of omics data, into a single network using both shared and complementary information across data types. Results: In this article, we propose an association-signal-annotation boosted similarity network fusion (ab-SNF) method, which adds feature-level association signal annotations as weights to up-weight signal features and down-weight noise features when constructing subject similarity networks, with the aim of boosting disease subtyping performance. In various simulation studies, the proposed ab-SNF outperforms the original SNF approach without weights. Most importantly, the improvement in subtyping performance due to association-signal-annotation weights is amplified in the integration process. Applications to somatic mutation data, DNA methylation data and gene expression data of three cancer types from The Cancer Genome Atlas project suggest that ab-SNF consistently identifies new subtypes in each cancer that more accurately predict patient survival and are more biologically meaningful. Availability and implementation: The R package abSNF is freely available for download from https://github.com/pfruan/abSNF. Supplementary information: Supplementary data are available at Bioinformatics online.
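The idea of up-weighting signal features when building a subject similarity network can be illustrated with a minimal sketch. It assumes a weighted squared Euclidean distance and a Gaussian kernel; the function names, the toy data, the weight values and the kernel are illustrative assumptions, not the abSNF package's API or the authors' exact formulation.

```python
import math

def weighted_sq_distance(x, y, weights):
    """Weighted squared Euclidean distance between two subjects."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))

def similarity_matrix(samples, weights, sigma=1.0):
    """Pairwise similarity from weighted distances (Gaussian kernel is an assumption)."""
    n = len(samples)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = weighted_sq_distance(samples[i], samples[j], weights)
            sim[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return sim

def normalize(raw):
    """Scale raw weights so that they sum to 1."""
    total = sum(raw)
    return [r / total for r in raw]

# Toy data: feature 0 separates two groups, feature 1 is noise.
samples = [[0.0, 1.0], [0.1, -1.0], [3.0, 0.9], [3.1, -1.1]]
uniform = normalize([1.0, 1.0])   # unweighted baseline
boosted = normalize([9.0, 1.0])   # hypothetical association-signal weights

sim_uniform = similarity_matrix(samples, uniform)
sim_boosted = similarity_matrix(samples, boosted)
```

With the hypothetical boosted weights, subjects 0 and 1 (same group) become more similar and subjects 0 and 2 (different groups) less similar than under uniform weights, which is the effect the abstract attributes to the annotations.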


An R Package for Divergence Analysis of Omics Data

10.1101/720391 ◽ 2019

Author(s): Wikum Dinalankara ◽ Qian Ke ◽ Donald Geman ◽ Luigi Marchionni

Keyword(s): High Throughput Sequencing ◽ R Package ◽ The Cancer Genome Atlas ◽ High Dimensional ◽ Omics Data ◽ Sequencing Data ◽ High Throughput Sequencing Data ◽ Ternary Code ◽ Cancer Genome Atlas ◽ Level Analysis

Abstract: Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample-level analysis, and is applicable to many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high-throughput sequencing data from The Cancer Genome Atlas.
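The univariate ternary coding described above can be sketched as follows. This is a minimal illustration that assumes the baseline range of each feature is an empirical quantile interval over the baseline samples; the divergence package may estimate the baseline differently, and all function names and data here are hypothetical.

```python
def quantile(sorted_values, q):
    """Nearest-rank quantile of an already sorted list."""
    idx = round(q * (len(sorted_values) - 1))
    return sorted_values[idx]

def ternary_code(value, interval):
    """-1 below the baseline interval, +1 above it, 0 inside it."""
    lo, hi = interval
    if value < lo:
        return -1
    if value > hi:
        return 1
    return 0

def digitize_profile(profile, baseline, lower_q=0.05, upper_q=0.95):
    """Digitize one profile feature by feature against baseline profiles."""
    codes = []
    for j, value in enumerate(profile):
        column = sorted(row[j] for row in baseline)
        interval = (quantile(column, lower_q), quantile(column, upper_q))
        codes.append(ternary_code(value, interval))
    return codes

# Toy baseline population (rows: samples, columns: features).
baseline = [
    [1.0, 10.0, 5.0],
    [1.2, 11.0, 5.5],
    [0.9, 9.5, 4.8],
    [1.1, 10.5, 5.2],
]
tumor = [3.0, 10.2, 2.0]
codes = digitize_profile(tumor, baseline)  # [1, 0, -1]
```

A binary code follows by taking the absolute value of each ternary entry, which marks a feature as divergent or not regardless of direction.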


An R package for divergence analysis of omics data

PLoS ONE ◽ 10.1371/journal.pone.0249002 ◽ 2021 ◽ Vol 16 (4) ◽ pp. e0249002

Author(s): Wikum Dinalankara ◽ Qian Ke ◽ Donald Geman ◽ Luigi Marchionni

Keyword(s): R Package ◽ The Cancer Genome Atlas ◽ High Dimensional ◽ Omics Data ◽ Ternary Code ◽ Cancer Genome Atlas ◽ Level Analysis ◽ Data Analysis Methods ◽ Genome Atlas ◽ Omics Data Analysis

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample-level analysis, and is applicable to many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with data from The Cancer Genome Atlas.


Deep Subspace Mutual Learning For Cancer Subtypes Prediction

Bioinformatics ◽ 10.1093/bioinformatics/btab625 ◽ 2021

Author(s): Bo Yang ◽ Ting-Ting Xin ◽ Shan-Min Pang ◽ Meng Wang ◽ Yi-Jie Wang

Keyword(s): The Cancer Genome Atlas ◽ Supplementary Information ◽ Omics Data ◽ Computational Framework ◽ Disease Etiology ◽ Cancer Subtypes ◽ Mutual Learning ◽ Cancer Genome Atlas ◽ Precise Prediction ◽ Multi Level

Abstract. Motivation: Precise prediction of cancer subtypes is of significant importance in cancer diagnosis and treatment. Disease etiology is complicated and manifests at different omics levels; hence, integrative analysis provides a very effective way to improve our understanding of cancer. Results: We propose a novel computational framework named Deep Subspace Mutual Learning (DSML). DSML uses deep neural networks to simultaneously learn the subspace structures in each available omics data type and in the overall multi-omics data, which facilitates subtype prediction via clustering on multi-level, single-level, and partial-level omics data. Extensive experiments are performed on five different cancers using three levels of omics data from The Cancer Genome Atlas. The experimental analysis demonstrates that DSML delivers comparable or even better results than many state-of-the-art integrative methods. Availability: An implementation and documentation of DSML are publicly available at https://github.com/polytechnicXTT/Deep-Subspace-Mutual-Learning.git. Supplementary information: Supplementary data are available at Bioinformatics online.


A novel similarity score based on gene ranks to reveal genetic relationships among diseases

PeerJ ◽ 10.7717/peerj.10576 ◽ 2021 ◽ Vol 9 ◽ pp. e10576

Author(s): Dongmei Luo ◽ Chengdong Zhang ◽ Liwan Fu ◽ Yuening Zhang ◽ Yue-Qing Hu

Keyword(s): Genetic Relationships ◽ Similarity Measures ◽ Similarity Score ◽ The Cancer Genome Atlas ◽ Genetic Mechanisms ◽ Cancer Genome Atlas ◽ Cancer Types ◽ Pan Cancer ◽ Cancer Studies

Knowledge of similarities among diseases can contribute to uncovering common genetic mechanisms. A couple of similarity measures based on ranked gene lists have been proposed in the literature. Noting that they may suffer from the need to determine a cutoff or from a heavy computational load, we propose SimSIP, a novel similarity score among diseases based on gene ranks. Simulation studies under various scenarios demonstrate that SimSIP performs better than existing rank-based similarity measures. Application of SimSIP to gene expression data of 18 cancer types from The Cancer Genome Atlas shows that SimSIP is superior in clarifying the genetic relationships among diseases and tends to cluster histologically or anatomically related cancers together, which is consistent with pan-cancer studies. Moreover, SimSIP, with its simpler form and faster computation, is more robust to higher levels of noise than existing methods and provides a basis for future studies on genetic relationships among diseases. In addition, a measure, MAG, is developed to gauge the magnitude of association of an individual gene with diseases. Using MAG, the genes and biological processes significantly associated with colorectal cancer are detected.


Performance Comparison of Deep Learning Autoencoders for Cancer Subtype Detection Using Multi-Omics Data

Cancers ◽ 10.3390/cancers13092013 ◽ 2021 ◽ Vol 13 (9) ◽ pp. 2013

Author(s): Edian F. Franco ◽ Pratip Rana ◽ Aline Cruz ◽ Víctor V. Calderón ◽ Vasco Azevedo ◽ ...

Keyword(s): Deep Learning ◽ Data Fusion ◽ Similarity Measures ◽ Research Problem ◽ Optimal Number ◽ Performance Comparison ◽ The Cancer Genome Atlas ◽ Cancer Type ◽ Omics Data ◽ Cancer Subtype

A heterogeneous disease such as cancer is activated through multiple pathways and different perturbations. Depending on the activated pathway(s), patient survival varies significantly, and the efficacy of various drugs differs. Therefore, cancer subtype detection using genomics-level data is a significant research problem. Subtype detection is often a complex problem and, in most cases, needs multi-omics data fusion to achieve accurate subtyping. Different data fusion and subtyping approaches have been proposed over the years, such as kernel-based fusion, matrix factorization, and deep learning autoencoders. In this paper, we compared the performance of different deep learning autoencoders for cancer subtype detection. We performed cancer subtype detection on four different cancer types from The Cancer Genome Atlas (TCGA) datasets using four autoencoder implementations. We also predicted the optimal number of subtypes in a cancer type using the silhouette score and found that the detected subtypes exhibit significant differences in survival profiles. Furthermore, we compared the effect of feature selection and similarity measures on subtype detection. For further evaluation, we used the glioblastoma multiforme (GBM) dataset and identified the differentially expressed genes in each of the subtypes. The results obtained are consistent with other genomic studies and can be corroborated with the involved pathways and biological functions. Thus, the results from the autoencoders, obtained through the interaction of different data types of cancer, can be used for the prediction and characterization of patient subgroups and survival profiles.
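Choosing the number of subtypes with the silhouette score can be sketched as below. This is a minimal pure-Python illustration, not the authors' pipeline: the toy points stand in for latent features, and the candidate labelings are written by hand where a real analysis would obtain them by clustering the autoencoder output for each candidate number of subtypes.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def silhouette_score(points, labels):
    """Mean silhouette over all points; expects at least two clusters."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy latent features with two well-separated groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
# Hand-written candidate labelings, keyed by number of subtypes (assumption).
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 2],
}
scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
```

On this toy data the two-cluster labeling scores higher, so `best_k` is 2; splitting a compact group into smaller pieces lowers the score.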


Inferring perturbation profiles of cancer samples

Bioinformatics ◽ 10.1093/bioinformatics/btab113 ◽ 2021

Author(s): Martin Pirkl ◽ Niko Beerenwinkel

Keyword(s): Indirect Evidence ◽ R Package ◽ The Cancer Genome Atlas ◽ Supplementary Information ◽ Patient Specific ◽ Driver Genes ◽ Cancer Driver ◽ Molecular Alterations ◽ Incomplete Coverage ◽ Gene Perturbations

Abstract. Motivation: Cancer is one of the most prevalent diseases in the world. Tumors arise due to important genes changing their activity, e.g. when inhibited or over-expressed. But these gene perturbations are difficult to observe directly. Molecular profiles of tumors can provide indirect evidence of gene perturbations. However, inferring perturbation profiles from molecular alterations is challenging due to error-prone molecular measurements and incomplete coverage of all possible molecular causes of gene perturbations. Results: We have developed a novel mathematical method to analyze cancer driver genes and their patient-specific perturbation profiles. We combine genetic aberrations with gene expression data in a causal network derived across patients to infer unobserved perturbations. We show that our method can predict perturbations in simulations, CRISPR perturbation screens and breast cancer samples from The Cancer Genome Atlas. Availability and implementation: The method is available as the R-package nempi at https://github.com/cbg-ethz/nempi and http://bioconductor.org/packages/nempi. Supplementary information: Supplementary data are available at Bioinformatics online.


Novel cancer subtyping method based on patient-specific gene regulatory network

10.1101/2021.03.24.436731 ◽ 2021

Author(s): Mai Adachi Nakazawa ◽ Yoshinori Tamada ◽ Yoshihisa Tanaka ◽ Marie Ikeguchi ◽ Kako Higashihara ◽ ...

Keyword(s): Gene Networks ◽ Regulatory Networks ◽ The Cancer Genome Atlas ◽ Patient Specific ◽ Specific Gene ◽ Omics Data ◽ Cancer Subtypes ◽ Molecular Systems ◽ Molecular Features ◽ Cancer Genome Atlas

The identification of cancer subtypes is important for the understanding of tumor heterogeneity. In recent years, numerous computational methods have been proposed for this problem based on the multi-omics data of patients. It is widely accepted that different cancer subtypes are induced by different molecular regulatory networks. However, only a few methods incorporate the differences between these molecular systems into the classification process. In this study, we present a novel method to classify cancer subtypes based on patient-specific molecular systems. Our method quantifies patient-specific gene networks, which are estimated from the patients' transcriptome data. By clustering the quantified networks, our method allows for cancer subtyping that takes into consideration the differences in the molecular systems of patients. Comprehensive analyses applying our method to The Cancer Genome Atlas (TCGA) datasets confirmed that it was able to identify more clinically meaningful cancer subtypes than the existing subtypes and that the identified subtypes comprised different molecular features. Our findings show that the proposed method, based on a simple classification using the patient-specific molecular systems, can identify cancer subtypes even with single omics data, which cannot otherwise be captured by existing methods using multi-omics data.


Integrated Dissection of lncRNA-Perturbated Triplets Reveals Novel Prognostic Signatures Across Cancer Types

International Journal of Molecular Sciences ◽ 10.3390/ijms21176087 ◽ 2020 ◽ Vol 21 (17) ◽ pp. 6087

Author(s): Yunzhen Wei ◽ Limeng Zhou ◽ Yingzhang Huang ◽ Dianjing Guo

Keyword(s): Cancer Biology ◽ Noncoding RNA ◽ The Cancer Genome Atlas ◽ Dynamic Changes ◽ Computational Framework ◽ Potential Biomarker ◽ Cancer Genome Atlas ◽ Cancer Types ◽ TCGA Dataset ◽ Pan Cancer

Long noncoding RNA (lncRNA)/microRNA (miRNA)/mRNA triplets contribute to cancer biology. However, identifying significant triplets remains a major challenge for cancer research, and the dynamic changes among the factors of the triplets are poorly understood. Here, by integrating target information and expression datasets, we proposed a novel computational framework to identify triplets termed "lncRNA-perturbated triplets". We applied the framework to five cancer datasets in The Cancer Genome Atlas (TCGA) project and identified 109 triplets. We showed that the paired miRNAs and mRNAs were widely perturbated by lncRNAs in different cancer types. LncRNA perturbators and lncRNA-perturbated mRNAs showed significantly higher evolutionary conservation than other lncRNAs and mRNAs. Importantly, the lncRNA-perturbated triplets exhibited high cancer specificity. The pan-cancer perturbator OIP5-AS1 had a higher expression level than the cancer-specific perturbators. These lncRNA perturbators were significantly enriched in known cancer-related pathways. Furthermore, among the 25 lncRNAs in the 109 triplets, lncRNA SNHG7 was identified as a stable potential biomarker in lung adenocarcinoma (LUAD) by combining the TCGA dataset and two independent GEO datasets. Results from cell transfection also indicated that overexpression of lncRNAs SNHG7 and TUG1 enhanced the expression of the corresponding mRNAs PNMA2 and CDC7 in LUAD. Our study provides a systematic dissection of lncRNA-perturbated triplets and facilitates our understanding of the molecular roles of lncRNAs in cancers.


Epstein-Barr Virus-Positive Cancers Show Altered B-Cell Clonality

mSystems ◽ 10.1128/msystems.00081-18 ◽ 2018 ◽ Vol 3 (5) ◽ Cited By ~ 11

Author(s): Sara R. Selitsky ◽ David Marron ◽ Lisle E. Mose ◽ Joel S. Parker ◽ Dirk P. Dittmer

Keyword(s): B Cells ◽ B Cell ◽ Epstein Barr Virus ◽ The Cancer Genome Atlas ◽ Barr Virus ◽ Cancer Genome Atlas ◽ Cancer Types ◽ Epstein Barr ◽ Tumor Types ◽ Genome Atlas

ABSTRACT: Epstein-Barr virus (EBV) is convincingly associated with gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. To test the hypothesis that there are additional cancer types with high prevalence of EBV, we determined EBV viral expression in all the Cancer Genome Atlas Project (TCGA) mRNA sequencing (mRNA-seq) samples (n = 10,396) from 32 different tumor types. We found that EBV was present in gastric adenocarcinoma and lymphoma, as expected, and was also present in >5% of samples in 10 additional tumor types. For most samples, EBV transcript levels were low, which suggests that EBV was likely present due to infected infiltrating B cells. In order to determine if there was a difference in the B-cell populations, we assembled B-cell receptors for each sample and found B-cell receptor abundance (P ≤ 1.4 × 10⁻²⁰) and diversity (P ≤ 8.3 × 10⁻²⁷) were significantly higher in EBV-positive samples. Moreover, diversity was independent of B-cell abundance, suggesting that the presence of EBV was associated with an increased and altered B-cell population.

IMPORTANCE: Around 20% of human cancers are associated with viruses. Epstein-Barr virus (EBV) contributes to gastric cancer, nasopharyngeal carcinoma, and certain lymphomas, but its role in other cancer types remains controversial. We assessed the prevalence of EBV in RNA-seq from 32 tumor types in the Cancer Genome Atlas Project (TCGA) and found EBV to be present in >5% of samples in 12 tumor types. EBV infects epithelial cells and B cells and in B cells causes proliferation. We hypothesized that the low expression of EBV in most of the tumor types was due to infiltration of B cells into the tumor. The increase in B-cell abundance and diversity in subjects where EBV was detected in the tumors strengthens this hypothesis. Overall, we found that EBV was associated with an increased and altered immune response. This result is not evidence of causality, but a potential novel biomarker for tumor immune status.


Efficient weighted univariate clustering maps outstanding dysregulated genomic zones in human cancers

Bioinformatics ◽ 10.1093/bioinformatics/btaa613 ◽ 2020 ◽ Vol 36 (20) ◽ pp. 5027-5036 ◽ Cited By ~ 3

Author(s): Mingzhou Song ◽ Hua Zhong

Keyword(s): Clustering Algorithm ◽ Human Cancer ◽ Clustering Algorithms ◽ Search Space ◽ R Package ◽ Supplementary Information ◽ Diagnostic Biomarkers ◽ Cancer Types ◽ Molecular Patterns ◽ Pan Cancer

Abstract. Motivation: Chromosomal patterning of gene expression in cancer can arise from aneuploidy, genome disorganization or abnormal DNA methylation. To map such patterns, we introduce a weighted univariate clustering algorithm to guarantee linear runtime, optimality and reproducibility. Results: We present the chromosome clustering method, establish its optimality and runtime and evaluate its performance. It uses dynamic programming enhanced with an algorithm that reduces the search space in place to decrease runtime overhead. Using the method, we delineated outstanding genomic zones in 17 human cancer types. We identified strong continuity in dysregulation polarity (dominance by either up- or downregulated genes in a zone) along chromosomes in all cancer types. Significantly polarized dysregulation zones specific to cancer types are found, offering potential diagnostic biomarkers. Previously unreported, a total of 109 loci with conserved dysregulation polarity across cancer types give insights into pan-cancer mechanisms. Efficient chromosomal clustering opens a window to characterize molecular patterns in cancer genomes and beyond. Availability and implementation: Weighted univariate clustering algorithms are implemented within the R package 'Ckmeans.1d.dp' (4.0.0 or above), freely available at https://cran.r-project.org/package=Ckmeans.1d.dp. Supplementary information: Supplementary data are available at Bioinformatics online.
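Optimal weighted univariate clustering by dynamic programming can be illustrated with a short sketch. It is a plain quadratic-time recurrence over sorted points, minimizing the weighted within-cluster sum of squares; it is not the linear-runtime, search-space-reducing algorithm of Ckmeans.1d.dp, and the function name and toy data are assumptions. Weights are assumed positive and k at most the number of points.

```python
def weighted_kmeans_1d(values, weights, k):
    """Optimal weighted 1-D k-means; returns (labels, total weighted SSE)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    x = [values[i] for i in order]
    w = [weights[i] for i in order]
    n = len(x)

    # Prefix sums of w, w*x and w*x^2 give any segment cost in O(1).
    sw, swx, swx2 = [0.0], [0.0], [0.0]
    for xi, wi in zip(x, w):
        sw.append(sw[-1] + wi)
        swx.append(swx[-1] + wi * xi)
        swx2.append(swx2[-1] + wi * xi * xi)

    def cost(i, j):
        """Weighted sum of squares of sorted points i..j-1 about their mean."""
        tw = sw[j] - sw[i]
        tx = swx[j] - swx[i]
        return (swx2[j] - swx2[i]) - tx * tx / tw

    inf = float("inf")
    # best[m][j]: minimal cost of splitting the first j points into m clusters.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    split[m][j] = i

    # Backtrack the cluster boundaries, then map labels to the input order.
    sorted_labels = [0] * n
    j = n
    for m in range(k, 0, -1):
        i = split[m][j]
        for t in range(i, j):
            sorted_labels[t] = m - 1
        j = i
    labels = [0] * n
    for pos, original in enumerate(order):
        labels[original] = sorted_labels[pos]
    return labels, best[k][n]

values = [1.0, 1.1, 5.0, 5.2, 9.0, 0.9]
weights = [1.0, 2.0, 1.0, 1.0, 3.0, 1.0]
labels, total_cost = weighted_kmeans_1d(values, weights, 3)
```

Because the points are sorted first, every optimal cluster is a contiguous run, which is what lets the recurrence consider only split positions. The result is deterministic, unlike k-means with random initialization.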
