DiffVar: A new method for detecting differential variability with application to methylation in cancer and aging

Mapping Intimacies ◽

10.1101/008847 ◽

2014 ◽

Author(s):

Belinda Phipson ◽

Alicia Oshlack

Keyword(s):

Empirical Bayes ◽

Cost Effective ◽

R Package ◽

The Cancer Genome Atlas ◽

Model Framework ◽

Methylation of DNA ◽

Illumina HumanMethylation450 ◽

Illumina HumanMethylation450 BeadChip ◽

Cancer Genome Atlas ◽

Differential Variability

Methylation of DNA is known to be essential to development and dramatically altered in cancers. The Illumina HumanMethylation450 BeadChip has been used extensively as a cost-effective way to profile nearly half a million CpG sites across the human genome. Here we present DiffVar, a novel method to test for differential variability between sample groups. DiffVar employs an empirical Bayes model framework that can take into account any experimental design and is robust to outliers. We applied DiffVar to several datasets from The Cancer Genome Atlas, as well as an aging dataset. DiffVar is available in the missMethyl Bioconductor R package.
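
One way to make "testing for differential variability" concrete is a Levene-type test on a single CpG site. Each value is replaced by its absolute deviation from its group's median, and the groups are compared with a t statistic whose variance is shrunk towards a prior value, in the spirit of an empirical Bayes moderated t. This is a minimal sketch, not the missMethyl implementation. The function names are hypothetical, and the prior variance `s0_sq` and prior degrees of freedom `d0` are passed in by hand, whereas an empirical Bayes method would estimate them across all CpG sites.

```python
import math
from statistics import mean, median, variance


def abs_deviations(values):
    """Absolute deviations from the group median (Brown-Forsythe form of Levene's test)."""
    centre = median(values)
    return [abs(v - centre) for v in values]


def moderated_levene_t(group_a, group_b, s0_sq=0.0, d0=0.0):
    """Compare mean absolute deviation of group_b against group_a.

    s0_sq and d0 are a hypothetical prior variance and prior degrees of
    freedom. With d0 = 0 this reduces to an ordinary pooled two-sample
    t test on the absolute deviations. Returns (t, degrees of freedom).
    """
    za, zb = abs_deviations(group_a), abs_deviations(group_b)
    na, nb = len(za), len(zb)
    d = na + nb - 2  # residual degrees of freedom
    s2 = ((na - 1) * variance(za) + (nb - 1) * variance(zb)) / d
    # Shrink the site-wise variance towards the prior (empirical Bayes style).
    s2_post = (d0 * s0_sq + d * s2) / (d0 + d)
    se = math.sqrt(s2_post * (1 / na + 1 / nb))
    return (mean(zb) - mean(za)) / se, d + d0


# Hypothetical beta values for one CpG site in two sample groups.
normal = [0.50, 0.52, 0.48, 0.51, 0.49]
tumor = [0.20, 0.80, 0.40, 0.70, 0.30]
t, df = moderated_levene_t(normal, tumor, s0_sq=0.01, d0=4)
print(f"t = {t:.2f} on {df:g} df")
```

A positive t indicates that the second group is more variable at this site. Because the test compares deviations rather than raw values, it can flag sites whose mean methylation is similar in both groups.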

missMethyl: an R package for analyzing data from Illumina’s HumanMethylation450 platform

Bioinformatics ◽

10.1093/bioinformatics/btv560 ◽

2015 ◽

Vol 32 (2) ◽

pp. 286-288 ◽

Cited By ~ 190

Author(s):

Belinda Phipson ◽

Jovana Maksimovic ◽

Alicia Oshlack

Keyword(s):

DNA Methylation ◽

Cost Effective ◽

R Package ◽

Supplementary Information ◽

450K Array ◽

Illumina HumanMethylation450 ◽

Bioconductor Project ◽

Illumina HumanMethylation450 BeadChip ◽

Differential Variability ◽

Differential Methylation Analysis

Summary: DNA methylation is one of the most commonly studied epigenetic modifications due to its role in both disease and development. The Illumina HumanMethylation450 BeadChip is a cost-effective way to profile >450 000 CpGs across the human genome, making it a popular platform for profiling DNA methylation. Here we introduce missMethyl, an R package with a suite of tools for performing normalization, removal of unwanted variation in differential methylation analysis, differential variability testing and gene set analysis for the 450K array.

Availability and implementation: missMethyl is an R package available from the Bioconductor project at www.bioconductor.org.

Supplementary information: Supplementary data are available at Bioinformatics online.

An R Package for Divergence Analysis of Omics Data

10.1101/720391 ◽

2019 ◽

Author(s):

Wikum Dinalankara ◽

Qian Ke ◽

Donald Geman ◽

Luigi Marchionni

Keyword(s):

High Throughput Sequencing ◽

R Package ◽

The Cancer Genome Atlas ◽

High Dimensional ◽

Omics Data ◽

Sequencing Data ◽

High Throughput Sequencing Data ◽

Ternary Code ◽

Cancer Genome Atlas ◽

Level Analysis

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample-level analysis, and is applicable to many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with sample high-throughput sequencing data from The Cancer Genome Atlas.
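
The univariate digitization can be illustrated with a short sketch. It assumes a simple rule: for each feature, the baseline range runs from the 5th to the 95th percentile of the baseline samples, and a new value is coded -1 below that range, +1 above it and 0 inside it; the binary code is the absolute value of the ternary code. The percentile choice, function names and data are hypothetical and are not taken from the divergence package's API.

```python
from statistics import quantiles


def baseline_ranges(baseline, n=20):
    """Per-feature (lower, upper) range from baseline samples.

    baseline is a list of samples, each a list of feature values. With
    n=20 the first and last cut points are the 5th and 95th percentiles
    (hypothetical choice).
    """
    ranges = []
    for values in zip(*baseline):  # iterate over features (columns)
        cuts = quantiles(values, n=n, method="inclusive")
        ranges.append((cuts[0], cuts[-1]))
    return ranges


def ternary_code(sample, ranges):
    """-1 / 0 / +1 code for each feature of one sample."""
    code = []
    for value, (low, high) in zip(sample, ranges):
        if value < low:
            code.append(-1)
        elif value > high:
            code.append(1)
        else:
            code.append(0)
    return code


def binary_code(sample, ranges):
    """1 if the feature diverges from the baseline in either direction."""
    return [abs(c) for c in ternary_code(sample, ranges)]


# Eleven hypothetical baseline samples with two features each.
baseline = [[float(i), 10.0 + i] for i in range(1, 12)]
ranges = baseline_ranges(baseline)
print(ranges)
print(ternary_code([12.0, 15.0], ranges))
print(binary_code([0.0, 25.0], ranges))
```

Using a quantile range rather than the baseline minimum and maximum keeps a single extreme baseline sample from widening the "normal" interval.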

An R package for divergence analysis of omics data

PLoS ONE ◽

10.1371/journal.pone.0249002 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0249002

Author(s):

Wikum Dinalankara ◽

Qian Ke ◽

Donald Geman ◽

Luigi Marchionni

Keyword(s):

R Package ◽

The Cancer Genome Atlas ◽

High Dimensional ◽

Omics Data ◽

Ternary Code ◽

Cancer Genome Atlas ◽

Level Analysis ◽

Data Analysis Methods ◽

Genome Atlas ◽

Omics Data Analysis

Given the ever-increasing amount of high-dimensional and complex omics data becoming available, it is increasingly important to discover simple but effective methods of analysis. Divergence analysis transforms each entry of a high-dimensional omics profile into a digitized (binary or ternary) code based on the deviation of the entry from a given baseline population. This is a novel framework that is significantly different from existing omics data analysis methods: it allows digitization of continuous omics data at the univariate or multivariate level, facilitates sample-level analysis, and is applicable to many different omics platforms. The divergence package, available on the R platform through the Bioconductor repository collection, provides easy-to-use functions for carrying out this transformation. Here we demonstrate how to use the package with data from The Cancer Genome Atlas.

Monitoring of Technical Variation in Quantitative High-Throughput Datasets

Cancer Informatics ◽

10.4137/cin.s12862 ◽

2013 ◽

Vol 12 ◽

pp. CIN.S12862 ◽

Cited By ~ 36

Author(s):

Martin Lauss ◽

Ilhami Visne ◽

Albert Kriegner ◽

Markus Ringnér ◽

Göran Jönsson ◽

...

Keyword(s):

High Throughput ◽

Principal Components ◽

General Procedure ◽

R Package ◽

The Cancer Genome Atlas ◽

Batch Effects ◽

Cancer Genome Atlas ◽

Technical Bias ◽

Microarray Datasets ◽

High Dimensional Datasets

High-dimensional datasets can be confounded by variation from technical sources, such as batches. Undetected batch effects can have severe consequences for the validity of a study's conclusion(s). We evaluate high-throughput RNAseq and miRNAseq as well as DNA methylation and gene expression microarray datasets, mainly from The Cancer Genome Atlas (TCGA) project, with respect to technical and biological annotations. We observe technical bias in these datasets and discuss corrective interventions. We then suggest a general procedure to control study design, detect technical bias using linear regression of principal components, correct for batch effects, and re-evaluate principal components. This procedure is implemented in the R package swamp and as graphical user interface software. In conclusion, high-throughput platforms that generate continuous measurements are sensitive to various forms of technical bias. For such data, monitoring of technical variation is an important analysis step.
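
The detection step, regressing principal components on a technical annotation, can be sketched in plain Python. This sketch assumes a small samples-by-features matrix, computes only the first principal component by power iteration, and regresses its scores on a categorical batch label. For a single categorical predictor, the least-squares R² equals the between-group share of the total sum of squares. The function names and data are hypothetical; this is not the swamp package's code.

```python
import math
from collections import defaultdict
from statistics import mean


def center_columns(x):
    """Subtract each feature's mean from its column."""
    col_means = [mean(col) for col in zip(*x)]
    return [[v - m for v, m in zip(row, col_means)] for row in x]


def first_pc_scores(x, iterations=200):
    """Sample scores on the first principal component, via power iteration."""
    xc = center_columns(x)
    n, p = len(xc), len(xc[0])
    v = [1.0] * p  # starting direction; assumed not orthogonal to PC1
    for _ in range(iterations):
        # Multiply by Xc, then by Xc transposed, i.e. apply Xc^T Xc to v.
        xv = [sum(row[j] * v[j] for j in range(p)) for row in xc]
        w = [sum(xc[i][j] * xv[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    return [sum(row[j] * v[j] for j in range(p)) for row in xc]


def r_squared_on_factor(scores, labels):
    """R^2 of a linear regression of scores on a categorical annotation."""
    overall = mean(scores)
    groups = defaultdict(list)
    for score, label in zip(scores, labels):
        groups[label].append(score)
    ss_total = sum((s - overall) ** 2 for s in scores)
    ss_between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values())
    return ss_between / ss_total


# Hypothetical data: batch B is shifted upwards on every feature.
data = [
    [1.0, 1.0, 1.0], [1.1, 0.9, 1.0], [0.9, 1.0, 1.1],
    [5.0, 5.0, 5.0], [5.1, 4.9, 5.0], [4.9, 5.1, 5.0],
]
batch = ["A", "A", "A", "B", "B", "B"]
pc1 = first_pc_scores(data)
print(f"R^2 of PC1 on batch: {r_squared_on_factor(pc1, batch):.3f}")
```

An R² close to 1 for a technical annotation on a leading component is the kind of signal that would prompt batch correction and a re-check of the components.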

An R Implementation of Tumor-Stroma-Immune Transcriptome Deconvolution Pipeline using DeMixT

10.1101/566075 ◽

2019 ◽

Author(s):

Shaolong Cao ◽

Zeya Wang ◽

Fan Gao ◽

Jingxiao Chen ◽

Feng Zhang ◽

...

Keyword(s):

Cancer Progression ◽

Expression Profiles ◽

Progression Free Survival ◽

Tumor Stroma ◽

R Package ◽

The Cancer Genome Atlas ◽

Biological Information ◽

Computationally Efficient ◽

Multiple Cancer ◽

Cancer Genome Atlas

The deconvolution of transcriptomic data from heterogeneous tissues in cancer studies remains challenging. Available software has difficulty accurately estimating both component-specific proportions and expression profiles for individual samples. To address these challenges, we present a new R pipeline for more accurate and efficient transcriptome deconvolution of high-dimensional data from mixtures of more than two components. The pipeline uses the computationally efficient DeMixT R package with OpenMP, together with additional cancer-specific biological information, to perform three-component deconvolution without requiring data from the immune profiles. It enables wide application of DeMixT to gene expression datasets from cancer consortia such as The Cancer Genome Atlas (TCGA), where, in addition to the mixed tumor samples, a handful of normal samples are profiled in multiple cancer types. We applied this pipeline to two TCGA datasets: colorectal adenocarcinoma (COAD) and prostate adenocarcinoma (PRAD). In COAD, immune proportions varied across the Consensus Molecular Subtypes, from highest to lowest in CMS1, CMS3, CMS4 and CMS2. In PRAD, immune proportions were associated with progression-free survival (p < 0.01) and negatively correlated with Gleason scores (p < 0.001). Our DeMixT-centered analysis protocol opens up new opportunities to investigate the tumor-stroma-immune microenvironment by providing both proportions and component-specific expression, and thus to better define the underlying biology of cancer progression.

Availability and implementation: An R package, scripts and data are available at https://github.com/wwylab/DeMixTallmaterials.
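
What "component proportions" means in a three-component mixture can be illustrated with a least-squares sketch, under much stronger assumptions than the pipeline above makes. Here all three component profiles (tumor, stroma, immune) are assumed known and to mix linearly with proportions summing to one. The pipeline, by contrast, does not require immune profiles, and DeMixT estimates component-specific expression with its own statistical model. This is a swapped-in linear deconvolution for illustration only; the function name and numbers are hypothetical.

```python
def estimate_proportions(mixed, tumor, stroma, immune):
    """Least-squares proportions (p_tumor, p_stroma, p_immune) summing to one.

    Assumed model: mixed = pt * tumor + ps * stroma + (1 - pt - ps) * immune,
    rewritten as mixed - immune = pt * (tumor - immune) + ps * (stroma - immune).
    """
    y = [m - c for m, c in zip(mixed, immune)]
    u = [a - c for a, c in zip(tumor, immune)]
    w = [b - c for b, c in zip(stroma, immune)]
    # Normal equations for the two free proportions, solved by Cramer's rule.
    uu = sum(a * a for a in u)
    ww = sum(b * b for b in w)
    uw = sum(a * b for a, b in zip(u, w))
    uy = sum(a * b for a, b in zip(u, y))
    wy = sum(a * b for a, b in zip(w, y))
    det = uu * ww - uw * uw
    if abs(det) < 1e-12:
        raise ValueError("component profiles are (nearly) collinear")
    pt = (uy * ww - uw * wy) / det
    ps = (uu * wy - uw * uy) / det
    return pt, ps, 1.0 - pt - ps


# Hypothetical reference profiles over four genes (linear scale).
tumor = [10.0, 0.0, 5.0, 2.0]
stroma = [0.0, 8.0, 1.0, 6.0]
immune = [1.0, 1.0, 1.0, 1.0]
# Constructed as 0.2 * tumor + 0.3 * stroma + 0.5 * immune.
mixed = [2.5, 2.9, 1.8, 2.7]
print(estimate_proportions(mixed, tumor, stroma, immune))
```

The hard part that DeMixT addresses, estimating a tumor profile that is not observed, is exactly what this sketch assumes away.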

Identification of a Ubiquitin Related Genes Signature for Predicting Prognosis of Prostate Cancer

Frontiers in Genetics ◽

10.3389/fgene.2021.778503 ◽

2022 ◽

Vol 12 ◽

Author(s):

Guoda Song ◽

Yucong Zhang ◽

Hao Li ◽

Zhuo Liu ◽

Wen Song ◽

...

Keyword(s):

Prostate Cancer ◽

ROC Curve ◽

Cox Regression ◽

R Package ◽

The Cancer Genome Atlas ◽

Prognostic Signature ◽

Post Translational Modifications ◽

Training Cohort ◽

Kaplan Meier ◽

Cancer Genome Atlas

Background: Ubiquitin and ubiquitin-like (UB/UBL) conjugations are among the most important post-translational modifications and are involved in the occurrence of cancers. However, the biological function and clinical significance of ubiquitin-related genes (URGs) in prostate cancer (PCa) remain unclear.

Methods: Transcriptome and clinicopathological data were downloaded from The Cancer Genome Atlas (TCGA) and served as the training cohort. The GSE21034 dataset was used for validation. Batch effects between the two datasets were removed, and the data were normalized, using the “sva” R package. Univariate Cox, LASSO Cox, and multivariate Cox regression were performed to identify a URG prognostic signature. Kaplan-Meier and receiver operating characteristic (ROC) curve analyses were then used to evaluate the performance of the signature. Finally, a nomogram was constructed and evaluated.

Results: A six-URG signature comprising ARIH2, FBXO6, GNB4, HECW2, LZTR1 and RNF185 was established to predict biochemical recurrence (BCR) of PCa. Kaplan-Meier and ROC curve analyses showed good performance of the prognostic signature in both the training and validation cohorts. Univariate and multivariate Cox analyses showed that the signature was an independent prognostic factor for BCR of PCa in the training cohort. A nomogram based on the URG signature and clinicopathological factors was then established and predicted PCa prognosis accurately.

Conclusion: Our study established a URG prognostic signature and constructed a nomogram to predict BCR of PCa. This study could help with individualized treatment and with identifying PCa patients at high risk of BCR.
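
Signatures of this kind are commonly applied as a linear risk score: each gene's expression is weighted by its Cox coefficient, the weighted values are summed, and patients are split at the median score into high- and low-risk groups whose Kaplan-Meier curves are compared. The sketch below assumes that convention. The six gene names come from the abstract, but the coefficients, expression values and follow-up data are hypothetical placeholders, not the published values.

```python
from statistics import median

# Hypothetical Cox coefficients (NOT the published values).
COEFFICIENTS = {
    "ARIH2": 0.4, "FBXO6": 0.3, "GNB4": -0.2,
    "HECW2": 0.25, "LZTR1": -0.35, "RNF185": 0.15,
}


def risk_score(expression):
    """Linear predictor: sum of coefficient * expression over signature genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label each patient 'high' if the score exceeds the median, else 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as (time, survival) pairs at each event time.

    events[i] is 1 if BCR was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk, survival, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        events_at_t = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            events_at_t += data[i][1]
            removed += 1
            i += 1
        if events_at_t:
            survival *= (at_risk - events_at_t) / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


patients = [  # hypothetical normalized expression values
    {"ARIH2": 2.0, "FBXO6": 1.0, "GNB4": 0.5, "HECW2": 1.5, "LZTR1": 0.2, "RNF185": 1.0},
    {"ARIH2": 0.5, "FBXO6": 0.2, "GNB4": 2.0, "HECW2": 0.1, "LZTR1": 1.8, "RNF185": 0.3},
    {"ARIH2": 1.2, "FBXO6": 0.8, "GNB4": 1.0, "HECW2": 0.9, "LZTR1": 0.6, "RNF185": 0.7},
    {"ARIH2": 0.3, "FBXO6": 0.4, "GNB4": 1.5, "HECW2": 0.2, "LZTR1": 1.2, "RNF185": 0.2},
]
scores = [risk_score(p) for p in patients]
groups = split_by_median(scores)
months = [12, 40, 20, 60]  # hypothetical follow-up times
bcr = [1, 0, 1, 0]         # 1 = biochemical recurrence observed
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    print(label, kaplan_meier([months[i] for i in idx], [bcr[i] for i in idx]))
```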

Inferring perturbation profiles of cancer samples

10.1101/2020.12.10.419077 ◽

2020 ◽

Author(s):

Martin Pirkl ◽

Niko Beerenwinkel

Keyword(s):

Indirect Evidence ◽

R Package ◽

The Cancer Genome Atlas ◽

Patient Specific ◽

Driver Genes ◽

Cancer Driver ◽

Molecular Alterations ◽

Incomplete Coverage ◽

Cancer Genome Atlas ◽

Gene Perturbations

Motivation: Cancer is one of the most prevalent diseases in the world. Tumors arise when important genes change their activity, e.g., when they are inhibited or over-expressed. But these gene perturbations are difficult to observe directly. Molecular profiles of tumors can provide indirect evidence of gene perturbations. However, inferring perturbation profiles from molecular alterations is challenging due to error-prone molecular measurements and incomplete coverage of all possible molecular causes of gene perturbations.

Results: We have developed a novel mathematical method to analyze cancer driver genes and their patient-specific perturbation profiles. We combine genetic aberrations with gene expression data in a causal network derived across patients to infer unobserved perturbations. We show that our method can predict perturbations in simulations, CRISPR perturbation screens, and breast cancer samples from The Cancer Genome Atlas.

Availability: The method is available as the R package nempi at https://github.com/cbg-ethz/

IMIX: A multivariate mixture model approach to integrative analysis of multiple types of omics data

10.1101/2020.06.23.167312 ◽

2020 ◽

Author(s):

Ziqiao Wang ◽

Peng Wei

Keyword(s):

Mixture Model ◽

Statistical Power ◽

Large Scale ◽

Complex Disease ◽

Ad Hoc ◽

Genomic Analysis ◽

R Package ◽

Data Type ◽

The Cancer Genome Atlas ◽

Model Framework

Motivation: Integrative genomic analysis is a powerful tool to study the biological mechanisms underlying a complex disease or trait across multiplatform high-dimensional data, such as DNA methylation, copy number variation (CNV), and gene expression. It is common to perform large-scale genome-wide association analysis of an outcome for each data type separately and combine the results ad hoc, leading to loss of statistical power and an uncontrolled overall false discovery rate (FDR).

Results: We propose a multivariate mixture model framework (IMIX) that integrates multiple types of genomic data and allows examining and relaxing the commonly adopted conditional independence assumption. We investigate across-data-type FDR control in IMIX and, through extensive simulations, show lower misclassification rates at controlled overall FDR compared with established individual-data-type analysis strategies such as Benjamini-Hochberg FDR control, the q-value, and local FDR control. IMIX features statistically principled model selection, FDR control, and computational efficiency. Applications to The Cancer Genome Atlas (TCGA) data provide novel multi-omic insights into the luminal/basal subtyping of bladder cancer and the prognosis of pancreatic cancer.

Availability and implementation: We have implemented our method in the R package “IMIX”, with instructions and examples available at https://github.com/ziqiaow/IMIX.
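
A mixture model for integrative analysis assigns each gene a posterior probability of belonging to each joint component; with two data types these are null/null, null/alt, alt/null and alt/alt. The sketch below assumes two data types with Gaussian z-score components whose parameters and mixing weights are fixed, hypothetical values, and it uses the conditional independence assumption that the abstract says IMIX can relax. It then applies a common Bayesian FDR rule: genes are selected in increasing order of local FDR while the running mean stays at or below alpha. This illustrates the general technique, not the IMIX package's code; IMIX estimates its model from the data.

```python
import math
from itertools import product

# Hypothetical (mean, sd) of z-scores under the null and the alternative,
# shared by both data types for simplicity.
NULL = (0.0, 1.0)
ALT = (3.0, 1.0)


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def local_fdr(z1, z2, weights):
    """1 - P(alternative in both data types | z1, z2).

    weights maps joint states (s1, s2), with 0 = null and 1 = alternative,
    to prior probabilities. The two z-scores are treated as conditionally
    independent given the joint state (an assumption).
    """
    joint = {}
    for s1, s2 in product((0, 1), repeat=2):
        d1 = normal_pdf(z1, *(ALT if s1 else NULL))
        d2 = normal_pdf(z2, *(ALT if s2 else NULL))
        joint[(s1, s2)] = weights[(s1, s2)] * d1 * d2
    return 1.0 - joint[(1, 1)] / sum(joint.values())


def select_bayesian_fdr(lfdr, alpha):
    """Indices with the smallest lfdr whose running mean stays <= alpha."""
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    selected, running = [], 0.0
    for k, i in enumerate(order, start=1):
        running += lfdr[i]
        if running / k > alpha:
            break
        selected.append(i)
    return selected


weights = {(0, 0): 0.7, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.1}
z_scores = [(4.0, 4.0), (3.5, 3.2), (0.1, -0.2), (3.8, 0.0)]
lfdr = [local_fdr(a, b, weights) for a, b in z_scores]
print(select_bayesian_fdr(lfdr, alpha=0.05))
```

The fourth gene is strongly significant in one data type only, so it is not selected as "alternative in both". Separate per-data-type analyses combined ad hoc would not make this distinction explicitly.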

Integrating Transcriptomics for the Identification of Potential Age-related Genes and Cells in Three Major Urogenital Cancers Across the Cancer Genome Atlas

10.21203/rs.3.rs-41767/v1 ◽

2020 ◽

Author(s):

Jinlong Cao ◽

Jianpeng Li ◽

Xin Yang ◽

Pan Li ◽

Zhiqiang Yao ◽

...

Keyword(s):

Differentially Expressed Genes ◽

R Package ◽

Cancer Genome ◽

The Cancer Genome Atlas ◽

Differentially Expressed ◽

Hub Genes ◽

Age Related ◽

Cancer Genome Atlas ◽

Urogenital Cancers ◽

Genome Atlas

Background: Cancer is often described as a disease of aging. Most patients with urogenital cancers are elderly, and their clinical characteristics are strongly affected by age and aging. Here, we aimed to explore age-related biological changes in three major urogenital cancers by integrative bioinformatics analysis.

Methods: First, mRNA (count format) and clinical data for bladder cancer, prostate cancer and renal cell carcinoma were downloaded from The Cancer Genome Atlas (TCGA) portal. Scores for 64 cell types were obtained with the xCell deconvolution method. The edgeR and limma packages were used to identify differentially expressed genes and differential cell types, respectively, between the young and old groups. The clusterProfiler R package and the ClueGO plugin were used for enrichment analysis, and the cytoHubba plugin was used for hub gene analysis. Co-expression and chromosome distribution of the hub genes were then analyzed and visualized with the RIdeogram R package. The clinical correlation of hub genes and key cell types was analyzed with GraphPad Prism software. Finally, the correlation between hub genes and key cell types was explored with the corrplot R package.

Results: We screened and identified 14 hub genes and 4 key cell types related to age and urogenital cancers. The age-related differentially expressed genes and co-expressed genes were mainly enriched in muscle movement (Cl-, Ca2+), inflammatory response, antibacterial humoral immune response, substance metabolism and transport, redox reaction, etc. Most of the age-related genes are on chromosome 17. Moreover, the correlation between cell types and genes was analyzed.

Conclusion: Our study analyzed age-related genes and cell types in the tumor microenvironment of urogenital cancers and explored the pathways involved. This could contribute to personalized therapy for patients of different ages and to a new understanding of the potential relationship between the aging microenvironment and urogenital cancers.
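
A young-versus-old comparison of count data can be illustrated with a deliberately simplified sketch. Counts are converted to log2 counts per million with a pseudo-count, and the difference of group means gives a log2 fold change per gene. The age cut-off of 65 years, the gene names and the counts are hypothetical. edgeR's actual workflow, including TMM normalization, negative binomial models and dispersion estimation, is not reproduced here.

```python
import math
from statistics import mean

AGE_CUTOFF = 65  # hypothetical threshold separating "young" and "old"


def log_cpm(counts, pseudo=0.5):
    """log2 counts per million for one sample (dict gene -> read count)."""
    lib_size = sum(counts.values())
    return {g: math.log2((c + pseudo) / lib_size * 1e6) for g, c in counts.items()}


def log_fold_changes(samples, ages):
    """Mean log-CPM in the old group minus mean log-CPM in the young group."""
    logs = [log_cpm(s) for s in samples]
    old = [lc for lc, age in zip(logs, ages) if age >= AGE_CUTOFF]
    young = [lc for lc, age in zip(logs, ages) if age < AGE_CUTOFF]
    return {g: mean(lc[g] for lc in old) - mean(lc[g] for lc in young)
            for g in samples[0]}


# Hypothetical counts for two genes in four patients.
samples = [
    {"GENE_A": 100, "GENE_B": 900},
    {"GENE_A": 110, "GENE_B": 890},
    {"GENE_A": 400, "GENE_B": 600},
    {"GENE_A": 420, "GENE_B": 580},
]
ages = [50, 55, 70, 72]
lfc = log_fold_changes(samples, ages)
print({g: round(v, 2) for g, v in lfc.items()})
```

Scaling by library size before comparing groups keeps differences in sequencing depth from masquerading as age effects. A proper analysis would also attach a statistical test to each fold change.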

Semi-supervised identification of cancer subgroups using survival outcomes and overlapping grouping information

Statistical Methods in Medical Research ◽

10.1177/0962280217752980 ◽

2018 ◽

Vol 28 (7) ◽

pp. 2137-2149 ◽

Cited By ~ 1

Author(s):

Wei Wei ◽

Zequn Sun ◽

Willian A da Silveira ◽

Zhenning Yu ◽

Andrew Lawson ◽

...

Keyword(s):

Ovarian Cancer ◽

Cancer Progression ◽

High Throughput ◽

Genomic Data ◽

R Package ◽

The Cancer Genome Atlas ◽

Integrative Genomics ◽

Molecular Features ◽

Biological Interpretation ◽

Cancer Genome Atlas

Identification of cancer patient subgroups using high-throughput genomic data is of critical importance to clinicians and scientists because it can offer opportunities for more personalized treatment and overlapping treatments of cancers. Despite tremendous efforts, this problem remains challenging because of the low reproducibility and instability of identified cancer subgroups and molecular features. To address this challenge, we developed Integrative Genomics Robust iDentification of cancer subgroups (InGRiD), a statistical approach that integrates information from biological pathway databases with high-throughput genomic data to improve the robustness of identification and interpretation of molecularly defined subgroups of cancer patients. We applied InGRiD to gene expression data of high-grade serous ovarian cancer from The Cancer Genome Atlas and the Australian Ovarian Cancer Study. The results indicate clear benefits of pathway-level approaches over gene-level approaches. In addition, using the proposed InGRiD framework, we also investigate and address the issue of gene sharing among pathways, which often occurs in practice, to further facilitate biological interpretation of key molecular features associated with cancer progression. The R package “InGRiD” implementing the proposed approach is currently available on our research group's GitHub webpage (https://dongjunchung.github.io/INGRID/).
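
The contrast between gene-level and pathway-level features can be illustrated by collapsing standardized gene expression into one score per pathway. The sketch below assumes a simple scheme in which a pathway score is a weighted mean of its member genes' z-scores. A gene shared by k pathways gets weight 1/k, which is one naive way to reduce the influence of gene sharing. The abstract does not describe InGRiD's actual procedure, and the pathway names, genes and values here are hypothetical.

```python
from statistics import mean, pstdev


def zscore_rows(expression):
    """Standardize each gene across samples (dict gene -> list of values)."""
    out = {}
    for gene, values in expression.items():
        mu, sd = mean(values), pstdev(values)
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def pathway_scores(expression, pathways):
    """Per-sample pathway scores, with shared genes down-weighted by 1/k."""
    z = zscore_rows(expression)
    membership = {}  # number of pathways each gene belongs to
    for genes in pathways.values():
        for g in genes:
            membership[g] = membership.get(g, 0) + 1
    n_samples = len(next(iter(expression.values())))
    scores = {}
    for name, genes in pathways.items():
        present = [g for g in genes if g in z]
        if not present:
            continue  # no measured genes: skip this pathway
        weights = [1.0 / membership[g] for g in present]
        total = sum(weights)
        scores[name] = [
            sum(w * z[g][j] for g, w in zip(present, weights)) / total
            for j in range(n_samples)
        ]
    return scores


# Hypothetical expression of three genes in three samples.
expression = {"G1": [1, 2, 3], "G2": [2, 4, 6], "G3": [3, 2, 1]}
pathways = {"P1": ["G1", "G2"], "P2": ["G2", "G3"]}  # G2 is shared
scores = pathway_scores(expression, pathways)
print({name: [round(s, 3) for s in vals] for name, vals in scores.items()})
```

Down-weighting G2 keeps it from pulling P2 towards P1 as strongly as an unweighted mean would. Subgroups can then be sought on these pathway scores instead of on individual genes.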
