MOVICS: an R package for multi-omics integration and visualization in cancer subtyping

Bioinformatics ◽

10.1093/bioinformatics/btaa1018 ◽

2020 ◽

Author(s):

Xiaofan Lu ◽

Jialin Meng ◽

Yujie Zhou ◽

Liyun Jiang ◽

Fangrong Yan

Keyword(s):

Clustering Algorithms ◽

R Package ◽

Supplementary Information ◽

Multiple Perspectives ◽

Model Free ◽

Omics Integration ◽

Wide Range ◽

Breast Cancer Cohort ◽

The One ◽

Minimal Effort

Abstract Summary Stratification of cancer patients into distinct molecular subgroups based on multi-omics data is an important issue in the context of precision medicine. Here, we present MOVICS, an R package for multi-omics integration and visualization in cancer subtyping. MOVICS provides a unified interface for 10 state-of-the-art multi-omics integrative clustering algorithms and incorporates the most commonly used downstream analyses in cancer subtyping research, including characterization and comparison of identified subtypes from multiple perspectives, and verification of subtypes in an external cohort using two model-free approaches for multiclass prediction. MOVICS also creates feature-rich, customizable visualizations with minimal effort. By analysing two published breast cancer cohorts, we show that MOVICS can serve a wide range of users and assist cancer therapy by moving away from the ‘one-size-fits-all’ approach to patient care. Availability and implementation The MOVICS package and online tutorial are freely available at https://github.com/xlucpu/MOVICS. Supplementary information Supplementary data are available at Bioinformatics online.

MOVICS: an R package for multi-omics integration and visualization in cancer subtyping

10.1101/2020.09.15.297820 ◽

2020 ◽

Author(s):

Xiaofan Lu ◽

Jialin Meng ◽

Yujie Zhou ◽

Liyun Jiang ◽

Fangrong Yan

Keyword(s):

State Of The Art ◽

Clustering Algorithms ◽

R Package ◽

Molecular Subgroups ◽

Multiple Perspectives ◽

Online Tutorial ◽

Model Free ◽

Omics Integration ◽

Model Free Approach ◽

Minimal Effort

Abstract Summary Stratification of cancer patients into distinct molecular subgroups based on multi-omics data is an important issue in the context of precision medicine. Here we present MOVICS, an R package for multi-omics integration and visualization in cancer subtyping. MOVICS provides a unified interface for 10 state-of-the-art multi-omics integrative clustering algorithms and incorporates the most commonly used downstream analyses in cancer subtyping research, including characterization and comparison of identified subtypes from multiple perspectives, and verification of subtypes in an external cohort using a model-free approach for multiclass prediction. MOVICS also creates feature-rich, customizable visualizations with minimal effort. Availability and implementation The MOVICS package and online tutorial are freely available at https://github.com/xlucpu/MOVICS.

BloodGen3Module: Blood transcriptional module repertoire analysis and visualization using R

Bioinformatics ◽

10.1093/bioinformatics/btab121 ◽

2021 ◽

Author(s):

Darawan Rinchai ◽

Jessica Roelands ◽

Mohammed Toufiq ◽

Wouter Hendrickx ◽

Matthew C Altman ◽

...

Keyword(s):

Transcript Abundance ◽

R Package ◽

Supplementary Information ◽

Illustrative Case ◽

Bioinformatic Tools ◽

Transcriptional Module ◽

Wide Range ◽

Downstream Analysis ◽

Computing Module ◽

Parallel Workflow

Abstract Motivation We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (“BloodGen3” module repertoire) that comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires. Results We have developed, and describe here, an R package, BloodGen3Module. The functions of our package permit group comparison analyses to be performed at the module level and the results to be displayed as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case is used to demonstrate the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states. Availability The BloodGen3Module package and documentation are freely available from GitHub: https://github.com/Drinchai/BloodGen3Module Supplementary information Supplementary data are available at Bioinformatics online.

Efficient weighted univariate clustering maps outstanding dysregulated genomic zones in human cancers

Bioinformatics ◽

10.1093/bioinformatics/btaa613 ◽

2020 ◽

Vol 36 (20) ◽

pp. 5027-5036 ◽

Cited By ~ 3

Author(s):

Mingzhou Song ◽

Hua Zhong

Keyword(s):

Clustering Algorithm ◽

Human Cancer ◽

Clustering Algorithms ◽

Search Space ◽

R Package ◽

Supplementary Information ◽

Diagnostic Biomarkers ◽

Cancer Types ◽

Molecular Patterns ◽

Pan Cancer

Abstract Motivation Chromosomal patterning of gene expression in cancer can arise from aneuploidy, genome disorganization or abnormal DNA methylation. To map such patterns, we introduce a weighted univariate clustering algorithm to guarantee linear runtime, optimality and reproducibility. Results We present the chromosome clustering method, establish its optimality and runtime and evaluate its performance. It uses dynamic programming enhanced with an algorithm to reduce search-space in-place to decrease runtime overhead. Using the method, we delineated outstanding genomic zones in 17 human cancer types. We identified strong continuity in dysregulation polarity—dominance by either up- or downregulated genes in a zone—along chromosomes in all cancer types. Significantly polarized dysregulation zones specific to cancer types are found, offering potential diagnostic biomarkers. Unreported previously, a total of 109 loci with conserved dysregulation polarity across cancer types give insights into pan-cancer mechanisms. Efficient chromosomal clustering opens a window to characterize molecular patterns in cancer genome and beyond. Availability and implementation Weighted univariate clustering algorithms are implemented within the R package ‘Ckmeans.1d.dp’ (4.0.0 or above), freely available at https://cran.r-project.org/package=Ckmeans.1d.dp. Supplementary information Supplementary data are available at Bioinformatics online.
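
To make the idea of optimal weighted univariate clustering concrete, the following is a minimal Python sketch of a dynamic program that minimizes the weighted within-cluster sum of squares over sorted one-dimensional data. It is not the linear-runtime algorithm described in the abstract, nor the Ckmeans.1d.dp implementation: it is a plain quadratic-time version, and the function name `weighted_kmeans_1d` and its return format are assumptions made for illustration.

```python
def weighted_kmeans_1d(xs, ws, k):
    """Optimal weighted 1-D k-means by simple O(k * n^2) dynamic programming.

    Illustrative sketch only (hypothetical helper, not the Ckmeans.1d.dp API).
    Returns (total weighted within-cluster sum of squares, list of clusters).
    """
    pts = sorted(zip(xs, ws))
    n = len(pts)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    # Prefix sums of weights, weighted values and weighted squares,
    # so the cost of any contiguous block is available in O(1).
    W, S, Q = [0.0], [0.0], [0.0]
    for x, w in pts:
        W.append(W[-1] + w)
        S.append(S[-1] + w * x)
        Q.append(Q[-1] + w * x * x)

    def cost(i, j):
        """Weighted sum of squared deviations for sorted points i..j-1."""
        w = W[j] - W[i]
        if w <= 0:
            return 0.0
        s = S[j] - S[i]
        return max((Q[j] - Q[i]) - s * s / w, 0.0)

    INF = float("inf")
    # D[m][j]: best cost of splitting the first j points into m clusters.
    D = [[INF] * (n + 1) for _ in range(k + 1)]
    # B[m][j]: start index of the last cluster in that best split.
    B = [[0] * (n + 1) for _ in range(k + 1)]
    D[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = D[m - 1][i] + cost(i, j)
                if c < D[m][j]:
                    D[m][j] = c
                    B[m][j] = i

    # Backtrack from the full data set to recover cluster boundaries.
    bounds = []
    j = n
    for m in range(k, 0, -1):
        i = B[m][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    clusters = [[x for x, _ in pts[i:j]] for i, j in bounds]
    return D[k][n], clusters
```

Because the points are sorted, every optimal cluster is a contiguous block, which is what lets a dynamic program find the exact optimum rather than a local one as in iterative k-means.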

gpart: human genome partitioning and visualization of high-density SNP data by identifying haplotype blocks

Bioinformatics ◽

10.1093/bioinformatics/btz308 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4419-4421 ◽

Cited By ~ 3

Author(s):

Sun Ah Kim ◽

Myriam Brossard ◽

Delnaz Roshandel ◽

Andrew D Paterson ◽

Shelley B Bull ◽

...

Keyword(s):

Clustering Algorithms ◽

R Package ◽

Supplementary Information ◽

Visualization Tool ◽

Sequencing Data ◽

Haplotype Blocks ◽

Snp Data ◽

Computing Environments ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract Summary For the analysis of high-throughput genomic data produced by next-generation sequencing (NGS) technologies, researchers need to identify linkage disequilibrium (LD) structure in the genome. In this work, we developed an R package gpart which provides clustering algorithms to define LD blocks or analysis units consisting of SNPs. The visualization tool in gpart can display the LD structure and gene positions for up to 20 000 SNPs in one image. The gpart functions facilitate construction of LD blocks and SNP partitions for vast amounts of genome sequencing data within reasonable time and memory limits in personal computing environments. Availability and implementation The R package is available at https://bioconductor.org/packages/gpart. Supplementary information Supplementary data are available at Bioinformatics online.

pldist: ecological dissimilarities for paired and longitudinal microbiome association analysis

Bioinformatics ◽

10.1093/bioinformatics/btz120 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3567-3575 ◽

Cited By ~ 4

Author(s):

Anna M Plantinga ◽

Jun Chen ◽

Robert R Jenq ◽

Michael C Wu

Keyword(s):

Statistical Power ◽

Human Microbiome ◽

R Package ◽

Supplementary Information ◽

Microbiome Composition ◽

Type 1 Error ◽

Wide Range ◽

Subject Variability ◽

Ordination Analysis

Abstract Motivation The human microbiome is notoriously variable across individuals, with a wide range of ‘healthy’ microbiomes. Paired and longitudinal studies of the microbiome have become increasingly popular as a way to reduce unmeasured confounding and to increase statistical power by reducing large inter-subject variability. Statistical methods for analyzing such datasets are scarce. Results We introduce a paired UniFrac dissimilarity that summarizes within-individual (or within-pair) shifts in microbiome composition and then compares these compositional shifts across individuals (or pairs). This dissimilarity depends on a novel transformation of relative abundances, which we then extend to more than two time points and incorporate into several phylogenetic and non-phylogenetic dissimilarities. The data transformation and resulting dissimilarities may be used in a wide variety of downstream analyses, including ordination analysis and distance-based hypothesis testing. Simulations demonstrate that tests based on these dissimilarities retain appropriate type 1 error and high power. We apply the method in two real datasets. Availability and implementation The R package pldist is available on GitHub at https://github.com/aplantin/pldist. Supplementary information Supplementary data are available at Bioinformatics online.

VOLTA: adVanced mOLecular neTwork Analysis

Bioinformatics ◽

10.1093/bioinformatics/btab642 ◽

2021 ◽

Author(s):

Alisa Pavel ◽

Antonio Federico ◽

Giusy del Giudice ◽

Angela Serra ◽

Dario Greco

Keyword(s):

Network Analysis ◽

Clustering Algorithms ◽

Expression Patterns ◽

Direct Access ◽

Supplementary Information ◽

Complete Analysis ◽

Wide Range ◽

Novice Users ◽

Analytical Step ◽

The Individual

Abstract Motivation Network analysis is a powerful approach to investigate biological systems. It is often applied to study gene co-expression patterns derived from transcriptomics experiments. Even though co-expression analysis is widely used, there is still a lack of tools that are open and customizable on the basis of different network types and analysis scenarios (e.g. through function accessibility), but are also suitable for novice users by providing complete analysis pipelines. Results We developed VOLTA, a Python package suited for complex co-expression network analysis. VOLTA is designed to allow users direct access to the individual functions, while also providing them with complete analysis pipelines. Moreover, VOLTA offers, where possible, multiple algorithms applicable to each analytical step (e.g. multiple community detection or clustering algorithms are provided), giving users the possibility to perform analyses tailored to their needs. This makes VOLTA highly suitable for experienced users who wish to build their own analysis pipelines for a wide range of networks, as well as for novice users, for whom a ‘plug and play’ system is provided. Availability and implementation The package and the data used are available at GitHub: https://github.com/fhaive/VOLTA and 10.5281/zenodo.5171719. Supplementary information Supplementary data are available at Bioinformatics online.

SPsimSeq: semi-parametric simulation of bulk and single-cell RNA-sequencing data

Bioinformatics ◽

10.1093/bioinformatics/btaa105 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3276-3278 ◽

Cited By ~ 2

Author(s):

Alemu Takele Assefa ◽

Jo Vandesompele ◽

Olivier Thas

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Real Data ◽

Simulation Method ◽

R Package ◽

Supplementary Information ◽

Expression Data ◽

Sequencing Data ◽

Wide Range ◽

Single Cell Rna Sequencing

Abstract Summary SPsimSeq is a semi-parametric simulation method to generate bulk and single-cell RNA-sequencing data. It is designed to simulate gene expression data with maximal retention of the characteristics of real data. It is reasonably flexible to accommodate a wide range of experimental scenarios, including different sample sizes, biological signals (differential expression) and confounding batch effects. Availability and implementation The R package and associated documentation are available from https://github.com/CenterForStatistics-UGent/SPsimSeq. Supplementary information Supplementary data are available at Bioinformatics online.

ascend: R package for analysis of single cell RNA-seq data

10.1101/207704 ◽

2017 ◽

Cited By ~ 11

Author(s):

Anne Senabouth ◽

Samuel W Lukowski ◽

Jose Alquicira Hernandez ◽

Stacey Andersen ◽

Xin Mei ◽

...

Keyword(s):

Single Cell ◽

R Package ◽

Computational Genomics ◽

Supplementary Information ◽

Rna Seq ◽

Software Packages ◽

Wide Range ◽

Flexible Framework ◽

Supplementary Material ◽

Data Objects

Abstract Summary ascend is an R package comprising fast, streamlined analysis functions optimized to address the statistical challenges of single cell RNA-seq. The package incorporates novel and established methods to provide a flexible framework to perform filtering, quality control, normalization, dimension reduction, clustering, differential expression and a wide range of plotting. ascend is designed to work with scRNA-seq data generated by any high-throughput platform, and includes functions to convert data objects between software packages. Availability The R package and associated vignettes are freely available at https://github.com/IMB-Computational-Genomics-Lab/[email protected] Supplementary information An example dataset is available at ArrayExpress, accession number E-MTAB-6108.

Evaluating single-cell cluster stability using the Jaccard similarity index

10.1101/2020.05.26.116640 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ming Tang ◽

Yasin Kaymaz ◽

Brandon Logeman ◽

Stephen Eichhorn ◽

ZhengZheng S. Liang ◽

...

Keyword(s):

Single Cell ◽

Clustering Algorithms ◽

Similarity Index ◽

R Package ◽

Supplementary Information ◽

Clustering Methods ◽

K Nearest Neighbor ◽

Jaccard Similarity ◽

Cluster Stability ◽

Link Type

Abstract Motivation One major goal of single-cell RNA sequencing (scRNAseq) experiments is to identify novel cell types. With increasingly large scRNAseq datasets, unsupervised clustering methods can now produce detailed catalogues of transcriptionally distinct groups of cells in a sample. However, the interpretation of these clusters is challenging for both technical and biological reasons. Popular clustering algorithms are sensitive to parameter choices, and can produce different clustering solutions with even small changes in the number of principal components used, the k nearest neighbor, and the resolution parameters, among others. Results Here, we present a set of tools to evaluate cluster stability by subsampling, which can guide parameter choice and aid in biological interpretation. The R package scclusteval and the accompanying Snakemake workflow implement all steps of the pipeline: subsampling the cells, repeating the clustering with Seurat, and estimation of cluster stability using the Jaccard similarity index. The Snakemake workflow takes advantage of high-performance computing clusters and dispatches jobs in parallel to available CPUs to speed up the analysis. The scclusteval package provides functions to facilitate the analysis of the output, including a series of rich visualizations. Availability R package scclusteval: https://github.com/crazyhottommy/scclusteval Snakemake workflow: https://github.com/crazyhottommy/[email protected], [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
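
The stability measure named in this abstract can be illustrated with a short Python sketch. It computes the Jaccard similarity index between sets of cells and, for each original cluster, the best match among the clusters obtained after subsampling. This is an assumption-laden illustration, not the scclusteval API: the function names, the dictionary-based input format, and the choice to restrict original clusters to the cells retained in the subsample are all hypothetical.

```python
from collections import defaultdict


def jaccard_index(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two collections of cell IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention assumed here: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


def group_by_label(labels):
    """Turn {cell_id: cluster_label} into {cluster_label: set of cell_ids}."""
    groups = defaultdict(set)
    for cell, label in labels.items():
        groups[label].add(cell)
    return dict(groups)


def cluster_stability(original, resampled):
    """For each original cluster, the highest Jaccard index against any
    cluster from one subsampled re-clustering (hypothetical helper).

    original, resampled: dicts mapping cell ID -> cluster label.
    Only cells present in the subsample are compared.
    """
    kept = set(resampled)
    original_groups = group_by_label(
        {cell: label for cell, label in original.items() if cell in kept}
    )
    resampled_groups = group_by_label(resampled)
    return {
        label: max(jaccard_index(cells, other) for other in resampled_groups.values())
        for label, cells in original_groups.items()
    }
```

In a full analysis this score would be collected over many subsampling rounds, so that each original cluster ends up with a distribution of Jaccard indices rather than a single value.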

General Trend of Negative Transference Number in Li Salt/Ionic Liquid Mixtures

10.26434/chemrxiv.7728545 ◽

2019 ◽

Cited By ~ 1

Author(s):

Nicola Molinari ◽

Jonathan P. Mailoa ◽

Boris Kozinsky

Keyword(s):

Ionic Liquid ◽

Transference Number ◽

Liquid Mixtures ◽

Single Linkage ◽

Self Diffusion ◽

Wide Range ◽

Single Linkage Cluster ◽

Concentrated Solution ◽

The One ◽

Dynamics Simulations

We show that strong cation-anion interactions in a wide range of lithium-salt/ionic liquid mixtures result in a negative lithium transference number, using molecular dynamics simulations and rigorous concentrated solution theory. This behavior fundamentally deviates from the one obtained using self-diffusion coefficient analysis and agrees well with experimental electrophoretic NMR measurements, which account for ion correlations. We extend these findings to several ionic liquid compositions. We investigate the degree of spatial ionic coordination employing single-linkage cluster analysis, unveiling asymmetrical anion-cation clusters. Additionally, we formulate a way to compute the effective lithium charge that corresponds to and agrees well with electrophoretic measurements and show that lithium effectively carries a negative charge in a remarkably wide range of chemistries and concentrations. The generality of our observation has significant implications for the energy storage community, emphasizing the need to reconsider the potential of these systems as next generation battery electrolytes.