PINSPlus: a tool for tumor subtype discovery in integrated genomic data

Bioinformatics, 2018, Vol 35 (16), pp. 2843-2846

doi:10.1093/bioinformatics/bty1049

Cited By ~ 15

Author(s): Hung Nguyen, Sangam Shrestha, Sorin Draghici, Tin Nguyen

Keyword(s): Personal Computer, Genomic Data, Supplementary Information, Omics Data, Tumor Subtype, Supplementary Data, Significant Survival, Survival Differences

Abstract:

Summary: Since cancer is a heterogeneous disease, tumor subtyping is crucial for improved treatment and prognosis. We have developed a subtype discovery tool, called PINSPlus, that is: (i) robust against noise and unstable quantitative assays, (ii) able to integrate multiple types of omics data in a single analysis and (iii) dramatically superior to established approaches in identifying known subtypes and novel subgroups with significant survival differences. Our validation on 12,158 samples from 44 datasets shows that PINSPlus vastly outperforms other approaches. The software is easy to use and can partition hundreds of patients in a few minutes on a personal computer.

Availability and implementation: The package is available at https://cran.r-project.org/package=PINSPlus. Data and R script used in this manuscript are available at https://bioinformatics.cse.unr.edu/software/PINSPlus/.

Supplementary information: Supplementary data are available at Bioinformatics online.

tugHall: a simulator of cancer-cell evolution based on the hallmarks of cancer and tumor-related genes

Bioinformatics, 2020, Vol 36 (11), pp. 3597-3599

doi:10.1093/bioinformatics/btaa182

Cited By ~ 1

Author(s): Iurii S Nagornov, Mamoru Kato

Keyword(s): Cancer Cell, Tumor Heterogeneity, Clonal Evolution, Source Code, Genomic Data, Supplementary Information, Cell Behavior, Supplementary Data, Hallmarks Of Cancer, Cell Evolution

Abstract:

Summary: The flood of recent cancer genomic data requires a coherent model that can sort out the findings to systematically explain clonal evolution and the resultant intra-tumor heterogeneity (ITH). Here, we present a new mathematical model designed to computationally simulate the evolution of cancer cells. The model connects the well-known hallmarks of cancer with the specific mutational states of tumor-related genes. The cell behavior phenotypes are stochastically determined, and the hallmarks probabilistically interfere with the phenotypic probabilities. In turn, the hallmark variables depend on the mutational states of tumor-related genes. Thus, our software can deepen our understanding of cancer-cell evolution and generation of ITH.

Availability and implementation: The open-source code is available in the repository https://github.com/nagornovys/Cancer_cell_evolution.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

Causal network perturbations for instance-specific analysis of single cell and disease samples

Bioinformatics, 2019, Vol 36 (8), pp. 2515-2521

doi:10.1093/bioinformatics/btz949

Cited By ~ 1

Author(s): Kristina L Buschur, Maria Chikina, Panayiotis V Benos

Keyword(s): Single Cell, Gene Networks, Underlying Mechanism, Alternative Methods, Supplementary Information, Patient Specific, Data Dependencies, Significant Survival, Specific Analysis, Survival Differences

Abstract:

Motivation: Complex diseases involve perturbation in multiple pathways, and a major challenge in clinical genomics is characterizing pathway perturbations in individual samples. This can lead to patient-specific identification of the underlying mechanism of disease, thereby improving diagnosis and personalizing treatment. Existing methods rely on external databases to quantify pathway activity scores. This ignores dependencies in the data and the fact that pathways are incomplete or condition-specific.

Results: ssNPA is a new approach for subtyping samples based on deregulation of their gene networks. ssNPA learns a causal graph directly from control data. Sample-specific network neighborhood deregulation is quantified via the error incurred in predicting the expression of each gene from its Markov blanket. We evaluate the performance of ssNPA on liver development single-cell RNA-seq data, where the correct cell timing is recovered, and on two TCGA datasets, where ssNPA patient clusters have significant survival differences. In all analyses ssNPA consistently outperforms alternative methods, highlighting the advantage of network-based approaches.

Availability and implementation: http://www.benoslab.pitt.edu/Software/ssnpa/.

Supplementary information: Supplementary data are available at Bioinformatics online.

pyGenomeTracks: reproducible plots for multivariate genomic data sets

Bioinformatics, 2020

doi:10.1093/bioinformatics/btaa692

Cited By ~ 7

Author(s): Lucille Lopez-Delisle, Leily Rabbani, Joachim Wolff, Vivek Bhardwaj, Rolf Backofen, ...

Keyword(s): Genomic Data, Supplementary Information, Data Sets, Command Line, Graphical Interface, Supplementary Data, Considerable Effort, Vector Graphic, Graphic Software

Abstract:

Motivation: Generating publication-ready plots to display multiple genomic tracks can pose a serious challenge. Making desirable and accurate figures requires considerable effort. This is usually done by hand or by using vector graphics software.

Results: pyGenomeTracks (PGT) is a modular plotting tool that easily combines multiple tracks. It enables a reproducible and standardized generation of highly customizable and publication-ready images.

Availability: PGT is available through a graphical interface on https://usegalaxy.eu and through the command line. It is provided on conda via the bioconda channel and on pip, and it is openly developed on GitHub: https://github.com/deeptools/pyGenomeTracks.

Supplementary information: Supplementary data are available at Bioinformatics online.

dbgap2x: An R package to explore and extract data from the database of Genotypes and Phenotypes (dbGaP)

Bioinformatics, 2019

doi:10.1093/bioinformatics/btz680

Cited By ~ 1

Author(s): Grégoire Versmée, Laura Versmée, Mikaël Dusenne, Niloofar Jalali, Paul Avillach

Keyword(s): Data Sharing, Large Scale, Genomic Data, R Package, National Institutes Of Health, Supplementary Information, Supplementary Data, Complex Procedure, Range Of Functions, The Relationship

Abstract:

Summary: Based on the Genomic Data Sharing Policy issued in August 2007, the National Institutes of Health (NIH) has supported several repositories such as the database of Genotypes and Phenotypes (dbGaP). dbGaP is an online repository that provides access to large-scale genetic and phenotypic datasets with more than 1,000 studies. However, navigating the website and understanding the relationship between the studies are not easy tasks. Moreover, the decryption of the files is a complex procedure. In this study we propose the dbgap2x R package, which covers a broad range of functions for searching dbGaP studies, exploring the characteristics of a study and easily decrypting the files from dbGaP.

Availability and implementation: dbgap2x is an R package with the code available at https://github.com/gversmee/dbgap2x. A containerized version including the package, a Jupyter server and an example Notebook is available at https://hub.docker.com/r/gversmee/dbgap2x.

Supplementary information: Supplementary data are available at Bioinformatics online.

FQSqueezer: k-mer-based compression of sequencing data

2019

doi:10.1101/559807

Cited By ~ 1

Author(s): Sebastian Deorowicz

Keyword(s): Data Compression, State Of The Art, Genomic Data, General Purpose, Supplementary Information, Supplementary Data, Sequencing Data, Partial Matching, Supplementary Material, Better Than

Abstract:

Motivation: The amount of genomic data that needs to be stored is huge. Therefore, it is not surprising that a lot of work has been done in the field of specialized data compression of FASTQ files. The existing algorithms are, however, still imperfect and the best tools produce quite large archives.

Results: We present FQSqueezer, a novel compression algorithm for sequencing data able to process single- and paired-end reads of variable lengths. It is based on ideas from the well-known prediction by partial matching and dynamic Markov coder algorithms used in general-purpose compressors. The compression ratios are often tens of percent better than those offered by the state-of-the-art tools.

Availability and Implementation: https://github.com/refresh-bio/[email protected]

Supplementary information: Supplementary data are available at publisher’s Web site.
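
The idea behind prediction by partial matching, which the abstract names as one inspiration, can be illustrated with a toy context model. The sketch below is not FQSqueezer's algorithm: it only counts which symbol follows each context of up to `max_order` preceding symbols and falls back to shorter contexts when a long one has never been seen. The function names `train` and `predict` are hypothetical, and a real compressor would additionally use escape probabilities and an entropy coder.

```python
from collections import Counter, defaultdict


def train(sequence, max_order):
    """Count the symbols that follow every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for pos, symbol in enumerate(sequence):
        for order in range(max_order + 1):
            if pos - order < 0:
                break  # not enough preceding symbols for this order
            context = sequence[pos - order:pos]
            counts[context][symbol] += 1
    return counts


def predict(counts, history, max_order):
    """Predict the next symbol, backing off from long to short contexts."""
    for order in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - order:]
        if context in counts:
            return counts[context].most_common(1)[0][0]
    return None  # the model was trained on an empty sequence


# Toy read: a perfectly periodic DNA string (illustrative data only).
counts = train("ACGTACGTACGT", max_order=2)
print(predict(counts, "AC", max_order=2))  # context "AC" was seen in training
print(predict(counts, "TT", max_order=2))  # "TT" is unseen, falls back to "T"
```

The better the model predicts the next symbol, the fewer bits an entropy coder needs to store it, which is why richer context models tend to yield smaller archives.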

NCutYX: a package for clustering analysis of multilayer omics data

Bioinformatics, 2019

doi:10.1093/bioinformatics/btz842

Author(s): Sebastian J Teran Hidalgo, Mengyun Wu, Shuangge Ma

Keyword(s): Clustering Analysis, Complex Diseases, R Package, Supplementary Information, Omics Data, Supplementary Data, Normalized Cut, Novel Methods

Abstract:

Summary: Multilayer omics profiling has become a major venue for understanding complex diseases. We develop NCutYX, an R package for clustering analysis of multilayer omics data. The package and methods jointly analyze multiple layers of omics measurements and effectively accommodate their regulations. They systematically conduct a series of analyses based on the normalized cut technique, including the clusterings of subjects and omics measurements and biclustering. The package can be valuable for its timely context, novel methods, and comprehensiveness.

Availability: https://cran.r-project.org/web/packages/NCutYX/.

Supplementary information: Supplementary data are available at Bioinformatics online.
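
For readers unfamiliar with the normalized cut criterion mentioned above, the following sketch evaluates the standard two-group objective, NCut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), on a small similarity matrix. It is only an illustration of the general criterion, not the multilayer objective or the API of NCutYX; the function name `normalized_cut` and the toy weights are assumptions made for this example.

```python
def normalized_cut(weights, group_a):
    """Two-group normalized cut for a symmetric, non-negative weight matrix.

    cut(A, B) sums the weights crossing the partition; assoc(X, V) sums all
    weights attached to the nodes of X.
    """
    n = len(weights)
    a = set(group_a)
    b = set(range(n)) - a
    cut = sum(weights[i][j] for i in a for j in b)
    assoc_a = sum(weights[i][j] for i in a for j in range(n))
    assoc_b = sum(weights[i][j] for i in b for j in range(n))
    return cut / assoc_a + cut / assoc_b


# Toy similarity matrix: subjects 0 and 1 are alike, as are subjects 2 and 3.
W = [
    [0, 4, 1, 0],
    [4, 0, 0, 1],
    [1, 0, 0, 4],
    [0, 1, 4, 0],
]

good = normalized_cut(W, [0, 1])  # splits along the weak links
bad = normalized_cut(W, [0, 2])   # splits through the strong links
print(good, bad)
```

A lower value indicates a better partition, so a clustering method built on this criterion would prefer the first split over the second.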

MsPAC: a tool for haplotype-phased structural variant detection

Bioinformatics, 2019, Vol 36 (3), pp. 922-924

doi:10.1093/bioinformatics/btz618

Cited By ~ 3

Author(s): Oscar L Rodriguez, Anna Ritz, Andrew J Sharp, Ali Bashir

Keyword(s): Genomic Data, Supplementary Information, Supplementary Data, High Quality, Structural Variant, Long Read, One Step, Variant Detection, Next Generation Sequencing Ngs, Generation Sequencing

Abstract:

Summary: While next-generation sequencing (NGS) has dramatically increased the availability of genomic data, phased genome assembly and structural variant (SV) analyses are limited by NGS read lengths. Long-read sequencing from Pacific Biosciences and NGS barcoding from 10x Genomics hold the potential for far more comprehensive views of individual genomes. Here, we present MsPAC, a tool that combines both technologies to partition reads, assemble haplotypes (via existing software) and convert assemblies into high-quality, phased SV predictions. MsPAC represents a framework for haplotype-resolved SV calls that moves one step closer to fully resolved, diploid genomes.

Availability and implementation: https://github.com/oscarlr/MsPAC.

Supplementary information: Supplementary data are available at Bioinformatics online.

multiclassPairs: An R package to train multiclass pair-based classifier

Bioinformatics, 2021

doi:10.1093/bioinformatics/btab088

Author(s): Nour-al-dain Marzouka, Pontus Eriksson

Keyword(s): Gene Expression, Prediction Models, R Package, Supplementary Information, Tumor Subtype, Test Results, Supplementary Data, Classification Problems, Excellent Performance, Class Prediction

Abstract:

Motivation: k–Top Scoring Pairs (kTSP) algorithms utilize in-sample gene expression feature pair rules for class prediction, and have demonstrated excellent performance and robustness. The available packages and tools primarily focus on binary prediction (i.e. two classes). However, many real-world classification problems, e.g. tumor subtype prediction, are multiclass tasks.

Results: Here, we present multiclassPairs, an R package to train pair-based single sample classifiers for multiclass problems. multiclassPairs offers two main methods to build multiclass prediction models, either using a one-vs-rest kTSP scheme or through a novel pair-based Random Forest approach. The package also provides options for dealing with class imbalances, multiplatform training, missing features in test data, and visualization of training and test results.

Availability: The ‘multiclassPairs’ package is available on CRAN servers and GitHub: https://github.com/NourMarzouka/multiclassPairs

Supplementary information: Supplementary data are available at Bioinformatics online.
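
To make the pair-rule idea concrete, here is a minimal sketch of a one-vs-rest classifier built from rules of the form "feature i > feature j". It is not the multiclassPairs API (which is an R package); all function names and the toy expression values are hypothetical. It also simplifies kTSP by ranking every pair with a plain frequency difference rather than selecting disjoint pairs.

```python
from itertools import combinations


def pair_score(samples, labels, i, j, positive):
    """How much more often 'feature i > feature j' holds in `positive` than elsewhere."""
    pos = [s for s, y in zip(samples, labels) if y == positive]
    neg = [s for s, y in zip(samples, labels) if y != positive]
    p_pos = sum(s[i] > s[j] for s in pos) / len(pos)
    p_neg = sum(s[i] > s[j] for s in neg) / len(neg)
    return p_pos - p_neg


def top_pairs(samples, labels, positive, k):
    """Return the k best rules (i, j), oriented so that s[i] > s[j] votes for `positive`."""
    scored = []
    for i, j in combinations(range(len(samples[0])), 2):
        score = pair_score(samples, labels, i, j, positive)
        scored.append((score, i, j) if score >= 0 else (-score, j, i))
    scored.sort(reverse=True)
    return [(i, j) for _, i, j in scored[:k]]


def fit_one_vs_rest(samples, labels, k):
    """Learn one set of pair rules per class."""
    return {c: top_pairs(samples, labels, c, k) for c in sorted(set(labels))}


def predict_class(sample, model):
    """Pick the class whose rules are satisfied by the largest fraction of votes."""
    def vote_fraction(pairs):
        return sum(sample[i] > sample[j] for i, j in pairs) / len(pairs)
    return max(model, key=lambda c: vote_fraction(model[c]))


# Toy expression profiles over three genes; each class over-expresses one gene.
samples = [[9, 2, 1], [8, 3, 2], [1, 9, 2], [2, 8, 3], [2, 1, 9], [3, 2, 8]]
labels = ["A", "A", "B", "B", "C", "C"]
model = fit_one_vs_rest(samples, labels, k=2)
print(predict_class([7, 3, 2], model))
```

Because each rule depends only on the relative order of two features within a single sample, a prediction needs no cross-sample normalization, which is what makes pair-based classifiers usable on one sample at a time.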

PEWO: a collection of workflows to benchmark phylogenetic placement

Bioinformatics, 2020

doi:10.1093/bioinformatics/btaa657

Cited By ~ 1

Author(s): Benjamin Linard, Nikolai Romashchenko, Fabio Pardi, Eric Rivals

Keyword(s): Parameter Optimization, Genomic Data, Supplementary Information, Taxonomic Identification, Supplementary Data, Phylogenetic Placement, Future Developments, Community Effort, Standard Support, Selection Of

Abstract:

Motivation: Phylogenetic placement (PP) is a process of taxonomic identification for which several tools are now available. However, it remains difficult to assess which tool is more adapted to particular genomic data or a particular reference taxonomy. We developed Placement Evaluation WOrkflows (PEWO), the first benchmarking tool dedicated to PP assessment. Its automated workflows can evaluate PP at many levels, from parameter optimization for a particular tool, to the selection of the most appropriate genetic marker when PP-based species identifications are targeted. Our goal is that PEWO will become a community effort and a standard support for future developments and applications of PP.

Availability and implementation: https://github.com/phylo42/PEWO.

Supplementary information: Supplementary data are available at Bioinformatics online.

hts-nim: scripting high-performance genomic analyses

2018

doi:10.1101/261735

Author(s): Brent S. Pedersen, Aaron R. Quinlan

Keyword(s): High Performance, Genomic Data, Supplementary Information, Supplementary Data, Scripting Languages, Link Type, Custom Software, Genomic Analyses, Biological Insight, Supplementary Material

Abstract:

Motivation: Extracting biological insight from genomic data inevitably requires custom software. In many cases, this is accomplished with scripting languages, owing to their accessibility and brevity. Unfortunately, the ease of scripting languages typically comes at a substantial performance cost that is especially acute with the scale of modern genomics datasets.

Results: We present hts-nim, a high-performance library written in the Nim programming language that provides a simple, scripting-like syntax without sacrificing performance.

Availability: hts-nim is available at https://github.com/brentp/hts-nim and the example tools are at https://github.com/brentp/hts-nim-tools, both under the MIT license.

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.
