G3viz: an R package to interactively visualize genetic mutation data using a lollipop-diagram

Bioinformatics ◽

10.1093/bioinformatics/btz631 ◽

2019 ◽

Cited By ~ 1

Author(s):

Xin Guo ◽

Bo Zhang ◽

Wenqi Zeng ◽

Shuting Zhao ◽

Dongliang Ge

Keyword(s):

Cancer Genomics ◽

R Package ◽

Genetic Mutation ◽

Supplementary Information ◽

Genetic Mutations ◽

Supplementary Data ◽

High Quality ◽

Web Browser ◽

Mutation Data ◽

Different Levels

Abstract. Summary: The lollipop-diagram is one of the most widely used graphical representations for visualizing and exploring the translational effects of genetic mutations in cancer genomics. However, an easy-to-use lollipop-diagram tool with full functionality is still lacking. Here, we introduce g3viz, an R package that enables researchers to explore genetic mutation data using a lollipop-diagram in a web browser. With a few lines of R code, users can interactively visualize data details, annotate findings and export the resulting diagrams as high-quality figures. Because of its usefulness and usability, g3viz can be used by researchers with different levels of bioinformatics skills and programming experience. Availability and implementation: The R package is freely available under the MIT license from CRAN (http://cran.r-project.org/web/packages/g3viz). The g3lollipop JavaScript package is freely available under the MIT license on GitHub (https://github.com/g3viz/g3lollipop.js). Supplementary information: Supplementary data are available at Bioinformatics online.


mmgenome: a toolbox for reproducible genome extraction from metagenomes

10.1101/059121 ◽

2016 ◽

Cited By ~ 42

Author(s):

Søren M. Karst ◽

Rasmus H. Kirkegaard ◽

Mads Albertsen

Keyword(s):

Optimal Strategy ◽

R Package ◽

Supplementary Information ◽

Data Generation ◽

Supplementary Data ◽

High Quality ◽

Standard Analysis ◽

Specific Population ◽

The Core ◽

Supplementary Material

Abstract. Summary: Recovery of population genomes is becoming a standard analysis in metagenomics, and a multitude of different approaches exists. However, the workflows are complex, requiring data generation, binning, validation and finishing to generate high-quality population genome bins. In addition, several different approaches are often used on the same dataset, as the optimal strategy to extract a specific population genome varies. Here we introduce mmgenome: a toolbox for reproducible genome extraction from metagenomes. At the core of mmgenome is an R package that facilitates effortless integration of different binning strategies by collecting information on scaffolds. Genome binning is facilitated through integrated tools that support effortless visualization, validation and calculation of key statistics. Full reproducibility and transparency are obtained through Rmarkdown, whereby every step can be recreated. Availability and implementation: The binning framework of mmgenome is implemented in R. Wrapper scripts for data generation and finishing are written in Perl. The mmgenome toolbox and associated step-by-step guides are available at http://madsalbertsen.github.io/mmgenome/. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.


DEsingle for detecting three types of differential expression in single-cell RNA-seq data

10.1101/173997 ◽

2017 ◽

Cited By ~ 1

Author(s):

Zhun Miao ◽

Ke Deng ◽

Xiaowo Wang ◽

Xuegong Zhang

Keyword(s):

Single Cell ◽

Differential Expression ◽

Negative Binomial ◽

Single Cells ◽

R Package ◽

Supplementary Information ◽

Binomial Model ◽

Supplementary Data ◽

RNA-Seq ◽

Real Zeros

Abstract. Summary: The excess zeros in single-cell RNA-seq data include “real” zeros, due to the on-off nature of gene transcription in single cells, and “dropout” zeros, due to technical reasons. Existing differential expression (DE) analysis methods cannot distinguish these two types of zeros. We developed an R package, DEsingle, which employs a zero-inflated negative binomial model to estimate the proportion of real and dropout zeros and to define and detect three types of DE genes in single-cell RNA-seq data with higher accuracy. Availability and implementation: The R package DEsingle is freely available at https://github.com/miaozhun/DEsingle and is under consideration by Bioconductor. Contact: [email protected]. Supplementary information: Supplementary data are available at bioRxiv online.
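
For readers unfamiliar with the model family named in this abstract, the generic zero-inflated negative binomial distribution can be written as below. This is the textbook form only; the abstract does not give DEsingle's parameterization, so the symbols \theta, r and p are illustrative assumptions rather than the package's notation.

```latex
% Generic zero-inflated negative binomial (ZINB) probability mass function.
% \theta : mixing weight of the point mass at zero (illustrative symbol)
% r, p   : negative binomial size and success probability (illustrative symbols)
P(Y = y) \;=\; \theta \,\mathbf{1}[y = 0]
\;+\; (1 - \theta)\,\frac{\Gamma(y + r)}{\Gamma(r)\, y!}\; p^{\,r}\,(1 - p)^{\,y},
\qquad y = 0, 1, 2, \dots
```

Under this form a zero count can arise from either component, which is what makes it possible in principle to estimate two kinds of zeros separately.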


OpenBioLink: a benchmarking framework for large-scale biomedical link prediction

Bioinformatics ◽

10.1093/bioinformatics/btaa274 ◽

2020 ◽

Vol 36 (13) ◽

pp. 4097-4098 ◽

Cited By ~ 3

Author(s):

Anna Breit ◽

Simon Ott ◽

Asan Agibetov ◽

Matthias Samwald

Keyword(s):

Link Prediction ◽

Large Scale ◽

Source Code ◽

Machine Learning Algorithms ◽

Knowledge Networks ◽

Supplementary Information ◽

Supplementary Data ◽

Biomedical Knowledge ◽

High Quality ◽

Baseline Evaluation

Abstract. Summary: Recently, novel machine-learning algorithms have shown potential for predicting undiscovered links in biomedical knowledge networks. However, dedicated benchmarks for measuring algorithmic progress have not yet emerged. With OpenBioLink, we introduce a large-scale, high-quality and highly challenging biomedical link prediction benchmark to transparently and reproducibly evaluate such algorithms. Furthermore, we present preliminary baseline evaluation results. Availability and implementation: Source code and data are openly available at https://github.com/OpenBioLink/OpenBioLink. Supplementary information: Supplementary data are available at Bioinformatics online.


ngsReports: a Bioconductor package for managing FastQC reports and other NGS related log files

Bioinformatics ◽

10.1093/bioinformatics/btz937 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2587-2588 ◽

Cited By ~ 10

Author(s):

Christopher M Ward ◽

Thu-Hien To ◽

Stephen M Pederson

Keyword(s):

Quality Control ◽

R Package ◽

Supplementary Information ◽

Bioconductor Package ◽

Supplementary Data ◽

Large Sample ◽

Log Files ◽

Shiny App ◽

Next Generation Sequencing Ngs ◽

Generation Sequencing

Abstract. Motivation: High-throughput next generation sequencing (NGS) has become exceedingly cheap, enabling studies with large sample numbers. Quality control (QC) is an essential stage of analytic pipelines, and the outputs of popular bioinformatics tools such as FastQC and Picard can provide information on individual samples. Although these tools provide considerable power when carrying out QC, large sample numbers can make inspection of all samples and identification of systemic bias a challenge. Results: We present ngsReports, an R package designed for the management and visualization of NGS reports from within an R environment. The available methods allow direct import into R of FastQC reports along with outputs from other tools. Visualization can be carried out across many samples using default, highly customizable plots, with options to perform hierarchical clustering to quickly identify outlier libraries. Moreover, these can be displayed in an interactive shiny app or HTML report for ease of analysis. Availability and implementation: The ngsReports package is available on Bioconductor and the GUI shiny app is available at https://github.com/UofABioinformaticsHub/shinyNgsreports. Supplementary information: Supplementary data are available at Bioinformatics online.


Top-Down Garbage Collector: a tool for selecting high-quality top-down proteomics mass spectra

Bioinformatics ◽

10.1093/bioinformatics/btz085 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3489-3490 ◽

Cited By ~ 1

Author(s):

Diogo B Lima ◽

André R F Silva ◽

Mathieu Dupré ◽

Marlon D M Santos ◽

Milan A Clasen ◽

...

Keyword(s):

Quality Control ◽

Mass Spectra ◽

Rate Increase ◽

Supplementary Information ◽

Supplementary Data ◽

Top Down ◽

High Quality ◽

Garbage Collector ◽

E. coli ◽

Spectral Libraries

Abstract. Motivation: We present the first tool for unbiased quality control of top-down proteomics datasets. Our tool can select high-quality top-down proteomics spectra, serve as a gateway for building top-down spectral libraries and, ultimately, improve identification rates. Results: We demonstrate that a twofold rate increase for two E. coli top-down proteomics datasets may be achievable. Availability and implementation: http://patternlabforproteomics.org/tdgc, freely available for academic use. Supplementary information: Supplementary data are available at Bioinformatics online.


CPS analysis: self-contained validation of biomedical data clustering

Bioinformatics ◽

10.1093/bioinformatics/btaa165 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3516-3521 ◽

Cited By ~ 1

Author(s):

Lixiang Zhang ◽

Lin Lin ◽

Jia Li

Keyword(s):

Data Clustering ◽

State Of The Art ◽

R Package ◽

Research Community ◽

Supplementary Information ◽

Biomedical Data ◽

Data Generation ◽

Supplementary Data ◽

Point Set ◽

Class Labels

Abstract. Motivation: Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem barely addressed by the research community. Results: We have developed a toolkit called covering point set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to effectively visualize the inherent variation in any cluster for data of high dimension, and to provide a more comprehensive view of potentially interesting subgroups in the data. Applying it to three usage scenarios for biomedical data, we demonstrate that CPS analysis is more effective for evaluating uncertainty of clusters compared to state-of-the-art measurements. We also showcase how to use CPS analysis to select data generation technologies or visualization methods. Availability and implementation: The method is implemented in an R package called OTclust, available on CRAN. Contact: [email protected] or [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.


RMTL: an R library for multi-task learning

Bioinformatics ◽

10.1093/bioinformatics/bty831 ◽

2018 ◽

Vol 35 (10) ◽

pp. 1797-1798 ◽

Cited By ~ 2

Author(s):

Han Cao ◽

Jiayu Zhou ◽

Emanuel Schwarz

Keyword(s):

Biological Networks ◽

Simulated Data ◽

R Package ◽

Low Rank ◽

Supplementary Information ◽

Supplementary Data ◽

Software Environment ◽

Machine Learning Technique ◽

Task Learning ◽

Learning Technique

Abstract. Motivation: Multi-task learning (MTL) is a machine learning technique for simultaneous learning of multiple related classification or regression tasks. Despite its increasing popularity, MTL algorithms are currently not available in the widely used software environment R, creating a bottleneck for their application in biomedical research. Results: We developed an efficient, easy-to-use R library for MTL (www.r-project.org) comprising 10 algorithms applicable for regression, classification, joint predictor selection, task clustering, low-rank learning and incorporation of biological networks. We demonstrate the utility of the algorithms using simulated data. Availability and implementation: The RMTL package is an open source R package and is freely available at https://github.com/transbioZI/RMTL. RMTL will also be available on cran.r-project.org. Supplementary information: Supplementary data are available at Bioinformatics online.


schex avoids overplotting for large single-cell RNA-sequencing datasets

Bioinformatics ◽

10.1093/bioinformatics/btz907 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2291-2292 ◽

Cited By ~ 1

Author(s):

Saskia Freytag ◽

Ryan Lister

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

R Package ◽

Supplementary Information ◽

Supplementary Data ◽

Sequencing Data ◽

Single Cell RNA Sequencing

Abstract. Summary: Due to the scale and sparsity of single-cell RNA-sequencing data, traditional plots can obscure vital information. Our R package schex overcomes this by implementing hexagonal binning, which has the additional advantages of improving speed and reducing storage for resulting plots. Availability and implementation: schex is freely available from Bioconductor via http://bioconductor.org/packages/release/bioc/html/schex.html and its development version can be accessed on GitHub via https://github.com/SaskiaFreytag/schex. Supplementary information: Supplementary data are available at Bioinformatics online.


Serpentine: a flexible 2D binning method for differential Hi-C analysis

Bioinformatics ◽

10.1093/bioinformatics/btaa249 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3645-3651

Author(s):

Lyam Baudry ◽

Gaël A Millot ◽

Agnes Thierry ◽

Romain Koszul ◽

Vittore F Scolari

Keyword(s):

Deep Sequencing ◽

Low Noise ◽

Supplementary Information ◽

Supplementary Data ◽

Fractal Nature ◽

Contact Map ◽

Signal To Noise ◽

High Quality ◽

Contact Maps ◽

Contact Frequency

Abstract. Motivation: Hi-C contact maps reflect the relative contact frequencies between pairs of genomic loci, quantified through deep sequencing. Differential analyses of these maps enable downstream biological interpretations. However, the multi-fractal nature of the chromatin polymer inside the cellular envelope results in contact frequency values spanning several orders of magnitude: contacts between loci pairs separated by large genomic distances are much sparser than closer pairs. The same is true for poorly covered regions, such as repeated sequences. Both distant and poorly covered regions translate into low signal-to-noise ratios. There is no clear consensus to address this limitation. Results: We present Serpentine, a fast, flexible procedure operating on raw data, which considers the contacts in each region of a contact map. Binning is performed only when necessary on noisy regions, preserving informative ones. This results in high-quality, low-noise contact maps that can be conveniently visualized for rigorous comparative analyses. Availability and implementation: Serpentine is available on the PyPI repository and https://github.com/koszullab/serpentine; documentation and tutorials are provided at https://serpentine.readthedocs.io/en/latest/. Supplementary information: Supplementary data are available at Bioinformatics online.


Bringing data from curated pathway resources to Cytoscape with OmniPath

Bioinformatics ◽

10.1093/bioinformatics/btz968 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2632-2633 ◽

Cited By ~ 6

Author(s):

Francesco Ceccarelli ◽

Denes Turei ◽

Attila Gabor ◽

Julio Saez-Rodriguez

Keyword(s):

Source Code ◽

Large Body ◽

Supplementary Information ◽

Supplementary Data ◽

Network Resources ◽

High Quality ◽

Comprehensive Collection ◽

Intuitive Interface ◽

Growing Network

Abstract. Summary: Multiple databases provide valuable information about curated pathways and other resources that can be used to build and analyze networks. OmniPath combines 61 (and continuously growing) network resources into a comprehensive collection, with over 120 000 interactions. We present here the OmniPath App, a Cytoscape plugin to flexibly import data from OmniPath via a simple and intuitive interface. It thus makes it possible to directly access the large body of high-quality knowledge provided by OmniPath within Cytoscape for inspection and further use with other tools. Availability and implementation: The OmniPath App has been developed for Cytoscape 3 in the Java programming language. The latest source code and the plugin can be found at https://github.com/saezlab/Omnipath_Cytoscape and http://apps.cytoscape.org/apps/omnipath, respectively. Supplementary information: Supplementary data are available at Bioinformatics online.
