Spliceogen: an integrative, scalable tool for the discovery of splice-altering variants

Steven Monger; Michael Troup; Eddie Ip; Sally L Dunwoodie; Eleni Giannoulatou

doi:10.1093/bioinformatics/btz263

Bioinformatics ◽

10.1093/bioinformatics/btz263 ◽

2019 ◽

Vol 35 (21) ◽

pp. 4405-4407 ◽

Cited By ~ 1

Author(s):

Steven Monger ◽

Michael Troup ◽

Eddie Ip ◽

Sally L Dunwoodie ◽

Eleni Giannoulatou

Keyword(s):

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

In Silico Prediction ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Prediction Tools ◽

Motif Prediction ◽

Command Line Tool ◽

Genome Scale

Abstract. Motivation: In silico prediction tools are essential for identifying variants that create or disrupt cis-splicing motifs. However, there are limited options for genome-scale discovery of splice-altering variants. Results: We have developed Spliceogen, a highly scalable pipeline integrating predictions from some of the individually best-performing models for splice motif prediction: MaxEntScan, GeneSplicer, ESRseq and Branchpointer. Availability and implementation: Spliceogen is available as a command-line tool that accepts VCF/BED inputs and handles both single nucleotide variants (SNVs) and indels (https://github.com/VCCRI/Spliceogen). SNV databases with prediction scores are also available, covering all possible SNVs at all genomic positions within all Gencode-annotated multi-exon transcripts. Supplementary information: Supplementary data are available at Bioinformatics online.
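
The distinction between SNVs and indels in VCF input can be illustrated with a minimal sketch. This is not Spliceogen's code: the function names `classify_variant` and `parse_vcf_line` and the example records are hypothetical, and the sketch assumes plain nucleotide alleles (symbolic alleles such as `<DEL>` or `*` are not handled).

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    # Same length, longer than one base: multi-nucleotide substitution.
    return "MNV"


def parse_vcf_line(line: str):
    """Yield (chrom, pos, ref, alt, kind) for each ALT allele on a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alts = fields[:5]
    # A VCF record may list several comma-separated ALT alleles.
    for alt in alts.split(","):
        yield chrom, int(pos), ref, alt, classify_variant(ref, alt)


# Hypothetical records, for illustration only.
records = [
    "chr1\t12345\t.\tA\tG\t.\t.\t.",
    "chr1\t12400\t.\tAT\tA\t.\t.\t.",
    "chr2\t500\t.\tC\tCAG,T\t.\t.\t.",
]

for record in records:
    for row in parse_vcf_line(record):
        print(row)
```

Splitting multi-allelic records into one row per ALT allele keeps the classification unambiguous, since a single record can mix an SNV with an indel.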

Genesis and Gappa: processing, analyzing and visualizing phylogenetic (placement) data

Bioinformatics ◽

10.1093/bioinformatics/btaa070 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3263-3265 ◽

Cited By ~ 14

Author(s):

Lucas Czech ◽

Pierre Barbera ◽

Alexandros Stamatakis

Keyword(s):

Phylogenetic Trees ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Computationally Efficient ◽

Data Types ◽

Low Level ◽

Phylogenetic Placement ◽

Command Line Tool ◽

High Level

Abstract. Summary: We present genesis, a library for working with phylogenetic data, and gappa, an accompanying command-line tool for conducting typical analyses on such data. The tools target phylogenetic trees and phylogenetic placements, sequences, taxonomies and other relevant data types, offer high-level simplicity as well as low-level customizability, and are computationally efficient, well-tested and field-proven. Availability and implementation: Both genesis and gappa are written in modern C++11, and are freely available under GPLv3 at http://github.com/lczech/genesis and http://github.com/lczech/gappa. Supplementary information: Supplementary data are available at Bioinformatics online.

Visualization of circular RNAs and their internal splicing events from transcriptomic data

Bioinformatics ◽

10.1093/bioinformatics/btaa033 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2934-2935 ◽

Cited By ~ 1

Author(s):

Yi Zheng ◽

Fangqing Zhao

Keyword(s):

Supplementary Information ◽

Circular RNAs ◽

Visualization Tool ◽

Command Line ◽

Supplementary Data ◽

Transcriptomic Data ◽

Command Line Tool ◽

Transcriptome Comparison ◽

Multiple Samples ◽

Splicing Patterns

Abstract. Summary: Circular RNAs (circRNAs) have been shown to have unique compositions and splicing events distinct from canonical mRNAs. However, there is no visualization tool designed for the exploration of complex splicing patterns in circRNA transcriptomes. Here, we present CIRI-vis, a Java command-line tool for quantifying and visualizing circRNAs by integrating the alignments and junctions of circular transcripts. CIRI-vis can be applied to visualize the internal structure and isoform abundance of circRNAs and perform circRNA transcriptome comparison across multiple samples. Availability and implementation: https://sourceforge.net/projects/ciri/files/CIRI-vis. Supplementary information: Supplementary data are available at Bioinformatics online.

SimRVSequences: an R package to simulate genetic sequence data for pedigrees

Bioinformatics ◽

10.1093/bioinformatics/btz881 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2295-2297

Author(s):

Christina Nieuwoudt ◽

Angela Brooks-Wilson ◽

Jinko Graham

Keyword(s):

Sequence Data ◽

R Package ◽

Supplementary Information ◽

Supplementary Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Genetic Sequence ◽

Large Numbers

Abstract. Summary: We present the R package SimRVSequences to simulate sequence data for pedigrees. SimRVSequences allows for simulations of large numbers of single-nucleotide variants (SNVs) and scales well with increasing numbers of pedigrees. Users provide a sample of pedigrees and SNV data from a sample of unrelated individuals. Availability and implementation: SimRVSequences is publicly available on CRAN at https://cran.r-project.org/web/packages/SimRVSequences/. Supplementary information: Supplementary data are available at Bioinformatics online.

SMuRF: a novel tool to identify genomic regions enriched for somatic point mutations

10.1101/271957 ◽

2018 ◽

Author(s):

Paul Guilhamon ◽

Mathieu Lupien

Keyword(s):

Point Mutations ◽

Regulatory Elements ◽

Supplementary Information ◽

Gene Promoters ◽

Nucleotide Polymorphisms ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Command Line Tool ◽

Risk SNPs ◽

Genomic Regions

Abstract. Motivation: Single Nucleotide Variants (SNVs), including somatic point mutations and Single Nucleotide Polymorphisms (SNPs), in noncoding cis-regulatory elements (CREs) can affect gene regulation and lead to disease development (Zhou et al., 2016; Zhang et al., 2014). Others have previously developed methods to identify important clusters of somatic point mutations based on proximity (Weinhold et al., 2014) or the enrichment of inherited risk-SNPs at CREs (Ahmed et al., 2017). Here, we present SMuRF (Significantly Mutated Region Finder), a user-friendly command-line tool to identify these significantly mutated regions from user-defined genomic intervals and SNVs. Results: SMuRF identified 72 significantly mutated CREs in liver cancer, including known mutated gene promoters as well as previously unreported regions. Availability: The source code for SMuRF is open-source and freely available on GitHub (https://github.com/LupienLabOrganization/SMuRF) under the GNU GPLv3 license. SMuRF is implemented in Bash and R; it runs on any platform with Bash (≥4.1.2), R (≥3.3.0) and BEDTools (≥2.26.0). It requires the following R packages: GenomicRanges, gtools, gplots, ggplot2, data.table, psych, and dplyr. Supplementary information: Supplementary information is available at Bioinformatics online.
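
The abstract does not state which statistical test SMuRF applies, so the following is only a sketch of one plausible way to flag mutation-enriched intervals: a one-sided binomial test against a background in which SNVs fall uniformly across the supplied regions. The function names, region names and coordinates are hypothetical; SMuRF itself is implemented in Bash and R, and multiple-testing correction is omitted here.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def region_enrichment(regions, snv_positions):
    """Score each region for an excess of SNVs.

    regions: non-overlapping (name, start, end) half-open intervals on one chromosome.
    snv_positions: integer SNV coordinates on the same chromosome.
    Returns {name: (snv_count, p_value)}.
    """
    total_length = sum(end - start for _, start, end in regions)
    counts = {
        name: sum(start <= pos < end for pos in snv_positions)
        for name, start, end in regions
    }
    n_snvs = sum(counts.values())
    results = {}
    for name, start, end in regions:
        # Assumed background: each SNV lands in a region with probability
        # proportional to that region's share of the total interval length.
        expected_share = (end - start) / total_length
        results[name] = (
            counts[name],
            binomial_upper_tail(counts[name], n_snvs, expected_share),
        )
    return results


# Hypothetical intervals and SNV positions, for illustration only.
regions = [("promoter_A", 0, 1000), ("enhancer_B", 1000, 2000), ("enhancer_C", 2000, 3000)]
snvs = [10, 20, 30, 40, 50, 60, 1500, 2500]

for name, (count, p_value) in region_enrichment(regions, snvs).items():
    print(f"{name}\t{count}\t{p_value:.4f}")
```

In this toy input, six of the eight SNVs fall in the first of three equal-length regions, so that region receives the smallest p-value.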

aCLImatise: automated generation of tool definitions for bioinformatics workflows

Bioinformatics ◽

10.1093/bioinformatics/btaa1033 ◽

2020 ◽

Author(s):

Michael Milton ◽

Natalie Thorne

Keyword(s):

Source Code ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Automated Generation ◽

Base Camp ◽

Python Package ◽

Bioinformatics Workflow ◽

Bioinformatics Workflows

Abstract. Summary: aCLImatise is a utility for automatically generating tool definitions compatible with bioinformatics workflow languages, by parsing command-line help output. aCLImatise also has an associated database called the aCLImatise Base Camp, which provides thousands of pre-computed tool definitions. Availability and implementation: The latest aCLImatise source code is available within a GitHub organisation, under the GPL-3.0 license: https://github.com/aCLImatise. In particular, documentation for the aCLImatise Python package is available at https://aclimatise.github.io/CliHelpParser/, and the aCLImatise Base Camp is available at https://aclimatise.github.io/BaseCamp/. Supplementary information: Supplementary data are available at Bioinformatics online.
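
The general idea of deriving a tool definition from help output can be sketched with a small regular-expression parser. This does not use or reproduce the aCLImatise API: the pattern, the `parse_help` function, the tool name `mytool` and its help text are all hypothetical, and the sketch only covers one common help layout.

```python
import re

# Matches option lines such as "  -o, --output FILE    Write results to FILE".
# Assumes a long flag is always present, an optional uppercase placeholder
# marks a value-taking flag, and two or more spaces precede the description.
FLAG_LINE = re.compile(
    r"^\s+(?:(?P<short>-\w),\s+)?(?P<long>--[\w-]+)"
    r"(?:[ =](?P<arg>[A-Z_]+))?\s{2,}(?P<help>.+)$"
)


def parse_help(text: str):
    """Return one dictionary per option line found in the help text."""
    flags = []
    for line in text.splitlines():
        match = FLAG_LINE.match(line)
        if match:
            flags.append(
                {
                    "short": match.group("short"),
                    "long": match.group("long"),
                    "takes_value": match.group("arg") is not None,
                    "description": match.group("help").strip(),
                }
            )
    return flags


# Hypothetical help output, for illustration only.
HELP_TEXT = """\
usage: mytool [options] INPUT

options:
  -h, --help           Show this message and exit
  -o, --output FILE    Write results to FILE
      --threads N      Number of worker threads
"""

for flag in parse_help(HELP_TEXT):
    print(flag)
```

A structured list like this could then be serialized into whatever workflow language is needed; real help output varies far more than this single pattern allows for.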

neoepiscope improves neoepitope prediction with multivariant phasing

Bioinformatics ◽

10.1093/bioinformatics/btz653 ◽

2019 ◽

Vol 36 (3) ◽

pp. 713-720 ◽

Cited By ~ 5

Author(s):

Mary A Wood ◽

Austin Nguyen ◽

Adam J Struck ◽

Kyle Ellrott ◽

Abhinav Nellore ◽

...

Keyword(s):

False Negative ◽

Supplementary Information ◽

Supplementary File ◽

Sequencing Data ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Somatic Variant ◽

Negative Results ◽

Multiple Datasets ◽

False Negative Results

Abstract. Motivation: The vast majority of tools for neoepitope prediction from DNA sequencing of complementary tumor and normal patient samples do not consider germline context or the potential for the co-occurrence of two or more somatic variants on the same mRNA transcript. Without consideration of these phenomena, existing approaches are likely to produce both false-positive and false-negative results, resulting in an inaccurate and incomplete picture of the cancer neoepitope landscape. We developed neoepiscope chiefly to address this issue for single nucleotide variants (SNVs) and insertions/deletions (indels). Results: Herein, we illustrate how germline and somatic variant phasing affects neoepitope prediction across multiple datasets. We estimate that up to ∼5% of neoepitopes arising from SNVs and indels may require variant phasing for their accurate assessment. neoepiscope is performant, flexible and supports several major histocompatibility complex binding affinity prediction tools. Availability and implementation: neoepiscope is available on GitHub at https://github.com/pdxgx/neoepiscope under the MIT license. Scripts for reproducing results described in the text are available at https://github.com/pdxgx/neoepiscope-paper under the MIT license. Additional data from this study, including summaries of variant phasing incidence and benchmarking wallclock times, are available in Supplementary Files 1, 2 and 3. Supplementary File 1 contains Supplementary Table 1, Supplementary Figures 1 and 2, and descriptions of Supplementary Tables 2–8. Supplementary File 2 contains Supplementary Tables 2–6 and 8. Supplementary File 3 contains Supplementary Table 7. Raw sequencing data used for the analyses in this manuscript are available from the Sequence Read Archive under accessions PRJNA278450, PRJNA312948, PRJNA307199, PRJNA343789, PRJNA357321, PRJNA293912, PRJNA369259, PRJNA305077, PRJNA306070, PRJNA82745 and PRJNA324705; from the European Genome-phenome Archive under accessions EGAD00001004352 and EGAD00001002731; and by direct request to the authors. Supplementary information: Supplementary data are available at Bioinformatics online.
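
Why phasing matters can be shown with a toy example: two variants in the same codon produce a different amino acid when applied together than either does alone. This is a sketch of the underlying principle, not neoepiscope's implementation; the coding sequence, the variant coordinates and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def apply_variants(cds: str, variants) -> str:
    """Apply (zero_based_position, ref, alt) substitutions to a coding sequence."""
    bases = list(cds)
    for position, ref, alt in variants:
        if bases[position] != ref:
            raise ValueError(f"reference mismatch at position {position}")
        bases[position] = alt
    return "".join(bases)


# Hypothetical transcript and variants: both substitutions hit codon 3 (GAA).
CDS = "ATGGCTGAACGT"
variant_1 = (6, "G", "A")
variant_2 = (7, "A", "G")

print("reference:     ", translate(CDS))                                        # MAER
print("variant 1 only:", translate(apply_variants(CDS, [variant_1])))           # MAKR
print("variant 2 only:", translate(apply_variants(CDS, [variant_2])))           # MAGR
print("both, in cis:  ", translate(apply_variants(CDS, [variant_1, variant_2])))  # MARR
```

A tool that evaluates each variant independently would report the K and G substitutions and miss the R that arises when both variants sit on the same transcript copy.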

Knot_pull—python package for biopolymer smoothing and knot detection

Bioinformatics ◽

10.1093/bioinformatics/btz644 ◽

2019 ◽

Cited By ~ 1

Author(s):

Aleksandra I Jarmolinska ◽

Anna Gambin ◽

Joanna I Sulkowska

Keyword(s):

Learning Curve ◽

Source Code ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Steep Learning Curve ◽

Independent Source ◽

Python Package

Abstract. Summary: The biggest hurdle in studying topology in biopolymers is the steep learning curve for actually seeing the knots in structure visualization. Knot_pull is a command line utility designed to simplify this process: it presents the user with a smoothing trajectory for provided structures (any number and length of protein, RNA or chromatin chains in PDB, CIF or XYZ format), and calculates the knot type (including presence of any links, and slipknots when a subchain is specified). Availability and implementation: Knot_pull works under Python >=2.7 and is system independent. Source code and documentation are available at http://github.com/dzarmola/knot_pull under GNU GPL license and also include a wrapper script for PyMOL for easier visualization. Examples of smoothing trajectories can be found at: https://www.youtube.com/watch?v=IzSGDfc1vAY. Supplementary information: Supplementary data are available at Bioinformatics online.

neoepiscope improves neoepitope prediction with multi-variant phasing

10.1101/418129 ◽

2018 ◽

Cited By ~ 2

Author(s):

Mary A. Wood ◽

Austin Nguyen ◽

Adam Struck ◽

Kyle Ellrott ◽

Abhinav Nellore ◽

...

Keyword(s):

False Negative ◽

List Type ◽

Single Nucleotide Variants ◽

Single Nucleotide ◽

Somatic Variant ◽

Prediction Tools ◽

Negative Results ◽

Multiple Datasets ◽

False Negative Results ◽

Key Points

Abstract. The vast majority of tools for neoepitope prediction from DNA sequencing of complementary tumor and normal patient samples do not consider germline context or the potential for co-occurrence of two or more somatic variants on the same mRNA transcript. Without consideration of these phenomena, existing approaches are likely to produce both false positive and false negative results, resulting in an inaccurate and incomplete picture of the cancer neoepitope landscape. We developed neoepiscope chiefly to address this issue for single nucleotide variants (SNVs) and insertions/deletions (indels), and herein illustrate how germline and somatic variant phasing affects neoepitope prediction across multiple datasets. We estimate that up to ∼5% of neoepitopes arising from SNVs and indels may require variant phasing for their accurate assessment. neoepiscope is performant, flexible, and supports several major histocompatibility complex binding affinity prediction tools. We have released neoepiscope as open-source software (MIT license, https://github.com/pdxgx/neoepiscope) for broad use. Key points: Germline context and somatic variant phasing are important for neoepitope prediction. Many popular neoepitope prediction tools have issues of performance and reproducibility. We describe and provide performant software for accurate neoepitope prediction from DNA-seq data.

MONET: a toolbox integrating top-performing methods for network modularization

Bioinformatics ◽

10.1093/bioinformatics/btaa236 ◽

2020 ◽

Vol 36 (12) ◽

pp. 3920-3921

Author(s):

Mattia Tomasoni ◽

Sergio Gómez ◽

Jake Crawford ◽

Weijia Zhang ◽

Sarvenaz Choobdar ◽

...

Keyword(s):

Molecular Network ◽

Supplementary Information ◽

Command Line ◽

The Core ◽

Disease Mechanisms ◽

Different Types ◽

Command Line Tool ◽

Disease Module ◽

Community Effort ◽

Bioinformatics Community

Abstract. Summary: We define a disease module as a partition of a molecular network whose components are jointly associated with one or several diseases or risk factors thereof. Identification of such modules, across different types of networks, has great potential for elucidating disease mechanisms and establishing new powerful biomarkers. To this end, we launched the ‘Disease Module Identification (DMI) DREAM Challenge’, a community effort to build and evaluate unsupervised molecular network modularization algorithms. Here, we present MONET, a toolbox providing easy and unified access to the three top-performing methods from the DMI DREAM Challenge for the bioinformatics community. Availability and implementation: MONET is a command line tool for Linux, based on Docker and Singularity containers; the core algorithms were written in R, Python, Ada and C++. It is freely available for download at https://github.com/BergmannLab/MONET.git. Supplementary information: Supplementary data are available at Bioinformatics online.

Phylonium: fast estimation of evolutionary distances from large samples of similar genomes

Bioinformatics ◽

10.1093/bioinformatics/btz903 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2040-2046 ◽

Cited By ~ 2

Author(s):

Fabian Klötzl ◽

Bernhard Haubold

Keyword(s):

Disease Outbreaks ◽

Supplementary Information ◽

Whole Genome ◽

Command Line ◽

Supplementary Data ◽

Large Samples ◽

Fast Estimation ◽

Unix Command ◽

Similar Accuracy ◽

Single Sequence

Abstract. Motivation: Tracking disease outbreaks by whole-genome sequencing leads to the collection of large samples of closely related sequences. Five years ago, we published a method to accurately compute all pairwise distances for such samples by indexing each sequence. Since indexing is slow, we now ask whether it is possible to achieve similar accuracy when indexing only a single sequence. Results: We have implemented this idea in the program phylonium and show that it is as accurate as its predecessor and roughly 100 times faster when applied to all 2678 Escherichia coli genomes contained in ENSEMBL. One of the best published programs for rapidly computing pairwise distances, mash, analyzes the same dataset four times faster but, with default settings, it is less accurate than phylonium. Availability and implementation: Phylonium runs under the UNIX command line; its C++ sources and documentation are available from github.com/evolbioinf/phylonium. Supplementary information: Supplementary data are available at Bioinformatics online.
