Sequence tube maps: making graph genomes intuitive to commuters

Abstract. Motivation: Compared to traditional haploid reference genomes, graph genomes are an efficient and compact data structure for storing multiple genomic sequences, for storing polymorphisms or for mapping sequencing reads with greater sensitivity. Further, graphs are well-studied computer science objects that can be efficiently analyzed. However, their adoption in genomic research is slow, in part because of the cognitive difficulty in interpreting graphs. Results: We present an intuitive graphical representation for graph genomes that re-uses well-honed techniques developed to display public transport networks, and demonstrate it as a web tool. Availability and implementation: Code: https://github.com/vgteam/sequenceTubeMap. Demonstration: https://vgteam.github.io/sequenceTubeMap/. Supplementary information: Supplementary data are available at Bioinformatics online.
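To make the "compact data structure for storing multiple genomic sequences" idea concrete, here is a minimal sketch of a sequence graph in which shared segments are stored once and each haplotype is a path through the nodes. The node contents, path names and the `spell` helper are hypothetical illustrations, not the data model used by sequenceTubeMap or vg.

```python
# Toy sequence graph: every node holds a DNA segment (hypothetical data).
nodes = {1: "ACGT", 2: "A", 3: "G", 4: "TTCA"}

# Each haplotype is a walk through node ids; nodes 1 and 4 are shared,
# nodes 2 and 3 form a single-base polymorphism "bubble".
paths = {"ref": [1, 2, 4], "alt": [1, 3, 4]}


def spell(path_name):
    """Reconstruct the linear sequence of one path through the graph."""
    return "".join(nodes[node_id] for node_id in paths[path_name])


def stored_bases():
    """Number of bases actually stored in the graph."""
    return sum(len(seq) for seq in nodes.values())


def linear_bases():
    """Number of bases needed to store every path as a separate string."""
    return sum(len(spell(name)) for name in paths)
```

In this toy example the graph stores fewer bases than the two spelled-out haplotypes combined, which is the sense in which the representation is compact; a tube-map style drawing would show the two paths as two "lines" sharing the stations 1 and 4.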


VarMap: a web tool for mapping genomic coordinates to protein sequence and structure and retrieving protein structural annotations

Bioinformatics ◽

10.1093/bioinformatics/btz482 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4854-4856 ◽

Cited By ~ 8

Author(s):

James D Stephenson ◽

Roman A Laskowski ◽

Andrew Nightingale ◽

Matthew E Hurles ◽

Janet M Thornton

Keyword(s):

Protein Sequence ◽

Structural Information ◽

Protein Structures ◽

Supplementary Information ◽

Supplementary Data ◽

Web Tool ◽

Genomic Variants ◽

Structural Context ◽

Pathogenic Variants ◽

Transcript Evidence

Abstract. Motivation: Understanding the protein structural context and patterning on proteins of genomic variants can help to separate benign from pathogenic variants and reveal molecular consequences. However, mapping genomic coordinates to protein structures is non-trivial, complicated by alternative splicing and transcript evidence. Results: Here we present VarMap, a web tool for mapping a list of chromosome coordinates to canonical UniProt sequences and associated protein 3D structures, including validation checks, and annotating them with structural information. Availability and implementation: https://www.ebi.ac.uk/thornton-srv/databases/VarMap. Supplementary information: Supplementary data are available at Bioinformatics online.


MAJIQ-SPEL: web-tool to interrogate classical and complex splicing variations from RNA-Seq data

Bioinformatics ◽

10.1093/bioinformatics/btx565 ◽

2017 ◽

Vol 34 (2) ◽

pp. 300-302 ◽

Cited By ~ 2

Author(s):

Christopher J Green ◽

Matthew R Gazzara ◽

Yoseph Barash

Keyword(s):

Experimental Validation ◽

Ucsc Genome Browser ◽

Supplementary Information ◽

Supplementary Data ◽

Rna Seq ◽

Web Tool ◽

Rt Pcr ◽

Design Algorithm ◽

Gene Isoforms ◽

Downstream Analysis

Abstract. Summary: Analysis of RNA sequencing (RNA-Seq) data has highlighted the fact that most genes undergo alternative splicing (AS) and that these patterns are tightly regulated. Many of these events are complex, resulting in numerous possible isoforms that quickly become difficult to visualize, interpret and experimentally validate. To address these challenges we developed MAJIQ-SPEL, a web tool that takes as input local splicing variations (LSVs) quantified from RNA-Seq data and provides users with visualization and quantification of the gene isoforms associated with them. Importantly, MAJIQ-SPEL is able to handle both classical (binary) and complex (non-binary) splicing variations. Using a matching primer design algorithm, it also suggests possible primers for experimental validation by RT-PCR and displays them, along with the matching protein domains affected by the LSV, on the UCSC Genome Browser for further downstream analysis. Availability and implementation: Program and code will be available at http://majiq.biociphers.org/majiq-spel. Supplementary information: Supplementary data are available at Bioinformatics online.


Kmer-db: instant evolutionary distance estimation

10.1101/263590 ◽

2018 ◽

Author(s):

Sebastian Deorowicz ◽

Adam Gudys ◽

Maciej Dlugosz ◽

Marek Kokot ◽

Agnieszka Danek

Keyword(s):

Data Structure ◽

Web Site ◽

Parallel Implementation ◽

Evolutionary Relationship ◽

Distance Estimation ◽

Evolutionary Distance ◽

Supplementary Information ◽

Supplementary Data ◽

Efficient Data ◽

Evolutionary Distance Estimation

Abstract. Summary: Kmer-db is a new tool for estimating evolutionary relationships on the basis of k-mers extracted from genomes or sequencing reads. Thanks to an efficient data structure and parallel implementation, our software estimates distances between 40,715 pathogens in less than 4 minutes (on a modern workstation), 44 times faster than Mash, its main competitor. Availability and implementation: https://github.com/refresh-bio/[email protected] Supplementary information: Supplementary data are available at publisher’s Web site.
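As a rough illustration of what "estimating evolutionary relationship on the basis of k-mers" can mean, the sketch below extracts k-mer sets from two sequences, computes their Jaccard index and converts it to the distance formula used by Mash. It is a naive, exact, single-threaded toy: the function names are hypothetical, canonical (strand-independent) k-mers are ignored, and nothing here reflects Kmer-db's actual data structure or implementation.

```python
import math


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two k-mer sets (1.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def mash_distance(j, k):
    """Mash-style distance D = -(1/k) * ln(2j / (1 + j)) for Jaccard index j."""
    if j == 0:
        return 1.0  # no shared k-mers: treat as maximal distance
    return -1.0 / k * math.log(2 * j / (1 + j))


def kmer_distance(seq_a, seq_b, k):
    """Distance between two sequences from their exact k-mer sets."""
    return mash_distance(jaccard(kmers(seq_a, k), kmers(seq_b, k)), k)
```

For thousands of genomes the all-pairs comparison is the expensive part, which is where an efficient shared k-mer index and parallelism, as described in the abstract, would matter.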


eFORGE v2.0: updated analysis of cell type-specific signal in epigenomic data

Bioinformatics ◽

10.1093/bioinformatics/btz456 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4767-4769 ◽

Cited By ~ 9

Author(s):

Charles E Breeze ◽

Alex P Reynolds ◽

Jenny van Dongen ◽

Ian Dunham ◽

John Lazar ◽

...

Keyword(s):

Supplementary Information ◽

Supplementary Data ◽

Cell Type ◽

Web Tool ◽

Methylation Analysis ◽

450K Array ◽

Composition Effects ◽

Epigenome Editing ◽

Cell Type Specific ◽

Dna Methylation Analysis

Abstract. Summary: The Illumina Infinium EPIC BeadChip is a new high-throughput array for DNA methylation analysis, extending the earlier 450k array by over 400 000 new sites. Previously, a method named eFORGE was developed to provide insights into cell type-specific and cell-composition effects for 450k data. Here, we present a significantly updated and improved version of eFORGE that can analyze both EPIC and 450k array data. New features include analysis of chromatin states, transcription factor motifs and DNase I footprints, providing tools for epigenome-wide association study interpretation and epigenome editing. Availability and implementation: eFORGE v2.0 is implemented as a web tool available from https://eforge.altiusinstitute.org and https://eforge-tf.altiusinstitute.org/. Supplementary information: Supplementary data are available at Bioinformatics online.


HaploTypo: a variant-calling pipeline for phased genomes

Bioinformatics ◽

10.1093/bioinformatics/btz933 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2569-2571 ◽

Cited By ~ 3

Author(s):

Cinta Pegueroles ◽

Verónica Mixão ◽

Laia Carreté ◽

Manu Molina ◽

Toni Gabaldón

Keyword(s):

Genetic Variation ◽

Genetic Variant ◽

Reference Genome ◽

Variant Calling ◽

Supplementary Information ◽

Haplotype Structure ◽

Supplementary Data ◽

Heterozygous Variant ◽

Reference Genomes

Abstract. Summary: An increasing number of phased (i.e. with resolved haplotypes) reference genomes are available. However, most genetic variant calling tools do not explicitly account for haplotype structure. Here, we present HaploTypo, a pipeline tailored to resolve haplotypes in genetic variation analyses. HaploTypo infers the haplotype correspondence for each heterozygous variant called on a phased reference genome. Availability and implementation: HaploTypo is implemented in Python 2.7 and Python 3.5, and is freely available at https://github.com/gabaldonlab/haplotypo and as a Docker image. Supplementary information: Supplementary data are available at Bioinformatics online.


PyRanges: efficient comparison of genomic intervals in Python

Bioinformatics ◽

10.1093/bioinformatics/btz615 ◽

2019 ◽

Cited By ~ 2

Author(s):

Endre Bakken Stovner ◽

Pål Sætrom

Keyword(s):

Data Structure ◽

Supplementary Information ◽

Supplementary Data ◽

Genomic Libraries ◽

Simple Set ◽

Set Operations ◽

Wide Range ◽

Genomic Analyses ◽

Associated Data ◽

Memory Efficient

Abstract. Summary: Complex genomic analyses often use sequences of simple set operations like intersection, overlap and nearest on genomic intervals. These operations, coupled with some custom programming, allow a wide range of analyses to be performed. To this end, we have written PyRanges, a data structure for representing and manipulating genomic intervals and their associated data in Python. Run single-threaded on binary set operations, PyRanges is, in median, 2.3–9.6 times faster than the popular R GenomicRanges library and is equally memory efficient; run multi-threaded on 8 cores, our library is up to 123 times faster. PyRanges is therefore ideally suited both for individual analyses and as a foundation for future genomic libraries in Python. Availability and implementation: PyRanges is available as open source under the MIT license at https://github.com/biocore-NTNU/pyranges and the documentation is available at https://biocore-NTNU.github.io/pyranges/. Supplementary information: Supplementary data are available at Bioinformatics online.
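The "intersection" set operation mentioned in the abstract can be sketched in plain Python as follows. This is a deliberately naive quadratic illustration, not the PyRanges API or its algorithm; the `Interval` alias, the `intersect` function and the half-open coordinate convention are assumptions made for the example.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end); an assumption.
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping parts of every pair of intervals from a and b."""
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue  # intervals on different chromosomes never overlap
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # non-empty overlap only
                out.append((chrom_a, start, end))
    return sorted(out)


genes = [("chr1", 1, 10), ("chr1", 20, 30), ("chr2", 5, 8)]
peaks = [("chr1", 5, 25), ("chr2", 8, 12)]
shared = intersect(genes, peaks)
```

Comparing every pair is fine for a handful of intervals but scales poorly; libraries built for this task typically rely on sorted or indexed interval structures instead, which is where the speed-ups reported in the abstract come into play.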


FlaGs and webFlaGs: discovering novel biology through the analysis of gene neighbourhood conservation

Bioinformatics ◽

10.1093/bioinformatics/btaa788 ◽

2020 ◽

Author(s):

Chayan Kumar Saha ◽

Rodrigo Sanches Pires ◽

Harald Brolin ◽

Maxence Delannoy ◽

Gemma Catherine Atkinson

Keyword(s):

Phylogenetic Tree ◽

Supplementary Information ◽

Evolutionary Analysis ◽

Gene Conservation ◽

Supplementary Data ◽

Web Tool ◽

Cluster Evolution ◽

Graphical Visualization ◽

Molecular Evolutionary Analysis ◽

The Web

Abstract. Summary: Analysis of conservation of gene neighbourhoods over different evolutionary levels is important for understanding operon and gene cluster evolution, and predicting functional associations. Our tool FlaGs (standing for Flanking Genes) takes a list of NCBI protein accessions as input, clusters neighbourhood-encoded proteins into homologous groups using sensitive sequence searching, and outputs a graphical visualization of the gene neighbourhood and its conservation, along with a phylogenetic tree annotated with flanking gene conservation. FlaGs has demonstrated utility for molecular evolutionary analysis, having uncovered a new toxin–antitoxin system in prokaryotes and bacteriophages. The web tool version of FlaGs (webFlaGs) can optionally include a BLASTP search against a reduced RefSeq database to generate an input accession list and analyse neighbourhood conservation within the same run. Availability and implementation: FlaGs can be downloaded from https://github.com/GCA-VH-lab/FlaGs or run online at http://www.webflags.se/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


CROP: correlation-based reduction of feature multiplicities in untargeted metabolomic data

Bioinformatics ◽

10.1093/bioinformatics/btaa012 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2941-2942 ◽

Cited By ~ 1

Author(s):

Štěpán Kouřil ◽

Julie de Sousa ◽

Jan Václavík ◽

David Friedecký ◽

Tomáš Adam

Keyword(s):

Mass Spectrometry ◽

Graphical Representation ◽

High Resolution Mass Spectrometry ◽

Mass Spectrometry Analysis ◽

Supplementary Information ◽

Supplementary Data ◽

Spectrometry Analysis ◽

Metabolomic Data ◽

Pairwise Correlations ◽

Resolution Mass

Abstract. Summary: Untargeted liquid chromatography–high-resolution mass spectrometry analysis produces a large number of features which correspond to the potential compounds in the sample that is analyzed. During the data processing, it is necessary to merge features associated with one compound to prevent multiplicities in the data and possible misidentification. The processing tools that are currently employed use complex algorithms to detect abundances, such as adducts or isotopes. However, most of them are not able to deal with unpredictable adducts and in-source fragments. We introduce a simple open-source R-script CROP based on Pearson pairwise correlations and retention time together with a graphical representation of the correlation network to remove these redundant features. Availability and implementation: The CROP R-script is available online at www.github.com/rendju/CROP under GNU GPL. Supplementary information: Supplementary data are available at Bioinformatics online.
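The general idea of grouping features by Pearson pairwise correlation and retention time can be sketched as below. This is a Python illustration written for this listing, not the CROP R-script: the function names, the thresholds `min_r` and `rt_window`, and the use of connected components to form groups are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def group_features(features, min_r=0.75, rt_window=0.02):
    """Group features that co-elute and correlate across samples.

    features maps a feature name to (retention_time, intensities).
    Two features are linked when their retention times differ by at most
    rt_window and their Pearson correlation is at least min_r; groups are
    the connected components of the resulting network.
    """
    names = sorted(features)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rt_a, x_a = features[a]
            rt_b, x_b = features[b]
            if abs(rt_a - rt_b) <= rt_window and pearson(x_a, x_b) >= min_r:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())
```

Each resulting group could then be collapsed to a single representative feature (for example the most intense one), which is one way to remove the multiplicities the abstract describes without needing a predefined list of adducts or fragments.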


ECONOMICS, STATISTICS, MATHEMATICS & COMPUTER SCIENCE: THE SPECS OF ENGINEERING ACADEMICS

10.31219/osf.io/yw8gd ◽

2020 ◽

Author(s):

JAYDIP DATTA

Keyword(s):

Data Structure ◽

Computer Science ◽

Management System ◽

Mathematical Statistics ◽

Industrial Economics ◽

System A

With reference to earlier works such as MATHEMATICAL STATISTICS: AN APPLICATION BASED STATISTICS (December 2019, DOI: 10.13140/RG.2.2.32537.57446), DATA STRUCTURE & MANAGEMENT SYSTEM: A REVIEW (December 2019, DOI: 10.13140/RG.2.2.36453.96488) and OPTIMISATION: A VIEW FROM INDUSTRIAL ECONOMICS (January 2020, DOI: 10.13140/RG.2.2.35662.61764), this work highlights the following aspects of general graduate engineering courses.


ccNetViz: a WebGL-based JavaScript library for visualization of large networks

Bioinformatics ◽

10.1093/bioinformatics/btaa559 ◽

2020 ◽

Vol 36 (16) ◽

pp. 4527-4529

Author(s):

Ales Saska ◽

David Tichy ◽

Robert Moore ◽

Achilles Rasquinha ◽

Caner Akdas ◽

...

Keyword(s):

Systems Biology ◽

Complex Networks ◽

Open Source ◽

High Speed ◽

A Priori ◽

Supplementary Information ◽

Network Visualization ◽

Supplementary Data ◽

Web Based ◽

Flow Of Information

Abstract. Summary: Visualizing a network provides a concise and practical understanding of the information it represents. Open-source web-based libraries help accelerate the creation of biologically based networks and their use. ccNetViz is an open-source, high speed and lightweight JavaScript library for visualization of large and complex networks. It implements customization and analytical features for easy network interpretation. These features include edge and node animations, which illustrate the flow of information through a network as well as node statistics. Properties can be defined a priori or dynamically imported from models and simulations. ccNetViz is thus a network visualization library particularly suited for systems biology. Availability and implementation: The ccNetViz library, demos and documentation are freely available at http://helikarlab.github.io/ccNetViz/. Supplementary information: Supplementary data are available at Bioinformatics online.
