MOSGA: Modular Open-Source Genome Annotator

Bioinformatics, 2020. DOI: 10.1093/bioinformatics/btaa1003

Author(s): Roman Martin, Thomas Hackl, Georges Hattab, Matthias G Fischer, Dominik Heider

Keyword(s): Open Source, Source Code, Supplementary Information, Web Interface, Fully Integrated, Sequencing Technologies, A Genome, Wide Range, User Friendly, Eukaryotic Genomes

Abstract Motivation The generation of high-quality assemblies, even for large eukaryotic genomes, has become a routine task for many biologists thanks to recent advances in sequencing technologies. However, the annotation of these assemblies, a crucial step toward unlocking the biology of the organism of interest, has remained a complex challenge that often requires advanced bioinformatics expertise. Results Here, we present MOSGA (Modular Open-Source Genome Annotator), a genome annotation framework for eukaryotic genomes with a user-friendly web interface that generates and integrates annotations from various tools. The aggregated results can be analyzed with a fully integrated genome browser and are provided in a format ready for submission to NCBI. MOSGA is built on a portable, customizable and easily extensible Snakemake backend and thus can be tailored to a wide range of users and projects. Availability and implementation We provide MOSGA as a web service at https://mosga.mathematik.uni-marburg.de and as a Docker container at registry.gitlab.com/mosga/mosga:latest. Source code can be found at https://gitlab.com/mosga/mosga. Supplementary information Supplementary data are available at Bioinformatics online.

G-OnRamp: a Galaxy-based platform for collaborative annotation of eukaryotic genomes

Bioinformatics, 2019, Vol 35 (21), pp. 4422-4423. DOI: 10.1093/bioinformatics/btz309. Cited by ~4

Author(s): Yating Liu, Luke Sargent, Wilson Leung, Sarah C R Elgin, Jeremy Goecks

Keyword(s): Genome Annotation, Source Code, Supplementary Information, RNA-Seq, Sequence Alignments, Web Based, Collaborative Annotation, Genome Browsers, User Friendly, Eukaryotic Genomes

Abstract Summary G-OnRamp provides a user-friendly, web-based platform for collaborative, end-to-end annotation of eukaryotic genomes using UCSC Assembly Hubs and JBrowse/Apollo genome browsers with evidence tracks derived from sequence alignments, ab initio gene predictors, RNA-Seq data and repeat finders. G-OnRamp can be used to visualize large genomics datasets and to perform collaborative genome annotation projects in both research and educational settings. Availability and implementation The virtual machine images and tutorials are available on the G-OnRamp web site (http://g-onramp.org/deployments). The source code is available under an Academic Free License version 3.0 through the goeckslab GitHub repository (https://github.com/goeckslab). Supplementary information Supplementary data are available at Bioinformatics online.

The intrinsic combinatorial organization and information theoretic content of a sequence are correlated to the DNA encoded nucleosome organization of eukaryotic genomes

Bioinformatics, 2015, Vol 32 (6), pp. 835-842. DOI: 10.1093/bioinformatics/btv679. Cited by ~9

Author(s): Filippo Utro, Valeria Di Benedetto, Davide F.V. Corona, Raffaele Giancarlo

Keyword(s): Closed Form, DNA Sequence, Chemical Properties, Supplementary Information, Information Theoretic, Nucleosome Organization, A Genome, Intrinsic Complexity, Mathematical Formulas, Eukaryotic Genomes

Abstract Motivation: Thanks to research spanning nearly 30 years, two major models have emerged that account for nucleosome organization in chromatin: statistical and sequence specific. The former is based on elegant, easy-to-compute, closed-form mathematical formulas that make no assumptions about the physical and chemical properties of the underlying DNA sequence. Moreover, they need no training on the data for their computation. The latter is based on some sequence regularities but, unlike the statistical model, it lacks closed-form formulas of the same type, which in this case should be based on the DNA sequence only. Results: We help close this important methodological gap between the two models by providing three very simple formulas for the sequence-specific one. They are all based on well-known formulas in Computer Science and Bioinformatics, and they give different quantifications of how complex a sequence is. In view of how remarkably well they perform, it is very surprising that measures of sequence complexity have not even been considered as candidates to close this gap. We provide experimental evidence that the intrinsic level of combinatorial organization and information-theoretic content of subsequences within a genome are strongly correlated to the level of DNA-encoded nucleosome organization discovered by Kaplan et al. Our results establish an important connection between the intrinsic complexity of subsequences in a genome and the intrinsic, i.e. DNA-encoded, nucleosome organization of eukaryotic genomes. This is a first step towards a mathematical characterization of this latter ‘encoding’. Supplementary information: Supplementary data are available at Bioinformatics online.
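
The abstract does not spell out the three formulas, so the following is only a hedged illustration of what a closed-form, sequence-only complexity measure can look like: the Shannon entropy of the k-mer composition, computed in sliding windows along a sequence. The choice of entropy, the window and step sizes, and the function names `kmer_entropy` and `sliding_entropy` are our assumptions and are not necessarily among the measures the authors used.

```python
import math
from collections import Counter
from typing import List


def kmer_entropy(seq: str, k: int = 2) -> float:
    """Shannon entropy (in bits) of the k-mer frequency distribution of seq.

    Illustrative complexity measure only (assumption); it needs no training
    data and depends on the DNA sequence alone.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    # H = -sum p * log2(p) over the observed k-mers
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def sliding_entropy(seq: str, window: int, step: int, k: int = 2) -> List[float]:
    """Entropy of each window of length `window`, advancing by `step` bases."""
    return [kmer_entropy(seq[i:i + window], k)
            for i in range(0, len(seq) - window + 1, step)]
```

A homopolymer run scores 0 bits, while a window using all four bases equally (k = 1) scores the maximum of 2 bits; a profile of such scores along a genome is the kind of signal one could then correlate with measured nucleosome occupancy.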

MetaADEDB 2.0: a comprehensive database on adverse drug events

Bioinformatics, 2020. DOI: 10.1093/bioinformatics/btaa973

Author(s): Zhuohang Yu, Zengrui Wu, Weihua Li, Guixia Liu, Yun Tang

Keyword(s): Safety Assessment, Adverse Drug Events, Adverse Event Reporting System, Adverse Event Reporting, Supplementary Information, Online Database, Web Interface, Drug Discovery And Development, Comprehensive Information, User Friendly

Abstract Summary MetaADEDB is an online database we developed to integrate comprehensive information on adverse drug events (ADEs). The first version of MetaADEDB was released in 2013 and has been widely used by researchers. However, it had not been updated for more than seven years. Here, we report its second version, built by collecting more and newer data from the U.S. FDA Adverse Event Reporting System (FAERS) and the Canada Vigilance Adverse Reaction Online Database, in addition to the original three sources. The new version consists of 744 709 drug–ADE associations between 8498 drugs and 13 193 ADEs, an increase of over 40% in drug–ADE associations compared to the previous version. We also developed a new, user-friendly web interface for data search and analysis. We hope that MetaADEDB 2.0 will provide a useful tool for drug safety assessment and related studies in drug discovery and development. Availability and implementation The database is freely available at http://lmmd.ecust.edu.cn/metaadedb/. Supplementary information Supplementary data are available at Bioinformatics online.

SVIM-asm: Structural variant detection from haploid and diploid genome assemblies

2020. DOI: 10.1101/2020.10.27.356907

Author(s): David Heller, Martin Vingron

Keyword(s): Genetic Information, Source Code, Supplementary Information, Supplementary Data, Diploid Genome, Insertions And Deletions, Structural Variant, Sequencing Technologies, Variant Detection, Genome Assemblies

Abstract Motivation With the availability of new sequencing technologies, the generation of haplotype-resolved genome assemblies up to chromosome scale has become feasible. These assemblies capture the complete genetic information of both parental haplotypes, increase structural variant (SV) calling sensitivity and enable direct genotyping and phasing of SVs. Yet, existing SV callers are designed for haploid genome assemblies only, do not support genotyping, or detect only a limited set of SV classes. Results We introduce our method SVIM-asm for the detection and genotyping of six common classes of SVs from haploid and diploid genome assemblies. Compared against DipCall, the only other existing SV caller for diploid assemblies, SVIM-asm detects more SV classes and reaches higher F1 scores for the detection of insertions and deletions on two recently published assemblies of the HG002 individual. Availability and Implementation SVIM-asm has been implemented in Python and can be easily installed via bioconda. Its source code is available at github.com/eldariont/… Supplementary information Supplementary data are available online.
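
The abstract notes that diploid assemblies allow SVs to be genotyped directly, but it does not describe how SVIM-asm does this. As a minimal sketch under stated assumptions: SVs have already been called separately on each haplotype, and two calls count as the same event if they share a type, lie within 100 bp of each other and differ in size by at most 10%. The `SV` record, both thresholds and the function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    svtype: str   # e.g. "DEL" or "INS" (hypothetical record layout)
    pos: int      # reference start coordinate
    length: int   # absolute SV length in bp


def matches(a: SV, b: SV, max_dist: int = 100, min_size_ratio: float = 0.9) -> bool:
    """Assumed matching rule: same type, nearby position, similar size."""
    if a.svtype != b.svtype or abs(a.pos - b.pos) > max_dist:
        return False
    small, large = sorted((a.length, b.length))
    return large > 0 and small / large >= min_size_ratio


def genotype(hap1: list[SV], hap2: list[SV]) -> list[tuple[SV, str]]:
    """Assign phased genotypes: present on both haplotypes -> 1|1, on one -> 1|0 or 0|1."""
    calls: list[tuple[SV, str]] = []
    used: set[int] = set()
    for sv in hap1:
        # First still-unused haplotype-2 call that matches this haplotype-1 call.
        partner = next((i for i, other in enumerate(hap2)
                        if i not in used and matches(sv, other)), None)
        if partner is None:
            calls.append((sv, "1|0"))
        else:
            used.add(partner)
            calls.append((sv, "1|1"))
    # Remaining haplotype-2 calls exist only on the second haplotype.
    for i, sv in enumerate(hap2):
        if i not in used:
            calls.append((sv, "0|1"))
    return calls
```

Because each haplotype comes from its own assembly, the genotype is phased by construction. A real caller would also need to handle overlapping events and alignment ambiguity, which this greedy matcher ignores.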

SHEMAT-Suite: a parallel open source simulator for flow, heat and mass transport in porous media

2021. DOI: 10.5194/egusphere-egu21-15084

Author(s): Johannes Keller, Johanna Fink, Norbert Klitzsch

Keyword(s): Porous Media, Mass Transport, Open Source, Source Code, Modular Structure, Numerical Code, Heat And Mass Transport, Transport In Porous Media, Wide Range, Flow Heat

We present SHEMAT-Suite, a numerical code for simulating flow, heat and mass transport in porous media that was recently published as open-source code. The functionality of SHEMAT-Suite comprises pure forward computation, deterministic Bayesian inversion, and stochastic Monte Carlo simulation and data assimilation. Additionally, SHEMAT-Suite features multi-level OpenMP parallelization. Along with the source code of the software, extensive documentation and a suite of test models are provided.

SHEMAT-Suite has a modular structure that makes it easy for users to adapt the code to their needs. Most importantly, there is an interface for defining the functional relationship between dynamic variables and subsurface parameters. Additionally, user-defined input and output can be implemented without interfering with the core of the code. Finally, at a deeper level, linear solvers and preconditioners can be added to the code.

We present studies that have made use of the code's HPC capabilities. SHEMAT-Suite has been applied to large-scale groundwater models for a wide range of purposes, including studying the formation of convection cells, assessing geothermal potential below an office building, and modeling submarine groundwater discharge since the last ice age. The modular structure of SHEMAT-Suite has also led to diverse applications, such as glacier modeling, simulation of borehole heat exchangers, and Optimal Experimental Design applied to the placement of geothermal boreholes.

Further, we present ongoing developments for improving the performance of SHEMAT-Suite, both by refactoring the source code and by interfacing SHEMAT-Suite with up-to-date HPC software. Examples include interfacing SHEMAT-Suite with the Portable Data Interface (PDI) for improved data management, with PETSc for MPI-parallel solvers, and with PDAF for parallel EnKF algorithms.

The goal for the open-source SHEMAT-Suite is to provide a rigorously tested core code for flow, heat and transport simulation and for Bayesian and stochastic inversion, while at the same time enabling a wide range of scientific research through straightforward user interaction.

DrawGlycan-SNFG and gpAnnotate: rendering glycans and annotating glycopeptide mass spectra

Bioinformatics, 2019. DOI: 10.1093/bioinformatics/btz819. Cited by ~4

Author(s): Kai Cheng, Gabrielle Pawlowski, Xinheng Yu, Yusen Zhou, Sriram Neelamegham

Keyword(s): Mass Spectrometry, Open Source, Mass Spectra, Supplementary Information, Supplementary Data, International Union, Open Source Program, Source Program, Wide Range, Peptide Modifications

Abstract Summary This manuscript describes an open-source program, DrawGlycan-SNFG (version 2), that accepts IUPAC (International Union of Pure and Applied Chemistry)-condensed inputs to render Symbol Nomenclature For Glycans (SNFG) drawings. A wide range of local and global options enable display of various glycan/peptide modifications, including bond breakages, adducts, repeat structures and ambiguous identifications. These facilities make DrawGlycan-SNFG ideal for integration into various glycoinformatics software, including glycomics and glycoproteomics mass spectrometry (MS) applications. As a demonstration of such usage, we incorporated DrawGlycan-SNFG into gpAnnotate, a standalone application to score and annotate individual MS/MS glycopeptide spectra in different fragmentation modes. Availability and implementation DrawGlycan-SNFG and gpAnnotate are platform independent. Although the programs were originally coded in MATLAB, compiled packages are also provided to enable use of DrawGlycan-SNFG from Python and Java. All programs are available from https://virtualglycome.org/drawglycan and https://virtualglycome.org/gpAnnotate. Supplementary information Supplementary data are available at Bioinformatics online.

Temporal network alignment via GoT-WAVE

Bioinformatics, 2019, Vol 35 (18), pp. 3527-3529. DOI: 10.1093/bioinformatics/btz119. Cited by ~3

Author(s): David Aparício, Pedro Ribeiro, Tijana Milenković, Fernando Silva

Keyword(s): User Interface, State Of The Art, Source Code, Network Alignment, Supplementary Information, Temporal Network, Temporal Networks, Supplementary Data, Node Similarity, User Friendly

Abstract Motivation Network alignment (NA) finds conserved regions between two networks. NA methods optimize node conservation (NC) and edge conservation. Dynamic graphlet degree vectors are a state-of-the-art dynamic NC measure, used within the fastest and most accurate NA method for temporal networks: DynaWAVE. Here, we use graphlet-orbit transitions (GoTs), a different graphlet-based measure of temporal node similarity, as a new dynamic NC measure within DynaWAVE, resulting in GoT-WAVE. Results On synthetic networks, GoT-WAVE improves DynaWAVE’s accuracy by 30% and speed by 64%. On real networks, when optimizing only dynamic NC, the methods are complementary. Furthermore, only GoT-WAVE supports directed edges. Hence, GoT-WAVE is a promising new temporal NA algorithm, which efficiently optimizes dynamic NC. We provide a user-friendly interface and source code for GoT-WAVE. Availability and implementation http://www.dcc.fc.up.pt/got-wave/ Supplementary information Supplementary data are available at Bioinformatics online.
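
To make "edge conservation" concrete, the sketch below computes edge correctness, a simple static score: the fraction of edges in the first network whose endpoints, once mapped by the alignment, are also adjacent in the second network. This is only an illustration and not the dynamic, time-aware conservation that DynaWAVE and GoT-WAVE optimize. The function name and the input format (edge lists plus a node-mapping dict) are our assumptions. The `directed` flag reflects the abstract's point that only GoT-WAVE supports directed edges.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_correctness(edges1: Iterable[Edge], edges2: Iterable[Edge],
                     mapping: Dict[Hashable, Hashable],
                     directed: bool = False) -> float:
    """Fraction of G1 edges whose mapped endpoints are also adjacent in G2."""
    def key(u, v):
        # Directed edges keep their orientation; undirected edges are order-free.
        return (u, v) if directed else frozenset((u, v))

    target = {key(u, v) for u, v in edges2}
    edges1 = list(edges1)
    if not edges1:
        return 0.0
    conserved = sum(
        1
        for u, v in edges1
        if u in mapping and v in mapping and key(mapping[u], mapping[v]) in target
    )
    return conserved / len(edges1)
```

For a triangle a-b-c aligned onto the path 1-2-3 with a→1, b→2 and c→3, two of the three edges are conserved. If the second network's edge is stored as 2→1 and edges are treated as directed, the a→b edge no longer counts.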

DamageProfiler: Fast damage pattern calculation for ancient DNA

Bioinformatics, 2021. DOI: 10.1093/bioinformatics/btab190

Author(s): Judith Neukamm, Alexander Peltzer, Kay Nieselt

Keyword(s): Ancient DNA, Source Code, Supplementary Information, Command Line, Central Importance, Command Line Interface, Analysis Pipeline, File Formats, Programming Knowledge, User Friendly

Abstract Motivation In ancient DNA research, the authentication of ancient samples based on specific features remains a crucial step in data analysis. Because of this central importance, researchers lacking deeper programming knowledge should be able to run a basic damage authentication analysis. Such software should be user-friendly and easy to integrate into an analysis pipeline. Results DamageProfiler is a Java-based, stand-alone software tool to determine damage patterns in ancient DNA. The results are provided in various file formats and plots for further processing. DamageProfiler has an intuitive graphical as well as command-line interface that allows the tool to be easily embedded into an analysis pipeline. Availability All of the source code is freely available on GitHub (https://github.com/Integrative-Transcriptomics/DamageProfiler). Supplementary information Supplementary data are available at Bioinformatics online.
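
A characteristic ancient DNA damage pattern is an elevated rate of C→T substitutions near the 5′ ends of reads. The sketch below tabulates that rate by position. It assumes reads that are already aligned to the reference without gaps and are given 5′→3′ as (read, reference) string pairs. It ignores strand, soft clipping and base quality, and it is a hypothetical illustration rather than DamageProfiler's implementation; `c_to_t_frequencies` and `max_pos` are our own names.

```python
from typing import Iterable, List, Tuple


def c_to_t_frequencies(pairs: Iterable[Tuple[str, str]], max_pos: int = 10) -> List[float]:
    """For each of the first `max_pos` positions from the 5' end, return the
    fraction of reference C bases that appear as T in the read.

    Assumes gap-free, equal-length (read, reference) pairs, oriented 5'->3'.
    """
    c_total = [0] * max_pos
    c_to_t = [0] * max_pos
    for read, ref in pairs:
        for i, (r, g) in enumerate(zip(read.upper(), ref.upper())):
            if i >= max_pos:
                break
            if g == "C":
                c_total[i] += 1
                if r == "T":
                    c_to_t[i] += 1
    # Positions with no reference C get 0.0 instead of dividing by zero.
    return [t / c if c else 0.0 for t, c in zip(c_to_t, c_total)]
```

In authentic ancient samples, one would expect the first few values to exceed the baseline seen further into the read. A full tool would also report the mirrored G→A signal at 3′ ends.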

CellTracker (not only) for dummies

Bioinformatics, 2015, Vol 32 (6), pp. 955-957. DOI: 10.1093/bioinformatics/btv686. Cited by ~46

Author(s): Filippo Piccinini, Alexa Kiss, Peter Horvath

Keyword(s): Graphical User Interface, Open Source Software, Phase Contrast, Cell Tracking, Source Code, Software Tool, Time Lapse, Supplementary Information, Differential Interference Contrast, User Friendly

Abstract Motivation: Time-lapse experiments play a key role in studying the dynamic behavior of cells. Single-cell tracking is one of the fundamental tools for such analyses. The vast majority of recently introduced cell tracking methods are limited to fluorescently labeled cells. An equally important limitation is that most software cannot be used effectively by biologists without reasonable expertise in image processing. Here we present CellTracker, a user-friendly open-source software tool for tracking cells imaged with various imaging modalities, including fluorescent, phase contrast and differential interference contrast (DIC) techniques. Availability and implementation: CellTracker is written in MATLAB (The MathWorks, Inc., USA). It works with Windows, Macintosh and UNIX-based systems. Source code and graphical user interface (GUI) are freely available at: http://celltracker.website/. Supplementary information: Supplementary data are available at Bioinformatics online.
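
The abstract does not say which tracking algorithm CellTracker uses. For readers new to single-cell tracking, the sketch below shows one generic building block: greedy nearest-neighbour linking of cell centroids between two consecutive frames, with a maximum gating distance. It assumes that cells have already been segmented into (x, y) centroids. The function name and the `max_dist` parameter are hypothetical, and this is not presented as CellTracker's method.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def link_frames(prev: List[Point], curr: List[Point], max_dist: float) -> Dict[int, int]:
    """Greedily link centroids in frame t (prev) to frame t+1 (curr).

    Returns a dict mapping prev index -> curr index. Unlinked prev cells are
    treated as ended tracks; unlinked curr cells as newly appearing ones.
    """
    # Collect every candidate pair within the gating distance.
    candidates = []
    for i, p in enumerate(prev):
        for j, c in enumerate(curr):
            d = math.dist(p, c)
            if d <= max_dist:
                candidates.append((d, i, j))
    candidates.sort()
    links: Dict[int, int] = {}
    used_curr = set()
    for _, i, j in candidates:
        # Accept the shortest remaining link whose endpoints are both still free.
        if i not in links and j not in used_curr:
            links[i] = j
            used_curr.add(j)
    return links
```

Greedy linking is simple and fast but can fail when cells are crowded or move farther than the gap to their neighbours. Globally optimal assignment or motion models are the usual next step.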

LRez: C++ API and toolkit for analyzing and managing Linked-Reads data

Bioinformatics Advances, 2021. DOI: 10.1093/bioadv/vbab022

Author(s): Pierre Morisse, Claire Lemaitre, Fabrice Legeai

Keyword(s): Genome Assembly, Low Cost, Variant Calling, Supplementary Information, Supplementary Data, High Quality, DNA Molecule, Sequencing Technologies, Wide Range, Genomic Regions

Abstract Motivation Linked-Reads technologies combine the high quality and low cost of short-read sequencing with long-range information, through the use of barcodes tagging reads that originate from a common long DNA molecule. This technology has been employed in a broad range of applications, including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exists. Results We introduce LRez, a C++ API and toolkit that allows easy management of Linked-Reads data. LRez includes various functionalities, such as computing the number of common barcodes between genomic regions, extracting barcodes from BAM files, and indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies and can thus be used in any tool or pipeline requiring barcode processing or indexing, in order to improve their performance. Availability and implementation LRez is implemented in C++, supported on Unix-based platforms, and available under the AGPL-3.0 License at https://github.com/morispi/LRez and as a bioconda module. Supplementary information Supplementary data are available at Bioinformatics Advances.
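
Counting the barcodes shared by two genomic regions is the basic signal that links distant loci in Linked-Reads data. The sketch below shows the idea on already-extracted alignments. It assumes each alignment is reduced to a (chromosome, position, barcode) tuple, with the barcode taken for example from the BX tag of a BAM record, and that regions are half-open intervals. It is written in Python for brevity and does not reproduce LRez's C++ API. `barcodes_by_region` and `common_barcodes` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def barcodes_by_region(alignments: Iterable[Tuple[str, int, str]],
                       regions: Dict[str, Tuple[str, int, int]]) -> Dict[str, Set[str]]:
    """Collect the set of barcodes observed in each named region [start, end)."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for chrom, pos, barcode in alignments:
        for name, (r_chrom, start, end) in regions.items():
            if chrom == r_chrom and start <= pos < end:
                result[name].add(barcode)
    return dict(result)


def common_barcodes(bx_sets: Dict[str, Set[str]], a: str, b: str) -> int:
    """Number of barcodes shared by regions a and b (0 if either has none)."""
    return len(bx_sets.get(a, set()) & bx_sets.get(b, set()))
```

Scanning every region for every alignment is quadratic and only suitable for illustration. An index from barcode to alignments, of the kind LRez provides for BAM and FASTQ files, is what makes such queries fast at genome scale.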
