A RESTful API to serve BAM file with OAuth2 compatible authorization

Mapping Intimacies ◽

10.1101/151787 ◽

2017 ◽

Author(s):

Julien Delafontaine ◽

Sylvain Pradervand

Keyword(s):

Open Source ◽

Source Code ◽

Variant Calling ◽

Use Case ◽

Web Interface ◽

Sensitive Data ◽

Link Type ◽

Restful Service

Abstract

Summary: Bam-server is an open-source RESTful service to securely query slices of BAM files and manage user access to them. A typical use case is the visualization of local read alignments in a web interface for variant-calling diagnostics, without exposing sensitive data to unauthorized users over the network and without moving the original, heavy file. Bam-server follows the standard implementation of a protected resource server in a typical token-based authorization protocol, supporting HMAC- and RSA-hashed signatures from an authorization server of choice.

Availability: The source code is available at https://github.com/chuv-ssrc/bam-server-scala, and complete documentation can be found at http://bam-server-scala.readthedocs.io/en/latest/
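To illustrate the token-based authorization the abstract mentions, here is a minimal sketch of how a resource server might check an HMAC-signed token before serving data. It is not Bam-server's code: the JWT-style three-part layout, the function names, and the claim fields are assumptions made for illustration, and a real deployment would also validate the declared algorithm and the token's expiry.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used in JWT-style tokens."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Hypothetical authorization-server side: issue an HS256-style token."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(claims).encode())
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_token(token: str, secret: bytes):
    """Hypothetical resource-server side: return the claims, or None if invalid."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        supplied = b64url_decode(signature_b64)
        signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    except ValueError:
        return None  # malformed token
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, supplied):
        return None
    return json.loads(b64url_decode(payload_b64))
```

In this sketch the server would serve the requested BAM slice only when `verify_token` returns claims that grant access to the requested sample; how claims map to files is left open here.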

Pairs and Pairix: a file format and a tool for efficient storage and retrieval for Hi-C read pairs

10.1101/2021.08.24.457552 ◽

2021 ◽

Author(s):

Soohyun Lee ◽

Carl Vitzthum ◽

Burak H. Alver ◽

Peter J. Park

Keyword(s):

Quality Control ◽

Open Source ◽

Source Code ◽

Three Dimensional ◽

File Format ◽

Interaction Data ◽

Text File ◽

Storage And Retrieval ◽

Link Type ◽

Efficient Storage

Abstract

Summary: As the amount of three-dimensional chromosomal interaction data continues to increase, storing and accessing such data efficiently becomes paramount. We introduce Pairs, a block-compressed text file format for storing paired genomic coordinates from Hi-C data, and Pairix, an open-source C application to index and query Pairs files. Pairix (also available in Python and R) extends the functionality of Tabix to paired-coordinate data. We have also developed PairsQC, a collapsible HTML quality control report generator for Pairs files.

Availability: The format specification and source code are available at https://github.com/4dn-dcic/pairix, https://github.com/4dn-dcic/Rpairix and https://github.com/4dn-dcic/[email protected] or [email protected]
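As a rough illustration of what a query on paired coordinates means, the following sketch parses tab-separated pair records and selects those whose two ends fall in two given regions. It assumes a column order of read ID, chromosome 1, position 1, chromosome 2, position 2, with `#` header lines; the function names are hypothetical. It performs a plain linear scan over uncompressed text, which is the cost an index such as Pairix is meant to avoid, and it does not use the Pairix API.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Pair(NamedTuple):
    read_id: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def parse_pairs(lines: Iterable[str]) -> Iterator[Pair]:
    """Yield Pair records, skipping blank lines and '#' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumed column order: readID, chr1, pos1, chr2, pos2, (others ignored).
        yield Pair(fields[0], fields[1], int(fields[2]), fields[3], int(fields[4]))


Region = Tuple[str, int, int]  # (chromosome, start, end), both ends inclusive


def query_2d(pairs: Iterable[Pair], region1: Region, region2: Region) -> List[Pair]:
    """Return pairs whose first end lies in region1 and second end in region2."""
    chrom1, start1, end1 = region1
    chrom2, start2, end2 = region2
    return [
        p
        for p in pairs
        if p.chrom1 == chrom1
        and start1 <= p.pos1 <= end1
        and p.chrom2 == chrom2
        and start2 <= p.pos2 <= end2
    ]
```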

MOSGA: Modular Open-Source Genome Annotator

Bioinformatics ◽

10.1093/bioinformatics/btaa1003 ◽

2020 ◽

Author(s):

Roman Martin ◽

Thomas Hackl ◽

Georges Hattab ◽

Matthias G Fischer ◽

Dominik Heider

Keyword(s):

Open Source ◽

Source Code ◽

Supplementary Information ◽

Web Interface ◽

Fully Integrated ◽

Sequencing Technologies ◽

A Genome ◽

Wide Range ◽

User Friendly ◽

Eukaryotic Genomes

Abstract

Motivation: The generation of high-quality assemblies, even for large eukaryotic genomes, has become a routine task for many biologists thanks to recent advances in sequencing technologies. However, the annotation of these assemblies, a crucial step toward unlocking the biology of the organism of interest, has remained a complex challenge that often requires advanced bioinformatics expertise.

Results: Here, we present MOSGA (Modular Open-Source Genome Annotator), a genome annotation framework for eukaryotic genomes with a user-friendly web interface that generates and integrates annotations from various tools. The aggregated results can be analyzed with a fully integrated genome browser and are provided in a format ready for submission to NCBI. MOSGA is built on a portable, customizable and easily extendible Snakemake backend and can thus be tailored to a wide range of users and projects.

Availability and implementation: We provide MOSGA as a web service at https://mosga.mathematik.uni-marburg.de and as a docker container at registry.gitlab.com/mosga/mosga:latest. Source code can be found at https://gitlab.com/mosga/mosga

Contact: [email protected]

Supplementary information: Supplementary data are available at Bioinformatics online.

Matchathon: A guide to student-faculty connections in PhD programs

10.1101/2020.11.06.371526 ◽

2020 ◽

Author(s):

Haley Amemiya ◽

Zena Lapp ◽

Cathy Smith ◽

Margaret Durdan ◽

Michelle DiMondo ◽

...

Keyword(s):

Open Source ◽

Source Code ◽

Faculty Members ◽

Retention Rates ◽

Link Type ◽

Shiny App ◽

R Shiny ◽

Web App ◽

Phd Programs ◽

The Web

Abstract

Relevant and impactful mentors are essential to a graduate student’s career. Finding mentors can be challenging in umbrella programs with hundreds of faculty members. To foster connections between potential mentors and students with similar research interests, we created a Matchathon event, which has successfully enabled students to find mentors. We developed an easy-to-use R Shiny app (https://github.com/UM-OGPS/matchathon/) to facilitate matching and organizing the event that can be used at any institution. It is our hope that this resource will improve the environment and retention rates for students in the academy.

The open source app is publicly available on the web (app: https://UM-OGPS.shinyapps.io/matchathon/; source code: https://github.com/UM-OGPS/matchathon/).

Biobtree: A tool to search and map bioinformatics identifiers and special keywords

F1000Research ◽

10.12688/f1000research.17927.3 ◽

2020 ◽

Vol 8 ◽

pp. 145

Author(s):

Tamer Gur

Keyword(s):

Web Services ◽

Open Source ◽

Resource Usage ◽

Web Interface ◽

Bioinformatics Tool ◽

Link Type ◽

As Species ◽

Storage Resource ◽

Executable File ◽

And Storage

Biobtree is a bioinformatics tool to search and map bioinformatics datasets via identifiers or special keywords such as species name. It processes large bioinformatics datasets using a specialized MapReduce-based solution with optimum computational and storage resource usage. It provides uniform and B+ tree-based database output, a web interface, web services and allows performing chain mapping queries between datasets. It can be used via a single executable file or alternatively it can be used via the R or Python-based wrapper packages which are additionally provided for easier integration into existing pipelines. Biobtree is open source and available at GitHub.

SurfStamp: 3D Printer Compatible Molecular Surface Representation

10.1101/2020.10.29.360701 ◽

2020 ◽

Author(s):

Toshiyuki Oda

Keyword(s):

Open Source ◽

Source Code ◽

3D Structure ◽

Three Dimensional ◽

Molecular Surface ◽

3D Printer ◽

3D Object ◽

Link Type ◽

Surface Models ◽

Version 2.0

Abstract

SurfStamp is an application that is used to generate textures for surface models of proteins. The textures contain information about surface residues, and the information is drawn directly on the 3D object of the models. This approach is more intuitive than the labeling functions that most three-dimensional (3D) structure viewers use to show residue information. Therefore, the use of this application enables researchers, readers, or audiences to easily determine which residues are contributing to the surface they are focusing on.

Availability: The application is provided under the open-source Apache License Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0). The application and source code are available from https://github.com/yamule/SurfStamp-public/releases.

Edlib: a C/C++ library for fast, exact sequence alignment using edit distance

10.1101/070649 ◽

2016 ◽

Cited By ~ 2

Author(s):

Martin Šošić ◽

Mile Šikić

Keyword(s):

Exact Sequence ◽

Open Source ◽

Sequence Alignment ◽

Test Data ◽

Edit Distance ◽

Source Code ◽

Memory Usage ◽

Pairwise Sequence Alignment ◽

Link Type ◽

Bioinformatics Tools

Abstract

We present Edlib, an open-source C/C++ library for exact pairwise sequence alignment using edit distance. We compare Edlib to other libraries and show that it is the fastest while not lacking in functionality, and can also easily handle very large sequences. Being easy to use, flexible, fast and low on memory usage, we expect it to be a cornerstone for many future bioinformatics tools.

Source code, installation instructions and test data are freely available for download at https://github.com/Martinsos/edlib, implemented in C/C++ and supported on Linux, MS Windows, and Mac OS.

Contact: [email protected]
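For readers unfamiliar with the measure, edit distance (Levenshtein distance) is the minimum number of single-character insertions, deletions and substitutions needed to turn one sequence into another. The sketch below is the textbook dynamic-programming formulation, kept to two rows of the table to limit memory. It only illustrates the quantity being computed and is not Edlib's implementation or API; the function name is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b using a two-row DP table."""
    # Keep the shorter sequence along the row so memory is O(min(len(a), len(b))).
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]
```

This version runs in time proportional to the product of the two lengths, which is why specialized libraries matter for very long sequences.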

UKBCC: a cohort curation package for UK Biobank

10.1101/2020.07.12.199810 ◽

2020 ◽

Author(s):

Isabell Kiral ◽

Nathalie Willems ◽

Benjamin Goudey

Keyword(s):

Source Code ◽

Heterogeneous Data ◽

Use Case ◽

Uk Biobank ◽

Link Type ◽

Search Terms ◽

Heterogeneous Data Sources ◽

Wide Range ◽

Critical Resource ◽

The Uk

Abstract

Summary: The UK Biobank (UKB) has quickly become a critical resource for researchers conducting a wide range of biomedical studies (Bycroft et al., 2018). The database is constructed from heterogeneous data sources, employs several different encoding schemes, and is disparately distributed throughout UKB servers. Consequently, querying these data remains complicated, making it difficult to quickly identify participants who meet a given set of criteria. We have developed UK Biobank Cohort Curator (UKBCC), a Python tool that allows researchers to rapidly construct cohorts based on a set of search terms. Here, we describe the UKBCC implementation, critical sub-modules and functions, and outline its usage through an example use case for replicable cohort creation.

Availability: UKBCC is available through PyPI (https://pypi.org/project/ukbcc) and as open source code on GitHub (https://github.com/tool-bin/ukbcc).

pyABC: distributed, likelihood-free inference

10.1101/162552 ◽

2017 ◽

Cited By ~ 1

Author(s):

Emmanuel Klinger ◽

Dennis Rickert ◽

Jan Hasenauer

Keyword(s):

Sequential Monte Carlo ◽

Source Code ◽

Distance Functions ◽

Web Interface ◽

Practical Application ◽

Acceptance Threshold ◽

Link Type ◽

Data Querying ◽

Approximate Bayesian ◽

Python Package

Summary: Likelihood-free methods are often required for inference in systems biology. While Approximate Bayesian Computation (ABC) provides a theoretical solution, its practical application has often been challenging due to its high computational demands. To scale likelihood-free inference to computationally demanding stochastic models we developed pyABC: a distributed and scalable ABC-Sequential Monte Carlo (ABC-SMC) framework. It implements computation-minimizing and scalable, runtime-minimizing parallelization strategies for multi-core and distributed environments scaling to thousands of cores. The framework is accessible to non-expert users and also enables advanced users to experiment with and to custom implement many options of ABC-SMC schemes, such as acceptance threshold schedules, transition kernels and distance functions, without alteration of pyABC’s source code. pyABC includes a web interface to visualize ongoing and finished ABC-SMC runs and exposes an API for data querying and post-processing.

Availability and Implementation: pyABC is written in Python 3 and is released under the GPLv3 license. The source code is hosted on https://github.com/neuralyzer/pyabc and the documentation on http://pyabc.readthedocs.io. It can be installed from the Python Package Index (PyPI).
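The core idea of ABC can be sketched with plain rejection sampling: draw a parameter from the prior, simulate data, and keep the parameter only if the simulated data lie within an acceptance threshold of the observation under some distance function. The example below uses only the Python standard library and a hypothetical toy model (the mean of a unit-variance Gaussian); it is a single-generation simplification, not the ABC-SMC scheme and not the pyABC API. All function names are assumptions.

```python
import random
import statistics


def abc_rejection(observed, sample_prior, simulate, distance, epsilon, n_accept, rng):
    """Keep prior draws whose simulated data fall within epsilon of the observation."""
    accepted = []
    while len(accepted) < n_accept:
        theta = sample_prior(rng)
        simulated = simulate(theta, rng)
        if distance(simulated, observed) <= epsilon:
            accepted.append(theta)
    return accepted


# Toy problem (hypothetical): infer the mean of a unit-variance Gaussian.
def sample_prior(rng):
    """Uniform prior on the unknown mean."""
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng):
    """Summary statistic: the mean of 20 simulated observations."""
    return statistics.mean(rng.gauss(theta, 1.0) for _ in range(20))


def distance(simulated, observed):
    """Absolute difference between simulated and observed summaries."""
    return abs(simulated - observed)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    posterior = abc_rejection(
        2.0, sample_prior, simulate, distance, epsilon=0.3, n_accept=200, rng=rng
    )
    print("approximate posterior mean:", statistics.mean(posterior))
```

ABC-SMC refines this by running several generations with a shrinking threshold and proposing new parameters from the previously accepted ones, which is where the threshold schedules and transition kernels named in the summary come in.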

Strelka2: Fast and accurate variant calling for clinical sequencing applications

10.1101/192872 ◽

2017 ◽

Cited By ~ 13

Author(s):

Sangtae Kim ◽

Konrad Scheffler ◽

Aaron L Halpern ◽

Mitchell A Bekritsky ◽

Eunho Noh ◽

...

Keyword(s):

Open Source ◽

Mixture Model ◽

Variant Calling ◽

Normal Sample ◽

Clinical Sequencing ◽

Model Based ◽

Link Type ◽

Sample Contamination ◽

Modeling Strategy

We describe Strelka2 (https://github.com/Illumina/strelka), an open-source small variant calling method for clinical germline and somatic sequencing applications. Strelka2 introduces a novel mixture-model based estimation of indel error parameters from each sample, an efficient tiered haplotype modeling strategy and a normal sample contamination model to improve liquid tumor analysis. For both germline and somatic calling, Strelka2 substantially outperforms current leading tools on both variant calling accuracy and compute cost.

quasitools: A Collection of Tools for Viral Quasispecies Analysis

10.1101/733238 ◽

2019 ◽

Cited By ~ 2

Author(s):

Eric Marinier ◽

Eric Enns ◽

Camy Tran ◽

Matthew Fogel ◽

Cole Peters ◽

...

Keyword(s):

Amino Acid ◽

Genetic Distance ◽

Open Source ◽

Source Code ◽

Viral Quasispecies ◽

Consensus Sequences ◽

Link Type ◽

Version 2.0 ◽

Amino Acid Variants

Abstract

Summary: quasitools is a collection of newly developed, open-source tools for analyzing viral quasispecies data. The application suite includes tools with the ability to create consensus sequences, call nucleotide, codon, and amino acid variants, calculate the complexity of a quasispecies, and measure the genetic distance between two similar quasispecies. These tools may be run independently or in user-created workflows.

Availability: The quasitools suite is a freely available application licensed under the Apache License, Version 2.0. The source code, documentation, and file specifications are available at: https://phac-nml.github.io/quasitools/