Pairs and Pairix: a file format and a tool for efficient storage and retrieval for Hi-C read pairs

Abstract. Summary: As the amount of three-dimensional chromosomal interaction data continues to increase, storing and accessing such data efficiently becomes paramount. We introduce Pairs, a block-compressed text file format for storing paired genomic coordinates from Hi-C data, and Pairix, an open-source C application to index and query Pairs files. Pairix (also available in Python and R) extends the functionality of Tabix to paired-coordinate data. We have also developed PairsQC, a collapsible HTML quality control report generator for Pairs files. Availability: The format specification and source code are available at https://github.com/4dn-dcic/pairix, https://github.com/4dn-dcic/Rpairix and https://github.com/4dn-dcic/[email protected] or [email protected]
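
To make the idea of a query on paired coordinates concrete, the following Python sketch selects read pairs whose two ends fall in two given genomic regions. It is a naive linear scan over uncompressed lines, not Pairix's indexed lookup. The column order (readID, chr1, pos1, chr2, pos2, strand1, strand2) is assumed from the Pairs specification, and the function names and example rows are made up for illustration.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Pair(NamedTuple):
    read_id: str
    chr1: str
    pos1: int
    chr2: str
    pos2: int


# (chromosome, start, end), 1-based and inclusive (an assumption of this sketch)
Region = Tuple[str, int, int]


def parse_pairs(lines: Iterable[str]) -> Iterator[Pair]:
    """Yield one Pair per data line, skipping blank lines and '#' header lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield Pair(fields[0], fields[1], int(fields[2]), fields[3], int(fields[4]))


def in_region(chrom: str, pos: int, region: Region) -> bool:
    name, start, end = region
    return chrom == name and start <= pos <= end


def query_2d(pairs: Iterable[Pair], region1: Region, region2: Region) -> List[Pair]:
    """Keep pairs whose first end lies in region1 and second end in region2."""
    return [
        p for p in pairs
        if in_region(p.chr1, p.pos1, region1) and in_region(p.chr2, p.pos2, region2)
    ]


# Invented example rows, for illustration only.
EXAMPLE = [
    "## pairs format v1.0\n",
    "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\n",
    "r1\tchr1\t1000\tchr1\t5000\t+\t-\n",
    "r2\tchr1\t1500\tchr2\t300\t+\t+\n",
    "r3\tchr1\t90000\tchr2\t350\t-\t+\n",
]

hits = query_2d(parse_pairs(EXAMPLE), ("chr1", 1, 2000), ("chr2", 1, 1000))
```

A scan like this touches every line of the file. An index is what allows a tool to jump directly to the compressed blocks that can contain matching pairs.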

SurfStamp: 3D Printer Compatible Molecular Surface Representation

10.1101/2020.10.29.360701 ◽

2020 ◽

Author(s):

Toshiyuki Oda

Keyword(s):

Open Source ◽

Source Code ◽

3D Structure ◽

Three Dimensional ◽

Molecular Surface ◽

3D Printer ◽

3D Object ◽

Link Type ◽

Surface Models ◽

Version 2.0

Abstract. SurfStamp is an application that generates textures for surface models of proteins. The textures contain information about surface residues, and the information is drawn directly on the 3D object of the models. This approach is more intuitive than the labeling functions that most three-dimensional (3D) structure viewers use to show residue information. The application therefore enables researchers, readers, or audiences to easily determine which residues contribute to the surface they are focusing on. Availability: The application is provided under the open-source Apache License Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0). The application and source code are available from https://github.com/yamule/SurfStamp-public/releases.

The Popgen Pipeline Platform: A Software Platform for Facilitating Population Genomic Analyses

10.1101/785774 ◽

2019 ◽

Author(s):

Andrew Webb ◽

Jared Knoblauch ◽

Nitesh Sabankar ◽

Apeksha Sukesh Kallur ◽

Jody Hey ◽

...

Keyword(s):

Open Source ◽

Development Time ◽

End Users ◽

File Format ◽

Software Platform ◽

Format Conversion ◽

Link Type ◽

Population Genomic ◽

Genomic Analyses ◽

File Format Conversion

Abstract. Here we present the Pop-Gen Pipeline Platform (PPP), a software platform with the goal of reducing the computational expertise required for conducting population genomic analyses. The PPP was designed as a collection of scripts that facilitate common population genomic workflows in a consistent and standardized Python environment. Functions were developed to encompass entire workflows, including input preparation, file format conversion, various population genomic analyses, output generation, and visualization. By facilitating entire workflows, the PPP offers several benefits to prospective end users: it reduces the need for redundant in-house software and scripts, which require development time and may be error-prone or incorrect. The platform has also been developed with reproducibility and extensibility of analyses in mind. The PPP is an open-source package that is available for download and use at https://ppp.readthedocs.io/en/latest/PPP_pages/install.html

Matchathon: A guide to student-faculty connections in PhD programs

10.1101/2020.11.06.371526 ◽

2020 ◽

Author(s):

Haley Amemiya ◽

Zena Lapp ◽

Cathy Smith ◽

Margaret Durdan ◽

Michelle DiMondo ◽

...

Keyword(s):

Open Source ◽

Source Code ◽

Faculty Members ◽

Retention Rates ◽

Link Type ◽

Shiny App ◽

R Shiny ◽

Web App ◽

Phd Programs ◽

The Web

Abstract. Relevant and impactful mentors are essential to a graduate student’s career. Finding mentors can be challenging in umbrella programs with hundreds of faculty members. To foster connections between potential mentors and students with similar research interests, we created a Matchathon event, which has successfully enabled students to find mentors. We developed an easy-to-use R Shiny app (https://github.com/UM-OGPS/matchathon/) to facilitate matching and to organize the event; the app can be used at any institution. It is our hope that this resource will improve the environment and retention rates for students in the academy. The open-source app is publicly available on the web (app: https://UM-OGPS.shinyapps.io/matchathon/; source code: https://github.com/UM-OGPS/matchathon/).

A RESTful API to serve BAM file with OAuth2 compatible authorization

10.1101/151787 ◽

2017 ◽

Author(s):

Julien Delafontaine ◽

Sylvain Pradervand

Keyword(s):

Open Source ◽

Source Code ◽

Variant Calling ◽

Use Case ◽

Web Interface ◽

Sensitive Data ◽

Link Type ◽

Restful Service

Abstract. Summary: Bam-server is an open-source RESTful service to query slices of BAM files securely and manage user access to them. A typical use case is the visualization of local read alignments in a web interface for variant-calling diagnostics, without exposing sensitive data to unauthorized users through the network and without moving the original (heavy) file. Bam-server follows the standard implementation of a protected resource server in the context of a typical token-based authorization protocol, supporting HMAC- and RSA-hashed signatures from an authorization server of choice. Availability: The source code is available at https://github.com/chuv-ssrc/bam-server-scala, and complete documentation can be found at http://bam-server-scala.readthedocs.io/en/latest/[email protected]
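
As an illustration of what checking an HMAC-signed token involves on the resource-server side, here is a minimal Python sketch using only the standard library. It assumes a JWT-style token (three base64url parts, HS256 signature), which the abstract does not specify. The function names, claims, and secret are hypothetical, and the sketch is not taken from the Bam-server code base.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def b64url_encode(raw: bytes) -> str:
    """Base64url without padding, as used in JWT-style tokens."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """What an authorization server would do: sign header.payload with HMAC-SHA256."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(signature)}"


def verify_token(token: str, secret: bytes) -> Optional[dict]:
    """What a resource server would do: return the claims, or None if invalid."""
    try:
        header, payload, signature = token.split(".")
        signing_input = f"{header}.{payload}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        # compare_digest avoids leaking information through timing differences.
        if not hmac.compare_digest(expected, b64url_decode(signature)):
            return None
        return json.loads(b64url_decode(payload))
    except ValueError:
        # Wrong number of parts, bad base64, or bad JSON.
        return None


SECRET = b"shared-secret-for-illustration-only"
token = sign_token({"sub": "alice", "scope": "read:bam"}, SECRET)
claims = verify_token(token, SECRET)
```

A real deployment would also check expiry and audience claims and would normally rely on a maintained token library. RSA-signed tokens replace the shared secret with the authorization server's public key.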

fastp: an ultra-fast all-in-one FASTQ preprocessor

10.1101/274100 ◽

2018 ◽

Cited By ~ 12

Author(s):

Shifu Chen ◽

Yanqing Zhou ◽

Yaru Chen ◽

Jia Gu

Keyword(s):

Quality Control ◽

Open Source ◽

Programming Languages ◽

Source Code ◽

Data Filtering ◽

Quality Filtering ◽

Downstream Analysis ◽

High Level ◽

Unique Molecular Identifier ◽

Adapter Trimming

Abstract. Motivation: Quality control and preprocessing of FASTQ files are essential to providing clean data for downstream analysis. Traditionally, a different tool is used for each operation, such as quality control, adapter trimming, and quality filtering. These tools are often insufficiently fast as most are developed using high-level programming languages (e.g., Python and Java) and provide limited multi-threading support. Reading and loading data multiple times also renders preprocessing slow and I/O inefficient. Results: We developed fastp as an ultra-fast FASTQ preprocessor with useful quality control and data-filtering features. It can perform quality control, adapter trimming, quality filtering, per-read quality cutting, and many other operations with a single scan of the FASTQ data. It also supports unique molecular identifier preprocessing, poly tail trimming, output splitting, and base correction for paired-end data. It can automatically detect adapters for single-end and paired-end FASTQ data. This tool is developed in C++ and has multi-threading support. Based on our evaluation, fastp is 2–5 times faster than other FASTQ preprocessing tools such as Trimmomatic or Cutadapt despite performing far more operations than similar tools. Availability and Implementation: The open-source code and corresponding instructions are available at https://github.com/OpenGene/[email protected]
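
The single-scan idea can be sketched in a few lines of Python: each record is read once, and adapter trimming, filtering, and statistics collection all happen in that one pass. This is a toy illustration under stated assumptions, not fastp's algorithm: it uses exact adapter matching, a mean-quality threshold, Phred+33 encoding, and invented reads. All names and thresholds are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality string)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from well-formed 4-line FASTQ input."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # the '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def trim_adapter(seq: str, qual: str, adapter: str) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adapter."""
    idx = seq.find(adapter)
    return (seq, qual) if idx == -1 else (seq[:idx], qual[:idx])


def mean_quality(qual: str, offset: int = 33) -> float:
    return sum(ord(c) - offset for c in qual) / len(qual)


def preprocess(lines: Iterable[str], adapter: str,
               min_mean_q: float = 20.0, min_len: int = 5):
    """One pass: trim, filter, and count, without re-reading the input."""
    stats: Dict[str, int] = {"reads_in": 0, "reads_out": 0}
    kept: List[Record] = []
    for name, seq, qual in read_fastq(lines):
        stats["reads_in"] += 1
        seq, qual = trim_adapter(seq, qual, adapter)
        # The length check runs first, so mean_quality never sees an empty read.
        if len(seq) >= min_len and mean_quality(qual) >= min_mean_q:
            stats["reads_out"] += 1
            kept.append((name, seq, qual))
    return kept, stats


# Invented reads, for illustration only.
FASTQ = [
    "@r1", "ACGTACGTAGATCGGAAG", "+", "I" * 18,  # adapter at the 3' end
    "@r2", "ACGTACGTAC", "+", "#" * 10,          # low quality throughout
    "@r3", "ACGAGATCGGAAG", "+", "I" * 13,       # too short after trimming
]

kept, stats = preprocess(FASTQ, adapter="AGATCGGAAG")
```

Running separate tools for each step would parse and write the same file repeatedly, which is the I/O cost the abstract describes.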

PDBeCIF: an open-source mmCIF/CIF parsing and processing package

BMC Bioinformatics ◽

10.1186/s12859-021-04271-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Glen van Ginkel ◽

Lukáš Pravda ◽

José M. Dana ◽

Mihaly Varadi ◽

Peter Keller ◽

...

Keyword(s):

Open Source ◽

Protein Data Bank ◽

Hybrid Methods ◽

Structural Data ◽

Data Bank ◽

File Format ◽

Macromolecular Complexes ◽

Link Type ◽

Information File ◽

Multiscale Structures

Abstract. Background: Biomacromolecular structural data outgrew the legacy Protein Data Bank (PDB) format which the scientific community relied on for decades, yet the use of its successor PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF) is still not widespread. Perhaps one of the reasons is the availability of easy-to-use tools that only support the legacy format, but also the inherent difficulties of processing mmCIF files correctly, given the number of edge cases that make efficient parsing problematic. Nevertheless, to fully exploit macromolecular structure data and their associated annotations such as multiscale structures from integrative/hybrid methods or large macromolecular complexes determined using traditional methods, it is necessary to fully adopt the new format as soon as possible. Results: To this end, we developed PDBeCIF, an open-source Python project for manipulating mmCIF and CIF files. It is part of the official list of mmCIF parsers recorded by the wwPDB and is heavily employed in the processes of the Protein Data Bank in Europe. The package is freely available both from the PyPI repository (http://pypi.org/project/pdbecif) and from GitHub (https://github.com/pdbeurope/pdbecif) along with rich documentation and many ready-to-use examples. Conclusions: PDBeCIF is an efficient and lightweight Python 2.6+/3+ package with no external dependencies. It can be readily integrated with 3rd party libraries as well as adopted for broad scientific analyses.

Edlib: a C/C++ library for fast, exact sequence alignment using edit distance

10.1101/070649 ◽

2016 ◽

Cited By ~ 2

Author(s):

Martin Šošić ◽

Mile Šikić

Keyword(s):

Exact Sequence ◽

Open Source ◽

Sequence Alignment ◽

Test Data ◽

Edit Distance ◽

Source Code ◽

Memory Usage ◽

Pairwise Sequence Alignment ◽

Link Type ◽

Bioinformatics Tools

Abstract. We present Edlib, an open-source C/C++ library for exact pairwise sequence alignment using edit distance. We compare Edlib to other libraries and show that it is the fastest while not lacking in functionality, and can also easily handle very large sequences. Being easy to use, flexible, fast and low on memory usage, we expect it to be a cornerstone for many future bioinformatics tools. Source code, installation instructions and test data are freely available for download at https://github.com/Martinsos/edlib, implemented in C/C++ and supported on Linux, MS Windows, and Mac OS. Contact: [email protected]
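
For readers unfamiliar with the quantity being computed, the sketch below gives the textbook dynamic-programming definition of edit (Levenshtein) distance in Python. It is a reference illustration only, with quadratic running time, and is not Edlib's optimized implementation. The function name is a placeholder.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b."""
    # Keep the shorter string on the inner axis so each row is as short as possible.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


distance = edit_distance("kitten", "sitting")
```

Only two rows of the table are kept, so memory grows with the shorter sequence while time grows with the product of the two lengths. That product is the cost that specialized libraries work to reduce for very large sequences.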

PhyloCSF++: A fast and user-friendly implementation of PhyloCSF with annotation tools

10.1101/2021.03.10.434297 ◽

2021 ◽

Author(s):

Christopher Pockrandt ◽

Martin Steinegger ◽

Steven L. Salzberg

Keyword(s):

Source Code ◽

File Format ◽

Sequence Alignments ◽

Multiple Sequence ◽

Protein Coding ◽

Multiple Sequence Alignments ◽

Coding Regions ◽

Link Type ◽

A Genome ◽

User Friendly

Abstract. Summary: PhyloCSF++ is an efficient and parallelized C++ implementation of the popular PhyloCSF method to distinguish protein-coding and non-coding regions in a genome based on multiple sequence alignments. It can score alignments or produce browser tracks for entire genomes in the wig file format. Additionally, PhyloCSF++ annotates coding sequences in GFF/GTF files using precomputed tracks or computes and scores multiple sequence alignments on the fly with MMseqs. Availability: PhyloCSF++ is released under the AGPLv3 license. Binaries and source code are available at https://github.com/cpockrandt/PhyloCSFpp. The software can be installed through bioconda. A variety of tracks can be accessed through ftp://ftp.ccb.jhu.edu/pub/software/phylocsf++/[email protected], [email protected]

Personalization of structural PDB files.

Acta Biochimica Polonica ◽

10.18388/abp.2013_2025 ◽

2013 ◽

Vol 60 (4) ◽

Author(s):

Tomasz Woźniak ◽

Ryszard W Adamiak

Keyword(s):

Open Source ◽

Three Dimensional ◽

Dimensional Structure ◽

File Format ◽

Three Dimensional Structure

The PDB format is the format most commonly used by programs to describe the three-dimensional structure of biomolecules. However, the programs often use different versions of the format. Thus far, no comprehensive solution for unifying the PDB formats has been developed. Here we present an open-source, Python-based tool called PDBinout for processing and converting various versions of the PDB file format for biostructural applications. Moreover, PDBinout allows users to create their own PDB versions. PDBinout is freely available under the LGPL licence at http://pdbinout.ibch.poznan.pl.

Three-dimensional semi-automated volumetric assessment of the pulp space of teeth following regenerative dental procedures

Scientific Reports ◽

10.1038/s41598-021-01489-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Heeresh Shetty ◽

Shishir Shetty ◽

Adesh Kakade ◽

Aditya Shetty ◽

Mohmed Isaqali Karobari ◽

...

Keyword(s):

Medical Imaging ◽

Open Source ◽

Three Dimensional ◽

Volume Estimation ◽

Permanent Teeth ◽

3D Slicer ◽

Link Type ◽

Volumetric Assessment ◽

Critical Measure ◽

Significant Difference

Abstract. The volumetric change that occurs in the pulp space over time represents a critical measure when it comes to determining the secondary outcomes of regenerative endodontic procedures (REPs). However, to date, only a few studies have investigated the accuracy of the available domain-specialized medical imaging tools with regard to three-dimensional (3D) volumetric assessment. This study sought to compare the accuracy of two different artificial intelligence-based medical imaging programs, namely OsiriX MD (v 9.0, Pixmeo SARL, Bernex, Switzerland, https://www.osirix-viewer.com) and 3D Slicer (http://www.slicer.org), in terms of estimating the volume of the pulp space following a REP. An in vitro assessment was performed to check the reliability and sensitivity of the two medical imaging programs in use. For the subsequent clinical application, pre- and post-procedure cone beam computed tomography scans of 35 immature permanent teeth with necrotic pulp and periradicular pathosis that had been treated with a cell-homing concept-based REP were processed using the two biomedical DICOM software programs (OsiriX MD and 3D Slicer). The volumetric changes in the teeth’s pulp spaces were assessed using semi-automated techniques in both programs. The data were statistically analyzed using t-tests and paired t-tests (P = 0.05). The pulp space volumes measured using both programs revealed a statistically significant decrease in the pulp space volume following the REP (P < 0.05), with no significant difference being found between the two programs (P > 0.05). The mean decreases in the pulp space volumes measured using OsiriX MD and 3D Slicer were 25.06% ± 19.45% and 26.10% ± 18.90%, respectively. The open-source software (3D Slicer) was found to be as accurate as the commercially available software with regard to the volumetric assessment of the post-REP pulp space.
This study was the first to demonstrate the step-by-step application of 3D Slicer, a user-friendly and easily accessible open-source multiplatform software program for the segmentation and volume estimation of the pulp spaces of teeth treated with REPs.
