Jasmine: a Java pipeline for isomiR characterization in miRNA-Seq data

Bioinformatics ◽ 10.1093/bioinformatics/btz806 ◽ 2019 ◽ Cited By ~ 2

Author(s): Xiangfu Zhong ◽ Albert Pla ◽ Simon Rayner

Keyword(s): Population Structure ◽ Software Tool ◽ Supplementary Information ◽ Supplementary Data ◽ Analysis Pipeline ◽ Detailed Characterization ◽ Fasta Format ◽ Java Application

Abstract Motivation The existence of complex subpopulations of miRNA isoforms, or isomiRs, is well established. While many tools exist for investigating isomiR populations, they differ in how they characterize an isomiR, making it difficult to compare results across tools. Thus, there is a need for a more comprehensive and systematic standard for defining isomiRs. Such a standard would allow investigation of isomiR population structure in progressively more refined subpopulations, permitting the identification of more subtle changes between conditions and leading to an improved understanding of the processes that generate these differences. Results We developed Jasmine, a software tool that incorporates a hierarchical framework for characterizing isomiR populations. Jasmine is a Java application that can process raw read data in fastq/fasta format, or mapped reads in SAM format, to produce a detailed characterization of isomiR populations. Thus, Jasmine can reveal structure not apparent in a standard miRNA-Seq analysis pipeline. Availability and implementation Jasmine is implemented in Java and R and is freely available on Bitbucket at https://bitbucket.org/bipous/jasmine/src/master/. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

PATO: Pangenome Analysis Toolkit

10.1101/2021.01.30.428878 ◽ 2021

Author(s): Miguel D. Fernández-de-Bobadilla ◽ Alba Talavera-Rodríguez ◽ Lucía Chacón ◽ Fernando Baquero ◽ Teresa M. Coque ◽ ...

Keyword(s): Population Structure ◽ Statistical Analysis ◽ Core Genome ◽ State Of The Art ◽ Source Code ◽ Supplementary Information ◽ Complete Analysis ◽ Large Set ◽ Supplementary Data ◽ Desktop Computer

Abstract Motivation Comparative genomics is a growing field but one that will eventually be overtaken by sample size studies and the increase of available genomes in public databases. We present the Pangenome Analysis Toolkit (PATO), designed to simultaneously analyze thousands of genomes using a desktop computer. The tool performs common tasks of pangenome analysis such as core-genome definition and accessory genome properties and includes new features that help characterize population structure, annotate pathogenic features and create gene sharedness networks. PATO has been developed in R to integrate with the large set of tools available for genetic, phylogenetic and statistical analysis in this environment. Results PATO can perform the most demanding bioinformatic analyses in minutes with an accuracy comparable to state-of-the-art software but 20–30 times faster. PATO also integrates all the necessary functions for the complete analysis of the most common objectives in microbiology studies. Lastly, PATO includes the necessary tools for visualizing the results and can be integrated with other analytical packages available in R. Availability The source code for PATO is freely available at https://github.com/irycisBioinfo/PATO under the GPLv3 license. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

PolishEM: image enhancement in FIB–SEM

Bioinformatics ◽ 10.1093/bioinformatics/btaa218 ◽ 2020 ◽ Vol 36 (12) ◽ pp. 3947-3948

Author(s): Jose-Jesus Fernandez ◽ Teobaldo E Torres ◽ Eva Martin-Solana ◽ Gerardo F Goya ◽ Maria-Rosario Fernandez-Fernandez

Keyword(s): Electron Microscopy ◽ Scanning Electron Microscopy ◽ Focused Ion Beam ◽ Ion Beam ◽ Source Code ◽ Software Tool ◽ Supplementary Information ◽ Supplementary Data ◽ Efficient Processing ◽ Scanning Electron

Abstract Summary We have developed a software tool to improve the image quality in focused ion beam–scanning electron microscopy (FIB–SEM) stacks: PolishEM. Based on a Gaussian blur model, it automatically estimates and compensates for the blur affecting each individual image. It also includes correction for artifacts commonly arising in FIB–SEM (e.g. curtaining). PolishEM has been optimized for efficient processing of huge FIB–SEM stacks on standard computers. Availability and implementation PolishEM has been developed in C. GPL source code and binaries for Linux, OSX and Windows are available at http://www.cnb.csic.es/%7ejjfernandez/polishem. Supplementary information Supplementary data are available at Bioinformatics online.

TCRpair: prediction of functional pairing between HLA-A*02:01-restricted T cell receptor α and β chains

Bioinformatics ◽ 10.1093/bioinformatics/btab573 ◽ 2021

Author(s): Anja Mösch ◽ Dmitrij Frishman

Keyword(s): Amino Acids ◽ T Cell ◽ T Cell Receptor ◽ mRNA Level ◽ Cell Receptor ◽ Supplementary Information ◽ Supplementary Data ◽ β Chain ◽ Complementarity Determining Region

Abstract Summary The ability of a T cell to recognize foreign peptides is defined by a single α and a single β hypervariable complementarity determining region (CDR3), which together form the T cell receptor (TCR) heterodimer. In ∼30%-35% of T cells, two α chains are expressed at the mRNA level but only one α chain is part of the functional TCR. This effect can also be observed for β chains, although it is less common. The identification of functional α/β chain pairs is instrumental in high-throughput characterization of therapeutic TCRs. TCRpair is the first method that predicts whether an α and β chain pair forms a functional, HLA-A*02:01-specific TCR without requiring the sequence of a recognized peptide. By taking additional amino acids flanking the CDR3 regions into account, TCRpair achieves an AUC of 0.71. Availability TCRpair is implemented in Python using TensorFlow 2.0 and is freely available at https://www.github.com/amoesch/TCRpair. Supplementary information Supplementary data are available at Bioinformatics online.
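
To make the reported AUC figure concrete, the following is a minimal sketch of how an area under the ROC curve can be computed for any binary classifier. It is not TCRpair's code: the function name `roc_auc` and the toy labels and scores are illustrative assumptions, and none of the numbers come from the paper. The AUC is computed here as the probability that a randomly chosen positive example (for instance, a functional α/β pair) receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = functional pair, 0 = non-functional pair.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

On this scale 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so an AUC of 0.71 indicates that a true pair outranks a false one in roughly seven out of ten comparisons.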

debCAM: a bioconductor R package for fully unsupervised deconvolution of complex tissues

Bioinformatics ◽ 10.1093/bioinformatics/btaa205 ◽ 2020 ◽ Vol 36 (12) ◽ pp. 3927-3929 ◽ Cited By ~ 3

Author(s): Lulu Chen ◽ Chiung-Ting Wu ◽ Niya Wang ◽ David M Herrington ◽ Robert Clarke ◽ ...

Keyword(s): Expression Profiles ◽ Software Tool ◽ R Package ◽ Supplementary Information ◽ Tissue Cell ◽ Deconvolution Method ◽ Imaging Data ◽ Specific Expression ◽ Knowledge Incorporation

Abstract Summary We develop a fully unsupervised deconvolution method to dissect complex tissues into molecularly distinctive tissue or cell subtypes based on bulk expression profiles. We implement an R package, deconvolution by Convex Analysis of Mixtures (debCAM), that can automatically detect tissue/cell-specific markers, determine the number of constituent subtypes, calculate subtype proportions in individual samples and estimate tissue/cell-specific expression profiles. We demonstrate the performance and biomedical utility of debCAM on gene expression, methylation, proteomics and imaging data. With enhanced data preprocessing and prior knowledge incorporation, the debCAM software tool will allow biologists to perform a more comprehensive and unbiased characterization of tissue remodeling in many biomedical contexts. Availability and implementation http://bioconductor.org/packages/debCAM. Supplementary information Supplementary data are available at Bioinformatics online.

poreTally: run and publish de novo nanopore assembler benchmarks

Bioinformatics ◽ 10.1093/bioinformatics/bty1045 ◽ 2018 ◽ Vol 35 (15) ◽ pp. 2663-2664 ◽ Cited By ~ 2

Author(s): Carlos de Lannoy ◽ Judith Risse ◽ Dick de Ridder

Keyword(s): Nucleic Acid ◽ De Novo Assembly ◽ De Novo ◽ Supplementary Information ◽ Nanopore Sequencing ◽ Supplementary Data ◽ Analysis Pipeline ◽ Tool Performance ◽ Nucleic Acid Analysis ◽ Assembly Tool

Abstract Summary Nanopore sequencing is a novel development in nucleic acid analysis. As such, nanopore-sequencing hardware and software are updated frequently and extensively, which quickly renders peer-reviewed publications on analysis pipeline benchmarking efforts outdated. To provide the user community with a faster, more flexible alternative to peer-reviewed benchmark papers for de novo assembly tool performance, we constructed poreTally, a comprehensive benchmarking tool. poreTally automatically assembles a given read set using several often-used assembly pipelines, analyzes the resulting assemblies for correctness and continuity, and finally generates a quality report, which can immediately be published on GitHub/GitLab. Availability and implementation poreTally is available on GitHub at https://github.com/cvdelannoy/poreTally, under an MIT license. Supplementary information Supplementary data are available at Bioinformatics online.

gep2pep: a bioconductor package for the creation and analysis of pathway-based expression profiles

Bioinformatics ◽ 10.1093/bioinformatics/btz803 ◽ 2019 ◽ Cited By ~ 2

Author(s): Francesco Napolitano ◽ Diego Carrella ◽ Xin Gao ◽ Diego di Bernardo

Keyword(s): Expression Profiles ◽ Enrichment Analysis ◽ Software Tool ◽ Supplementary Information ◽ Bioconductor Package ◽ Supplementary Data ◽ Connectivity Map ◽ Systematic Comparison ◽ Transcriptomic Data ◽ High Level

Abstract Summary Pathway-based expression profiles allow for high-level interpretation of transcriptomic data and systematic comparison of dysregulated cellular programs. We have previously demonstrated the efficacy of pathway-based approaches with two different applications: the drug set enrichment analysis and the Gene2drug analysis. Here, we present a software tool that makes it easy to convert gene-based profiles to pathway-based profiles and analyze them within the popular R framework. We also provide pre-computed profiles derived from the original Connectivity Map and its next-generation release, i.e. the LINCS database. Availability and implementation The tool is implemented as the R/Bioconductor package gep2pep and can be freely downloaded from https://bioconductor.org/packages/gep2pep. Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

EpiGraph: an open-source platform to quantify epithelial organization

Bioinformatics ◽ 10.1093/bioinformatics/btz683 ◽ 2019

Author(s): Pablo Vicente-Munuera ◽ Pedro Gómez-Gálvez ◽ Robert J Tetley ◽ Cristina Forja ◽ Antonio Tagua ◽ ...

Keyword(s): Image Analysis ◽ Open Access ◽ Supplementary Information ◽ Analysis Tool ◽ Distribution Analysis ◽ Supplementary Data ◽ Programming Skills ◽ User Friendly ◽ Degree Of Order

Abstract Summary Here we present EpiGraph, an image analysis tool that quantifies epithelial organization. Our method combines computational geometry and graph theory to measure the degree of order of any packed tissue. EpiGraph goes beyond the traditional polygon distribution analysis, capturing other organizational traits that improve the characterization of epithelia. EpiGraph can objectively compare the rearrangements of epithelial cells during development and homeostasis to quantify how the global ensemble is affected. Importantly, it has been implemented in the open-access platform Fiji. This makes EpiGraph very user-friendly, with no programming skills required. Availability and implementation EpiGraph is available at https://imagej.net/EpiGraph and the code is accessible (https://github.com/ComplexOrganizationOfLivingMatter/Epigraph) under the GPLv3 license. Supplementary information Supplementary data are available at Bioinformatics online.
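
For readers unfamiliar with the "traditional polygon distribution analysis" that the abstract uses as its baseline, the sketch below shows one simple way it could be computed from a cell-adjacency graph. It is not EpiGraph's implementation: the function name `polygon_distribution`, the input layout and the toy tissue are hypothetical, and it assumes that in a packed tissue the number of sides of a cell equals its number of neighbours.

```python
from collections import Counter


def polygon_distribution(neighbours):
    """Fraction of cells with each number of sides.

    `neighbours` maps a cell id to the set of ids of the cells it touches;
    a cell with k neighbours is treated as a k-sided polygon.
    """
    if not neighbours:
        raise ValueError("no cells given")
    counts = Counter(len(adjacent) for adjacent in neighbours.values())
    total = sum(counts.values())
    return {sides: n / total for sides, n in sorted(counts.items())}


# Hypothetical toy tissue: only four inner cells are listed, because border
# cells have incomplete neighbourhoods and would distort the counts.
neighbours = {
    1: {2, 3, 4, 5, 6, 7},          # hexagon
    2: {1, 3, 7, 8, 9},             # pentagon
    3: {1, 2, 4, 9, 10, 11, 12},    # heptagon
    4: {1, 3, 5, 12, 13, 14},       # hexagon
}
distribution = polygon_distribution(neighbours)
for sides, fraction in distribution.items():
    print(f"{sides}-sided cells: {fraction:.0%}")
```

A distribution like this summarizes only how many neighbours each cell has; the abstract's point is that EpiGraph adds graph-based measures that capture organization beyond these counts.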

LSTrAP-Kingdom: an automated pipeline to generate annotated gene expression atlases for kingdoms of life

Bioinformatics ◽ 10.1093/bioinformatics/btab168 ◽ 2021

Author(s): William Goh ◽ Marek Mutwil

Keyword(s): Gene Expression ◽ Large Scale ◽ Supplementary Information ◽ Expression Data ◽ Supplementary Data ◽ RNA Seq ◽ Analysis Pipeline ◽ Study Gene Expression ◽ Automated Pipeline ◽ Bacteria And Fungi

Abstract Motivation There are now more than two million RNA sequencing experiments for plants, animals, bacteria and fungi publicly available, allowing us to study gene expression within and across species and kingdoms. However, the tools allowing the download, quality control and annotation of this data for more than one species at a time are currently missing. Results To remedy this, we present the Large-Scale Transcriptomic Analysis Pipeline in Kingdom of Life (LSTrAP-Kingdom), which we used to process 134,521 RNA-seq samples, achieving ∼12,000 processed samples per day. Our pipeline generated quality-controlled, annotated gene expression matrices that rival the manually curated gene expression data in identifying functionally related genes. Availability LSTrAP-Kingdom is available from https://github.com/wirriamm/plants-pipeline and is fully implemented in Python and Bash. Supplementary information Supplementary data are available at Bioinformatics online.
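
As a rough back-of-the-envelope reading of the two figures in the abstract, the snippet below estimates how long the reported run would take if the quoted rate were sustained throughout. That constant rate is an assumption made here for illustration; the abstract does not state the total wall-clock time, and the variable names are not from the pipeline.

```python
import math

total_samples = 134_521      # RNA-seq samples processed, from the abstract
samples_per_day = 12_000     # approximate throughput, from the abstract

# Assumption: the throughput stays constant over the whole run.
days = total_samples / samples_per_day
whole_days = math.ceil(days)

print(f"about {days:.1f} days of processing ({whole_days} calendar days)")
```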

PATO: Pangenome Analysis Toolkit

Bioinformatics ◽ 10.1093/bioinformatics/btab697 ◽ 2021

Author(s): Miguel D Fernández-de-Bobadilla ◽ Alba Talavera-Rodríguez ◽ Lucía Chacón ◽ Fernando Baquero ◽ Teresa M Coque ◽ ...

Keyword(s): Population Structure ◽ Statistical Analysis ◽ Core Genome ◽ State Of The Art ◽ Source Code ◽ Supplementary Information ◽ Complete Analysis ◽ Large Set ◽ Supplementary Data ◽ Desktop Computer

Abstract Motivation We present the Pangenome Analysis Toolkit (PATO), designed to simultaneously analyze thousands of genomes using a desktop computer. The tool performs common tasks of pangenome analysis such as core-genome definition and accessory genome properties and includes new features that help characterize population structure, annotate pathogenic features and create gene sharedness networks. PATO has been developed in R to integrate with the large set of tools available for genetic, phylogenetic and statistical analysis in this environment. Results PATO can perform the most demanding bioinformatic analyses in minutes with an accuracy comparable to state-of-the-art software but 20–30 times faster. PATO also integrates all the necessary functions for the complete analysis of the most common objectives in microbiology studies. Lastly, PATO includes the necessary tools for visualizing the results and can be integrated with other analytical packages available in R. Availability The source code for PATO is freely available at https://github.com/irycisBioinfo/PATO under the GPLv3 license. Supplementary information Supplementary data are available at Bioinformatics online.

Tibanna: software for scalable execution of portable pipelines on the cloud

Bioinformatics ◽ 10.1093/bioinformatics/btz379 ◽ 2019 ◽ Vol 35 (21) ◽ pp. 4424-4426 ◽ Cited By ~ 1

Author(s): Soohyun Lee ◽ Jeremy Johnson ◽ Carl Vitzthum ◽ Koray Kırlı ◽ Burak H Alver ◽ ...

Keyword(s): Open Source Software ◽ Source Code ◽ Software Tool ◽ Application Programming Interface ◽ Supplementary Information ◽ Supplementary Data ◽ Description Language ◽ Amazon Web Services ◽ Application Programming ◽ Programming Interface

Abstract Summary We introduce Tibanna, an open-source software tool for automated execution of bioinformatics pipelines on Amazon Web Services (AWS). Tibanna accepts reproducible and portable pipeline standards including Common Workflow Language (CWL), Workflow Description Language (WDL) and Docker. It adopts a strategy of isolation and optimization of individual executions, combined with a serverless scheduling approach. Pipelines are executed and monitored using local commands or the Python Application Programming Interface (API) and cloud configuration is automatically handled. Tibanna is well suited for projects with a range of computational requirements, including those with large and widely fluctuating loads. Notably, it has been used to process terabytes of data for the 4D Nucleome (4DN) Network. Availability and implementation Source code is available on GitHub at https://github.com/4dn-dcic/tibanna. Supplementary information Supplementary data are available at Bioinformatics online.
