Jasmine: a Java pipeline for isomiR characterization in miRNA-Seq data

Author(s):  
Xiangfu Zhong ◽  
Albert Pla ◽  
Simon Rayner

Abstract Motivation The existence of complex subpopulations of miRNA isoforms, or isomiRs, is well established. While many tools exist for investigating isomiR populations, they differ in how they characterize an isomiR, making it difficult to compare results across different tools. Thus, there is a need for a more comprehensive and systematic standard for defining isomiRs. Such a standard would allow investigation of isomiR population structure in progressively more refined sub-populations, permitting the identification of more subtle changes between conditions and leading to an improved understanding of the processes that generate these differences. Results We developed Jasmine, a software tool that incorporates a hierarchical framework for characterizing isomiR populations. Jasmine is a Java application that can process raw read data in fastq/fasta format, or mapped reads in SAM format, to produce a detailed characterization of isomiR populations. Thus, Jasmine can reveal structure not apparent in a standard miRNA-Seq analysis pipeline. Availability and implementation Jasmine is implemented in Java and R and is freely available on Bitbucket at https://bitbucket.org/bipous/jasmine/src/master/. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
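As a rough illustration of what characterizing an isomiR can involve, the following Python sketch labels a read by how its 5' and 3' ends differ from a canonical miRNA located on the same precursor. It is not Jasmine's hierarchy or code: the function name `classify_isomir`, the coordinate convention and the label format are all assumptions made for this example.

```python
def classify_isomir(read_start, read_end, canon_start, canon_end):
    """Label a read relative to a canonical miRNA on the same precursor.

    Coordinates are 0-based, end-exclusive positions on the precursor
    (a hypothetical convention chosen for this sketch).
    """
    labels = []
    # Positive offset: the read extends beyond the canonical end;
    # negative offset: the read is trimmed relative to it.
    d5 = canon_start - read_start
    d3 = read_end - canon_end
    if d5:
        labels.append(f"5p{d5:+d}")
    if d3:
        labels.append(f"3p{d3:+d}")
    return "|".join(labels) if labels else "canonical"


# Toy reads compared against a canonical miRNA at positions 10..32.
for start, end in [(10, 32), (9, 32), (10, 30), (11, 33)]:
    print(start, end, classify_isomir(start, end, 10, 32))
```

A scheme like this captures only length variation at the two ends; internal substitutions and non-templated additions would need sequence-level comparison, which is where tools tend to diverge in their definitions.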

2021 ◽  
Author(s):  
Miguel D. Fernández-de-Bobadilla ◽  
Alba Talavera-Rodríguez ◽  
Lucía Chacón ◽  
Fernando Baquero ◽  
Teresa M. Coque ◽  
...  

Abstract Motivation Comparative genomics is a growing field but one that will be eventually overtaken by sample size studies and the increase of available genomes in public databases. We present the Pangenome Analysis Toolkit (PATO) designed to simultaneously analyze thousands of genomes using a desktop computer. The tool performs common tasks of pangenome analysis such as core-genome definition and accessory genome properties and includes new features that help characterize population structure, annotate pathogenic features and create gene sharedness networks. PATO has been developed in R to integrate with the large set of tools available for genetic, phylogenetic and statistical analysis in this environment. Results PATO can perform the most demanding bioinformatic analyses in minutes with an accuracy comparable to state-of-the-art software but 20–30 times faster. PATO also integrates all the necessary functions for the complete analysis of the most common objectives in microbiology studies. Lastly, PATO includes the necessary tools for visualizing the results and can be integrated with other analytical packages available in R. Availability The source code for PATO is freely available at https://github.com/irycisBioinfo/PATO under the GPLv3 license. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (12) ◽  
pp. 3947-3948
Author(s):  
Jose-Jesus Fernandez ◽  
Teobaldo E Torres ◽  
Eva Martin-Solana ◽  
Gerardo F Goya ◽  
Maria-Rosario Fernandez-Fernandez

Abstract Summary We have developed a software tool to improve the image quality in focused ion beam–scanning electron microscopy (FIB–SEM) stacks: PolishEM. Based on a Gaussian blur model, it automatically estimates and compensates for the blur affecting each individual image. It also includes correction for artifacts commonly arising in FIB–SEM (e.g. curtaining). PolishEM has been optimized for an efficient processing of huge FIB–SEM stacks on standard computers. Availability and implementation PolishEM has been developed in C. GPL source code and binaries for Linux, OSX and Windows are available at http://www.cnb.csic.es/%7ejjfernandez/polishem. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Anja Mösch ◽  
Dmitrij Frishman

Abstract Summary The ability of a T cell to recognize foreign peptides is defined by a single α and a single β hypervariable complementarity determining region (CDR3), which together form the T cell receptor (TCR) heterodimer. In ∼30%-35% of T cells, two α chains are expressed at the mRNA level but only one α chain is part of the functional TCR. This effect can also be observed for β chains, although it is less common. The identification of functional α/β chain pairs is instrumental in high-throughput characterization of therapeutic TCRs. TCRpair is the first method that predicts whether an α and β chain pair forms a functional, HLA-A*02:01 specific TCR without requiring the sequence of a recognized peptide. By taking additional amino acids flanking the CDR3 regions into account, TCRpair achieves an AUC of 0.71. Availability TCRpair is implemented in Python using TensorFlow 2.0 and is freely available at https://www.github.com/amoesch/TCRpair Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (12) ◽  
pp. 3927-3929 ◽  
Author(s):  
Lulu Chen ◽  
Chiung-Ting Wu ◽  
Niya Wang ◽  
David M Herrington ◽  
Robert Clarke ◽  
...  

Abstract Summary We develop a fully unsupervised deconvolution method to dissect complex tissues into molecularly distinctive tissue or cell subtypes based on bulk expression profiles. We implement an R package, deconvolution by Convex Analysis of Mixtures (debCAM), that can automatically detect tissue/cell-specific markers, determine the number of constituent subtypes, calculate subtype proportions in individual samples and estimate tissue/cell-specific expression profiles. We demonstrate the performance and biomedical utility of debCAM on gene expression, methylation, proteomics and imaging data. With enhanced data preprocessing and prior knowledge incorporation, the debCAM software tool will allow biologists to perform a more comprehensive and unbiased characterization of tissue remodeling in many biomedical contexts. Availability and implementation http://bioconductor.org/packages/debCAM. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 35 (15) ◽  
pp. 2663-2664 ◽  
Author(s):  
Carlos de Lannoy ◽  
Judith Risse ◽  
Dick de Ridder

Abstract Summary Nanopore sequencing is a novel development in nucleic acid analysis. As such, nanopore-sequencing hardware and software are updated frequently and extensively, which quickly renders peer-reviewed publications on analysis pipeline benchmarking efforts outdated. To provide the user community with a faster, more flexible alternative to peer-reviewed benchmark papers for de novo assembly tool performance, we constructed poreTally, a comprehensive benchmarking tool. poreTally automatically assembles a given read set using several often-used assembly pipelines, analyzes the resulting assemblies for correctness and continuity, and finally generates a quality report, which can immediately be published on GitHub/GitLab. Availability and implementation poreTally is available on GitHub at https://github.com/cvdelannoy/poreTally, under an MIT license. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Francesco Napolitano ◽  
Diego Carrella ◽  
Xin Gao ◽  
Diego di Bernardo

Abstract Summary Pathway-based expression profiles allow for high-level interpretation of transcriptomic data and systematic comparison of dysregulated cellular programs. We have previously demonstrated the efficacy of pathway-based approaches with two different applications: the drug set enrichment analysis and the Gene2drug analysis. Here, we present a software tool that makes it easy to convert gene-based profiles to pathway-based profiles and analyze them within the popular R framework. We also provide pre-computed profiles derived from the original Connectivity Map and its next generation release, i.e. the LINCS database. Availability and implementation The tool is implemented as the R/Bioconductor package gep2pep and can be freely downloaded from https://bioconductor.org/packages/gep2pep. Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Pablo Vicente-Munuera ◽  
Pedro Gómez-Gálvez ◽  
Robert J Tetley ◽  
Cristina Forja ◽  
Antonio Tagua ◽  
...  

Abstract Summary Here we present EpiGraph, an image analysis tool that quantifies epithelial organization. Our method combines computational geometry and graph theory to measure the degree of order of any packed tissue. EpiGraph goes beyond the traditional polygon distribution analysis, capturing other organizational traits that improve the characterization of epithelia. EpiGraph can objectively compare the rearrangements of epithelial cells during development and homeostasis to quantify how the global ensemble is affected. Importantly, it has been implemented in the open-access platform Fiji. This makes EpiGraph very user friendly, with no programming skills required. Availability and implementation EpiGraph is available at https://imagej.net/EpiGraph and the code is accessible (https://github.com/ComplexOrganizationOfLivingMatter/Epigraph) under GPLv3 license. Supplementary information Supplementary data are available at Bioinformatics online.
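For readers unfamiliar with the traditional polygon distribution analysis mentioned above, the following Python sketch computes it from a cell adjacency graph: the fraction of cells that have each number of neighbours. It is not EpiGraph code and does not reproduce its graph-based features; the function name `polygon_distribution` and the toy four-cell tissue are assumptions made for this example.

```python
from collections import Counter


def polygon_distribution(neighbours):
    """Fraction of cells having each number of neighbours (sides).

    neighbours: dict mapping a cell id to the set of adjacent cell ids.
    """
    counts = Counter(len(adjacent) for adjacent in neighbours.values())
    total = sum(counts.values())
    return {sides: n / total for sides, n in sorted(counts.items())}


# Hypothetical toy tissue with symmetric adjacencies.
tissue = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}
distribution = polygon_distribution(tissue)
print(distribution)
```

In a real segmented image, cells touching the image border would normally be excluded before counting, since their neighbour sets are incomplete.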


Author(s):  
William Goh ◽  
Marek Mutwil

Abstract Motivation There are now more than two million RNA sequencing experiments for plants, animals, bacteria and fungi publicly available, allowing us to study gene expression within and across species and kingdoms. However, tools that allow the download, quality control and annotation of these data for more than one species at a time are currently missing. Results To remedy this, we present the Large-Scale Transcriptomic Analysis Pipeline in Kingdom of Life (LSTrAP-Kingdom), which we used to process 134,521 RNA-seq samples, achieving ∼12,000 processed samples per day. Our pipeline generated quality-controlled, annotated gene expression matrices that rival the manually curated gene expression data in identifying functionally related genes. Availability LSTrAP-Kingdom is available from https://github.com/wirriamm/plants-pipeline and is fully implemented in Python and Bash. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Miguel D Fernández-de-Bobadilla ◽  
Alba Talavera-Rodríguez ◽  
Lucía Chacón ◽  
Fernando Baquero ◽  
Teresa M Coque ◽  
...  

Abstract Motivation We present the Pangenome Analysis Toolkit (PATO) designed to simultaneously analyze thousands of genomes using a desktop computer. The tool performs common tasks of pangenome analysis such as core-genome definition and accessory genome properties and includes new features that help characterize population structure, annotate pathogenic features and create gene sharedness networks. PATO has been developed in R to integrate with the large set of tools available for genetic, phylogenetic and statistical analysis in this environment. Results PATO can perform the most demanding bioinformatic analyses in minutes with an accuracy comparable to state-of-the-art software but 20–30 times faster. PATO also integrates all the necessary functions for the complete analysis of the most common objectives in microbiology studies. Lastly, PATO includes the necessary tools for visualizing the results and can be integrated with other analytical packages available in R. Availability The source code for PATO is freely available at https://github.com/irycisBioinfo/PATO under the GPLv3 license. Supplementary information Supplementary data are available at Bioinformatics online.
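To make the notion of core-genome definition concrete, the following Python sketch splits gene families into core and accessory sets from a presence/absence table. It does not use or imitate the PATO R package; the function name `split_pangenome`, the `core_fraction` parameter and the toy genomes are assumptions made for this example.

```python
def split_pangenome(presence, core_fraction=1.0):
    """Split gene families into core and accessory sets.

    presence: dict mapping a genome name to the set of gene-family
        identifiers found in that genome.
    core_fraction: minimum fraction of genomes that must carry a family
        for it to count as core (1.0 means present in every genome).
    """
    n_genomes = len(presence)
    counts = {}
    for families in presence.values():
        for family in families:
            counts[family] = counts.get(family, 0) + 1
    core = {f for f, c in counts.items() if c >= core_fraction * n_genomes}
    accessory = set(counts) - core
    return core, accessory


# Hypothetical toy input: three genomes and four gene families.
genomes = {
    "genome_A": {"gyrB", "rpoB", "blaTEM"},
    "genome_B": {"gyrB", "rpoB"},
    "genome_C": {"gyrB", "rpoB", "tetA"},
}
core, accessory = split_pangenome(genomes)
print(sorted(core), sorted(accessory))
```

Lowering `core_fraction` (for example to 0.95) gives a relaxed core that tolerates families missing from a few incomplete assemblies, a common choice when thousands of genomes are compared.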


2019 ◽  
Vol 35 (21) ◽  
pp. 4424-4426 ◽  
Author(s):  
Soohyun Lee ◽  
Jeremy Johnson ◽  
Carl Vitzthum ◽  
Koray Kırlı ◽  
Burak H Alver ◽  
...  

Abstract Summary We introduce Tibanna, an open-source software tool for automated execution of bioinformatics pipelines on Amazon Web Services (AWS). Tibanna accepts reproducible and portable pipeline standards including Common Workflow Language (CWL), Workflow Description Language (WDL) and Docker. It adopts a strategy of isolation and optimization of individual executions, combined with a serverless scheduling approach. Pipelines are executed and monitored using local commands or the Python Application Programming Interface (API) and cloud configuration is automatically handled. Tibanna is well suited for projects with a range of computational requirements, including those with large and widely fluctuating loads. Notably, it has been used to process terabytes of data for the 4D Nucleome (4DN) Network. Availability and implementation Source code is available on GitHub at https://github.com/4dn-dcic/tibanna. Supplementary information Supplementary data are available at Bioinformatics online.

