wft4galaxy: A Workflow Tester for Galaxy

Mapping Intimacies ◽

10.1101/132001 ◽

2017 ◽

Author(s):

Marco Enrico Piras ◽

Luca Pireddu ◽

Gianluigi Zanetti

Keyword(s):

Complex Analysis ◽

Computer Programs ◽

Supplementary Information ◽

Automated Testing ◽

Continuous Integration ◽

Link Type ◽

Scientific Analysis ◽

The Galaxy ◽

Supplementary Material ◽

High Level

Motivation: Workflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. As compositions of simpler tools, workflows effectively comprise specialized computer programs implementing often very complex analysis procedures. To date, no simple way to automatically test Galaxy workflows and ensure their correctness has appeared in the literature. Results: With wft4galaxy we offer a tool to bring automated testing to Galaxy workflows, making it feasible to bring continuous integration to their development and ensuring that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container, the latter reducing installation effort to a minimum. Availability: wft4galaxy is available online at https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0. Supplementary information: Supplementary information is available at http://wft4galaxy.readthedocs.io.

PathScore: a web tool for identifying altered pathways in cancer data

10.1101/067090 ◽

2016 ◽

Cited By ~ 2

Author(s):

Stephen G. Gaffney ◽

Jeffrey P. Townsend

Keyword(s):

Web Application ◽

Somatic Mutations ◽

Supplementary Information ◽

Web Tool ◽

Cancer Data ◽

Link Type ◽

Novel Approach ◽

Supplementary Material ◽

User Friendly ◽

Pathway Effect

Summary: PathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects. Availability and Implementation: Web application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 license. Contact: [email protected] Supplementary Information: Additional documentation can be found at http://pathscore.publichealth.yale.edu/faq.

Palaeolatitudinal distribution of the Ediacaran macrobiota

Journal of the Geological Society ◽

10.1144/jgs2021-030 ◽

2021 ◽

pp. jgs2021-030

Author(s):

Catherine E. Boddy ◽

Emily G. Mitchell ◽

Andrew Merdith ◽

Alexander G. Liu

Keyword(s):

Taxonomic Composition ◽

Supplementary Information ◽

Cambrian Explosion ◽

Content Type ◽

Link Type ◽

Environmental Perturbations ◽

Significant Difference ◽

Evolutionary Trajectories ◽

Cambrian Radiation ◽

Supplementary Material

Macrofossils of the late Ediacaran Period (c. 579–539 Ma) document diverse, complex multicellular eukaryotes, including early animals, prior to the Cambrian radiation of metazoan phyla. To investigate the relationships between environmental perturbations, biotic responses and early metazoan evolutionary trajectories, it is vital to distinguish between evolutionary and ecological controls on the global distribution of Ediacaran macrofossils. The contributions of temporal, palaeoenvironmental and lithological factors in shaping the observed variations in assemblage taxonomic composition between Ediacaran macrofossil sites are widely discussed, but the role of palaeogeography remains ambiguous. Here we investigate the influence of palaeolatitude on the spatial distribution of Ediacaran macrobiota through the late Ediacaran Period using two leading palaeogeographical reconstructions. We find that overall generic diversity was distributed across all palaeolatitudes. Among specific groups, the distributions of candidate ‘Bilateral’ and Frondomorph taxa exhibit weakly statistically significant and statistically significant differences between low and high palaeolatitudes within our favoured palaeogeographical reconstruction, respectively, whereas Algal, Tubular, Soft-bodied and Biomineralizing taxa show no significant difference. The recognition of statistically significant palaeolatitudinal differences in the distribution of certain morphogroups highlights the importance of considering palaeolatitudinal influences when interrogating trends in Ediacaran taxon distributions. Supplementary material: Supplementary information, data and code are available at https://doi.org/10.6084/m9.figshare.c.5488945 Thematic collection: This article is part of the Advances in the Cambrian Explosion collection available at: https://www.lyellcollection.org/cc/advances-cambrian-explosion

MODE-TASK: Large-scale protein motion tools

10.1101/217505 ◽

2017 ◽

Author(s):

Caroline Ross ◽

Bilal Nizami ◽

Michael Glenister ◽

Olivier Sheik Amamuddy ◽

Ali Rana Atilgan ◽

...

Keyword(s):

Large Scale ◽

Protein Complexes ◽

Normal Mode Analysis ◽

MD Simulations ◽

Supplementary Information ◽

Mode Analysis ◽

Analysis Tool ◽

Link Type ◽

Supplementary Material ◽

Anisotropic Network

Summary: MODE-TASK, a novel software suite, comprises Principal Component Analysis, Multidimensional Scaling, and t-Distributed Stochastic Neighbor Embedding techniques using molecular dynamics trajectories. MODE-TASK also includes a Normal Mode Analysis tool based on the Anisotropic Network Model, providing a variety of ways to analyse and compare large-scale motions of protein complexes for which long MD simulations are prohibitive. Availability and Implementation: MODE-TASK has been open-sourced, and is available for download from https://github.com/RUBi-ZA/MODE-TASK, implemented in Python and C++. Supplementary information: Documentation available at http://mode-task.readthedocs.io.

Crosslink: A fast, scriptable genetic mapper for outcrossing species

10.1101/135277 ◽

2017 ◽

Cited By ~ 6

Author(s):

Robert J. Vickerstaff ◽

Richard J. Harrison

Keyword(s):

Large Datasets ◽

Supplementary Information ◽

Supplementary Data ◽

Link Type ◽

Mapping Software ◽

Outcrossing Species ◽

Supplementary Material ◽

Novel Approaches ◽

Similar Accuracy ◽

General Public License

Summary: Crosslink is genetic mapping software for outcrossing species designed to run efficiently on large datasets by combining the best from existing tools with novel approaches. Tests show it runs much faster than several comparable programs whilst retaining a similar accuracy. Availability and implementation: Available under the GNU General Public License version 2 from https://github.com/eastmallingresearch/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online and from https://github.com/eastmallingresearch/crosslink/releases/tag/v0.5.

Indoril: An I-PV Add-On for Visualization of Point Mutations on 3D Cartesian Coordinates

10.1101/148122 ◽

2017 ◽

Author(s):

Ibrahim Tanyalcin ◽

Julien Ferte ◽

Taushif Khan ◽

Carla Al Assaf

Keyword(s):

Protein Structure ◽

Mechanism Of Action ◽

Dimensional Space ◽

Point Mutations ◽

Supplementary Information ◽

Cartesian Coordinates ◽

3 Dimensional ◽

Link Type ◽

Supplementary Section ◽

Supplementary Material

Summary: One of the main goals of proteomics is to understand how point mutations affect protein structure. Visualization and clustering of point mutations in a user-defined 3-dimensional space can give researchers new insights and hypotheses about a mutation’s mechanism of action. Availability and Implementation: We have developed an interactive I-PV add-on called INDORIL to visualize point mutations. Indoril can be downloaded from http://[email protected]║[email protected] Supplementary Information: Please refer to the supplementary section and http://www.i-pv.org.

IMMAN: an R/Bioconductor package for Interolog protein network reconstruction, Mapping and Mining ANalysis

10.1101/069104 ◽

2016 ◽

Cited By ~ 1

Author(s):

Minoo Ashtinai ◽

Payman Nickchi ◽

Soheil Jahangiri-Tazehkand ◽

Abdollah Safari ◽

Mehdi Mirzaie ◽

...

Keyword(s):

Protein Function ◽

Protein Function Prediction ◽

Protein Interaction Networks ◽

Interaction Networks ◽

Protein Network ◽

Supplementary Information ◽

Protein Protein Interaction ◽

Link Type ◽

Supplementary Material ◽

Protein Protein Interaction Networks

Summary: IMMAN is a software package for reconstructing an Interolog Protein Network (IPN) by integrating several Protein-protein Interaction Networks (PPINs). Users can unify different PPINs to mine a conserved common network among species. IMMAN helps to retrieve IPNs with different degrees of conservation for use in protein function prediction analysis based on protein networks. Availability: IMMAN is freely available at https://bioconductor.org/packages/IMMAN, http://profiles.bs.ipm.ir/softwares/IMMAN/[email protected], [email protected], [email protected] Supplementary information: Supplementary data are available online.

GTShark: Genotype compression in large projects

10.1101/494104 ◽

2018 ◽

Author(s):

Sebastian Deorowicz ◽

Agnieszka Danek

Keyword(s):

Web Site ◽

Supplementary Information ◽

Supplementary Data ◽

Link Type ◽

Large Project ◽

Supplementary Material

Summary: Large sequencing projects now handle tens of thousands of individuals. The huge files summarizing the findings require compression. We propose a tool able to compress large collections of genotypes, as well as single samples, in such projects to sizes not achievable to date. Availability and Implementation: https://github.com/refresh-bio/[email protected] Supplementary information: Supplementary data are available at the publisher’s Web site.

UMI-Reducer: Collapsing duplicate sequencing reads via Unique Molecular Identifiers

10.1101/103267 ◽

2017 ◽

Cited By ~ 4

Author(s):

Serghei Mangul ◽

Sarah Van Driesche ◽

Lana S. Martin ◽

Kelsey C. Martin ◽

Eleazar Eskin

Keyword(s):

Polymerase Chain Reaction ◽

Supplementary Information ◽

Computational Tool ◽

Chain Reaction ◽

Commercial Use ◽

Link Type ◽

Sequencing Library ◽

PCR Duplicates ◽

Supplementary Material ◽

Polymerase Chain

Summary: Every sequencing library contains duplicate reads. While many duplicates arise during polymerase chain reaction (PCR), some duplicates derive from multiple identical fragments of mRNA present in the original lysate (termed “biological duplicates”). Unique Molecular Identifiers (UMIs) are random oligonucleotide sequences that allow differentiation between technical and biological duplicates. Here we report the development of UMI-Reducer, a new computational tool for processing and differentiating PCR duplicates from biological duplicates. UMI-Reducer uses UMIs and the mapping position of the read to identify and collapse reads that are technical duplicates. The remaining true biological reads are then used for bias-free estimation of mRNA abundance in the original lysate. This strategy is of particular use for libraries made from low amounts of starting material, which typically require additional cycles of PCR and are therefore most prone to PCR duplicate bias. Availability and Implementation: UMI-Reducer is open source Python software, freely available for non-commercial use (GPL-3.0) at https://sergheimangul.wordpress.com/umi-reducer/. Documentation and tutorials are available at https://github.com/smangul1/UMI-Reducer/wiki/[email protected], [email protected] Supplementary information: Flowchart of Library Construction
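
The collapsing step described in the abstract above can be illustrated with a minimal sketch. This is not UMI-Reducer's code or interface: the `Read` record, its field names and the `collapse_duplicates` function are hypothetical, and including chromosome and strand in the grouping key is an assumption added here (the abstract mentions only the UMI and the mapping position).

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for an aligned read (not a UMI-Reducer type)."""
    name: str
    chrom: str
    pos: int     # leftmost mapping position
    strand: str
    umi: str     # Unique Molecular Identifier attached to the read


def collapse_duplicates(reads):
    """Keep one read per (chrom, pos, strand, UMI) group.

    Reads sharing both mapping position and UMI are treated as technical
    (PCR) duplicates; reads at the same position with different UMIs are
    kept as distinct biological molecules.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.strand, read.umi)].append(read)
    # Dicts preserve insertion order, so the first read seen represents each group.
    return [group[0] for group in groups.values()]


reads = [
    Read("r1", "chr1", 100, "+", "ACGT"),
    Read("r2", "chr1", 100, "+", "ACGT"),  # same position and UMI as r1
    Read("r3", "chr1", 100, "+", "TTGA"),  # same position, different UMI
    Read("r4", "chr1", 250, "+", "ACGT"),  # same UMI, different position
]
kept = collapse_duplicates(reads)
print([r.name for r in kept])
```

In this toy input only r2 is collapsed, because it matches r1 in both position and UMI. A production tool would also need to handle sequencing errors within UMIs, which this sketch ignores.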

Haplotype-aware graph indexes

10.1101/559583 ◽

2019 ◽

Cited By ~ 7

Author(s):

Jouni Sirén ◽

Erik Garrison ◽

Adam M. Novak ◽

Benedict Paten ◽

Richard Durbin

Keyword(s):

Genetic Variation ◽

Chromosome 17 ◽

Supplementary Information ◽

Whole Genome ◽

Supplementary Data ◽

1000 Genomes Project ◽

1000 Genomes ◽

Link Type ◽

Supplementary Material ◽

Haplotype Information

Motivation: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are nonbiological, unlikely recombinations of true haplotypes. Results: We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheeler transform (GBWT). We demonstrate the scalability of the new implementation by building a whole-genome index of the 5,008 haplotypes of the 1000 Genomes Project, and an index of all 108,070 TOPMed Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes. Availability: Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt, and https://github.com/jltsiren/[email protected] Supplementary information: Supplementary data are available.
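
For readers unfamiliar with the positional Burrows–Wheeler transform that the GBWT extends, the following is a minimal sketch of its core ordering step for a small panel of aligned binary haplotypes. It is not the graph extension described in the abstract and is not taken from the vg or gbwt code; the function name `positional_prefix_arrays` and the toy panel are illustrative assumptions.

```python
def positional_prefix_arrays(haplotypes):
    """Yield the haplotype order before each site, then the final order.

    At site k the haplotypes are sorted by their reversed prefixes
    (alleles k-1, k-2, ..., 0), so haplotypes that match over a long
    stretch ending at k sit next to each other. Alleles are assumed to
    be 0 or 1 and all haplotypes are assumed to have the same length.
    """
    order = list(range(len(haplotypes)))
    n_sites = len(haplotypes[0]) if haplotypes else 0
    for k in range(n_sites):
        yield list(order)
        # Stable partition on the allele at site k: one radix-sort pass.
        zeros = [i for i in order if haplotypes[i][k] == 0]
        ones = [i for i in order if haplotypes[i][k] == 1]
        order = zeros + ones
    yield list(order)


panel = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 1, 0],
    [1, 0, 0],
]
orders = list(positional_prefix_arrays(panel))
print(orders[-1])
```

In the final order, haplotypes 0 and 2 are adjacent because they are identical. Keeping similar haplotypes adjacent at every position is the property that makes this family of indexes compact and fast to search.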

16GT: a fast and sensitive variant caller using a 16-genotype probabilistic model

10.1101/111393 ◽

2017 ◽

Cited By ~ 3

Author(s):

Ruibang Luo ◽

Michael C. Schatz ◽

Steven L. Salzberg

Keyword(s):

Probabilistic Model ◽

Variant Calling ◽

Supplementary Information ◽

Link Type ◽

Indel Calling ◽

Supplementary Material ◽

Calling Algorithm

Summary: 16GT is a variant caller for Illumina WGS and WES germline data. It uses a new 16-genotype probabilistic model to unify SNP and indel calling in a single variant calling algorithm. In benchmark comparisons with five other widely used variant callers on a modern 36-core server, 16GT ran faster and demonstrated improved sensitivity in calling SNPs, and it provided sensitivity and accuracy in calling indels comparable to the GATK HaplotypeCaller. Availability and implementation: https://github.com/aquaskyline/[email protected] Supplementary information: Supplementary tables and notes are available at Bioinformatics online.