scholarly journals HAMAP rules as SPARQL A portable annotation pipeline for genomes and proteomes

2019 ◽  
Author(s):  
Jerven Bolleman ◽  
Eduoard de Castro ◽  
Delphine Baratin ◽  
Sebastien Gehant ◽  
Beatrice A. Cuche ◽  
...  

AbstractMotivationGenome and proteome annotation pipelines are generally custom built and therefore not easily reusable by other groups, which leads to duplication of effort, increased costs, and suboptimal results. One cost-effective way to increase the data quality in public databases is to encourage the adoption of annotation standards and technological solutions that enable the sharing of biological knowledge and tools for genome and proteome annotation.ResultsWe have translated the rules of our HAMAP proteome annotation pipeline to queries in the W3C standard SPARQL 1.1 syntax and applied them with two off-the-shelf SPARQL engines to UniProtKB/Swiss-Prot protein sequences described in RDF format. This approach is applicable to any genome or proteome annotation pipeline and greatly simplifies their reuse.AvailabilityHAMAP SPARQL rules and documentation are freely available for download from the HAMAP FTP site ftp://ftp.expasy.org/databases/hamap/hamapsparql.tar.gz under a CC-BY-ND 4.0 license. The annotations generated by the rules are under the CC-BY 4.0 [email protected] informationSupplementary data are included at the end of this document.

2017 ◽  
Author(s):  
Robert J. Vickerstaff ◽  
Richard J. Harrison

AbstractSummaryCrosslink is genetic mapping software for outcrossing species designed to run efficiently on large datasets by combining the best from existing tools with novel approaches. Tests show it runs much faster than several comparable programs whilst retaining a similar accuracy.Availability and implementationAvailable under the GNU General Public License version 2 from https://github.com/eastmallingresearch/[email protected] informationSupplementary data are available at Bioinformatics online and from https://github.com/eastmallingresearch/crosslink/releases/tag/v0.5.


2018 ◽  
Author(s):  
Sebastian Deorowicz ◽  
Agnieszka Danek

AbstractSummaryNowadays large sequencing projects handle tens of thousands of individuals. The huge files summarizing the findings definitely require compression. We propose a tool able to compress large collections of genotypes as well as single samples in such projects to sizes not achievable to date.Availability and Implementationhttps://github.com/refresh-bio/[email protected] informationSupplementary data are available at publisher’s Web site.


2019 ◽  
Author(s):  
Jouni Sirén ◽  
Erik Garrison ◽  
Adam M. Novak ◽  
Benedict Paten ◽  
Richard Durbin

AbstractMotivationThe variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are nonbiological, unlikely recombinations of true haplotypes.ResultsWe augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheelertransform (GBWT). We demonstrate the scalability of the new implementation by building a whole-genome index of the 5,008 haplotypes of the 1000 Genomes Project, and an index of all 108,070 TOPMed Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes.AvailabilityOur software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt, and https://github.com/jltsiren/[email protected] informationSupplementary data are available.


2016 ◽  
Author(s):  
Rohan Dandage ◽  
Kausik Chakraborty

SummaryHigh throughput genotype to phenotype (G2P) data is increasingly being generated by widely applicable Deep Mutational Scanning (DMS) method. dms2dfe is a comprehensive end-to-end workflow that addresses critical issue with noise reduction and offers variety of crucial downstream analyses. Noise reduction is carried out by normalizing counts of mutants by depth of sequencing and subsequent dispersion shrinkage at the level of calculation of preferential enrichments. In downstream analyses, dms2dfe workflow provides identification of relative selection pressures, potential molecular constraints and generation of data-rich visualizations.Availabilitydms2dfe is implemented as a python package and it is available at https://kc-lab.github.io/[email protected], [email protected] informationSupplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Brent S. Pedersen ◽  
Aaron R. Quinlan

AbstractMotivationExtracting biological insight from genomic data inevitably requires custom software. In many cases, this is accomplished with scripting languages, owing to their accessibility and brevity. Unfortunately, the ease of scripting languages typically comes at a substantial performance cost that is especially acute with the scale of modern genomics datasets.ResultsWe present hts-nim, a high-performance library written in the Nim programming language that provides a simple, scripting-like syntax without sacrificing performance.Availabilityhts-nim is available at https://github.com/brentp/hts-nim and the example tools are at https://github.com/brentp/hts-nim-tools both under the MIT [email protected] informationSupplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Bruno Henrique Ribeiro Da Fonseca ◽  
Douglas Silva Domingues ◽  
Alexandre Rossi Paschoal

AbstractMotivationMirtrons are originated from short introns with atypical cleavage from the miRNA canonical pathway by using the splicing mechanism. Several studies describe mirtrons in chordates, invertebrates and plants but in the current literature there is no repository that centralizes and organizes these public and available data. To fill this gap, we created the first knowledge database dedicated to mirtron, called mirtronDB, available at http://mirtrondb.cp.utfpr.edu.br/. MirtronDB has a total of 1,407 mirtron precursors and 2,426 mirtron mature sequences in 18 species.ResultsThrough a user-friendly interface, users can browse and search mirtrons by organism, organism group, type and name. MirtronDB is a specialized resource to explore mirtrons and their regulations, providing free, user-friendly access to knowledge on mirtron data.AvailabilityMirtronDB is available at http://mirtrondb.cp.utfpr.edu.br/[email protected] informationSupplementary data are available.


2016 ◽  
Author(s):  
Dengfeng Guan ◽  
Bo Liu ◽  
Yadong Wang

AbstractSummaryIn metagenomic studies, fast and effective tools are on wide demand to implement taxonomy classification for upto billions of reads. Herein, we propose deSPI, a novel read classification method that classifies reads by recognizing and analyzing the matches between reads and reference with de Bruijn graph-based lightweight reference indexing. deSPI has faster speed with relatively small memory footprint, meanwhile, it can also achieve higher or similar sensitivity and accuracy.Availabilitythe C++ source code of deSPI is available at https://github.com/hitbc/[email protected] informationSupplementary data are available at Bioinformatics online.


2016 ◽  
Author(s):  
Stephen G. Gaffney ◽  
Jeffrey P. Townsend

ABSTRACTSummaryPathScore quantifies the level of enrichment of somatic mutations within curated pathways, applying a novel approach that identifies pathways enriched across patients. The application provides several user-friendly, interactive graphic interfaces for data exploration, including tools for comparing pathway effect sizes, significance, gene-set overlap and enrichment differences between projects.Availability and ImplementationWeb application available at pathscore.publichealth.yale.edu. Site implemented in Python and MySQL, with all major browsers supported. Source code available at github.com/sggaffney/pathscore with a GPLv3 [email protected] InformationAdditional documentation can be found at http://pathscore.publichealth.yale.edu/faq.


2021 ◽  
pp. jgs2021-030
Author(s):  
Catherine E. Boddy ◽  
Emily G. Mitchell ◽  
Andrew Merdith ◽  
Alexander G. Liu

Macrofossils of the late Ediacaran Period (c. 579–539 Ma) document diverse, complex multicellular eukaryotes, including early animals, prior to the Cambrian radiation of metazoan phyla. To investigate the relationships between environmental perturbations, biotic responses and early metazoan evolutionary trajectories, it is vital to distinguish between evolutionary and ecological controls on the global distribution of Ediacaran macrofossils. The contributions of temporal, palaeoenvironmental and lithological factors in shaping the observed variations in assemblage taxonomic composition between Ediacaran macrofossil sites are widely discussed, but the role of palaeogeography remains ambiguous. Here we investigate the influence of palaeolatitude on the spatial distribution of Ediacaran macrobiota through the late Ediacaran Period using two leading palaeogeographical reconstructions. We find that overall generic diversity was distributed across all palaeolatitudes. Among specific groups, the distributions of candidate ‘Bilateral’ and Frondomorph taxa exhibit weakly statistically significant and statistically significant differences between low and high palaeolatitudes within our favoured palaeogeographical reconstruction, respectively, whereas Algal, Tubular, Soft-bodied and Biomineralizing taxa show no significant difference. The recognition of statistically significant palaeolatitudinal differences in the distribution of certain morphogroups highlights the importance of considering palaeolatitudinal influences when interrogating trends in Ediacaran taxon distributions.Supplementary material: Supplementary information, data and code are available at https://doi.org/10.6084/m9.figshare.c.5488945Thematic collection: This article is part of the Advances in the Cambrian Explosion collection available at: https://www.lyellcollection.org/cc/advances-cambrian-explosion


2017 ◽  
Author(s):  
Marco Enrico Piras ◽  
Luca Pireddu ◽  
Gianluigi Zanetti

ABSTRACTMotivationWorkflow managers for scientific analysis provide a high-level programming platform facilitating standardization, automation, collaboration and access to sophisticated computing resources. The Galaxy workflow manager provides a prime example of this type of platform. As compositions of simpler tools, workflows effectively comprise specialized computer programs implementing often very complex analysis procedures. To date, no simple way exists to automatically test Galaxy workflows and ensure their correctness has appeared in the literature.ResultsWith wft4galaxy we offer a tool to bring automated testing to Galaxy workflows, making it feasible to bring continuous integration to their development and ensuring that defects are detected promptly. wft4galaxy can be easily installed as a regular Python program or launched directly as a Docker container – the latter reducing installation effort to a minimum.Availabilitywft4galaxy is available online at https://github.com/phnmnl/wft4galaxy under the Academic Free License v3.0.Supplementary informationSupplementary information is available at http://wft4galaxy.readthedocs.io.


Sign in / Sign up

Export Citation Format

Share Document