Tysserand - Fast reconstruction of spatial networks from bioimages

Mapping Intimacies ◽

10.1101/2020.11.16.385377 ◽

2020 ◽

Author(s):

Alexis Coullomb ◽

Vera Pancaldi

Keyword(s):

Supplementary Information ◽

Spatial Networks ◽

Supplementary Data ◽

New Methods ◽

Link Type ◽

Spatially Resolved ◽

Fast Reconstruction ◽

Bioinformatics Community

Abstract. Motivation: Networks provide a powerful framework to analyze spatial omics experiments. However, we lack tools that integrate several methods to easily reconstruct networks for further analyses with dedicated libraries. In addition, choosing the appropriate method and parameters can be challenging. Summary: We propose tysserand, a Python library to reconstruct spatial networks from spatially resolved omics experiments. It is intended as a common tool where the bioinformatics community can add new methods to reconstruct networks, choose appropriate parameters, clean resulting networks and pipe data to other libraries. Availability: tysserand software and tutorials with a Jupyter notebook to reproduce the results are available at https://github.com/VeraPancaldiLab/[email protected] Supplementary information: Supplementary data are available at bioRxiv online.
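As a rough illustration of what reconstructing a spatial network from node positions can mean, the sketch below links every pair of points that lie within a fixed distance of each other. It does not use tysserand and does not claim to reproduce any of its methods; the function name `radius_network`, the coordinates, and the radius are all hypothetical choices made for this example.

```python
from itertools import combinations
from math import dist


def radius_network(coords, radius):
    """Return edges (i, j), i < j, for node pairs at most `radius` apart."""
    edges = []
    for (i, p), (j, q) in combinations(enumerate(coords), 2):
        # Euclidean distance between the two node positions
        if dist(p, q) <= radius:
            edges.append((i, j))
    return edges


# Hypothetical cell centroids (x, y), e.g. taken from a segmented image
cells = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
edges = radius_network(cells, radius=1.5)
print(edges)
```

The all-pairs loop is quadratic in the number of nodes, so it only suits small inputs; the choice of radius is exactly the kind of parameter the abstract describes as hard to pick.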


pyseer: a comprehensive tool for microbial pangenome-wide association studies

10.1101/266312 ◽

2018 ◽

Cited By ~ 1

Author(s):

John A Lees ◽

Marco Galardini ◽

Stephen D Bentley ◽

Jeffrey N Weiser ◽

Jukka Corander

Keyword(s):

Input Data ◽

Association Studies ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Supplementary Data ◽

New Methods ◽

Link Type ◽

Genome Wide

Abstract. Summary: Genome-wide association studies (GWAS) in microbes face different challenges to eukaryotes and have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility of the input data used, and adds new methods to interpret the association results. Availability and Implementation: pyseer is written in Python and is freely available at https://github.com/mgalardini/pyseer, or can be installed through pip. Documentation and a tutorial are available at http://[email protected] and [email protected] Supplementary information: Supplementary data are available online.


Crosslink: A fast, scriptable genetic mapper for outcrossing species

10.1101/135277 ◽

2017 ◽

Cited By ~ 6

Author(s):

Robert J. Vickerstaff ◽

Richard J. Harrison

Keyword(s):

Large Datasets ◽

Supplementary Information ◽

Supplementary Data ◽

Link Type ◽

Mapping Software ◽

Outcrossing Species ◽

Supplementary Material ◽

Novel Approaches ◽

Similar Accuracy ◽

General Public License

Abstract. Summary: Crosslink is genetic mapping software for outcrossing species designed to run efficiently on large datasets by combining the best from existing tools with novel approaches. Tests show it runs much faster than several comparable programs whilst retaining a similar accuracy. Availability and implementation: Available under the GNU General Public License version 2 from https://github.com/eastmallingresearch/[email protected] Supplementary information: Supplementary data are available at Bioinformatics online and from https://github.com/eastmallingresearch/crosslink/releases/tag/v0.5.


PhyloFold: Precise and Swift Prediction of RNA Secondary Structures to Incorporate Phylogeny among Homologs

10.1101/2020.03.05.975797 ◽

2020 ◽

Author(s):

Masaki Tagashira

Keyword(s):

Secondary Structure ◽

Rna Secondary Structure ◽

Prediction Accuracy ◽

Structural Alignment ◽

Source Code ◽

Secondary Structures ◽

Supplementary Information ◽

Supplementary Data ◽

Link Type ◽

Structural Alignments

Abstract. Motivation: The simultaneous consideration of sequence alignment and RNA secondary structure, or structural alignment, is known to help predict more accurate secondary structures of homologs. However, the consideration is heavy and can be done only roughly to decompose structural alignments. Results: The PhyloFold method, which predicts secondary structures of homologs considering likely pairwise structural alignments, was developed in this study. The method shows the best prediction accuracy while demanding comparable running time compared to conventional methods. Availability: The source code of the programs implemented in this study is available on “https://github.com/heartsh/phylofold” and “https://github.com/heartsh/phyloalifold”. Contact: “[email protected]”. Supplementary information: Supplementary data are available.


M3Drop: dropout-based feature selection for scRNASeq

Bioinformatics ◽

10.1093/bioinformatics/bty1044 ◽

2018 ◽

Vol 35 (16) ◽

pp. 2865-2867 ◽

Cited By ~ 61

Author(s):

Tallulah S Andrews ◽

Martin Hemberg

Keyword(s):

Feature Selection ◽

Single Cell ◽

R Package ◽

Supplementary Information ◽

Supplementary Data ◽

Selection Methods ◽

Functional Responses ◽

Technical Noise ◽

New Methods ◽

Selection For

Abstract. Motivation: Most genomes contain thousands of genes, but for most functional responses, only a subset of those genes are relevant. To facilitate many single-cell RNASeq (scRNASeq) analyses the set of genes is often reduced through feature selection, i.e. by removing genes only subject to technical noise. Results: We present M3Drop, an R package that implements popular existing feature selection methods and two novel methods which take advantage of the prevalence of zeros (dropouts) in scRNASeq data to identify features. We show these new methods outperform existing methods on simulated and real datasets. Availability and implementation: M3Drop is freely available on GitHub as an R package and is compatible with other popular scRNASeq tools: https://github.com/tallulandrews/M3Drop. Supplementary information: Supplementary data are available at Bioinformatics online.
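To make the notion of dropouts concrete, the sketch below computes, for each gene, the fraction of cells with a zero count alongside the mean count. This is only the raw quantity the abstract refers to, written in Python rather than R; it is not M3Drop's selection procedure, and the function names and toy counts are hypothetical.

```python
def dropout_rates(counts):
    """Map each gene to the fraction of cells in which its count is zero."""
    return {
        gene: sum(1 for c in cells if c == 0) / len(cells)
        for gene, cells in counts.items()
    }


def mean_expression(counts):
    """Map each gene to its mean count across cells."""
    return {gene: sum(cells) / len(cells) for gene, cells in counts.items()}


# Hypothetical toy data: gene -> counts in four cells
counts = {"geneA": [0, 0, 5, 0], "geneB": [3, 2, 4, 1]}
rates = dropout_rates(counts)
means = mean_expression(counts)
print(rates, means)
```

A dropout-based selection method would then compare each gene's dropout rate with what its mean expression would lead one to expect; how that expectation is modelled is left out here on purpose.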


VCFShark: how to squeeze a VCF file

10.1101/2020.12.18.423437 ◽

2020 ◽

Author(s):

Sebastian Deorowicz ◽

Agnieszka Danek

Keyword(s):

Web Site ◽

Supplementary Information ◽

Supplementary Data ◽

Link Type ◽

Order Of Magnitude ◽

Better Than ◽

De Facto Standards

Abstract. Summary: The VCF files with results of sequencing projects take a lot of space. We propose VCFShark squeezing them up to an order of magnitude better than the de facto standards (gzipped VCF and BCF). Availability and Implementation: https://github.com/refresh-bio/[email protected] Supplementary information: Supplementary data are available at publisher’s Web site.


PyRanges: efficient comparison of genomic intervals in Python

10.1101/609396 ◽

2019 ◽

Cited By ~ 1

Author(s):

Endre Bakken Stovner ◽

Pål Sætrom

Keyword(s):

Supplementary Information ◽

Supplementary Data ◽

Genomic Libraries ◽

Link Type ◽

Simple Set ◽

Set Operations ◽

Wide Range ◽

Genomic Analyses ◽

Associated Data ◽

Memory Efficient

Abstract. Summary: Complex genomic analyses often use sequences of simple set operations like intersection, overlap, and nearest on genomic intervals. These operations, coupled with some custom programming, allow a wide range of analyses to be performed. To this end, we have written PyRanges, a data structure for representing and manipulating genomic intervals and their associated data in Python. Run single-threaded on binary set operations, PyRanges is in median 2.3-9.6 times faster than the popular R GenomicRanges library and is equally memory efficient; run multi-threaded on 8 cores, our library is up to 123 times faster. PyRanges is therefore ideally suited both for individual analyses and as a foundation for future genomic libraries in Python. Availability: PyRanges is available open-source under the MIT license at https://github.com/biocore-NTNU/pyranges and documentation exists at https://biocore-NTNU.github.io/pyranges/[email protected] Supplementary information: Supplementary data are available.
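The intersection operation mentioned in the abstract can be sketched in plain Python as a two-pointer sweep over sorted intervals. This is an illustrative stand-in, not the PyRanges API or its implementation; the function name `intersect` and the sample intervals are assumptions, and intervals are treated as half-open `(start, end)` pairs on a single chromosome.

```python
def intersect(a, b):
    """Intersect two lists of half-open (start, end) intervals.

    Each list must be sorted by start and free of internal overlaps.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # the two current intervals share at least one base
            out.append((start, end))
        # Advance whichever interval finishes first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical intervals on one chromosome
a = [(1, 5), (10, 15)]
b = [(3, 12), (14, 20)]
result = intersect(a, b)
print(result)
```

The sweep visits each interval once, so it runs in time linear in the combined length of the two lists once they are sorted.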


GTShark: Genotype compression in large project

10.1101/494104 ◽

2018 ◽

Author(s):

Sebastian Deorowicz ◽

Agnieszka Danek

Keyword(s):

Web Site ◽

Supplementary Information ◽

Supplementary Data ◽

Link Type ◽

Large Project ◽

Supplementary Material

Abstract. Summary: Nowadays large sequencing projects handle tens of thousands of individuals. The huge files summarizing the findings definitely require compression. We propose a tool able to compress large collections of genotypes as well as single samples in such projects to sizes not achievable to date. Availability and Implementation: https://github.com/refresh-bio/[email protected] Supplementary information: Supplementary data are available at publisher’s Web site.


Varstation: a complete and efficient tool to support NGS data analysis

10.1101/833582 ◽

2019 ◽

Author(s):

ACO Faria ◽

MP Caraciolo ◽

RM Minillo ◽

TF Almeida ◽

SM Pereira ◽

...

Keyword(s):

Genetic Variation ◽

Data Analysis ◽

Supplementary Information ◽

Human Genetic Variation ◽

Supplementary Data ◽

Efficient Tool ◽

Link Type ◽

Data Processor ◽

Ngs Data Analysis ◽

Ngs Data

Abstract. Summary: Varstation is a cloud-based NGS data processor and analyzer for human genetic variation. This resource provides a customizable, centralized, safe and clinically validated environment aiming to improve and optimize the flow of NGS analyses and reports related with clinical and research genetics. Availability and implementation: Varstation is freely available at http://varstation.com, for academic [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


Haplotype-aware graph indexes

10.1101/559583 ◽

2019 ◽

Cited By ~ 7

Author(s):

Jouni Sirén ◽

Erik Garrison ◽

Adam M. Novak ◽

Benedict Paten ◽

Richard Durbin

Keyword(s):

Genetic Variation ◽

Chromosome 17 ◽

Supplementary Information ◽

Whole Genome ◽

Supplementary Data ◽

1000 Genomes Project ◽

1000 Genomes ◽

Link Type ◽

Supplementary Material ◽

Haplotype Information

Abstract. Motivation: The variation graph toolkit (VG) represents genetic variation as a graph. Although each path in the graph is a potential haplotype, most paths are nonbiological, unlikely recombinations of true haplotypes. Results: We augment the VG model with haplotype information to identify which paths are more likely to exist in nature. For this purpose, we develop a scalable implementation of the graph extension of the positional Burrows–Wheeler transform (GBWT). We demonstrate the scalability of the new implementation by building a whole-genome index of the 5,008 haplotypes of the 1000 Genomes Project, and an index of all 108,070 TOPMed Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for simplifying variation graphs for k-mer indexing without losing any k-mers in the haplotypes. Availability: Our software is available at https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt, and https://github.com/jltsiren/[email protected] Supplementary information: Supplementary data are available.


Crumble: reference free lossy compression of sequence quality values

10.1101/243030 ◽

2018 ◽

Cited By ~ 1

Author(s):

James K Bonfield ◽

Shane A McCarthy ◽

Richard Durbin

Keyword(s):

Variant Calling ◽

Lossy Compression ◽

Supplementary Information ◽

Supplementary Data ◽

Test Set ◽

Link Type ◽

Fold Reduction ◽

Space Saving ◽

Sequence Quality

Abstract. Motivation: The bulk of space taken up by NGS sequencing CRAM files consists of per-base quality values. Most of these are unnecessary for variant calling, offering an opportunity for space saving. Results: On the CHM1+CHM13 test set, a 17 fold reduction in quality storage can be achieved while maintaining variant calling accuracy. Availability: Crumble is OpenSource and can be obtained from https://github.com/jkbonfield/[email protected] Supplementary information: Supplementary data are available.
