Temporal network alignment via GoT-WAVE

David Aparício; Pedro Ribeiro; Tijana Milenković; Fernando Silva

doi:10.1093/bioinformatics/btz119

Temporal network alignment via GoT-WAVE

Bioinformatics ◽

10.1093/bioinformatics/btz119 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3527-3529 ◽

Cited By ~ 3

Author(s):

David Aparício ◽

Pedro Ribeiro ◽

Tijana Milenković ◽

Fernando Silva

Keyword(s):

User Interface ◽

State Of The Art ◽

Source Code ◽

Network Alignment ◽

Supplementary Information ◽

Temporal Network ◽

Temporal Networks ◽

Supplementary Data ◽

Node Similarity ◽

User Friendly

Abstract Motivation Network alignment (NA) finds conserved regions between two networks. NA methods optimize node conservation (NC) and edge conservation. Dynamic graphlet degree vectors are a state-of-the-art dynamic NC measure, used within the fastest and most accurate NA method for temporal networks: DynaWAVE. Here, we use graphlet-orbit transitions (GoTs), a different graphlet-based measure of temporal node similarity, as a new dynamic NC measure within DynaWAVE, resulting in GoT-WAVE. Results On synthetic networks, GoT-WAVE improves DynaWAVE’s accuracy by 30% and speed by 64%. On real networks, when optimizing only dynamic NC, the methods are complementary. Furthermore, only GoT-WAVE supports directed edges. Hence, GoT-WAVE is a promising new temporal NA algorithm, which efficiently optimizes dynamic NC. We provide a user-friendly user interface and source code for GoT-WAVE. Availability and implementation http://www.dcc.fc.up.pt/got-wave/ Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

PATO: Pangenome Analysis Toolkit

10.1101/2021.01.30.428878 ◽

2021 ◽

Author(s):

Miguel D. Fernández-de-Bobadilla ◽

Alba Talavera-Rodríguez ◽

Lucía Chacón ◽

Fernando Baquero ◽

Teresa M. Coque ◽

...

Keyword(s):

Population Structure ◽

Statistical Analysis ◽

Core Genome ◽

State Of The Art ◽

Source Code ◽

Supplementary Information ◽

Complete Analysis ◽

Large Set ◽

Supplementary Data ◽

Desktop Computer

AbstractMotivationComparative genomics is a growing field but one that will be eventually overtaken by sample size studies and the increase of available genomes in public databases. We present the Pangenome Analysis Toolkit (PATO) designed to simultaneously analyze thousands of genomes using a desktop computer. The tool performs common tasks of pangenome analysis such as core-genome definition and accessory genome properties and includes new features that help characterize population structure, annotate pathogenic features and create gene sharedness networks. PATO has been developed in R to integrate with the large set of tools available for genetic, phylogenetic and statistical analysis in this environment.ResultsPATO can perform the most demanding bioinformatic analyses in minutes with an accuracy comparable to state-of-the-art software but 20–30x times faster. PATO also integrates all the necessary functions for the complete analysis of the most common objectives in microbiology studies. Lastly, PATO includes the necessary tools for visualizing the results and can be integrated with other analytical packages available in R.AvailabilityThe source code for PATO is freely available at https://github.com/irycisBioinfo/PATO under the GPLv3 [email protected] informationSupplementary data are available at Bioinformatics online

Download Full-text

mixtureS: a novel tool for bacterial strain genome reconstruction from reads

Bioinformatics ◽

10.1093/bioinformatics/btaa728 ◽

2020 ◽

Author(s):

Xin Li ◽

Haiyan Hu ◽

Xiaoman Li

Keyword(s):

Environmental Samples ◽

De Novo ◽

Source Code ◽

Supplementary Information ◽

Supplementary Data ◽

Bacterial Strains ◽

Metagenomic Sample ◽

Almost All ◽

User Friendly ◽

Strain Genome

Abstract Motivation It is essential to study bacterial strains in environmental samples. Existing methods and tools often depend on known strains or known variations, cannot work on individual samples, not reliable, or not easy to use, etc. It is thus important to develop more user-friendly tools that can identify bacterial strains more accurately. Results We developed a new tool called mixtureS that can de novo identify bacterial strains from shotgun reads of a clonal or metagenomic sample, without prior knowledge about the strains and their variations. Tested on 243 simulated datasets and 195 experimental datasets, mixtureS reliably identified the strains, their numbers and their abundance. Compared with three tools, mixtureS showed better performance in almost all simulated datasets and the vast majority of experimental datasets. Availability and implementation The source code and tool mixtureS is available at http://www.cs.ucf.edu/˜xiaoman/mixtureS/. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

MolRep: A Deep Representation Learning Library for Molecular Property Prediction

10.1101/2021.01.13.426489 ◽

2021 ◽

Author(s):

Jiahua Rao ◽

Shuangjia Zheng ◽

Ying Song ◽

Jianwen Chen ◽

Chengtao Li ◽

...

Keyword(s):

State Of The Art ◽

Source Code ◽

Representation Learning ◽

Supplementary Information ◽

Data Sets ◽

Supplementary Data ◽

Property Prediction ◽

Average Rank ◽

Benchmark Data ◽

Classification Tasks

AbstractSummaryRecently, novel representation learning algorithms have shown potential for predicting molecular properties. However, unified frameworks have not yet emerged for fairly measuring algorithmic progress, and experimental procedures of different representation models often lack rigorousness and are hardly reproducible. Herein, we have developed MolRep by unifying 16 state-of-the-art models across 4 popular molecular representations for application and comparison. Furthermore, we ran more than 12.5 million experiments to optimize hyperparameters for each method on 12 common benchmark data sets. As a result, CMPNN achieves the best results ranked the 1st in 5 out of 12 tasks with an average rank of 1.75. Relatively, ECC has good performance in classification tasks and MAT good for regression (both ranked 1st for 3 tasks) with an average rank of 2.71 and 2.6, respectively.AvailabilityThe source code is available at: https://github.com/biomed-AI/MolRepSupplementary informationSupplementary data are available online.

Download Full-text

Nubeam-dedup: a fast and RAM-efficient tool to de-duplicate sequencing reads without mapping

Bioinformatics ◽

10.1093/bioinformatics/btaa112 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3254-3256 ◽

Cited By ~ 2

Author(s):

Hang Dai ◽

Yongtao Guan

Keyword(s):

Hash Function ◽

Reference Genome ◽

State Of The Art ◽

Source Code ◽

Supplementary Information ◽

Supplementary Data ◽

Efficient Tool ◽

Cpu Time ◽

Products Of Matrices

Abstract Summary We present Nubeam-dedup, a fast and RAM-efficient tool to de-duplicate sequencing reads without reference genome. Nubeam-dedup represents nucleotides by matrices, transforms reads into products of matrices, and based on which assigns a unique number to a read. Thus, duplicate reads can be efficiently removed by using a collisionless hash function. Compared with other state-of-the-art reference-free tools, Nubeam-dedup uses 50–70% of CPU time and 10–15% of RAM. Availability and implementation Source code in C++ and manual are available at https://github.com/daihang16/nubeamdedup and https://haplotype.org. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

ATLAS: Analysis Tools for Low-depth and Ancient Samples

10.1101/105346 ◽

2017 ◽

Cited By ~ 22

Author(s):

Vivian Link ◽

Athanasios Kousathanas ◽

Krishna Veeramah ◽

Christian Sell ◽

Amelie Scheu ◽

...

Keyword(s):

Genetic Diversity ◽

Ancient Dna ◽

State Of The Art ◽

Supplementary Information ◽

Post Mortem ◽

Supplementary Data ◽

Supplementary Material ◽

C Program ◽

User Friendly ◽

Proper Analysis

AbstractSummaryPost-mortem damage (PMD) obstructs the proper analysis of ancient DNA samples and can currently only be addressed by removing or down-weighting potentially damaged data. Here we present ATLAS, a suite of methods to accurately genotype and estimate genetic diversity from ancient samples, while accounting for PMD. It works directly from raw BAM files and enables the building of complete and customized pipelines for the analysis of ancient and other low-depth samples in a very user-friendly way. Based on simulations we show that, in the presence of PMD, a dedicated pipeline of ATLAS calls genotypes more accurately than the state-of-the-art pipeline of GATK combined with mapDamage 2.0.AvailabilityATLAS is an open-source C++ program freely available at https://bitbucket.org/phaentu/[email protected] informationSupplementary data are available at Bioinformatics online.

Download Full-text

PATO: Pangenome Analysis Toolkit

Bioinformatics ◽

10.1093/bioinformatics/btab697 ◽

2021 ◽

Author(s):

Miguel D Fernández-de-Bobadilla ◽

Alba Talavera-Rodríguez ◽

Lucía Chacón ◽

Fernando Baquero ◽

Teresa M Coque ◽

...

Keyword(s):

Population Structure ◽

Statistical Analysis ◽

Core Genome ◽

State Of The Art ◽

Source Code ◽

Supplementary Information ◽

Complete Analysis ◽

Large Set ◽

Supplementary Data ◽

Desktop Computer

Abstract Motivation We present the Pangenome Analysis Toolkit (PATO) designed to simultaneously analyze thousands of genomes using a desktop computer. The tool performs common tasks of pangenome analysis such as core-genome definition and accessory genome properties and includes new features that help characterize population structure, annotate pathogenic features and create gene sharedness networks. PATO has been developed in R to integrate with the large set of tools available for genetic, phylogenetic and statistical analysis in this environment. Results PATO can perform the most demanding bioinformatic analyses in minutes with an accuracy comparable to state-of-the-art software but 20–30x times faster. PATO also integrates all the necessary functions for the complete analysis of the most common objectives in microbiology studies. Lastly, PATO includes the necessary tools for visualizing the results and can be integrated with other analytical packages available in R. Availability The source code for PATO is freely available at https://github.com/irycisBioinfo/PATO under the GPLv3 license. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

KEC: unique sequence search by K-mer exclusion

Bioinformatics ◽

10.1093/bioinformatics/btab196 ◽

2021 ◽

Author(s):

Pavel Beran ◽

Dagmar Stehlíková ◽

Stephen P Cohen ◽

Vladislav Čurn

Keyword(s):

Amino Acid ◽

Nucleic Acid ◽

Source Code ◽

Unique Sequence ◽

Supplementary Information ◽

Supplementary Data ◽

Laptop Computers ◽

Sequence Search ◽

Target Sequences ◽

Cross Reference

Abstract Summary Searching for amino acid or nucleic acid sequences unique to one organism may be challenging depending on size of the available datasets. K-mer elimination by cross-reference (KEC) allows users to quickly and easily find unique sequences by providing target and non-target sequences. Due to its speed, it can be used for datasets of genomic size and can be run on desktop or laptop computers with modest specifications. Availability and implementation KEC is freely available for non-commercial purposes. Source code and executable binary files compiled for Linux, Mac and Windows can be downloaded from https://github.com/berybox/KEC. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

CorGAT: a tool for the functional annotation of SARS-CoV-2 genomes

Bioinformatics ◽

10.1093/bioinformatics/btaa1047 ◽

2020 ◽

Author(s):

Matteo Chiara ◽

Federico Zambelli ◽

Marco Antonio Tangaro ◽

Pietro Mandreoli ◽

David S Horner ◽

...

Keyword(s):

Functional Annotation ◽

Ad Hoc ◽

State Of The Art ◽

Supplementary Information ◽

Genomic Sequences ◽

Supplementary Data ◽

Evolutionary Patterns ◽

Genomic Variants ◽

Art Methods ◽

Available Resources

Abstract Summary While over 200 000 genomic sequences are currently available through dedicated repositories, ad hoc methods for the functional annotation of SARS-CoV-2 genomes do not harness all currently available resources for the annotation of functionally relevant genomic sites. Here, we present CorGAT, a novel tool for the functional annotation of SARS-CoV-2 genomic variants. By comparisons with other state of the art methods we demonstrate that, by providing a more comprehensive and rich annotation, our method can facilitate the identification of evolutionary patterns in the genome of SARS-CoV-2. Availabilityand implementation Galaxy http://corgat.cloud.ba.infn.it/galaxy; software: https://github.com/matteo14c/CorGAT/tree/Revision_V1; docker: https://hub.docker.com/r/laniakeacloud/galaxy_corgat. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

UniBioDicts: Unified access to Biological Dictionaries

Bioinformatics ◽

10.1093/bioinformatics/btaa1065 ◽

2020 ◽

Author(s):

John Zobolas ◽

Vasundra Touré ◽

Martin Kuiper ◽

Steven Vercruysse

Keyword(s):

User Interface ◽

Life Science ◽

Biological Data ◽

Supplementary Information ◽

Supplementary Data ◽

Query Interface ◽

Controlled Vocabularies ◽

Search String ◽

Software Packages ◽

The Right

Abstract Summary We present a set of software packages that provide uniform access to diverse biological vocabulary resources that are instrumental for current biocuration efforts and tools. The Unified Biological Dictionaries (UniBioDicts or UBDs) provide a single query-interface for accessing the online API services of leading biological data providers. Given a search string, UBDs return a list of matching term, identifier and metadata units from databases (e.g. UniProt), controlled vocabularies (e.g. PSI-MI) and ontologies (e.g. GO, via BioPortal). This functionality can be connected to input fields (user-interface components) that offer autocomplete lookup for these dictionaries. UBDs create a unified gateway for accessing life science concepts, helping curators find annotation terms across resources (based on descriptive metadata and unambiguous identifiers), and helping data users search and retrieve the right query terms. Availability and implementation The UBDs are available through npm and the code is available in the GitHub organisation UniBioDicts (https://github.com/UniBioDicts) under the Affero GPL license. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

BioCommons: a robust java library for RNA structural bioinformatics

Bioinformatics ◽

10.1093/bioinformatics/btab069 ◽

2021 ◽

Author(s):

Tomasz Zok

Keyword(s):

Source Code ◽

Structural Bioinformatics ◽

Supplementary Information ◽

Supplementary Data ◽

Bioinformatic Tools ◽

Data Formats ◽

Central Repository ◽

Diverse Data ◽

2D And 3D ◽

Java Library

Abstract Motivation Biomolecular structures come in multiple representations and diverse data formats. Their incompatibility with the requirements of data analysis programs significantly hinders the analytics and the creation of new structure-oriented bioinformatic tools. Therefore, the need for robust libraries of data processing functions is still growing. Results BioCommons is an open-source, Java library for structural bioinformatics. It contains many functions working with the 2D and 3D structures of biomolecules, with a particular emphasis on RNA. Availability and implementation The library is available in Maven Central Repository and its source code is hosted on GitHub: https://github.com/tzok/BioCommons Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text