SMART: SuperMaximal approximate repeats tool

Lorraine A K Ayad; Panagiotis Charalampopoulos; Solon P Pissis

doi:10.1093/bioinformatics/btz953

SMART: SuperMaximal approximate repeats tool

Bioinformatics ◽

10.1093/bioinformatics/btz953 ◽

2019 ◽

Vol 36 (8) ◽

pp. 2589-2591

Author(s):

Lorraine A K Ayad ◽

Panagiotis Charalampopoulos ◽

Solon P Pissis

Keyword(s):

State Of The Art ◽

Input Sequence ◽

The State ◽

Supplementary Information ◽

Greedy Heuristics ◽

Supplementary Data ◽

Repeat Analysis ◽

Analysis Tools ◽

Speed Up

Abstract Summary State-of-the-art repeat analysis tools rely on extending maximal repeated pairs to enumerate maximal k-mismatch repeats. These pairs can be quadratic in n, the length of the input sequence, and thus greedy heuristics are applied to speed up the extension. Here, we introduce supermaximal k-mismatch repeats, which are linear in n and capture all maximal k-mismatch repeats: every maximal k-mismatch repeat is a substring of some supermaximal k-mismatch repeat. We present SMART, a tool based on recent algorithmic advances implemented in C++ to compute supermaximal k-mismatch repeats directly, and show that these elements are statistically much more significant than the output of the state-of-the-art. Availability and implementation http://github.com/lorrainea/smart (GNU GPL v3.0). Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Curse: building expression atlases and co-expression networks from public RNA-Seq data

Bioinformatics ◽

10.1093/bioinformatics/bty1052 ◽

2018 ◽

Vol 35 (16) ◽

pp. 2880-2881 ◽

Cited By ~ 6

Author(s):

Dries Vaneechoutte ◽

Klaas Vandepoele

Keyword(s):

Rna Sequencing ◽

State Of The Art ◽

The State ◽

Supplementary Information ◽

Supplementary Data ◽

Rna Seq ◽

Valuable Resource ◽

Sequencing Data ◽

Imperfect Quality ◽

Sequence Read Archive

Abstract Summary Public RNA-Sequencing (RNA-Seq) datasets are a valuable resource for transcriptome analyses, but their accessibility is hindered by the imperfect quality and presentation of their metadata and by the complexity of processing raw sequencing data. The Curse suite was created to alleviate these problems. It consists of an online curation tool named Curse to efficiently build compendia of experiments hosted on the Sequence Read Archive, and a lightweight pipeline named Prose to download and process the RNA-Seq data into expression atlases and co-expression networks. Curse networks showed improved linking of functionally related genes compared to the state-of-the-art. Availability and implementation Curse, Prose and their manuals are available at http://bioinformatics.psb.ugent.be/webtools/Curse/. Prose was implemented in Java. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

CorGAT: a tool for the functional annotation of SARS-CoV-2 genomes

Bioinformatics ◽

10.1093/bioinformatics/btaa1047 ◽

2020 ◽

Author(s):

Matteo Chiara ◽

Federico Zambelli ◽

Marco Antonio Tangaro ◽

Pietro Mandreoli ◽

David S Horner ◽

...

Keyword(s):

Functional Annotation ◽

Ad Hoc ◽

State Of The Art ◽

Supplementary Information ◽

Genomic Sequences ◽

Supplementary Data ◽

Evolutionary Patterns ◽

Genomic Variants ◽

Art Methods ◽

Available Resources

Abstract Summary While over 200 000 genomic sequences are currently available through dedicated repositories, ad hoc methods for the functional annotation of SARS-CoV-2 genomes do not harness all currently available resources for the annotation of functionally relevant genomic sites. Here, we present CorGAT, a novel tool for the functional annotation of SARS-CoV-2 genomic variants. By comparisons with other state of the art methods we demonstrate that, by providing a more comprehensive and rich annotation, our method can facilitate the identification of evolutionary patterns in the genome of SARS-CoV-2. Availabilityand implementation Galaxy http://corgat.cloud.ba.infn.it/galaxy; software: https://github.com/matteo14c/CorGAT/tree/Revision_V1; docker: https://hub.docker.com/r/laniakeacloud/galaxy_corgat. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

CPS analysis: self-contained validation of biomedical data clustering

Bioinformatics ◽

10.1093/bioinformatics/btaa165 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3516-3521 ◽

Cited By ~ 1

Author(s):

Lixiang Zhang ◽

Lin Lin ◽

Jia Li

Keyword(s):

Data Clustering ◽

State Of The Art ◽

R Package ◽

Research Community ◽

Supplementary Information ◽

Biomedical Data ◽

Data Generation ◽

Supplementary Data ◽

Point Set ◽

Class Labels

Abstract Motivation Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem barely addressed by the research community. Results We have developed a toolkit called covering point set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to effectively visualize the inherent variation in any cluster for data of high dimension, and provide more comprehensive view on potentially interesting subgroups in the data. Applying to three usage scenarios for biomedical data, we demonstrate that CPS analysis is more effective for evaluating uncertainty of clusters comparing to state-of-the-art measurements. We also showcase how to use CPS analysis to select data generation technologies or visualization methods. Availability and implementation The method is implemented in an R package called OTclust, available on CRAN. Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

miqoGraph: fitting admixture graphs using mixed-integer quadratic optimization

Bioinformatics ◽

10.1093/bioinformatics/btaa988 ◽

2020 ◽

Author(s):

Julia Yan ◽

Nick Patterson ◽

Vagheesh M Narasimhan

Keyword(s):

Genetic Relationship ◽

Real Data ◽

Quadratic Optimization ◽

Supplementary Information ◽

Mixed Integer ◽

Supplementary Data ◽

Integer Optimization ◽

Speed Up

Abstract Summary Admixture graphs represent the genetic relationship between a set of populations through splits, drift and admixture. In this article, we present the Julia package miqoGraph, which uses mixed-integer quadratic optimization to fit topology, drift lengths and admixture proportions simultaneously. Through applications of miqoGraph to both simulated and real data, we show that integer optimization can greatly speed up and automate what is usually an arduous manual process. Availability and implementation https://github.com/juliayyan/PhylogeneticTrees.jl. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Temporal network alignment via GoT-WAVE

Bioinformatics ◽

10.1093/bioinformatics/btz119 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3527-3529 ◽

Cited By ~ 3

Author(s):

David Aparício ◽

Pedro Ribeiro ◽

Tijana Milenković ◽

Fernando Silva

Keyword(s):

User Interface ◽

State Of The Art ◽

Source Code ◽

Network Alignment ◽

Supplementary Information ◽

Temporal Network ◽

Temporal Networks ◽

Supplementary Data ◽

Node Similarity ◽

User Friendly

Abstract Motivation Network alignment (NA) finds conserved regions between two networks. NA methods optimize node conservation (NC) and edge conservation. Dynamic graphlet degree vectors are a state-of-the-art dynamic NC measure, used within the fastest and most accurate NA method for temporal networks: DynaWAVE. Here, we use graphlet-orbit transitions (GoTs), a different graphlet-based measure of temporal node similarity, as a new dynamic NC measure within DynaWAVE, resulting in GoT-WAVE. Results On synthetic networks, GoT-WAVE improves DynaWAVE’s accuracy by 30% and speed by 64%. On real networks, when optimizing only dynamic NC, the methods are complementary. Furthermore, only GoT-WAVE supports directed edges. Hence, GoT-WAVE is a promising new temporal NA algorithm, which efficiently optimizes dynamic NC. We provide a user-friendly user interface and source code for GoT-WAVE. Availability and implementation http://www.dcc.fc.up.pt/got-wave/ Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

PATO: Pangenome Analysis Toolkit

10.1101/2021.01.30.428878 ◽

2021 ◽

Author(s):

Miguel D. Fernández-de-Bobadilla ◽

Alba Talavera-Rodríguez ◽

Lucía Chacón ◽

Fernando Baquero ◽

Teresa M. Coque ◽

...

Keyword(s):

Population Structure ◽

Statistical Analysis ◽

Core Genome ◽

State Of The Art ◽

Source Code ◽

Supplementary Information ◽

Complete Analysis ◽

Large Set ◽

Supplementary Data ◽

Desktop Computer

AbstractMotivationComparative genomics is a growing field but one that will be eventually overtaken by sample size studies and the increase of available genomes in public databases. We present the Pangenome Analysis Toolkit (PATO) designed to simultaneously analyze thousands of genomes using a desktop computer. The tool performs common tasks of pangenome analysis such as core-genome definition and accessory genome properties and includes new features that help characterize population structure, annotate pathogenic features and create gene sharedness networks. PATO has been developed in R to integrate with the large set of tools available for genetic, phylogenetic and statistical analysis in this environment.ResultsPATO can perform the most demanding bioinformatic analyses in minutes with an accuracy comparable to state-of-the-art software but 20–30x times faster. PATO also integrates all the necessary functions for the complete analysis of the most common objectives in microbiology studies. Lastly, PATO includes the necessary tools for visualizing the results and can be integrated with other analytical packages available in R.AvailabilityThe source code for PATO is freely available at https://github.com/irycisBioinfo/PATO under the GPLv3 [email protected] informationSupplementary data are available at Bioinformatics online

Download Full-text

Incrementally updating the high average-utility patterns with pre-large concept

Applied Intelligence ◽

10.1007/s10489-020-01743-y ◽

2020 ◽

Vol 50 (11) ◽

pp. 3788-3807

Author(s):

Jerry Chun-Wei Lin ◽

Matin Pirouz ◽

Youcef Djenouri ◽

Chien-Fu Cheng ◽

Usman Ahmed

Keyword(s):

State Of The Art ◽

The State ◽

Batch Mode ◽

Itemset Mining ◽

The Past ◽

Dynamic Databases ◽

Speed Up ◽

Average Utility ◽

High Utility ◽

High Utility Patterns

Abstract High-utility itemset mining (HUIM) is considered as an emerging approach to detect the high-utility patterns from databases. Most existing algorithms of HUIM only consider the itemset utility regardless of the length. This limitation raises the utility as a result of a growing itemset size. High average-utility itemset mining (HAUIM) considers the size of the itemset, thus providing a more balanced scale to measure the average-utility for decision-making. Several algorithms were presented to efficiently mine the set of high average-utility itemsets (HAUIs) but most of them focus on handling static databases. In the past, a fast-updated (FUP)-based algorithm was developed to efficiently handle the incremental problem but it still has to re-scan the database when the itemset in the original database is small but there is a high average-utility upper-bound itemset (HAUUBI) in the newly inserted transactions. In this paper, an efficient framework called PRE-HAUIMI for transaction insertion in dynamic databases is developed, which relies on the average-utility-list (AUL) structures. Moreover, we apply the pre-large concept on HAUIM. A pre-large concept is used to speed up the mining performance, which can ensure that if the total utility in the newly inserted transaction is within the safety bound, the small itemsets in the original database could not be the large ones after the database is updated. This, in turn, reduces the recurring database scans and obtains the correct HAUIs. Experiments demonstrate that the PRE-HAUIMI outperforms the state-of-the-art batch mode HAUI-Miner, and the state-of-the-art incremental IHAUPM and FUP-based algorithms in terms of runtime, memory, number of assessed patterns and scalability.

Download Full-text

HaploGrouper: a generalized approach to haplogroup classification

Bioinformatics ◽

10.1093/bioinformatics/btaa729 ◽

2020 ◽

Author(s):

Anuradha Jagadeesan ◽

S Sunna Ebenesersdóttir ◽

Valdis B Guðmundsdóttir ◽

Elisabet Linda Thordardottir ◽

Kristjan H S Moore ◽

...

Keyword(s):

Mitochondrial Dna ◽

Phylogenetic Tree ◽

Y Chromosome ◽

State Of The Art ◽

Supplementary Information ◽

Sequence Variants ◽

Use Case ◽

Supplementary Data ◽

Human Mitochondrial Dna ◽

Comparable Accuracy

Abstract Motivation We introduce HaploGrouper, a versatile software to classify haplotypes into haplogroups on the basis of a known phylogenetic tree. A typical use case for this software is the assignment of haplogroups to human mitochondrial DNA (mtDNA) or Y-chromosome haplotypes. Existing state-of-the-art haplogroup-calling software is typically hard-wired to work only with either mtDNA or Y-chromosome haplotypes from humans. Results HaploGrouper exhibits comparable accuracy in these instances and has the advantage of being able to assign haplogroups to any kind of haplotypes from any species—given an extant annotated phylogenetic tree defined by sequence variants. Availability and implementation The software is available at the following URL https://gitlab.com/bio_anth_decode/haploGrouper. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

QPARSE: searching for long-looped or multimeric G-quadruplexes potentially distinctive and druggable

Bioinformatics ◽

10.1093/bioinformatics/btz569 ◽

2019 ◽

Cited By ~ 1

Author(s):

Michele Berselli ◽

Enrico Lavezzo ◽

Stefano Toppo

Keyword(s):

Human Gene ◽

State Of The Art ◽

Comprehensive Analysis ◽

Supplementary Information ◽

Gene Promoters ◽

Supplementary Data ◽

Stem Loop ◽

Hiv 1 ◽

Rna And Dna ◽

The Web

Abstract Motivation G-quadruplexes (G4s) are non-canonical nucleic acid conformations that are widespread in all kingdoms of life and are emerging as important regulators both in RNA and DNA. Recently, two new higher-order architectures have been reported: adjacent interacting G4s, and G4s with stable long loops forming stem-loop structures. As there are no specialized tools to identify these conformations, we developed QPARSE. Results QPARSE can exhaustively search for degenerate potential quadruplex-forming sequences (PQSs) containing bulges and/or mismatches at genomic level, as well as either multimeric or long-looped PQS (MPQS and LLPQS respectively). While its assessment vs. known reference datasets is comparable with the state-of-the-art, what is more interesting is its performance in the identification of MPQS and LLPQS that present algorithms are not designed to search for. We report a comprehensive analysis of MPQS in human gene promoters and the analysis of LLPQS on three experimentally validated case studies from HIV-1, BCL2, and hTERT. Availability QPARSE is freely accessible on the web at http://www.medcomp.medicina.unipd.it/qparse/index or downloadable from github as a python 2.7 program https://github.com/B3rse/qparse Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Symbolic analysis tools-the state of the art

1996 IEEE International Symposium on Circuits and Systems. Circuits and Systems Connecting the World. ISCAS 96 ◽

10.1109/iscas.1996.542145 ◽

2002 ◽

Cited By ~ 3

Author(s):

F.V. Fernandez ◽

A. Rodriguez-Vazquez

Keyword(s):

State Of The Art ◽

Symbolic Analysis ◽

The State ◽

Analysis Tools

Download Full-text