Triplet-based similarity score for fully multilabeled trees with poly-occurring labels

Author(s):  
Simone Ciccolella ◽  
Giulia Bernardini ◽  
Luca Denti ◽  
Paola Bonizzoni ◽  
Marco Previtali ◽  
...  

Abstract Motivation The latest advances in cancer sequencing, and the availability of a wide range of methods to infer the evolutionary history of tumors, have made it important to evaluate, reconcile and cluster different tumor phylogenies. Recently, several notions of distance or similarity have been proposed in the literature, but none of them has emerged as the gold standard. Moreover, none of the known similarity measures is able to manage mutations occurring multiple times in the tree, a circumstance often occurring in real cases. Results To overcome these limitations, in this article, we propose MP3, the first similarity measure for tumor phylogenies able to effectively manage cases where multiple mutations can occur at the same time and mutations can occur multiple times. Moreover, a comparison of MP3 with other measures shows that it is able to correctly classify similar and dissimilar trees, both on simulated and on real data. Availability and implementation An open source implementation of MP3 is publicly available at https://github.com/AlgoLab/mp3treesim. Supplementary information Supplementary data are available at Bioinformatics online.

Author(s):  
Simone Ciccolella ◽  
Giulia Bernardini ◽  
Luca Denti ◽  
Paola Bonizzoni ◽  
Marco Previtali ◽  
...  

Abstract The latest advances in cancer sequencing, and the availability of a wide range of methods to infer the evolutionary history of tumors, have made it important to evaluate, reconcile and cluster different tumor phylogenies. Recently, several notions of distance or similarity have been proposed in the literature, but none of them has emerged as the gold standard. Moreover, none of the known similarity measures is able to manage mutations occurring multiple times in the tree, a circumstance often occurring in real cases. To overcome these limitations, in this paper we propose MP3, the first similarity measure for tumor phylogenies able to effectively manage cases where multiple mutations can occur at the same time and mutations can occur multiple times. Moreover, a comparison of MP3 with other measures shows that it is able to correctly classify similar and dissimilar trees, both on simulated and on real data.


Author(s):  
Kai Cheng ◽  
Gabrielle Pawlowski ◽  
Xinheng Yu ◽  
Yusen Zhou ◽  
Sriram Neelamegham

Abstract Summary This manuscript describes an open-source program, DrawGlycan-SNFG (version 2), that accepts IUPAC (International Union of Pure and Applied Chemistry)-condensed inputs to render Symbol Nomenclature For Glycans (SNFG) drawings. A wide range of local and global options enable display of various glycan/peptide modifications including bond breakages, adducts, repeat structures, ambiguous identifications etc. These facilities make DrawGlycan-SNFG ideal for integration into various glycoinformatics software, including glycomics and glycoproteomics mass spectrometry (MS) applications. As a demonstration of such usage, we incorporated DrawGlycan-SNFG into gpAnnotate, a standalone application to score and annotate individual MS/MS glycopeptide spectra in different fragmentation modes. Availability and implementation DrawGlycan-SNFG and gpAnnotate are platform independent. While originally coded using MATLAB, compiled packages are also provided to enable DrawGlycan-SNFG implementation in Python and Java. All programs are available from https://virtualglycome.org/drawglycan; https://virtualglycome.org/gpAnnotate. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (20) ◽  
pp. 4072-4080 ◽  
Author(s):  
Timo M Deist ◽  
Andrew Patti ◽  
Zhaoqi Wang ◽  
David Krane ◽  
Taylor Sorenson ◽  
...  

Abstract Motivation In a predictive modeling setting, if sufficient details of the system behavior are known, one can build and use a simulation for making predictions. When sufficient system details are not known, one typically turns to machine learning, which builds a black-box model of the system using a large dataset of input sample features and outputs. We consider a setting which is between these two extremes: some details of the system mechanics are known but not enough for creating simulations that can be used to make high quality predictions. In this context we propose using approximate simulations to build a kernel for use in kernelized machine learning methods, such as support vector machines. The results of multiple simulations (under various uncertainty scenarios) are used to compute similarity measures between every pair of samples: sample pairs are given a high similarity score if they behave similarly under a wide range of simulation parameters. These similarity values, rather than the original high dimensional feature data, are used to build the kernel. Results We demonstrate and explore the simulation-based kernel (SimKern) concept using four synthetic complex systems—three biologically inspired models and one network flow optimization model. We show that, when the number of training samples is small compared to the number of features, the SimKern approach dominates over no-prior-knowledge methods. This approach should be applicable in all disciplines where predictive models are sought and informative yet approximate simulations are available. Availability and implementation The Python SimKern software, the demonstration models (in MATLAB, R), and the datasets are available at https://github.com/davidcraft/SimKern. Supplementary information Supplementary data are available at Bioinformatics online.
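The idea of scoring sample pairs by how similarly they behave across many simulation scenarios can be illustrated with a minimal sketch. It is not the SimKern implementation: the function name `simulation_kernel`, the input layout, and the choice of "fraction of scenarios with identical discrete outcome" as the similarity are all assumptions made for illustration.

```python
from typing import List, Sequence


def simulation_kernel(outcomes: Sequence[Sequence[int]]) -> List[List[float]]:
    """Build a sample-by-sample similarity matrix from simulated outcomes.

    outcomes[i][k] is the (hypothetical) discrete outcome of sample i when
    simulated under uncertainty scenario k. Two samples are considered
    similar when they produce the same outcome in many scenarios.
    """
    n_samples = len(outcomes)
    if n_samples == 0:
        return []
    n_scenarios = len(outcomes[0])
    if n_scenarios == 0:
        raise ValueError("each sample needs at least one simulated scenario")

    kernel = [[0.0] * n_samples for _ in range(n_samples)]
    for i in range(n_samples):
        for j in range(i, n_samples):
            # Count scenarios in which both samples behave identically.
            agree = sum(1 for a, b in zip(outcomes[i], outcomes[j]) if a == b)
            kernel[i][j] = kernel[j][i] = agree / n_scenarios
    return kernel


# Three samples simulated under four scenarios (made-up outcomes).
K = simulation_kernel([[1, 0, 1, 1], [1, 0, 0, 1], [0, 1, 0, 0]])
```

A matrix built this way is symmetric with ones on the diagonal, and could in principle be handed to any learner that accepts a precomputed kernel, such as a support vector machine.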


2020 ◽  
Vol 36 (16) ◽  
pp. 4527-4529
Author(s):  
Ales Saska ◽  
David Tichy ◽  
Robert Moore ◽  
Achilles Rasquinha ◽  
Caner Akdas ◽  
...  

Abstract Summary Visualizing a network provides a concise and practical understanding of the information it represents. Open-source web-based libraries help accelerate the creation of biologically based networks and their use. ccNetViz is an open-source, high speed and lightweight JavaScript library for visualization of large and complex networks. It implements customization and analytical features for easy network interpretation. These features include edge and node animations, which illustrate the flow of information through a network as well as node statistics. Properties can be defined a priori or dynamically imported from models and simulations. ccNetViz is thus a network visualization library particularly suited for systems biology. Availability and implementation The ccNetViz library, demos and documentation are freely available at http://helikarlab.github.io/ccNetViz/. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Richard Jiang ◽  
Bruno Jacob ◽  
Matthew Geiger ◽  
Sean Matthew ◽  
Bryan Rumsey ◽  
...  

Abstract Summary We present StochSS Live!, a web-based service for modeling, simulation and analysis of a wide range of mathematical, biological and biochemical systems. Using an epidemiological model of COVID-19, we demonstrate the power of StochSS Live! to enable researchers to quickly develop a deterministic or a discrete stochastic model, infer its parameters and analyze the results. Availability and implementation StochSS Live! is freely available at https://live.stochss.org/ Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Caitlin Cherryh ◽  
Bui Quang Minh ◽  
Rob Lanfear

Abstract Most phylogenetic analyses assume that the evolutionary history of an alignment (either that of a single locus, or of multiple concatenated loci) can be described by a single bifurcating tree, the so-called treelikeness assumption. Treelikeness can be violated by biological events such as recombination, introgression, or incomplete lineage sorting, and by systematic errors in phylogenetic analyses. The incorrect assumption of treelikeness may then mislead phylogenetic inferences. To quantify and test for treelikeness in alignments, we develop a test statistic which we call the tree proportion. This statistic quantifies the proportion of the edge weights in a phylogenetic network that are represented in a bifurcating phylogenetic tree of the same alignment. We extend this statistic to a statistical test of treelikeness using a parametric bootstrap. We use extensive simulations to compare tree proportion to a range of related approaches. We show that tree proportion successfully identifies non-treelikeness in a wide range of simulation scenarios, and discuss its strengths and weaknesses compared to other approaches. The power of the tree-proportion test to reject non-treelike alignments can be lower than some other approaches, but these approaches tend to be limited in their scope and/or the ease with which they can be interpreted. Our recommendation is to test treelikeness of sequence alignments with both tree proportion and mosaic methods such as 3Seq. The scripts necessary to replicate this study are available at https://github.com/caitlinch/treelikeness
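The tree proportion described above can be sketched as a ratio of split weights. The sketch below is an illustration under stated assumptions, not the authors' scripts: it assumes the network is given as a mapping from splits (bipartitions of the taxa) to weights and the tree as a set of splits, and the helper names `canonical_split` and `tree_proportion` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set

Split = FrozenSet[str]


def canonical_split(side: Iterable[str], taxa: Iterable[str]) -> Split:
    """Return one fixed representative for a bipartition of the taxa.

    A split A|B is the same as B|A, so we always keep the smaller side,
    breaking ties by the sorted taxon names.
    """
    side_set = frozenset(side)
    other = frozenset(taxa) - side_set
    return min(side_set, other, key=lambda s: (len(s), sorted(s)))


def tree_proportion(network_weights: Dict[Split, float],
                    tree_splits: Set[Split]) -> float:
    """Fraction of the total network split weight that also appears in the tree."""
    total = sum(network_weights.values())
    if total <= 0:
        raise ValueError("network must have positive total split weight")
    shared = sum(w for s, w in network_weights.items() if s in tree_splits)
    return shared / total


taxa = ["A", "B", "C", "D"]
# Made-up network with two conflicting splits: AB|CD (weight 3) and AC|BD (weight 1).
network = {
    canonical_split({"A", "B"}, taxa): 3.0,
    canonical_split({"A", "C"}, taxa): 1.0,
}
# The tree supports only AB|CD, written here from the other side.
tree = {canonical_split({"C", "D"}, taxa)}
proportion = tree_proportion(network, tree)
```

A value near 1 indicates that nearly all of the signal in the network is compatible with the tree, while lower values indicate conflicting signal.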


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Sven D. Schrinner ◽  
Rebecca Serra Mari ◽  
Jana Ebler ◽  
Mikko Rautiainen ◽  
Lancelot Seillier ◽  
...  

Abstract Resolving genomes at haplotype level is crucial for understanding the evolutionary history of polyploid species and for designing advanced breeding strategies. Polyploid phasing still presents considerable challenges, especially in regions of collapsing haplotypes. We present WhatsHap polyphase, a novel two-stage approach that addresses these challenges by (i) clustering reads and (ii) threading the haplotypes through the clusters. Our method outperforms the state-of-the-art in terms of phasing quality. Using a real tetraploid potato dataset, we demonstrate how to assemble local genomic regions of interest at the haplotype level. Our algorithm is implemented as part of the widely used open source tool WhatsHap.


2020 ◽  
Vol 66 (3-4) ◽  
pp. 142-150
Author(s):  
Jessica Worthington Wilmer ◽  
Andrew P. Amey ◽  
Carmel McDougall ◽  
Melanie Venz ◽  
Stephen Peck ◽  
...  

Sclerophyll woodlands and open forests once covered vast areas of eastern Australia, but have been greatly fragmented and reduced in extent since European settlement. The biogeographic and evolutionary history of the biota of eastern Australia’s woodlands also remains poorly known, especially when compared to rainforests to the east, or the arid biome to the west. Here we present an analysis of patterns of mitochondrial genetic diversity in two species of Pygopodid geckos with distributions centred on the Brigalow Belt Bioregion of eastern Queensland. One moderately large and semi-arboreal species, Paradelma orientalis, shows low genetic diversity and no clear geographic structuring across its wide range. In contrast, a small and semi-fossorial species, Delma torquata, consists of two moderately divergent clades, one from the ranges and uplands of coastal areas of south-east Queensland, and the other centred in upland areas further inland. These data point to varying histories of gene flow and refugial persistence in eastern Australia’s vast but now fragmented open woodlands. The Carnarvon Ranges of central Queensland are also highlighted as a zone of persistence for cool and/or wet-adapted taxa; however, the evolutionary history and divergence of most outlying populations in these mountains remain unstudied.


Author(s):  
Julia Yan ◽  
Nick Patterson ◽  
Vagheesh M Narasimhan

Abstract Summary Admixture graphs represent the genetic relationship between a set of populations through splits, drift and admixture. In this article, we present the Julia package miqoGraph, which uses mixed-integer quadratic optimization to fit topology, drift lengths and admixture proportions simultaneously. Through applications of miqoGraph to both simulated and real data, we show that integer optimization can greatly speed up and automate what is usually an arduous manual process. Availability and implementation https://github.com/juliayyan/PhylogeneticTrees.jl. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Pierre Morisse ◽  
Claire Lemaitre ◽  
Fabrice Legeai

Abstract Motivation Linked-Reads technologies combine the high quality and low cost of short-read sequencing with long-range information, through the use of barcodes tagging reads which originate from a common long DNA molecule. This technology has been employed in a broad range of applications including genome assembly, phasing and scaffolding, as well as structural variant calling. However, to date, no tool or API dedicated to the manipulation of Linked-Reads data exists. Results We introduce LRez, a C++ API and toolkit which allows easy management of Linked-Reads data. LRez includes various functionalities, for computing numbers of common barcodes between genomic regions, extracting barcodes from BAM files, as well as indexing and querying BAM, FASTQ and gzipped FASTQ files to quickly fetch all reads or alignments containing a given barcode. LRez is compatible with a wide range of Linked-Reads sequencing technologies, and can thus be used in any tool or pipeline requiring barcode processing or indexing, in order to improve their performance. Availability and implementation LRez is implemented in C++, supported on Unix-based platforms, and available under AGPL-3.0 License at https://github.com/morispi/LRez, and as a bioconda module. Supplementary information Supplementary data are available at Bioinformatics Advances.
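Two of the operations named above, indexing reads by barcode and counting barcodes shared between two genomic regions, can be illustrated with a small in-memory sketch. This is not the LRez API, which is a C++ library working on BAM and FASTQ files: the function names, the `(read_name, barcode)` input tuples, and the example barcodes are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_barcode_index(reads: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each barcode to the names of the reads that carry it.

    `reads` yields (read_name, barcode) pairs, standing in for records
    parsed from a real alignment or FASTQ file.
    """
    index: Dict[str, List[str]] = defaultdict(list)
    for read_name, barcode in reads:
        index[barcode].append(read_name)
    return dict(index)


def common_barcodes(region_a: Iterable[str], region_b: Iterable[str]) -> int:
    """Number of distinct barcodes observed in both regions."""
    return len(set(region_a) & set(region_b))


# Made-up reads and barcodes.
reads = [("r1", "AAAC"), ("r2", "GGTT"), ("r3", "AAAC"), ("r4", "CCGA")]
index = build_barcode_index(reads)
reads_with_aaac = index["AAAC"]  # all reads tagged with this barcode

# Barcodes seen in two hypothetical regions; duplicates count once.
shared = common_barcodes(["AAAC", "GGTT", "AAAC"], ["GGTT", "CCGA", "AAAC"])
```

A high number of shared barcodes between two distant regions suggests that reads from both regions come from the same long molecules, which is the kind of signal used in scaffolding and structural variant calling.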

