Merging Arcs to Produce Acyclic Phylogenetic Networks and Normal Networks

Bulletin of Mathematical Biology ◽

10.1007/s11538-021-00986-1 ◽

2022 ◽

Vol 84 (2) ◽

Author(s):

Stephen J. Willson

Keyword(s):

Phylogenetic Network ◽

Phylogenetic Networks ◽

Original Network ◽

Acyclic Network ◽

Normal Network ◽

Normal Networks ◽

The Given

Abstract As phylogenetic networks grow increasingly complicated, systematic methods for simplifying them to reveal properties will become more useful. This paper considers how to modify acyclic phylogenetic networks into other acyclic networks by contracting specific arcs that include a set D. The networks need not be binary, so vertices in the networks may have more than two parents and/or more than two children. In general, in order to make the resulting network acyclic, additional arcs not in D must also be contracted. This paper shows how to choose D so that the resulting acyclic network is “pre-normal”. As a result, removal of all redundant arcs yields a normal network. The set D can be selected based only on the geometry of the network, giving a well-defined normal phylogenetic network depending only on the given network. There are CSD maps relating most of the networks. The resulting network can be visualized as a “wired lift” in the original network, which appears as the original network with each arc drawn in one of three ways.
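The remark that contracting the arcs in D can force further contractions can be illustrated with a toy example. The following is a minimal sketch, not the paper's construction: the helper names `contract` and `is_acyclic` and the five-arc network are hypothetical, and the network is stored simply as a set of (tail, head) pairs.

```python
from graphlib import TopologicalSorter, CycleError


def contract(arcs, u, v):
    """Merge vertex v into vertex u and drop any self-loops this creates."""
    def rename(x):
        return u if x == v else x
    return {(rename(a), rename(b)) for a, b in arcs if rename(a) != rename(b)}


def is_acyclic(arcs):
    """Return True if the directed graph given by `arcs` has no directed cycle."""
    predecessors = {}
    for tail, head in arcs:
        predecessors.setdefault(head, set()).add(tail)
        predecessors.setdefault(tail, set())
    try:
        # static_order() raises CycleError once iterated if a cycle exists.
        tuple(TopologicalSorter(predecessors).static_order())
        return True
    except CycleError:
        return False


# Hypothetical network: arc (a, b) is parallel to the path a -> c -> b.
network = {("r", "a"), ("a", "c"), ("c", "b"), ("a", "b"), ("b", "x")}

after_first = contract(network, "a", "b")       # contract the arc (a, b) only
after_second = contract(after_first, "a", "c")  # also contract the arc (a, c)
```

In this toy case, contracting (a, b) alone leaves the directed cycle a -> c -> a, and acyclicity is restored only after (a, c) is contracted as well.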

Display Sets of Normal and Tree-Child Networks

The Electronic Journal of Combinatorics ◽

10.37236/9128 ◽

2021 ◽

Vol 28 (1) ◽

Author(s):

Janosch Döcker ◽

Simone Linz ◽

Charles Semple

Keyword(s):

Decision Problem ◽

Phylogenetic Trees ◽

Phylogenetic Network ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Directed Acyclic Graphs ◽

Phylogenetic Networks ◽

Acyclic Graphs ◽

Normal Network ◽

Normal Networks

Phylogenetic networks are leaf-labelled directed acyclic graphs that are used in computational biology to analyse and represent the evolutionary relationships of a set of species or viruses. In contrast to phylogenetic trees, phylogenetic networks have vertices of in-degree at least two that represent reticulation events such as hybridisation, lateral gene transfer, or reassortment. By systematically deleting various combinations of arcs in a phylogenetic network $\mathcal N$, one derives a set of phylogenetic trees that are embedded in $\mathcal N$. We recently showed that the problem of deciding if two binary phylogenetic networks embed the same set of phylogenetic trees is computationally hard; in particular, we showed it to be $\Pi^P_2$-complete. In this paper, we establish a polynomial-time algorithm for this decision problem if the initial two networks consist of a normal network and a tree-child network, two well-studied topologically restricted subclasses of phylogenetic networks, with normal networks being more structurally constrained than tree-child networks. The running time of the algorithm is quadratic in the size of the leaf sets.
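The idea of deriving embedded trees by deleting arcs can be made concrete with a brute-force enumeration. The sketch below is an illustration only and is not the polynomial-time algorithm of the paper: it keeps exactly one incoming arc at every reticulation, and it encodes each resulting tree by its set of clusters (the leaf sets below each vertex), so that vertices of in-degree one and out-degree one need not be suppressed explicitly. The function name `displayed_cluster_sets` and the example network are hypothetical.

```python
from itertools import product


def displayed_cluster_sets(arcs, leaves):
    """Return the displayed trees, each encoded as a frozenset of clusters."""
    parents = {}
    for tail, head in arcs:
        parents.setdefault(head, []).append(tail)
    # Reticulations are the vertices of in-degree at least two.
    reticulations = [v for v, ps in parents.items() if len(ps) >= 2]
    vertices = {x for arc in arcs for x in arc}
    trees = set()
    # Try every way of keeping one incoming arc per reticulation.
    for choice in product(*(parents[r] for r in reticulations)):
        kept_parent = dict(zip(reticulations, choice))
        children = {}
        for tail, head in arcs:
            if head not in kept_parent or kept_parent[head] == tail:
                children.setdefault(tail, []).append(head)

        def cluster(v):
            # Leaf labels reachable from v in the current tree.
            if v in leaves:
                return frozenset([v])
            return frozenset().union(*(cluster(c) for c in children.get(v, [])))

        # Empty clusters come from vertices that lost all of their children.
        trees.add(frozenset(c for c in map(cluster, vertices) if c))
    return trees


# Hypothetical network with a single reticulation h whose parents are a and b.
example_arcs = [("r", "a"), ("r", "b"), ("a", "x"), ("a", "h"),
                ("b", "h"), ("b", "z"), ("h", "y")]
example_trees = displayed_cluster_sets(example_arcs, {"x", "y", "z"})
```

Two networks embed the same set of trees exactly when these sets coincide, but the enumeration takes time exponential in the number of reticulations, which is why a polynomial-time test for restricted classes is of interest.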

Ranking top-k trees in tree-based phylogenetic networks

10.21203/rs.2.15349/v1 ◽

2019 ◽

Author(s):

Momoko Hayamizu ◽

Kazuhisa Makino

Keyword(s):

Optimal Algorithm ◽

Linear Time ◽

Fundamental Problem ◽

Phylogenetic Network ◽

Reticulate Evolution ◽

Interesting Property ◽

Biological Data ◽

Phylogenetic Networks ◽

Linear Delay ◽

Algorithmic Problems

Abstract 'Tree-based' phylogenetic networks provide a mathematically-tractable model for representing reticulate evolution in biology. Such networks consist of an underlying 'support tree' together with arcs between the edges of this tree. However, a tree-based network can have several such support trees, and this leads to a variety of algorithmic problems that are relevant to the analysis of biological data. Recently, Hayamizu (arXiv:1811.05849 [math.CO]) proved a structure theorem for tree-based phylogenetic networks and obtained linear-time and linear-delay algorithms for many basic problems on support trees, such as counting, optimisation, and enumeration. In the present paper, we consider the following fundamental problem in statistical data analysis: given a tree-based phylogenetic network $N$ whose arcs are associated with probabilities, rank the top-$k$ support trees of $N$ by their likelihood values. We provide a linear-delay (and hence optimal) algorithm for the problem and thus reveal the interesting property of tree-based phylogenetic networks that ranking top-$k$ support trees is as computationally easy as picking $k$ arbitrary support trees.
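To give a feel for top-$k$ ranking, here is a generic best-first search. It rests on a strong assumption that the abstract does not state: that each candidate is described by a tuple of independent local choices and that its likelihood is the product of the chosen probabilities. The function name `top_k_products` and the numbers are hypothetical, and this heap-based search is not the linear-delay algorithm of the paper.

```python
import heapq
from math import prod


def top_k_products(choice_probs, k):
    """Return up to k (likelihood, choice tuple) pairs, largest likelihood first.

    choice_probs holds one list of option probabilities per independent choice.
    """
    # Rank the options of each choice by decreasing probability,
    # remembering each option's original index.
    ranked = [sorted(enumerate(p), key=lambda t: -t[1]) for p in choice_probs]

    def value(idx):
        return prod(ranked[i][j][1] for i, j in enumerate(idx))

    start = (0,) * len(ranked)          # the best option everywhere
    heap = [(-value(start), start)]     # max-heap simulated with negated keys
    seen = {start}
    result = []
    while heap and len(result) < k:
        neg, idx = heapq.heappop(heap)
        result.append((-neg, tuple(ranked[i][j][0] for i, j in enumerate(idx))))
        # Successors downgrade exactly one choice to its next-best option.
        for i in range(len(idx)):
            if idx[i] + 1 < len(ranked[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-value(nxt), nxt))
    return result


# Hypothetical input: two independent choices with two options each.
ranking = top_k_products([[0.7, 0.3], [0.6, 0.4]], k=2)
```

Each pop costs logarithmic time in the heap size, so this sketch only shows what the ranking computes and makes no claim about matching the delay bound stated in the abstract.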

Implementing Large Genomic Single Nucleotide Polymorphism Data Sets in Phylogenetic Network Reconstructions: A Case Study of Particularly Rapid Radiations of Cichlid Fish

Systematic Biology ◽

10.1093/sysbio/syaa005 ◽

2020 ◽

Vol 69 (5) ◽

pp. 848-862 ◽

Cited By ~ 2

Author(s):

Melisa Olave ◽

Axel Meyer

Keyword(s):

Single Nucleotide Polymorphism ◽

Gene Flow ◽

Genetic Material ◽

Cichlid Fish ◽

Phylogenetic Network ◽

Phylogenetic Networks ◽

Nucleotide Polymorphism ◽

Rapid Radiation ◽

Data Set ◽

Single Nucleotide

Abstract The Midas cichlids of the Amphilophus citrinellus spp. species complex from Nicaragua (13 species) are an extraordinary example of adaptive and rapid radiation ($<$24,000 years old). The evolutionary history of these cichlids is very challenging to infer in phylogenetic analyses, due to the apparent prevalence of incomplete lineage sorting (ILS), as well as past and current gene flow. Assuming solely a vertical transfer of genetic material from an ancestral lineage to new lineages is not appropriate in many cases of genes transferred horizontally in nature. Recently developed methods to infer phylogenetic networks under such circumstances might be able to circumvent these problems. These models accommodate not just ILS, but also gene flow, under the multispecies network coalescent (MSNC) model, processes that are at work in young, hybridizing, and/or rapidly diversifying lineages. There are currently only a few programs available that implement MSNC for estimating phylogenetic networks. Here, we present a novel way to incorporate single nucleotide polymorphism (SNP) data into the currently available PhyloNetworks program. Based on simulations, we demonstrate that SNPs can provide enough power to recover the true phylogenetic network. We also show that it can accurately infer the true network more often than other similar SNP-based programs (PhyloNet and HyDe). Moreover, our approach results in a faster algorithm compared to the original pipeline in PhyloNetworks, without losing power. We also applied our new approach to infer the phylogenetic network of the Midas cichlid radiation. We implemented the most comprehensive genomic data set to date (RADseq data set of 679 individuals and $>$37K SNPs from 19 ingroup lineages) and present estimated phylogenetic networks for this extremely young and fast-evolving radiation of cichlid fish. We demonstrate that the MSNC is more appropriate than the multispecies coalescent alone for the analysis of this rapid radiation. [Genomics; multispecies network coalescent; phylogenetic networks; phylogenomics; RADseq; SNPs.]

Phylogenetic network analysis of SARS-CoV-2 genomes

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2004999117 ◽

2020 ◽

Vol 117 (17) ◽

pp. 9241-9243 ◽

Cited By ~ 305

Author(s):

Peter Forster ◽

Lucy Forster ◽

Colin Renfrew ◽

Michael Forster

Keyword(s):

Amino Acid ◽

Network Analysis ◽

East Asia ◽

Phylogenetic Network ◽

Common Type ◽

Phylogenetic Networks ◽

Ancestral Genome ◽

Founder Effects ◽

Environmental Resistance ◽

Ancestral Type

In a phylogenetic network analysis of 160 complete human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes, we find three central variants distinguished by amino acid changes, which we have named A, B, and C, with A being the ancestral type according to the bat outgroup coronavirus. The A and C types are found in significant proportions outside East Asia, that is, in Europeans and Americans. In contrast, the B type is the most common type in East Asia, and its ancestral genome appears not to have spread outside East Asia without first mutating into derived B types, pointing to founder effects or immunological or environmental resistance against this type outside Asia. The network faithfully traces routes of infections for documented coronavirus disease 2019 (COVID-19) cases, indicating that phylogenetic networks can likewise be successfully used to help trace undocumented COVID-19 infection sources, which can then be quarantined to prevent recurrent spread of the disease worldwide.

Assessing the fit of the multi-species network coalescent to multi-locus data

Bioinformatics ◽

10.1093/bioinformatics/btaa863 ◽

2020 ◽

Author(s):

Ruoyi Cai ◽

Cécile Ané

Keyword(s):

Model Selection ◽

Goodness Of Fit ◽

Network Inference ◽

Phylogenetic Network ◽

Phylogenetic Networks ◽

Supplementary Information ◽

Goodness Of Fit Test ◽

Full Likelihood ◽

Genome Wide ◽

Inference Methods

Abstract Motivation: With growing genome-wide molecular datasets from next-generation sequencing, phylogenetic networks can be estimated using a variety of approaches. These phylogenetic networks include events like hybridization, gene flow or horizontal gene transfer explicitly. However, the most accurate network inference methods are computationally heavy. Methods that scale to larger datasets do not calculate a full likelihood, such that traditional likelihood-based tools for model selection are not applicable to decide how many past hybridization events best fit the data. We propose here a goodness-of-fit test to quantify the fit between data observed from genome-wide multi-locus data, and patterns expected under the multi-species coalescent model on a candidate phylogenetic network. Results: We identified weaknesses in the previously proposed TICR test, and proposed corrections. The performance of our new test was validated by simulations on real-world phylogenetic networks. Our test provides one of the first rigorous tools for model selection, to select the adequate network complexity for the data at hand. The test can also work for identifying poorly inferred areas on a network. Availability and implementation: Software for the goodness-of-fit test is available as a Julia package at https://github.com/cecileane/QuartetNetworkGoodnessFit.jl. Supplementary information: Supplementary data are available at Bioinformatics online.

A Set of Inverse Telecommunication Network Problems

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.760-762.665 ◽

2013 ◽

Vol 760-762 ◽

pp. 665-668

Author(s):

Zhe Heng Ding ◽

Jing Wang ◽

Qin Wang

Keyword(s):

Inverse Problem ◽

Polynomial Time ◽

Telecommunication Network ◽

Polynomial Time Algorithm ◽

Time Algorithm ◽

Original Network ◽

Network Problem ◽

Network Problems ◽

Dominant Set ◽

The Given

In this paper, we consider a set of inverse telecommunication network problems under norm. As a telecommunication network expands, more and more links and nodes are added to the existing network. The original network cannot cover the new nodes, and some old links become useless. The telecommunication company wants to sell some old links and purchase some new links within a given budget, such that the company's network is able to access all nodes. We consider the inverse problem using a weakly dominant set: change the weights of the edges as little as possible such that the given edge set becomes a weakly dominant set under the new weights. We propose a polynomial time algorithm for the inverse problem under norm, and we also present an example to illustrate the algorithm.

THE NET-HMM APPROACH: PHYLOGENETIC NETWORK INFERENCE BY COMBINING MAXIMUM LIKELIHOOD AND HIDDEN MARKOV MODELS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972000900428x ◽

2009 ◽

Vol 07 (04) ◽

pp. 625-644 ◽

Cited By ~ 2

Author(s):

SAGI SNIR ◽

TAMIR TULLER

Keyword(s):

Network Inference ◽

Markov Models ◽

Hidden Markov ◽

Bacterial Genome ◽

Genetic Material ◽

Phylogenetic Network ◽

Significance Test ◽

Amino Acid Sequences ◽

Phylogenetic Networks ◽

Significant Mechanism

Horizontal gene transfer (HGT) is the event of transferring genetic material from one lineage in the evolutionary tree to a different lineage. HGT plays a major role in bacterial genome diversification and is a significant mechanism by which bacteria develop resistance to antibiotics. Although the prevailing assumption is of complete HGT, cases of partial HGT (also called chimeric HGT), where only part of a gene is horizontally transferred, have also been reported, albeit less frequently. In this work we suggest a new probabilistic model, the NET-HMM, for analyzing and modeling phylogenetic networks. This new model captures the biologically realistic assumption that neighboring sites of DNA or amino acid sequences are not independent, which increases the accuracy of the inference. The model describes the phylogenetic network as a Hidden Markov Model (HMM), where each hidden state is related to one of the network's trees. One of the advantages of the NET-HMM is its ability to infer partial HGT as well as complete HGT. We describe the properties of the NET-HMM, devise efficient algorithms for solving a set of problems related to it, and implement them in software. We also provide a novel complementary significance test for evaluating the fitness of a model (NET-HMM) to a given dataset. Using NET-HMM, we are able to answer interesting biological questions, such as inferring the length of partial HGTs and the affected nucleotides in the genomic sequences, as well as inferring the exact location of HGT events along the tree branches. These advantages are demonstrated through the analysis of synthetic inputs and three different biological inputs.

A review of metrics measuring dissimilarity for rooted phylogenetic networks

Briefings in Bioinformatics ◽

10.1093/bib/bby062 ◽

2018 ◽

Vol 20 (6) ◽

pp. 1972-1980 ◽

Cited By ~ 4

Author(s):

Juan Wang ◽

Maozu Guo

Keyword(s):

Polynomial Time ◽

Phylogenetic Network ◽

Phylogenetic Networks ◽

Evolutionary Relationships ◽

Comprehensive Review ◽

Phylogenic Analysis ◽

The Past ◽

Important Structure

Abstract A rooted phylogenetic network is an important structure in the description of evolutionary relationships. Computing the distance (topological dissimilarity) between two rooted phylogenetic networks is a fundamental task in phylogenetic analysis. During the past few decades, several polynomial-time computable metrics have been described. Here, we give a comprehensive review and analysis of those metrics, including the correlation among metrics and the distribution of distance values computed by each metric. Moreover, we describe the software and website, CDRPN (Computing Distance for Rooted Phylogenetic Networks), for measuring the topological dissimilarity between rooted phylogenetic networks. Availability: http://bioinformatics.imu.edu.cn/distance/

A Divide-and-Conquer Method for Scalable Phylogenetic Network Inference from Multi-locus Data

10.1101/587725 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jiafan Zhu ◽

Xinhao Liu ◽

Huw A. Ogilvie ◽

Luay K. Nakhleh

Keyword(s):

Large Scale ◽

Network Inference ◽

Incomplete Lineage Sorting ◽

Phylogenetic Network ◽

Biological Data ◽

Phylogenetic Networks ◽

Divide And Conquer ◽

Lineage Sorting ◽

Step Method ◽

Sequence Alignments

Abstract Reticulate evolutionary histories, such as those arising in the presence of hybridization, are best modeled as phylogenetic networks. Recently developed methods allow for statistical inference of phylogenetic networks while also accounting for other processes, such as incomplete lineage sorting (ILS). However, these methods can only handle a small number of loci from a handful of genomes. In this paper, we introduce a novel two-step method for scalable inference of phylogenetic networks from the sequence alignments of multiple, unlinked loci. The method infers networks on subproblems and then merges them into a network on the full set of taxa. To reduce the number of trinets to infer, we formulate a Hitting Set version of the problem of finding a small number of subsets, and implement a simple heuristic to solve it. We studied their performance, in terms of both running time and accuracy, on simulated as well as on biological data sets. The two-step method accurately infers phylogenetic networks at a scale that is infeasible with existing methods. The results are a significant and promising step towards accurate, large-scale phylogenetic network inference. We implemented the algorithms in the publicly available software package PhyloNet (https://bioinfocs.rice.edu/PhyloNet).

Practical Aspects of Phylogenetic Network Analysis Using PhyloNet

10.1101/746362 ◽

2019 ◽

Author(s):

Zhen Cao ◽

Xinhao Liu ◽

Huw A. Ogilvie ◽

Zhi Yan ◽

Luay Nakhleh

Keyword(s):

Incomplete Lineage Sorting ◽

Phylogenetic Network ◽

Synthetic Data ◽

Simulated Data ◽

Single Species ◽

Phylogenetic Networks ◽

Lineage Sorting ◽

Data Set ◽

Types Of Information ◽

Analyze Data

Abstract Phylogenetic networks extend trees to enable simultaneous modeling of both vertical and horizontal evolutionary processes. PhyloNet is a software package that has been under constant development for over 10 years and includes a wide array of functionalities for inferring and analyzing phylogenetic networks. These functionalities differ in terms of the input data they require, the criteria and models they employ, and the types of information they allow to infer about the networks beyond their topologies. Furthermore, PhyloNet includes functionalities for simulating synthetic data on phylogenetic networks, quantifying the topological differences between phylogenetic networks, and evaluating evolutionary hypotheses given in the form of phylogenetic networks. In this paper, we use a simulated data set to illustrate the use of several of PhyloNet’s functionalities and make recommendations on how to analyze data sets and interpret the results when using these functionalities. All inference methods that we illustrate are incomplete lineage sorting (ILS) aware; that is, they account for the potential of ILS in the data while inferring the phylogenetic network. While the models do not include gene duplication and loss, we discuss how the methods can be used to analyze data in the presence of polyploidy. The concept of species is irrelevant for the computational analyses enabled by PhyloNet in that species-individuals mappings are user-defined. Consequently, none of the functionalities in PhyloNet deals with the task of species delimitation. In this sense, the data being analyzed could come from different individuals within a single species, in which case population structure along with potential gene flow is inferred (assuming the data has sufficient signal), or from different individuals sampled from different species, in which case the species phylogeny is being inferred.
