AllCoPol: inferring allele co-ancestry in polyploids

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Ulrich Lautenschlager ◽  
Florian Wagner ◽  
Christoph Oberprieler

Abstract Background: Inferring phylogenetic relationships of polyploid species and their diploid ancestors (leading to reticulate phylogenies in the case of an allopolyploid origin) based on multi-locus sequence data is complicated by the unknown assignment of alleles found in polyploids to diploid subgenomes. A parsimony-based approach to this problem has been proposed by Oberprieler et al. (Methods Ecol Evol 8:835–849, 2017); however, its implementation is of limited practical value. In addition to previously identified shortcomings, it has been found that in some cases the obtained results barely satisfy the applied criterion. To be of better use to other researchers, a reimplementation with methodological refinement appears to be indispensable. Results: We present the AllCoPol package, which provides a heuristic method for assigning alleles from polyploids to diploid subgenomes based on the Minimizing Deep Coalescences (MDC) criterion in multi-locus sequence datasets. An additional consensus approach makes it possible to assess the confidence of phylogenetic reconstructions. Simulations of tetra- and hexaploids show that under simplifying assumptions such as completely disomic inheritance, the topological errors of reconstructed phylogenies are similar to those of MDC species trees based on the true allele partition. Conclusions: AllCoPol is a Python package for phylogenetic reconstructions of polyploids offering enhanced functionality as well as improved usability. The included methods are supplied as command line tools without the need for prior programming knowledge.
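
The MDC criterion named in this abstract scores a species tree by the number of extra gene lineages it forces onto its branches. The following is a minimal sketch of that count, not AllCoPol's own code: trees are written as nested tuples, the function names and the toy labels (subgenomes P1 and P2, polyploid alleles x and y) are hypothetical, and extra lineages are summed over all non-root species-tree branches.

```python
def leaf_set(tree):
    """Return the frozenset of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def lineages_leaving(gene_tree, cluster, allele_to_species):
    """Count maximal gene-tree clades whose alleles all belong to `cluster`."""
    species = {allele_to_species[a] for a in leaf_set(gene_tree)}
    if species <= cluster:
        return 1
    if isinstance(gene_tree, str):
        return 0
    return sum(lineages_leaving(c, cluster, allele_to_species) for c in gene_tree)


def deep_coalescences(species_tree, gene_tree, allele_to_species):
    """Sum extra lineages (lineages - 1) over all non-root species-tree branches."""
    total = 0
    stack = list(species_tree) if not isinstance(species_tree, str) else []
    while stack:
        node = stack.pop()
        k = lineages_leaving(gene_tree, leaf_set(node), allele_to_species)
        total += max(k - 1, 0)
        if not isinstance(node, str):
            stack.extend(node)
    return total


# Toy example (hypothetical): a polyploid with alleles x and y, and two
# candidate assignments of those alleles to the subgenomes P1 and P2.
species_tree = (("A", "P1"), ("B", "P2"))
gene_tree = (("a", "x"), ("b", "y"))
good = {"a": "A", "b": "B", "x": "P1", "y": "P2"}
swapped = {"a": "A", "b": "B", "x": "P2", "y": "P1"}
print(deep_coalescences(species_tree, gene_tree, good))     # 0
print(deep_coalescences(species_tree, gene_tree, swapped))  # 2
```

In this toy case the first assignment needs no extra lineages and the swapped one needs two, which is the kind of difference an MDC-based allele assignment would exploit.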

2005 ◽  
Vol 37 (6) ◽  
pp. 491-498 ◽  
Author(s):  
Anders NORDIN ◽  
Leif TIBELL

Tetramelas phaeophysciae, a new obligately lichenicolous species occurring in Scandinavia, Iceland and Greenland, is described, and the closely related Buellia pulverulenta, together with B. triphragmioides, are transferred to Tetramelas. Phylogenetic reconstructions based on sequence data from nITS1-5.8S-ITS2 rDNA, using Bayesian inference and parsimony analyses, support the segregation of the new species from B. pulverulenta as well as the segregation of Tetramelas and Diplotomma from Buellia s. str.


2015 ◽  
Vol 61 (5) ◽  
pp. 866-873 ◽  
Author(s):  
Itzue W. Caviedes-Solis ◽  
Nassima M. Bouzid ◽  
Barbara L. Banbury ◽  
Adam D. Leaché

Abstract Phylogenetic and phylogeographic studies rely on the accurate quantification of biodiversity. In recent studies of taxonomically ambiguous groups, species boundaries are often determined based on multi-locus sequence data. Bayesian Phylogenetics and Phylogeography (BPP) is a coalescent-based method frequently used to delimit species; however, empirical studies suggest that the requirement of a user-specified guide tree biases the range of possible outcomes. We evaluate fifteen multi-locus datasets using the most recent iteration of BPP, which eliminates the need for a user-specified guide tree and reconstructs the species tree in synchrony with species delimitation (= unguided species delimitation). We found that the number of species recovered with guided versus unguided species delimitation was the same except for two cases, and that posterior probabilities were generally lower for the unguided analyses as a result of searching across species trees in addition to species delimitation models. The guide trees used in previous studies were often discordant with the species tree topologies estimated by BPP. We also compared species trees estimated using BPP and *BEAST and found that when the topologies are the same, BPP tends to give higher posterior probabilities.


Author(s):  
Lianming Du ◽  
Qin Liu ◽  
Zhenxin Fan ◽  
Jie Tang ◽  
Xiuyue Zhang ◽  
...  

Abstract FASTA and FASTQ are the most widely used biological data formats and have become the de facto standard for exchanging sequence data between bioinformatics tools. With the avalanche of next-generation sequencing data, the amount of sequence data being deposited and accessed in FASTA/Q formats is increasing dramatically. However, existing tools are very inefficient at random retrieval of subsequences because they must load the entire index into memory. In addition, most existing tools cannot build an index for large FASTA/Q files because of limited memory. Furthermore, the tools do not support random access to sequences in FASTA/Q files compressed by gzip, which most public databases use to save storage. In this study, we developed pyfastx as a versatile Python package with commonly used command-line tools to overcome the above limitations. Compared to other tools, pyfastx yielded the highest performance in terms of index building and random access to sequences, particularly when dealing with large FASTA/Q files with hundreds of millions of sequences. A key advantage of pyfastx over other tools is that it offers an efficient way to randomly extract subsequences directly from gzip-compressed FASTA/Q files without needing to uncompress them beforehand. Pyfastx can easily be installed from PyPI (https://pypi.org/project/pyfastx) and the source code is freely available at https://github.com/lmdu/pyfastx.
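
The general idea behind indexed random access to a FASTA file can be sketched with the standard library alone. This is not pyfastx's implementation or API: the function names are hypothetical, the sketch handles only uncompressed files, and it assumes every sequence line within a record (except possibly the last) has the same width.

```python
import os
import tempfile


def build_index(path):
    """Map name -> [length, offset of first base, bases per line, bytes per line]."""
    index = {}
    name = None
    offset = 0
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                name = line[1:].split()[0].decode()
                index[name] = [0, offset + len(line), 0, 0]
            elif name is not None and line.strip():
                entry = index[name]
                bases = len(line.rstrip(b"\r\n"))
                if entry[2] == 0:  # first sequence line fixes the line geometry
                    entry[2] = bases
                    entry[3] = len(line)
                entry[0] += bases
            offset += len(line)
    return index


def fetch(path, index, name, start, end):
    """Return bases [start, end) (0-based) by seeking, without scanning the file."""
    length, first, per_line, line_bytes = index[name]
    end = min(end, length)
    if start >= end:
        return ""
    last = end - 1
    begin = first + (start // per_line) * line_bytes + start % per_line
    stop = first + (last // per_line) * line_bytes + last % per_line + 1
    with open(path, "rb") as handle:
        handle.seek(begin)
        chunk = handle.read(stop - begin)
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


# Small demonstration on a temporary file with made-up sequences.
fd, demo_path = tempfile.mkstemp(suffix=".fa")
with os.fdopen(fd, "wb") as out:
    out.write(b">chr1 demo\nACGTAC\nGTTTGA\nCC\n>chr2\nGGGG\n")
demo_index = build_index(demo_path)
print(fetch(demo_path, demo_index, "chr1", 4, 9))  # ACGTT
os.remove(demo_path)
```

Only the small index has to be kept in memory; each lookup costs one seek and one short read. Doing the same on gzip-compressed input requires an additional index into the compressed stream, which this sketch does not attempt.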


2004 ◽  
Vol 78 (5) ◽  
pp. 2537-2544 ◽  
Author(s):  
Nadjia Radjef ◽  
Emmanuel Gordien ◽  
Valeria Ivaniushina ◽  
Elyanne Gault ◽  
Patricia Anaïs ◽  
...  

ABSTRACT Hepatitis D virus (HDV) is a satellite of hepatitis B virus (HBV) for transmission and propagation and infects nearly 20 million people worldwide. The HDV genome is a compact circular single-stranded RNA genome with extensive intramolecular complementarity. Despite its different epidemiological and pathological patterns, the variability and geographical distribution of HDV are limited to three genotypes and two subtypes that have been characterized to date. Phylogenetic reconstructions based on the delta antigen gene and full-length genome sequence data show an extensive and probably ancient radiation of African lineages, suggesting that the genetic variability of HDV is much more complex than was previously thought, with evidence of additional clades. These results relate the geographic distribution of HDV more closely to the genetic variability of its helper HBV.


Author(s):  
Paul Zaharias ◽  
Tandy Warnow

With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the last few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g., incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements.


Author(s):  
Patrick F. McKenzie ◽  
Deren A. R. Eaton

Abstract Summary: ipcoal is a free and open source Python package for simulating and analyzing genealogies and sequences. It automates the task of describing complex demographic models (e.g., with divergence times, effective population sizes, migration events) to the msprime coalescent simulator by parsing a user-supplied species tree or network. Genealogies, sequences, and metadata are returned in tabular format allowing for easy downstream analyses. ipcoal includes phylogenetic inference tools to automate gene tree inference from simulated sequence data, and visualization tools for analyzing results and verifying model accuracy. The ipcoal package is a powerful tool for posterior predictive data analysis, for methods validation, and for teaching coalescent methods in an interactive and visual environment. Availability and implementation: Source code is available from the GitHub repository (https://github.com/pmckenz1/ipcoal/) and is distributed for packaged installation with conda. Complete documentation and interactive notebooks prepared for teaching purposes are available at https://ipcoal.readthedocs.io/.


2020 ◽  
Author(s):  
Michael J. Sanderson ◽  
Alberto Búrquez ◽  
Dario Copetti ◽  
Michelle M. McMahon ◽  
Yichao Zeng ◽  
...  

Abstract Genome sequence data are routinely being used to infer phylogenetic history within and between closely related diploid species, but few tree inference methods are specifically tailored to diploid genotype data. Here we re-examine the method of “polymorphism parsimony” (Inger 1967; Farris 1978; Felsenstein 1979), originally introduced to study morphological characters and chromosome inversion polymorphisms, to evaluate its utility for unphased diploid genotype data in large scale phylogenomic data sets. We show that it is equivalent to inferring species trees by minimizing deep coalescences, assuming an infinite sites model. Two potential advantages of this approach are scalability and estimation of a rooted tree. As with some other single nucleotide polymorphism (SNP) based methods, it requires thinning of data sets to statistically independent sites, and we describe a genotype-based test for phylogenetic independence. To evaluate this approach in genome scale data, we construct intraspecific phylogenies for 10 populations of the saguaro cactus using 200 Gbp of resequencing data, and then use these methods to test whether the population with highest genetic diversity corresponds to the root of the genotype trees. Results were highly congruent with the (unrooted) trees obtained using SVDquartets, a scalable alternative method of phylogenomic inference.


2014 ◽  
Author(s):  
Lingfei Cui ◽  
Laura Kubatko

One of the central tasks in evolutionary biology is to reconstruct the evolutionary relationships among species from sequence data, particularly from multilocus data. In the last ten years, many methods have been proposed to use the variance in the gene histories to estimate species trees by explicitly modeling deep coalescence. However, gene flow, another process that may produce gene history variance, has been less studied. In this paper, we propose a simple yet innovative method for species tree estimation in the presence of gene flow. Our method, called STEST (Species Tree Estimation from Speciation Times), constructs species tree estimates from pairwise speciation time or species divergence time estimates. By using methods that estimate speciation times in the presence of gene flow (for example, M1 (Yang 2010) or SIM3s (Zhu and Yang 2012)), STEST is able to estimate species trees from data subject to gene flow. We develop two methods, called STEST (M1) and STEST (SIM3s), for this purpose. Additionally, we consider the method STEST (M0), which instead uses the M0 method (Yang 2002), a coalescent-based method that does not assume gene flow, to estimate speciation times. It is therefore devised to estimate species trees in the absence of gene flow. Our simulation studies show that STEST (M0) outperforms STEST (M1), STEST (SIM3s) and STEM in terms of estimation accuracy and outperforms *BEAST in terms of running time when the degree of gene flow is small. STEST (M1) outperforms STEST (M0), STEST (SIM3s), STEM and *BEAST in terms of estimation accuracy when the degree of gene flow is large. An empirical data set analyzed by these methods gives species tree estimates that are consistent with the previous results.
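
The abstract does not say how STEST turns pairwise speciation times into a tree. One plausible way to do so, shown here purely as an assumption, is average-linkage (UPGMA-style) agglomerative clustering: repeatedly join the two groups with the smallest mean pairwise time. The function name and the time values below are hypothetical.

```python
from itertools import combinations


def tree_from_times(times):
    """Build a nested-tuple tree from {frozenset({a, b}): time} by average linkage."""
    names = sorted({name for pair in times for name in pair})
    # Each key is a (sub)tree; its value lists the species it contains.
    clusters = {name: [name] for name in names}

    def avg_time(x, y):
        pairs = [(a, b) for a in clusters[x] for b in clusters[y]]
        return sum(times[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two clusters with the smallest mean pairwise time.
        x, y = min(combinations(clusters, 2), key=lambda xy: avg_time(*xy))
        merged = clusters.pop(x) + clusters.pop(y)
        clusters[(x, y)] = merged
    return next(iter(clusters))


# Made-up pairwise speciation times for four species.
times = {
    frozenset({"A", "B"}): 1.0,
    frozenset({"C", "D"}): 2.0,
    frozenset({"A", "C"}): 5.0,
    frozenset({"A", "D"}): 5.0,
    frozenset({"B", "C"}): 5.0,
    frozenset({"B", "D"}): 5.0,
}
print(tree_from_times(times))  # (('A', 'B'), ('C', 'D'))
```

Any estimator of pairwise times could feed the `times` dictionary, whether or not it models gene flow. That independence between time estimation and tree building is the point this sketch is meant to illustrate.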


2018 ◽  
Author(s):  
Huw A. Ogilvie ◽  
Timothy G. Vaughan ◽  
Nicholas J. Matzke ◽  
Graham J. Slater ◽  
Tanja Stadler ◽  
...  

Abstract Bayesian methods can be used to accurately estimate species tree topologies, times and other parameters, but only when the models of evolution which are available and utilized sufficiently account for the underlying evolutionary processes. Multispecies coalescent (MSC) models have been shown to accurately account for the evolution of genes within species in the absence of strong gene flow between lineages, and fossilized birth-death (FBD) models have been shown to estimate divergence times from fossil data in good agreement with expert opinion. Until now, dating analyses using the MSC have been based on a fixed clock or informally derived node priors instead of the FBD. On the other hand, dating analyses using an FBD process have concatenated all gene sequences and ignored coalescence processes. To address these mirror-image deficiencies in evolutionary models, we have developed an integrative model of evolution which combines both the FBD and MSC models. By applying concatenation and the MSC (without employing the FBD process) to an exemplar data set consisting of molecular sequence data and morphological characters from the dog and fox subfamily Caninae, we show that concatenation causes predictable biases in estimated branch lengths. We then applied concatenation using the FBD process and the combined FBD-MSC model to show that the same biases are still observed when the FBD process is employed. These biases can be avoided by using the FBD-MSC model, which coherently models fossilization and gene evolution, and does not require an a priori substitution rate estimate to calibrate the molecular clock. We have implemented the FBD-MSC in a new version of StarBEAST2, a package developed for the BEAST2 phylogenetic software.

