Complete Characterization of Incorrect Orthology Assignments in Best Match Graphs

2021 ◽  
Vol 82 (3) ◽  
Author(s):  
David Schaller ◽  
Manuela Geiß ◽  
Peter F. Stadler ◽  
Marc Hellmuth

Abstract Genome-scale orthology assignments are usually based on reciprocal best matches. In the absence of horizontal gene transfer (HGT), every pair of orthologs forms a reciprocal best match. Incorrect orthology assignments therefore are always false positives in the reciprocal best match graph. We consider duplication/loss scenarios and characterize unambiguous false-positive (u-fp) orthology assignments, that is, edges in the best match graphs (BMGs) that cannot correspond to orthologs for any gene tree that explains the BMG. Moreover, we provide a polynomial-time algorithm to identify all u-fp orthology assignments in a BMG. Simulations show that at least 75% of all incorrect orthology assignments can be detected in this manner. All results rely only on the structure of the BMGs and not on any a priori knowledge about underlying gene or species trees.
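As an illustration of the terms used in this abstract, the following minimal sketch computes best matches and reciprocal best matches from a known leaf-colored gene tree. It assumes the usual definition that a gene y of species s is a best match of x if no other gene of s has a lower last common ancestor with x. The toy tree, the gene and species names, and all function names are hypothetical; this is not the u-fp detection algorithm of the paper.

```python
def ancestors(parent, v):
    """Return the path from v up to the root, starting with v itself."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path


def lca_depth(parent, x, y):
    """Depth (number of edges below the root) of the last common ancestor."""
    anc_x = ancestors(parent, x)
    anc_y = set(ancestors(parent, y))
    for i, v in enumerate(anc_x):
        if v in anc_y:
            return len(anc_x) - 1 - i
    raise ValueError("x and y are not in the same tree")


def best_matches(parent, species):
    """Arcs (x, y): y is a best match of x among the genes of y's species."""
    arcs = set()
    for x, sx in species.items():
        by_species = {}
        for y, sy in species.items():
            if sy != sx:
                depth = lca_depth(parent, x, y)
                by_species.setdefault(sy, []).append((depth, y))
        for candidates in by_species.values():
            deepest = max(d for d, _ in candidates)
            arcs.update((x, y) for d, y in candidates if d == deepest)
    return arcs


def reciprocal_best_matches(arcs):
    """Unordered pairs {x, y} for which both arcs (x, y) and (y, x) exist."""
    return {frozenset((x, y)) for x, y in arcs if (y, x) in arcs}


# Hypothetical toy gene tree ((a1,(a2,b1)v)u,c1)r given as a child -> parent map.
parent = {"a1": "u", "v": "u", "a2": "v", "b1": "v", "u": "r", "c1": "r"}
species = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}

arcs = best_matches(parent, species)
rbm = reciprocal_best_matches(arcs)
```

In this toy tree, b1 is a best match of a1, but a1 is not a best match of b1 (a2 is closer), so the pair a1, b1 is not a reciprocal best match.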

2020 ◽  
Vol 12 (4) ◽  
pp. 381-395
Author(s):  
Nilson Da Rocha Coimbra ◽  
Aristoteles Goes-Neto ◽  
Vasco Azevedo ◽  
Aïda Ouangraoua

Abstract Horizontal gene transfer is a common mechanism in Bacteria that has contributed to the genomic content of existing organisms. Traditional methods for estimating bacterial phylogeny, however, assume only vertical inheritance in the evolution of homologous genes, which may result in errors in the estimated phylogenies. We present a new method for estimating bacterial phylogeny that accounts for the presence of genes acquired by horizontal gene transfer between genomes. The method identifies and corrects putative transferred genes in gene families, before applying a gene tree-based summary method to estimate bacterial species trees. The method was applied to estimate the phylogeny of the order Corynebacteriales, which is the largest clade in the phylum Actinobacteria. We report a collection of 14 phylogenetic trees on 360 Corynebacteriales genomes. All estimated trees display each genus as a monophyletic clade. The trees also display several relationships proposed by past studies, as well as new relevant relationships between and within the main genera of Corynebacteriales: Corynebacterium, Mycobacterium, Nocardia, Rhodococcus, and Gordonia. An implementation of the method in Python is available on GitHub at https://github.com/UdeS-CoBIUS/EXECT (last accessed April 2, 2020).


Algorithms ◽  
2021 ◽  
Vol 14 (4) ◽  
pp. 110
Author(s):  
David Schaller ◽  
Manuela Geiß ◽  
Marc Hellmuth ◽  
Peter F. Stadler

Best match graphs (BMGs) are vertex-colored digraphs that naturally arise in mathematical phylogenetics to formalize the notion of evolutionarily closest genes w.r.t. an a priori unknown phylogenetic tree. BMGs are explained by unique least resolved trees. We prove that the property of a rooted, leaf-colored tree to be least resolved for some BMG is preserved by the contraction of inner edges. For the special case of two-colored BMGs, this leads to a characterization of the least resolved trees (LRTs) of binary-explainable trees and a simple, polynomial-time algorithm for the minimum cardinality completion of the arc set of a BMG to reach a BMG that can be explained by a binary tree.


2015 ◽  
Author(s):  
Ruth Davidson ◽  
Pranjal Vachaspati ◽  
Siavash Mirarab ◽  
Tandy Warnow

Background: Species tree estimation is challenged by gene tree heterogeneity resulting from biological processes such as duplication and loss, hybridization, incomplete lineage sorting (ILS), and horizontal gene transfer (HGT). Mathematical theory about reconstructing species trees in the presence of HGT alone or ILS alone suggests that quartet-based species tree methods (known to be statistically consistent under ILS, or under bounded amounts of HGT) might be effective techniques for estimating species trees when both HGT and ILS are present. Results: We evaluated several publicly available coalescent-based methods and concatenation under maximum likelihood on simulated datasets with moderate ILS and varying levels of HGT. Our study shows that two quartet-based species tree estimation methods (ASTRAL-2 and weighted Quartets MaxCut) are both highly accurate, even on datasets with high rates of HGT. In contrast, although NJst and concatenation using maximum likelihood are highly accurate under low HGT, they are less robust to high HGT rates. Conclusion: Our study shows that quartet-based species-tree estimation methods can be highly accurate under the presence of both HGT and ILS. The study suggests the possibility that some quartet-based methods might be statistically consistent under phylogenomic models of gene tree heterogeneity with both HGT and ILS. Keywords: phylogenomics; HGT; ILS; summary methods; concatenation


2017 ◽  
Author(s):  
Nikolai Nøjgaard ◽  
Manuela Geiß ◽  
Peter F. Stadler ◽  
Daniel Merkle ◽  
Nicolas Wieseke ◽  
...  

Abstract Background: In the absence of horizontal gene transfer it is possible to reconstruct the history of gene families from empirically determined orthology relations, which are equivalent to event-labeled gene trees. Knowledge of the event labels considerably simplifies the problem of reconciling a gene tree T with a species tree S, relative to the reconciliation problem without prior knowledge of the event types. It is well-known that optimal reconciliations in the unlabeled case may violate time-consistency and thus are not biologically feasible. Here we investigate the mathematical structure of the event-labeled reconciliation problem with horizontal transfer. Results: We investigate the issue of time-consistency for the event-labeled version of the reconciliation problem, provide a convenient axiomatic framework, and derive a complete characterization of time-consistent reconciliations. This characterization depends on certain weak conditions on the event-labeled gene trees that reflect conditions under which evolutionary events are observable at least in principle. We give an 𝒪(|V(T)| log(|V(S)|))-time algorithm to decide whether a time-consistent reconciliation map exists. It does not require the construction of explicit timing maps, but relies entirely on the comparatively easy task of checking whether a small auxiliary graph is acyclic. The algorithms are implemented in C++ using the boost graph library and are freely available at https://github.com/Nojgaard/tc-recon. Significance: The combinatorial characterization of time consistency and thus biologically feasible reconciliation is an important step towards the inference of gene family histories with horizontal transfer from orthology data, i.e., without presupposed gene and species trees. The fast algorithm to decide time consistency is useful in a broader context because it constitutes an attractive component for all tools that address tree reconciliation problems.
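The acyclicity check mentioned in this abstract can be sketched generically with Kahn's algorithm, which repeatedly removes vertices of in-degree zero. The sketch below is only an assumption about how such a test might look in Python; it does not construct the paper's auxiliary graph and is not taken from the tc-recon C++ code. The function name `is_acyclic` and the example graphs are hypothetical.

```python
from collections import deque


def is_acyclic(vertices, arcs):
    """Return True if the directed graph (vertices, arcs) has no directed cycle."""
    vertices = list(vertices)
    indegree = {v: 0 for v in vertices}
    successors = {v: [] for v in vertices}
    for u, v in arcs:
        successors[u].append(v)
        indegree[v] += 1

    # Start with all vertices that have no incoming arc.
    queue = deque(v for v in indegree if indegree[v] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    # Every vertex can be removed exactly when there is no directed cycle.
    return removed == len(indegree)


chain = is_acyclic(["a", "b", "c"], [("a", "b"), ("b", "c")])
cycle = is_acyclic(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
```

The test runs in time linear in the number of vertices and arcs, which is why reducing a consistency question to acyclicity of a small graph is attractive.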


2017 ◽  
Author(s):  
Huateng Huang ◽  
Jeet Sukumaran ◽  
Stephen A Smith ◽  
L.Lacey Knowles

Despite recent efforts that have produced data sets with hundreds and thousands of gene regions to resolve regions of the tree of life, recalcitrant nodes persist, and disagreement among genes as well as between individual gene trees and species trees is common. A number of evolutionary processes contribute to these conflicts between gene trees and species trees, including deep coalescence (lineage sorting) and horizontal gene transfer or hybridization. For some of these processes we have very powerful and sophisticated models, such as the multispecies coalescent (MSC), that use the conflict in the gene trees as information that contributes materially to correctly inferring the species tree. However, using these models requires a priori recognition of the relevant processes, which are often unknown for empirical datasets. Here we propose a new perspective to not only identify the cause of discord among gene trees, but also use it to classify loci by the underlying cause of discord to identify subsets of loci for analysis with the goal of improving phylogenetic accuracy. This approach differs fundamentally from all other criteria used for making decisions about which loci to include in a phylogenetic analysis. In particular, the choice of loci in this framework is based on identifying those that reflect descent from a common ancestor (as opposed to other processes), and thereby can minimize problems with model misspecification. We present preliminary results that demonstrate the potential of this framework in distinguishing lateral gene transfer (LGT) from incomplete lineage sorting (ILS), as implemented in a new software package, CLASSIPHY, while also highlighting areas for further development and testing.
We discuss why (i) such methods are critical to improving phylogenetic accuracy with the increased complexity of genomic/transcriptomic datasets, and (ii) characterizing patterns of discordance and the contribution of different processes to this discordance is itself of interest for generating hypotheses about the role of lateral gene transfer, gene duplication, and incomplete lineage sorting during the divergence of different taxa.


2005 ◽  
Vol 389 (1) ◽  
pp. 117-125 ◽  
Author(s):  
Taisei KIKUCHI ◽  
Hajime SHIBUYA ◽  
John T. JONES

We report the cloning and functional characterization of an endo-β-1,3-glucanase from the pinewood nematode Bursaphelenchus xylophilus acquired by horizontal gene transfer from bacteria. This is the first gene of this type from any nematode species. We show that a similar cDNA is also present in another closely related species B. mucronatus, but that similar sequences are not present in any other nematode studied to date. The B. xylophilus gene is expressed solely in the oesophageal gland cells of the nematode and the protein is present in the nematode's secretions. The deduced amino acid sequence of the gene is very similar to glycosyl hydrolase family 16 proteins. The recombinant protein, expressed in Escherichia coli, preferentially hydrolysed the β-1,3-glucan laminarin, and had very low levels of activity on β-1,3-1,4-glucan, lichenan and barley β-glucan. Laminarin was degraded in an endoglucanase mode by the enzyme. The optimal temperature and pH for activity of the recombinant enzyme were 65 °C and pH 4.9. The protein is probably important in allowing the nematodes to feed on fungi. Sequence comparisons suggest that the gene encoding the endo-β-1,3-glucanase was acquired by horizontal gene transfer from bacteria. B. xylophilus therefore contains genes that have been acquired by this process from both bacteria and fungi. These findings support the idea that multiple independent horizontal gene transfer events have helped in shaping the evolution of several different life strategies in nematodes.


Author(s):  
Erel Segal-Halevi ◽  
Haris Aziz ◽  
Avinatan Hassidim

Ranking alternatives is a natural way for humans to explain their preferences. It is used in many settings, such as school choice (NY, Boston), course allocation, and the Israeli medical lottery. In some cases (such as the latter two), several "items" are given to each participant. Without any information on the underlying cardinal utilities, arguing about the fairness of an allocation requires extending the ordinal item ranking to an ordinal bundle ranking. The most commonly used such extension is stochastic dominance (SD), where a bundle X is preferred over a bundle Y if its score is better according to all additive score functions. SD is a very conservative extension, by which few allocations are necessarily fair while many allocations are possibly fair. We propose to make a natural assumption on the underlying cardinal utilities of the players, namely that the difference between two items at the top is larger than the difference between two items at the bottom. This assumption implies a preference extension which we call diminishing differences (DD), where a bundle X is preferred over a bundle Y if its score is better according to all additive score functions satisfying the DD assumption. We give a full characterization of allocations that are necessarily-proportional or possibly-proportional according to this assumption. Based on this characterization, we present a polynomial-time algorithm for finding a necessarily-DD-proportional allocation if it exists. Using simulations, we show that with high probability, a necessarily-proportional allocation does not exist but a necessarily-DD-proportional allocation exists, and moreover, that allocation is proportional according to the underlying cardinal utilities.
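The stochastic dominance extension described in this abstract can be tested with a simple prefix count. The sketch below assumes a strict ranking of items and score functions that are nonnegative and nonincreasing in rank; under those assumptions, X scores at least as well as Y for every additive score function exactly when, for each k, X contains at least as many of the k top-ranked items as Y. The function name and the example ranking are hypothetical, and the DD extension of the paper is not implemented here.

```python
def sd_weakly_prefers(ranking, bundle_x, bundle_y):
    """Check weak stochastic dominance of bundle_x over bundle_y.

    ranking lists all items from best to worst (strict order assumed).
    """
    x, y = set(bundle_x), set(bundle_y)
    count_x = count_y = 0
    for item in ranking:
        count_x += item in x
        count_y += item in y
        # X must never fall behind Y in any top-k prefix of the ranking.
        if count_x < count_y:
            return False
    return True


ranking = ["a", "b", "c", "d"]  # hypothetical ordinal ranking, best first

dominates = sd_weakly_prefers(ranking, {"a", "c"}, {"b", "d"})
forward = sd_weakly_prefers(ranking, {"a", "d"}, {"b", "c"})
backward = sd_weakly_prefers(ranking, {"b", "c"}, {"a", "d"})
```

The bundles {a, d} and {b, c} are incomparable in both directions, which illustrates why SD is described as conservative: many pairs of bundles cannot be ranked without further assumptions on the utilities.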

