Tropical optimal transport and Wasserstein distances

Author(s):  
Wonjun Lee ◽  
Wuchen Li ◽  
Bo Lin ◽  
Anthea Monod

Abstract We study the problem of optimal transport in tropical geometry and define the Wasserstein-p distances in the continuous metric measure space setting of the tropical projective torus. We specify the tropical metric, a combinatorial metric that has been used to study the tropical geometric space of phylogenetic trees, as the ground metric and study the cases of $$p=1,2$$ in detail. The case of $$p=1$$ gives an efficient computation of the infinitely many geodesics on the tropical projective torus, while the case of $$p=2$$ gives a form for Fréchet means and a general inner product structure. Our results also provide theoretical foundations for geometric insight and a statistical framework in a tropical geometric setting. We construct explicit algorithms for the computation of the tropical Wasserstein-1 and 2 distances and prove their convergence. Our results provide the first study of the Wasserstein distances and optimal transport in tropical geometry. Several numerical examples are provided.
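As a point of reference for the ground metric named in this abstract, the tropical metric is commonly written as d_tr(x, y) = max_i (x_i - y_i) - min_i (x_i - y_i). The sketch below only evaluates that formula; it is not the authors' Wasserstein algorithm, and the function name `tropical_distance` is a hypothetical label chosen for illustration.

```python
def tropical_distance(x, y):
    """Tropical metric between two points of the tropical projective torus.

    Points are given by any coordinate representatives of equal length.
    The function name is illustrative, not taken from the paper.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    diffs = [a - b for a, b in zip(x, y)]
    # The value is unchanged if a constant is added to every coordinate
    # of x (or of y), which is what makes it well defined on the torus.
    return max(diffs) - min(diffs)
```

Shifting every coordinate of one argument by the same constant leaves the result unchanged, which is why any representative of a point can be passed in.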

2019 ◽  
Vol 2019 ◽  
pp. 1-9
Author(s):  
Alberto López Rosado ◽  
Federico Prieto Muñoz ◽  
Roberto Alvarez Fernández

This article introduces new types of rational approximations of the inverse involute function, widely used in gear engineering, that allow this function to be evaluated with very low error. The approximated function is appropriate for engineering applications, requiring far fewer operations than previous formulae in the existing literature and giving a very efficient computation. The proposed expressions avoid the use of iterative methods. The theoretical foundations of the approximation theory of rational functions and the Chebyshev and Jacobi polynomials that allow these approximations to be obtained are presented in this work, and an adaptation of the Remez algorithm is also provided, which yields zero error at the origin. In this way, approximations in ranges or degrees different from those presented here can be obtained. A rational approximation of the direct involute function is computed, which avoids the computation of the tangent function. Finally, the direct polar equation of the circle involute curve is approximated with some application examples.
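For context, the direct involute function is inv(alpha) = tan(alpha) - alpha, and its inverse has no closed form. The sketch below shows the kind of iterative baseline that the article's rational approximations are meant to avoid: a plain Newton iteration, assuming a starting guess from the leading series term inv(alpha) ≈ alpha^3 / 3. It is not the article's method, and the function names and the input range 0 <= y <= 1 are assumptions made for this illustration.

```python
import math


def involute(alpha):
    """Direct involute function inv(alpha) = tan(alpha) - alpha (radians)."""
    return math.tan(alpha) - alpha


def inverse_involute_newton(y, tol=1e-14, max_iter=50):
    """Solve tan(alpha) - alpha = y for alpha by Newton's method.

    Illustrative baseline only; restricted to 0 <= y <= 1 so that the
    starting guess stays below pi/2.
    """
    if not 0.0 <= y <= 1.0:
        raise ValueError("this sketch only handles 0 <= y <= 1")
    if y == 0.0:
        return 0.0
    # Starting guess from the leading term of the series: inv(a) ~ a**3 / 3.
    alpha = (3.0 * y) ** (1.0 / 3.0)
    for _ in range(max_iter):
        t = math.tan(alpha)
        # d/d(alpha) [tan(alpha) - alpha] = tan(alpha)**2
        step = (t - alpha - y) / (t * t)
        alpha -= step
        if abs(step) < tol:
            break
    return alpha
```

A round trip such as `inverse_involute_newton(involute(math.radians(20.0)))` should recover the 20 degree pressure angle to within floating-point tolerance.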


Genetics ◽  
1992 ◽  
Vol 131 (3) ◽  
pp. 753-760 ◽  
Author(s):  
J G Lawrence ◽  
D L Hartl

Abstract Inconsistencies in taxonomic relationships implicit in different sets of nucleic acid sequences potentially result from horizontal transfer of genetic material between genomes. A nonparametric method is proposed to determine whether such inconsistencies are statistically significant. A similarity coefficient is calculated from ranked pairwise identities and evaluated against a distribution of similarity coefficients generated from resampled data. Subsequent analyses of partial data sets, obtained by the elimination of individual taxa, identify particular taxa to which the significance may be attributed, and can sometimes help in distinguishing horizontal genetic transfer from inconsistencies due to convergent evolution or variation in evolutionary rate. The method was successfully applied to data sets that were not found to be significantly different with existing methods that use comparisons of phylogenetic trees. The new statistical framework is also applicable to the inference of horizontal transfer from restriction fragment length polymorphism distributions and protein sequences.


2020 ◽  
Vol 68 ◽  
pp. 1-19
Author(s):  
Jérémie Bigot

This paper is concerned with statistical inference problems from a data set whose elements may be modeled as random probability measures such as multiple histograms or point clouds. We review recent contributions in statistics on the use of Wasserstein distances and tools from optimal transport to analyse such data. In particular, we highlight the benefits of using the notions of barycenter and geodesic PCA in the Wasserstein space for the purpose of learning the principal modes of geometric variation in a data set. In this setting, we discuss existing works and we present some research perspectives related to the emerging field of statistical optimal transport.
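To make the notions of Wasserstein distance and barycenter concrete, the sketch below covers the simplest case: one-dimensional empirical measures with the same number of equally weighted points, where the optimal coupling matches sorted samples and the Wasserstein-2 barycenter averages the sorted samples. This special case is an assumption chosen for illustration and is not drawn from the paper; the function names are hypothetical.

```python
def wasserstein_1d(xs, ys, p=1):
    """Wasserstein-p distance between two equal-size 1-D samples.

    Assumes uniform weights, so the optimal coupling pairs the i-th
    smallest point of xs with the i-th smallest point of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    total = sum(abs(a - b) ** p for a, b in pairs)
    return (total / len(xs)) ** (1.0 / p)


def barycenter_1d(samples):
    """Wasserstein-2 barycenter of equal-size 1-D samples (uniform weights).

    In one dimension this is the average of the sorted samples, that is,
    the average of the empirical quantile functions.
    """
    size = len(samples[0])
    if any(len(s) != size for s in samples):
        raise ValueError("all samples must have the same size")
    ordered = [sorted(s) for s in samples]
    return [sum(s[i] for s in ordered) / len(ordered) for i in range(size)]
```

Histograms or point clouds in higher dimensions, as discussed in the paper, need a general optimal transport solver rather than this sorting shortcut.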


2012 ◽  
Vol DMTCS Proceedings vol. AR,... (Proceedings) ◽  
Author(s):  
Christopher Manon

We will discuss some recent theorems relating the space of weighted phylogenetic trees to the tropical varieties of each flag variety of type A. We will also discuss the tropicalizations of the functions corresponding to semi-standard tableaux; in particular, we relate them to familiar functions from phylogenetics. We close with some remarks on the generalization of these results to the tropical geometry of arbitrary flag varieties. This involves the family of Bergman complexes derived from the hyperplane arrangements associated to simple Dynkin diagrams.


Author(s):  
Heinz Stockinger ◽  
Alexander F. Auch ◽  
Markus Göker ◽  
Jan Meier-Kolthoff ◽  
Alexandros Stamatakis

Phylogenetic data analysis represents an extremely compute-intensive area of Bioinformatics and thus requires high-performance technologies. Another compute- and memory-intensive problem is that of host-parasite co-phylogenetic analysis: given two phylogenetic trees, one for the hosts (e.g., mammals) and one for their respective parasites (e.g., lice), the question arises whether host and parasite trees are more similar to each other than expected by chance alone. CopyCat is an easy-to-use tool that allows biologists to conduct such co-phylogenetic studies within an elaborate statistical framework based on the highly optimized sequential and parallel AxParafit program. We have developed enhanced versions of these tools that efficiently exploit a Grid environment and therefore facilitate large-scale data analyses. Furthermore, we developed a freely accessible client tool that provides co-phylogenetic analysis capabilities. Since the computational bulk of the problem is embarrassingly parallel, it is well suited to a computational Grid, which reduces the response time of large-scale analyses.


2020 ◽  
Author(s):  
Shijia Wang ◽  
Shufei Ge ◽  
Caroline Colijn ◽  
Liangliang Wang ◽  
Lloyd T Elliott

Abstract Genetic similarity is a measure of the genetic relatedness among individuals. The standard method for computing similarity matrices involves the inner product of observed genetic variant matrices. Such an approach is inaccurate or impossible if genotypes are not available, not densely sampled, or of poor quality (for example, in genetic analysis of extinct species). We provide a new method for computing genetic similarities among individuals using phylogenetic trees. Our method can supplement (or stand in for) computations based on genetic sequences. We show that the genetic similarity matrices computed from trees are consistent with those computed from genotypes. Quantitative analysis of genetic traits and analysis of heritability and co-heritability can be conducted directly using genetic similarity matrices, so in the absence of genotype data, and in the presence of phylogenetic trees derived from morphological data or geological dates, such analyses can be undertaken using our methods. We use simulation studies to demonstrate the advantages of our method, and we provide an application to ancient hominin data.
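The "standard method" mentioned in this abstract, an inner product of genotype matrices, can be sketched as follows. This illustrates only the genotype-based baseline, not the authors' tree-based method. Centering each variant column and dividing by the number of variants are assumptions made here, since the abstract does not state a normalization; the function name is hypothetical.

```python
def similarity_matrix(genotypes):
    """Genotype-based similarity matrix G = Z Z^T / m.

    genotypes: list of n rows (individuals), each a list of m variant
    counts (e.g. 0, 1 or 2). Z is the column-centered genotype matrix.
    The centering and the 1/m scaling are assumptions of this sketch.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    if any(len(row) != m for row in genotypes):
        raise ValueError("all individuals must have the same number of variants")
    # Center each variant (column) at its mean across individuals.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Entry (i, k) is the scaled inner product of individuals i and k.
    return [
        [sum(a * b for a, b in zip(zi, zk)) / m for zk in centered]
        for zi in centered
    ]
```

The result is symmetric, with larger entries for pairs of individuals whose genotypes deviate from the variant means in the same direction.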


2019 ◽  
Vol 56 (3) ◽  
pp. 830-857 ◽  
Author(s):  
Jose Blanchet ◽  
Yang Kang ◽  
Karthyek Murthy

Abstract We show that several machine learning estimators, including square-root least absolute shrinkage and selection and regularized logistic regression, can be represented as solutions to distributionally robust optimization problems. The associated uncertainty regions are based on suitably defined Wasserstein distances. Hence, our representations allow us to view regularization as a result of introducing an artificial adversary that perturbs the empirical distribution to account for out-of-sample effects in loss estimation. In addition, we introduce RWPI (robust Wasserstein profile inference), a novel inference methodology which extends the use of methods inspired by empirical likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case). We use RWPI to show how to optimally select the size of uncertainty regions, and as a consequence we are able to choose regularization parameters for these machine learning estimators without the use of cross-validation. Numerical experiments are also given to validate our theoretical findings.


2005 ◽  
Vol 15 (2) ◽  
pp. 83-92 ◽  
Author(s):  
Idris A. Eckley ◽  
Guy P. Nason
