On Markov Earth Mover's Distance

2014 ◽  
Vol 14 (04) ◽  
pp. 1450016 ◽  
Author(s):  
Jie Wei

In statistics, pattern recognition, and signal processing, it is of utmost importance to have an effective and efficient distance for measuring the similarity between two distributions or sequences. In statistics this is referred to as the goodness-of-fit problem. Two leading goodness-of-fit measures are the chi-square and Kolmogorov–Smirnov distances. The strictly localized nature of these two measures hinders their practical utility for patterns and signals, where the sample size is usually small. In view of this problem, Rubner and colleagues developed the earth mover's distance (EMD), which allows for cross-bin moves in evaluating the distance between two patterns and has found a broad spectrum of applications. EMD-L1 was later proposed to reduce the time complexity of EMD from super-cubic by one order of magnitude by exploiting the special structure of the L1 metric. EMD-hat was developed to turn the global EMD into a localized one by discarding long-distance earth movements. In this work, we introduce a Markov EMD (MEMD) that treats the source and destination nodes absolutely symmetrically. In MEMD, as in EMD-hat, the earth is only moved locally, as dictated by the degree d of the neighborhood system. Nodes that cannot be matched locally are handled by dummy source and destination nodes. By use of this localized network structure, a greedy algorithm whose running time is linear in the degree d and the number of nodes is then developed to evaluate the MEMD. Empirical studies of MEMD on deterministic and statistical synthetic sequences and on SIFT-based image retrieval suggest encouraging performance.
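
The contrast drawn above between bin-by-bin measures and cross-bin EMD can be illustrated with a minimal sketch. It assumes one-dimensional histograms of equal total mass with unit spacing between adjacent bins, in which case the EMD reduces to the sum of absolute cumulative differences. It is not the MEMD algorithm from the paper; the function names `emd_1d` and `chi_square` and the toy histograms are illustrative assumptions.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two equal-mass 1D histograms with ground distance |i - j|.

    Assumption: bins lie on a line with unit spacing, so the EMD equals the
    sum of absolute cumulative differences. This is NOT the paper's MEMD.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    # Running surplus of earth that has to cross each bin boundary.
    carried = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(c) for c in carried)


def chi_square(p, q):
    """Symmetric bin-by-bin chi-square distance (no cross-bin terms)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


# Toy histograms (illustrative only): one unit of mass in a single bin.
base = [1.0, 0.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0, 0.0]  # mass shifted by one bin
far = [0.0, 0.0, 0.0, 0.0, 1.0]   # mass shifted by four bins

print(chi_square(base, near), chi_square(base, far))
print(emd_1d(base, near), emd_1d(base, far))
```

Both shifted histograms are equally far from `base` under the chi-square distance (1.0 in each case), while the EMD grows with the size of the shift (1.0 versus 4.0).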

2020 ◽  
Author(s):  
Cameron Hargreaves ◽  
Matthew Dyer ◽  
Michael Gaultois ◽  
Vitaliy Kurlin ◽  
Matthew J Rosseinsky

It is a core problem in any field to reliably tell how close two objects are to being the same, and once this relation has been established we can use this information to precisely quantify potential relationships, both analytically and with machine learning (ML). For inorganic solids, the chemical composition is a fundamental descriptor, which can be represented by assigning the ratio of each element in the material to a vector. These vectors are a convenient mathematical data structure for measuring similarity, but unfortunately, the standard metric (the Euclidean distance) gives little to no variance in the resultant distances between chemically dissimilar compositions. We present the Earth Mover’s Distance (EMD) for inorganic compositions, a well-defined metric which enables the measure of chemical similarity in an explainable fashion. We compute the EMD between two compositions from the ratio of each of the elements and the absolute distance between the elements on the modified Pettifor scale. This simple metric shows clear strength at distinguishing compounds and is efficient to compute in practice. The resultant distances have greater alignment with chemical understanding than the Euclidean distance, which is demonstrated on the binary compositions of the Inorganic Crystal Structure Database (ICSD). The EMD is a reliable numeric measure of chemical similarity that can be incorporated into automated workflows for a range of ML techniques. We have found that with no supervision the use of this metric gives a distinct partitioning of binary compounds into clear trends and families of chemical property, with future applications for nearest neighbor search queries in chemical database retrieval systems and supervised ML techniques.
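
The computation described above, from element ratios and distances along a one-dimensional element scale, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the labels `A`, `B`, `C`, `X` and their positions in `PLACEHOLDER_SCALE` are hypothetical stand-ins, not modified Pettifor numbers, and the function names are assumptions. Because the positions lie on a line, the sketch uses the cumulative-difference shortcut for one-dimensional EMD rather than a general transport solver.

```python
import math

# Hypothetical element labels and positions; NOT the modified Pettifor scale.
PLACEHOLDER_SCALE = {"A": 1, "B": 2, "C": 9, "X": 10}


def normalize(comp):
    """Convert element amounts to fractions that sum to 1."""
    total = sum(comp.values())
    return {el: amount / total for el, amount in comp.items()}


def composition_emd(comp_a, comp_b, scale):
    """1D EMD between two compositions, with ground distance taken on `scale`."""
    a, b = normalize(comp_a), normalize(comp_b)
    elements = sorted(set(a) | set(b), key=lambda el: scale[el])
    distance, surplus = 0.0, 0.0
    for left, right in zip(elements, elements[1:]):
        # Mass still unmatched after `left` must travel on to `right`.
        surplus += a.get(left, 0.0) - b.get(left, 0.0)
        distance += abs(surplus) * (scale[right] - scale[left])
    return distance


def euclidean(comp_a, comp_b):
    """Euclidean distance between the two fraction vectors, for contrast."""
    a, b = normalize(comp_a), normalize(comp_b)
    return math.sqrt(sum((a.get(el, 0.0) - b.get(el, 0.0)) ** 2
                         for el in set(a) | set(b)))


ax = {"A": 1, "X": 1}
bx = {"B": 1, "X": 1}  # A swapped for a neighbour on the placeholder scale
cx = {"C": 1, "X": 1}  # A swapped for a distant element

print(euclidean(ax, bx), euclidean(ax, cx))
print(composition_emd(ax, bx, PLACEHOLDER_SCALE),
      composition_emd(ax, cx, PLACEHOLDER_SCALE))
```

In this toy setting the Euclidean distance is identical for both substitutions, whereas the EMD is 0.5 for the neighbouring element and 4.0 for the distant one, which mirrors the lack of variance in Euclidean distances noted in the abstract.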


2018 ◽  
Vol 10 (12) ◽  
pp. 534
Author(s):  
Janilson Pinheiro de Assis ◽  
Roberto Pequeno de Sousa ◽  
Ben Deivide de Oliveira Batista ◽  
Paulo César Ferreira Linhares ◽  
Eudes de Almeida Cardoso ◽  
...  

We fitted the following seven probability distributions to monthly average temperature data for Mossoró, northeastern Brazil: Normal, Log-Normal, Beta, Gamma, Log-Pearson (Type III), Gumbel, and Weibull. To assess the goodness of fit of the empirical distributions to the theoretical distributions, we applied the Kolmogorov-Smirnov, Chi-square, Cramer-von Mises, Anderson-Darling, Kuiper, and Logarithm of Maximum Likelihood tests at the 10% probability level. The temperature series covered 1970 to 2007. The Normal distribution provided the best fit to the historical series of average monthly temperature. Although the Kolmogorov-Smirnov test showed a very high level of acceptance, which generated some uncertainty regarding the test criteria, it is the most recommended test for studies with approximately symmetric data and short series.
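
As a rough illustration of one of the tests named above, the following sketch computes the Kolmogorov-Smirnov statistic of a sample against a fitted Normal distribution using only the Python standard library. The sample values are synthetic and are not taken from the Mossoró series; the function name `ks_statistic` is an assumption. Note that when the distribution parameters are estimated from the same sample, standard Kolmogorov-Smirnov critical values are only approximate.

```python
from statistics import NormalDist


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and a model `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Synthetic values for illustration only (not data from the study).
sample = [26.4, 27.0, 27.3, 27.5, 27.8, 28.0, 28.1, 28.4, 28.9, 29.3]

# Fit a Normal distribution by its sample mean and standard deviation.
fitted = NormalDist.from_samples(sample)
d = ks_statistic(sample, fitted.cdf)
print(f"KS statistic against the fitted Normal: {d:.3f}")
```

A smaller statistic indicates closer agreement between the empirical and theoretical distributions; deciding acceptance at a given level additionally requires the appropriate critical value for the sample size.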

