Graph diffusion distance: Properties and efficient computation

PLoS ONE ◽  
2021 ◽  
Vol 16 (4) ◽  
pp. e0249624
Author(s):  
C. B. Scott ◽  
Eric Mjolsness

We define a new family of similarity and distance measures on graphs, and explore their theoretical properties in comparison to conventional distance metrics. These measures are defined by the solution(s) to an optimization problem which attempts to find a map minimizing the discrepancy between two graph Laplacian exponential matrices, under norm-preserving and sparsity constraints. Variants of the distance metric are introduced to consider such optimized maps under sparsity constraints as well as fixed time-scaling between the two Laplacians. The objective function of this optimization is multimodal and has discontinuous slope, and is hence difficult for univariate optimizers to solve. We demonstrate a novel procedure for efficiently calculating these optima for two of our distance measure variants. We present numerical experiments demonstrating that (a) upper bounds of our distance metrics can be used to distinguish between lineages of related graphs; (b) our procedure is faster at finding the required optima, by as much as a factor of 10^3; and (c) the upper bounds satisfy the triangle inequality exactly under some assumptions and approximately under others. We also derive an upper bound for the distance between two graph products, in terms of the distance between the two pairs of factors. Additionally, we present several possible applications, including the construction of infinite “graph limits” by means of Cauchy sequences of graphs related to one another by our distance measure.
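As a rough illustration of the kind of objective this abstract describes, the sketch below compares the Laplacian exponentials of two graphs of equal size over a grid of diffusion times and keeps the largest Frobenius-norm discrepancy. It is a simplification and not the authors' procedure: it assumes both graphs share a vertex set (so no map between vertex sets is optimized), it omits the sparsity constraints and time-scaling, and the function names and the time grid are hypothetical.

```python
import numpy as np

def laplacian(adj):
    """Combinatorial graph Laplacian L = D - A for a symmetric adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    return np.diag(adj.sum(axis=1)) - adj

def heat_kernel(lap, t):
    """exp(-t L) via eigendecomposition (valid because L is symmetric)."""
    vals, vecs = np.linalg.eigh(lap)
    return (vecs * np.exp(-t * vals)) @ vecs.T

def diffusion_distance_same_size(adj1, adj2, times=None):
    """Largest Frobenius gap between the two heat kernels over a grid of times.

    Assumption: the identity map between vertices is used; nothing is
    optimized other than the diffusion time t.
    """
    if times is None:
        times = np.logspace(-2, 2, 200)  # hypothetical search grid
    l1, l2 = laplacian(adj1), laplacian(adj2)
    gaps = [np.linalg.norm(heat_kernel(l1, t) - heat_kernel(l2, t), "fro")
            for t in times]
    return max(gaps)

# Two 3-vertex graphs: a path and a triangle.
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
d_same = diffusion_distance_same_size(path3, path3)     # identical graphs
d_diff = diffusion_distance_same_size(path3, triangle)  # different graphs
```

A grid search over t is used here only because the abstract notes that the objective is multimodal; it is not the efficient procedure the paper reports.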

Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 436
Author(s):  
Ruirui Zhao ◽  
Minxia Luo ◽  
Shenggang Li

Picture fuzzy sets, which are the extension of intuitionistic fuzzy sets, can deal with inconsistent information better in practical applications. A distance measure is an important mathematical tool to calculate the difference degree between picture fuzzy sets. Although some distance measures of picture fuzzy sets have been constructed, there are some unreasonable and counterintuitive cases. The main reason is that the existing distance measures do not or seldom consider the refusal degree of picture fuzzy sets. In order to solve these unreasonable and counterintuitive cases, in this paper, we propose a dynamic distance measure of picture fuzzy sets based on a picture fuzzy point operator. Through a numerical comparison and multi-criteria decision-making problems, we show that the proposed distance measure is reasonable and effective.


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Shumpei Haginoya ◽  
Aiko Hanayama ◽  
Tamae Koike

Purpose The purpose of this paper was to compare the accuracy of linking crimes using geographical proximity among three distance measures: Euclidean (distance measured by the length of a straight line between two locations), Manhattan (distance obtained by summing north-south distance and east-west distance) and the shortest route distances. Design/methodology/approach A total of 194 cases committed by 97 serial residential burglars in Aomori Prefecture in Japan between 2004 and 2015 were used in the present study. The Mann–Whitney U test was used to compare linked (two offenses committed by the same offender) and unlinked (two offenses committed by different offenders) pairs for each distance measure. Discrimination accuracy between linked and unlinked crime pairs was evaluated using area under the receiver operating characteristic curve (AUC). Findings The Mann–Whitney U test showed that the distances of the linked pairs were significantly shorter than those of the unlinked pairs for all distance measures. Comparison of the AUCs showed that the shortest route distance achieved significantly higher accuracy compared with the Euclidean distance, whereas there was no significant difference between the Euclidean and the Manhattan distance or between the Manhattan and the shortest route distance. These findings give partial support to the idea that distance measures taking the impact of environmental factors into consideration might be able to identify a crime series more accurately than Euclidean distances. Research limitations/implications Although the results suggested a difference between the Euclidean and the shortest route distance, it was small, and all distance measures resulted in outstanding AUC values, probably because of ceiling effects. Further investigation that makes the same comparison in a narrower area is needed to avoid this potential inflation of discrimination accuracy. Practical implications The shortest route distance might contribute to improving the accuracy of crime linkage based on geographical proximity. However, further investigation is needed before the shortest route distance can be recommended for use in practice. Given that the targeted area in the present study was relatively large, the findings may contribute especially to improving the accuracy of proactive comparative case analysis for estimating the whole picture of the distribution of serial crimes in the region by selecting a more effective distance measure. Social implications Improving the accuracy of crime linkage may assist crime investigations and lead to the earlier arrest of offenders. Originality/value The results of the present study provide an initial indication of the efficacy of using distance measures that take environmental factors into account.
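The Euclidean and Manhattan definitions given in this abstract, and the AUC used to score how well a distance separates linked from unlinked pairs, can be sketched as follows. This is an illustration under stated assumptions, not the study's code: locations are treated as planar (x, y) coordinates, the shortest route distance is omitted because it needs a road network, and the function names and sample numbers are hypothetical. The AUC is computed as the Mann–Whitney U statistic divided by the number of pair comparisons.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two planar points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def manhattan(p, q):
    """Sum of the east-west and north-south distances."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def auc_shorter_is_linked(linked, unlinked):
    """Probability that a linked pair is closer than an unlinked pair.

    Ties count as 0.5, which makes this equal to U / (n1 * n2).
    """
    wins = 0.0
    for a in linked:
        for b in unlinked:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(linked) * len(unlinked))

# Made-up distances (arbitrary units) for illustration only.
d_euc = euclidean((0, 0), (3, 4))   # 5.0
d_man = manhattan((0, 0), (3, 4))   # 7
auc = auc_shorter_is_linked([1, 2, 3], [2, 5, 9])  # 7.5 / 9
```

An AUC near 1 means linked pairs are almost always closer than unlinked pairs, while 0.5 means the distance carries no linkage information.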


2020 ◽  
Vol 34 (04) ◽  
pp. 5444-5453
Author(s):  
Edward Raff ◽  
Charles Nicholas ◽  
Mark McLean

Prior work inspired by compression algorithms has described how the Burrows Wheeler Transform can be used to create a distance measure for bioinformatics problems. We describe issues with this approach that were not widely known, and introduce our new Burrows Wheeler Markov Distance (BWMD) as an alternative. The BWMD avoids the shortcomings of earlier efforts, and allows us to tackle problems in variable length DNA sequence clustering. BWMD is also more adaptable to other domains, which we demonstrate on malware classification tasks. Unlike other compression-based distance metrics known to us, BWMD works by embedding sequences into a fixed-length feature vector. This allows us to provide significantly improved clustering performance on larger malware corpora, a weakness of prior methods.
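For readers unfamiliar with the transform this abstract builds on, here is a naive sketch of the Burrows Wheeler Transform itself. It does not implement BWMD, whose details are not given here; the function name and the `$` sentinel convention are assumptions made for the example, and the rotation sort is quadratic in memory, so it suits only short strings.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows Wheeler Transform: last column of the sorted rotations.

    Assumption: `sentinel` does not occur in `text` and sorts before
    every other character (true for "$" against ASCII letters).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

transformed = bwt("banana")  # "annb$aa"
```

The transform tends to group identical characters into runs, which is what makes it useful to compression-inspired distance measures.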


Author(s):  
Liguo Fei ◽  
Yuqiang Feng

Belief functions have always played an indispensable role in modeling cognitive uncertainty. As an inherited version, the theory of D numbers has been proposed and developed in a more efficient and robust way. Within the framework of D number theory, two more generalized properties are extended: (1) the elements in the frame of discernment (FOD) of D numbers are not required to be strictly mutually exclusive; (2) the completeness constraint is relaxed. The investigation shows that the distance function is very significant in measuring the difference between two D numbers, especially in information fusion and decision-making. Modeling methods of uncertainty that incorporate D numbers have become increasingly popular; however, very few approaches have tackled the challenges of distance metrics. In this study, the distance measure of two D numbers is presented for several cases, including complete information, incomplete information, and non-exclusive elements.


2019 ◽  
Vol 0 (0) ◽  
pp. 1-29 ◽  
Author(s):  
Víctor G. Alfaro-García ◽  
José M. Merigó ◽  
Leobardo Plata-Pérez ◽  
Gerardo G. Alfaro-Calderón ◽  
Anna M. Gil-Lafuente

This paper introduces the induced ordered weighted logarithmic averaging (IOWLAD) and multiregion induced ordered weighted logarithmic averaging (MR-IOWLAD) operators. The distinctive characteristic of these operators lies in the notion of distance measures combined with the complex reordering mechanism of inducing variables and the properties of the logarithmic averaging operators. The main advantage of MR-IOWLAD operators is their design, which is specifically intended to aid in decision-making when a set of diverse regions with different properties must be considered. Moreover, the induced weighting vector and the distance measure mechanisms of the operator allow for the wider modeling of problems, including heterogeneous information and the complex attitudinal character of experts, when aiming for an ideal scenario. Along with analyzing the main properties of the IOWLAD operators, their families and specific cases, we also introduce some extensions, such as the induced generalized ordered weighted averaging (IGOWLAD) operator and Choquet integrals. We present the induced Choquet logarithmic distance averaging (ICLD) operator and the generalized induced Choquet logarithmic distance averaging (IGCLD) operator. Finally, an illustrative example is proposed, including real-world information retrieved from the United Nations World Statistics for global regions.


SPE Journal ◽  
2021 ◽  
pp. 1-25
Author(s):  
Chang Gao ◽  
Juliana Y. Leung

Summary The steam-assisted gravity drainage (SAGD) recovery process is strongly impacted by the spatial distributions of heterogeneous shale barriers. Though detailed compositional flow simulators are available for SAGD recovery performance evaluation, the simulation process is usually quite computationally demanding, which makes their use over a large number of reservoir models for assessing the impacts of heterogeneity (uncertainties) impractical. In recent years, data-driven proxies have been widely proposed to reduce the computational effort; nevertheless, the proxy must be trained using a large data set consisting of many flow simulation cases that ideally span the model parameter spaces. The question remains: is there a more efficient way to screen a large number of heterogeneous SAGD models? Such techniques could help to construct a training data set with less redundancy; they can also be used to quickly identify a subset of heterogeneous models for detailed flow simulation. In this work, we formulated two particular distance measures, flow-based and static-based, to quantify the similarity among a set of 3D heterogeneous SAGD models. First, to formulate the flow-based distance measure, a physics-based particle-tracking model is used: Darcy’s law and energy balance are integrated to mimic the steam chamber expansion process; steam particles that are located at the edge of the chamber would release their energy to the surrounding cold bitumen, while detailed fluid displacements are not explicitly simulated. The steam chamber evolution is modeled, and a flow-based distance between two given reservoir models is defined as the difference in their chamber sizes over time. Second, to formulate the static-based distance, the Hausdorff distance (Hausdorff 1914) is used: it is often used in image processing to compare two images according to their corresponding spatial arrangement and shapes of various objects. A suite of 3D models is constructed using representative petrophysical properties and operating constraints extracted from several pads in Suncor Energy’s Firebag project. The computed distance measures are used to partition the models into different groups. To establish a baseline for comparison, flow simulations are performed on these models to predict the actual chamber evolution and production profiles. The grouping results according to the proposed flow- and static-based distance measures match reasonably well to those obtained from detailed flow simulations. Significant improvement in computational efficiency is achieved with the proposed techniques. They can be used to efficiently screen a large number of reservoir models and facilitate the clustering of these models into groups with distinct shale heterogeneity characteristics. They also present significant potential to be integrated with other data-driven approaches for reducing the computational load typically associated with detailed flow simulations involving multiple heterogeneous reservoir realizations.
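The Hausdorff distance used for the static-based measure can be sketched for two finite point sets as below. This is a generic brute-force version, not the authors' implementation: how shale barriers are turned into point sets is not described here, so the inputs are simply assumed to be arrays of coordinates, and the function names are hypothetical.

```python
import numpy as np

def directed_hausdorff(a, b):
    """Largest distance from a point of `a` to its nearest point of `b`."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = a[:, None, :] - b[None, :, :]        # shape (len(a), len(b), dim)
    dists = np.sqrt((diffs ** 2).sum(axis=2))    # all pairwise distances
    return dists.min(axis=1).max()

def hausdorff(a, b):
    """Symmetric Hausdorff distance: the larger of the two directed values."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))

# Two small 2D point sets standing in for barrier locations (made-up values).
set_a = [(0.0, 0.0), (1.0, 0.0)]
set_b = [(0.0, 0.0), (4.0, 0.0)]
h = hausdorff(set_a, set_b)  # 3.0: the point (4, 0) is 3 away from set_a
```

The pairwise-distance array grows with the product of the two set sizes, so large 3D models would need a spatial index or a library routine instead of this brute-force form.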


2019 ◽  
Vol 59 (4) ◽  
pp. 722-741 ◽  
Author(s):  
Paul Phillips ◽  
Nuno Antonio ◽  
Ana de Almeida ◽  
Luís Nunes

This study examines the relationship between distance measures and a Portuguese data set consisting of 34,622 online hotel reviews extracted from Booking.com and TripAdvisor written in Portuguese, Spanish, and English. Based on the country of origin of each review author, a geographic and a psychic distance measure is calculated for Portugal. Data and text mining analysis provides additional insights into online hotel ratings. The authors confirm that online travelers’ evaluations are multifaceted constructs displaying varying patterns of rating behavior among the traveler base. By investigating the contemporary relevance of geographic and psychic distance, a key finding of this study is that travelers with less distance both in terms of psychic and geographic distance give a lower rating score than travelers with greater distance. The inclusion of psychic and geographic distance is advocated as a salient aspect for future researchers and for those practitioners who wish to enhance hotel product and service features.


1993 ◽  
Vol 25 (01) ◽  
pp. 1-23 ◽  
Author(s):  
L. G. Hanin ◽  
S. T. Rachev ◽  
A. Yu. Yakovlev

Optimization problems in cancer radiation therapy are considered, with the efficiency functional defined as the difference between expected survival probabilities for normal and neoplastic tissues. Precise upper bounds of the efficiency functional over natural classes of cellular response functions are found. The ‘Lipschitz’ upper bound gives rise to a new family of probability metrics. In the framework of the ‘m hit-one target’ model of irradiated cell survival, the problem of optimal fractionation of the given total dose into n fractions is treated. For m = 1, n arbitrary, and n = 1, 2, m arbitrary, a complete solution is obtained. In other cases an approximation procedure is constructed. Stability of extremal values and upper bounds of the efficiency functional with respect to perturbation of radiosensitivity distributions for normal and tumor tissues is demonstrated.


Author(s):  
Semra Erpolat Taşabat ◽  
Tuğba Kıral Özkan

In this chapter, an alternative to the Euclidean distance measure, which is used to calculate positive and negative ideal solutions in the traditional TOPSIS method, is proposed. Lp Minkowski family and L1 family distance measures were used for this purpose. By taking the averages of the distance measurements in the Lq and L1 families, an attempt was made to obtain more general and accurate level units. Thus, it was shown that the TOPSIS method can give different results according to the distance measure used. The importance of the distance measurement unit for ranking the alternatives correctly was emphasized. The implementation and evaluation of the proposed method were carried out through the financial performance of the deposit bank operating in the Turkish Banking Sector. It was seen that the rankings of the alternatives changed according to the distance measurements used. By referring to the distance measurements that can be used in the TOPSIS method, it was shown that the rank of the alternatives can vary according to the preferred distance measure.
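To make concrete how the choice of distance enters TOPSIS, the sketch below runs the standard TOPSIS steps with a Minkowski distance of order p in place of the fixed Euclidean distance. It is a minimal illustration and not the chapter's proposed averaging scheme: the function name, vector normalization, weights, and sample data are assumptions, and it presumes the alternatives are not all identical (otherwise the final division is undefined).

```python
import numpy as np

def topsis(matrix, weights, benefit, p=2.0):
    """Closeness scores for each alternative using a Minkowski L_p distance.

    matrix:  rows are alternatives, columns are criteria
    benefit: True where larger is better, False where smaller is better
    p:       Minkowski order (p=2 is the traditional Euclidean case)
    """
    x = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    is_benefit = np.asarray(benefit, dtype=bool)
    v = x / np.sqrt((x ** 2).sum(axis=0)) * w        # normalize, then weight
    ideal = np.where(is_benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(is_benefit, v.min(axis=0), v.max(axis=0))
    d_pos = (np.abs(v - ideal) ** p).sum(axis=1) ** (1.0 / p)
    d_neg = (np.abs(v - anti) ** p).sum(axis=1) ** (1.0 / p)
    return d_neg / (d_pos + d_neg)                   # 1 = ideal, 0 = anti-ideal

# Made-up decision matrix: three alternatives, two benefit criteria.
data = [[9.0, 9.0], [5.0, 5.0], [1.0, 1.0]]
scores_l1 = topsis(data, [0.5, 0.5], [True, True], p=1.0)
scores_l2 = topsis(data, [0.5, 0.5], [True, True], p=2.0)
```

Calling the function with different values of p on the same matrix is a quick way to check whether a ranking is sensitive to the distance measure.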


2018 ◽  
Vol 4 (1) ◽  
pp. 525-528
Author(s):  
Christian Bayer ◽  
Robin Seidel

Abstract Many machine learning algorithms depend on the choice of an appropriate similarity or distance measure. Comparing such measures in different domains and on diversely structured data is common, but is often performed with regard to an algorithm that clusters or classifies the data. In this study, data assessed by experts is analyzed instead. The data is taken from the database of the Federal Institute for Drugs and Medical Devices (BfArM) and represents free text incident reports. The Average Silhouette Width, a cluster density measure, is used to compare the distance measures’ ability to discriminate the data according to the experts’ assessments. The Euclidean distance and four distance measures derived from the Jaccard similarity, the Simple Matching similarity, the Cosine similarity and the Yule similarity are compared on four subsets of this database. The results show that better data preprocessing is necessary, possibly due to boilerplate texts being used to write incident reports. These results will also provide the basis to compare improvements by different methods of data preprocessing in the future.
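Three of the similarity-derived distances named in this abstract can be sketched on binary vectors as follows. This is an illustration under assumptions rather than the study's pipeline: the reports are assumed to be encoded as term-presence vectors, each distance is taken as one minus the similarity, the conventions for all-zero vectors are arbitrary choices, and the Yule-derived measure is left out because its degenerate cases need a convention the abstract does not state.

```python
import math

def contingency(x, y):
    """Counts of (1,1), (1,0), (0,1) and (0,0) positions in two binary vectors."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d

def jaccard_distance(x, y):
    a, b, c, _ = contingency(x, y)
    union = a + b + c
    return 0.0 if union == 0 else 1.0 - a / union   # convention: empty vs empty = 0

def simple_matching_distance(x, y):
    a, b, c, d = contingency(x, y)
    return (b + c) / (a + b + c + d)                # share of mismatching positions

def cosine_distance(x, y):
    a, b, c, _ = contingency(x, y)
    denom = math.sqrt((a + b) * (a + c))
    return 1.0 if denom == 0 else 1.0 - a / denom   # convention: zero vector = 1

# Made-up term-presence vectors for two reports.
r1 = [1, 1, 0, 0, 1]
r2 = [1, 0, 0, 1, 1]
dj = jaccard_distance(r1, r2)          # 1 - 2/4 = 0.5
ds = simple_matching_distance(r1, r2)  # 2/5 = 0.4
dc = cosine_distance(r1, r2)           # 1 - 2/3
```

Unlike the Jaccard and Cosine forms, the Simple Matching distance counts shared absences as agreement, which matters for sparse text data where most terms are absent from most reports.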

