Graph diffusion distance: Properties and efficient computation

We define a new family of similarity and distance measures on graphs, and explore their theoretical properties in comparison to conventional distance metrics. These measures are defined by the solution(s) to an optimization problem that attempts to find a map minimizing the discrepancy between two graph Laplacian exponential matrices, under norm-preserving and sparsity constraints. Variants of the distance metric are introduced to consider such optimized maps under sparsity constraints as well as fixed time-scaling between the two Laplacians. The objective function of this optimization is multimodal and has a discontinuous slope, and is hence difficult for univariate optimizers to solve. We demonstrate a novel procedure for efficiently calculating these optima for two of our distance measure variants. We present numerical experiments demonstrating that (a) upper bounds of our distance metrics can be used to distinguish between lineages of related graphs; (b) our procedure is faster at finding the required optima, by as much as a factor of 10³; and (c) the upper bounds satisfy the triangle inequality exactly under some assumptions and approximately under others. We also derive an upper bound for the distance between two graph products in terms of the distances between the two pairs of factors. Additionally, we present several possible applications, including the construction of infinite “graph limits” by means of Cauchy sequences of graphs related to one another by our distance measure.
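
To make the core quantity concrete, here is a minimal sketch of a simplified diffusion distance, not the authors' optimized measure. It assumes both graphs share the same labelled node set, so no norm-preserving map P and no time-scaling are searched for; it only compares the heat kernels exp(-tL) of the two Laplacians and takes the largest Frobenius-norm discrepancy over a grid of diffusion times t. The function names and the time grid are illustrative choices.

```python
import numpy as np


def laplacian(adj: np.ndarray) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of an undirected graph (positive semidefinite)."""
    return np.diag(adj.sum(axis=1)) - adj


def heat_kernel(lap: np.ndarray, t: float) -> np.ndarray:
    """exp(-t L) computed from the eigendecomposition of the symmetric matrix L."""
    w, v = np.linalg.eigh(lap)
    return (v * np.exp(-t * w)) @ v.T


def diffusion_distance(adj1, adj2, times=None) -> float:
    """Simplified diffusion distance: max over t of ||exp(-t L1) - exp(-t L2)||_F.

    Assumes an identity node correspondence between the two graphs; the
    optimized map of the paper is NOT searched for here.
    """
    a1 = np.asarray(adj1, dtype=float)
    a2 = np.asarray(adj2, dtype=float)
    if a1.shape != a2.shape:
        raise ValueError("this sketch needs graphs with the same number of nodes")
    if times is None:
        # Coarse grid search over t instead of a local optimizer, since the
        # objective over t is not guaranteed to be unimodal.
        times = np.linspace(0.01, 10.0, 200)
    l1, l2 = laplacian(a1), laplacian(a2)
    return max(
        float(np.linalg.norm(heat_kernel(l1, t) - heat_kernel(l2, t), ord="fro"))
        for t in times
    )
```

For example, `diffusion_distance(path, triangle)` for a 3-node path and a 3-node triangle is positive, while comparing a graph with itself gives 0. The abstract's sign convention for the Laplacian may differ (exp(tL) with L = A - D is the same kernel).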


A Dynamic Distance Measure of Picture Fuzzy Sets and Its Application

Symmetry ◽

10.3390/sym13030436 ◽

2021 ◽

Vol 13 (3) ◽

pp. 436

Author(s):

Ruirui Zhao ◽

Minxia Luo ◽

Shenggang Li

Keyword(s):

Fuzzy Sets ◽

Distance Measure ◽

Distance Measures ◽

Numerical Comparison ◽

Mathematical Tool ◽

Practical Applications ◽

Fuzzy Point ◽

Point Operator ◽

The Difference ◽

Picture Fuzzy Sets

Picture fuzzy sets, which are an extension of intuitionistic fuzzy sets, can handle inconsistent information better in practical applications. A distance measure is an important mathematical tool for calculating the degree of difference between picture fuzzy sets. Although some distance measures for picture fuzzy sets have been constructed, they yield some unreasonable and counterintuitive results. The main reason is that existing distance measures seldom or never consider the refusal degree of picture fuzzy sets. To resolve these unreasonable and counterintuitive cases, this paper proposes a dynamic distance measure for picture fuzzy sets based on a picture fuzzy point operator. Through a numerical comparison and multi-criteria decision-making problems, we show that the proposed distance measure is reasonable and effective.
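
The role of the refusal degree can be illustrated with a simple baseline. The sketch below is NOT the paper's dynamic distance or its picture fuzzy point operator; it is an illustrative normalized Hamming-style distance, assumed here for demonstration, that compares the positive (mu), neutral (eta) and negative (nu) degrees and also the refusal degree pi = 1 - (mu + eta + nu).

```python
from typing import Sequence, Tuple

# (mu, eta, nu): positive, neutral and negative membership degrees of one element.
PictureValue = Tuple[float, float, float]


def refusal_degree(value: PictureValue) -> float:
    """pi = 1 - (mu + eta + nu); requires non-negative degrees summing to at most 1."""
    mu, eta, nu = value
    if min(value) < 0 or mu + eta + nu > 1 + 1e-12:
        raise ValueError(f"not a picture fuzzy value: {value}")
    return max(0.0, 1.0 - (mu + eta + nu))


def hamming_with_refusal(a: Sequence[PictureValue], b: Sequence[PictureValue]) -> float:
    """Illustrative normalized distance over all four degrees, in [0, 1].

    For one element the four degrees sum to 1, so their absolute differences
    sum to at most 2; halving and averaging over elements keeps the range [0, 1].
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sets over the same universe")
    total = 0.0
    for x, y in zip(a, b):
        diff = sum(abs(p - q) for p, q in zip(x, y))
        diff += abs(refusal_degree(x) - refusal_degree(y))
        total += diff / 2.0
    return total / len(a)
```

With A = (0.4, 0.1, 0.3) and B = (0.4, 0.3, 0.3), the positive and negative degrees agree and the sets differ only in how neutral and refusal mass is split; including the refusal term doubles the reported distance compared with ignoring it under this normalization.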


Linkage analysis using geographical proximity: a test of the efficacy of distance measures

Journal of Criminological Research Policy and Practice ◽

10.1108/jcrpp-01-2020-0006 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Shumpei Haginoya ◽

Aiko Hanayama ◽

Tamae Koike

Keyword(s):

Environmental Factors ◽

Euclidean Distance ◽

Distance Measure ◽

Distance Measures ◽

Manhattan Distance ◽

Geographical Proximity ◽

Discrimination Accuracy ◽

Content Type ◽

Shortest Route ◽

The Impact

Purpose

The purpose of this paper was to compare the accuracy of linking crimes by geographical proximity across three distance measures: Euclidean (the length of a straight line between two locations), Manhattan (the sum of the north-south and east-west distances) and shortest route distance.

Design/methodology/approach

A total of 194 cases committed by 97 serial residential burglars in Aomori Prefecture, Japan, between 2004 and 2015 were used in the present study. The Mann–Whitney U test was used to compare linked pairs (two offenses committed by the same offender) and unlinked pairs (two offenses committed by different offenders) for each distance measure. Discrimination accuracy between linked and unlinked crime pairs was evaluated using the area under the receiver operating characteristic curve (AUC).

Findings

The Mann–Whitney U test showed that the distances of linked pairs were significantly shorter than those of unlinked pairs for all distance measures. Comparison of the AUCs showed that the shortest route distance achieved significantly higher accuracy than the Euclidean distance, whereas there was no significant difference between the Euclidean and Manhattan distances or between the Manhattan and shortest route distances. These findings give partial support to the idea that distance measures that take the impact of environmental factors into consideration might identify a crime series more accurately than Euclidean distance.

Research limitations/implications

Although the results suggested a difference between the Euclidean and shortest route distances, it was small, and all distance measures yielded outstanding AUC values, probably because of ceiling effects. Further investigation making the same comparison in a narrower area is needed to avoid this potential inflation of discrimination accuracy.

Practical implications

The shortest route distance might help improve the accuracy of crime linkage based on geographical proximity. However, further investigation is needed before recommending the shortest route distance in practice. Given that the area targeted in the present study was relatively large, the findings may be especially useful for improving the accuracy of proactive comparative case analysis, which estimates the overall distribution of serial crimes in a region, by selecting a more effective distance measure.

Social implications

Improving the accuracy of crime linkage may assist crime investigations and lead to the earlier arrest of offenders.

Originality/value

The results of the present study provide an initial indication of the efficacy of distance measures that take environmental factors into account.
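
The two planar measures and the evaluation criterion described above can be sketched in a few lines. This assumes offense locations are given as projected (east, north) coordinates in metres, an illustrative choice; the shortest route distance is not shown because it requires a road network. The AUC here counts how often a linked pair is closer than an unlinked pair (ties count one half), which equals one of the two Mann–Whitney U statistics divided by the number of linked-unlinked combinations.

```python
import math
from typing import Sequence, Tuple

# Assumed representation: projected planar coordinates (east, north), e.g. metres.
Point = Tuple[float, float]


def euclidean(p: Point, q: Point) -> float:
    """Straight-line distance between two locations."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def manhattan(p: Point, q: Point) -> float:
    """East-west distance plus north-south distance."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def linkage_auc(linked: Sequence[float], unlinked: Sequence[float]) -> float:
    """AUC for the rule 'shorter distance means linked pair'.

    Counts, over all (linked, unlinked) combinations, how often the linked
    pair is closer (ties count 1/2); dividing by n_linked * n_unlinked gives
    the AUC, i.e. the normalized Mann-Whitney U statistic.
    """
    if not linked or not unlinked:
        raise ValueError("need at least one linked and one unlinked pair")
    u = 0.0
    for d_l in linked:
        for d_u in unlinked:
            if d_l < d_u:
                u += 1.0
            elif d_l == d_u:
                u += 0.5
    return u / (len(linked) * len(unlinked))
```

An AUC of 1.0 means every linked pair is closer than every unlinked pair, and 0.5 is chance level; computing the AUC separately for each distance function is what allows the measures to be compared.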


A New Burrows Wheeler Transform Markov Distance

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5994 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5444-5453

Author(s):

Edward Raff ◽

Charles Nicholas ◽

Mark McLean

Keyword(s):

Dna Sequence ◽

Distance Measure ◽

Feature Vector ◽

Distance Metrics ◽

Prior Work ◽

Compression Algorithms ◽

Fixed Length ◽

Malware Classification ◽

Classification Tasks ◽

Burrows Wheeler Transform

Prior work inspired by compression algorithms has described how the Burrows Wheeler Transform can be used to create a distance measure for bioinformatics problems. We describe issues with this approach that were not widely known, and introduce our new Burrows Wheeler Markov Distance (BWMD) as an alternative. The BWMD avoids the shortcomings of earlier efforts, and allows us to tackle problems in variable length DNA sequence clustering. BWMD is also more adaptable to other domains, which we demonstrate on malware classification tasks. Unlike other compression-based distance metrics known to us, BWMD works by embedding sequences into a fixed-length feature vector. This allows us to provide significantly improved clustering performance on larger malware corpora, a weakness of prior methods.
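
The key idea that sequences of any length map to a fixed-length vector can be illustrated as follows. This is a hypothetical simplification, not the published BWMD definition: it computes a naive Burrows–Wheeler transform, counts first-order (Markov) transitions between adjacent symbols of the transformed string, row-normalizes them, and flattens the result into a vector of length |alphabet|². The DNA alphabet "ACGT", the "$" sentinel and the Euclidean comparison are assumptions for the demo.

```python
import math


def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations.

    O(n^2 log n), fine for short demo strings; real tools use suffix arrays.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def markov_features(seq: str, alphabet: str = "ACGT") -> list:
    """Hypothetical fixed-length embedding of a sequence of any length.

    Counts transitions between adjacent symbols of the BWT output,
    row-normalizes them into transition probabilities, and flattens the
    len(alphabet) x len(alphabet) matrix into one vector.
    """
    index = {c: i for i, c in enumerate(alphabet)}
    k = len(alphabet)
    counts = [[0.0] * k for _ in range(k)]
    transformed = bwt(seq)
    for a, b in zip(transformed, transformed[1:]):
        if a in index and b in index:  # skip the sentinel and unknown symbols
            counts[index[a]][index[b]] += 1.0
    features = []
    for row in counts:
        total = sum(row)
        features.extend(c / total if total else 0.0 for c in row)
    return features


def feature_distance(x: str, y: str, alphabet: str = "ACGT") -> float:
    """Euclidean distance between the fixed-length embeddings of x and y."""
    u, v = markov_features(x, alphabet), markov_features(y, alphabet)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

Because every sequence becomes a 16-value vector regardless of its length, standard clustering tools can operate on the embeddings directly, which is the property the abstract credits for scaling to larger corpora.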


Distance Metrics of D Numbers

Fuzzy Systems and Data Mining VI - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200694 ◽

2020 ◽

Author(s):

Liguo Fei ◽

Yuqiang Feng

Keyword(s):

Number Theory ◽

Incomplete Information ◽

Distance Function ◽

Distance Measure ◽

Complete Information ◽

Belief Function ◽

Distance Metrics ◽

Modeling Methods ◽

The Difference ◽

D Numbers

Belief functions have long played an indispensable role in modeling cognitive uncertainty. As an extension of belief function theory, the theory of D numbers has been proposed and developed to be more efficient and robust. Within the framework of D number theory, two more general properties are introduced: (1) the elements in the frame of discernment (FOD) of D numbers are not required to be strictly mutually exclusive; (2) the completeness constraint is relaxed. The investigation shows that the distance function is very important for measuring the difference between two D numbers, especially in information fusion and decision-making. Modeling methods of uncertainty that incorporate D numbers have become increasingly popular; however, very few approaches have tackled the challenges of distance metrics. In this study, the distance measure between two D numbers is presented for three cases: complete information, incomplete information, and non-exclusive elements.


INDUCED AND LOGARITHMIC DISTANCES WITH MULTI-REGION AGGREGATION OPERATORS

Technological and Economic Development of Economy ◽

10.3846/tede.2019.9382 ◽

2019 ◽

Vol 0 (0) ◽

pp. 1-29 ◽

Cited By ~ 3

Author(s):

Víctor G. Alfaro-García ◽

José M. Merigó ◽

Leobardo Plata-Pérez ◽

Gerardo G. Alfaro-Calderón ◽

Anna M. Gil-Lafuente

Keyword(s):

Distance Measure ◽

Distance Measures ◽

Weighted Averaging ◽

Averaging Operators ◽

Heterogeneous Information ◽

Weighting Vector ◽

Choquet Integrals ◽

World Information ◽

Logarithmic Distance ◽

Region Aggregation

This paper introduces the induced ordered weighted logarithmic averaging distance (IOWLAD) and multiregion induced ordered weighted logarithmic averaging distance (MR-IOWLAD) operators. The distinctive characteristic of these operators lies in combining distance measures with the complex reordering mechanism of inducing variables and the properties of logarithmic averaging operators. The main advantage of MR-IOWLAD operators is their design, which is specifically intended to aid decision-making when a set of diverse regions with different properties must be considered. Moreover, the induced weighting vector and the distance measure mechanisms of the operator allow for broader modeling of problems, including heterogeneous information and the complex attitudinal character of experts, when aiming for an ideal scenario. Along with analyzing the main properties of the IOWLAD operators, their families and specific cases, we also introduce some extensions, such as the induced generalized ordered weighted logarithmic averaging distance (IGOWLAD) operator and Choquet integrals. We present the induced Choquet logarithmic distance averaging (ICLD) operator and the generalized induced Choquet logarithmic distance averaging (IGCLD) operator. Finally, an illustrative example is provided, using real-world information retrieved from the United Nations World Statistics for global regions.


Techniques for Fast Screening of 3D Heterogeneous Shale Barrier Configurations and Their Impacts on SAGD Chamber Development

SPE Journal ◽

10.2118/199906-pa ◽

2021 ◽

pp. 1-25

Author(s):

Chang Gao ◽

Juliana Y. Leung

Keyword(s):

Distance Measure ◽

Flow Simulation ◽

Training Data ◽

Distance Measures ◽

Data Driven ◽

Data Set ◽

Flow Simulations ◽

Steam Chamber ◽

Reservoir Models ◽

Tracking Model

Summary

The steam-assisted gravity drainage (SAGD) recovery process is strongly affected by the spatial distribution of heterogeneous shale barriers. Although detailed compositional flow simulators are available for evaluating SAGD recovery performance, the simulation process is usually computationally demanding, which makes it impractical to run such simulators over a large number of reservoir models to assess the impacts of heterogeneity (uncertainty). In recent years, data-driven proxies have been widely proposed to reduce the computational effort; nevertheless, a proxy must be trained on a large data set of many flow simulation cases that ideally span the model parameter space. The question remains: is there a more efficient way to screen a large number of heterogeneous SAGD models? Such techniques could help construct a training data set with less redundancy, and could also be used to quickly identify a subset of heterogeneous models for detailed flow simulation. In this work, we formulated two distance measures, flow-based and static-based, to quantify the similarity among a set of 3D heterogeneous SAGD models. First, to formulate the flow-based distance measure, a physics-based particle-tracking model is used: Darcy’s law and energy balance are integrated to mimic the steam chamber expansion process; steam particles located at the edge of the chamber release their energy to the surrounding cold bitumen, while detailed fluid displacements are not explicitly simulated. The steam chamber evolution is modeled, and the flow-based distance between two reservoir models is defined as the difference in their chamber sizes over time. Second, to formulate the static-based distance, the Hausdorff distance (Hausdorff 1914) is used; it is commonly applied in image processing to compare two images according to the spatial arrangement and shapes of the objects they contain.

A suite of 3D models is constructed using representative petrophysical properties and operating constraints extracted from several pads in Suncor Energy’s Firebag project. The computed distance measures are used to partition the models into groups. To establish a baseline for comparison, flow simulations are performed on these models to predict the actual chamber evolution and production profiles. The groupings obtained with the proposed flow- and static-based distance measures match reasonably well with those obtained from detailed flow simulations, and the proposed techniques significantly improve computational efficiency. They can be used to efficiently screen a large number of reservoir models and to cluster these models into groups with distinct shale heterogeneity characteristics. The techniques also show significant potential for integration with other data-driven approaches to reduce the computational load typically associated with detailed flow simulations involving multiple heterogeneous reservoir realizations.
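
The static-based measure relies on the Hausdorff distance between two point sets: the largest distance from any point in one set to its nearest neighbour in the other, taken in both directions. A brute-force sketch follows; representing a shale-barrier configuration as a set of 3D grid-cell centre coordinates is an illustrative assumption, not necessarily how the authors encode their models.

```python
import math
from typing import Sequence, Tuple

# Assumed representation: coordinates of cells belonging to shale barriers.
Point3 = Tuple[float, float, float]


def directed_hausdorff(a: Sequence[Point3], b: Sequence[Point3]) -> float:
    """Largest distance from a point of a to its nearest neighbour in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point3], b: Sequence[Point3]) -> float:
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

The brute-force loop costs O(|a|·|b|) distance evaluations, which is acceptable for a sketch; for large grids, SciPy's `scipy.spatial.distance.directed_hausdorff` offers a faster directed computation. Note that the measure is asymmetric in each direction, which is why both directed values are taken.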


The Influence of Geographic and Psychic Distance on Online Hotel Ratings

Journal of Travel Research ◽

10.1177/0047287519858400 ◽

2019 ◽

Vol 59 (4) ◽

pp. 722-741 ◽

Cited By ~ 2

Author(s):

Paul Phillips ◽

Nuno Antonio ◽

Ana de Almeida ◽

Luís Nunes

Keyword(s):

Text Mining ◽

Distance Measure ◽

Country Of Origin ◽

Geographic Distance ◽

Distance Measures ◽

Review Author ◽

Psychic Distance ◽

Rating Score ◽

Data Set ◽

The Relationship

This study examines the relationship between distance measures and online hotel ratings using a Portuguese data set of 34,622 online hotel reviews, written in Portuguese, Spanish, and English and extracted from Booking.com and TripAdvisor. Based on the country of origin of each review author, geographic and psychic distances from Portugal are calculated. Data and text mining analysis provides additional insights into online hotel ratings. The authors confirm that online travelers’ evaluations are multifaceted constructs displaying varying patterns of rating behavior among the traveler base. By investigating the contemporary relevance of geographic and psychic distance, a key finding of this study is that travelers who are closer to Portugal, in both psychic and geographic terms, give lower rating scores than travelers who are more distant. The inclusion of psychic and geographic distance is advocated as a salient aspect for future researchers and for practitioners who wish to enhance hotel product and service features.
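
A geographic distance between a reviewer's country and Portugal can be computed as a great-circle distance once each country is mapped to a representative point. The abstract does not say which point or formula was used; the sketch below assumes a hypothetical representative (latitude, longitude) per country, such as its capital, and uses the haversine formula on a sphere of mean radius 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # min() guards against h exceeding 1 through floating-point rounding.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))
```

One degree of longitude along the equator comes out at roughly 111.19 km. Psychic distance has no such closed form and is not covered by this sketch.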


On the optimal control of cancer radiotherapy for non-homogeneous cell populations

Advances in Applied Probability ◽

10.1017/s0001867800025143 ◽

1993 ◽

Vol 25 (01) ◽

pp. 1-23 ◽

Cited By ~ 1

Author(s):

L. G. Hanin ◽

S. T. Rachev ◽

A. Yu. Yakovlev

Keyword(s):

Cellular Response ◽

Optimization Problems ◽

Complete Solution ◽

Upper Bounds ◽

Approximation Procedure ◽

Probability Metrics ◽

New Family ◽

Homogeneous Cell ◽

The Difference ◽

Extremal Values

Optimization problems in cancer radiation therapy are considered, with the efficiency functional defined as the difference between the expected survival probabilities for normal and neoplastic tissues. Precise upper bounds of the efficiency functional over natural classes of cellular response functions are found. The ‘Lipschitz’ upper bound gives rise to a new family of probability metrics. In the framework of the ‘m hit–one target’ model of irradiated cell survival, the problem of optimally fractionating a given total dose into n fractions is treated. For m = 1 with n arbitrary, and for n = 1, 2 with m arbitrary, a complete solution is obtained. In other cases, an approximation procedure is constructed. Stability of the extremal values and upper bounds of the efficiency functional with respect to perturbations of the radiosensitivity distributions for normal and tumor tissues is demonstrated.


Modified TOPSIS Method With Banking Case Study

Multi-Criteria Decision Analysis in Management - Advances in Logistics, Operations, and Management Science ◽

10.4018/978-1-7998-2216-5.ch009 ◽

2020 ◽

pp. 189-224

Author(s):

Semra Erpolat Taşabat ◽

Tuğba Kıral Özkan

Keyword(s):

Banking Sector ◽

Distance Measure ◽

Distance Measurement ◽

Distance Measures ◽

Measurement Unit ◽

Alternative Measure ◽

Topsis Method ◽

Distance Measurements ◽

Modified Topsis

In this chapter, an alternative is proposed to the Euclidean distance, which the traditional TOPSIS method uses to compute distances from the positive and negative ideal solutions. Distance measures from the Lp Minkowski family and the L1 family are used for this purpose. By averaging the distance measurements in the Lq and L1 families, the authors aimed to obtain more general and accurate measurement units. The chapter thus shows that the TOPSIS method can give different results depending on the distance measure used, and emphasizes the importance of the distance measurement unit for ranking alternatives correctly. The proposed method was implemented and evaluated on the financial performance of deposit banks operating in the Turkish banking sector. The rankings of the alternatives changed according to the distance measure used, showing that the choice among the distance measures available for TOPSIS can alter the ranking of alternatives.
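
The point that TOPSIS rankings depend on the distance measure can be explored with a TOPSIS variant whose distance is a Minkowski L_p metric, where p = 2 recovers the classic Euclidean version. This is a minimal sketch using standard vector normalization; it covers only the Minkowski family, not the chapter's L1-family measures or its averaging scheme, and the function and parameter names are illustrative.

```python
import math
from typing import List


def topsis(matrix, weights, benefit, p: float = 2.0) -> List[float]:
    """Closeness coefficients (higher is better), one per alternative (row).

    matrix[i][j]: score of alternative i on criterion j
    weights[j]:   criterion weight
    benefit[j]:   True if larger is better, False for a cost criterion
    p:            Minkowski order; p = 2 reproduces Euclidean TOPSIS
    """
    if p < 1:
        raise ValueError("Minkowski order p must be >= 1")
    n = len(weights)
    # 1. Vector-normalize each criterion column and apply its weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    # 2. Positive (best) and negative (worst) ideal solutions per criterion.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]

    def minkowski(x, y) -> float:
        return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

    # 3. Relative closeness to the ideal: d_minus / (d_plus + d_minus).
    scores = []
    for row in v:
        d_plus, d_minus = minkowski(row, best), minkowski(row, worst)
        total = d_plus + d_minus
        scores.append(d_minus / total if total else 0.0)
    return scores
```

Running the same decision matrix with `p=1` and `p=2` and sorting the alternatives by score is a quick way to check whether the ranking is sensitive to the distance choice for a given data set.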


Comparing distance measures on assessed medical device incident data using Average Silhouette Width

Current Directions in Biomedical Engineering ◽

10.1515/cdbme-2018-0126 ◽

2018 ◽

Vol 4 (1) ◽

pp. 525-528

Author(s):

Christian Bayer ◽

Robin Seidel

Keyword(s):

Distance Measure ◽

Data Preprocessing ◽

Study Data ◽

Machine Learning Algorithms ◽

Distance Measures ◽

Free Text ◽

Silhouette Width ◽

Federal Institute ◽

Cluster Density ◽

Incident Reports

Many machine learning algorithms depend on the choice of an appropriate similarity or distance measure. Comparing such measures across domains and on diversely structured data is common, but it is often done with respect to a particular algorithm used to cluster or classify the data. In this study, data assessed by experts are analyzed instead. The data are taken from the database of the Federal Institute for Drugs and Medical Devices (BfArM) and consist of free-text incident reports. The Average Silhouette Width, a cluster density measure, is used to compare the distance measures’ ability to discriminate the data according to the experts’ assessments. The Euclidean distance and four distance measures derived from the Jaccard, Simple Matching, Cosine, and Yule similarities are compared on four subsets of this database. The results show that better data preprocessing is necessary, possibly because boilerplate texts are used to write incident reports. These results will also provide a basis for comparing improvements from different data preprocessing methods in the future.
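
The evaluation idea, scoring how well a distance measure separates expert-assigned groups without running a clustering algorithm, can be sketched with the Jaccard distance on word sets and the Average Silhouette Width computed from a precomputed distance matrix. Whitespace tokenization into word sets is an illustrative assumption; the study's preprocessing and its other four measures are not reproduced here.

```python
from typing import Hashable, Sequence, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_silhouette_width(
    dist: Sequence[Sequence[float]], labels: Sequence[Hashable]
) -> float:
    """Mean silhouette s(i) = (b - a) / max(a, b) over all items.

    a: mean distance from item i to the other members of its own group
    b: smallest mean distance from item i to the members of another group
    Items alone in their group get s(i) = 0 (the usual convention).
    """
    labels = list(labels)
    groups = set(labels)
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    n = len(labels)
    widths = []
    for i in range(n):
        own = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            widths.append(0.0)
            continue
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == g) / labels.count(g)
            for g in groups
            if g != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n
```

Values near 1 mean that reports sharing an expert assessment are much closer to each other than to other reports under the chosen measure; values near 0 or below suggest the measure (or the preprocessing) does not reflect the experts' grouping.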
