Redefining the Graph Edit Distance

Francesc Serratosa

doi:10.1007/s42979-021-00792-5

SN Computer Science ◽

10.1007/s42979-021-00792-5 ◽

2021 ◽

Vol 2 (6) ◽

Author(s):

Francesc Serratosa

Keyword(s):

Machine Learning ◽

Pattern Recognition ◽

Triangle Inequality ◽

Edit Distance ◽

Minimum Amount ◽

Graph Edit Distance ◽

Edit Operation ◽

Attributed Graphs ◽

Distance Properties

Graph edit distance has been used since 1983 to compare objects in machine learning when these objects are represented by attributed graphs instead of vectors. In these cases, the graph edit distance is usually applied to deduce a distance between attributed graphs. This distance is defined as the minimum amount of edit operations (deletion, insertion and substitution of nodes and edges) needed to transform one graph into another. Until now, it has been stated that the distance properties [(1) non-negativity, (2) symmetry, (3) identity and (4) triangle inequality] have to be imposed on the edit operations involved in computing the graph edit distance for the graph edit distance to be a metric. In this paper, we show that there is no need to impose the triangle inequality on each edit operation. This is an important finding since, in pattern recognition applications, the classification ratio is usually maximized by combinations of edit operations (deletion, insertion and substitution of nodes and edges) in which the triangle inequality is not fulfilled.
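The definition in this abstract (minimum total cost of node and edge deletions, insertions and substitutions) can be illustrated with a minimal brute-force sketch. This is not the paper's algorithm: the function name `brute_force_ged`, the dictionary-based graph encoding, and the three constant cost parameters are assumptions made for illustration, and the search is exponential, so it only suits very small graphs.

```python
from itertools import permutations

def brute_force_ged(labels1, edges1, labels2, edges2,
                    c_sub=1.0, c_indel=1.0, c_edge=1.0):
    """Exact graph edit distance by exhaustive search (tiny graphs only).

    labels*: dict mapping node id -> node label.
    edges*:  set of frozenset({u, v}) for undirected, unlabelled edges.
    c_sub:   cost of substituting a node by one with a different label.
    c_indel: cost of deleting or inserting a node.
    c_edge:  cost of deleting or inserting an edge.
    """
    # Pad both node lists with None (the "null" node) so that every
    # node may be deleted or inserted instead of substituted.
    ids1 = list(labels1) + [None] * len(labels2)
    ids2 = list(labels2) + [None] * len(labels1)
    size = len(ids1)
    best = float("inf")
    for perm in permutations(ids2):
        cost = 0.0
        for u, v in zip(ids1, perm):
            if u is None and v is None:
                continue                      # null mapped to null: free
            if u is None or v is None:
                cost += c_indel               # node insertion or deletion
            elif labels1[u] != labels2[v]:
                cost += c_sub                 # node substitution
        for i in range(size):
            for j in range(i + 1, size):
                in1 = frozenset((ids1[i], ids1[j])) in edges1
                in2 = frozenset((perm[i], perm[j])) in edges2
                if in1 != in2:
                    cost += c_edge            # edge insertion or deletion
        best = min(best, cost)
    return best

# Two single-node graphs whose labels differ.
a = ({1: "A"}, set())
b = ({1: "B"}, set())
unit = brute_force_ged(*a, *b)                 # substitution is cheapest
skewed = brute_force_ged(*a, *b, c_sub=3.0)    # delete + insert is cheapest
```

With `c_sub=3.0` a substitution costs more than a deletion followed by an insertion, so the costs of the individual edit operations do not satisfy the triangle inequality; the search simply picks the cheaper delete-and-insert path.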

Upper Bounding Graph Edit Distance Based on Rings and Machine Learning

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001421510083 ◽

2021 ◽

pp. 2151008 ◽

Cited By ~ 1

Author(s):

David B. Blumenthal ◽

Johann Gamper ◽

Sébastien Bougleux ◽

Luc Brun

Keyword(s):

Machine Learning ◽

Edit Distance ◽

Distance Measure ◽

Graph Matching ◽

Machine Learning Techniques ◽

Graph Edit Distance ◽

Local Structures ◽

Learning Techniques ◽

Inexact Graph Matching ◽

Node Labels

The graph edit distance (GED) is a flexible distance measure which is widely used for inexact graph matching. Since its exact computation is NP-hard, heuristics are used in practice. A popular approach is to obtain upper bounds for GED via transformations to the linear sum assignment problem with error-correction (LSAPE). Typically, local structures and distances between them are employed for carrying out this transformation, but recently machine learning techniques have also been used. In this paper, we formally define a unifying framework LSAPE-GED for transformations from GED to LSAPE. We also introduce rings, a new kind of local structures designed for graphs where most information resides in the topology rather than in the node labels. Furthermore, we propose two new ring-based heuristics RING and RING-ML, which instantiate LSAPE-GED using the traditional and the machine learning-based approach for transforming GED to LSAPE, respectively. Extensive experiments show that using rings for upper bounding GED significantly improves the state of the art on datasets where most information resides in the graphs’ topologies. This closes the gap between fast but rather inaccurate LSAPE-based heuristics and more accurate but significantly slower GED algorithms based on local search.
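The general idea of upper bounding GED through an assignment problem can be sketched as follows. This is not RING or RING-ML: the local structure used here (node label plus degree), the unit edit costs, and the names `induced_cost` and `lsape_upper_bound` are assumptions chosen for illustration, and the assignment is solved by brute force rather than by a polynomial LSAPE solver.

```python
from itertools import permutations

def induced_cost(node_map, labels1, edges1, labels2, edges2):
    """Unit-cost edit path induced by a node map (G1 node -> G2 node or None)."""
    cost = 0
    matched = set()
    for u, v in node_map.items():
        if v is None:
            cost += 1                                  # delete node u
        else:
            matched.add(v)
            cost += labels1[u] != labels2[v]           # substitute if labels differ
    cost += len(set(labels2) - matched)                # insert unmatched G2 nodes
    kept = 0
    for e in edges1:
        u, w = tuple(e)
        image = frozenset((node_map[u], node_map[w]))
        if None not in image and image in edges2:
            kept += 1                                  # edge preserved by the map
    # All other edges are deleted from G1 or inserted into G2.
    return cost + (len(edges1) - kept) + (len(edges2) - kept)

def lsape_upper_bound(labels1, edges1, labels2, edges2):
    """Upper bound on unit-cost GED from a label-and-degree assignment."""
    deg1 = {u: sum(u in e for e in edges1) for u in labels1}
    deg2 = {v: sum(v in e for e in edges2) for v in labels2}
    ids1, ids2 = list(labels1), list(labels2)

    def local_cost(assignment):
        total, used = 0.0, set()
        for u, v in zip(ids1, assignment):
            if v is None:
                total += 1 + deg1[u] / 2               # deletion estimate
            else:
                used.add(v)
                total += (labels1[u] != labels2[v]) + abs(deg1[u] - deg2[v]) / 2
        # Insertion estimate for G2 nodes left unassigned.
        return total + sum(1 + deg2[v] / 2 for v in ids2 if v not in used)

    # Brute-force stand-in for an LSAPE solver (exponential, tiny graphs only).
    candidates = permutations(ids2 + [None] * len(ids1), len(ids1))
    best = min(candidates, key=local_cost)
    return induced_cost(dict(zip(ids1, best)), labels1, edges1, labels2, edges2)

g1 = ({1: "A", 2: "B"}, {frozenset((1, 2))})
g2 = ({"x": "A", "y": "B", "z": "C"},
      {frozenset(("x", "y")), frozenset(("y", "z"))})
bound = lsape_upper_bound(*g1, *g2)
```

The returned value is the cost of a concrete edit path, so it can never be smaller than the true GED; how tight it is depends on how well the local costs approximate the global edit cost.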

Relative Hausdorff distance for network analysis

Applied Network Science ◽

10.1007/s41109-019-0198-0 ◽

2019 ◽

Vol 4 (1) ◽

Cited By ~ 1

Author(s):

Sinan G. Aksoy ◽

Kathleen E. Nowak ◽

Emilie Purvine ◽

Stephen J. Young

Keyword(s):

Machine Learning ◽

Network Analysis ◽

Similarity Measure ◽

Hausdorff Distance ◽

Data Science ◽

Edit Distance ◽

Similarity Measures ◽

Graph Edit Distance ◽

Computationally Intensive

Similarity measures are used extensively in machine learning and data science algorithms. The newly proposed graph Relative Hausdorff (RH) distance is a lightweight yet nuanced similarity measure for quantifying the closeness of two graphs. In this work we study the effectiveness of RH distance as a tool for detecting anomalies in time-evolving graph sequences. We apply RH to cyber data with given red team events, as well as to synthetically generated sequences of graphs with planted attacks. In our experiments, the performance of RH distance is at times comparable, and sometimes superior, to graph edit distance in detecting anomalous phenomena. Our results suggest that in appropriate contexts, RH distance has advantages over more computationally intensive similarity measures.

Ligand-Based Virtual Screening Based on the Graph Edit Distance

International Journal of Molecular Sciences ◽

10.3390/ijms222312751 ◽

2021 ◽

Vol 22 (23) ◽

pp. 12751

Author(s):

Elena Rica ◽

Susana Álvarez ◽

Francesc Serratosa

Keyword(s):

Virtual Screening ◽

Edit Distance ◽

Chemical Compounds ◽

Screening Methods ◽

Graph Edit Distance ◽

Attributed Graph ◽

Attributed Graphs ◽

Type Node ◽

The Cost

Chemical compounds can be represented as attributed graphs. An attributed graph is a mathematical model of an object composed of two types of representations: nodes and edges. Nodes are individual components, and edges are relations between these components. In this case, pharmacophore-type node descriptions are represented by nodes and chemical bonds by edges. If we want to obtain the bioactivity dissimilarity between two chemical compounds, a distance between attributed graphs can be used. The Graph Edit Distance allows computing this distance, and it is defined as the cost of transforming one graph into another. Nevertheless, to define this dissimilarity, the transformation cost must be properly tuned. The aim of this paper is to analyse the structural-based screening methods to verify the quality of the Harper transformation costs proposal and to present an algorithm to learn these transformation costs such that the bioactivity dissimilarity is properly defined in a ligand-based virtual screening application. The goodness of the dissimilarity is represented by the classification accuracy. Six publicly available datasets (CAPST, DUD-E, GLL&GDD, NRLiSt-BDB, MUV and ULS-UDS) have been used to validate our methodology and show that with our learned costs, we obtain the highest ratios in identifying the bioactivity similarity in a structurally diverse group of molecules.

An Exact Graph Edit Distance Algorithm for Solving Pattern Recognition Problems

Proceedings of the International Conference on Pattern Recognition Applications and Methods ◽

10.5220/0005209202710278 ◽

2015 ◽

Cited By ~ 21

Author(s):

Zeina Abu-Aisheh ◽

Romain Raveaux ◽

Jean-Yves Ramel ◽

Patrick Martineau

Keyword(s):

Pattern Recognition ◽

Edit Distance ◽

Graph Edit Distance

Structural Pattern Recognition with Graph Edit Distance

10.1007/978-3-319-27252-8 ◽

2015 ◽

Cited By ~ 26

Author(s):

Kaspar Riesen

Keyword(s):

Pattern Recognition ◽

Edit Distance ◽

Graph Edit Distance ◽

Structural Pattern ◽

Structural Pattern Recognition

A Methodology to Generate Attributed Graphs with a Bounded Graph Edit Distance for Graph-Matching Testing

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001418500386 ◽

2018 ◽

Vol 32 (11) ◽

pp. 1850038 ◽

Cited By ~ 1

Author(s):

Francesc Serratosa

Keyword(s):

Edit Distance ◽

Graph Matching ◽

Optimal Algorithm ◽

Computational Cost ◽

Graph Databases ◽

Graph Edit Distance ◽

Problem Size ◽

Large Graphs ◽

Attributed Graphs ◽

Bounded Graph

This paper presents a methodology for generating pairs of attributed graphs with a lower- and upper-bounded graph edit distance (GED). It is independent of the type of attributes on nodes and edges. The algorithm is composed of three steps: randomly generating a graph, generating another graph as a sub-graph of the first, and adding structural and semantic noise to both. These graphs, together with their bounded distances, can be used to manufacture synthetic databases of large graphs. The exact GED between large graphs cannot be obtained for runtime reasons since it has to be computed through an optimal algorithm with an exponential computational cost. Through this database, we can test the behavior of known or new sub-optimal error-tolerant graph-matching algorithms against a lower and an upper bound of the GED on large graphs, even though we do not have the true distance. It is not clear how the error induced by the use of sub-optimal algorithms grows with problem size. Thus, with this methodology, we can generate graph databases and analyze whether the current assumption that we can extrapolate algorithms’ behavior from matching small graphs to large graphs is correct. We also show that, with some restrictions, the methodology returns the optimal GED in quadratic time and that it can also be used to generate graph databases to test exact sub-graph isomorphism algorithms.
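The three steps named in this abstract (random graph, sub-graph, noise) can be sketched in a much simplified form. The sketch below is not the paper's methodology: it assumes unlabelled nodes, unit edit costs, and structural noise applied to the second graph only, and the function name `make_bounded_pair` and its parameters are hypothetical.

```python
import random

def make_bounded_pair(n, p, k, m, seed=0):
    """Generate (G, G') with lower and upper bounds on their unit-cost GED.

    n: nodes in G; p: edge probability; k: nodes removed to form G';
    m: random edge toggles applied to G' as structural noise.
    Requires n - k >= 2 so that an edge can be toggled.
    """
    rng = random.Random(seed)
    nodes = list(range(n))
    # Step 1: random graph G.
    edges = {frozenset((u, v)) for u in nodes for v in nodes
             if u < v and rng.random() < p}
    # Step 2: G' is the sub-graph induced by the nodes that are kept.
    removed = set(rng.sample(nodes, k))
    sub_nodes = [u for u in nodes if u not in removed]
    sub_edges = {e for e in edges if not (e & removed)}
    dropped = len(edges) - len(sub_edges)
    # Step 3: structural noise, each toggle inserts or deletes one edge.
    for _ in range(m):
        sub_edges ^= {frozenset(rng.sample(sub_nodes, 2))}
    # Upper bound: cost of the edit path that was actually applied.
    upper = k + dropped + m
    # Lower bound: differences in node and edge counts can only be
    # removed by insertions or deletions, each costing one unit.
    lower = k + abs(len(edges) - len(sub_edges))
    return (nodes, edges), (sub_nodes, sub_edges), lower, upper

g, h, lower, upper = make_bounded_pair(n=8, p=0.4, k=2, m=3, seed=1)
```

Without noise (`m=0`) the two bounds coincide in this simplified setting, which gives the exact distance without running an exponential search.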

Artificial neural network models for coronary artery disease

Current Bioinformatics ◽

10.2174/1574893615666200214102837 ◽

2020 ◽

Vol 15 ◽

Author(s):

Elham Shamsara ◽

Sara Saffar Soflaei ◽

Mohammad Tajfard ◽

Ivan Yamshchikov ◽

Habibollah Esmaili ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Coronary Artery Disease ◽

Pattern Recognition ◽

Artificial Neural Network ◽

Coronary Artery ◽

Diagnostic Model ◽

Early Prediction ◽

Artificial Neural ◽

Artery Disease

Background: Coronary artery disease (CAD) is an important cause of mortality and morbidity globally. Objective: The early prediction of CAD would be valuable in identifying individuals at risk and in focusing resources on its prevention. In this paper, we aimed to establish a diagnostic model to predict CAD by using three approaches of ANN (pattern recognition-ANN, LVQ-ANN, and competitive ANN). Methods: One promising method for early prediction of disease based on risk factors is machine learning. Among different machine learning algorithms, artificial neural network (ANN) algorithms have been applied widely in medicine and in a variety of real-world classification tasks. An ANN is a non-linear computational model, inspired by the human brain, for analyzing and processing complex datasets. Results: The different ANN methods investigated in this paper indicate that, in both the pattern recognition-ANN and LVQ-ANN methods, the predictions of the Angiography+ class have high accuracy. Moreover, in CNN the correlation between the individuals in cluster "c" and the Angiography+ class is very high. This accuracy indicates a significant difference in some of the input features between the Angiography+ class and the other two output classes. A comparison of the chosen weights in these three methods in separating the control class from Angiography+ shows that hs-CRP, FSG, and WBC are the most substantial excitatory weights in recognizing Angiography+ individuals, whereas HDL-C and MCH are determined to be inhibitory weights. Furthermore, the effect of decomposing a multi-class problem into a set of binary classes, and of random sampling, on the accuracy of the diagnostic model is investigated. Conclusion: This study confirms that pattern recognition-ANN had the highest accuracy among the different ANN methods. This is due to the back-propagation procedure, in which the network classifies input variables based on labeled classes. The results of binarization show that decomposing the multi-class set into binary sets could achieve higher accuracy.

On the unification of the graph edit distance and graph matching problems

Pattern Recognition Letters ◽

10.1016/j.patrec.2021.02.014 ◽

2021 ◽

Vol 145 ◽

pp. 240-246

Author(s):

Romain Raveaux

Keyword(s):

Edit Distance ◽

Graph Matching ◽

Graph Edit Distance ◽

Matching Problems

Pressure pattern recognition in buildings using an unsupervised machine-learning algorithm

Journal of Wind Engineering and Industrial Aerodynamics ◽

10.1016/j.jweia.2021.104629 ◽

2021 ◽

Vol 214 ◽

pp. 104629

Author(s):

Bubryur Kim ◽

N. Yuvaraj ◽

K.T. Tse ◽

Dong-Eun Lee ◽

Gang Hu

Keyword(s):

Machine Learning ◽

Pattern Recognition ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Unsupervised Machine Learning ◽

Pressure Pattern

Application of Supervised Machine Learning to Recognize Competent Level and Mixed Antinuclear Antibody Patterns Based on ICAP International Consensus

Diagnostics ◽

10.3390/diagnostics11040642 ◽

2021 ◽

Vol 11 (4) ◽

pp. 642

Author(s):

Yi-Da Wu ◽

Ruey-Kai Sheu ◽

Chih-Wei Chung ◽

Yen-Ching Wu ◽

Chiao-Chi Ou ◽

...

Keyword(s):

Machine Learning ◽

Pattern Recognition ◽

Excellent Agreement ◽

Antinuclear Antibody ◽

Disease Diagnosis ◽

Machine Learning Algorithms ◽

Observer Agreement ◽

Supervised Machine Learning ◽

International Consensus ◽

Testing Data

Background: Antinuclear antibody pattern recognition is vital for autoimmune disease diagnosis but labor-intensive when interpreted manually. To develop an automated pattern recognition system, we established machine learning models based on the International Consensus on Antinuclear Antibody Patterns (ICAP) for competent-level and mixed pattern recognition, and evaluated their consistency with human reading. Methods: A total of 51,694 human epithelial cell (HEp-2) images, collected in a medical center and with patterns assigned by experienced medical technologists, were used to train six machine learning algorithms, which were then compared by their performance. Next, we chose the best-performing model to test its consistency with five experienced readers and two beginners. Results: The mean F1 score in each classification of the best-performing model was 0.86, evaluated on Testing Data 1. For the inter-observer agreement test on Testing Data 2, the average agreement was 0.849 (κ) among five experienced readers, 0.844 between the best-performing model and experienced readers, and 0.528 between experienced readers and beginners. The results indicate that the proposed model outperformed beginners and achieved an excellent agreement with experienced readers. Conclusions: This study demonstrated that the developed model could reach an excellent agreement with experienced human readers using machine learning methods.
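The two kinds of figures reported in this abstract, a mean per-class F1 score and an agreement coefficient between readers, can be computed as in the following sketch. It assumes the agreement statistic is Cohen's kappa between two raters, which the listing does not confirm; the function names and the toy labels `p1` and `p2` are hypothetical and unrelated to the study's data.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Mean of the per-class F1 scores, with F1 = 2TP / (2TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1,
    for example when both raters always give the same single label.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

reader = ["p1", "p1", "p2", "p2"]   # toy reference labels
model = ["p1", "p2", "p2", "p2"]    # toy predicted labels
f1 = macro_f1(reader, model)
kappa = cohen_kappa(reader, model)
```

For more than two raters, an average of pairwise kappa values or a multi-rater coefficient would be needed; which one the study used is not stated here.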
