Redefining the Graph Edit Distance

2021 ◽  
Vol 2 (6) ◽  
Author(s):  
Francesc Serratosa

Abstract: Graph edit distance has been used since 1983 to compare objects in machine learning when these objects are represented by attributed graphs instead of vectors. In these cases, the graph edit distance is usually applied to deduce a distance between attributed graphs. This distance is defined as the minimum number of edit operations (deletion, insertion and substitution of nodes and edges) needed to transform one graph into another. Until now, it has been stated that the distance properties [(1) non-negativity, (2) symmetry, (3) identity and (4) triangle inequality] have to be imposed on the edit operations involved in computing the graph edit distance in order to make the graph edit distance a metric. In this paper, we show that there is no need to impose the triangle inequality on each edit operation. This is an important finding since, in pattern recognition applications, the classification ratio is usually maximized for edit operation combinations (deletion, insertion and substitution of nodes and edges) in which the triangle inequality is not fulfilled.
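As an illustration of the definition above, the following minimal sketch computes a mapping-based graph edit distance for very small graphs by enumerating every node mapping. It is not the paper's algorithm: the function name `ged_bruteforce`, the cost parameters `c_sub`, `c_del`, `c_ins`, the restriction to unlabelled undirected edges without self-loops, and the example graphs are all assumptions made for this example.

```python
from itertools import permutations

def ged_bruteforce(nodes1, edges1, nodes2, edges2, c_sub=1.0, c_del=1.0, c_ins=1.0):
    """Minimum edit cost over all node mappings (only feasible for tiny graphs).

    nodes*: dict node_id -> label; edges*: set of frozenset({u, v}) with u != v.
    Edges are unlabelled, so a preserved edge costs nothing.
    """
    ids1, ids2 = list(nodes1), list(nodes2)
    # Pad the targets with None so that any node of graph 1 may be deleted.
    targets = ids2 + [None] * len(ids1)
    best = float("inf")
    for image in permutations(targets, len(ids1)):
        mapping = dict(zip(ids1, image))
        cost = 0.0
        for u, v in mapping.items():
            if v is None:
                cost += c_del                  # node deletion
            elif nodes1[u] != nodes2[v]:
                cost += c_sub                  # node substitution (labels differ)
        used = {v for v in image if v is not None}
        cost += c_ins * (len(ids2) - len(used))  # node insertions
        matched = 0
        for edge in edges1:
            u, w = tuple(edge)
            mapped = frozenset((mapping[u], mapping[w]))
            if None not in mapped and mapped in edges2:
                matched += 1                   # edge preserved by the mapping
        cost += c_del * (len(edges1) - matched) + c_ins * (len(edges2) - matched)
        best = min(best, cost)
    return best

# Hypothetical example: a labelled path a-b-c versus a single edge x-y.
g1_nodes = {"a": "A", "b": "B", "c": "C"}
g1_edges = {frozenset(("a", "b")), frozenset(("b", "c"))}
g2_nodes = {"x": "A", "y": "B"}
g2_edges = {frozenset(("x", "y"))}
print(ged_bruteforce(g1_nodes, g1_edges, g2_nodes, g2_edges))

# A substitution cost that violates the triangle inequality (3 > 1 + 1):
# the cheapest mapping then deletes and re-inserts the node instead.
print(ged_bruteforce({"u": "A"}, set(), {"v": "B"}, set(), c_sub=3.0))
```

With unit costs, the first call deletes node `c` and its incident edge. The second call shows how the cheapest mapping changes once substitution costs more than a deletion plus an insertion.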

Author(s):  
David B. Blumenthal ◽  
Johann Gamper ◽  
Sébastien Bougleux ◽  
Luc Brun

The graph edit distance (GED) is a flexible distance measure which is widely used for inexact graph matching. Since its exact computation is NP-hard, heuristics are used in practice. A popular approach is to obtain upper bounds for GED via transformations to the linear sum assignment problem with error-correction (LSAPE). Typically, local structures and distances between them are employed for carrying out this transformation, but recently machine learning techniques have also been used. In this paper, we formally define a unifying framework LSAPE-GED for transformations from GED to LSAPE. We also introduce rings, a new kind of local structure designed for graphs where most information resides in the topology rather than in the node labels. Furthermore, we propose two new ring-based heuristics RING and RING-ML, which instantiate LSAPE-GED using the traditional and the machine learning-based approach for transforming GED to LSAPE, respectively. Extensive experiments show that using rings for upper bounding GED significantly improves the state of the art on datasets where most information resides in the graphs’ topologies. This closes the gap between fast but rather inaccurate LSAPE-based heuristics and more accurate but significantly slower GED algorithms based on local search.
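The general idea of upper bounding GED through an assignment problem can be sketched as follows. This is a simplified illustration, not RING or RING-ML. The local structure is reduced to node degree as a stand-in for rings, the assignment problem is solved by brute force rather than by a polynomial-time LSAPE solver, unit edit costs are assumed, and all function names are hypothetical.

```python
from itertools import permutations

def degree(node, edges):
    """Number of edges incident to `node`."""
    return sum(1 for e in edges if node in e)

def induced_cost(mapping, nodes1, edges1, nodes2, edges2):
    """Unit-cost length of the edit path induced by a node mapping (u -> v or None)."""
    cost = 0
    for u, v in mapping.items():
        if v is None or nodes1[u] != nodes2[v]:
            cost += 1                        # node deletion or substitution
    used = {v for v in mapping.values() if v is not None}
    cost += len(nodes2) - len(used)          # node insertions
    matched = 0
    for e in edges1:
        u, w = tuple(e)
        image = frozenset((mapping[u], mapping[w]))
        if None not in image and image in edges2:
            matched += 1                     # edge preserved by the mapping
    return cost + (len(edges1) - matched) + (len(edges2) - matched)

def lsape_upper_bound(nodes1, edges1, nodes2, edges2):
    """Pick the node mapping with the cheapest local costs, return its induced cost."""
    ids1, ids2 = list(nodes1), list(nodes2)

    def assign_cost(u, v):
        # Assumed local cost: label mismatch plus half the degree difference.
        if v is None:
            return 1 + degree(u, edges1) / 2
        label = 0 if nodes1[u] == nodes2[v] else 1
        return label + abs(degree(u, edges1) - degree(v, edges2)) / 2

    best_cost, best_mapping = float("inf"), None
    # Brute force over assignments; None plays the role of the LSAPE dummy node.
    for image in permutations(ids2 + [None] * len(ids1), len(ids1)):
        used = {v for v in image if v is not None}
        cost = sum(assign_cost(u, v) for u, v in zip(ids1, image))
        cost += sum(1 + degree(v, edges2) / 2 for v in ids2 if v not in used)
        if cost < best_cost:
            best_cost, best_mapping = cost, dict(zip(ids1, image))
    # Any node mapping induces a valid edit path, so its cost upper-bounds GED.
    return induced_cost(best_mapping, nodes1, edges1, nodes2, edges2), best_mapping

# Hypothetical example: a labelled path a-b-c versus a single edge x-y.
n1 = {"a": "A", "b": "B", "c": "C"}
e1 = {frozenset(("a", "b")), frozenset(("b", "c"))}
n2 = {"x": "A", "y": "B"}
e2 = {frozenset(("x", "y"))}
upper, node_map = lsape_upper_bound(n1, e1, n2, e2)
print(upper, node_map)
```

On this small example the bound coincides with the true distance, but in general the mapping that minimizes the local costs need not minimize the induced edit cost. This is why the choice of local structure matters.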


2019 ◽  
Vol 4 (1) ◽  
Author(s):  
Sinan G. Aksoy ◽  
Kathleen E. Nowak ◽  
Emilie Purvine ◽  
Stephen J. Young

Abstract: Similarity measures are used extensively in machine learning and data science algorithms. The newly proposed graph Relative Hausdorff (RH) distance is a lightweight yet nuanced similarity measure for quantifying the closeness of two graphs. In this work we study the effectiveness of RH distance as a tool for detecting anomalies in time-evolving graph sequences. We apply RH to cyber data with given red team events, as well as to synthetically generated sequences of graphs with planted attacks. In our experiments, the performance of RH distance is at times comparable, and sometimes superior, to graph edit distance in detecting anomalous phenomena. Our results suggest that in appropriate contexts, RH distance has advantages over more computationally intensive similarity measures.


2021 ◽  
Vol 22 (23) ◽  
pp. 12751
Author(s):  
Elena Rica ◽  
Susana Álvarez ◽  
Francesc Serratosa

Chemical compounds can be represented as attributed graphs. An attributed graph is a mathematical model of an object composed of two types of representations: nodes and edges. Nodes are individual components, and edges are relations between these components. In this case, pharmacophore-type node descriptions are represented by nodes and chemical bonds by edges. If we want to obtain the bioactivity dissimilarity between two chemical compounds, a distance between attributed graphs can be used. The Graph Edit Distance allows computing this distance, and it is defined as the cost of transforming one graph into another. Nevertheless, to define this dissimilarity, the transformation cost must be properly tuned. The aim of this paper is to analyse the structural-based screening methods to verify the quality of the Harper transformation costs proposal and to present an algorithm to learn these transformation costs such that the bioactivity dissimilarity is properly defined in a ligand-based virtual screening application. The goodness of the dissimilarity is represented by the classification accuracy. Six publicly available datasets (CAPST, DUD-E, GLL&GDD, NRLiSt-BDB, MUV and ULS-UDS) have been used to validate our methodology and show that with our learned costs, we obtain the highest ratios in identifying the bioactivity similarity in a structurally diverse group of molecules.


Author(s):  
Francesc Serratosa

This paper presents a methodology for generating pairs of attributed graphs with a lower- and upper-bounded graph edit distance (GED). It is independent of the type of attributes on nodes and edges. The algorithm is composed of three steps: randomly generating a graph, generating another graph as a sub-graph of the first, and adding structural and semantic noise to both. These graphs, together with their bounded distances, can be used to manufacture synthetic databases of large graphs. The exact GED between large graphs cannot be obtained for runtime reasons since it has to be computed through an optimal algorithm with an exponential computational cost. Through this database, we can test the behavior of known or new sub-optimal error-tolerant graph-matching algorithms against a lower and an upper bound of the GED on large graphs, even though we do not have the true distance. It is not clear how the error induced by the use of sub-optimal algorithms grows with problem size. Thus, with this methodology, we can generate graph databases and analyze whether the current assumption that we can extrapolate algorithms’ behavior from matching small graphs to large graphs is correct. We also show that, with some restrictions, the methodology returns the optimal GED in quadratic time and that it can also be used to generate graph databases to test exact sub-graph isomorphism algorithms.
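The three-step idea can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the paper's procedure. Noise is applied to the second graph only, unit edit costs are assumed, the upper bound is simply the number of edit operations applied, the lower bound is a basic size-difference bound, and all function names and parameters are hypothetical.

```python
import random

def random_graph(n, p, rng):
    """Step 1: random labelled graph with n nodes and edge probability p."""
    nodes = {i: rng.choice("ABC") for i in range(n)}
    edges = {frozenset((i, j)) for i in range(n) for j in range(i + 1, n)
             if rng.random() < p}
    return nodes, edges

def derive_noisy_subgraph(nodes, edges, n_removed, n_relabelled, rng):
    """Steps 2 and 3: take a sub-graph, then relabel some of its nodes.

    Returns (nodes2, edges2, ops), where ops counts the unit-cost edits applied.
    Because these edits form a valid edit path, ops is an upper bound on GED.
    """
    removed = set(rng.sample(sorted(nodes), n_removed))
    nodes2 = {u: lab for u, lab in nodes.items() if u not in removed}
    edges2 = {e for e in edges if not (e & removed)}
    ops = n_removed + (len(edges) - len(edges2))  # node and incident-edge deletions
    for u in rng.sample(sorted(nodes2), n_relabelled):
        nodes2[u] = nodes2[u].lower()             # semantic noise: guaranteed new label
        ops += 1                                  # one node substitution
    return nodes2, edges2, ops

def size_lower_bound(nodes1, edges1, nodes2, edges2):
    """Unit-cost lower bound: differences in node and edge counts must be edited away."""
    return abs(len(nodes1) - len(nodes2)) + abs(len(edges1) - len(edges2))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
nodes, edges = random_graph(8, 0.4, rng)
nodes2, edges2, upper = derive_noisy_subgraph(nodes, edges, 2, 3, rng)
lower = size_lower_bound(nodes, edges, nodes2, edges2)
print(lower, upper)
```

The pair of graphs is then stored together with `lower` and `upper`, so that a sub-optimal matching algorithm can be checked against both values without ever computing the exact distance.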


2020 ◽  
Vol 15 ◽  
Author(s):  
Elham Shamsara ◽  
Sara Saffar Soflaei ◽  
Mohammad Tajfard ◽  
Ivan Yamshchikov ◽  
Habibollah Esmaili ◽  
...  

Background: Coronary artery disease (CAD) is an important cause of mortality and morbidity globally. Objective: The early prediction of CAD would be valuable in identifying individuals at risk and in focusing resources on its prevention. In this paper, we aimed to establish a diagnostic model to predict CAD by using three ANN approaches (pattern recognition-ANN, LVQ-ANN, and competitive ANN). Methods: One promising method for early prediction of disease based on risk factors is machine learning. Among different machine learning algorithms, artificial neural network (ANN) algorithms have been applied widely in medicine and in a variety of real-world classification tasks. An ANN is a non-linear computational model, inspired by the human brain, that analyzes and processes complex datasets. Results: The different ANN methods investigated in this paper indicate that, in both the pattern recognition-ANN and LVQ-ANN methods, the predictions of the Angiography+ class have high accuracy. Moreover, in CNN the correlations between the individuals in cluster "c" and the Angiography+ class are very strong. This accuracy indicates a significant difference in some of the input features between the Angiography+ class and the other two output classes. A comparison of the chosen weights in these three methods in separating the control class and Angiography+ shows that hs-CRP, FSG, and WBC are the most substantial excitatory weights in recognizing the Angiography+ individuals, whereas HDL-C and MCH are determined to be inhibitory weights. Furthermore, the effect on the accuracy of the diagnostic model of decomposing a multi-class problem into a set of binary classes, and of random sampling, is investigated. Conclusion: This study confirms that pattern recognition-ANN had the highest accuracy among the different ANN methods. This is due to the back-propagation procedure, in which the network classifies input variables based on labeled classes. The results of binarization show that decomposing the multi-class set into binary sets could achieve higher accuracy.
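One common way to decompose a multi-class problem into binary problems is the one-vs-rest scheme, sketched below. The abstract does not say which decomposition the authors used, so the scheme, the function name, and the class labels (`control`, `angio_neg`, `angio_pos`) are assumptions made for illustration.

```python
def one_vs_rest_labels(labels):
    """Turn a multi-class label list into one binary label list per class."""
    classes = sorted(set(labels))
    # For each class c, label 1 marks members of c and 0 marks everything else.
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

# Hypothetical three-class labels for five individuals.
y = ["control", "angio_pos", "angio_neg", "angio_pos", "control"]
binary_tasks = one_vs_rest_labels(y)
for cls, targets in binary_tasks.items():
    print(cls, targets)
```

Each binary label list can then be used to train a separate two-class model, and the per-class outputs are combined at prediction time.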


Diagnostics ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 642
Author(s):  
Yi-Da Wu ◽  
Ruey-Kai Sheu ◽  
Chih-Wei Chung ◽  
Yen-Ching Wu ◽  
Chiao-Chi Ou ◽  
...  

Background: Antinuclear antibody pattern recognition is vital for autoimmune disease diagnosis, but manual interpretation is labor-intensive. To develop an automated pattern recognition system, we established machine learning models based on the International Consensus on Antinuclear Antibody Patterns (ICAP) at the competent level, including mixed pattern recognition, and evaluated their consistency with human reading. Methods: A total of 51,694 human epithelial cell (HEp-2) images, collected in a medical center and with patterns assigned by experienced medical technologists, were used to train six machine learning algorithms, which were then compared by their performance. Next, we chose the best-performing model to test its consistency with five experienced readers and two beginners. Results: The mean F1 score in each classification of the best-performing model was 0.86, evaluated on Testing Data 1. For the inter-observer agreement test on Testing Data 2, the average agreement was 0.849 (κ) among five experienced readers, 0.844 between the best-performing model and experienced readers, and 0.528 between experienced readers and beginners. The results indicate that the proposed model outperformed beginners and achieved excellent agreement with experienced readers. Conclusions: This study demonstrated that the developed model, built with machine learning methods, could reach excellent agreement with experienced human readers.
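Agreement between two readers is often summarized with Cohen's kappa, which corrects the observed agreement for agreement expected by chance. The abstract does not specify which agreement statistic or variant was used, so the sketch below is only an assumption about how such a figure could be computed. The pattern names and readings are made up.

```python
from collections import Counter

def cohen_kappa(reader_a, reader_b):
    """Cohen's kappa for two equally long lists of categorical readings."""
    n = len(reader_a)
    # Observed agreement: fraction of items on which both readers agree.
    p_o = sum(x == y for x, y in zip(reader_a, reader_b)) / n
    # Expected chance agreement from each reader's marginal label frequencies.
    count_a, count_b = Counter(reader_a), Counter(reader_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    # Undefined (division by zero) when p_e == 1, i.e. both readers use one label only.
    return (p_o - p_e) / (1 - p_e)

# Hypothetical readings of four images by two readers.
a = ["homogeneous", "homogeneous", "speckled", "speckled"]
b = ["homogeneous", "homogeneous", "speckled", "homogeneous"]
kappa = cohen_kappa(a, b)
print(kappa)
```

Here the readers agree on three of four images (observed agreement 0.75) while chance agreement is 0.5, giving a kappa of 0.5.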

