Learning the Edit Costs of Graph Edit Distance Applied to Ligand-Based Virtual Screening

Background: Graph edit distance is a methodology used to solve error-tolerant graph matching. It estimates a distance between two graphs by determining the minimum number of modifications required to transform one graph into the other. These modifications, known as edit operations, each have an associated edit cost that has to be determined depending on the problem. Objective: This study focuses on the use of optimization techniques to learn the edit costs used when comparing graphs by means of the graph edit distance. Methods: Graphs represent reduced structural representations of molecules using pharmacophore-type node descriptions to encode the relevant molecular properties. This reduction technique is known as extended reduced graphs. The screening and statistical tools available on the ligand-based virtual screening benchmarking platform and the RDKit were used. Results: In the experiments, the graph edit distance using learned costs performed as well as or better than the graph edit distance using predefined costs. This is exemplified with six publicly available datasets: DUD-E, MUV, GLL&GDD, CAPST, NRLiSt BDB, and ULS-UDS. Conclusion: This study shows that the graph edit distance along with learned edit costs is useful to identify bioactivity similarities in a structurally diverse group of molecules. Furthermore, the target-specific edit costs might provide useful structure-activity information for future drug-design efforts.

Ligand-Based Virtual Screening Based on the Graph Edit Distance

International Journal of Molecular Sciences ◽

10.3390/ijms222312751 ◽

2021 ◽

Vol 22 (23) ◽

pp. 12751

Author(s):

Elena Rica ◽

Susana Álvarez ◽

Francesc Serratosa

Keyword(s):

Virtual Screening ◽

Edit Distance ◽

Chemical Compounds ◽

Screening Methods ◽

Graph Edit Distance ◽

Attributed Graph ◽

Attributed Graphs ◽

Type Node ◽

The Cost

Chemical compounds can be represented as attributed graphs. An attributed graph is a mathematical model of an object composed of two types of elements: nodes and edges. Nodes are individual components, and edges are relations between these components. In this case, pharmacophore-type node descriptions are represented by nodes and chemical bonds by edges. To obtain the bioactivity dissimilarity between two chemical compounds, a distance between attributed graphs can be used. The Graph Edit Distance computes this distance, and it is defined as the cost of transforming one graph into another. Nevertheless, for this dissimilarity to be meaningful, the transformation costs must be properly tuned. The aim of this paper is to analyse structure-based screening methods in order to verify the quality of the transformation costs proposed by Harper, and to present an algorithm that learns these transformation costs such that the bioactivity dissimilarity is properly defined in a ligand-based virtual screening application. The quality of the dissimilarity is measured by the classification accuracy. Six publicly available datasets (CAPST, DUD-E, GLL&GDD, NRLiSt-BDB, MUV and ULS-UDS) have been used to validate our methodology and show that with our learned costs, we obtain the highest ratios in identifying the bioactivity similarity in a structurally diverse group of molecules.
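
To make the role of the transformation costs concrete, the following is a minimal sketch of an exact graph edit distance on tiny attributed graphs. It is not the implementation used in the paper: the function name, the three cost parameters, the node labels, and the brute-force enumeration of node mappings are all assumptions chosen for illustration, and the approach is only practical for graphs with a handful of nodes.

```python
from itertools import permutations


def graph_edit_distance(nodes1, edges1, nodes2, edges2,
                        c_node_sub=1.0, c_node_indel=1.0, c_edge_indel=1.0):
    """Exact GED by enumerating every node mapping (tiny graphs only).

    nodes*: dict mapping node id -> label (e.g. a pharmacophore type).
    edges*: set of frozenset({u, v}) for undirected, unlabeled edges.
    The three cost parameters are illustrative, not values from the paper.
    """
    ids1, ids2 = list(nodes1), list(nodes2)
    # Pad the targets with None ("delete this node") so that every
    # injective partial mapping from graph 1 to graph 2 is enumerated.
    targets = ids2 + [None] * len(ids1)
    best = float("inf")
    for perm in set(permutations(targets, len(ids1))):
        mapping = dict(zip(ids1, perm))
        cost = 0.0
        # Node deletions and substitutions.
        for u, v in mapping.items():
            if v is None:
                cost += c_node_indel
            elif nodes1[u] != nodes2[v]:
                cost += c_node_sub
        # Nodes of graph 2 that received no match must be inserted.
        used = {v for v in perm if v is not None}
        cost += c_node_indel * (len(ids2) - len(used))
        # An edge of graph 1 is kept only if its image is an edge of graph 2.
        kept = set()
        for e in edges1:
            a, b = tuple(e)
            image = frozenset((mapping[a], mapping[b]))
            if None not in image and image in edges2:
                kept.add(image)
            else:
                cost += c_edge_indel
        # Remaining edges of graph 2 must be inserted.
        cost += c_edge_indel * (len(edges2) - len(kept))
        best = min(best, cost)
    return best


# Hypothetical reduced graphs: a three-node path and a two-node path.
g1_nodes = {0: "donor", 1: "acceptor", 2: "aromatic"}
g1_edges = {frozenset((0, 1)), frozenset((1, 2))}
g2_nodes = {0: "donor", 1: "acceptor"}
g2_edges = {frozenset((0, 1))}

print(graph_edit_distance(g1_nodes, g1_edges, g2_nodes, g2_edges))
print(graph_edit_distance(g1_nodes, g1_edges, g2_nodes, g2_edges,
                          c_node_indel=3.0))
```

Changing a single cost, as in the second call, changes the resulting distance, which is why the abstract stresses that the costs have to be tuned or learned for the distance to reflect bioactivity.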

On the unification of the graph edit distance and graph matching problems

Pattern Recognition Letters ◽

10.1016/j.patrec.2021.02.014 ◽

2021 ◽

Vol 145 ◽

pp. 240-246

Author(s):

Romain Raveaux

Keyword(s):

Edit Distance ◽

Graph Matching ◽

Graph Edit Distance ◽

Matching Problems

Flexible Graph Matching and Graph Edit Distance Using Answer Set Programming

Practical Aspects of Declarative Languages - Lecture Notes in Computer Science ◽

10.1007/978-3-030-39197-3_2 ◽

2020 ◽

pp. 20-36

Author(s):

Sheung Chi Chan ◽

James Cheney

Keyword(s):

Edit Distance ◽

Graph Matching ◽

Answer Set Programming ◽

Graph Edit Distance ◽

Answer Set

Ligand-Based Virtual Screening Using Graph Edit Distance as Molecular Similarity Measure

Journal of Chemical Information and Modeling ◽

10.1021/acs.jcim.8b00820 ◽

2019 ◽

Vol 59 (4) ◽

pp. 1410-1421 ◽

Cited By ~ 5

Author(s):

Carlos Garcia-Hernandez ◽

Alberto Fernández ◽

Francesc Serratosa

Keyword(s):

Virtual Screening ◽

Similarity Measure ◽

Edit Distance ◽

Molecular Similarity ◽

Graph Edit Distance

Upper Bounding Graph Edit Distance Based on Rings and Machine Learning

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001421510083 ◽

2021 ◽

pp. 2151008 ◽

Cited By ~ 1

Author(s):

David B. Blumenthal ◽

Johann Gamper ◽

Sébastien Bougleux ◽

Luc Brun

Keyword(s):

Machine Learning ◽

Edit Distance ◽

Distance Measure ◽

Graph Matching ◽

Machine Learning Techniques ◽

Graph Edit Distance ◽

Local Structures ◽

Learning Techniques ◽

Inexact Graph Matching ◽

Node Labels

The graph edit distance (GED) is a flexible distance measure which is widely used for inexact graph matching. Since its exact computation is NP-hard, heuristics are used in practice. A popular approach is to obtain upper bounds for GED via transformations to the linear sum assignment problem with error-correction (LSAPE). Typically, local structures and distances between them are employed for carrying out this transformation, but machine learning techniques have recently been used as well. In this paper, we formally define a unifying framework LSAPE-GED for transformations from GED to LSAPE. We also introduce rings, a new kind of local structures designed for graphs where most information resides in the topology rather than in the node labels. Furthermore, we propose two new ring-based heuristics RING and RING-ML, which instantiate LSAPE-GED using the traditional and the machine learning-based approach for transforming GED to LSAPE, respectively. Extensive experiments show that using rings for upper bounding GED significantly improves the state of the art on datasets where most information resides in the graphs’ topologies. This closes the gap between fast but rather inaccurate LSAPE-based heuristics and more accurate but significantly slower GED algorithms based on local search.
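
The general idea of upper bounding GED through a node assignment can be sketched as follows. This is not the RING or RING-ML heuristic from the paper: the local structure used here is simply the node degree, all edit costs are fixed to 1, and the assignment problem is solved by exhaustive enumeration purely to keep the example dependency-free (a practical implementation would use a polynomial-time solver such as `scipy.optimize.linear_sum_assignment`). All function names are hypothetical.

```python
from itertools import permutations


def degrees(nodes, edges):
    """Number of incident edges per node."""
    deg = {n: 0 for n in nodes}
    for e in edges:
        for n in e:
            deg[n] += 1
    return deg


def induced_cost(mapping, nodes1, edges1, nodes2, edges2):
    """Unit-cost edit path implied by a node mapping (None = deletion)."""
    cost = 0.0
    for u, v in mapping.items():
        cost += 1.0 if v is None or nodes1[u] != nodes2[v] else 0.0
    used = {v for v in mapping.values() if v is not None}
    cost += len(nodes2) - len(used)          # node insertions
    kept = set()
    for e in edges1:
        a, b = tuple(e)
        image = frozenset((mapping[a], mapping[b]))
        if None not in image and image in edges2:
            kept.add(image)
        else:
            cost += 1.0                      # edge deletion
    return cost + (len(edges2) - len(kept))  # edge insertions


def lsape_upper_bound(nodes1, edges1, nodes2, edges2):
    """Pick the node assignment with the cheapest local costs, then
    return the cost of the edit path it induces (an upper bound on GED)."""
    deg1, deg2 = degrees(nodes1, edges1), degrees(nodes2, edges2)

    def local_cost(u, v):
        if v is None:                        # delete node and its edges
            return 1 + deg1[u]
        return (nodes1[u] != nodes2[v]) + abs(deg1[u] - deg2[v])

    ids1, ids2 = list(nodes1), list(nodes2)
    best_mapping, best_local = None, float("inf")
    # Brute-force stand-in for an LSAPE solver (tiny graphs only).
    for perm in set(permutations(ids2 + [None] * len(ids1), len(ids1))):
        unused = set(ids2) - set(perm)
        local = sum(local_cost(u, v) for u, v in zip(ids1, perm))
        local += sum(1 + deg2[v] for v in unused)
        if local < best_local:
            best_mapping, best_local = dict(zip(ids1, perm)), local
    return induced_cost(best_mapping, nodes1, edges1, nodes2, edges2)
```

Because every node mapping corresponds to a valid edit path, the returned value can never fall below the exact GED. How tight the bound is depends on how well the local costs anticipate the edge edits, which is the aspect that richer local structures such as rings aim to improve.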

Graph Edit Distance: Moving from global to local structure to solve the graph-matching problem

Pattern Recognition Letters ◽

10.1016/j.patrec.2015.08.003 ◽

2015 ◽

Vol 65 ◽

pp. 204-210 ◽

Cited By ~ 24

Author(s):

Francesc Serratosa ◽

Xavier Cortés

Keyword(s):

Local Structure ◽

Edit Distance ◽

Graph Matching ◽

Graph Edit Distance ◽

Matching Problem

Combining Bipartite Graph Matching and Beam Search for Graph Edit Distance Approximation

Advanced Information Systems Engineering - Lecture Notes in Computer Science ◽

10.1007/978-3-319-11656-3_11 ◽

2014 ◽

pp. 117-128 ◽

Cited By ~ 7

Author(s):

Kaspar Riesen ◽

Andreas Fischer ◽

Horst Bunke

Keyword(s):

Bipartite Graph ◽

Edit Distance ◽

Graph Matching ◽

Beam Search ◽

Graph Edit Distance ◽

Distance Approximation ◽

Bipartite Graph Matching

A First Step Towards Exact Graph Edit Distance Using Bipartite Graph Matching

Graph-Based Representations in Pattern Recognition - Lecture Notes in Computer Science ◽

10.1007/978-3-319-18224-7_8 ◽

2015 ◽

pp. 77-86 ◽

Cited By ~ 6

Author(s):

Miquel Ferrer ◽

Francesc Serratosa ◽

Kaspar Riesen

Keyword(s):

Bipartite Graph ◽

Edit Distance ◽

Graph Matching ◽

Graph Edit Distance ◽

Bipartite Graph Matching

Approximate graph edit distance computation by means of bipartite graph matching

Image and Vision Computing ◽

10.1016/j.imavis.2008.04.004 ◽

2009 ◽

Vol 27 (7) ◽

pp. 950-959 ◽

Cited By ~ 364

Author(s):

Kaspar Riesen ◽

Horst Bunke

Keyword(s):

Bipartite Graph ◽

Edit Distance ◽

Graph Matching ◽

Graph Edit Distance ◽

Distance Computation ◽

Bipartite Graph Matching

Unsupervised co-segmentation of actions in motion capture data and videos

10.12681/eadd/47237 ◽

2019 ◽

Author(s):

Κωνσταντίνος Παπουτσάκης

Keyword(s):

Motion Capture ◽

Dynamic Time Warping ◽

Edit Distance ◽

Graph Matching ◽

Motion Capture Data ◽

Time Warping ◽

Graph Edit Distance ◽

Swarm Optimization ◽

Dynamic Time ◽

Common Action

In this thesis we focus on the problem of temporal co-segmentation of actions in sequences of multidimensional motion data (motion capture data) and in image sequences (videos). Given two data sequences that represent actions or activities, the goal is to detect and delimit in time all pairs of sub-sequences that represent a common action (commonality), that is, an action that is repeated identically or in a similar way across the sequences. This problem is an important research topic in Pattern Recognition and Computer Vision and, despite the research effort devoted to it, has not been fully solved. This thesis describes a new, efficient, unsupervised approach that requires no prior knowledge or models of the actions performed, and that adopts a generic and flexible modelling of the input data as multidimensional time series. We consider different scenarios for the action sequences that pose interesting challenges for solving the problem: (a) one or more actions appear in each sequence, performed by one or more subjects (persons or objects); (b) in the general case, the number of common actions between two sequences is assumed to be unknown; (c) a common action may be located in any temporal segment of a sequence; (d) the segments of a common action in two sequences may differ in duration and may include movements of different speed and manner or technique of execution; (e) the actions appearing in the sequences may represent physical movements of one or more persons or objects, as well as complex interactions of persons with objects. We propose two novel methods for solving the problem of temporal co-segmentation of actions in pairs of data sequences.
The first method detects and co-segments the N most significant common actions between the sequences being compared. It is based on minimizing a cost function that expresses the cost of non-linear temporal alignment of the sub-sequences of the common actions, computed with Dynamic Time Warping (DTW). The process of simultaneously searching for solutions (common actions) and minimizing the cost function is modelled as a stochastic problem, which is solved with the evolutionary optimization method Canonical Particle Swarm Optimization (PSO). The second method models the co-segmentation of the N most significant common actions as a graph search problem. The graph is defined as the matrix containing the Euclidean distances of all possible pairs of frames of the image sequences, each frame being represented as a feature vector. Johnson's algorithm is used to search for shortest paths in the graph and, by extension, to solve our problem. Both original methodologies undergo extensive experiments using numerous pairs of image sequences (videos) or sequences representing 3D motion capture data, which demonstrate their effectiveness compared with other existing efficient methods. In addition, building on the robust performance of these methods, we develop a new method for estimating the similarity between two action sequences, which also supports the extraction of arguments that justify this computation. This method is based on the temporal co-segmentation of the pairs of 3D motion trajectories of the human joints and objects observed in the sequences, additionally combining information about their semantic similarity.
The results of this process for each sequence are modelled as a graph that represents the content of the sequence per object. Specifically, each object corresponds to a node of the graph. The edges of the graph model information based on the results of the temporal co-segmentation between the objects of the sequence and on their semantic information, where available. The similarity/distance between two action sequences is then based on the distance (Graph Edit Distance) between their corresponding graphs, and is computed as the cost of an optimal assignment (bipartite graph matching) in a bipartite graph composed from the two individual graphs. The proposed methodology is evaluated experimentally on the problems of action classification, action matching, and computing the ranking of pairs of actions by their similarity (pairwise action ranking) among triplets of image sequences. The results lead to the conclusion that the proposed method performs well, comparably to or even better than the best known state-of-the-art unsupervised and supervised learning methods.
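
The non-linear temporal alignment cost that the first method minimizes is computed with Dynamic Time Warping. The following is a minimal sketch of the textbook DTW recurrence on two sequences of feature vectors, assuming the Euclidean distance between frames; the function name and the toy sequences are illustrative, and the sub-sequence search driven by PSO in the thesis is not reproduced here.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic DTW cost between two sequences of equal-length vectors.

    Each element of seq_a / seq_b is a tuple of floats (one frame).
    Returns the minimum cumulative Euclidean cost over all monotonic
    alignments that match the first frames and the last frames.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # acc[i][j] = best cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = math.dist(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = step + min(acc[i - 1][j],      # seq_a advances
                                   acc[i][j - 1],      # seq_b advances
                                   acc[i - 1][j - 1])  # both advance
    return acc[n][m]


# The same 2D trajectory performed at two different speeds aligns at zero cost.
slow = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (2.0, 2.0)]
fast = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
print(dtw_distance(slow, fast))
```

The warping is what lets segments of a common action with different durations and execution speeds be compared, as required by scenario (d) in the abstract.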
