Ligand-Based Virtual Screening Using Graph Edit Distance as Molecular Similarity Measure

Background: Graph edit distance is a methodology used to solve error-tolerant graph matching. It estimates a distance between two graphs by determining the minimum number of modifications required to transform one graph into the other. These modifications, known as edit operations, each have an associated edit cost that has to be determined for the problem at hand. Objective: This study focuses on the use of optimization techniques to learn the edit costs used when comparing graphs by means of the graph edit distance. Methods: Graphs are reduced structural representations of molecules that use pharmacophore-type node descriptions to encode the relevant molecular properties. This reduction technique is known as extended reduced graphs. The screening and statistical tools available on the ligand-based virtual screening benchmarking platform and in RDKit were used. Results: In the experiments, the graph edit distance with learned costs performed as well as or better than with predefined costs. This is exemplified with six publicly available datasets: DUD-E, MUV, GLL&GDD, CAPST, NRLiSt BDB, and ULS-UDS. Conclusion: This study shows that the graph edit distance with learned edit costs is useful for identifying bioactivity similarities in a structurally diverse group of molecules. Furthermore, the target-specific edit costs might provide useful structure-activity information for future drug-design efforts.
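
To make the role of the edit costs concrete, the following is a minimal sketch of an exact graph edit distance computed by exhaustive search over node mappings. It is not the method of the study: the function name, the three cost parameters, and the pharmacophore-style labels are assumptions chosen for illustration, edges are treated as unlabelled and undirected, and self-loops are not supported.

```python
from itertools import permutations


def graph_edit_distance(nodes1, edges1, nodes2, edges2,
                        c_node_sub=1.0, c_node_indel=1.0, c_edge_indel=1.0):
    """Exact GED by brute force; only feasible for very small graphs.

    nodes*: dict mapping node id -> label (hypothetical labels).
    edges*: iterable of (u, v) pairs, undirected, no self-loops.
    """
    ids1, ids2 = list(nodes1), list(nodes2)
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    # Pad the targets with None so that any node of graph 1 can be deleted.
    targets = ids2 + [None] * len(ids1)
    best = float("inf")
    for perm in set(permutations(targets, len(ids1))):
        mapping = dict(zip(ids1, perm))
        cost = 0.0
        # Node deletions and substitutions.
        for u, v in mapping.items():
            if v is None:
                cost += c_node_indel
            elif nodes1[u] != nodes2[v]:
                cost += c_node_sub
        # Nodes of graph 2 that no node maps to must be inserted.
        used = {v for v in perm if v is not None}
        cost += c_node_indel * (len(ids2) - len(used))
        # Edges of graph 1 are either preserved or deleted.
        preserved = set()
        for e in e1:
            a, b = tuple(e)
            image = frozenset((mapping[a], mapping[b]))
            if None not in image and image in e2:
                preserved.add(image)
            else:
                cost += c_edge_indel
        # Edges of graph 2 that are not images of graph 1 edges are inserted.
        cost += c_edge_indel * len(e2 - preserved)
        best = min(best, cost)
    return best


# Two tiny reduced graphs that differ in one node label (illustrative only).
g1_nodes = {0: "donor", 1: "aromatic", 2: "acceptor"}
g2_nodes = {0: "donor", 1: "aromatic", 2: "hydrophobic"}
chain = [(0, 1), (1, 2)]

unit = graph_edit_distance(g1_nodes, chain, g2_nodes, chain)
costly_sub = graph_edit_distance(g1_nodes, chain, g2_nodes, chain, c_node_sub=5.0)
print(unit, costly_sub)
```

With unit costs the cheapest edit path is a single label substitution. When substitution is made expensive, deleting the node and its edge and inserting a new node and edge becomes cheaper, which illustrates why the choice of costs changes the resulting distance.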


Relative Hausdorff distance for network analysis

Applied Network Science ◽

10.1007/s41109-019-0198-0 ◽

2019 ◽

Vol 4 (1) ◽

Cited By ~ 1

Author(s):

Sinan G. Aksoy ◽

Kathleen E. Nowak ◽

Emilie Purvine ◽

Stephen J. Young

Keyword(s):

Machine Learning ◽

Network Analysis ◽

Similarity Measure ◽

Hausdorff Distance ◽

Data Science ◽

Edit Distance ◽

Similarity Measures ◽

Graph Edit Distance ◽

Computationally Intensive

Abstract: Similarity measures are used extensively in machine learning and data science algorithms. The newly proposed graph Relative Hausdorff (RH) distance is a lightweight yet nuanced similarity measure for quantifying the closeness of two graphs. In this work we study the effectiveness of RH distance as a tool for detecting anomalies in time-evolving graph sequences. We apply RH to cyber data with given red team events, as well as to synthetically generated sequences of graphs with planted attacks. In our experiments, the performance of RH distance is at times comparable, and sometimes superior, to graph edit distance in detecting anomalous phenomena. Our results suggest that in appropriate contexts, RH distance has advantages over more computationally intensive similarity measures.


Ligand-Based Virtual Screening Based on the Graph Edit Distance

International Journal of Molecular Sciences ◽

10.3390/ijms222312751 ◽

2021 ◽

Vol 22 (23) ◽

pp. 12751

Author(s):

Elena Rica ◽

Susana Álvarez ◽

Francesc Serratosa

Keyword(s):

Virtual Screening ◽

Edit Distance ◽

Chemical Compounds ◽

Screening Methods ◽

Graph Edit Distance ◽

Attributed Graph ◽

Attributed Graphs ◽

Type Node ◽

The Cost

Chemical compounds can be represented as attributed graphs. An attributed graph is a mathematical model of an object composed of two types of representations: nodes and edges. Nodes are individual components, and edges are relations between these components. In this case, pharmacophore-type node descriptions are represented by nodes and chemical bonds by edges. If we want to obtain the bioactivity dissimilarity between two chemical compounds, a distance between attributed graphs can be used. The Graph Edit Distance allows this distance to be computed, and it is defined as the cost of transforming one graph into another. Nevertheless, to define this dissimilarity, the transformation costs must be properly tuned. The aim of this paper is to analyse structure-based screening methods in order to verify the quality of the transformation costs proposed by Harper, and to present an algorithm that learns these transformation costs such that the bioactivity dissimilarity is properly defined in a ligand-based virtual screening application. The goodness of the dissimilarity is represented by the classification accuracy. Six publicly available datasets (CAPST, DUD-E, GLL&GDD, NRLiSt-BDB, MUV and ULS-UDS) have been used to validate our methodology and show that with our learned costs, we obtain the highest ratios in identifying the bioactivity similarity in a structurally diverse group of molecules.
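
The idea of scoring a dissimilarity by classification accuracy can be sketched as follows. The abstract does not say which classifier is used, so leave-one-out nearest-neighbour classification is an assumption here, as are the function name, the distance matrix, and the labels, which are made-up values and not results from the paper.

```python
def loo_1nn_accuracy(dist, labels):
    """Leave-one-out 1-nearest-neighbour accuracy for a distance matrix.

    dist[i][k] is the dissimilarity between items i and k (for example a
    graph edit distance); labels[i] is the known class of item i.
    """
    n = len(labels)
    correct = 0
    for i in range(n):
        # Nearest other item under the given dissimilarity.
        nearest = min((k for k in range(n) if k != i), key=lambda k: dist[i][k])
        correct += labels[nearest] == labels[i]
    return correct / n


# Hypothetical pairwise distances between four compounds.
dist = [
    [0, 1, 4, 5],
    [1, 0, 3, 4],
    [4, 3, 0, 1],
    [5, 4, 1, 0],
]
well_separated = loo_1nn_accuracy(dist, ["active", "active", "inactive", "inactive"])
badly_separated = loo_1nn_accuracy(dist, ["active", "inactive", "active", "inactive"])
print(well_separated, badly_separated)
```

Under this kind of score, a set of edit costs is better when the distances it produces place compounds of the same class closer together, which is the quantity a cost-learning procedure would try to increase.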


TAGER: Transition-Labeled Graph Edit Distance Similarity Measure on Process Models

On the Move to Meaningful Internet Systems: OTM 2014 Conferences - Lecture Notes in Computer Science ◽

10.1007/978-3-662-45563-0_11 ◽

2014 ◽

pp. 184-201 ◽

Cited By ~ 2

Author(s):

Zixuan Wang ◽

Lijie Wen ◽

Jianmin Wang ◽

Shuhao Wang

Keyword(s):

Similarity Measure ◽

Edit Distance ◽

Process Models ◽

Graph Edit Distance ◽

Labeled Graph


On the unification of the graph edit distance and graph matching problems

Pattern Recognition Letters ◽

10.1016/j.patrec.2021.02.014 ◽

2021 ◽

Vol 145 ◽

pp. 240-246

Author(s):

Romain Raveaux

Keyword(s):

Edit Distance ◽

Graph Matching ◽

Graph Edit Distance ◽

Matching Problems


On-line learning the graph edit distance costs

Pattern Recognition Letters ◽

10.1016/j.patrec.2021.02.019 ◽

2021 ◽

Author(s):

Elena Rica ◽

Susana Álvarez ◽

Francesc Serratosa

Keyword(s):

Edit Distance ◽

Graph Edit Distance ◽

On Line ◽

On Line Learning


Inter-Structure and Intra-Structure Similarity of Use Case Diagram using Greedy Graph Edit Distance

2020 2nd International Conference on Cybernetics and Intelligent System (ICORIS) ◽

10.1109/icoris50180.2020.9320840 ◽

2020 ◽

Author(s):

Fatimatus Zulfa ◽

Daniel Oranova Siahaan ◽

Reza Fauzan ◽

Evi Triandini

Keyword(s):

Edit Distance ◽

Use Case ◽

Graph Edit Distance ◽

Structure Similarity ◽

Use Case Diagram


A New Approach to Measuring the Similarity of Indoor Semantic Trajectories

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10020090 ◽

2021 ◽

Vol 10 (2) ◽

pp. 90

Author(s):

Jin Zhu ◽

Dayu Cheng ◽

Weiwei Zhang ◽

Ci Song ◽

Jie Chen ◽

...

Keyword(s):

Similarity Measure ◽

Semantic Information ◽

Edit Distance ◽

Similarity Measures ◽

Indoor Positioning ◽

Synthetic Dataset ◽

Shopping Mall ◽

Indoor Space ◽

Trajectory Similarity ◽

Indoor Spaces

People spend more than 80% of their time in indoor spaces, such as shopping malls and office buildings. Indoor trajectories collected by indoor positioning devices, such as WiFi and Bluetooth devices, can reflect human movement behaviors in indoor spaces. Insightful indoor movement patterns can be discovered from indoor trajectories using various clustering methods. These methods are based on a measure that reflects the degree of similarity between indoor trajectories. Researchers have proposed many trajectory similarity measures. However, existing trajectory similarity measures ignore the indoor movement constraints imposed by the indoor space and the characteristics of indoor positioning sensors, which leads to an inaccurate measure of indoor trajectory similarity. Additionally, most of these works focus on the spatial and temporal dimensions of trajectories and pay less attention to indoor semantic information. Integrating indoor semantic information, such as indoor points of interest, into the indoor trajectory similarity measurement helps to discover pedestrians with similar intentions. In this paper, we propose an accurate and reasonable indoor trajectory similarity measure called the indoor semantic trajectory similarity measure (ISTSM), which considers the features of indoor trajectories and indoor semantic information simultaneously. The ISTSM is a modification of the edit distance, which is a measure of the distance between string sequences. The key component of the ISTSM is an indoor navigation graph, transformed from an indoor floor plan that represents the indoor space, which is used to compute accurate indoor walking distances. The indoor walking distances and indoor semantic information are fused into the edit distance seamlessly. The ISTSM is evaluated using a synthetic dataset and a real dataset for a shopping mall. The experiment with the synthetic dataset reveals that the ISTSM is more accurate and reasonable than three other popular trajectory similarity measures, namely the longest common subsequence (LCSS), edit distance on real sequence (EDR), and the multidimensional similarity measure (MSM). The case study of a shopping mall shows that the ISTSM effectively reveals the movement patterns of indoor customers.
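
For reference, the plain sequence edit distance that the ISTSM is said to modify can be sketched with the standard dynamic programme below. This is the classic unmodified form only: the indoor walking distances and semantic information that the ISTSM fuses into the costs are not reproduced, and the function name, cost parameters, and place names are assumptions for illustration.

```python
def edit_distance(a, b, c_sub=1, c_indel=1):
    """Classic edit distance between two sequences (strings or lists)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * c_indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * c_indel]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + c_indel,                        # delete x
                curr[j - 1] + c_indel,                    # insert y
                prev[j - 1] + (0 if x == y else c_sub),   # match or substitute
            ))
        prev = curr
    return prev[-1]


# A trajectory can be treated as a sequence of visited places (hypothetical).
t1 = ["entrance", "cafe", "shop"]
t2 = ["entrance", "shop"]
print(edit_distance(t1, t2))              # one deletion
print(edit_distance("kitten", "sitting"))
```

Because the function only compares elements for equality, it works on lists of place labels as well as on strings. Replacing the constant costs with values derived from walking distance or semantic similarity is the kind of change the abstract describes.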


An efficient algorithm for graph edit distance computation

Knowledge-Based Systems ◽

10.1016/j.knosys.2018.10.002 ◽

2019 ◽

Vol 163 ◽

pp. 762-775 ◽

Cited By ~ 6

Author(s):

Xiaoyang Chen ◽

Hongwei Huo ◽

Jun Huan ◽

Jeffrey Scott Vitter

Keyword(s):

Efficient Algorithm ◽

Edit Distance ◽

Graph Edit Distance ◽

Distance Computation


Redefining the Graph Edit Distance

SN Computer Science ◽

10.1007/s42979-021-00792-5 ◽

2021 ◽

Vol 2 (6) ◽

Author(s):

Francesc Serratosa

Keyword(s):

Machine Learning ◽

Pattern Recognition ◽

Triangle Inequality ◽

Edit Distance ◽

Minimum Amount ◽

Graph Edit Distance ◽

Edit Operation ◽

Attributed Graphs ◽

Distance Properties

Abstract: Graph edit distance has been used since 1983 to compare objects in machine learning when these objects are represented by attributed graphs instead of vectors. In these cases, the graph edit distance is usually applied to deduce a distance between attributed graphs. This distance is defined as the minimum number of edit operations (deletion, insertion and substitution of nodes and edges) needed to transform one graph into another. Until now, it has been stated that the distance properties [(1) non-negativity, (2) symmetry, (3) identity and (4) triangle inequality] have to be imposed on the edit operations involved in computing the graph edit distance in order to make the graph edit distance a metric. In this paper, we show that there is no need to impose the triangle inequality on each edit operation. This is an important finding since, in pattern recognition applications, the classification ratio is usually maximized by edit operation combinations (deletion, insertion and substitution of nodes and edges) for which the triangle inequality is not fulfilled.
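
The four properties named in the abstract can be written out as follows. This is one common way of stating them for a cost function on edit operations, not necessarily the formulation used in the paper: the symbols $c$, $\mathcal{P}(G_1, G_2)$ (the set of edit paths from $G_1$ to $G_2$) and $\varepsilon$ (the null element, so that $u \to \varepsilon$ is a deletion and $\varepsilon \to v$ an insertion) are assumptions introduced here.

```latex
% Requires amsmath. c(.) is an assumed edit-cost function.
\begin{align}
d(G_1, G_2) &= \min_{(e_1, \dots, e_k) \in \mathcal{P}(G_1, G_2)} \sum_{i=1}^{k} c(e_i) \\
c(u \to v) &\ge 0 && \text{(non-negativity)} \\
c(u \to v) &= c(v \to u) && \text{(symmetry)} \\
c(u \to v) &= 0 \iff u = v && \text{(identity)} \\
c(u \to w) &\le c(u \to v) + c(v \to w) && \text{(triangle inequality)}
\end{align}
```

A frequently used special case of the last line is $c(u \to v) \le c(u \to \varepsilon) + c(\varepsilon \to v)$, meaning that a substitution never costs more than a deletion followed by an insertion. The abstract's claim concerns this kind of per-operation condition, which it argues does not need to be imposed.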
