Ligand-Based Virtual Screening Based on the Graph Edit Distance

Chemical compounds can be represented as attributed graphs. An attributed graph is a mathematical model of an object composed of two types of elements: nodes and edges. Nodes are individual components, and edges are relations between these components. In this case, nodes represent pharmacophore-type node descriptions and edges represent chemical bonds. To obtain the bioactivity dissimilarity between two chemical compounds, a distance between attributed graphs can be used. The Graph Edit Distance provides such a distance; it is defined as the cost of transforming one graph into another. Nevertheless, for this dissimilarity to be meaningful, the transformation costs must be properly tuned. The aim of this paper is to analyse structure-based screening methods in order to verify the quality of the transformation costs proposed by Harper, and to present an algorithm that learns these transformation costs so that the bioactivity dissimilarity is properly defined in a ligand-based virtual screening application. The quality of the dissimilarity is measured by the classification accuracy. Six publicly available datasets (CAPST, DUD-E, GLL&GDD, NRLiSt-BDB, MUV and ULS-UDS) were used to validate the methodology and show that the learned costs yield the highest ratios in identifying bioactivity similarity in a structurally diverse group of molecules.
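To make the role of the transformation costs concrete, the following is a minimal sketch of an exact graph edit distance computed by exhaustive search over node mappings. It is not the authors' implementation: the function name `graph_edit_distance`, the three cost parameters, the unlabelled-edge assumption, and the two toy "molecules" are all illustrative assumptions, and the search is only practical for graphs with a handful of nodes.

```python
from itertools import permutations


def graph_edit_distance(g1, g2, node_sub=1.0, node_indel=1.0, edge_indel=1.0):
    """Exact GED by exhaustive search; only practical for very small graphs.

    A graph is a pair (labels, edges): labels[i] is the label of node i and
    edges is a collection of undirected (i, j) pairs. Edges are unlabelled.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    edge_set1 = {frozenset(e) for e in edges1}
    edge_set2 = {frozenset(e) for e in edges2}
    n1, n2 = len(labels1), len(labels2)

    # Each node of g1 maps to a distinct node of g2, or to None (deletion).
    targets = list(range(n2)) + [None] * n1
    best = float("inf")
    for mapping in set(permutations(targets, n1)):
        cost = 0.0
        # Node deletions and substitutions.
        for u, t in enumerate(mapping):
            if t is None:
                cost += node_indel
            elif labels1[u] != labels2[t]:
                cost += node_sub
        # Nodes of g2 that received no g1 node must be inserted.
        used = {t for t in mapping if t is not None}
        cost += node_indel * (n2 - len(used))

        # An edge of g1 is preserved only if its image is an edge of g2.
        preserved = set()
        for edge in edge_set1:
            image = frozenset(mapping[u] for u in edge)
            if None in image or image not in edge_set2:
                cost += edge_indel  # edge deletion
            else:
                preserved.add(image)
        # Edges of g2 that are not the image of a g1 edge must be inserted.
        cost += edge_indel * (len(edge_set2) - len(preserved))

        best = min(best, cost)
    return best


# Two hypothetical reduced graphs that differ in one pharmacophore-type label.
mol_a = (["donor", "acceptor", "aromatic"], [(0, 1), (1, 2)])
mol_b = (["donor", "acceptor", "hydrophobic"], [(0, 1), (1, 2)])

print(graph_edit_distance(mol_a, mol_b))                # unit costs
print(graph_edit_distance(mol_a, mol_b, node_sub=5.0))  # expensive substitution
```

With unit costs the cheapest transformation is a single node substitution. When substitution is made more expensive than deleting and re-inserting the node together with its incident edge, the optimal edit path changes, which is why the costs have to be tuned to the application.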

Learning the Edit Costs of Graph Edit Distance Applied to Ligand-Based Virtual Screening

Current Topics in Medicinal Chemistry ◽

10.2174/1568026620666200603122000 ◽

2020 ◽

Vol 20 (18) ◽

pp. 1582-1592 ◽

Cited By ~ 1

Author(s):

Carlos Garcia-Hernandez ◽

Alberto Fernández ◽

Francesc Serratosa

Keyword(s):

Virtual Screening ◽

Edit Distance ◽

Graph Matching ◽

Optimization Techniques ◽

Graph Edit Distance ◽

Structure Activity ◽

Future Drug ◽

Minimum Number ◽

Type Node ◽

Activity Information

Background: Graph edit distance is a methodology used to solve error-tolerant graph matching. This methodology estimates a distance between two graphs by determining the minimum number of modifications required to transform one graph into the other. These modifications, known as edit operations, have an associated edit cost that has to be determined depending on the problem. Objective: This study focuses on the use of optimization techniques to learn the edit costs used when comparing graphs by means of the graph edit distance. Methods: Graphs represent reduced structural representations of molecules using pharmacophore-type node descriptions to encode the relevant molecular properties. This reduction technique is known as extended reduced graphs. The screening and statistical tools available on the ligand-based virtual screening benchmarking platform and the RDKit were used. Results: In the experiments, the graph edit distance using learned costs performed as well as or better than the graph edit distance using predefined costs. This is exemplified with six publicly available datasets: DUD-E, MUV, GLL&GDD, CAPST, NRLiSt BDB, and ULS-UDS. Conclusion: This study shows that the graph edit distance along with learned edit costs is useful to identify bioactivity similarities in a structurally diverse group of molecules. Furthermore, the target-specific edit costs might provide useful structure-activity information for future drug-design efforts.

Redefining the Graph Edit Distance

SN Computer Science ◽

10.1007/s42979-021-00792-5 ◽

2021 ◽

Vol 2 (6) ◽

Author(s):

Francesc Serratosa

Keyword(s):

Machine Learning ◽

Pattern Recognition ◽

Triangle Inequality ◽

Edit Distance ◽

Minimum Amount ◽

Graph Edit Distance ◽

Edit Operation ◽

Attributed Graphs ◽

Distance Properties

Graph edit distance has been used since 1983 to compare objects in machine learning when these objects are represented by attributed graphs instead of vectors. In these cases, the graph edit distance is usually applied to deduce a distance between attributed graphs. This distance is defined as the minimum amount of edit operations (deletion, insertion and substitution of nodes and edges) needed to transform one graph into another. Until now, it has been stated that the distance properties [(1) non-negativity, (2) symmetry, (3) identity and (4) triangle inequality] have to be imposed on the edit operations involved in computing the graph edit distance in order to make the graph edit distance a metric. In this paper, we show that there is no need to impose the triangle inequality on each edit operation. This is an important finding since, in pattern recognition applications, the classification ratio is usually maximized by combinations of edit operation costs (deletion, insertion and substitution of nodes and edges) for which the triangle inequality is not fulfilled.
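For reference, the four distance properties named in the abstract can be written out as follows. The notation is an assumption made here for illustration, not taken from the paper: d is a distance between attributed graphs (with equality of graphs understood up to isomorphism), and c_sub, c_del and c_ins are the substitution, deletion and insertion costs of a node or edge.

```latex
% Assumed notation: d is a distance between attributed graphs G_1, G_2, G_3.
\begin{align}
  d(G_1, G_2) &\ge 0                           && \text{(non-negativity)} \\
  d(G_1, G_2) &= d(G_2, G_1)                   && \text{(symmetry)} \\
  d(G_1, G_2) &= 0 \iff G_1 = G_2              && \text{(identity)} \\
  d(G_1, G_3) &\le d(G_1, G_2) + d(G_2, G_3)   && \text{(triangle inequality)}
\end{align}

% The triangle inequality applied to a single edit operation, i.e. the
% usual per-operation condition on the costs (assumed notation):
\[
  c_{\mathrm{sub}}(a, b) \le c_{\mathrm{del}}(a) + c_{\mathrm{ins}}(b)
\]
```

Under this reading, a cost setting in which substituting a by b is more expensive than deleting a and inserting b violates the per-operation condition, and that is the kind of setting the abstract reports as often giving the best classification ratio.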

Using Structural Similarity to Classify Tests in Mutation Testing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.378.546 ◽

2013 ◽

Vol 378 ◽

pp. 546-551 ◽

Cited By ~ 4

Author(s):

Joanna Strug ◽

Barbara Strug

Keyword(s):

Edit Distance ◽

Computational Cost ◽

Structural Similarity ◽

Mutation Testing ◽

Effective Technique ◽

The Cost ◽

High Computational Cost

Mutation testing is an effective technique for assessing the quality of the tests provided for a system. However, it suffers from the high computational cost of executing mutants of the system. In this paper, a method of classifying such mutants is proposed. The classification is based on an edit distance kernel and a k-NN classifier. Using the results of this classification, it is possible to predict whether or not a mutant would be detected by the tests. Applying the approach can thus reduce the number of mutants that have to be executed and so lower the cost of mutation testing.
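The general idea of classifying items with a k-NN rule over an edit distance can be sketched as below. This is an illustration only, under loud assumptions: it uses plain string Levenshtein distance in place of the paper's edit distance kernel, and the mutant snippets, the labels "killed" and "survived", and the function names are hypothetical.

```python
from collections import Counter


def levenshtein(a, b):
    """Classic dynamic-programming string edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or match
        prev = curr
    return prev[-1]


def knn_predict(query, labelled, k=3):
    """Majority vote among the k labelled items closest to the query."""
    nearest = sorted(labelled, key=lambda item: levenshtein(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical mutants whose outcome is already known from executing the tests.
known_mutants = [
    ("a + b", "killed"),
    ("a - b", "killed"),
    ("a * b", "killed"),
    ("x >= 0", "survived"),
    ("x > 0", "survived"),
    ("x >= 1", "survived"),
]

# Predict the outcome of new mutants without executing them.
print(knn_predict("a / b", known_mutants))
print(knn_predict("x <= 0", known_mutants))
```

Only mutants whose prediction is uncertain or of interest would then need to be executed, which is where the saving in computational cost described in the abstract would come from.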

Natural Scene Recognition Based on Graph Edit Distance

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.513-517.4411 ◽

2014 ◽

Vol 513-517 ◽

pp. 4411-4416

Author(s):

Qiang Rong Jiang ◽

Jian Chang Song ◽

Zhe Wu

Keyword(s):

Edit Distance ◽

Classification Problem ◽

Scene Recognition ◽

Natural Scene ◽

Scene Classification ◽

Graph Edit Distance ◽

Attributed Graph ◽

Novel Method ◽

Vertex Label ◽

Global And Local

Natural scene classification is a challenging pattern classification problem. The description of the image plays a crucial role in the recognition process. Many approaches and feature extraction methodologies for scene classification have been proposed and applied in the last few years. This paper proposes a novel method of natural scene recognition based on graph edit distance (GED), in which scene images are represented by attributed graphs. The vertex labels are the features of regions, and the edge labels are the features of the area shared by adjacent regions. The method uses both local and global representations, so that global and local mechanisms cooperate. The proposed method achieves satisfactory categorization performance on the well-known scene classification datasets with 8 scene categories.

Ligand-Based Virtual Screening Using Graph Edit Distance as Molecular Similarity Measure

Journal of Chemical Information and Modeling ◽

10.1021/acs.jcim.8b00820 ◽

2019 ◽

Vol 59 (4) ◽

pp. 1410-1421 ◽

Cited By ~ 5

Author(s):

Carlos Garcia-Hernandez ◽

Alberto Fernández ◽

Francesc Serratosa

Keyword(s):

Virtual Screening ◽

Similarity Measure ◽

Edit Distance ◽

Molecular Similarity ◽

Graph Edit Distance

A Methodology to Generate Attributed Graphs with a Bounded Graph Edit Distance for Graph-Matching Testing

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001418500386 ◽

2018 ◽

Vol 32 (11) ◽

pp. 1850038 ◽

Cited By ~ 1

Author(s):

Francesc Serratosa

Keyword(s):

Edit Distance ◽

Graph Matching ◽

Optimal Algorithm ◽

Computational Cost ◽

Graph Databases ◽

Graph Edit Distance ◽

Problem Size ◽

Large Graphs ◽

Attributed Graphs ◽

Bounded Graph

This paper presents a methodology for generating pairs of attributed graphs with a lower- and upper-bounded graph edit distance (GED). It is independent of the type of attributes on nodes and edges. The algorithm is composed of three steps: randomly generating a graph, generating another graph as a sub-graph of the first, and adding structural and semantic noise to both. These graphs, together with their bounded distances, can be used to manufacture synthetic databases of large graphs. The exact GED between large graphs cannot be obtained for runtime reasons since it has to be computed through an optimal algorithm with an exponential computational cost. Through this database, we can test the behavior of the known or new sub-optimal error-tolerant graph-matching algorithms against a lower and an upper bound GED on large graphs, even though we do not have the true distance. It is not clear how the error induced by the use of sub-optimal algorithms grows with problem size. Thus, with this methodology, we can generate graph databases and analyze if the current assumption that we can extrapolate algorithms’ behavior from matching small graphs to large graphs is correct or not. We also show that with some restrictions, the methodology returns the optimal GED in a quadratic time and that it can also be used to generate graph databases to test exact sub-graph isomorphism algorithms.

Measuring the cost of quality of business processes

10.2514/6.1989-3231 ◽

1989 ◽

Author(s):

HERBERT APPLETON

Keyword(s):

Business Processes ◽

Cost Of Quality ◽

The Cost

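Pengaruh Kompetensi Coder terhadap Keakuratan dan Ketepatan Pengkodean Menggunakan ICD 10 di Rumah Sakit X Pekanbaru Tahun 2016 [The Effect of Coder Competency on the Accuracy and Precision of Coding Using ICD 10 at Hospital X, Pekanbaru, in 2016]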

KESMARS Jurnal Kesehatan Masyarakat Manajemen dan Administrasi Rumah Sakit ◽

10.31539/kesmars.v1i1.158 ◽

2018 ◽

Vol 1 (1) ◽

pp. 31-43

Author(s):

Nur Maimun ◽

Jihan Natassa ◽

Wen Via Trisna ◽

Yeye Supriatin

Keyword(s):

Medical Record ◽

Human Error ◽

Medical Personnel ◽

Quality Of Data ◽

Accuracy And Precision ◽

Result Show ◽

Study Implementation ◽

Icd 10 ◽

The Cost

Accuracy in assigning diagnosis codes is important for medical record staff, and data quality is the most important concern in health information management. This study aims to determine how coder competency affects the accuracy and precision of coding with ICD 10 at X Hospital in Pekanbaru. It is a qualitative case study with five informants. The results show that the medical personnel (doctors) had never received training in coding; that doctors' handwriting was hard to read; that diagnosis or procedure codes were sometimes not assigned; that doctors used non-standard abbreviations; that some officers still did not understand the nomenclature or had not mastered anatomy and pathology; and that facilities and infrastructure supported the accuracy and precision of the existing codes. Coding errors always occur because of human error. The accuracy and precision of coding strongly influence the cost under INA CBGs. The medical staff and the committee did most of the work in severity level III cases, while the medical record unit had a role in monitoring and evaluating coding implementation. If a resume is unclear, the case-mix team checks the medical record file to verify that the diagnoses and codes conform. Keywords: coder competency, accuracy and precision of coding, ICD 10

ABOUT NECESSITY OF MATHEMATICAL MODELING OF THE BUSINESS PROCESS OF COST ESTIMATE CALCULATIONS IN THE CONSTRUCTION OF OIL AND GAS FACILITIES

Oil and Gas Studies ◽

10.31660/0445-0108-2017-6-139-145 ◽

2017 ◽

pp. 139-145

Author(s):

R. I. Hamidullin ◽

L. B. Senkevich

Keyword(s):

Mathematical Modeling ◽

Quality Assessment ◽

Business Process ◽

Oil And Gas ◽

Oil And Gas Industry ◽

Gas Industry ◽

Construction Organizations ◽

The Cost ◽

Cost Of Construction

A study of the quality of the development of estimate documentation on the cost of construction at all stages of the implementation of large projects in the oil and gas industry is conducted. The main problems that arise in construction organizations are indicated. The analysis of the choice of the perfect methodology of mathematical modeling of the investigated business process for improving the activity of budget calculations, conducting quality assessment of estimates and criteria for automation of design estimates is performed.

Return on Invesment (ROI) of Software Product: A Systematic Literature Review

Jurnal ULTIMA InfoSys ◽

10.31937/si.v6i1.279 ◽

2015 ◽

Vol 6 (1) ◽

pp. 50-57

Author(s):

Rizqa Raaiqa Bintana ◽

Putri Aisyiyah Rakhma Devi ◽

Umi Laili Yuhana

Keyword(s):

Return On Investment ◽

Assessment Criteria ◽

Software Product ◽

And Performance ◽

Rational Consideration ◽

Actual Return ◽

The Cost ◽

Investment Models ◽

The Impact

The quality of software can be measured by its return on investment. The factors that may affect the return on investment (ROI) are tangible factors (such as the cost) and intangible factors (such as the impact of the software on its users or stakeholders). The software itself is assessed through reviewing, testing, process audit, and software performance. This paper discusses ROI assessment criteria derived from the software and its users. These criteria indicate that the approach may support a rational consideration of all relevant criteria when evaluating software, and the paper shows examples of actual return on investment models. An analysis was conducted of the assessment criteria that affect the return on investment when these criteria involve a disproportionate effort that decreases the software's return on investment. Index Terms - Assessment criteria, Quality assurance, Return on Investment, Software product
