Ligand-Based Virtual Screening Based on the Graph Edit Distance

2021 ◽  
Vol 22 (23) ◽  
pp. 12751
Author(s):  
Elena Rica ◽  
Susana Álvarez ◽  
Francesc Serratosa

Chemical compounds can be represented as attributed graphs. An attributed graph is a mathematical model of an object composed of two types of representations: nodes and edges. Nodes are individual components, and edges are relations between these components. In this case, pharmacophore-type node descriptions are represented by nodes and chemical bonds by edges. If we want to obtain the bioactivity dissimilarity between two chemical compounds, a distance between attributed graphs can be used. The Graph Edit Distance allows this distance to be computed, and it is defined as the cost of transforming one graph into another. Nevertheless, to define this dissimilarity, the transformation costs must be properly tuned. The aim of this paper is to analyse structure-based screening methods to verify the quality of the transformation costs proposed by Harper and to present an algorithm that learns these transformation costs such that the bioactivity dissimilarity is properly defined in a ligand-based virtual screening application. The goodness of the dissimilarity is represented by the classification accuracy. Six publicly available datasets (CAPST, DUD-E, GLL&GDD, NRLiSt-BDB, MUV and ULS-UDS) have been used to validate our methodology and show that with our learned costs, we obtain the highest ratios in identifying the bioactivity similarity in a structurally diverse group of molecules.

2020 ◽  
Vol 20 (18) ◽  
pp. 1582-1592 ◽  
Author(s):  
Carlos Garcia-Hernandez ◽  
Alberto Fernández ◽  
Francesc Serratosa

Background: Graph edit distance is a methodology used to solve error-tolerant graph matching. This methodology estimates a distance between two graphs by determining the minimum number of modifications required to transform one graph into the other. These modifications, known as edit operations, have an associated edit cost that has to be determined depending on the problem. Objective: This study focuses on the use of optimization techniques to learn the edit costs used when comparing graphs by means of the graph edit distance. Methods: Graphs represent reduced structural representations of molecules using pharmacophore-type node descriptions to encode the relevant molecular properties. This reduction technique is known as extended reduced graphs. The screening and statistical tools available on the ligand-based virtual screening benchmarking platform and the RDKit were used. Results: In the experiments, the graph edit distance using learned costs performed as well as or better than the graph edit distance using predefined costs. This is exemplified with six publicly available datasets: DUD-E, MUV, GLL&GDD, CAPST, NRLiSt BDB, and ULS-UDS. Conclusion: This study shows that the graph edit distance along with learned edit costs is useful to identify bioactivity similarities in a structurally diverse group of molecules. Furthermore, the target-specific edit costs might provide useful structure-activity information for future drug-design efforts.
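To make the role of the edit costs concrete, the following is a minimal sketch of an exact graph edit distance computed by exhaustive search over node mappings. It is not the authors' implementation: the function name, the node labels ("donor", "acceptor", "aromatic"), the cost values and the assumption that matched edges substitute at zero cost are all illustrative choices, and the search is only feasible for graphs with a handful of nodes.

```python
from itertools import permutations

def graph_edit_distance(nodes1, edges1, nodes2, edges2,
                        node_sub_cost, node_indel=1.0, edge_indel=1.0):
    """Exact GED by exhaustive search; feasible only for very small graphs.

    nodes1/nodes2 map node id -> label; edges are pairs of node ids.
    Assumes no self-loops and that None is not used as a node id.
    """
    ids1 = list(nodes1)
    # Each node of graph 1 maps to a distinct node of graph 2, or to None (deletion).
    targets = list(nodes2) + [None] * len(ids1)
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    best = float("inf")
    for images in permutations(targets, len(ids1)):
        mapping = dict(zip(ids1, images))
        cost = 0.0
        for u, v in mapping.items():
            cost += node_indel if v is None else node_sub_cost(nodes1[u], nodes2[v])
        used = {v for v in images if v is not None}
        cost += node_indel * (len(nodes2) - len(used))  # nodes inserted into graph 1
        matched = 0
        for edge in e1:
            a, b = edge
            image = frozenset((mapping[a], mapping[b]))
            if None not in image and image in e2:
                matched += 1  # edge preserved by the mapping (assumed cost 0)
        # Unmatched edges of graph 1 are deleted, unmatched edges of graph 2 inserted.
        cost += edge_indel * (len(e1) + len(e2) - 2 * matched)
        best = min(best, cost)
    return best

# Hypothetical reduced graphs with pharmacophore-type labels.
g1_nodes = {0: "donor", 1: "acceptor", 2: "aromatic"}
g1_edges = [(0, 1), (1, 2)]
g2_nodes = {0: "donor", 1: "acceptor"}
g2_edges = [(0, 1)]

def label_cost(a, b):
    return 0.0 if a == b else 1.0

d_unit = graph_edit_distance(g1_nodes, g1_edges, g2_nodes, g2_edges, label_cost)
d_tuned = graph_edit_distance(g1_nodes, g1_edges, g2_nodes, g2_edges,
                              label_cost, node_indel=0.5)
```

Changing `node_indel` from 1.0 to 0.5 changes the distance between the same pair of graphs, which is the kind of degree of freedom a cost-learning procedure would tune against classification accuracy.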


2021 ◽  
Vol 2 (6) ◽  
Author(s):  
Francesc Serratosa

Graph edit distance has been used since 1983 to compare objects in machine learning when these objects are represented by attributed graphs instead of vectors. In these cases, the graph edit distance is usually applied to deduce a distance between attributed graphs. This distance is defined as the minimum number of edit operations (deletion, insertion and substitution of nodes and edges) needed to transform one graph into another. Until now, it has been stated that the distance properties [(1) non-negativity, (2) symmetry, (3) identity and (4) triangle inequality] have to be applied to the edit operations involved in computing the graph edit distance in order to make the graph edit distance a metric. In this paper, we show that there is no need to impose the triangle inequality on each edit operation. This is an important finding since, in pattern recognition applications, the classification ratio is usually maximized by combinations of edit operations (deletion, insertion and substitution of nodes and edges) in which the triangle inequality is not fulfilled.
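For reference, the four distance properties listed above can be written out as follows for a distance d between attributed graphs. The last line is one common way the triangle inequality is stated for individual edit costs in the graph-matching literature; the symbols c, u, v and ε (the null node) are notation assumed here, and the paper's own formulation may differ.

```latex
\begin{align*}
  &(1)\quad d(G_1, G_2) \ge 0 \\
  &(2)\quad d(G_1, G_2) = d(G_2, G_1) \\
  &(3)\quad d(G_1, G_2) = 0 \iff G_1 \text{ and } G_2 \text{ are isomorphic} \\
  &(4)\quad d(G_1, G_3) \le d(G_1, G_2) + d(G_2, G_3) \\[4pt]
  &\text{Per-operation form (assumed): } c(u \to v) \le c(u \to \varepsilon) + c(\varepsilon \to v)
\end{align*}
```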


2013 ◽  
Vol 378 ◽  
pp. 546-551 ◽  
Author(s):  
Joanna Strug ◽  
Barbara Strug

Mutation testing is an effective technique for assessing the quality of the tests provided for a system. However, it suffers from the high computational cost of executing mutants of the system. In this paper, a method of classifying such mutants is proposed. This classification is based on an edit distance kernel and a k-NN classifier. Using the results of this classification, it is possible to predict whether or not a mutant would be detected by the tests. Thus, applying the approach can help to lower the number of mutants that have to be executed and so also lower the cost of mutation testing.
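As a rough illustration of classifying by edit distance with k-NN, the sketch below uses plain Levenshtein distance on strings as a stand-in. The abstract does not specify how mutants are represented or how the kernel is built, so the string representation, the labels "killed" and "survived", and the tiny training set are all hypothetical.

```python
from collections import Counter

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]

def knn_predict(query, training, k=3):
    """Majority vote among the k training items closest to the query."""
    nearest = sorted(training, key=lambda item: levenshtein(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical mutants (as code fragments) with known test outcomes.
training = [
    ("a+b", "killed"), ("a-b", "killed"), ("a*b", "killed"),
    ("x<=y", "survived"), ("x<y", "survived"), ("x>=y", "survived"),
]
prediction = knn_predict("a/b", training, k=3)
```

A mutant predicted as detected would not need to be executed, which is where the cost saving described above would come from.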


2014 ◽  
Vol 513-517 ◽  
pp. 4411-4416
Author(s):  
Qiang Rong Jiang ◽  
Jian Chang Song ◽  
Zhe Wu

Natural scene classification is a challenging pattern classification problem. The description of the image plays a crucial role in the recognition process. Many different approaches and feature extraction methodologies for scene classification have been proposed and applied in the last few years. This paper proposes a novel method of natural scene recognition based on graph edit distance (GED), in which scene images are represented by attributed graphs. The vertex labels are the features of regions, and the edge labels are the features of the shared area of adjacent regions. The method uses both local and global representations, so that the global and local mechanisms cooperate. The proposed method achieves satisfactory categorization performance on the well-known scene classification datasets with 8 scene categories.


2019 ◽  
Vol 59 (4) ◽  
pp. 1410-1421 ◽  
Author(s):  
Carlos Garcia-Hernandez ◽  
Alberto Fernández ◽  
Francesc Serratosa

Author(s):  
Francesc Serratosa

This paper presents a methodology for generating pairs of attributed graphs with a lower- and upper-bounded graph edit distance (GED). It is independent of the type of attributes on nodes and edges. The algorithm is composed of three steps: randomly generating a graph, generating another graph as a sub-graph of the first, and adding structural and semantic noise to both. These graphs, together with their bounded distances, can be used to build synthetic databases of large graphs. The exact GED between large graphs cannot be obtained for runtime reasons, since it has to be computed through an optimal algorithm with an exponential computational cost. Through this database, we can test the behavior of known or new sub-optimal error-tolerant graph-matching algorithms against a lower and an upper bound of the GED on large graphs, even though we do not have the true distance. It is not clear how the error induced by the use of sub-optimal algorithms grows with problem size. Thus, with this methodology, we can generate graph databases and analyze whether the current assumption that algorithms' behavior can be extrapolated from matching small graphs to matching large graphs is correct. We also show that, with some restrictions, the methodology returns the optimal GED in quadratic time and that it can also be used to generate graph databases to test exact sub-graph isomorphism algorithms.
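The first two steps of the procedure can be sketched as follows. This is only an illustration under stated assumptions: graphs are unattributed, costs are non-negative, the noise step is omitted, and only a simple upper bound is computed (deleting the removed nodes and their incident edges is one valid edit path, so its cost cannot be below the GED). The function names and parameters are hypothetical and are not taken from the paper.

```python
import random

def random_graph(n, p, rng):
    """Step 1: random undirected graph on n nodes with edge probability p."""
    nodes = list(range(n))
    edges = {(u, v) for u in nodes for v in nodes if u < v and rng.random() < p}
    return nodes, edges

def induced_subgraph(nodes, edges, removed):
    """Step 2: second graph obtained by removing nodes and their incident edges."""
    kept = [u for u in nodes if u not in removed]
    kept_edges = {(u, v) for (u, v) in edges
                  if u not in removed and v not in removed}
    return kept, kept_edges

def deletion_upper_bound(nodes, edges, sub_nodes, sub_edges,
                         node_del=1.0, edge_del=1.0):
    """Cost of the edit path that only deletes; an upper bound on the GED."""
    return (node_del * (len(nodes) - len(sub_nodes))
            + edge_del * (len(edges) - len(sub_edges)))

rng = random.Random(0)  # fixed seed so the generated pair is reproducible
nodes, edges = random_graph(8, 0.4, rng)
removed = set(rng.sample(nodes, 3))
sub_nodes, sub_edges = induced_subgraph(nodes, edges, removed)
bound = deletion_upper_bound(nodes, edges, sub_nodes, sub_edges)
```

Because the second graph is built from the first, the bound is known by construction, without running any matching algorithm on the pair.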


Author(s):  
Nur Maimun ◽  
Jihan Natassa ◽  
Wen Via Trisna ◽  
Yeye Supriatin

Accuracy in assigning diagnosis codes is an important matter for medical recorders, and data quality is the most important concern in health information management. This study aims to assess coder competency in the accurate and precise use of ICD 10 at X Hospital in Pekanbaru. The study used a qualitative method with a case study design involving five informants. The results show that medical personnel (doctors) had never received training in coding; doctors' handwriting was hard to read; diagnosis or procedure codes were sometimes not assigned; doctors used non-standard abbreviations; some officers still did not understand the nomenclature or had not mastered anatomy and pathology; and the facilities and infrastructure supported accurate and precise coding. Coding errors always occur because of human error. The accuracy and precision of coding strongly influence the INA CBGs cost. Medical staff and the committee did most of the work in cases of severity level III, while medical records staff had a role in monitoring and evaluating coding implementation. If a resume is unclear, the case-mix team checks the medical record file to confirm that the diagnosis and coding conform. Keywords: coder competency, accuracy and precision of coding, ICD 10


2017 ◽  
pp. 139-145
Author(s):  
R. I. Hamidullin ◽  
L. B. Senkevich

A study of the quality of the development of estimate documentation on the cost of construction at all stages of the implementation of large projects in the oil and gas industry is conducted. The main problems that arise in construction organizations are indicated. The analysis of the choice of the perfect methodology of mathematical modeling of the investigated business process for improving the activity of budget calculations, conducting quality assessment of estimates and criteria for automation of design estimates is performed.


2015 ◽  
Vol 6 (1) ◽  
pp. 50-57
Author(s):  
Rizqa Raaiqa Bintana ◽  
Putri Aisyiyah Rakhma Devi ◽  
Umi Laili Yuhana

The quality of software can be measured by its return on investment (ROI). The factors that may affect ROI are tangible factors (such as cost) and intangible factors (such as the impact of the software on its users or stakeholders). The software itself is assessed through reviewing, testing, process audits, and software performance. This paper discusses ROI assessment criteria derived from the software and its users. These criteria indicate that the approach may support a rational consideration of all relevant criteria when evaluating software, and the paper shows examples of actual return on investment models. An analysis is conducted of the assessment criteria that affect ROI when disproportionate effort on these criteria results in a decreased return on investment for the software. Index Terms - Assessment criteria, Quality assurance, Return on Investment, Software product

