Clustering of Synthetic Routes Using Tree Edit Distance

10.26434/chemrxiv.13372475.v1 ◽

2020 ◽

Author(s):

Samuel Genheden ◽

Ola Engkvist ◽

Esben Jannik Bjerrum

Keyword(s):

Open Source ◽

Edit Distance ◽

Prediction Tool ◽

Tree Edit Distance ◽

Distance Calculation ◽

Time Prediction ◽

Synthesis Routes ◽

Synthetic Routes ◽

Novel Algorithm

We present a novel algorithm to compute the distance between synthesis routes based on a tree edit distance calculation. Such distances can be used to cluster synthesis routes from a retrosynthesis prediction tool. We show that the clustering of routes from a retrosynthesis analysis is performed in less than ten seconds on average, and only constitutes seven percent of the total time (prediction + clustering). Furthermore, we are able to show that representative routes from each cluster can be used to reduce the set of predicted routes. Finally, we show with a number of examples that the algorithm gives intuitive clusters that can be easily rationalized. The algorithm is included in the latest version of the open-source AiZynthFinder software.
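To make the idea of a route-to-route distance concrete, the following is a minimal sketch of a unit-cost edit distance between ordered, labeled trees, applied pairwise to a few toy routes. It is not the algorithm from the paper and not the AiZynthFinder API: the tuple-based tree encoding, the function names, and the toy route labels are all assumptions made for illustration, and the plain memoized recursion is only practical for small trees.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple
# of trees; a forest is a tuple of trees. Tuples are hashable, so results
# of the recursion can be cached.


def size(forest):
    """Count the nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f:
        return size(g)  # insert every remaining node of g
    if not g:
        return size(f)  # delete every remaining node of f
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    # Delete the root of the rightmost tree in f; its children move up.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert the root of the rightmost tree in g.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Map the two rightmost roots onto each other (relabel if they differ).
    match = (
        forest_distance(v_children, w_children)
        + forest_distance(f[:-1], g[:-1])
        + (v_label != w_label)
    )
    return min(delete, insert, match)


def tree_distance(t1, t2):
    return forest_distance((t1,), (t2,))


# Hypothetical toy routes: target -> reaction -> starting materials.
route_a = ("target", (("rxn1", (("A", ()), ("B", ()))),))
route_b = ("target", (("rxn1", (("A", ()), ("C", ()))),))
route_c = ("target", (("rxn2", (("D", ()),)),))

routes = [route_a, route_b, route_c]
# Pairwise distance matrix, the usual input to a clustering step.
matrix = [[tree_distance(x, y) for y in routes] for x in routes]
```

A matrix like this could then be passed to any clustering method that accepts precomputed distances; which clustering method the authors use is not stated in the abstract.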

Tree Edit Distance Cannot be Computed in Strongly Subcubic Time (Unless APSP Can)

ACM Transactions on Algorithms ◽

10.1145/3381878 ◽

2020 ◽

Vol 16 (4) ◽

pp. 1-22

Author(s):

Karl Bringmann ◽

Paweł Gawrychowski ◽

Shay Mozes ◽

Oren Weimann

Keyword(s):

Edit Distance ◽

Tree Edit Distance

An optimal decomposition algorithm for tree edit distance

ACM Transactions on Algorithms ◽

10.1145/1644015.1644017 ◽

2009 ◽

Vol 6 (1) ◽

pp. 1-19 ◽

Cited By ~ 64

Author(s):

Erik D. Demaine ◽

Shay Mozes ◽

Benjamin Rossman ◽

Oren Weimann

Keyword(s):

Edit Distance ◽

Decomposition Algorithm ◽

Tree Edit Distance

Natural Language Inference for Arabic Using Extended Tree Edit Distance with Subtrees

Journal of Artificial Intelligence Research ◽

10.1613/jair.3892 ◽

2013 ◽

Vol 48 ◽

pp. 1-22 ◽

Cited By ~ 10

Author(s):

M. Alabbas ◽

A. Ramsay

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Edit Distance ◽

Arabic Text ◽

Tree Edit Distance ◽

Single Node ◽

Preliminary Results ◽

Standard Algorithm ◽

Standard Tree

Many natural language processing (NLP) applications require the computation of similarities between pairs of syntactic or semantic trees. Many researchers have used tree edit distance for this task, but this technique suffers from the drawback that it deals with single node operations only. We have extended the standard tree edit distance algorithm to deal with subtree transformation operations as well as single nodes. The extended algorithm with subtree operations, TED+ST, is more effective and flexible than the standard algorithm, especially for applications that pay attention to relations among nodes (e.g. in linguistic trees, deleting a modifier subtree should be cheaper than the sum of deleting its components individually). We describe the use of TED+ST for checking entailment between two Arabic text snippets. The preliminary results of using TED+ST were encouraging when compared with two string-based approaches and with the standard algorithm.
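The effect of allowing subtree operations can be illustrated with a toy variant of the standard recursion. The sketch below is not TED+ST as published: it simply adds a hypothetical flat-cost "remove or insert a whole subtree" option (`subtree_cost`, an assumed parameter) to a unit-cost ordered tree edit distance, so that deleting a modifier subtree can be cheaper than deleting its nodes one by one. The tree encoding and the example labels are likewise assumptions.

```python
from functools import lru_cache


def make_distance(subtree_cost=None):
    """Build a tree distance; subtree_cost=None gives the node-only variant."""

    @lru_cache(maxsize=None)
    def dist(f, g):
        # f and g are forests: tuples of (label, children) trees.
        if not f and not g:
            return 0
        options = []
        if f:
            v_label, v_children = f[-1]
            # Delete only the root of the rightmost tree in f.
            options.append(dist(f[:-1] + v_children, g) + 1)
            if subtree_cost is not None:
                # Delete the whole rightmost subtree in one operation.
                options.append(dist(f[:-1], g) + subtree_cost)
        if g:
            w_label, w_children = g[-1]
            # Insert only the root of the rightmost tree in g.
            options.append(dist(f, g[:-1] + w_children) + 1)
            if subtree_cost is not None:
                # Insert the whole rightmost subtree in one operation.
                options.append(dist(f, g[:-1]) + subtree_cost)
        if f and g:
            # Map the rightmost roots onto each other, relabeling if needed.
            options.append(
                dist(v_children, w_children)
                + dist(f[:-1], g[:-1])
                + (v_label != w_label)
            )
        return min(options)

    return lambda t1, t2: dist((t1,), (t2,))


# Hypothetical parse trees: the first carries an extra modifier subtree (PP).
with_modifier = (
    "S",
    (
        ("NP", ()),
        ("VP", ()),
        ("PP", (("P", ()), ("NP", (("D", ()), ("N", ()))))),
    ),
)
plain = ("S", (("NP", ()), ("VP", ())))

standard = make_distance()                  # single-node operations only
extended = make_distance(subtree_cost=1.5)  # assumed flat subtree cost

node_only = standard(with_modifier, plain)
with_subtree_ops = extended(with_modifier, plain)
```

With node operations only, the five nodes of the modifier subtree each cost one deletion; with the assumed subtree operation, the whole modifier is removed for the single flat cost.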
