edit distance Latest Research Papers

GraphChainer: Co-linear Chaining for Accurate Alignment of Long Reads to Variation Graphs

10.1101/2022.01.07.475257 ◽

2022 ◽

Author(s):

Jun Ma ◽

Manuel Cáceres ◽

Leena Salmela ◽

Veli Mäkinen ◽

Alexandru I. Tomescu

Keyword(s):

Edit Distance ◽

State Of The Art ◽

Variant Calling ◽

Real Data ◽

Read Length ◽

Human Chromosomes ◽

Distance Threshold ◽

Nature Biotechnology ◽

Long Reads ◽

Standard Task

Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications in e.g., improving variant calling. While the vg toolkit (Garrison et al., Nature Biotechnology, 2018) is a popular aligner of short reads, GraphAligner (Rautiainen and Marschall, Genome Biology, 2020) is the state-of-the-art aligner of long reads. GraphAligner works by finding candidate read occurrences based on individually extending the best seeds of the read in the variation graph. However, a more principled approach recognized in the community is to co-linearly chain multiple seeds. We present a new algorithm to co-linearly chain a set of seeds in an acyclic variation graph, together with the first efficient implementation of such a co-linear chaining algorithm into a new aligner of long reads to variation graphs, GraphChainer. Compared to GraphAligner, at a normalized edit distance threshold of 40%, it aligns 9% to 12% more reads, and 15% to 19% more total read length, on real PacBio reads from human chromosomes 1 and 22. On both simulated and real data, GraphChainer aligns between 97% and 99% of all reads, and of total read length. At the more stringent normalized edit distance threshold of 30%, GraphChainer aligns up to 29% more total real read length than GraphAligner. GraphChainer is freely available at https://github.com/algbio/GraphChainer

Detection and Automatic Correction of Bengali Misspelled Words Using Minimum Edit Distance

Lecture Notes in Networks and Systems - Advances in Distributed Computing and Machine Learning ◽

10.1007/978-981-16-4807-6_19 ◽

2022 ◽

pp. 196-204

Author(s):

Sourav Mallick ◽

Antara Pal ◽

Alok Ranjan Pal

Keyword(s):

Edit Distance ◽

Automatic Correction

An Improved Algorithm for The k-Dyck Edit Distance Problem

10.1137/1.9781611977073.144 ◽

2022 ◽

pp. 3650-3669

Author(s):

Dvir Fried ◽

Shay Golan ◽

Tomasz Kociumaka ◽

Tsvi Kopelowitz ◽

Ely Porat ◽

...

Keyword(s):

Edit Distance ◽

Distance Problem ◽

Improved Algorithm

Bargaining power in informal trilogues: Intra-institutional preference cohesion and inter-institutional bargaining success

European Union Politics ◽

10.1177/14651165211064485 ◽

2021 ◽

pp. 146511652110644

Author(s):

Maximilian Haag

Keyword(s):

European Union ◽

Bargaining Power ◽

Edit Distance ◽

European Parliament ◽

The European Union ◽

Legislative Bargaining ◽

Document Similarity ◽

Institutional Constraint

Informal trilogue meetings are the main legislative bargaining forum in the European Union, yet their dynamics remain largely understudied in a quantitative context. This article builds on the assumption that the negotiating delegations of the European Parliament and the Council play a two-level game whereby these actors can use their intra-institutional constraint to extract inter-institutional bargaining success. Negotiators can credibly claim that their hands are tied if the members of their parent institutions hold similar preferences and do not accept alternative proposals or if their institution is divided and negotiators need to defend a fragile compromise. Employing a measure of document similarity (minimum edit distance) between an institution's negotiation mandate and the trilogue outcome to measure bargaining success, the analysis supports the hypothesis for the European Parliament, but not for the Council.

Ligand-Based Virtual Screening Based on the Graph Edit Distance

International Journal of Molecular Sciences ◽

10.3390/ijms222312751 ◽

2021 ◽

Vol 22 (23) ◽

pp. 12751

Author(s):

Elena Rica ◽

Susana Álvarez ◽

Francesc Serratosa

Keyword(s):

Virtual Screening ◽

Edit Distance ◽

Chemical Compounds ◽

Screening Methods ◽

Graph Edit Distance ◽

Attributed Graph ◽

Attributed Graphs ◽

Type Node ◽

The Cost

Chemical compounds can be represented as attributed graphs. An attributed graph is a mathematical model of an object composed of two types of representations: nodes and edges. Nodes are individual components, and edges are relations between these components. In this case, pharmacophore-type node descriptions are represented by nodes and chemical bounds by edges. If we want to obtain the bioactivity dissimilarity between two chemical compounds, a distance between attributed graphs can be used. The Graph Edit Distance allows computing this distance, and it is defined as the cost of transforming one graph into another. Nevertheless, to define this dissimilarity, the transformation cost must be properly tuned. The aim of this paper is to analyse the structural-based screening methods to verify the quality of the Harper transformation costs proposal and to present an algorithm to learn these transformation costs such that the bioactivity dissimilarity is properly defined in a ligand-based virtual screening application. The goodness of the dissimilarity is represented by the classification accuracy. Six publicly available datasets—CAPST, DUD-E, GLL&GDD, NRLiSt-BDB, MUV and ULS-UDS—have been used to validate our methodology and show that with our learned costs, we obtain the highest ratios in identifying the bioactivity similarity in a structurally diverse group of molecules.

An Edge-Based Approach for Virtual Network Embedding Based on the Graph Edit Distance

10.21203/rs.3.rs-1029589/v1 ◽

2021 ◽

Author(s):

Ze Xi Xu ◽

Lei Zhuang ◽

Meng Yang He ◽

Si Jin Yang ◽

Yu Song ◽

...

Keyword(s):

Edit Distance ◽

Virtual Network ◽

Cost Ratio ◽

Virtual Network Embedding ◽

Network Embedding ◽

Graph Edit Distance ◽

Network Resources ◽

Network Resource ◽

Distance Method ◽

Edge Based

Abstract Virtualization and resource isolation techniques have enabled the efficient sharing of networked resources. How to control network resource allocation accurately and flexibly has gradually become a research hotspot due to the growth in user demands. Therefore, this paper presents a new edge-based virtual network embedding approach to studying this problem that employs a graph edit distance method to accurately control resource usage. In particular, to manage network resources efficiently, we restrict the use conditions of network resources and restrict the structure based on common substructure isomorphism and an improved spider monkey optimization algorithm is employed to prune redundant information from the substrate network. Experimental results showed that the proposed method achieves better performance than existing algorithms in terms of resource management capacity, including energy savings and the revenue-cost ratio.

Indonesian Spelling Error Detection and Type Identification Using Bigram Vector and Minimum Edit Distance Based Probabilities

SinkrOn ◽

10.33395/sinkron.v6i1.11224 ◽

2021 ◽

Vol 6 (1) ◽

pp. 183-190

Author(s):

Emmy Erwina ◽

Tommy Tommy ◽

Mayasari Mayasari

Keyword(s):

Test Data ◽

Error Detection ◽

Edit Distance ◽

Spelling Error ◽

Correct Word ◽

Error Type ◽

Detection And Identification ◽

Younger Generation ◽

Identification Model ◽

Error Types

Spelling error has become an error that is often found in this era which can be seen from the use of words that tend to follow trends or culture, especially in the younger generation. This study aims to develop and test a detection and identification model using a combination of Bigram Vector and Minimum Edit Distance Based Probabilities. Correct words from error words are obtained using candidates search and probability calculations that adopt the concept of minimum edit distance. The detection results then identified the error type into three types of errors, namely vowels, consonants and diphthongs from the error side on the tendency of the characters used as a result of phonemic rendering at the time of writing. The results of error detection and identification of error types obtained are quite good where most of the error test data can be detected and identified according to the type of error, although there are several detection errors by obtaining more than one correct word as a result of the same probability value of these words.

Fast and Optimal Sequence-to-Graph Alignment Guided by Seeds

10.1101/2021.11.05.467453 ◽

2021 ◽

Author(s):

Pesho Ivanov ◽

Benjamin Bichsel ◽

Martin Vechev

Keyword(s):

Dynamic Programming ◽

Edit Distance ◽

Reference Genome ◽

State Of The Art ◽

Optimal Alignment ◽

Reference Mark ◽

A Algorithm ◽

Optimal Sequence ◽

E Coli ◽

Graph Alignment

We present a novel A* seed heuristic enabling fast and optimal sequence-to-graph alignment, guaranteed to minimize the edit distance of the alignment assuming non-negative edit costs. We phrase optimal alignment as a shortest path problem and solve it by instantiating the A* algorithm with our novel seed heuristic. The key idea of the seed heuristic is to extract seeds from the read, locate them in the reference, mark preceding reference positions by crumbs, and use the crumbs to direct the A* search. We prove admissibility of the seed heuristic, thus guaranteeing alignment optimality. Our implementation extends the free and open source AStarix aligner and demonstrates that the seed heuristic outperforms all state-of-the-art optimal aligners including GraphAligner, Vargas, PaSGAL, and the prefix heuristic previously employed by AStarix. Specifically, we achieve a consistent speedup of >60x on both short Illumina reads and long HiFi reads (up to 25kbp), on both the E. coli linear reference genome (1Mbp) and the MHC variant graph (5Mbp). Our speedup is enabled by the seed heuristic consistently skipping >99.99% of the table cells that optimal aligners based on dynamic programming compute.

Interest of the minimum edit distance to detect behaviour change of the elderly person

10.1109/embc46164.2021.9629665 ◽

2021 ◽

Author(s):

Soumaya Msaad ◽

Jean-Louis Dillenseger ◽

Guy Carrault

Keyword(s):

Behaviour Change ◽

Edit Distance ◽

Elderly Person ◽

The Elderly

PEMBOBOTAN DINAMIS BERBASIS POSISI PADA APPROXIMATE STRING MATCHING

Jurnal Komputer dan Informatika ◽

10.35508/jicon.v9i2.5149 ◽

2021 ◽

Vol 9 (2) ◽

pp. 168-175

Author(s):

Sebastianus A S Mola ◽

Meiton Boru ◽

Emerensye Sofia Yublina Pandie

Keyword(s):

Edit Distance ◽

String Matching ◽

Approximate String Matching ◽

Standard Word

Komunikasi tertulis dalam media sosial yang menekankan pada kecepatan penyebaran informasi sering kali terjadi fenomena penggunaan bahasa yang tidak baku baik pada level kalimat, klausa, frasa maupun kata. Sebagai sebuah sumber data, media sosial dengan fenomena ini memberikan tantangan dalam proses ekstraksi informasi. Normalisasi bahasa yang tidak baku menjadi bahasa baku dimulai pada proses normalisasi kata di mana kata yang tidak baku (non-standard word (NSW)) dinormalisasikan ke bentuk baku (standard word (SW)). Proses normalisasi dengan menggunakan edit distance memiliki keterbatasan dalam proses pembobotan nilai mismatch, match, dan gap yang bersifat statis. Dalam perhitungan nilai mismatch, pembobotan statida tidak dapat memberikan pembedaan bobot akibat kesalahan penekanan tombol pada keyboard terutama tombol yang berdekatan. Karena keterbatasan pembobotan edit distance ini maka dalam penelitian ini diusulkan sebuah metode pembobotan dinamis untuk bobot mismatch. Hasil dari penelitian ini adalah adanya metode baru dalam pembobotan dinamis berbasis posisi tombol keyboard yang dapat digunakan dalam melakukan normalisasi NSW menggunakan metode approximate string matching.

edit distance
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

GraphChainer: Co-linear Chaining for Accurate Alignment of Long Reads to Variation Graphs

Detection and Automatic Correction of Bengali Misspelled Words Using Minimum Edit Distance

An Improved Algorithm for The k-Dyck Edit Distance Problem

Bargaining power in informal trilogues: Intra-institutional preference cohesion and inter-institutional bargaining success

Ligand-Based Virtual Screening Based on the Graph Edit Distance

An Edge-Based Approach for Virtual Network Embedding Based on the Graph Edit Distance

Indonesian Spelling Error Detection and Type Identification Using Bigram Vector and Minimum Edit Distance Based Probabilities

Fast and Optimal Sequence-to-Graph Alignment Guided by Seeds

Interest of the minimum edit distance to detect behaviour change of the elderly person

PEMBOBOTAN DINAMIS BERBASIS POSISI PADA APPROXIMATE STRING MATCHING

Export Citation Format

edit distanceRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

GraphChainer: Co-linear Chaining for Accurate Alignment of Long Reads to Variation Graphs

Detection and Automatic Correction of Bengali Misspelled Words Using Minimum Edit Distance

An Improved Algorithm for The k-Dyck Edit Distance Problem

Bargaining power in informal trilogues: Intra-institutional preference cohesion and inter-institutional bargaining success

Ligand-Based Virtual Screening Based on the Graph Edit Distance

An Edge-Based Approach for Virtual Network Embedding Based on the Graph Edit Distance

Indonesian Spelling Error Detection and Type Identification Using Bigram Vector and Minimum Edit Distance Based Probabilities

Fast and Optimal Sequence-to-Graph Alignment Guided by Seeds

Interest of the minimum edit distance to detect behaviour change of the elderly person

PEMBOBOTAN DINAMIS BERBASIS POSISI PADA APPROXIMATE STRING MATCHING

edit distance
Recently Published Documents