edit distance
Recently Published Documents

TOTAL DOCUMENTS: 886 (five years: 207)
H-INDEX: 38 (five years: 6)

2022 ◽  
Author(s):  
Jun Ma ◽  
Manuel Cáceres ◽  
Leena Salmela ◽  
Veli Mäkinen ◽  
Alexandru I. Tomescu

Aligning reads to a variation graph is a standard task in pangenomics, with downstream applications such as improved variant calling. While the vg toolkit (Garrison et al., Nature Biotechnology, 2018) is a popular aligner of short reads, GraphAligner (Rautiainen and Marschall, Genome Biology, 2020) is the state-of-the-art aligner of long reads. GraphAligner finds candidate read occurrences by individually extending the best seeds of the read in the variation graph. However, a more principled approach recognized in the community is to co-linearly chain multiple seeds. We present a new algorithm to co-linearly chain a set of seeds in an acyclic variation graph, together with the first efficient implementation of such a co-linear chaining algorithm in a new long-read aligner for variation graphs, GraphChainer. Compared to GraphAligner at a normalized edit distance threshold of 40%, GraphChainer aligns 9% to 12% more reads, and 15% to 19% more total read length, on real PacBio reads from human chromosomes 1 and 22. On both simulated and real data, GraphChainer aligns between 97% and 99% of all reads and of total read length. At the more stringent normalized edit distance threshold of 30%, GraphChainer aligns up to 29% more total real read length than GraphAligner. GraphChainer is freely available at https://github.com/algbio/GraphChainer.
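The alignment-rate figures above are stated in terms of normalized edit distance: the edit distance between the read and the reference sequence it is aligned to, divided by the read length. A minimal sketch of that criterion (function names are illustrative, not GraphChainer's API):

```python
def edit_distance(a: str, b: str) -> int:
    # Classic Wagner-Fischer dynamic programming, one row at a time.
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # deletion
                         cur[j - 1] + 1,     # insertion
                         prev[j - 1] + cost) # match / substitution
        prev = cur
    return prev[n]

def aligned_at(read: str, ref_seq: str, threshold: float) -> bool:
    # A read counts as aligned if edit distance / read length <= threshold.
    return edit_distance(read, ref_seq) / len(read) <= threshold
```

With a 40% threshold, a 1000 bp read may accumulate up to 400 edits against the path it is aligned to and still count as aligned; the 30% threshold is correspondingly stricter.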


2022 ◽  
pp. 3650-3669
Author(s):  
Dvir Fried ◽  
Shay Golan ◽  
Tomasz Kociumaka ◽  
Tsvi Kopelowitz ◽  
Ely Porat ◽  
...  

2021 ◽  
pp. 146511652110644
Author(s):  
Maximilian Haag

Informal trilogue meetings are the main legislative bargaining forum in the European Union, yet their dynamics remain largely understudied in a quantitative context. This article builds on the assumption that the negotiating delegations of the European Parliament and the Council play a two-level game whereby these actors can use their intra-institutional constraint to extract inter-institutional bargaining success. Negotiators can credibly claim that their hands are tied if the members of their parent institutions hold similar preferences and do not accept alternative proposals or if their institution is divided and negotiators need to defend a fragile compromise. Employing a measure of document similarity (minimum edit distance) between an institution's negotiation mandate and the trilogue outcome to measure bargaining success, the analysis supports the hypothesis for the European Parliament, but not for the Council.
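The bargaining-success measure can be sketched as a word-level minimum edit distance between the mandate text and the trilogue outcome, normalized into a 0-to-1 similarity. This is a simplification of the article's actual operationalization; the function names are illustrative:

```python
def token_edit_distance(a: list, b: list) -> int:
    # Word-level Levenshtein distance between two token sequences.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # delete token
                           cur[j - 1] + 1,         # insert token
                           prev[j - 1] + (x != y)))  # keep / substitute
        prev = cur
    return prev[-1]

def bargaining_success(mandate: str, outcome: str) -> float:
    # Similarity in [0, 1]; 1.0 means the outcome equals the mandate.
    ta, tb = mandate.split(), outcome.split()
    d = token_edit_distance(ta, tb)
    return 1.0 - d / max(len(ta), len(tb), 1)
```

The fewer edits needed to turn an institution's mandate into the adopted text, the closer the outcome is to that institution's position.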


2021 ◽  
Vol 22 (23) ◽  
pp. 12751
Author(s):  
Elena Rica ◽  
Susana Álvarez ◽  
Francesc Serratosa

Chemical compounds can be represented as attributed graphs. An attributed graph is a mathematical model of an object composed of two types of elements: nodes and edges. Nodes are individual components, and edges are relations between these components. In this case, nodes carry pharmacophore-type descriptions and edges represent chemical bonds. To obtain the bioactivity dissimilarity between two chemical compounds, a distance between attributed graphs can be used. The Graph Edit Distance computes this distance, defined as the cost of transforming one graph into the other. To define this dissimilarity properly, however, the transformation costs must be tuned. The aim of this paper is to analyse structure-based screening methods to verify the quality of the Harper transformation cost proposal, and to present an algorithm that learns these transformation costs so that the bioactivity dissimilarity is properly defined in a ligand-based virtual screening application. The goodness of the dissimilarity is measured by classification accuracy. Six publicly available datasets (CAPST, DUD-E, GLL&GDD, NRLiSt-BDB, MUV and ULS-UDS) have been used to validate our methodology and show that, with our learned costs, we obtain the highest ratios in identifying bioactivity similarity in a structurally diverse group of molecules.
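The "cost of transforming one graph into another" can be made concrete with a brute-force Graph Edit Distance over tiny attributed graphs. This is an illustrative sketch only (exact search over node mappings, feasible for a handful of nodes); the `node_cost` parameter plays the role of the tunable transformation costs the paper learns:

```python
from itertools import permutations

def graph_edit_distance(nodes1, edges1, nodes2, edges2,
                        node_cost=lambda a, b: 0.0 if a == b else 1.0,
                        indel_cost=1.0):
    """Brute-force GED for tiny attributed graphs.

    nodes*: dict node_id -> attribute (e.g. a pharmacophore-type label)
    edges*: set of frozenset({u, v}) undirected bonds
    """
    ids1, ids2 = list(nodes1), list(nodes2)
    k = max(len(ids1), len(ids2))
    p1 = ids1 + [None] * (k - len(ids1))  # None = deleted/inserted node
    p2 = ids2 + [None] * (k - len(ids2))
    best = float("inf")
    for perm in permutations(p2):
        mapping = dict(zip(p1, perm))
        # Node costs: substitution for mapped pairs, indel for unmatched.
        cost = sum(indel_cost if u is None or v is None
                   else node_cost(nodes1[u], nodes2[v])
                   for u, v in zip(p1, perm))
        # Edge costs: edges unmatched under the mapping are insert/delete.
        mapped = {frozenset(mapping[x] for x in e) for e in edges1}
        cost += indel_cost * len(mapped ^ edges2)
        best = min(best, cost)
    return best
```

Lowering the substitution cost between chemically similar node labels (as the learned costs effectively do) makes bioactive analogues come out closer than structurally unrelated molecules.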


2021 ◽  
Author(s):  
Ze Xi Xu ◽  
Lei Zhuang ◽  
Meng Yang He ◽  
Si Jin Yang ◽  
Yu Song ◽  
...  

Virtualization and resource isolation techniques have enabled the efficient sharing of networked resources. With growing user demands, how to control network resource allocation accurately and flexibly has become a research hotspot. This paper therefore presents a new edge-based virtual network embedding approach that employs a graph edit distance method to control resource usage accurately. In particular, to manage network resources efficiently, we restrict the usage conditions of network resources and constrain the structure based on common substructure isomorphism, and an improved spider monkey optimization algorithm prunes redundant information from the substrate network. Experimental results show that the proposed method outperforms existing algorithms in resource management capacity, including energy savings and the revenue-to-cost ratio.


SinkrOn ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 183-190
Author(s):  
Emmy Erwina ◽  
Tommy Tommy ◽  
Mayasari Mayasari

Spelling errors are common in this era, reflecting word usage that tends to follow trends or popular culture, especially among the younger generation. This study develops and tests a detection and identification model that combines bigram vectors with minimum-edit-distance-based probabilities. Correct words are recovered from erroneous words through candidate search and probability calculations that adopt the minimum edit distance concept. Detected errors are then classified into three types, namely vowel, consonant, and diphthong errors, based on the characters likely substituted during phonemic rendering at the time of writing. The results of error detection and identification are quite good: most erroneous test data are detected and assigned the correct error type, although some detections return more than one correct word when candidate words share the same probability value.
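The candidate-search-plus-probability step can be sketched in the style of a classic noisy-channel spelling corrector: generate all strings within edit distance 1 of the error word, keep those in the dictionary, and pick the most probable. This is a generic illustration, not the authors' exact model:

```python
def edits1(word: str, alphabet: str = "abcdefghijklmnopqrstuvwxyz") -> set:
    # All strings at edit distance 1: deletes, transposes, replaces, inserts.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in alphabet]
    inserts = [L + c + R for L, R in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

def correct(word: str, freq: dict) -> str:
    # Choose the most frequent dictionary word among distance-1 candidates;
    # fall back to the input word if no candidate is in the dictionary.
    candidates = [w for w in edits1(word) if w in freq] or [word]
    return max(candidates, key=lambda w: freq.get(w, 0))
```

A frequency dictionary built from a standard-language corpus supplies the probabilities; ties between equally frequent candidates are exactly the ambiguous case the abstract mentions.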


2021 ◽  
Author(s):  
Pesho Ivanov ◽  
Benjamin Bichsel ◽  
Martin Vechev

We present a novel A* seed heuristic enabling fast and optimal sequence-to-graph alignment, guaranteed to minimize the edit distance of the alignment assuming non-negative edit costs. We phrase optimal alignment as a shortest path problem and solve it by instantiating the A* algorithm with our novel seed heuristic. The key idea of the seed heuristic is to extract seeds from the read, locate them in the reference, mark preceding reference positions by crumbs, and use the crumbs to direct the A* search. We prove admissibility of the seed heuristic, thus guaranteeing alignment optimality. Our implementation extends the free and open source AStarix aligner and demonstrates that the seed heuristic outperforms all state-of-the-art optimal aligners including GraphAligner, Vargas, PaSGAL, and the prefix heuristic previously employed by AStarix. Specifically, we achieve a consistent speedup of >60x on both short Illumina reads and long HiFi reads (up to 25kbp), on both the E. coli linear reference genome (1Mbp) and the MHC variant graph (5Mbp). Our speedup is enabled by the seed heuristic consistently skipping >99.99% of the table cells that optimal aligners based on dynamic programming compute.
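The shortest-path phrasing can be illustrated on a plain linear reference with a trivially admissible heuristic, the remaining-length difference, which never overestimates the edits still needed. The paper's seed heuristic is far more informative, but the A* skeleton is the same:

```python
import heapq

def astar_edit_distance(read: str, ref: str) -> int:
    # State (i, j): read[:i] has been aligned against ref[:j].
    def h(i: int, j: int) -> int:
        # Admissible: at least this many insertions/deletions remain.
        return abs((len(read) - i) - (len(ref) - j))

    start, goal = (0, 0), (len(read), len(ref))
    dist = {start: 0}
    pq = [(h(0, 0), start)]
    while pq:
        f, (i, j) = heapq.heappop(pq)
        if (i, j) == goal:
            return dist[(i, j)]
        if f > dist[(i, j)] + h(i, j):
            continue  # stale queue entry
        steps = []
        if i < len(read) and j < len(ref):  # match or substitution
            steps.append(((i + 1, j + 1), 0 if read[i] == ref[j] else 1))
        if i < len(read):                   # deletion from the read
            steps.append(((i + 1, j), 1))
        if j < len(ref):                    # insertion into the read
            steps.append(((i, j + 1), 1))
        for s, c in steps:
            nd = dist[(i, j)] + c
            if nd < dist.get(s, float("inf")):
                dist[s] = nd
                heapq.heappush(pq, (nd + h(*s), s))
    return dist[goal]
```

A stronger heuristic prunes more of the grid while preserving optimality, which is exactly how the seed heuristic skips >99.99% of the dynamic-programming cells.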


2021 ◽  
Vol 9 (2) ◽  
pp. 168-175
Author(s):  
Sebastianus A S Mola ◽  
Meiton Boru ◽  
Emerensye Sofia Yublina Pandie

Written communication on social media, which emphasizes the speed of information dissemination, frequently exhibits non-standard language at the sentence, clause, phrase, and word levels. As a data source, social media with this phenomenon poses a challenge for information extraction. Normalizing non-standard language to standard language begins with word normalization, in which a non-standard word (NSW) is normalized to its standard form (standard word, SW). Normalization using edit distance is limited by its static weighting of mismatch, match, and gap values. When computing mismatch values, static weighting cannot assign different weights to errors caused by pressing the wrong keyboard key, particularly adjacent keys. Because of this limitation of edit distance weighting, this study proposes a dynamic weighting method for the mismatch weight. The result is a new dynamic weighting method based on keyboard key positions that can be used to normalize NSWs with approximate string matching.
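The idea of key-position-based mismatch weights can be sketched as a weighted edit distance in which substituting adjacent keys is cheaper than substituting distant ones. The layout coordinates and scaling below are illustrative assumptions, not the study's exact weights:

```python
# Approximate physical key coordinates on a QWERTY layout (row, column),
# with each row offset slightly as on a real keyboard.
KEY_POS = {}
for row, keys in enumerate(["qwertyuiop", "asdfghjkl", "zxcvbnm"]):
    for col, ch in enumerate(keys):
        KEY_POS[ch] = (row, col + 0.5 * row)

def mismatch_cost(a: str, b: str, max_cost: float = 1.0) -> float:
    # Dynamic mismatch weight: adjacent keys cost less than distant keys.
    if a == b:
        return 0.0
    (r1, c1), (r2, c2) = KEY_POS[a], KEY_POS[b]
    d = ((r1 - r2) ** 2 + (c1 - c2) ** 2) ** 0.5
    return min(max_cost, d / 3)  # neighbouring keys cost roughly 1/3

def weighted_edit_distance(s: str, t: str, gap: float = 1.0) -> float:
    # Wagner-Fischer with the dynamic mismatch weights above.
    m, n = len(s), len(t)
    prev = [j * gap for j in range(n + 1)]
    for i in range(1, m + 1):
        cur = [i * gap] + [0.0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + gap, cur[j - 1] + gap,
                         prev[j - 1] + mismatch_cost(s[i - 1], t[j - 1]))
        prev = cur
    return prev[n]
```

Under this weighting, a typo produced by hitting a neighbouring key (e.g. `s` for `a`) is ranked closer to the intended standard word than an unrelated substitution, which is the effect the proposed dynamic weighting aims for.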

