Peer Review #2 of "Denoising the Denoisers: an independent evaluation of microbiome sequence error-correction approaches (v0.2)"

2015
Author(s): Soumitra Pal, Sanguthevar Rajasekaran

Motif search is an important step in extracting meaningful patterns from biological data. Since the general problem of motif search is intractable, there is a pressing need to develop efficient exact and approximation algorithms to solve it. We design novel algorithms for solving the Edit-distance-based Motif Search (EMS) problem: given two integers l and d and n biological strings, find all strings of length l that appear in each input string with at most d substitutions, insertions, and deletions. These algorithms have been evaluated on several challenging instances. Our algorithm solves a moderately hard instance (11,3) in a couple of minutes and the next difficult instance (14,3) in a couple of hours, whereas the best previously known algorithm, EMS1, solves (11,3) in a few hours and does not solve (13,4) even after 3 days. This significant improvement is due to a novel and provably efficient neighborhood generation technique introduced in this paper. This approach can be used in other edit-distance-based applications in bioinformatics, such as k-spectrum-based sequence error correction algorithms. We also use a trie-based data structure to efficiently store the candidate motifs in the neighborhood and to output the motifs in sorted order.
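To make the EMS problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm from the paper: it generates each edit-distance neighborhood naively by breadth-first expansion, uses a plain set plus `sorted()` in place of the trie, and assumes a DNA alphabet. The names `ALPHABET`, `edit_neighbors`, `neighborhood`, and `ems_naive` are hypothetical and chosen only for illustration. It is practical only for very small l and d.

```python
ALPHABET = "ACGT"  # assumption: DNA strings; the abstract does not fix an alphabet


def edit_neighbors(s):
    """Return all strings reachable from s by one substitution, insertion, or deletion.

    The result may contain s itself (substituting a character with itself).
    """
    out = set()
    for i in range(len(s)):
        out.add(s[:i] + s[i + 1:])              # deletion at position i
        for c in ALPHABET:
            out.add(s[:i] + c + s[i + 1:])      # substitution at position i
    for i in range(len(s) + 1):
        for c in ALPHABET:
            out.add(s[:i] + c + s[i:])          # insertion before position i
    return out


def neighborhood(s, d):
    """Return all strings within edit distance at most d of s (naive BFS)."""
    seen = {s}
    frontier = {s}
    for _ in range(d):
        frontier = {t for u in frontier for t in edit_neighbors(u)} - seen
        seen |= frontier
    return seen


def ems_naive(strings, l, d):
    """Return, in sorted order, all length-l strings that occur in every
    input string with at most d edits."""
    result = None
    for s in strings:
        candidates = set()
        # A length-l motif within d edits of a substring implies that the
        # substring has length between l - d and l + d.
        for length in range(max(1, l - d), l + d + 1):
            for i in range(len(s) - length + 1):
                sub = s[i:i + length]
                candidates |= {t for t in neighborhood(sub, d) if len(t) == l}
        # A motif must be a candidate for every input string.
        result = candidates if result is None else result & candidates
    return sorted(result or [])
```

For example, `ems_naive(["ACGT", "TACG", "ACGG"], 3, 0)` reduces to finding the 3-mers shared exactly by all three strings. The naive expansion revisits many strings several times, which is the kind of redundancy an efficient neighborhood generation technique aims to avoid.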

