string kernel
Recently Published Documents


TOTAL DOCUMENTS: 47 (FIVE YEARS 3)
H-INDEX: 9 (FIVE YEARS 1)

2019, Vol 16 (5), pp. 1524-1536
Author(s): Ritambhara Singh, Jack Lanchantin, Gabriel Robins, Yanjun Qi

Author(s): Lingfei Wu, Ian En-Hsu Yen, Siyu Huo, Liang Zhao, Kun Xu, ...

2018
Author(s): Ritambhara Singh, Arshdeep Sekhon, Jack Lanchantin, Kamran Kowsari, Beilun Wang, ...

Abstract: String Kernel (SK) techniques, especially those using gapped k-mers as features (gk), have been highly successful at classifying sequences such as DNA, protein, and text. However, the state-of-the-art gk-SK runs extremely slowly when we increase the dictionary size (Σ) or allow more mismatches (M). This is because current gk-SK uses a trie-based algorithm to calculate the co-occurrence of mismatched substrings, resulting in a time cost proportional to O(Σ^M). We propose a fast algorithm for calculating the Gapped k-mer Kernel using Counting (GaKCo). GaKCo uses associative arrays to calculate the co-occurrence of substrings using cumulative counting. This algorithm is fast, scalable to larger Σ and M, and naturally parallelizable. We provide a rigorous asymptotic analysis that compares GaKCo with the state-of-the-art gk-SK. Theoretically, the time cost of GaKCo is independent of the Σ^M term that slows down the trie-based approach. Experimentally, we observe that GaKCo achieves the same accuracy as the state of the art and is faster by factors of 2, 100, and 4, respectively, on classifying sequences of DNA (5 datasets), protein (12 datasets), and character-based English text (2 datasets).
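To make the counting idea concrete, here is a minimal Python sketch of a gapped k-mer kernel computed with associative arrays. It is not the authors' GaKCo implementation and omits their cumulative mismatch statistics and parallelization. The function names and the parameters g (window length) and k (number of informative positions, so that M = g - k) are assumptions introduced for illustration. For every choice of k positions inside a length-g window, it counts the projected substrings of each sequence in a dictionary and sums the products of matching counts, so the work depends on the number of windows and position subsets rather than on the alphabet size.

```python
from collections import Counter
from itertools import combinations
from math import comb


def gmers(seq, g):
    """Return every length-g substring of seq (empty list if seq is shorter than g)."""
    return [seq[i:i + g] for i in range(len(seq) - g + 1)]


def gapped_kmer_kernel(x, y, g, k):
    """Count shared gapped k-mers of x and y using dictionaries (hypothetical helper)."""
    gx, gy = gmers(x, g), gmers(y, g)
    total = 0
    # Each subset of k positions defines one family of gapped k-mer features.
    for positions in combinations(range(g), k):
        cx = Counter(tuple(s[p] for p in positions) for s in gx)
        cy = Counter(tuple(s[p] for p in positions) for s in gy)
        # Co-occurrence for this subset: product of counts for each shared key.
        total += sum(n * cy[key] for key, n in cx.items())
    return total


def brute_force_kernel(x, y, g, k):
    """Reference check: compare all g-mer pairs by Hamming distance (quadratic)."""
    total = 0
    for a in gmers(x, g):
        for b in gmers(y, g):
            m = sum(ca != cb for ca, cb in zip(a, b))
            if m <= g - k:
                # A pair with m mismatches shares comb(g - m, k) gapped k-mers.
                total += comb(g - m, k)
    return total
```

The brute-force function is included only as a cross-check: two length-g windows with m mismatches agree on g - m positions, so they share comb(g - m, k) gapped k-mers, and summing that over all window pairs should match the dictionary-based count.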


Author(s): Ritambhara Singh, Arshdeep Sekhon, Kamran Kowsari, Jack Lanchantin, Beilun Wang, ...

Author(s): Venkata Joopudi, Akansha Singh, Keerthana Kumar, Anirudh Murali, Priya Gandhi, ...
