An efficient and scalable algorithm for policy compatibility in service virtualization

Abstract

Motivation: Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences on modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation time, as they depend exponentially on the sub-sequence feature length, number of mismatch positions, and the task’s alphabet size.

Results: In this work, we introduce a fast and scalable algorithm for calculating gapped k-mer string kernels. Our method, named FastSK, uses a simplified kernel formulation that decomposes the kernel calculation into a set of independent counting operations over the possible mismatch positions. This simplified decomposition allows us to devise a fast Monte Carlo approximation that rapidly converges. FastSK can scale to much greater feature lengths, allows us to consider more mismatches, and is performant on a variety of sequence analysis tasks. On multiple DNA transcription factor binding site prediction datasets, FastSK consistently matches or outperforms the state-of-the-art gkmSVM-2.0 algorithms in area under the ROC curve, while achieving average speedups in kernel computation of ∼100× and speedups of ∼800× for large feature lengths. We further show that FastSK outperforms character-level recurrent and convolutional neural networks while achieving low variance. We then extend FastSK to 7 English-language medical named entity recognition datasets and 10 protein remote homology detection datasets. FastSK consistently matches or outperforms these baselines.

Availability and implementation: Our algorithm is available as a Python package and as C++ source code at https://github.com/QData/FastSK

Supplementary information: Supplementary data are available at Bioinformatics online.
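
The abstract above describes decomposing the kernel into independent counting operations over mismatch positions, followed by a Monte Carlo approximation. The following is a minimal sketch of that general idea, not the FastSK implementation. The function names, the parameters `g` (window length) and `k` (number of positions kept), and the uniform sampling scheme are all assumptions made for illustration.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb


def gapped_counts(seq, g, keep):
    """Count the length-g windows of seq, reduced to the positions in `keep`."""
    return Counter(
        "".join(seq[i + p] for p in keep) for i in range(len(seq) - g + 1)
    )


def matches_for(x, y, g, keep):
    """One independent counting step: window pairs that agree on `keep`."""
    cx, cy = gapped_counts(x, g, keep), gapped_counts(y, g, keep)
    return sum(n * cy[key] for key, n in cx.items())


def kernel_exact(x, y, g, k):
    """Sum the match counts over every way of keeping k of the g positions."""
    return sum(matches_for(x, y, g, keep) for keep in combinations(range(g), k))


def kernel_sampled(x, y, g, k, n_samples, seed=0):
    """Monte Carlo estimate: sample position sets uniformly, then rescale.

    Assumption: uniform sampling with replacement; the published method
    may sample and test convergence differently.
    """
    rng = random.Random(seed)
    sets = list(combinations(range(g), k))
    total = sum(matches_for(x, y, g, rng.choice(sets)) for _ in range(n_samples))
    return comb(g, k) * total / n_samples
```

Because each position set is handled independently, the exact sum can be parallelized or, as in `kernel_sampled`, replaced by an average over a random subset of position sets when the number of sets, C(g, k), is too large to enumerate.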

Scalable algorithm for solving 3D contact problems with friction

PAMM ◽

10.1002/pamm.200700664 ◽

2007 ◽

Vol 7 (1) ◽

pp. 1025201-1025202

Author(s):

Radek Kučera ◽

Jaroslav Haslinger ◽

Zdeněk Dostál

Keyword(s):

Contact Problems ◽

Scalable Algorithm

A fast scalable algorithm for discontinuous optical flow estimation

IEEE Transactions on Pattern Analysis and Machine Intelligence ◽

10.1109/34.481542 ◽

1996 ◽

Vol 18 (2) ◽

pp. 181-194 ◽

Cited By ~ 40

Author(s):

S. Ghosal ◽

P. Vanek

Keyword(s):

Optical Flow ◽

Scalable Algorithm ◽

Flow Estimation ◽

Optical Flow Estimation

A scalable algorithm for the optimization of neural network architectures

Parallel Computing ◽

10.1016/j.parco.2021.102788 ◽

2021 ◽

pp. 102788

Author(s):

Massimiliano Lupo Pasini ◽

Junqi Yin ◽

Ying Wai Li ◽

Markus Eisenbach

Keyword(s):

Neural Network ◽

Network Architectures ◽

Scalable Algorithm ◽

Neural Network Architectures

A Scalable Algorithm for Constructing Frequent Pattern Tree

International Journal of Intelligent Information Technologies ◽

10.4018/ijiit.2014010103 ◽

2014 ◽

Vol 10 (1) ◽

pp. 42-56 ◽

Cited By ~ 3

Author(s):

Zailani Abdullah ◽

Tutut Herawan ◽

A. Noraziah ◽

Mustafa Mat Deris

Keyword(s):

Data Structure ◽

Frequent Pattern ◽

Frequent Patterns ◽

Scalable Algorithm ◽

Tree Construction ◽

Frequent Pattern Tree ◽

Support Threshold ◽

Benchmark Datasets ◽

Tree Data ◽

Tree Data Structure

Frequent Pattern Tree (FP-Tree) is a compact data structure for representing frequent itemsets. Constructing the FP-Tree is an essential step before mining frequent patterns. However, few efforts have focused specifically on constructing the FP-Tree data structure from a source other than its original database. Typical FP-Tree construction requires prior knowledge of the support threshold and two database scans: the first to build and sort the frequent patterns, and the second to build the prefix paths. Scanning the database twice is therefore a major limitation in constructing the FP-Tree. This paper proposes a scalable Trie Transformation Technique Algorithm (T3A) that converts our predefined tree data structure, Disorder Support Trie Itemset (DOSTrieIT), into an FP-Tree. Experimental results on two UCI benchmark datasets show that the proposed T3A generates the FP-Tree up to three orders of magnitude faster than the benchmark FP-Growth.
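
The conventional two-scan construction that the abstract identifies as the bottleneck can be sketched as follows. This is an illustration of the standard approach only, not of T3A or DOSTrieIT, whose details are not given here. The `Node` class, the `build_fp_tree` name, and the alphabetical tie-breaking rule are assumptions.

```python
from collections import Counter


class Node:
    """One FP-Tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Scan 1: count the support of every item and keep the frequent ones.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    # Scan 2: insert each transaction's frequent items as a prefix path,
    # ordered by descending support (ties broken by item, an assumption).
    root = Node(None)
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent
```

For example, `build_fp_tree([["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]], 2)` yields a tree whose single top-level branch starts at `b`, since `b` has the highest support and is therefore the first item on every prefix path.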
