RBCeq: A robust and scalable algorithm for accurate genetic blood typing

EBioMedicine ◽  
2022 ◽  
Vol 76 ◽  
pp. 103759
Author(s):  
Sudhir Jadhao ◽  
Candice L. Davison ◽  
Eileen V. Roulis ◽  
Elizna M. Schoeman ◽  
Mayur Divate ◽  
...  

Bioinformatics ◽
2020 ◽
Vol 36 (Supplement_2) ◽  
pp. i857-i865
Author(s):  
Derrick Blakely ◽  
Eamon Collins ◽  
Ritambhara Singh ◽  
Andrew Norton ◽  
Jack Lanchantin ◽  
...  

Abstract

Motivation: Gapped k-mer kernels with support vector machines (gkm-SVMs) have achieved strong predictive performance on regulatory DNA sequences with modestly sized training sets. However, existing gkm-SVM algorithms suffer from slow kernel computation, because their running time depends exponentially on the sub-sequence feature length, the number of mismatch positions, and the alphabet size of the task.

Results: In this work, we introduce a fast and scalable algorithm for calculating gapped k-mer string kernels. Our method, named FastSK, uses a simplified kernel formulation that decomposes the kernel calculation into a set of independent counting operations over the possible mismatch positions. This decomposition allows us to devise a fast Monte Carlo approximation that converges rapidly. FastSK scales to much greater feature lengths, accommodates more mismatches, and performs well on a variety of sequence analysis tasks. On multiple DNA transcription factor binding site prediction datasets, FastSK consistently matches or outperforms the state-of-the-art gkmSVM-2.0 algorithms in area under the ROC curve, while achieving average kernel-computation speedups of ∼100× and speedups of ∼800× for large feature lengths. We further show that FastSK outperforms character-level recurrent and convolutional neural networks while achieving low variance. We then extend FastSK to 7 English-language medical named entity recognition datasets and 10 protein remote homology detection datasets. FastSK consistently matches or outperforms these baselines.

Availability and implementation: Our algorithm is available as a Python package and as C++ source code at https://github.com/QData/FastSK

Supplementary information: Supplementary data are available at Bioinformatics online.
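The decomposition described in the abstract can be illustrated with a minimal Python sketch. It assumes a simplified kernel: for every choice of m ignored (mismatch) positions within a length-g window, count the sub-sequences formed by the remaining g − m positions in both sequences, take the dot product of those counts, and sum over all C(g, m) choices. The Monte Carlo variant samples position masks uniformly at random and rescales the average. The function names (`masked_counts`, `mask_kernel`, `gapped_kernel_exact`, `gapped_kernel_mc`) are hypothetical. This is not the FastSK package's API or implementation, and it omits kernel normalization and the SVM training step.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb


def masked_counts(seq, g, keep):
    """Count the sub-sequences formed by the kept positions of every length-g window."""
    return Counter(
        tuple(seq[i + p] for p in keep)
        for i in range(len(seq) - g + 1)
    )


def mask_kernel(x, y, g, keep):
    """Dot product of the masked feature counts of x and y for a single mask."""
    cx = masked_counts(x, g, keep)
    cy = masked_counts(y, g, keep)
    # Counter returns 0 for features absent from cy, so unmatched features add nothing.
    return sum(n * cy[f] for f, n in cx.items())


def gapped_kernel_exact(x, y, g, m):
    """Hypothetical simplified kernel: sum the independent per-mask counts
    over every choice of m ignored positions (C(g, m) masks in total)."""
    return sum(
        mask_kernel(x, y, g, keep)
        for keep in combinations(range(g), g - m)
    )


def gapped_kernel_mc(x, y, g, m, n_samples, seed=0):
    """Unbiased Monte Carlo estimate: average the per-mask counts over
    uniformly sampled masks, then scale by the total number of masks C(g, m)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        # A uniformly random (g - m)-subset of window positions to keep.
        keep = tuple(sorted(rng.sample(range(g), g - m)))
        total += mask_kernel(x, y, g, keep)
    return comb(g, m) * total / n_samples


if __name__ == "__main__":
    x, y = "ACGTACGTTA", "ACGTTCGATA"
    print(gapped_kernel_exact(x, y, g=4, m=2))
    print(gapped_kernel_mc(x, y, g=4, m=2, n_samples=20))
```

For example, `gapped_kernel_exact("ACGT", "ACGT", g=2, m=1)` sums two masks that each contribute 3 matching features, giving 6. Because each mask's count is independent of the others, the masks can be processed in parallel or subsampled, which is the property the Monte Carlo estimate relies on.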


PAMM ◽  
2007 ◽  
Vol 7 (1) ◽  
pp. 1025201-1025202
Author(s):  
Radek Kučera ◽
Jaroslav Haslinger ◽  
Zdeněk Dostál

2021 ◽  
pp. 102788
Author(s):  
Massimiliano Lupo Pasini ◽  
Junqi Yin ◽  
Ying Wai Li ◽  
Markus Eisenbach

2016 ◽  
Vol 7 ◽  
pp. 121-126 ◽  
Author(s):  
Hiroki Ashiba ◽  
Makoto Fujimaki ◽  
Koichi Awazu ◽  
Torahiko Tanaka ◽  
Makoto Makishima

Science ◽  
1975 ◽  
Vol 190 (4218) ◽  
pp. 938-938
Author(s):  
John P. Gusdon