KATK: fast genotyping of rare variants directly from unmapped sequencing reads
Abstract

Motivation
KATK is a fast and accurate software tool for calling variants directly from raw NGS reads. It uses predefined k-mers to retrieve only the reads of interest from the FASTQ file and calls genotypes by aligning the retrieved reads locally. KATK does not use data about known polymorphisms and uses NC (No Call) as the default genotype. The reference or variant allele is called only if there is sufficient evidence for its presence in the data. Thus, it is not biased against rare variants or de novo mutations.

Results
With simulated datasets, we achieved a false negative rate of 0.23% (sensitivity 99.77%) and a false discovery rate of 0.19%. Calling all human exonic regions with KATK requires 1-2 h, depending on sequencing coverage.

Availability
KATK is distributed under the terms of the GNU GPL v3. The k-mer databases are distributed under the Creative Commons CC BY-NC-SA license. The source code is available on GitHub as part of the GenomeTester4 package (https://github.com/bioinfo-ut/GenomeTester4/). The binaries of the KATK package and the k-mer databases described in the current paper are available at http://bioinfo.ut.ee/KATK/.
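
The approach described under Motivation (retrieving reads with predefined k-mers and calling NC unless an allele has sufficient support) can be illustrated with a minimal Python sketch. This is not KATK's implementation: the function names, the exact flank matching used in place of local alignment, and the thresholds `min_count` and `min_fraction` are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every k-mer (substring of length k) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def select_reads(reads, target_kmers, k):
    """Keep only reads that contain at least one predefined k-mer."""
    return [r for r in reads if any(km in target_kmers for km in kmers(r, k))]


def observed_allele(read, left_flank, right_flank):
    """Return the base between the two flanks, or None if they do not match.

    Simplification: exact flank matching stands in for local alignment.
    """
    pos = read.find(left_flank)
    if pos < 0:
        return None
    i = pos + len(left_flank)
    if i >= len(read):
        return None
    if read[i + 1:i + 1 + len(right_flank)] != right_flank:
        return None
    return read[i]


def call_genotype(allele_counts, min_count=3, min_fraction=0.2):
    """Call a diploid genotype; the default answer is 'NC' (No Call).

    min_count and min_fraction are hypothetical evidence thresholds.
    """
    total = sum(allele_counts.values())
    if total == 0:
        return "NC"
    supported = sorted(
        a for a, n in allele_counts.items()
        if n >= min_count and n / total >= min_fraction
    )
    if len(supported) == 1:
        return f"{supported[0]}/{supported[0]}"
    if len(supported) == 2:
        return f"{supported[0]}/{supported[1]}"
    return "NC"  # no allele, or too many alleles, has sufficient support


# Toy data: one site flanked by two 6-mers that also serve as target k-mers.
LEFT, RIGHT, K = "ACGTAC", "GGATCC", 6
reads = (
    ["TTACGTACAGGATCCAA"] * 3    # carry allele A
    + ["CACGTACGGGATCCT"] * 3    # carry allele G
    + ["TTTTTTTTTTTTTTT"]        # unrelated read, filtered out
)

selected = select_reads(reads, {LEFT, RIGHT}, K)
alleles = (observed_allele(r, LEFT, RIGHT) for r in selected)
counts = Counter(a for a in alleles if a is not None)
genotype = call_genotype(counts)
```

In this toy example the unrelated read is discarded before any matching is attempted, and a site covered by a single read would remain NC instead of defaulting to the reference allele, which is the property that avoids bias against rare variants.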