ScaleQC: A Scalable Lossy to Lossless Solution for NGS Sequencing Data Compression
Abstract

Motivation: Per-base quality values in NGS sequencing data take up a significant portion of storage even after compression. Lossy compression can further reduce the space used by quality values. However, lossless compression is still desired in many applications. Hence, sequencing data have to be prepared in multiple file formats for different applications.

Results: We developed a scalable lossy-to-lossless compression solution for quality values, named ScaleQC. ScaleQC provides bit-stream level scalability: the losslessly compressed bit-stream produced by ScaleQC can be further truncated to lower data rates without re-encoding. Despite its scalability, ScaleQC achieves the same or better compression performance at both lossless and lossy data rates compared to state-of-the-art lossless or lossy compressors.

Availability: ScaleQC has been integrated with SAMtools as a special quality value encoding mode for CRAM. Its source code can be obtained from our integrated SAMtools (https://github.com/xmuyulab/samtools), which depends on our integrated HTSlib (https://github.com/xmuyulab/htslib).
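
To illustrate what bit-stream level scalability means, the toy sketch below orders quality values by bit-plane, most significant first. It is not the ScaleQC codec: the 6-bit value range, the function names `encode_planes` and `decode_planes`, and the midpoint reconstruction rule are all assumptions made for illustration, and no entropy coding is applied. The point is only that one embedded stream can be decoded losslessly in full, or cut short to give a coarser approximation without re-encoding.

```python
from typing import List

BITS = 6  # assumption: every quality value fits in 6 bits (0..63)


def encode_planes(quals: List[int], bits: int = BITS) -> List[List[int]]:
    """Split values into bit-planes, most significant plane first."""
    return [[(q >> b) & 1 for q in quals] for b in range(bits - 1, -1, -1)]


def decode_planes(planes: List[List[int]], n: int, bits: int = BITS) -> List[int]:
    """Rebuild n values from however many leading planes are available."""
    values = [0] * n
    for plane in planes:
        values = [(v << 1) | bit for v, bit in zip(values, plane)]
    missing = bits - len(planes)
    if missing == 0:
        return values  # all planes present: exact reconstruction
    # Planes were truncated: place each value at the midpoint of the
    # interval that the dropped low-order bits could span.
    return [(v << missing) | (1 << (missing - 1)) for v in values]


quals = [40, 37, 12, 2, 33]
stream = encode_planes(quals)

lossless = decode_planes(stream, len(quals))   # full stream
lossy = decode_planes(stream[:3], len(quals))  # truncated to 3 planes
print(lossless)  # [40, 37, 12, 2, 33]
print(lossy)     # [44, 36, 12, 4, 36]
```

Keeping only the first three planes halves the raw payload in this toy setting, and each reconstructed value differs from the original by at most 4.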