lossless and lossy compression
Recently Published Documents


TOTAL DOCUMENTS

21
(FIVE YEARS 3)

H-INDEX

5
(FIVE YEARS 0)

2022 ◽  
Author(s):  
Sagnik Banerjee ◽  
Carson Andorf

Advancement in technology has enabled sequencing machines to produce vast amounts of genetic data, causing an increase in storage demands. Most genomic software utilizes read alignments for several purposes including transcriptome assembly and gene count estimation. Herein we present, ABRIDGE, a state-of-the-art compressor for SAM alignment files offering users both lossless and lossy compression options. This reference-based file compressor achieves the best compression ratio among all compression software ensuring lower space demand and faster file transmission. Central to the software is a novel algorithm that retains non-redundant information. This new approach has allowed ABRIDGE to achieve a compression 16% higher than the second-best compressor for RNA-Seq reads and over 35% for DNA-Seq reads. ABRIDGE also offers users the option to randomly access location without having to decompress the entire file. ABRIDGE is distributed under MIT license and can be obtained from GitHub and docker hub. We anticipate that the user community will adopt ABRIDGE within their existing pipeline encouraging further research in this domain.


2021 ◽  
Vol 13 (18) ◽  
pp. 3727
Author(s):  
Benoit Vozel ◽  
Vladimir Lukin ◽  
Joan Serra-Sagristà

A huge amount of remote sensing data is acquired each day, which is transferred to image processing centers and/or to customers. Due to different limitations, compression has to be applied on-board and/or on-the-ground. This Special Issue collects 15 papers dealing with remote sensing data compression, introducing solutions for both lossless and lossy compression, analyzing the impact of compression on different processes, investigating the suitability of neural networks for compression, and researching on low complexity hardware and software approaches to deliver competitive coding performance.


2017 ◽  
Vol 34 (6) ◽  
pp. 911-919 ◽  
Author(s):  
Vida Ravanmehr ◽  
Minji Kim ◽  
Zhiying Wang ◽  
Olgica Milenković

2017 ◽  
Author(s):  
Vida Ravanmehr ◽  
Minji Kim ◽  
Zhiying Wang ◽  
Olgica Milenković

AbstractMotivationThe past decade has witnessed a rapid development of data acquisition technologies that enable integrative genomic and proteomic analysis. One such technology is chromatin immunoprecipitation sequencing (ChIP-seq), developed for analyzing interactions between proteins and DNA via next-generation sequencing technologies. As ChIP-seq experiments are inexpensive and time-efficient, massive datasets from this domain have been acquired, introducing significant storage and maintenance challenges. To address the resulting Big Data problems, we propose a state-of-the-art lossless and lossy compression framework specifically designed for ChIP-seq Wig data, termed ChIPWig. Wig is a standard file format, which in this setting contains relevant read density information crucial for visualization and downstream processing. ChIPWig may be executed in two different modes: lossless and lossy. Lossless ChIPWig compression allows for random access and fast queries in the file through careful variable-length block-wise encoding. ChIPWig also stores the summary statistics of each block needed for guided access. Lossy ChIPWig, in contrast, performs quantization of the read density values before feeding them into the lossless ChIPWig compressor. Nonuniform lossy quantization leads to further reductions in the file size, while maintaining the same accuracy of the ChIP-seq peak calling and motif discovery pipeline based on the NarrowPeaks method tailor-made for Wig files. The compressors are designed using new statistical modeling approaches coupled with delta and arithmetic encoding.ResultsWe tested the ChIPWig compressor on a number of ChIP-seq datasets generated by the ENCODE project. Lossless ChIPWig reduces the file sizes to merely 6% of the original, and offers an average 6-fold compression rate improvement compared to bigWig. The running times for compression and decompression are comparable to those of bigWig. The compression and decompression speed rates are of the order of 0.2 MB/sec using general purpose computers. ChIPWig with random access only slightly degrades the performance and running time when compared to the standard mode. In the lossy mode, the average file sizes reduce by 2-fold compared to the lossless mode. Most importantly, near-optimal nonuniform quantization with respect to mean-square distortion does not affect peak calling and motif discovery results on the data tested.Availability and ImplementationSource code and binaries freely available for download at https://github.com/vidarmehr/[email protected] informationIs available on bioRxiv.


Sign in / Sign up

Export Citation Format

Share Document