Error bound of hypothesis testing with data compression

We show that data compression methods (or universal codes) can be applied to hypothesis testing in the framework of classical mathematical statistics. Namely, we describe tests based on data compression methods for the following three problems: i) identity testing, ii) testing for independence, and iii) testing of serial independence for time series. Applying our method of identity testing to pseudorandom number generators, we obtained experimental results showing that the suggested tests are quite efficient.
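As a rough illustration of the identity-testing idea in the abstract above, the sketch below tests the null hypothesis that a byte string was produced by fair, independent coin flips. It rejects when a lossless compressor shortens the data by more than a threshold. This is not the authors' implementation: the function name `compression_identity_test` is hypothetical, `zlib` stands in for a universal code, and the extra bit in the threshold is a conservative assumption that follows from counting how many inputs a lossless code can map to short outputs.

```python
import math
import random
import zlib


def compression_identity_test(data: bytes, alpha: float = 0.01) -> bool:
    """Return True if H0 (uniform i.i.d. bits) is rejected at level alpha.

    Hypothetical helper for illustration; zlib is a stand-in compressor.
    """
    n_bits = 8 * len(data)
    code_bits = 8 * len(zlib.compress(data, 9))
    # A lossless code maps fewer than 2**(L + 1) inputs to outputs of at
    # most L bits, so under H0 the chance of saving t or more bits is below
    # 2**(1 - t). Choosing t = log2(1/alpha) + 1 keeps that below alpha.
    t = math.log2(1.0 / alpha) + 1.0
    return code_bits <= n_bits - t


# Usage: a pseudorandom stream should pass, a constant stream should fail.
rng = random.Random(0)
random_like = bytes(rng.getrandbits(8) for _ in range(4096))
constant = bytes(4096)  # 4096 zero bytes

reject_random = compression_identity_test(random_like)
reject_constant = compression_identity_test(constant)
```

A stronger compressor gives a more powerful test against the same alternatives, while the bound on the false-rejection rate does not depend on which lossless compressor is used.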


Interpretations of Directed Information in Portfolio Theory, Data Compression, and Hypothesis Testing

IEEE Transactions on Information Theory, Vol 57 (6), pp. 3248-3259, 2011
DOI: 10.1109/tit.2011.2136270
Cited By ~ 54

Author(s): Haim H. Permuter, Young-Han Kim, Tsachy Weissman

Keyword(s): Hypothesis Testing, Data Compression, Portfolio Theory, Directed Information


An Efficient Transformation Scheme for Lossy Data Compression with Point-Wise Relative Error Bound

2018 IEEE International Conference on Cluster Computing (CLUSTER), 2018
DOI: 10.1109/cluster.2018.00036
Cited By ~ 11

Author(s): Xin Liang, Sheng Di, Dingwen Tao, Zizhong Chen, Franck Cappello

Keyword(s): Data Compression, Error Bound, Relative Error, Efficient Transformation


The Case for Error-Bounded Lossy Floating-Point Data Compression on Interconnection Networks

2021
DOI: 10.5121/csit.2021.110706

Author(s): Yao Hu, Michihiro Koibuchi

Keyword(s): Data Compression, Error Bound, Compression Ratio, Interconnection Networks, Interconnection Network, Lossy Compression, Floating Point, Compression Technique, Ping Pong, Effective Network

Data compression virtually increases the effective network bandwidth of an interconnection network in parallel computers. Although floating-point datasets are frequently exchanged between compute nodes in parallel applications, their compression ratio is often low with simple lossless compression algorithms. In this study, we introduce a lossy compression algorithm for floating-point values on interconnection networks. We adopt application-level compression for high portability: a source process compresses communication datasets in an MPI parallel program, and a destination process decompresses them. Since recent interconnection networks are latency-sensitive, sophisticated lossy compression techniques that introduce large compression overhead are not suitable for compressing communication data. In this context, we apply a linear predictor with a user-defined error bound to the compression of communication datasets. We design, implement, and evaluate the compression technique for the floating-point communication datasets generated in MPI parallel programs, i.e., Ping Pong, Himeno, K-means Clustering, and Fast Fourier Transform (FFT). The proposed compression technique achieves compression ratios of 2.4x, 6.6x, 4.3x and 2.7x for Ping Pong, Himeno, K-means and FFT at the cost of a moderate decrease in the quality of results (the error bound is 10^-4), thus achieving execution-time speedups of 2.1x, 1.7x, 2.0x and 2.4x, respectively. More generally, our cycle-accurate network simulation shows that a high compression ratio provides comparably low communication latency and significantly improves effective network throughput on typical synthetic traffic patterns when compared to no data compression on a conventional interconnection network.
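The abstract above mentions a linear predictor combined with a user-defined error bound. The following is a minimal sketch of that general idea, not the authors' scheme: each value is predicted by linear extrapolation from the two previously reconstructed values, and the prediction residual is quantized into bins of width twice the error bound. The function names, the choice of predictor, and the integer-code output are all assumptions made for illustration; a real implementation would also entropy-code the integer codes before sending them.

```python
import math


def _predict(recon):
    """Linear extrapolation from already reconstructed values (assumed predictor)."""
    if len(recon) >= 2:
        return 2.0 * recon[-1] - recon[-2]
    if len(recon) == 1:
        return recon[-1]
    return 0.0


def compress(values, error_bound):
    """Quantize prediction residuals; returns one small integer code per value."""
    codes, recon = [], []
    for x in values:
        pred = _predict(recon)
        # Bins of width 2 * error_bound keep the rounding error within the bound.
        q = round((x - pred) / (2.0 * error_bound))
        codes.append(q)
        # Track what the receiver will reconstruct so both sides stay in sync.
        recon.append(pred + 2.0 * error_bound * q)
    return codes


def decompress(codes, error_bound):
    """Rebuild the values by repeating the sender's prediction steps."""
    recon = []
    for q in codes:
        recon.append(_predict(recon) + 2.0 * error_bound * q)
    return recon


# Usage: a smooth signal with the 1e-4 error bound quoted in the abstract.
data = [math.sin(0.01 * i) for i in range(1000)]
codes = compress(data, 1e-4)
restored = decompress(codes, 1e-4)
max_error = max(abs(a - b) for a, b in zip(data, restored))
```

Predicting from reconstructed values rather than from the original inputs is the design choice that prevents quantization errors from accumulating, because the sender and the receiver always compute the same prediction.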
