Error bound of hypothesis testing with data compression

Author(s):  
H. Shimokawa ◽  
Te Sun Han ◽  
S. Amari
2005 ◽  
Vol DMTCS Proceedings vol. AD,... (Proceedings) ◽  
Author(s):  
Boris Ryabko ◽  
Jaakko Astola

International audience We show that data compression methods (or universal codes) can be applied for hypotheses testing in a framework of classical mathematical statistics. Namely, we describe tests, which are based on data compression methods, for the three following problems: i) identity testing, ii) testing for independence and iii) testing of serial independence for time series. Applying our method of identity testing to pseudorandom number generators, we obtained experimental results which show that the suggested tests are quite efficient.


2021 ◽  
Author(s):  
Yao Hu ◽  
Michihiro Koibuchi

Data compression virtually increases the effective network bandwidth on an interconnection network of parallel computers. Although a floating-point dataset is frequently exchanged between compute nodes in parallel applications, its compression ratio often becomes low when using simple lossless compression algorithms. In this study, we aggressively introduce a lossy compression algorithm for floating-point values on interconnection networks. We take an application-level compression for providing high portability: a source process compresses communication datasets at an MPI parallel program, and a destination process decompresses them. Since recent interconnection networks are latency-sensitive, sophisticated lossy compression techniques that introduce large compression overhead are not suitable for compressing communication data. In this context, we apply a linear predictor with the userdefined error bound to the compression of communication datasets. We design, implement, and evaluate the compression technique for the floating-point communication datasets generated in MPI parallel programs, i.e., Ping Pong, Himeno, K-means Clustering, and Fast Fourier Transform (FFT). The proposed compression technique achieves 2.4x, 6.6x, 4.3x and 2.7x compression ratio for Ping Pong, Himeno, K-means and FFT at the cost of the moderate decrease of quality of results (error bound is 10-4 ), thus achieving 2.1x, 1.7x, 2.0x and 2.4x speedup of the execution time, respectively. More generally, our cycle-accurate network simulation shows that a high compression ratio provides comparably low communication latency, and significantly improves effective network throughput on typical synthetic traffic patterns when compared to no data compression on a conventional interconnection network.


2010 ◽  
Vol 21 (6) ◽  
pp. 1364-1377 ◽  
Author(s):  
Jian-Ming ZHANG ◽  
Ya-Ping LIN ◽  
Si-Wang ZHOU ◽  
Jing-Cheng OUYANG

Sign in / Sign up

Export Citation Format

Share Document