Normalized Compression Distance
Recently Published Documents


TOTAL DOCUMENTS: 40 (FIVE YEARS: 6)
H-INDEX: 9 (FIVE YEARS: 1)

2021 · Vol 18 (2) · pp. 1-20
Author(s): Arnab Kumar Biswas

Program obfuscation is a widely used cryptographic software intellectual property (IP) protection technique against reverse engineering attacks in embedded systems. However, very few works have studied the impact of combining various obfuscation techniques on the obscurity (difficulty of reverse engineering) and performance (execution time) of obfuscated programs. In this article, we propose a Genetic Algorithm (GA)-based framework that not only optimizes the obscurity and performance of obfuscated cryptographic programs but also ensures very low timing side-channel leakage. Our proposed Timing Side Channel Sensitive Program Obfuscation Optimization Framework (TSC-SPOOF) determines the combination of obfuscation transformation functions that produces optimized obfuscated programs with preferred optimization parameters. In particular, TSC-SPOOF employs normalized compression distance (NCD) and channel capacity to measure obscurity and timing side-channel leakage, respectively. We also use a RISC-V Rocket core running on a Xilinx Zynq FPGA device as part of our framework to obtain realistic results. The experimental results clearly show that our proposed solution leads to cryptographic programs with lower execution time, higher obscurity, and lower timing side-channel leakage than unguided obfuscation.
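As an illustration of the obscurity metric, the sketch below computes the NCD between an original and an obfuscated binary. It is a minimal reading of the abstract, not TSC-SPOOF itself: the file names are hypothetical and zlib stands in for whatever compressor the framework actually uses. An NCD near 1 means the obfuscated binary shares little compressible information with the original, i.e. higher obscurity.

```python
# Minimal NCD obscurity sketch; file names and the zlib compressor are
# illustrative assumptions, not TSC-SPOOF's actual configuration.
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

with open("program.bin", "rb") as f:       # hypothetical original binary
    original = f.read()
with open("program_obf.bin", "rb") as f:   # hypothetical obfuscated binary
    obfuscated = f.read()

# Values close to 1 indicate little shared information, i.e. high obscurity.
print(f"obscurity (NCD): {ncd(original, obfuscated):.3f}")
```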


Author(s): Liguo Yu

In C-like programs, the source code is separated into header files and source files. During the software evolution process, both kinds of files need to adapt to changing requirements and a changing environment. This paper studies the coevolution of header files and source files in C-like programs. Using the normalized compression distance, which is derived from Kolmogorov complexity, we measure the header-file difference and the source-file difference between versions of an evolving software product. Header-file distances and source-file distances are compared to understand how the two kinds of files differ in their pace of evolution. Mantel tests are performed to investigate the correlation between header file evolution and source file evolution. The study is performed on the source code of the Apache HTTP web server.
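The Mantel test mentioned above correlates two distance matrices while respecting their shared version structure. The sketch below is a generic permutation implementation, not the authors' code; it assumes header_dist and source_dist are hypothetical NCD matrices computed over the same n releases.

```python
# Generic Mantel test sketch: correlate header-file and source-file
# distance matrices over the same n software versions.
import numpy as np

def mantel(d1: np.ndarray, d2: np.ndarray, permutations: int = 9999):
    """Pearson correlation of the upper triangles of two symmetric distance
    matrices, with a one-sided permutation p-value."""
    iu = np.triu_indices_from(d1, k=1)
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    rng = np.random.default_rng(0)
    hits = 0
    n = d1.shape[0]
    for _ in range(permutations):
        p = rng.permutation(n)                          # relabel versions
        r = np.corrcoef(d1[p][:, p][iu], d2[iu])[0, 1]  # correlate again
        hits += r >= r_obs
    return r_obs, (hits + 1) / (permutations + 1)

# Usage with hypothetical NCD matrices over n versions:
# r, p = mantel(header_dist, source_dist)
```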


2020
Author(s): Rudi L. Cilibrasi · Paul M.B. Vitányi

We analyze the whole-genome phylogeny and taxonomy of the SARS-CoV-2 virus using compression, via a new fast alignment-free method called the normalized compression distance (NCD) method. It discovers all effective similarities based on Kolmogorov complexity; since the latter is incomputable, we approximate it by a good compressor such as the modern zpaq. The results show that the SARS-CoV-2 virus is closest to the RaTG13 virus and similar to two bat SARS-like coronaviruses, bat-SL-CoVZXC21 and bat-SL-CoVZC45. The similarity is quantified and compared with the same quantified similarities among the mtDNA of certain species. We also treat the question of whether pangolins are involved in the SARS-CoV-2 virus. The compression method is simpler, and possibly faster, than any other whole-genome method, which makes it an ideal tool to explore phylogeny.
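A compact rendering of the pipeline the abstract describes: compute pairwise NCDs over whole-genome strings, then cluster the distance matrix into a tree. This is a sketch under stated assumptions, not the authors' code: Python's lzma stands in for the zpaq compressor used in the paper, and the genome byte strings are placeholders.

```python
# Alignment-free phylogeny sketch: pairwise NCD + hierarchical clustering.
# lzma stands in for zpaq; the genome strings below are placeholders.
import lzma
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

def clen(x: bytes) -> int:
    return len(lzma.compress(x, preset=9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy = clen(x), clen(y)
    return (clen(x + y) - min(cx, cy)) / max(cx, cy)

genomes = {  # placeholder byte strings; load real FASTA data here
    "SARS-CoV-2": b"ATTAAAGGTTT...",
    "RaTG13": b"ATTAAAGGTTT...",
    "bat-SL-CoVZXC21": b"ATTAAAGGTG...",
}
names = list(genomes)
n = len(names)
d = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d[i, j] = d[j, i] = ncd(genomes[names[i]], genomes[names[j]])

# Average-linkage clustering of the NCD matrix approximates the phylogeny.
dendrogram(linkage(squareform(d), method="average"), labels=names)
```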


Entropy · 2020 · Vol 22 (5) · pp. 575
Author(s): Nadia Alshahwan · Earl T. Barr · David Clark · George Danezis · Héctor D. Menéndez

Malware concealment is the predominant strategy for malware propagation. Black hats create variants of malware based on polymorphism and metamorphism. Malware variants, by definition, share some information. Although the concealment strategy alters this information, there are still patterns in the software. Given a zoo of labelled malware and benign-ware, we ask whether a suspect program is more similar to our malware or to our benign-ware. Normalized Compression Distance (NCD) is a generic metric that measures the shared information content of two strings. This measure opens a new front in the malware arms race, one where the countermeasures promise to be more costly for malware writers, who must now obfuscate patterns as strings qua strings, without reference to execution, in their variants. Our approach classifies disk-resident malware with 97.4% accuracy and a false positive rate of 3%. We demonstrate that its accuracy can be improved by combining NCD with the compressibility rates of executables using decision forests, paving the way for future improvements. We demonstrate that malware reported within a narrow time frame of a few days is more homogeneous than malware reported over two years, but that our method still classifies the latter with 95.2% accuracy and a 5% false positive rate. Due to its use of compression, the time and computation cost of our method are nontrivial. We show that simple approximation techniques can improve its running time by up to 63%. We compare our results to those of the 59 anti-malware programs used on the VirusTotal website applied to our malware. Our approach outperforms each of them used alone and matches their collective performance.
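The core classification idea reduces to a nearest-neighbour query under NCD. The sketch below is an illustrative reconstruction, not the authors' pipeline: zlib stands in for their compressor, the zoo is a hypothetical labelled corpus, and the decision-forest and approximation refinements are omitted.

```python
# Nearest-neighbour malware triage under NCD (illustrative reconstruction).
import zlib

def ncd(x: bytes, y: bytes) -> float:
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    return (len(zlib.compress(x + y, 9)) - min(cx, cy)) / max(cx, cy)

def classify(suspect: bytes, zoo: dict) -> str:
    """Label a suspect binary with the label ('malware' / 'benign-ware')
    of its nearest zoo sample under NCD."""
    best_label, best_dist = "unknown", float("inf")
    for label, samples in zoo.items():
        for sample in samples:
            dist = ncd(suspect, sample)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label

# zoo = {"malware": [...], "benign-ware": [...]}  # hypothetical corpus
# print(classify(open("suspect.exe", "rb").read(), zoo))
```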


Leonardo · 2020 · Vol 53 (3) · pp. 274-280
Author(s): Alan Marsden

Information Theory provoked the interest of arts researchers from its inception in the mid-twentieth century but failed to produce the expected impact, partly because the data and computing systems required were not available. With the modern availability of data from public collections and sophisticated software, there is renewed interest in Information Theory. Successful application in the analysis of music implies potential success in other art forms also. The author gives an illustrative example, applying the Information-Theoretic similarity measure normalized compression distance with the aim of ranking paintings in a large collection by their conventionality.
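One plausible reading of the ranking, sketched below: score each painting by its mean NCD to the rest of the collection, so that low scores mark works sharing the most information with the others, i.e. the most conventional. Whether Marsden scores conventionality exactly this way is an assumption; zlib and the image byte strings are likewise illustrative.

```python
# Conventionality ranking sketch: mean NCD of each image to the rest of
# the collection (one plausible reading of the abstract, not the author's
# published method); zlib and the image bytes are illustrative assumptions.
import zlib

def ncd(x: bytes, y: bytes) -> float:
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    return (len(zlib.compress(x + y, 9)) - min(cx, cy)) / max(cx, cy)

def rank_by_conventionality(collection: dict) -> list:
    """Sort image names by mean NCD to all other images, lowest first
    (lowest mean distance = most conventional)."""
    scores = []
    for name, img in collection.items():
        dists = [ncd(img, other) for n2, other in collection.items() if n2 != name]
        scores.append((sum(dists) / len(dists), name))
    return [name for _, name in sorted(scores)]

# collection = {"painting_001": image_bytes, ...}  # hypothetical corpus
# print(rank_by_conventionality(collection))
```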


2018 · Vol 56 (1) · pp. 45-57
Author(s): Kamil Ząbkiewicz

Parkinson's disease can be treated with the use of microelectrode recording and stimulation. This paper presents a data-stream classifier that analyses raw data from microelectrodes and decides whether or not the measurements were taken from the subthalamic nucleus (STN). The novelty of the proposed approach is that the distances are computed on raw data; no new features need to be extracted, which matters because feature extraction from high-dimensional data is extremely time-consuming. Two distances are investigated: Normalized Compression Distance (NCD) and Lempel-Ziv Jaccard Distance (LZJD). The k-nearest-neighbour (k-NN) classifier was chosen for its simplicity, which is essential in data-stream classification. Four k-NN-based classifiers were evaluated with the proposed distances and compared against the same classifiers using the standard Euclidean distance: plain k-NN, k-NN with Probabilistic Approximate Window (PAW), k-NN with PAW and Adaptive Windowing (ADWIN), and Self Adjusting Memory k-NN (SAM k-NN). Prequential accuracy was chosen as the performance measure. In most cases the proposed approach performs better: the performance measures for k-NN classifiers that use the NCD and LZJD distances improve by up to 8.5 and 14 per cent, respectively. Moreover, the proposed approach outperforms other stream-classification algorithms, i.e. Hoeffding Tree, Naive Bayes, and Leveraging Bagging. In the discussed case, an improvement in classification rate of up to 17.9 per cent was noted when using the Lempel-Ziv Jaccard Distance instead of the Euclidean distance.
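A minimal sketch of the raw-data k-NN step, assuming recordings arrive as byte strings in a sliding window of labelled samples. zlib, k = 3, and the window handling are illustrative assumptions, and the PAW/ADWIN/SAM window-management variants are omitted.

```python
# Raw-data stream k-NN sketch: NCD computed directly on recording bytes,
# so no feature extraction is needed. zlib, k=3, and the window handling
# are illustrative assumptions, not the paper's exact configuration.
from collections import Counter
import zlib

def ncd(x: bytes, y: bytes) -> float:
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    return (len(zlib.compress(x + y, 9)) - min(cx, cy)) / max(cx, cy)

def knn_predict(query: bytes, window: list, k: int = 3) -> str:
    """Majority label ('STN' or 'non-STN') among the k recordings in the
    current window that are nearest to the query under NCD."""
    nearest = sorted(window, key=lambda item: ncd(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# window = [(recording_bytes, "STN"), (other_recording, "non-STN"), ...]
# print(knn_predict(new_recording, window))
```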

