Normalized Compression Distance
Recently Published Documents


TOTAL DOCUMENTS: 40 (FIVE YEARS: 6)
H-INDEX: 9 (FIVE YEARS: 1)

2021 · Vol 18 (2) · pp. 1-20
Author(s): Arnab Kumar Biswas

Program obfuscation is a widely used cryptographic software intellectual property (IP) protection technique against reverse engineering attacks in embedded systems. However, very few works have studied the impact of combining various obfuscation techniques on the obscurity (difficulty of reverse engineering) and performance (execution time) of obfuscated programs. In this article, we propose a Genetic Algorithm (GA)-based framework that not only optimizes the obscurity and performance of obfuscated cryptographic programs but also ensures very low timing side-channel leakage. Our proposed Timing Side Channel Sensitive Program Obfuscation Optimization Framework (TSC-SPOOF) determines the combination of obfuscation transformation functions that produces optimized obfuscated programs with preferred optimization parameters. In particular, TSC-SPOOF employs normalized compression distance (NCD) and channel capacity to measure obscurity and timing side-channel leakage, respectively. We also use a RISC-V Rocket core running on a Xilinx Zynq FPGA device as part of our framework to obtain realistic results. The experimental results clearly show that our proposed solution leads to cryptographic programs with lower execution time, higher obscurity, and lower timing side-channel leakage than unguided obfuscation.
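As an illustration of the obscurity metric, the sketch below computes the NCD between an original and an obfuscated binary. It is a minimal reading of the abstract, not TSC-SPOOF itself: the file names are hypothetical and zlib stands in for whatever compressor the framework actually uses. An NCD near 1 means the obfuscated binary shares little compressible information with the original, i.e. higher obscurity.

```python
# Minimal NCD obscurity sketch; file names and the zlib compressor are
# illustrative assumptions, not TSC-SPOOF's actual configuration.
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

with open("program.bin", "rb") as f:       # hypothetical original binary
    original = f.read()
with open("program_obf.bin", "rb") as f:   # hypothetical obfuscated binary
    obfuscated = f.read()

# Values close to 1 indicate little shared information, i.e. high obscurity.
print(f"obscurity (NCD): {ncd(original, obfuscated):.3f}")
```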


Author(s): Liguo Yu

In C-like programs, the source code is separated into header files and source files. During the software evolution process, both kinds of files need to adapt to changing requirements and a changing environment. This paper studies the coevolution of header files and source files in C-like programs. Using the normalized compression distance, which is derived from Kolmogorov complexity, we measure the header-file difference and the source-file difference between versions of an evolving software product. Header-file distances and source-file distances are compared to understand how the two kinds of files differ in their pace of evolution. Mantel tests are performed to investigate the correlation between header file evolution and source file evolution. The study is performed on the source code of the Apache HTTP web server.
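The Mantel test mentioned above correlates two distance matrices while respecting their shared version structure. The sketch below is a generic permutation implementation, not the authors' code; it assumes header_dist and source_dist are hypothetical NCD matrices computed over the same n releases.

```python
# Generic Mantel test sketch: correlate header-file and source-file
# distance matrices over the same n software versions.
import numpy as np

def mantel(d1: np.ndarray, d2: np.ndarray, permutations: int = 9999):
    """Pearson correlation of the upper triangles of two symmetric distance
    matrices, with a one-sided permutation p-value."""
    iu = np.triu_indices_from(d1, k=1)
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    rng = np.random.default_rng(0)
    hits = 0
    n = d1.shape[0]
    for _ in range(permutations):
        p = rng.permutation(n)                          # relabel versions
        r = np.corrcoef(d1[p][:, p][iu], d2[iu])[0, 1]  # correlate again
        hits += r >= r_obs
    return r_obs, (hits + 1) / (permutations + 1)

# Usage with hypothetical NCD matrices over n versions:
# r, p = mantel(header_dist, source_dist)
```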


2020
Author(s): Rudi L. Cilibrasi · Paul M.B. Vitányi

We analyze the whole-genome phylogeny and taxonomy of the SARS-CoV-2 virus using compression, via a new fast alignment-free method called the normalized compression distance (NCD) method. It discovers all effective similarities based on Kolmogorov complexity; since the latter is incomputable, we approximate it by a good compressor such as the modern zpaq. The results show that the SARS-CoV-2 virus is closest to the RaTG13 virus and similar to two bat SARS-like coronaviruses, bat-SL-CoVZXC21 and bat-SL-CoVZC45. The similarity is quantified and compared with the same quantified similarities among the mtDNA of certain species. We also treat the question of whether pangolins are involved in the SARS-CoV-2 virus. The compression method is simpler, and possibly faster, than any other whole-genome method, which makes it an ideal tool to explore phylogeny.
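A compact rendering of the pipeline the abstract describes: compute pairwise NCDs over whole-genome strings, then cluster the distance matrix into a tree. This is a sketch under stated assumptions, not the authors' code: Python's lzma stands in for the zpaq compressor used in the paper, and the genome byte strings are placeholders.

```python
# Alignment-free phylogeny sketch: pairwise NCD + hierarchical clustering.
# lzma stands in for zpaq; the genome strings below are placeholders.
import lzma
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

def clen(x: bytes) -> int:
    return len(lzma.compress(x, preset=9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy = clen(x), clen(y)
    return (clen(x + y) - min(cx, cy)) / max(cx, cy)

genomes = {  # placeholder byte strings; load real FASTA data here
    "SARS-CoV-2": b"ATTAAAGGTTT...",
    "RaTG13": b"ATTAAAGGTTT...",
    "bat-SL-CoVZXC21": b"ATTAAAGGTG...",
}
names = list(genomes)
n = len(names)
d = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d[i, j] = d[j, i] = ncd(genomes[names[i]], genomes[names[j]])

# Average-linkage clustering of the NCD matrix approximates the phylogeny.
dendrogram(linkage(squareform(d), method="average"), labels=names)
```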


Entropy · 2020 · Vol 22 (5) · pp. 575
Author(s): Nadia Alshahwan · Earl T. Barr · David Clark · George Danezis · Héctor D. Menéndez

Malware concealment is the predominant strategy for malware propagation. Black hats create variants of malware based on polymorphism and metamorphism. Malware variants, by definition, share some information. Although the concealment strategy alters this information, there are still patterns in the software. Given a zoo of labelled malware and benign-ware, we ask whether a suspect program is more similar to our malware or to our benign-ware. Normalized Compression Distance (NCD) is a generic metric that measures the shared information content of two strings. This measure opens a new front in the malware arms race, one where the countermeasures promise to be more costly for malware writers, who must now obfuscate patterns as strings qua strings, without reference to execution, in their variants. Our approach classifies disk-resident malware with 97.4% accuracy and a false positive rate of 3%. We demonstrate that its accuracy can be improved by combining NCD with the compressibility rates of executables using decision forests, paving the way for future improvements. We demonstrate that malware reported within a narrow time frame of a few days is more homogeneous than malware reported over two years, but that our method still classifies the latter with 95.2% accuracy and a 5% false positive rate. Due to its use of compression, the time and computation cost of our method are nontrivial. We show that simple approximation techniques can improve its running time by up to 63%. We compare our results to those of the 59 anti-malware programs used on the VirusTotal website applied to our malware. Our approach outperforms each of them used alone and matches their collective performance.
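The core classification idea reduces to a nearest-neighbour query under NCD. The sketch below is an illustrative reconstruction, not the authors' pipeline: zlib stands in for their compressor, the zoo is a hypothetical labelled corpus, and the decision-forest and approximation refinements are omitted.

```python
# Nearest-neighbour malware triage under NCD (illustrative reconstruction).
import zlib

def ncd(x: bytes, y: bytes) -> float:
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    return (len(zlib.compress(x + y, 9)) - min(cx, cy)) / max(cx, cy)

def classify(suspect: bytes, zoo: dict) -> str:
    """Label a suspect binary with the label ('malware' / 'benign-ware')
    of its nearest zoo sample under NCD."""
    best_label, best_dist = "unknown", float("inf")
    for label, samples in zoo.items():
        for sample in samples:
            dist = ncd(suspect, sample)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label

# zoo = {"malware": [...], "benign-ware": [...]}  # hypothetical corpus
# print(classify(open("suspect.exe", "rb").read(), zoo))
```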


Leonardo · 2020 · Vol 53 (3) · pp. 274-280
Author(s): Alan Marsden

Information Theory provoked the interest of arts researchers from its inception in the mid-twentieth century but failed to produce the expected impact, partly because the data and computing systems required were not available. With the modern availability of data from public collections and sophisticated software, there is renewed interest in Information Theory. Successful application in the analysis of music implies potential success in other art forms also. The author gives an illustrative example, applying the Information-Theoretic similarity measure normalized compression distance with the aim of ranking paintings in a large collection by their conventionality.
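One plausible reading of the ranking, sketched below: score each painting by its mean NCD to the rest of the collection, so that low scores mark works sharing the most information with the others, i.e. the most conventional. Whether Marsden scores conventionality exactly this way is an assumption; zlib and the image byte strings are likewise illustrative.

```python
# Conventionality ranking sketch: mean NCD of each image to the rest of
# the collection (one plausible reading of the abstract, not the author's
# published method); zlib and the image bytes are illustrative assumptions.
import zlib

def ncd(x: bytes, y: bytes) -> float:
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    return (len(zlib.compress(x + y, 9)) - min(cx, cy)) / max(cx, cy)

def rank_by_conventionality(collection: dict) -> list:
    """Sort image names by mean NCD to all other images, lowest first
    (lowest mean distance = most conventional)."""
    scores = []
    for name, img in collection.items():
        dists = [ncd(img, other) for n2, other in collection.items() if n2 != name]
        scores.append((sum(dists) / len(dists), name))
    return [name for _, name in sorted(scores)]

# collection = {"painting_001": image_bytes, ...}  # hypothetical corpus
# print(rank_by_conventionality(collection))
```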


2018 · Vol 56 (1) · pp. 45-57
Author(s): Kamil Ząbkiewicz

Parkinson's disease can be treated with the use of microelectrode recording and stimulation. This paper presents a data-stream classifier that analyses raw data from microelectrodes and decides whether or not the measurements were taken from the subthalamic nucleus (STN). The novelty of the proposed approach is that the distances are computed on raw data; no new features need to be extracted, which matters because feature extraction from high-dimensional data is extremely time-consuming. Two distances are investigated: Normalized Compression Distance (NCD) and Lempel-Ziv Jaccard Distance (LZJD). The k-nearest-neighbour (k-NN) classifier was chosen for its simplicity, which is essential in data-stream classification. Four k-NN-based classifiers were evaluated with the proposed distances and compared against the same classifiers using the standard Euclidean distance: plain k-NN, k-NN with Probabilistic Approximate Window (PAW), k-NN with PAW and Adaptive Windowing (ADWIN), and Self Adjusting Memory k-NN (SAM k-NN). Prequential accuracy was chosen as the performance measure. In most cases the proposed approach performs better: the performance measures for k-NN classifiers that use the NCD and LZJD distances improve by up to 8.5 and 14 per cent, respectively. Moreover, the proposed approach outperforms other stream-classification algorithms, i.e. Hoeffding Tree, Naive Bayes, and Leveraging Bagging. In the discussed case, an improvement in classification rate of up to 17.9 per cent was noted when using the Lempel-Ziv Jaccard Distance instead of the Euclidean distance.
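A minimal sketch of the raw-data k-NN step, assuming recordings arrive as byte strings in a sliding window of labelled samples. zlib, k = 3, and the window handling are illustrative assumptions, and the PAW/ADWIN/SAM window-management variants are omitted.

```python
# Raw-data stream k-NN sketch: NCD computed directly on recording bytes,
# so no feature extraction is needed. zlib, k=3, and the window handling
# are illustrative assumptions, not the paper's exact configuration.
from collections import Counter
import zlib

def ncd(x: bytes, y: bytes) -> float:
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    return (len(zlib.compress(x + y, 9)) - min(cx, cy)) / max(cx, cy)

def knn_predict(query: bytes, window: list, k: int = 3) -> str:
    """Majority label ('STN' or 'non-STN') among the k recordings in the
    current window that are nearest to the query under NCD."""
    nearest = sorted(window, key=lambda item: ncd(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# window = [(recording_bytes, "STN"), (other_recording, "non-STN"), ...]
# print(knn_predict(new_recording, window))
```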

