Comparison of Entropy and Dictionary Based Text Compression in English, German, French, Italian, Czech, Hungarian, Finnish, and Croatian

Mathematics ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. 1059
Author(s):  
Matea Ignatoski ◽  
Jonatan Lerga ◽  
Ljubiša Stanković ◽  
Miloš Daković

The rapid growth in the amount of data in the digital world leads to the need for data compression, that is, reducing the number of bits needed to represent a text file, an image, audio, or video content. Compressing data saves storage capacity and speeds up data transmission. In this paper, we focus on text compression and provide a comparison of algorithms (in particular, the entropy-based arithmetic and dictionary-based Lempel–Ziv–Welch (LZW) methods) for text compression in different languages (Croatian, Finnish, Hungarian, Czech, Italian, French, German, and English). The main goal is to answer the question: "How does the language of a text affect the compression ratio?" The results indicated that the compression ratio is affected by the size of the language alphabet and by the size and type of the text. For example, The European Green Deal was compressed by 75.79%, 76.17%, 77.33%, 76.84%, 73.25%, 74.63%, 75.14%, and 74.51% using the LZW algorithm, and by 72.54%, 71.47%, 72.87%, 73.43%, 69.62%, 69.94%, 72.42%, and 72.00% using the arithmetic algorithm for the English, German, French, Italian, Czech, Hungarian, Finnish, and Croatian versions, respectively.
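The dictionary-based side of this comparison can be illustrated with a minimal Python sketch of LZW encoding (function and variable names are mine, not the authors'):

```python
def lzw_compress(text):
    """Basic LZW: grow a dictionary of phrases, emit one integer code per phrase."""
    # Start with single-character entries for every symbol in the text.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    w, codes = "", []
    for ch in text:
        wc = w + ch
        if wc in dictionary:
            w = wc                            # keep extending the current phrase
        else:
            codes.append(dictionary[w])       # emit code for the longest match
            dictionary[wc] = len(dictionary)  # the new phrase gets the next code
            w = ch
    if w:
        codes.append(dictionary[w])
    return codes

codes = lzw_compress("abababab")  # repeated phrases collapse into few codes
```

Repeated phrases such as "ab" and "aba" receive single codes, which is why highly redundant text, and languages with repetitive morphology, compress well under LZW.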

Author(s):  
Jamil Azzeh

Data compression is a reduction in the size of data to be sent over a network or stored on auxiliary storage for a long time; it thus saves storage capacity, speeds up file transfer and data transmission by decreasing transfer time, and decreases costs for storage hardware and network bandwidth. In this paper we investigate the Huffman and LZW methods of data compression and decompression. Images of different sizes and types are treated; compression and decompression times are evaluated, compression ratios are obtained, and the results are analyzed in order to make some judgments.
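As a sketch of the entropy-based half of the comparison, here is a minimal Huffman code builder in Python (not the paper's implementation; names and structure are mine):

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman code table: frequent symbols get shorter bit strings."""
    freq = Counter(data)
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    i = len(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)   # two least-frequent subtrees...
        fb, _, b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in a.items()}   # ...merge under 0/1
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (fa + fb, i, merged))
        i += 1
    return heap[0][2]

table = huffman_codes("aaaabbc")  # 'a' dominates, so it gets the shortest code
```

For "aaaabbc" the dominant symbol 'a' receives a 1-bit code and the rare symbols 2-bit codes, so the 7 characters need 10 bits instead of 56.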


2020 ◽  
Vol 27 (1) ◽  
Author(s):  
MB Ibrahim ◽  
KA Gbolagade

The science and art of data compression is the presentation of information in a compact form. This compact representation is generated by exploiting structures that exist in the data. The Lempel-Ziv-Welch (LZW) algorithm is known to be one of the best text compressors, achieving a high degree of compression. This is possible for text files with many redundancies; the greater the redundancy, the greater the compression achieved. In this paper, the LZW algorithm is enhanced to achieve a higher degree of compression, without compromising its performance, by combining it with the Chinese Remainder Theorem (CRT). Compression time and compression ratio were used as performance metrics. Simulations were carried out using MATLAB on five (5) text files of varying sizes to determine the efficiency of the proposed CRT-LZW technique. This new technique opens a path to compressing data faster than traditional LZW. The results show that CRT-LZW performs better than LZW in terms of computational time, by 0.12 s to 15.15 s, while the compression ratio remains the same at 2.56%. The proposed technique's compression time also outperformed previous work implementing LZW-RNS, by 0.12 s to 2.86 s in one case and by 0.12 s to 0.14 s in another. Keywords: Data Compression, Lempel-Ziv-Welch (LZW) algorithm, Enhancement, Chinese Remainder Theorem (CRT), Text files.
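The abstract does not give the moduli set, so the following Python sketch only illustrates the CRT step in general: an integer (such as an LZW output code) is split into small residues and recovered exactly. The moduli (251, 256) are my assumption, chosen coprime so that codes up to 251*256 are representable:

```python
def to_residues(code, moduli=(251, 256)):
    """Forward step: represent an integer code by its residues."""
    return tuple(code % m for m in moduli)

def from_residues(residues, moduli=(251, 256)):
    """Recover the code via the Chinese Remainder Theorem."""
    M = 1
    for m in moduli:
        M *= m
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # Modular inverse exists because the moduli are pairwise coprime.
        x += r * Mi * pow(Mi, -1, m)
    return x % M
```

The residue channels are narrower than the original code and independent of each other, which is the usual motivation for residue-number-system speedups.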


Information ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 172 ◽  
Author(s):  
Wayit Abliz ◽  
Hao Wu ◽  
Maihemuti Maimaiti ◽  
Jiamila Wushouer ◽  
Kahaerjiang Abiderexiti ◽  
...  

To improve the utilization of text storage resources and the efficiency of data transmission, we proposed two syllable-based Uyghur text compression coding schemes. First, according to statistics of syllable coverage of the corpus text, we constructed 12-bit and 16-bit syllable code tables and added commonly used symbols, such as punctuation marks and ASCII characters, to the code tables. To enable the coding scheme to process Uyghur texts mixed with other language symbols, we introduced a flag code in the compression process to distinguish Unicode encodings that were not in the code table. The experiments showed that the 12-bit coding scheme achieved an average compression ratio of 0.3 on Uyghur text less than 4 KB in size and that the 16-bit coding scheme achieved an average compression ratio of 0.5 on text less than 2 KB in size. Our compression schemes outperformed GZip, BZip2, and the LZW algorithm on short text and can be effectively applied to the compression of Uyghur short text for storage and applications.
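The flag-code mechanism can be sketched as follows in Python. The table entries are stand-ins and the choice of 4095 (the last 12-bit value) as the reserved flag is my illustration; syllable segmentation is assumed to have been done already:

```python
# Hypothetical 12-bit table: syllables and common symbols map to codes
# 0..4094, with 4095 reserved as a flag that prefixes a raw code point.
TABLE = {"ba": 0, "la": 1, ".": 2}   # stand-in entries, not the real table
FLAG = 4095

def encode(units):
    """Emit one 12-bit code per table unit; escape anything out-of-table."""
    out = []
    for u in units:
        if u in TABLE:
            out.append(TABLE[u])
        else:
            out.append(FLAG)      # flag: the next value is a raw Unicode
            out.append(ord(u))    # code point, not a table index
    return out
```

Because most Uyghur syllables hit the table, each costs 12 bits instead of the 16+ bits of its Unicode form, while mixed-in foreign symbols still round-trip via the escape.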


2015 ◽  
Vol 719-720 ◽  
pp. 554-560
Author(s):  
Le Yang ◽  
Zhao Yang Guo ◽  
Shan Shan Yong ◽  
Feng Guo ◽  
Xin An Wang

This paper presents a hardware implementation of real-time data compression and decompression circuits based on the LZW algorithm. LZW is a dictionary-based data compression algorithm with the advantages of fast speed, high compression, and small resource occupation. In the compression circuit, the design alternates between two dictionaries to improve efficiency and compression rate. In the decompression circuit, an integrated state-machine control module is adopted to save hardware resources. Through hardware description language programming, the circuits pass both functional simulation and timing simulation. The data sample width is 12 bits, and the dictionary storage capacity is 1K entries. The simulation results show that the compression and decompression circuits are fully functional. Compared with a software implementation, the hardware implementation saves storage and compression time, and has high practical value.
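The decompression side that the circuit implements can be sketched in software as a standard LZW decoder (this is not the paper's HDL): the dictionary is rebuilt on the fly from the code stream, including the classic case of a code that is not yet defined:

```python
def lzw_decompress(codes, alphabet):
    """Rebuild the dictionary while decoding -- no table is transmitted."""
    dictionary = {i: ch for i, ch in enumerate(sorted(alphabet))}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            entry = w + w[0]      # code defined by the phrase being built
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[0]  # mirror the encoder
        w = entry
    return "".join(out)
```

A hardware version performs the same per-code dictionary update, which is why a small fixed-capacity dictionary (1K entries here) and a state machine suffice.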


There is a need to reduce the consumption of scarce resources, which is achieved using data compression. Data compression is a well-known technique for reducing file size. A plethora of data compression algorithms is available, providing various compression ratios; LZW is one of the most powerful and widely used. This paper proposes and applies some enhancements to LZW, resulting in an efficient lossless text compression scheme that can compress a given file at a better compression ratio. The paper proposes three approaches that practically enhance the original algorithm, each aiming at a better compression ratio. Approach 1 exploits the notion of reusing an existing string's code, marked as odd, for a newly encountered string that is the reverse of an existing one. Approach 2 allows a choice of code length for the current compression run, avoiding the problem of dictionary overflow. Approach 3 appends a selected set of frequently encountered string patterns. With these features, the enhanced LZW method provides a better compression ratio.
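Under my reading of approach 1 (the abstract leaves the details open), the even/odd marking could look like this in Python: even codes address stored strings directly, odd codes signal their reversals, so a reversed phrase needs no new dictionary entry:

```python
def lookup(dictionary, s):
    """Map a phrase to a code; odd codes signal the reverse of a stored phrase."""
    if s in dictionary:
        return 2 * dictionary[s]             # even: phrase stored directly
    if s[::-1] in dictionary:
        return 2 * dictionary[s[::-1]] + 1   # odd: reverse of a stored phrase
    return None                              # not representable yet
```

The cost is one bit per code (the parity), traded for dictionary entries saved whenever a phrase and its mirror both occur.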


2020 ◽  
Vol 7 (2) ◽  
pp. 554-563
Author(s):  
Kazeem B. Adedeji

IoT-based smart water supply network management applications generate a huge volume of data from the installed sensing devices, which must be processed (sometimes in-network), stored, and transmitted to a remote centre for decision making. As the volume of data produced by diverse IoT smart sensing devices intensifies, processing and storage of these data become a serious issue. The large data size acquired from these applications increases computational complexity, occupies the scarce bandwidth available for data transmission, and increases the required storage space. Thus, data size reduction through data compression algorithms is essential in IoT-based smart water network management applications. In this paper, the performance evaluation of four data compression algorithms used for this purpose is presented. These algorithms, namely RLE, Huffman, LZW, and Shannon-Fano encoding, were realised using MATLAB software and tested on six water supply system datasets. The performance of each algorithm was evaluated based on its compression ratio, compression factor, percentage space savings, and compression gain. The results showed that the LZW algorithm performs best in terms of compression ratio, compression factor, space savings, and compression gain. However, its execution time is relatively slow compared with RLE and the two other algorithms investigated. Most importantly, the LZW algorithm achieves a greater reduction in the data sizes of the tested files than all the other algorithms.
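The four metrics can be computed directly from file sizes. The compression-gain formula below follows one common textbook definition (100 times the natural log of the factor), which may differ from the paper's exact definition:

```python
import math

def metrics(original_size, compressed_size):
    """Standard size-based compression metrics (sizes in bytes or bits)."""
    ratio = compressed_size / original_size   # compression ratio: smaller is better
    factor = original_size / compressed_size  # compression factor: larger is better
    savings_pct = (1 - ratio) * 100           # percentage space savings
    gain = 100 * math.log(factor)             # compression gain (assumed definition)
    return ratio, factor, savings_pct, gain
```

For example, shrinking a 1000-byte file to 250 bytes gives a ratio of 0.25, a factor of 4, and 75% space savings.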


This study aims to implement the adaptive Shannon-Fano data compression algorithm on character input data. It also investigates the compression ratio, that is, the ratio between the number of data bits before and after compression. The resulting program is tested using black-box testing, by measuring the effect of the number of character variants and character types on the compression ratio, and its correctness is verified with the Mean Square Error (MSE) method. The characteristics of the application are described by processing data in the form of character collections that differ in character types, variants, and counts. This research presents the algorithms that support the steps of building an adaptive Shannon-Fano compression application. The length of the input determines the variant value, the compression ratio, and the number of input character types. Testing showed no errors when comparing the original text input with the decompression results. A higher appearance frequency of a character yields a greater compression ratio for the resulting file, while a higher number of input character types yields a lower compression ratio; this shows that the proposed method improves the effectiveness and efficiency of real-time data compression.
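The static Shannon-Fano construction underlying the adaptive variant can be sketched in a few lines of Python (the adaptive bookkeeping of the paper is omitted; names are mine). Symbols sorted by decreasing frequency are recursively split into two halves of near-equal total frequency, one half per bit:

```python
def shannon_fano(symfreq, prefix=""):
    """Codes for (symbol, freq) pairs sorted by decreasing frequency."""
    if len(symfreq) == 1:
        return {symfreq[0][0]: prefix or "0"}
    total, run, split = sum(f for _, f in symfreq), 0, 1
    # Find the split point where the running total reaches half the mass.
    for i, (_, f) in enumerate(symfreq[:-1], 1):
        run += f
        if run >= total / 2:
            split = i
            break
    codes = shannon_fano(symfreq[:split], prefix + "0")   # upper half gets 0
    codes.update(shannon_fano(symfreq[split:], prefix + "1"))  # lower half 1
    return codes

codes = shannon_fano([("a", 4), ("b", 2), ("c", 1)])
```

As the abstract observes, skewed frequencies help: the dominant symbol ends up near the root with a short code, while many distinct low-frequency symbols push codes deeper.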


2017 ◽  
Author(s):  
Andysah Putera Utama Siahaan

Compression is an activity performed to reduce data to a smaller size than before. Compression arose from the lack of adequate storage capacity. Data compression is also needed to speed up data transmission between computer networks. Compression involves a trade-off between speed and density: compression that maximizes density takes longer than compression that prioritizes speed. Elias delta is one of the lossless compression techniques that can compress characters. The compression is built from the frequency of each character in the document to be compressed. It works by reducing characters of seven or eight bits to shorter bit strings: the most common characters are assigned the fewest bits, while the rarest characters are assigned the longest bit strings. The formation of a character set serves to eliminate duplicate characters when counting the occurrences of each character, as well as for storing the compression table. The method achieves a good size ratio between before and after compression, and both the compression and decompression processes are outstandingly fast.
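For reference, the Elias delta codeword for a positive integer can be generated with a few lines of Python (a standard construction, not the paper's implementation):

```python
def elias_delta(n):
    """Elias delta code for n >= 1: gamma-code the bit length, then the payload."""
    assert n >= 1
    binary = bin(n)[2:]                 # n in binary, MSB first
    length_bits = bin(len(binary))[2:]  # the bit length of n, itself in binary
    # Gamma code of the length, followed by n's bits minus the implicit MSB.
    return "0" * (len(length_bits) - 1) + length_bits + binary[1:]
```

In a scheme like the one described, characters would be ranked by frequency and the rank delta-coded, so the most common characters (ranks 1, 2, ...) receive the shortest codewords.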


Author(s):  
Hikka Sartika ◽  
Taronisokhi Zebua

Storage space required by an application is one of the problems on smartphones. This can waste storage space because not all smartphones have a very large storage capacity. One application with a large file size is the RPUL application, which is widely accessed by students and the general public. The large file size often prevents this application from running effectively on smartphones. One solution to this problem is to compress the application file, so that the storage space needed on the smartphone is much smaller. This study describes the application of the Elias gamma code algorithm, one of the compression technique algorithms, to compress the RPUL application database file, so that the RPUL application can run effectively on a smartphone after installation. Based on trials conducted on 64 bits of text as a sample, it was found that compression based on the Elias gamma code algorithm is able to compress text from the database file with a compression result of 2 bits, a compression ratio of 50%, and a redundancy of 50%. Keywords: Compression, RPUL, Smartphone, Elias Gamma Code
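The Elias gamma code used by the algorithm can be sketched in Python (a standard construction; names are mine):

```python
def elias_gamma(n):
    """Elias gamma code for n >= 1: N-1 zeros, then the N-bit binary form of n."""
    assert n >= 1
    binary = bin(n)[2:]                 # N bits, leading bit always 1
    return "0" * (len(binary) - 1) + binary
```

Small integers get short codewords (1 bit for 1, 3 bits for 2 and 3), which is what produces the space savings when frequent symbols are mapped to small ranks.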


Author(s):  
Konstantinos Kardaras ◽  
George I. Lambrou ◽  
Dimitrios Koutsouris

Background: In the new era of wireless communications, new challenges emerge, including the provision of various services over the digital television network. In particular, such services become more important when referring to tele-medical applications through terrestrial Digital Video Broadcasting (DVB). Objective: One of the most significant aspects of video broadcasting is the quality and information content of the data. To that end, several algorithms have been proposed for image processing in order to achieve the most suitable data compression. Methods: Given that medical video and data are highly demanding in terms of resources, it is imperative to find methods and algorithms that facilitate medical data transmission over ordinary infrastructure such as DVB. Results: In the present work, we utilized a quantization algorithm for data compression and attempted to transform the video signal in such a way as to transmit information and data with minimum loss in quality and achieve near-maximum end-user approval. Conclusions: Such approaches prove to be of great significance in emergency handling situations, which also include health care and emergency care applications.
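The abstract does not specify the quantizer, so the following is only a generic uniform-quantization sketch in Python: a coarser step discards more detail, making the signal more compressible at a quality cost:

```python
def quantize(samples, step):
    """Uniform quantization: round each sample to the nearest multiple of step."""
    return [round(v / step) for v in samples]

def dequantize(indices, step):
    """Reconstruction lands on a coarse grid; fine detail is lost for good."""
    return [i * step for i in indices]
```

The quantization indices are small integers with many repeats, which downstream entropy coders exploit; the step size is the knob that trades bitrate against end-user perceived quality.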

