Burrows–Wheeler Transform Based Lossless Text Compression Using Keys and Huffman Coding

Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1654
Author(s):  
Md. Atiqur Rahman ◽  
Mohamed Hamada

Text compression is one of the most significant research fields, and various algorithms for it have already been developed. The problem remains important, as internet bandwidth usage is increasing considerably. This article proposes a lossless text compression algorithm based on the Burrows–Wheeler transform and pattern matching that uses Huffman coding to achieve an excellent compression ratio. We introduce an algorithm with two keys that reduce the most frequently repeated characters after the Burrows–Wheeler transform. We then find patterns of a certain length in the reduced text and apply Huffman encoding. We compare the proposed technique with state-of-the-art text compression algorithms and conclude that it demonstrates a gain in compression ratio over the other techniques. A minor limitation of the proposed method is that it does not perform well in symmetric communication scenarios, unlike algorithms such as Brotli.
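As a rough illustration of the transform stage only (not the authors' full key-based pipeline), the sketch below implements a naive forward Burrows–Wheeler transform. Its output tends to cluster equal characters into runs that a downstream Huffman coder can exploit; the sentinel character and function name are illustrative assumptions.

```python
# Minimal sketch of the forward Burrows-Wheeler transform (naive
# O(n^2 log n) rotation sort; production code would use a suffix array).
def bwt(text: str) -> str:
    s = text + "\x00"  # sentinel marks the original end of the string
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

print(repr(bwt("banana")))  # 'annb\x00aa' -- equal characters cluster into runs
```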

2021 ◽  
Vol 102 ◽  
pp. 04013
Author(s):  
Md. Atiqur Rahman ◽  
Mohamed Hamada

Modern daily life generates a great deal of information thanks to advances in telecommunication. Storing it on digital devices or transmitting it over the Internet is a challenging issue, which leads to the necessity of data compression; research on data compression has therefore become a topic of great interest. Because compressed data is generally smaller than the original, compression saves storage and increases transmission speed. In this article, we propose a text compression technique using the GPT-2 language model and Huffman coding. In the proposed method, the Burrows-Wheeler transform and a list of keys are used to reduce the original text file's length; we then apply the GPT-2 language model followed by Huffman coding for encoding. The proposed method is compared with state-of-the-art text compression techniques, and we show that it demonstrates a gain in compression ratio over them.
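The abstract does not detail how the language model feeds the entropy coder. One common construction (a hedged sketch, not necessarily the authors' exact pipeline) replaces each symbol with its rank under the model's prediction, so predictable text yields many small ranks for the Huffman stage to compress; a toy adaptive frequency model stands in for GPT-2 here to keep the example self-contained.

```python
from collections import Counter

def ranks_from_adaptive_model(text):
    # Toy stand-in for a neural language model: an adaptive order-0
    # frequency model. Each symbol becomes its rank in the model's
    # current prediction; first occurrences get an escape rank.
    counts = Counter()
    out = []
    for ch in text:
        ranking = [sym for sym, _ in counts.most_common()]
        out.append(ranking.index(ch) if ch in counts else len(ranking))
        counts[ch] += 1
    return out

print(ranks_from_adaptive_model("aaabba"))  # [0, 0, 0, 1, 1, 0]
```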


2003 ◽  
Vol 13 (01) ◽  
pp. 39-45
Author(s):  
AMER AL-NASSIRI

In this paper we consider a theoretical evaluation of a data and text compression algorithm based on the Burrows–Wheeler Transform (BWT) and General Bidirectional Associative Memory (GBAM). A new lossless data and text compression method, based on the combination of the BWT and GBAM approaches, is presented. The algorithm was tested on many texts in different formats (ASCII and RTF). The compression ratio achieved is fairly good, on average 28–36%, and decompression is fast.


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Vu H. Nguyen ◽  
Hien T. Nguyen ◽  
Hieu N. Duong ◽  
Vaclav Snasel

We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It achieves a significant compression ratio in comparison with state-of-the-art methods on the same dataset. Given a text, the proposed method first splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we use a sliding window whose size ranges from bigrams to five grams to obtain the best encoding stream. Each n-gram is encoded in two to four bytes according to its corresponding n-gram dictionary. We collected a 2.5 GB text corpus from several Vietnamese news agencies to build n-gram dictionaries from unigrams to five grams, yielding dictionaries of 12 GB in total. To evaluate our method, we collected a testing set of 10 text files of different sizes. The experimental results indicate that our method achieves a compression ratio of around 90% and outperforms state-of-the-art methods.
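A hedged sketch of the dictionary-encoding phase: a greedy longest-match scan that tries longer n-grams first and falls back to literals. The tiny dictionary, code values, and n-gram lengths here are illustrative assumptions; the paper's dictionaries are built from the 2.5 GB corpus and map each n-gram to 2-4 bytes.

```python
# Hypothetical toy dictionaries: n-gram (as a word tuple) -> short code.
DICTS = {
    3: {("xin", "chao", "ban"): 1},
    2: {("xin", "chao"): 2, ("chao", "ban"): 3},
}

def encode(words):
    out, i = [], 0
    while i < len(words):
        for n in (3, 2):                    # longest match first
            gram = tuple(words[i:i + n])
            if gram in DICTS.get(n, {}):
                out.append(DICTS[n][gram])  # emit compact dictionary code
                i += n
                break
        else:
            out.append(words[i])            # literal fallback for unknown words
            i += 1
    return out

print(encode(["xin", "chao", "ban", "nhe"]))  # [1, 'nhe']
```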


Author(s):  
Nannan Li ◽  
Yu Pan ◽  
Yaran Chen ◽  
Zixiang Ding ◽  
Dongbin Zhao ◽  
...  

Recently, tensor ring networks (TRNs) have been applied in deep networks, achieving remarkable successes in compression ratio and accuracy. Although highly related to the performance of TRNs, rank selection is seldom studied in previous works, and the ranks are usually set equal in experiments. Moreover, there is no heuristic method for choosing the rank, and enumerating to find an appropriate rank is extremely time-consuming. Interestingly, we discover that some of the rank elements are sensitive and usually aggregate in a narrow region, namely an interest region. Based on this phenomenon, we propose a novel progressive genetic algorithm, the progressively searching tensor ring network (PSTRN), which can find the optimal rank precisely and efficiently. Through its evolutionary and progressive phases, PSTRN converges to the interest region quickly and achieves good performance. Experimental results show that PSTRN significantly reduces the complexity of rank search compared with the enumerating method. Furthermore, our method is validated on public benchmarks such as MNIST, CIFAR10/100, UCF11 and HMDB51, achieving state-of-the-art performance.
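To make the genetic-search idea concrete, here is a minimal sketch of evolving rank vectors under a hypothetical proxy fitness (parameter count plus a penalty for straying from an assumed interest region). PSTRN's real fitness is the validated network performance; every constant below is an assumption for illustration.

```python
import random

def fitness(ranks):
    params = sum(r * r for r in ranks)        # proxy for TRN parameter count
    penalty = sum(abs(r - 6) for r in ranks)  # assumed interest region near rank 6
    return params + 10 * penalty

def evolve(dim=4, pop_size=20, generations=30, rank_range=(1, 16)):
    pop = [[random.randint(*rank_range) for _ in range(dim)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]       # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, dim)     # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.3:          # mutation
                child[random.randrange(dim)] = random.randint(*rank_range)
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

print(evolve())  # rank vector drawn toward the assumed interest region
```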


Author(s):  
Shilpa K. Meshram ◽  
Meghana A. Hasamnis

Huffman coding is an entropy encoding algorithm used for lossless data compression. It uses variable-length coding, constructed with a binary tree. In our implementation of the Huffman encoder, more frequent input data are encoded with fewer binary bits than less frequent data. This style of coding is used in JPEG and MPEG for image compression. Huffman coding uses a specific method for choosing the representation of each symbol, resulting in a prefix code: the bit string representing any particular symbol is never a prefix of the bit string representing any other symbol.
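The following sketch builds such a prefix code with a binary heap: the two least frequent subtrees are merged repeatedly, so more frequent symbols end up closer to the root and receive shorter bit strings. Function and variable names are illustrative.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # Heap entries are (frequency, tie-breaker, subtree); a subtree is
    # either a symbol (str) or a (left, right) pair.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tie, (left, right)))  # ...are merged
        tie += 1
    codes = {}
    def walk(node, prefix=""):
        if isinstance(node, str):
            codes[node] = prefix or "0"     # degenerate single-symbol input
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(heap[0][2])
    return codes

print(huffman_codes("aaaabbc"))  # {'c': '00', 'b': '01', 'a': '1'}
```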


2019 ◽  
Vol 11 (2) ◽  
pp. 47-62 ◽  
Author(s):  
Xinchao Huang ◽  
Zihan Liu ◽  
Wei Lu ◽  
Hongmei Liu ◽  
Shijun Xiang

Detecting digital audio forgeries is a significant research focus in the field of audio forensics. In this article, the authors focus on a special form of digital audio forgery, copy-move, and propose a fast and effective method to detect doctored audio. First, the input audio is segmented into syllables by voice activity detection and syllable detection. Second, the authors apply the discrete Fourier transform (DFT) to each segment and select points in the frequency domain as features. The segments are then sorted according to these features, producing a sorted list of audio segments. Finally, each segment is compared only with a few adjacent segments in the sorted list, which decreases the time complexity. Comparisons with other state-of-the-art methods show that the proposed method can verify the authenticity of the input audio and locate the forged positions quickly and effectively.
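A hedged sketch of the sort-and-compare idea: DFT-magnitude features per segment, a lexicographic sort, then comparison of neighbours only. Fixed-size framing replaces the paper's syllable detection, and the distance threshold is an assumption.

```python
import numpy as np

def find_copy_move(audio, seg_len=1024, thresh=1e-3):
    n = len(audio) // seg_len
    segments = audio[: n * seg_len].reshape(n, seg_len)
    feats = np.abs(np.fft.rfft(segments, axis=1))  # DFT-magnitude features
    order = np.lexsort(feats.T[::-1])              # lexicographic sort of segments
    pairs = []
    for a, b in zip(order[:-1], order[1:]):        # neighbours in the sorted list
        if np.linalg.norm(feats[a] - feats[b]) / seg_len < thresh:
            pairs.append(tuple(sorted((int(a) * seg_len, int(b) * seg_len))))
    return pairs                                    # candidate (source, copy) offsets

rng = np.random.default_rng(0)
x = rng.standard_normal(8192)
x[4096:5120] = x[0:1024]            # simulate a copy-move forgery
print(find_copy_move(x))            # [(0, 4096)]
```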


2020 ◽  
Vol 6 (10) ◽  
pp. 110
Author(s):  
Francesco Lombardi ◽  
Simone Marinai

Nowadays, deep learning methods are employed in a broad range of research fields, and the analysis and recognition of historical documents, which we survey in this work, is no exception. Our study analyzes the papers published on this topic in the last few years from different perspectives: we first provide a pragmatic definition of historical documents from the point of view of research in the area, then we look at the various sub-tasks addressed. Guided by these tasks, we go through the input-output relations expected from the deep learning approaches used and accordingly describe the most common models. We also discuss research datasets published in the field and their applications. This analysis shows that the latest research is a leap forward: it is not the simple application of recently proposed algorithms to old problems, but rather novel tasks and novel applications of state-of-the-art methods are now considered. Rather than just providing a conclusive picture of current research on the topic, we lastly suggest some potential future trends that can stimulate innovative research directions.


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1617
Author(s):  
Ioannis Intzes ◽  
Hongying Meng ◽  
John Cosmas

Wireless capsule endoscopy is a state-of-the-art technology for the medical diagnosis of gastrointestinal diseases. An endoscopic capsule camera produces a huge amount of data, which is impractical to store internally because of power consumption and size constraints, so the data must be transmitted wirelessly outside the human body for further processing; it should therefore be compressed and transmitted in a power-efficient way. In this paper, a new approach to the design and implementation of a low-complexity, multiplier-less compression algorithm is proposed. Statistical analysis of capsule endoscopy images improves the performance of traditional lossless techniques such as Huffman coding and DPCM coding. Furthermore, a Huffman implementation based on simple logic gates, without memory tables, further increases the speed and reduces the power consumption of the proposed system. Analysis and comparison with existing state-of-the-art methods show that the proposed method performs better.
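As a hedged illustration of the multiplier-less idea, the sketch below computes DPCM residuals with a previous-sample predictor, which needs only subtraction; the residual distribution is much more peaked than that of the raw pixels, which benefits the subsequent Huffman stage. Names and the initial prediction value are assumptions.

```python
import numpy as np

def dpcm_residuals(row):
    # Previous-sample predictor: the prediction is the left neighbour,
    # so the encoder needs only subtraction (no multiplications).
    pred = np.empty_like(row)
    pred[0] = 128                    # assumed fixed initial prediction
    pred[1:] = row[:-1]
    return row.astype(np.int16) - pred.astype(np.int16)

row = np.array([120, 122, 121, 125, 124], dtype=np.uint8)
print(dpcm_residuals(row))           # [-8  2 -1  4 -1] -- small, peaked residuals
```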

