Burrows–Wheeler Transform Based Lossless Text Compression Using Keys and Huffman Coding

Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1654
Author(s):  
Md. Atiqur Rahman ◽  
Mohamed Hamada

Text compression is one of the most significant research fields, and various algorithms for it have already been developed. The problem remains important, as internet bandwidth usage is increasing considerably. This article proposes a lossless text compression algorithm based on the Burrows–Wheeler transform and pattern matching that uses Huffman coding to achieve an excellent compression ratio. We introduce an algorithm with two keys that reduce the most frequently repeated characters after the Burrows–Wheeler transform. We then find patterns of a certain length in the reduced text and apply Huffman encoding. We compare the proposed technique with state-of-the-art text compression algorithms and conclude that it demonstrates a gain in compression ratio over the other techniques. A minor limitation of the proposed method is that it does not perform well in symmetric communication scenarios, unlike algorithms such as Brotli.
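As a rough illustration of the transform stage only (not the authors' full key-based pipeline), the sketch below implements a naive forward Burrows–Wheeler transform. Its output tends to cluster equal characters into runs that a downstream Huffman coder can exploit; the sentinel character and function name are illustrative assumptions.

```python
# Minimal sketch of the forward Burrows-Wheeler transform (naive
# O(n^2 log n) rotation sort; production code would use a suffix array).
def bwt(text: str) -> str:
    s = text + "\x00"  # sentinel marks the original end of the string
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

print(repr(bwt("banana")))  # 'annb\x00aa' -- equal characters cluster into runs
```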

2021 ◽  
Vol 102 ◽  
pp. 04013
Author(s):  
Md. Atiqur Rahman ◽  
Mohamed Hamada

Modern daily life generates a great deal of information thanks to advances in telecommunication. Storing it on digital devices or transmitting it over the Internet is a challenging issue, which leads to the necessity of data compression; research on data compression has therefore become a topic of great interest. Because compressed data is generally smaller than the original, compression saves storage and increases transmission speed. In this article, we propose a text compression technique using the GPT-2 language model and Huffman coding. In the proposed method, the Burrows-Wheeler transform and a list of keys are used to reduce the original text file's length; we then apply the GPT-2 language model followed by Huffman coding for encoding. The proposed method is compared with state-of-the-art text compression techniques, and we show that it demonstrates a gain in compression ratio over them.
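The abstract does not detail how the language model feeds the entropy coder. One common construction (a hedged sketch, not necessarily the authors' exact pipeline) replaces each symbol with its rank under the model's prediction, so predictable text yields many small ranks for the Huffman stage to compress; a toy adaptive frequency model stands in for GPT-2 here to keep the example self-contained.

```python
from collections import Counter

def ranks_from_adaptive_model(text):
    # Toy stand-in for a neural language model: an adaptive order-0
    # frequency model. Each symbol becomes its rank in the model's
    # current prediction; first occurrences get an escape rank.
    counts = Counter()
    out = []
    for ch in text:
        ranking = [sym for sym, _ in counts.most_common()]
        out.append(ranking.index(ch) if ch in counts else len(ranking))
        counts[ch] += 1
    return out

print(ranks_from_adaptive_model("aaabba"))  # [0, 0, 0, 1, 1, 0]
```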


2003 ◽  
Vol 13 (01) ◽  
pp. 39-45
Author(s):  
AMER AL-NASSIRI

In this paper we consider a theoretical evaluation of a data and text compression algorithm based on the Burrows–Wheeler Transform (BWT) and General Bidirectional Associative Memory (GBAM). A new lossless data and text compression method, based on the combination of the BWT and GBAM approaches, is presented. The algorithm was tested on many texts in different formats (ASCII and RTF). The compression ratio achieved is fairly good, on average 28–36%, and decompression is fast.


2016 ◽  
Vol 2016 ◽  
pp. 1-11 ◽  
Author(s):  
Vu H. Nguyen ◽  
Hien T. Nguyen ◽  
Hieu N. Duong ◽  
Vaclav Snasel

We propose an efficient method for compressing Vietnamese text using n-gram dictionaries. It achieves a significant compression ratio in comparison with state-of-the-art methods on the same dataset. Given a text, the proposed method first splits it into n-grams and then encodes them based on n-gram dictionaries. In the encoding phase, we use a sliding window whose size ranges from bigrams to five grams to obtain the best encoding stream. Each n-gram is encoded in two to four bytes according to its corresponding n-gram dictionary. We collected a 2.5 GB text corpus from several Vietnamese news agencies to build n-gram dictionaries from unigrams to five grams, yielding dictionaries of 12 GB in total. To evaluate our method, we collected a testing set of 10 text files of different sizes. The experimental results indicate that our method achieves a compression ratio of around 90% and outperforms state-of-the-art methods.
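A hedged sketch of the dictionary-encoding phase: a greedy longest-match scan that tries longer n-grams first and falls back to literals. The tiny dictionary, code values, and n-gram lengths here are illustrative assumptions; the paper's dictionaries are built from the 2.5 GB corpus and map each n-gram to 2-4 bytes.

```python
# Hypothetical toy dictionaries: n-gram (as a word tuple) -> short code.
DICTS = {
    3: {("xin", "chao", "ban"): 1},
    2: {("xin", "chao"): 2, ("chao", "ban"): 3},
}

def encode(words):
    out, i = [], 0
    while i < len(words):
        for n in (3, 2):                    # longest match first
            gram = tuple(words[i:i + n])
            if gram in DICTS.get(n, {}):
                out.append(DICTS[n][gram])  # emit compact dictionary code
                i += n
                break
        else:
            out.append(words[i])            # literal fallback for unknown words
            i += 1
    return out

print(encode(["xin", "chao", "ban", "nhe"]))  # [1, 'nhe']
```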


Author(s):  
Nannan Li ◽  
Yu Pan ◽  
Yaran Chen ◽  
Zixiang Ding ◽  
Dongbin Zhao ◽  
...  

Recently, tensor ring networks (TRNs) have been applied in deep networks, achieving remarkable successes in compression ratio and accuracy. Although highly related to the performance of TRNs, rank selection is seldom studied in previous works, and the ranks are usually set equal in experiments. Moreover, there is no heuristic method for choosing the rank, and enumerating to find an appropriate rank is extremely time-consuming. Interestingly, we discover that some of the rank elements are sensitive and usually aggregate in a narrow region, namely an interest region. Based on this phenomenon, we propose a novel progressive genetic algorithm, the progressively searching tensor ring network (PSTRN), which can find the optimal rank precisely and efficiently. Through its evolutionary and progressive phases, PSTRN converges to the interest region quickly and achieves good performance. Experimental results show that PSTRN significantly reduces the complexity of rank search compared with the enumerating method. Furthermore, our method is validated on public benchmarks such as MNIST, CIFAR10/100, UCF11 and HMDB51, achieving state-of-the-art performance.
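To make the genetic-search idea concrete, here is a minimal sketch of evolving rank vectors under a hypothetical proxy fitness (parameter count plus a penalty for straying from an assumed interest region). PSTRN's real fitness is the validated network performance; every constant below is an assumption for illustration.

```python
import random

def fitness(ranks):
    params = sum(r * r for r in ranks)        # proxy for TRN parameter count
    penalty = sum(abs(r - 6) for r in ranks)  # assumed interest region near rank 6
    return params + 10 * penalty

def evolve(dim=4, pop_size=20, generations=30, rank_range=(1, 16)):
    pop = [[random.randint(*rank_range) for _ in range(dim)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]       # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, dim)     # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.3:          # mutation
                child[random.randrange(dim)] = random.randint(*rank_range)
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

print(evolve())  # rank vector drawn toward the assumed interest region
```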


Author(s):  
Shilpa K. Meshram ◽  
Meghana A. Hasamnis

Huffman coding is an entropy encoding algorithm used for lossless data compression. It uses variable-length coding, constructed with a binary tree. In our implementation of the Huffman encoder, more frequent input data are encoded with fewer binary bits than less frequent data. This style of coding is used in JPEG and MPEG for image compression. Huffman coding uses a specific method for choosing the representation of each symbol, resulting in a prefix code: the bit string representing any particular symbol is never a prefix of the bit string representing any other symbol.
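The following sketch builds such a prefix code with a binary heap: the two least frequent subtrees are merged repeatedly, so more frequent symbols end up closer to the root and receive shorter bit strings. Function and variable names are illustrative.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    # Heap entries are (frequency, tie-breaker, subtree); a subtree is
    # either a symbol (str) or a (left, right) pair.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tie, (left, right)))  # ...are merged
        tie += 1
    codes = {}
    def walk(node, prefix=""):
        if isinstance(node, str):
            codes[node] = prefix or "0"     # degenerate single-symbol input
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(heap[0][2])
    return codes

print(huffman_codes("aaaabbc"))  # {'c': '00', 'b': '01', 'a': '1'}
```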


2019 ◽  
Vol 11 (2) ◽  
pp. 47-62 ◽  
Author(s):  
Xinchao Huang ◽  
Zihan Liu ◽  
Wei Lu ◽  
Hongmei Liu ◽  
Shijun Xiang

Detecting digital audio forgeries is a significant research focus in the field of audio forensics. In this article, the authors focus on a special form of digital audio forgery, copy-move, and propose a fast and effective method to detect doctored audio. First, the input audio is segmented into syllables by voice activity detection and syllable detection. Second, the authors apply the discrete Fourier transform (DFT) to each segment and select points in the frequency domain as features. The segments are then sorted according to these features, producing a sorted list of audio segments. Finally, each segment is compared only with a few adjacent segments in the sorted list, which decreases the time complexity. Comparisons with other state-of-the-art methods show that the proposed method can verify the authenticity of the input audio and locate the forged positions quickly and effectively.
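A hedged sketch of the sort-and-compare idea: DFT-magnitude features per segment, a lexicographic sort, then comparison of neighbours only. Fixed-size framing replaces the paper's syllable detection, and the distance threshold is an assumption.

```python
import numpy as np

def find_copy_move(audio, seg_len=1024, thresh=1e-3):
    n = len(audio) // seg_len
    segments = audio[: n * seg_len].reshape(n, seg_len)
    feats = np.abs(np.fft.rfft(segments, axis=1))  # DFT-magnitude features
    order = np.lexsort(feats.T[::-1])              # lexicographic sort of segments
    pairs = []
    for a, b in zip(order[:-1], order[1:]):        # neighbours in the sorted list
        if np.linalg.norm(feats[a] - feats[b]) / seg_len < thresh:
            pairs.append(tuple(sorted((int(a) * seg_len, int(b) * seg_len))))
    return pairs                                    # candidate (source, copy) offsets

rng = np.random.default_rng(0)
x = rng.standard_normal(8192)
x[4096:5120] = x[0:1024]            # simulate a copy-move forgery
print(find_copy_move(x))            # [(0, 4096)]
```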


2020 ◽  
Vol 6 (10) ◽  
pp. 110
Author(s):  
Francesco Lombardi ◽  
Simone Marinai

Nowadays, deep learning methods are employed in a broad range of research fields, and the analysis and recognition of historical documents, which we survey in this work, is no exception. Our study analyzes the papers published on this topic in the last few years from different perspectives: we first provide a pragmatic definition of historical documents from the point of view of research in the area, then we look at the various sub-tasks addressed. Guided by these tasks, we go through the input-output relations expected from the deep learning approaches used and accordingly describe the most common models. We also discuss research datasets published in the field and their applications. This analysis shows that the latest research is a leap forward: it is not the simple application of recently proposed algorithms to old problems, but rather novel tasks and novel applications of state-of-the-art methods are now considered. Rather than just providing a conclusive picture of current research on the topic, we lastly suggest some potential future trends that can stimulate innovative research directions.


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1617
Author(s):  
Ioannis Intzes ◽  
Hongying Meng ◽  
John Cosmas

Wireless capsule endoscopy is a state-of-the-art technology for the medical diagnosis of gastrointestinal diseases. An endoscopic capsule camera produces a huge amount of data, which is impractical to store internally because of power consumption and size constraints, so the data must be transmitted wirelessly outside the human body for further processing; it should therefore be compressed and transmitted in a power-efficient way. In this paper, a new approach to the design and implementation of a low-complexity, multiplier-less compression algorithm is proposed. Statistical analysis of capsule endoscopy images improves the performance of traditional lossless techniques such as Huffman coding and DPCM coding. Furthermore, a Huffman implementation based on simple logic gates, without memory tables, further increases the speed and reduces the power consumption of the proposed system. Analysis and comparison with existing state-of-the-art methods show that the proposed method performs better.
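As a hedged illustration of the multiplier-less idea, the sketch below computes DPCM residuals with a previous-sample predictor, which needs only subtraction; the residual distribution is much more peaked than that of the raw pixels, which benefits the subsequent Huffman stage. Names and the initial prediction value are assumptions.

```python
import numpy as np

def dpcm_residuals(row):
    # Previous-sample predictor: the prediction is the left neighbour,
    # so the encoder needs only subtraction (no multiplications).
    pred = np.empty_like(row)
    pred[0] = 128                    # assumed fixed initial prediction
    pred[1:] = row[:-1]
    return row.astype(np.int16) - pred.astype(np.int16)

row = np.array([120, 122, 121, 125, 124], dtype=np.uint8)
print(dpcm_residuals(row))           # [-8  2 -1  4 -1] -- small, peaked residuals
```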

