Design of Effective Lossless Data Compression Technique for Multiple Genomic DNA Sequences

2021 ◽  
pp. 17-25
Author(s):  
Mahmud Alosta ◽  
◽  
◽  
Alireza Souri

In recent years, massive amounts of genomic DNA sequence data have been generated, which has driven the development of new storage and archiving methods. Processing, storing, and transmitting such large volumes of DNA sequence data is a major challenge. Data compression (DC) techniques reduce the number of bits needed to store and transmit data. DC has recently become more popular, and a large number of techniques have been proposed with applications in several domains. In this paper, a lossless compression technique, arithmetic coding, is employed to compress DNA sequences. To validate the performance of the proposed model, an artificial genome dataset is used and the results are examined in terms of several evaluation parameters. The compression performance of arithmetic coding is compared with that of Huffman coding, LZW coding, and LZMA. The simulation results show that arithmetic coding achieves significantly better compression, with a compression ratio of 0.261 at a bit rate of 2.16 bpc.
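As a rough illustration of the bits-per-character (bpc) metric mentioned in this abstract, the following Python sketch computes the order-0 empirical entropy of a DNA string and the average code length of a Huffman code built from the same symbol counts. It is not the authors' implementation. An ideal arithmetic coder with a static order-0 model can approach the entropy figure, while Huffman coding is restricted to whole-bit code lengths. The function names are hypothetical, and the `compression_ratio` definition (compressed bits divided by 8 bits per original character) is an assumption, since the abstract does not define it.

```python
import heapq
import math
from collections import Counter


def entropy_bits_per_char(seq):
    """Order-0 empirical entropy of seq, in bits per character."""
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def huffman_code_lengths(seq):
    """Return {symbol: code length in bits} for a Huffman code built from seq."""
    counts = Counter(seq)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap items: (weight, unique tie breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_bits_per_char(seq):
    """Average Huffman code length over seq, in bits per character."""
    lengths = huffman_code_lengths(seq)
    counts = Counter(seq)
    return sum(counts[s] * lengths[s] for s in counts) / len(seq)


def compression_ratio(bits_per_char, original_bits_per_char=8):
    """Assumed definition: compressed size divided by original size."""
    return bits_per_char / original_bits_per_char
```

For a sequence with uniformly distributed bases, both measures come out at 2 bits per character. Skewed base frequencies lower the entropy, and only in special cases (probabilities that are powers of 1/2) does Huffman coding match it exactly.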

2021 ◽  
Vol 102 ◽  
pp. 04013
Author(s):  
Md. Atiqur Rahman ◽  
Mohamed Hamada

Modern daily life produces large amounts of information as telecommunication advances. Storing this information on a digital device or transmitting it over the Internet is challenging, which creates the need for data compression. Research on data compression has therefore become a topic of great interest. Because compressed data is generally smaller than the original, data compression saves storage and increases transmission speed. In this article, we propose a text compression technique using the GPT-2 language model and Huffman coding. In the proposed method, the Burrows-Wheeler transform and a list of keys are used to reduce the original text file’s length. We then apply the GPT-2 language model followed by Huffman coding for encoding. The proposed method is compared with state-of-the-art text compression techniques and demonstrates a gain in compression ratio over them.
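The Burrows-Wheeler transform named in this abstract can be sketched in a few lines of Python. This is a naive textbook version for illustration only, not the authors' pipeline: it sorts all rotations explicitly, which is far too slow for real files, and the end-of-string sentinel `"\0"` and the function names are assumptions made here. The transform by itself does not shorten the text; it groups similar characters together so that later stages can encode them more compactly.

```python
def bwt(text, eos="\0"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if eos in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + eos
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, eos="\0"):
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(last_column[i] + table[i] for i in range(len(last_column)))
    # The original string is the rotation that ends with the sentinel.
    original = next(row for row in table if row.endswith(eos))
    return original[:-1]
```

Production implementations typically build the transform from a suffix array instead of materializing every rotation.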


2016 ◽  
Vol 78 (6-4) ◽  
Author(s):  
Muhamad Azlan Daud ◽  
Muhammad Rezal Kamel Ariffin ◽  
S. Kularajasingam ◽  
Che Haziqah Che Hussin ◽  
Nurliyana Juhan ◽  
...  

A new compression algorithm is proposed to make a modified Baptista symmetric cryptosystem, which is based on a chaotic dynamical system, applicable in practice. The Baptista symmetric cryptosystem is able to produce various ciphers in response to the same message input. This modified Baptista-type cryptosystem suffers from message expansion, which goes against the conventional methodology of a symmetric cryptosystem. A new lossless data compression algorithm based on ideas from Huffman coding for data transmission is proposed. This new compression mechanism does not face the problem of mapping elements from a domain that is much larger than its range. Our new algorithm circumvents this problem via a pre-defined codeword list. The proposed algorithm has a fast encoding and decoding mechanism and is proven analytically to be a lossless data compression technique.


Author(s):  
Muhammad Usama ◽  
Qutaibah M. Malluhi ◽  
Nordin Zakaria ◽  
Imran Razzak ◽  
Waheed Iqbal

Data stored in physical storage or transferred over a communication channel includes substantial redundancy. Compression techniques cut down this redundancy to reduce space and communication time. Nevertheless, compression techniques lack proper security measures, e.g., secret key control, leaving the data susceptible to attack. Data encryption is therefore needed to keep the data unreadable and unaltered through a secret key. This work addresses data compression and encryption together, without either negatively affecting the other. To this end, an efficient, secure data compression technique is introduced that combines adaptive Huffman coding, a pseudorandom keystream generator, and an S-Box to bring the cryptographic properties of confusion and diffusion into the compression process and to overcome performance issues. Compression is thus carried out according to a secret key, so that the output is both encrypted and compressed in a single step. The proposed work is well suited to real-time implementation, providing robust encryption quality and acceptable compression capability. Experimental results show that the proposed technique is efficient and produces space savings (%) similar to those of standard techniques. Security analysis shows that the proposed technique is sensitive to the secret key and plaintext. Moreover, the ciphertexts produced by the proposed technique passed all NIST tests, which confirms the randomness of the ciphertext at the 99% confidence level.


Author(s):  
Piyush Kumar Shukla ◽  
Pradeep Rusiya ◽  
Deepak Agrawal ◽  
Lata Chhablani ◽  
Balwant Singh Raghuwanshi

2010 ◽  
Vol 24 (5) ◽  
pp. 487-493
Author(s):  
Yiming Ouyang ◽  
Xi'e Huang ◽  
Huaguo Liang ◽  
Baosheng Zou

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1521
Author(s):  
Jihoon Lee ◽  
Seungwook Yoon ◽  
Euiseok Hwang

With the development of the internet of things (IoT), the power grid has become intelligent through the use of massive numbers of IoT sensors, such as smart meters. Installed smart meters can collect large amounts of data to improve grid visibility and situational awareness. However, limited storage and communication capacities can constrain the infrastructure in the IoT environment. To alleviate these problems, a variety of efficient compression techniques are required. Deep learning-based compression techniques such as auto-encoders (AEs) have recently been deployed for this purpose. However, the compression performance of existing models can be limited when the spectral properties of high-frequency sampled power data vary widely over time. This paper proposes an AE compression model, based on a frequency selection method, which improves the reconstruction quality while maintaining the compression ratio (CR). For efficient data compression, the proposed method selectively applies customized compression models, depending on the spectral properties of the corresponding time windows. The framework of the proposed method involves two primary steps: (i) division of the power data into a series of time windows with specified spectral properties (high-frequency, medium-frequency, and low-frequency dominance) and (ii) separate training and selective application of the AE models, which prepares them for the power data compression that best suits the characteristics of each frequency. In simulations on the Dutch residential energy dataset, the frequency-selective AE model shows significantly higher reconstruction performance than the existing model with the same CR. In addition, the proposed model reduces the computational complexity involved in the analysis of the learning process.


Author(s):  
Kuldeepsingh A. Kalariya ◽  
Ram Prasnna Meena ◽  
Lipi Poojara ◽  
Deepa Shahi ◽  
Sandip Patel

Background: Squalene synthase (SQS) is a rate-limiting enzyme necessary to produce pentacyclic triterpenes in plants. It is an important enzyme producing squalene molecules required to run steroidal and triterpenoid biosynthesis pathways working in competitive inhibition mode. Information on the SQS gene has been reported for several plants, but detailed information on the SQS gene in Gymnema sylvestre R. Br. is not available. G. sylvestre is a priceless rare vine of the central eco-region known for its medicinally important triterpenoids. Our work aims to characterize the GS-SQS gene in this high-value medicinal plant.
Results: A coding DNA sequence (CDS) of 1245 bp representing the GS-SQS gene, predicted from transcriptome data in G. sylvestre, was used for further characterization. The SWISS protein structure model built for the GS-SQS amino acid sequence had a MolProbity score of 1.44 and a clash score of 3.86. The quality estimates and statistical scores of the Ramachandran plot analysis indicated that the homology model was reliable. For full-length amplification of the gene, primers designed from the flanking regions of the CDS encoding GS-SQS were used to amplify genomic DNA as the template, which resulted in a single-band product of approximately 6.2 kb. Sequencing of this product through NGS generated 2.32 Gb of data and 3347 scaffolds with an N50 value of 457 bp. These scaffolds were compared to identify similarity with other SQS genes as well as the GS-SQSs of the transcriptome. Scaffold_3347, representing the GS-SQS gene, harbored two introns of 101 and 164 bp. Both intronic regions were validated with primers designed from the regions adjoining the introns on the scaffold representing the GS-SQS gene. Amplification took place when the template was genomic DNA and failed when the template was cDNA, which confirmed the presence of two introns in the GS-SQS gene in Gymnema sylvestre R. Br.
Conclusion: This study shows that the GS-SQS gene is very closely related to those of Coffea arabica and Gardenia jasminoides and that this gene harbors two introns of 101 and 164 bp.


2020 ◽  
pp. 1-1
Author(s):  
Fang Zhang ◽  
Xiaojun Wang ◽  
Ying Yan ◽  
Jinghan He ◽  
Wenzhong Gao ◽  
...  
