A Syllable-Based Technique for Uyghur Text Compression

Information ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 172 ◽  
Author(s):  
Wayit Abliz ◽  
Hao Wu ◽  
Maihemuti Maimaiti ◽  
Jiamila Wushouer ◽  
Kahaerjiang Abiderexiti ◽  
...  

To improve the utilization of text storage resources and the efficiency of data transmission, we propose two syllable-based Uyghur text compression coding schemes. First, according to statistics of syllable coverage in the corpus text, we constructed 12-bit and 16-bit syllable code tables and added commonly used symbols, such as punctuation marks and ASCII characters, to the tables. To enable the coding scheme to process Uyghur text mixed with symbols from other languages, we introduced a flag code during compression to distinguish Unicode encodings that were not in the code table. The experiments showed that the 12-bit scheme achieved an average compression ratio of 0.3 on Uyghur texts smaller than 4 KB and that the 16-bit scheme achieved an average compression ratio of 0.5 on texts smaller than 2 KB. Our schemes outperformed GZip, BZip2, and the LZW algorithm on short texts and can be effectively applied to compressing short Uyghur texts in storage and transmission applications.
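The general mechanism described above (a fixed-width syllable code table plus a flag code for out-of-table symbols) can be sketched as follows. This is a minimal illustration only: the table contents, the 16-bit width, the reserved flag value, and the toy syllables are assumptions, not the authors' published tables or packing format.

```python
# Illustrative sketch of a fixed-width code-table compressor with a flag code
# for symbols missing from the table. The table, flag value, and 16-bit width
# are assumptions for illustration, not the paper's exact scheme.

FLAG = 0xFFFF  # reserved codeword signalling "raw Unicode code point follows"

def build_table(frequent_syllables):
    """Map each frequent syllable/symbol to a 16-bit codeword (0 .. 0xFFFE)."""
    return {syl: i for i, syl in enumerate(frequent_syllables)}

def compress(syllables, table):
    out = bytearray()
    for s in syllables:
        if s in table:
            out += table[s].to_bytes(2, "big")
        else:
            # Out-of-table symbol: emit the flag, then the raw code point of
            # each character (BMP code points assumed for brevity).
            for ch in s:
                out += FLAG.to_bytes(2, "big")
                out += ord(ch).to_bytes(2, "big")
    return bytes(out)

def decompress(data, table):
    inverse = {v: k for k, v in table.items()}
    result, i = [], 0
    while i < len(data):
        code = int.from_bytes(data[i:i + 2], "big")
        i += 2
        if code == FLAG:
            result.append(chr(int.from_bytes(data[i:i + 2], "big")))
            i += 2
        else:
            result.append(inverse[code])
    return "".join(result)

table = build_table(["bil", "im", " ", "."])   # toy syllable list, not the real table
packed = compress(["bil", "im", "?"], table)   # "?" is out of table -> flag path
assert decompress(packed, table) == "bilim?"
```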

Mathematics ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. 1059
Author(s):  
Matea Ignatoski ◽  
Jonatan Lerga ◽  
Ljubiša Stanković ◽  
Miloš Daković

The rapid growth in the amount of data in the digital world creates a need for data compression, that is, reducing the number of bits needed to represent a text file, an image, audio, or video content. Compressing data saves storage capacity and speeds up data transmission. In this paper, we focus on text compression and compare algorithms (in particular, entropy-based arithmetic coding and the dictionary-based Lempel–Ziv–Welch (LZW) method) across different languages (Croatian, Finnish, Hungarian, Czech, Italian, French, German, and English). The main goal is to answer the question: "How does the language of a text affect the compression ratio?" The results indicate that the compression ratio is affected by the size of the language's alphabet and by the size and type of the text. For example, The European Green Deal was compressed by 75.79%, 76.17%, 77.33%, 76.84%, 73.25%, 74.63%, 75.14%, and 74.51% using the LZW algorithm, and by 72.54%, 71.47%, 72.87%, 73.43%, 69.62%, 69.94%, 72.42%, and 72% using the arithmetic algorithm for the English, German, French, Italian, Czech, Hungarian, Finnish, and Croatian versions, respectively.
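To make the reported percentages concrete, the following is a minimal byte-level LZW sketch together with the "savings = 1 − compressed/original" measure assumed here; the bit-packing convention and the sample string are illustrative, not the paper's exact experimental setup.

```python
# Minimal LZW sketch used to illustrate how compression savings like those
# reported above can be measured; the fixed-width bit packing and the sample
# string are illustrative assumptions.

def lzw_compress(text):
    """Return a list of integer codes for `text` using byte-level LZW."""
    data = text.encode("utf-8")
    dictionary = {bytes([i]): i for i in range(256)}
    w, codes = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc
        else:
            codes.append(dictionary[w])
            dictionary[wc] = len(dictionary)   # add the new phrase
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes

def savings(text):
    codes = lzw_compress(text)
    # Assume every code is written with enough bits for the largest code used.
    bits_per_code = max(9, max(codes).bit_length())
    compressed_bits = len(codes) * bits_per_code
    original_bits = len(text.encode("utf-8")) * 8
    return 1 - compressed_bits / original_bits

print(f"savings: {savings('tobeornottobeortobeornot' * 40):.2%}")
```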


2020 ◽  
Vol 27 (1) ◽  
Author(s):  
MB Ibrahim ◽  
KA Gbolagade

Data compression is the science and art of presenting information in a compact form. This compact representation is obtained by recognizing and exploiting structures that exist in the data. The Lempel-Ziv-Welch (LZW) algorithm is known to be one of the best text compressors, achieving a high degree of compression on text files with many redundancies: the greater the redundancy, the greater the compression achieved. In this paper, the LZW algorithm is enhanced to achieve a higher degree of compression, without compromising its performance, through the introduction of the Chinese Remainder Theorem (CRT). Compression time and compression ratio were used as performance metrics. Simulations were carried out in MATLAB on five text files of varying sizes to determine the efficiency of the proposed CRT-LZW technique. The new technique opens a path to compressing data faster than traditional LZW. The results show that CRT-LZW performs better than LZW in terms of computational time, by 0.12 s to 15.15 s, while the compression ratio remains the same at 2.56%. The proposed technique also achieves better compression times than previous work implementing LZW-RNS, by 0.12 s to 2.86 s in one study and by 0.12 s to 0.14 s in another.
Keywords: data compression, Lempel-Ziv-Welch (LZW) algorithm, enhancement, Chinese Remainder Theorem (CRT), text files.
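The CRT step that such a pipeline might apply to LZW codewords can be illustrated as below; the moduli set and the place of this step in the overall CRT-LZW design are assumptions for illustration, not the authors' exact scheme.

```python
# Hedged sketch of the Chinese Remainder Theorem step a CRT-enhanced LZW
# pipeline might apply to its integer codewords: each codeword is split into
# small residues modulo pairwise-coprime bases and later reconstructed.
# The moduli {257, 256} are an illustrative assumption (their product, 65792,
# covers every 16-bit codeword); the paper's design may differ.

from math import prod

MODULI = (257, 256)

def to_residues(code):
    return tuple(code % m for m in MODULI)

def from_residues(residues):
    """Reconstruct the original codeword with the standard CRT formula."""
    M = prod(MODULI)
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(..., -1, m) is the modular inverse
    return x % M

code = 40961                           # e.g. an LZW dictionary index
assert from_residues(to_residues(code)) == code
```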


This study implements the adaptive Shannon–Fano data compression algorithm on character input data and investigates the compression ratio, i.e., the ratio between the number of data bits before and after compression. The resulting program is evaluated with black-box testing, by measuring how the number of character variants and the number of character types affect the compression ratio, and by verifying correctness with the Mean Square Error (MSE) method. The characteristics of the application are described by processing collections of characters that differ in character type, variant, and count. The study presents the algorithms that support the construction of an adaptive Shannon–Fano compression application. The length of the character input determines the variant value, the compression ratio, and the number of input character types. The test results show no errors when the original text input is compared with the decompression output. A higher appearance frequency of a character yields a greater compression ratio for the resulting file, while a higher number of input character types yields a lower compression ratio; this indicates that the proposed method improves the effectiveness and efficiency of real-time data compression.
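For readers unfamiliar with the underlying coding step, the following is a simplified static Shannon–Fano sketch; the adaptive variant studied in the paper updates frequencies on the fly, and only the recursive frequency-splitting shared by both is shown here.

```python
# Simplified static Shannon-Fano sketch: sort symbols by frequency, split the
# list where the cumulative frequency reaches half the total, prefix "0" to one
# half and "1" to the other, and recurse. Illustration only, not the paper's
# adaptive implementation.

from collections import Counter

def shannon_fano(symbols_with_freq):
    """Return {symbol: bitstring} built by recursive frequency splitting."""
    if len(symbols_with_freq) == 1:              # degenerate one-symbol alphabet
        return {symbols_with_freq[0][0]: "0"}

    def split(items):
        if len(items) == 1:
            return {items[0][0]: ""}
        total, running, cut = sum(f for _, f in items), 0, 1
        for i, (_, f) in enumerate(items):
            running += f
            if running >= total / 2:
                cut = max(1, min(i + 1, len(items) - 1))
                break
        codes = {s: "0" + c for s, c in split(items[:cut]).items()}
        codes.update({s: "1" + c for s, c in split(items[cut:]).items()})
        return codes

    return split(sorted(symbols_with_freq, key=lambda p: -p[1]))

text = "this is an example of shannon fano coding"
codebook = shannon_fano(list(Counter(text).items()))
encoded = "".join(codebook[ch] for ch in text)
print(len(encoded), "bits vs", 8 * len(text), "bits uncompressed")
```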


Author(s):  
Jung Hyun Bae ◽  
Ahmed Abotabl ◽  
Hsien-Ping Lin ◽  
Kee-Bong Song ◽  
Jungwon Lee

A 5G new radio cellular system is characterized by three main usage scenarios, enhanced mobile broadband (eMBB), ultra-reliable and low-latency communications (URLLC), and massive machine-type communications, which require improved throughput, latency, and reliability compared with a 4G system. This overview paper discusses key characteristics of the 5G channel coding schemes, which are mainly designed for the eMBB scenario as well as for partial support of the URLLC scenario with a focus on low latency. Two capacity-achieving channel coding schemes, low-density parity-check (LDPC) codes and polar codes, have been adopted for 5G; the former is used for user data and the latter for control information. As the coding scheme for data, 5G LDPC codes are designed to support high throughput, variable code rates and lengths, and hybrid automatic repeat request, in addition to good error-correcting capability. 5G polar codes, as the coding scheme for control, are designed to perform well at short block lengths while addressing the latency of successive cancellation decoding.
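As a toy illustration of the parity-check principle behind LDPC codes, the snippet below checks a received word against a small binary matrix H; the (7,4) matrix is for intuition only and is unrelated to the two large quasi-cyclic base graphs actually used in 5G NR.

```python
# Toy illustration of the parity-check principle behind LDPC codes: a sparse
# binary matrix H defines the code, and a received word is a valid codeword
# exactly when every parity check (row of H) is satisfied modulo 2.

H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def syndrome(word):
    """Return the parity-check results; all zeros means a valid codeword."""
    return [sum(h * c for h, c in zip(row, word)) % 2 for row in H]

codeword = [1, 0, 1, 0, 1, 0, 1]   # satisfies all three checks
print(syndrome(codeword))          # [0, 0, 0]
received = codeword[:]
received[2] ^= 1                   # flip one bit to emulate a channel error
print(syndrome(received))          # nonzero syndrome flags the error
```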


2020 ◽  
Vol 37 (2) ◽  
pp. 125-139
Author(s):  
John Habron ◽  
Liesl van der Merwe

This article is a narrative inquiry into the lived spiritual experiences of students participating in Dalcroze Eurhythmics training. Previous studies have located Jaques-Dalcroze’s own writings and thought within the context of spirituality and have explored the spiritual experiences of Dalcroze teachers, but students’ perspectives remain to be investigated. We interviewed seven students, broadly defined as anyone currently attending regular Dalcroze training, or who has recently attended Dalcroze courses and still considers themselves a Dalcroze student. Various strategies for narrative data analysis were synthesised into our own coding scheme. Themes emerged from the data analysis: situation, continuity, personal interaction, social interaction, and significant moments. These themes helped us construct a fictive conversation between the participants, using direct quotations from the interviews. Implications for practice focus on what inhibits and what promotes experiences of spirituality in the Dalcroze class. This research will be relevant to music educators, as it gives clear, evidence-based guidelines on how opportunities for spirituality can be created in the Dalcroze classroom. It also offers an original synthesis of existing coding schemes for other researchers undertaking narrative inquiries.


2013 ◽  
Vol 842 ◽  
pp. 712-716
Author(s):  
Qi Hong ◽  
Xiao Lei Lu

As a lossless data compression method, Huffman coding is widely used in text compression. Nevertheless, the traditional approach has some deficiencies: applying the same compression to all characters may overlook the particularity of keywords and special statements as well as the regularity of some statements. To address this, a new data compression algorithm based on semantic analysis is proposed in this paper. The new method, which takes C-language keywords as its basic coding units, is designed for compressing C-language source files. Experimental results show that the compression ratio is improved by roughly 150 percent in this way. The method can also be extended to text compression for other constrained languages.
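The idea of treating C keywords as single coding units before Huffman coding can be sketched as follows; the keyword subset, tokenizer, and example source are simplified assumptions rather than the paper's implementation.

```python
# Hedged sketch of the idea described above: treat C keywords as single symbols
# before Huffman coding so that frequent keywords get short codes. The keyword
# list and tokenizer are simplified assumptions for illustration.

import heapq
import itertools
import re
from collections import Counter

C_KEYWORDS = {"int", "return", "if", "else", "for", "while", "void", "char"}

def tokenize(source):
    """Split C source into keyword tokens and single characters."""
    parts = re.split(r"(\b(?:" + "|".join(C_KEYWORDS) + r")\b)", source)
    tokens = []
    for part in parts:
        if part in C_KEYWORDS:
            tokens.append(part)        # whole keyword as one symbol
        else:
            tokens.extend(part)        # everything else character by character
    return tokens

def huffman_codes(tokens):
    """Build a prefix code from token frequencies (standard Huffman)."""
    counter = itertools.count()        # tie-breaker keeps heap tuples comparable
    heap = [(freq, next(counter), {sym: ""}) for sym, freq in Counter(tokens).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]

source = "int main(void) { int x = 0; if (x) return x; return 0; }"
tokens = tokenize(source)
codes = huffman_codes(tokens)
encoded_bits = sum(len(codes[t]) for t in tokens)
print(encoded_bits, "bits vs", 8 * len(source), "bits for plain 8-bit text")
```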


Author(s):  
Fleur Deken ◽  
Maaike S. Kleinsmann ◽  
Marco Aurisicchio ◽  
Rob B. Bracewell ◽  
Kristina Lauche

This study investigated the processes in novice–expert consultation meetings in an organizational context to identify ‘what’ is done and ‘how’ by novices and experts in consultation discourses. A conceptual model was developed for studying novice–expert design discourses at a fine level of resolution. An empirical study was performed at Rolls-Royce Aerospace Engineering. In total, seven audio recordings were captured of meetings between trainees (novices) and expert designers, which occurred over the course of three trainee teams’ design projects. Relations were investigated between two coding schemes, namely the activity coding scheme and the conversational flow coding scheme. Certain activities in the meetings were more often performed by either novices or experts, whereas other activities were more often performed collaboratively. Based on the results, implications for design engineering practitioners were derived and suggestions for further research are provided.


Author(s):  
Kai Wang ◽  
Peiwen Li ◽  
Ara Arabyan

The round-trip efficiency of compressed air energy storage is greatly limited by the significant increase in the temperature of the compressed air (and the resulting heat loss) in high-ratio adiabatic compression. This paper introduces a multi-stage compression scheme with low-compression-ratio compressors and inter-compressor natural convection cooling, resulting in a quasi-isothermal compression process that can be useful for large-scale energy storage. When many low-pressure-ratio compressors operate in series, a high overall compression ratio can be achieved with high efficiency. The quasi-isothermally compressed air can then be expanded adiabatically in turbines to generate power with the addition of thermal energy from either fuel or a solar thermal source. The paper presents mathematical models of such an energy storage system and discusses its round-trip performance under different operating schemes.
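A back-of-the-envelope ideal-gas comparison illustrates why quasi-isothermal compression helps; the inlet temperature, overall pressure ratio, and heat capacity ratio below are illustrative assumptions, not values from the paper.

```python
# Ideal-gas comparison of compression work per mole: isothermal compression
# needs R*T*ln(r), while reversible adiabatic compression needs more work and
# leaves the gas hot. The inputs (300 K inlet, overall ratio 100, gamma = 1.4)
# are illustrative assumptions.

from math import log

R, T1, gamma, r = 8.314, 300.0, 1.4, 100.0   # J/(mol K), K, -, overall pressure ratio

w_isothermal = R * T1 * log(r)                                        # J per mole
w_adiabatic = (gamma / (gamma - 1)) * R * T1 * (r ** ((gamma - 1) / gamma) - 1)
T2_adiabatic = T1 * r ** ((gamma - 1) / gamma)                        # outlet temperature

print(f"isothermal work : {w_isothermal / 1e3:6.1f} kJ/mol")
print(f"adiabatic work  : {w_adiabatic / 1e3:6.1f} kJ/mol (outlet ~{T2_adiabatic:.0f} K)")
```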


2000 ◽  
Vol 10 (01n02) ◽  
pp. 101-111 ◽  
Author(s):  
SUNGPACK HONG ◽  
TAEWHAN KIM ◽  
UNNI NARAYANAN ◽  
KI-SEOK CHUNG

This paper proposes a new bus-invert coding scheme for reducing the number of bus transitions. Unlike previous schemes, in which the entire bus or a single subset of the bus lines is considered for bus-invert coding, the proposed scheme partitions the bus lines and considers each partitioned group independently for bus-invert coding, maximizing the reduction in the total number of bus transitions. Experimental results show that the decomposed bus-invert coding scheme reduces the total number of bus transitions by 47.2% and 11.9% on average compared with the conventional and partial bus-invert coding schemes, respectively.
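A minimal sketch of partitioned bus-invert coding is given below; the 8-bit bus split into two 4-bit groups is an illustrative assumption, and the paper's partitioning strategy may differ.

```python
# Sketch of (partitioned) bus-invert coding: for each group of bus lines, the
# data word is inverted whenever that costs fewer transitions relative to the
# previous bus state, at the price of one extra invert line per group.

def hamming(a, b, width):
    return bin((a ^ b) & ((1 << width) - 1)).count("1")

def bus_invert_group(prev, data, width):
    """Return (value driven on the lines, invert flag) for one group."""
    mask = (1 << width) - 1
    if hamming(prev, data, width) > width // 2:
        return (~data) & mask, 1       # inverting saves transitions
    return data & mask, 0

def partitioned_bus_invert(prev_word, word, group_widths):
    """Apply bus-invert coding independently to each partition of the bus."""
    out, shift = [], 0
    for w in group_widths:
        prev_g = (prev_word >> shift) & ((1 << w) - 1)
        data_g = (word >> shift) & ((1 << w) - 1)
        out.append(bus_invert_group(prev_g, data_g, w))
        shift += w
    return out

prev, curr = 0b00001111, 0b11110000
print(partitioned_bus_invert(prev, curr, [4, 4]))
# -> [(15, 1), (0, 1)]: every bit would toggle, so both groups are sent inverted.
```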

