A Syllable-Based Technique for Uyghur Text Compression

Information ◽  
2020 ◽  
Vol 11 (3) ◽  
pp. 172 ◽  
Author(s):  
Wayit Abliz ◽  
Hao Wu ◽  
Maihemuti Maimaiti ◽  
Jiamila Wushouer ◽  
Kahaerjiang Abiderexiti ◽  
...  

To improve the utilization of text storage resources and the efficiency of data transmission, we propose two syllable-based Uyghur text compression coding schemes. First, according to statistics of syllable coverage in the corpus text, we constructed 12-bit and 16-bit syllable code tables and added commonly used symbols, such as punctuation marks and ASCII characters, to the tables. To enable the coding scheme to process Uyghur text mixed with symbols from other languages, we introduced a flag code during compression to distinguish Unicode encodings that were not in the code table. The experiments showed that the 12-bit scheme achieved an average compression ratio of 0.3 on Uyghur texts smaller than 4 KB and that the 16-bit scheme achieved an average compression ratio of 0.5 on texts smaller than 2 KB. Our schemes outperformed GZip, BZip2, and the LZW algorithm on short texts and can be effectively applied to compressing short Uyghur texts in storage and transmission applications.
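The general mechanism described above (a fixed-width syllable code table plus a flag code for out-of-table symbols) can be sketched as follows. This is a minimal illustration only: the table contents, the 16-bit width, the reserved flag value, and the toy syllables are assumptions, not the authors' published tables or packing format.

```python
# Illustrative sketch of a fixed-width code-table compressor with a flag code
# for symbols missing from the table. The table, flag value, and 16-bit width
# are assumptions for illustration, not the paper's exact scheme.

FLAG = 0xFFFF  # reserved codeword signalling "raw Unicode code point follows"

def build_table(frequent_syllables):
    """Map each frequent syllable/symbol to a 16-bit codeword (0 .. 0xFFFE)."""
    return {syl: i for i, syl in enumerate(frequent_syllables)}

def compress(syllables, table):
    out = bytearray()
    for s in syllables:
        if s in table:
            out += table[s].to_bytes(2, "big")
        else:
            # Out-of-table symbol: emit the flag, then the raw code point of
            # each character (BMP code points assumed for brevity).
            for ch in s:
                out += FLAG.to_bytes(2, "big")
                out += ord(ch).to_bytes(2, "big")
    return bytes(out)

def decompress(data, table):
    inverse = {v: k for k, v in table.items()}
    result, i = [], 0
    while i < len(data):
        code = int.from_bytes(data[i:i + 2], "big")
        i += 2
        if code == FLAG:
            result.append(chr(int.from_bytes(data[i:i + 2], "big")))
            i += 2
        else:
            result.append(inverse[code])
    return "".join(result)

table = build_table(["bil", "im", " ", "."])   # toy syllable list, not the real table
packed = compress(["bil", "im", "?"], table)   # "?" is out of table -> flag path
assert decompress(packed, table) == "bilim?"
```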

Mathematics ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. 1059
Author(s):  
Matea Ignatoski ◽  
Jonatan Lerga ◽  
Ljubiša Stanković ◽  
Miloš Daković

The rapid growth in the amount of data in the digital world creates a need for data compression, that is, reducing the number of bits needed to represent a text file, an image, audio, or video content. Compressing data saves storage capacity and speeds up data transmission. In this paper, we focus on text compression and compare algorithms (in particular, entropy-based arithmetic coding and the dictionary-based Lempel–Ziv–Welch (LZW) method) across different languages (Croatian, Finnish, Hungarian, Czech, Italian, French, German, and English). The main goal is to answer the question: "How does the language of a text affect the compression ratio?" The results indicate that the compression ratio is affected by the size of the language's alphabet and by the size and type of the text. For example, The European Green Deal was compressed by 75.79%, 76.17%, 77.33%, 76.84%, 73.25%, 74.63%, 75.14%, and 74.51% using the LZW algorithm, and by 72.54%, 71.47%, 72.87%, 73.43%, 69.62%, 69.94%, 72.42%, and 72% using the arithmetic algorithm for the English, German, French, Italian, Czech, Hungarian, Finnish, and Croatian versions, respectively.
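To make the reported percentages concrete, the following is a minimal byte-level LZW sketch together with the "savings = 1 − compressed/original" measure assumed here; the bit-packing convention and the sample string are illustrative, not the paper's exact experimental setup.

```python
# Minimal LZW sketch used to illustrate how compression savings like those
# reported above can be measured; the fixed-width bit packing and the sample
# string are illustrative assumptions.

def lzw_compress(text):
    """Return a list of integer codes for `text` using byte-level LZW."""
    data = text.encode("utf-8")
    dictionary = {bytes([i]): i for i in range(256)}
    w, codes = b"", []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc
        else:
            codes.append(dictionary[w])
            dictionary[wc] = len(dictionary)   # add the new phrase
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes

def savings(text):
    codes = lzw_compress(text)
    # Assume every code is written with enough bits for the largest code used.
    bits_per_code = max(9, max(codes).bit_length())
    compressed_bits = len(codes) * bits_per_code
    original_bits = len(text.encode("utf-8")) * 8
    return 1 - compressed_bits / original_bits

print(f"savings: {savings('tobeornottobeortobeornot' * 40):.2%}")
```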


2020 ◽  
Vol 27 (1) ◽  
Author(s):  
MB Ibrahim ◽  
KA Gbolagade

Data compression is the science and art of presenting information in a compact form. This compact representation is obtained by recognizing and exploiting structures that exist in the data. The Lempel-Ziv-Welch (LZW) algorithm is known to be one of the best text compressors, achieving a high degree of compression on text files with many redundancies: the greater the redundancy, the greater the compression achieved. In this paper, the LZW algorithm is enhanced to achieve a higher degree of compression, without compromising its performance, through the introduction of the Chinese Remainder Theorem (CRT). Compression time and compression ratio were used as performance metrics. Simulations were carried out in MATLAB on five text files of varying sizes to determine the efficiency of the proposed CRT-LZW technique. The new technique opens a path to compressing data faster than traditional LZW. The results show that CRT-LZW performs better than LZW in terms of computational time, by 0.12 s to 15.15 s, while the compression ratio remains the same at 2.56%. The proposed technique also achieves better compression times than previous work implementing LZW-RNS, by 0.12 s to 2.86 s in one study and by 0.12 s to 0.14 s in another.
Keywords: data compression, Lempel-Ziv-Welch (LZW) algorithm, enhancement, Chinese Remainder Theorem (CRT), text files.
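The CRT step that such a pipeline might apply to LZW codewords can be illustrated as below; the moduli set and the place of this step in the overall CRT-LZW design are assumptions for illustration, not the authors' exact scheme.

```python
# Hedged sketch of the Chinese Remainder Theorem step a CRT-enhanced LZW
# pipeline might apply to its integer codewords: each codeword is split into
# small residues modulo pairwise-coprime bases and later reconstructed.
# The moduli {257, 256} are an illustrative assumption (their product, 65792,
# covers every 16-bit codeword); the paper's design may differ.

from math import prod

MODULI = (257, 256)

def to_residues(code):
    return tuple(code % m for m in MODULI)

def from_residues(residues):
    """Reconstruct the original codeword with the standard CRT formula."""
    M = prod(MODULI)
    x = 0
    for r, m in zip(residues, MODULI):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)   # pow(..., -1, m) is the modular inverse
    return x % M

code = 40961                           # e.g. an LZW dictionary index
assert from_residues(to_residues(code)) == code
```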


This study implements the adaptive Shannon–Fano data compression algorithm on character input data and investigates the compression ratio, i.e., the ratio between the number of data bits before and after compression. The resulting program is evaluated with black-box testing, by measuring how the number of character variants and the number of character types affect the compression ratio, and by verifying correctness with the Mean Square Error (MSE) method. The characteristics of the application are described by processing collections of characters that differ in character type, variant, and count. The study presents the algorithms that support the construction of an adaptive Shannon–Fano compression application. The length of the character input determines the variant value, the compression ratio, and the number of input character types. The test results show no errors when the original text input is compared with the decompression output. A higher appearance frequency of a character yields a greater compression ratio for the resulting file, while a higher number of input character types yields a lower compression ratio; this indicates that the proposed method improves the effectiveness and efficiency of real-time data compression.
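For readers unfamiliar with the underlying coding step, the following is a simplified static Shannon–Fano sketch; the adaptive variant studied in the paper updates frequencies on the fly, and only the recursive frequency-splitting shared by both is shown here.

```python
# Simplified static Shannon-Fano sketch: sort symbols by frequency, split the
# list where the cumulative frequency reaches half the total, prefix "0" to one
# half and "1" to the other, and recurse. Illustration only, not the paper's
# adaptive implementation.

from collections import Counter

def shannon_fano(symbols_with_freq):
    """Return {symbol: bitstring} built by recursive frequency splitting."""
    if len(symbols_with_freq) == 1:              # degenerate one-symbol alphabet
        return {symbols_with_freq[0][0]: "0"}

    def split(items):
        if len(items) == 1:
            return {items[0][0]: ""}
        total, running, cut = sum(f for _, f in items), 0, 1
        for i, (_, f) in enumerate(items):
            running += f
            if running >= total / 2:
                cut = max(1, min(i + 1, len(items) - 1))
                break
        codes = {s: "0" + c for s, c in split(items[:cut]).items()}
        codes.update({s: "1" + c for s, c in split(items[cut:]).items()})
        return codes

    return split(sorted(symbols_with_freq, key=lambda p: -p[1]))

text = "this is an example of shannon fano coding"
codebook = shannon_fano(list(Counter(text).items()))
encoded = "".join(codebook[ch] for ch in text)
print(len(encoded), "bits vs", 8 * len(text), "bits uncompressed")
```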


Author(s):  
Jung Hyun Bae ◽  
Ahmed Abotabl ◽  
Hsien-Ping Lin ◽  
Kee-Bong Song ◽  
Jungwon Lee

A 5G new radio cellular system is characterized by three main usage scenarios, enhanced mobile broadband (eMBB), ultra-reliable and low-latency communications (URLLC), and massive machine-type communications, which require improved throughput, latency, and reliability compared with a 4G system. This overview paper discusses key characteristics of the 5G channel coding schemes, which are mainly designed for the eMBB scenario as well as for partial support of the URLLC scenario with a focus on low latency. Two capacity-achieving channel coding schemes, low-density parity-check (LDPC) codes and polar codes, have been adopted for 5G; the former is used for user data and the latter for control information. As the coding scheme for data, 5G LDPC codes are designed to support high throughput, variable code rates and lengths, and hybrid automatic repeat request, in addition to good error-correcting capability. 5G polar codes, as the coding scheme for control, are designed to perform well at short block lengths while addressing the latency of successive cancellation decoding.
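As a toy illustration of the parity-check principle behind LDPC codes, the snippet below checks a received word against a small binary matrix H; the (7,4) matrix is for intuition only and is unrelated to the two large quasi-cyclic base graphs actually used in 5G NR.

```python
# Toy illustration of the parity-check principle behind LDPC codes: a sparse
# binary matrix H defines the code, and a received word is a valid codeword
# exactly when every parity check (row of H) is satisfied modulo 2.

H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def syndrome(word):
    """Return the parity-check results; all zeros means a valid codeword."""
    return [sum(h * c for h, c in zip(row, word)) % 2 for row in H]

codeword = [1, 0, 1, 0, 1, 0, 1]   # satisfies all three checks
print(syndrome(codeword))          # [0, 0, 0]
received = codeword[:]
received[2] ^= 1                   # flip one bit to emulate a channel error
print(syndrome(received))          # nonzero syndrome flags the error
```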


2020 ◽  
Vol 37 (2) ◽  
pp. 125-139
Author(s):  
John Habron ◽  
Liesl van der Merwe

This article is a narrative inquiry into the lived spiritual experiences of students participating in Dalcroze Eurhythmics training. Previous studies have located Jaques-Dalcroze’s own writings and thought within the context of spirituality and have explored the spiritual experiences of Dalcroze teachers, but students’ perspectives remain to be investigated. We interviewed seven students, broadly defined as anyone currently attending regular Dalcroze training, or who has recently attended Dalcroze courses and still considers themselves a Dalcroze student. Various strategies for narrative data analysis were synthesised into our own coding scheme. Themes emerged from the data analysis: situation, continuity, personal interaction, social interaction, and significant moments. These themes helped us construct a fictive conversation between the participants, using direct quotations from the interviews. Implications for practice focus on what inhibits and what promotes experiences of spirituality in the Dalcroze class. This research will be relevant to music educators, as it gives clear, evidence-based guidelines on how opportunities for spirituality can be created in the Dalcroze classroom. It also offers an original synthesis of existing coding schemes for other researchers undertaking narrative inquiries.


2013 ◽  
Vol 842 ◽  
pp. 712-716
Author(s):  
Qi Hong ◽  
Xiao Lei Lu

As a lossless data compression method, Huffman coding is widely used in text compression. Nevertheless, the traditional approach has some deficiencies: applying the same compression to all characters may overlook the particularity of keywords and special statements as well as the regularity of some statements. To address this, a new data compression algorithm based on semantic analysis is proposed in this paper. The new method, which takes C-language keywords as its basic coding units, is designed for compressing C-language source files. Experimental results show that the compression ratio is improved by roughly 150 percent in this way. The method can also be extended to text compression for other constrained languages.
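The idea of treating C keywords as single coding units before Huffman coding can be sketched as follows; the keyword subset, tokenizer, and example source are simplified assumptions rather than the paper's implementation.

```python
# Hedged sketch of the idea described above: treat C keywords as single symbols
# before Huffman coding so that frequent keywords get short codes. The keyword
# list and tokenizer are simplified assumptions for illustration.

import heapq
import itertools
import re
from collections import Counter

C_KEYWORDS = {"int", "return", "if", "else", "for", "while", "void", "char"}

def tokenize(source):
    """Split C source into keyword tokens and single characters."""
    parts = re.split(r"(\b(?:" + "|".join(C_KEYWORDS) + r")\b)", source)
    tokens = []
    for part in parts:
        if part in C_KEYWORDS:
            tokens.append(part)        # whole keyword as one symbol
        else:
            tokens.extend(part)        # everything else character by character
    return tokens

def huffman_codes(tokens):
    """Build a prefix code from token frequencies (standard Huffman)."""
    counter = itertools.count()        # tie-breaker keeps heap tuples comparable
    heap = [(freq, next(counter), {sym: ""}) for sym, freq in Counter(tokens).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]

source = "int main(void) { int x = 0; if (x) return x; return 0; }"
tokens = tokenize(source)
codes = huffman_codes(tokens)
encoded_bits = sum(len(codes[t]) for t in tokens)
print(encoded_bits, "bits vs", 8 * len(source), "bits for plain 8-bit text")
```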


Author(s):  
Fleur Deken ◽  
Maaike S. Kleinsmann ◽  
Marco Aurisicchio ◽  
Rob B. Bracewell ◽  
Kristina Lauche

This study investigated the processes in novice–expert consultation meetings in an organizational context to identify ‘what’ is done and ‘how’ by novices and experts in consultation discourses. A conceptual model was developed for studying novice–expert design discourses at a fine level of resolution. An empirical study was performed at Rolls-Royce Aerospace Engineering. In total, seven audio recordings were captured of meetings between trainees (novices) and expert designers, which occurred over the course of three trainee teams’ design projects. Relations were investigated between two coding schemes, namely the activity coding scheme and the conversational flow coding scheme. Certain activities in the meetings were more often performed by either novices or experts, whereas other activities were more often performed collaboratively. Based on the results, implications for design engineering practitioners were derived and suggestions for further research are provided.


Author(s):  
Kai Wang ◽  
Peiwen Li ◽  
Ara Arabyan

The round-trip efficiency of compressed air energy storage is greatly limited by the significant increase in the temperature of the compressed air (and the resulting heat loss) in high-ratio adiabatic compression. This paper introduces a multi-stage compression scheme with low-compression-ratio compressors and inter-compressor natural convection cooling, resulting in a quasi-isothermal compression process that can be useful for large-scale energy storage. When many low-pressure-ratio compressors operate in series, a high overall compression ratio can be achieved with high efficiency. The quasi-isothermally compressed air can then be expanded adiabatically in turbines to generate power with the addition of thermal energy from either fuel or a solar thermal source. The paper presents mathematical models of such an energy storage system and discusses its round-trip performance under different operating schemes.
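A back-of-the-envelope ideal-gas comparison illustrates why quasi-isothermal compression helps; the inlet temperature, overall pressure ratio, and heat capacity ratio below are illustrative assumptions, not values from the paper.

```python
# Ideal-gas comparison of compression work per mole: isothermal compression
# needs R*T*ln(r), while reversible adiabatic compression needs more work and
# leaves the gas hot. The inputs (300 K inlet, overall ratio 100, gamma = 1.4)
# are illustrative assumptions.

from math import log

R, T1, gamma, r = 8.314, 300.0, 1.4, 100.0   # J/(mol K), K, -, overall pressure ratio

w_isothermal = R * T1 * log(r)                                        # J per mole
w_adiabatic = (gamma / (gamma - 1)) * R * T1 * (r ** ((gamma - 1) / gamma) - 1)
T2_adiabatic = T1 * r ** ((gamma - 1) / gamma)                        # outlet temperature

print(f"isothermal work : {w_isothermal / 1e3:6.1f} kJ/mol")
print(f"adiabatic work  : {w_adiabatic / 1e3:6.1f} kJ/mol (outlet ~{T2_adiabatic:.0f} K)")
```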


2000 ◽  
Vol 10 (01n02) ◽  
pp. 101-111 ◽  
Author(s):  
SUNGPACK HONG ◽  
TAEWHAN KIM ◽  
UNNI NARAYANAN ◽  
KI-SEOK CHUNG

This paper proposes a new bus-invert coding scheme for reducing the number of bus transitions. Unlike previous schemes, in which the entire bus or a single subset of the bus lines is considered for bus-invert coding, the proposed scheme partitions the bus lines and considers each partitioned group independently for bus-invert coding, maximizing the reduction in the total number of bus transitions. Experimental results show that the decomposed bus-invert coding scheme reduces the total number of bus transitions by 47.2% and 11.9% on average compared with the conventional and partial bus-invert coding schemes, respectively.
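A minimal sketch of partitioned bus-invert coding is given below; the 8-bit bus split into two 4-bit groups is an illustrative assumption, and the paper's partitioning strategy may differ.

```python
# Sketch of (partitioned) bus-invert coding: for each group of bus lines, the
# data word is inverted whenever that costs fewer transitions relative to the
# previous bus state, at the price of one extra invert line per group.

def hamming(a, b, width):
    return bin((a ^ b) & ((1 << width) - 1)).count("1")

def bus_invert_group(prev, data, width):
    """Return (value driven on the lines, invert flag) for one group."""
    mask = (1 << width) - 1
    if hamming(prev, data, width) > width // 2:
        return (~data) & mask, 1       # inverting saves transitions
    return data & mask, 0

def partitioned_bus_invert(prev_word, word, group_widths):
    """Apply bus-invert coding independently to each partition of the bus."""
    out, shift = [], 0
    for w in group_widths:
        prev_g = (prev_word >> shift) & ((1 << w) - 1)
        data_g = (word >> shift) & ((1 << w) - 1)
        out.append(bus_invert_group(prev_g, data_g, w))
        shift += w
    return out

prev, curr = 0b00001111, 0b11110000
print(partitioned_bus_invert(prev, curr, [4, 4]))
# -> [(15, 1), (0, 1)]: every bit would toggle, so both groups are sent inverted.
```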

