An Enhanced Short Text Compression Scheme for Smart Devices

2010 ◽ Vol 5 (1)
Author(s): Md. Rafiqul Islam ◽ S. A. Ahsan Rajon

2018 ◽ Vol 27 (2) ◽ pp. 48-57
Author(s): Duha Amir Sultan

Computers ◽ 2019 ◽ Vol 8 (1) ◽ pp. 19
Author(s): Maha Alamri ◽ William Teahan

This paper proposes an automatic correction system that detects and corrects dyslexic errors in Arabic text. The system uses a language model based on the Prediction by Partial Matching (PPM) text compression scheme to generate possible alternatives for each misspelled word. The candidate list is generated with edit operations (insertion, deletion, substitution and transposition), and the correct alternative for each misspelled word is chosen on the basis of the compression codelength of the trigram. The system is compared with widely used Arabic word processing software and the Farasa tool. It achieved good results compared with the other tools, with a recall of 43%, a precision of 89%, an F1 score of 58% and an accuracy of 81%.
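
The two steps described in the abstract (edit-operation candidates, then selection by trigram codelength) can be illustrated with a small sketch. It is not the authors' system: the order-2 character model with add-one smoothing is a simple stand-in for PPM (which blends several context orders via escape probabilities), and the names `edits1`, `CharModel` and `correct`, the toy corpus and the Latin alphabet are hypothetical. For Arabic text one would pass an Arabic alphabet and an Arabic lexicon instead.

```python
import math
from collections import Counter, defaultdict


def edits1(word, alphabet):
    """All strings one insertion, deletion, substitution or transposition away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    substitutes = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + substitutes + inserts)


class CharModel:
    """Order-2 character model with add-one smoothing.

    Assumption: a simple stand-in for PPM, used only to produce codelengths.
    """

    def __init__(self, corpus, alphabet):
        # +1 reserves probability mass for symbols never seen in training.
        self.v = len(set(alphabet) | set(corpus)) + 1
        self.counts = defaultdict(Counter)
        padded = "  " + corpus
        for i in range(2, len(padded)):
            self.counts[padded[i - 2:i]][padded[i]] += 1

    def codelength(self, text):
        """Bits needed to encode text: sum of -log2 P(char | two previous chars)."""
        bits = 0.0
        padded = "  " + text
        for i in range(2, len(padded)):
            ctx = self.counts.get(padded[i - 2:i], Counter())
            p = (ctx[padded[i]] + 1) / (sum(ctx.values()) + self.v)
            bits -= math.log2(p)
        return bits


def correct(prev_word, word, next_word, model, lexicon, alphabet):
    """Pick the in-lexicon candidate whose word trigram has the shortest codelength."""
    candidates = {w for w in edits1(word, alphabet) if w in lexicon}
    if word in lexicon:
        candidates.add(word)
    if not candidates:
        return word  # nothing better to suggest
    # sorted() makes ties deterministic; min() keeps the first minimum.
    return min(sorted(candidates),
               key=lambda w: model.codelength(f"{prev_word} {w} {next_word}"))


ALPHABET = "abcdefghijklmnopqrstuvwxyz"        # hypothetical toy alphabet
corpus = "the cat sat on the mat the cat ate"  # hypothetical toy corpus
model = CharModel(corpus, ALPHABET)
lexicon = set(corpus.split())
print(correct("the", "xat", "sat", model, lexicon, ALPHABET))  # prints "cat"
```

Here "xat" has three in-lexicon substitution candidates ("cat", "mat", "sat"); "cat" wins because "the cat sat" is the cheapest trigram to encode under the toy model.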


2010 ◽ Vol 7 (1) ◽ pp. 123-131
Author(s): Hussein Al-Bahadili ◽ Shakir M. Hussain

Author(s): Raffaele Pizzolante ◽ Bruno Carpentieri ◽ Aniello Castiglione ◽ Arcangelo Castiglione ◽ Francesco Palmieri

Information ◽ 2020 ◽ Vol 11 (3) ◽ pp. 172
Author(s): Wayit Abliz ◽ Hao Wu ◽ Maihemuti Maimaiti ◽ Jiamila Wushouer ◽ Kahaerjiang Abiderexiti ◽ ...

To improve the utilization of text storage resources and the efficiency of data transmission, we propose two syllable-based Uyghur text compression coding schemes. First, based on statistics of syllable coverage in the corpus text, we constructed 12-bit and 16-bit syllable code tables and added commonly used symbols, such as punctuation marks and ASCII characters, to the code tables. To enable the coding schemes to process Uyghur text mixed with symbols from other languages, we introduced a flag code in the compression process to mark characters whose Unicode encodings are not in the code table. The experiments showed that the 12-bit coding scheme achieved an average compression ratio of 0.3 on Uyghur text smaller than 4 KB and that the 16-bit coding scheme achieved an average compression ratio of 0.5 on text smaller than 2 KB. Our compression schemes outperformed GZip, BZip2 and the LZW algorithm on short text and can be applied effectively to compressing Uyghur short text for storage and applications.
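
The flag-code idea can be illustrated with a fixed-width coder: every table entry (syllable or symbol) gets a 12-bit code, one code is reserved as the flag, and a character missing from the table is written as the flag followed by its raw Unicode code point. This is a minimal sketch, not the authors' coder. The reserved value `0xFFF`, the 21-bit code-point field, the greedy longest-match segmentation (a stand-in for real Uyghur syllabification), the helper names `build_table`, `encode` and `decode`, and the Latin-transliterated toy table are all hypothetical.

```python
FLAG = 0xFFF     # hypothetical: last 12-bit code reserved as the escape flag
CODE_BITS = 12   # width of every table code
CP_BITS = 21     # enough bits for any Unicode code point (max 0x10FFFF)


def build_table(units):
    """Assign consecutive 12-bit codes to syllables/symbols, keeping FLAG free."""
    if len(units) > FLAG:
        raise ValueError("too many units for a 12-bit table")
    return {unit: code for code, unit in enumerate(units)}


def encode(text, table):
    """Greedy longest-match coding; characters not in the table are escaped with FLAG."""
    max_len = max(map(len, table))
    acc, nbits, i = 0, 0, 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in table:
                fields = [(table[piece], CODE_BITS)]
                i += n
                break
        else:
            # Not in the table: flag code, then the raw code point.
            fields = [(FLAG, CODE_BITS), (ord(text[i]), CP_BITS)]
            i += 1
        for value, width in fields:
            acc = (acc << width) | value
            nbits += width
    pad = (-nbits) % 8  # zero-pad to whole bytes (always fewer than 8 bits)
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")


def decode(data, table):
    """Invert encode(): read 12-bit codes, expanding FLAG escapes."""
    inverse = {code: unit for unit, code in table.items()}
    acc = int.from_bytes(data, "big")
    total = len(data) * 8
    pos = 0
    out = []

    def read(width):
        nonlocal pos
        value = (acc >> (total - pos - width)) & ((1 << width) - 1)
        pos += width
        return value

    # Padding is at most 7 bits, so fewer than CODE_BITS left means we are done.
    while total - pos >= CODE_BITS:
        code = read(CODE_BITS)
        out.append(chr(read(CP_BITS)) if code == FLAG else inverse[code])
    return "".join(out)


table = build_table(["ki", "tab", "lar", " ", "."])  # hypothetical toy table
data = encode("kitablar.", table)
print(len(data), decode(data, table))  # prints 6 kitablar.
```

Four table hits cost 48 bits (6 bytes) against 9 bytes of UTF-8; an escaped character costs 33 bits, which is why such a scheme pays off only when most of the text is covered by the table.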

