Unconventional Methods in Voynich Manuscript Analysis

MENDEL ◽  
2019 ◽  
Vol 25 (1) ◽  
pp. 1-14 ◽  
Author(s):  
Ivan Zelinka ◽  
Oldrich Zmeskal ◽  
Leah Windsor ◽  
Zhiqiang Cai

This paper discusses the possible use of unconventional algorithms in the analysis and categorization of unknown text, including documents written in unknown languages. Scholars have identified about ten famous manuscripts, mostly encrypted or written in an unknown language. The most famous is the Voynich manuscript, an illustrated codex hand-written in an unknown language or writing system. Using carbon-dating methods, researchers determined its age as the early 15th century (between 1404 and 1438). Many professional and amateur cryptographers have studied the Voynich manuscript, including American and British code-breakers and cryptologists, and none has yet deciphered its meaning. While many hypotheses about the meaning and structure of the document exist, they have yet to be confirmed empirically. In this paper, we discuss two different kinds of unconventional approaches for handling manuscripts with unidentified writing systems and for determining whether their properties are characteristic of a natural language or only of a historical fake text.
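The abstract does not name the algorithms the authors use. Purely as a generic illustration of the kind of statistic often applied to the "natural language or fake?" question (not the authors' method), the sketch below computes character-level entropy and a Zipf-style rank-frequency profile for a transliterated sample; the sample tokens are EVA-style Voynich transliterations chosen for illustration.

```python
# Generic illustration only -- not the method described in the paper.
# Two common checks for whether an unknown transcription behaves like
# natural language: character-level entropy and a Zipf-style
# rank-frequency profile of word types.
import math
from collections import Counter

def char_entropy(text):
    """Shannon entropy (bits per character) of the character distribution."""
    counts = Counter(text.replace(' ', ''))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def zipf_profile(text, top=10):
    """Rank-frequency pairs of the most common word types."""
    freqs = Counter(text.split()).most_common(top)
    return [(rank, count) for rank, (word, count) in enumerate(freqs, start=1)]

sample = "daiin daiin chedy qokeedy daiin shedy chedy qokain"  # EVA-style tokens
print(char_entropy(sample))
print(zipf_profile(sample))
```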

2017 ◽  
Vol 24 (1) ◽  
pp. 123-154 ◽  
Author(s):  
FRANKLIN ỌLÁDIÍPỌ̀ ASAHIAH ◽  
ỌDẸ́TÚNJÍ ÀJÀDÍ ỌDẸ́JỌBÍ ◽  
EMMANUEL RÓTÌMÍ ADÁGÚNODÒ

Abstract A diacritic is a mark placed near or through a character to alter its original phonetic or orthographic value. Many languages around the world use diacritics in their orthography, whatever writing system the orthography is based on. In many languages, diacritics are ignored either by convention or as a matter of convenience. For users who are not familiar with the text domain, the absence of diacritics in text has been known to cause mild to serious readability and comprehension problems. However, the absence of diacritics in text causes near-intractable problems for natural language processing systems. This situation has led to extensive research on diacritization. Several techniques have been applied to diacritic restoration (or diacritization), but existing surveys of techniques have been restricted to certain languages and hence have left gaps for practitioners to fill. Our survey examines diacritization from the angle of the resources deployed and the various formulations employed for diacritization. It concludes by recommending that (a) any proposed technique for diacritization should consider the language's features and the purpose served by its diacritics, and (b) evaluation metrics need to be more rigorously defined to allow easy comparison of model performance.
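As context for the techniques such surveys compare, the sketch below shows the simplest diacritic-restoration baseline: restore each bare word to its most frequent diacritized form seen in training data. The tiny Yoruba-flavoured lexicon is illustrative only and not drawn from the paper.

```python
# Minimal dictionary/unigram baseline for diacritic restoration (a sketch,
# not the survey's recommended technique). Training data is a toy list of
# correctly diacritized words; restoration looks up each bare form.
import unicodedata
from collections import Counter, defaultdict

def strip_diacritics(word):
    """Remove combining marks, keeping the base characters."""
    decomposed = unicodedata.normalize('NFD', word)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def train(diacritized_corpus):
    """Map each bare form to its most frequent diacritized realisation."""
    table = defaultdict(Counter)
    for word in diacritized_corpus:
        table[strip_diacritics(word)][word] += 1
    return {bare: forms.most_common(1)[0][0] for bare, forms in table.items()}

def restore(sentence, table):
    return ' '.join(table.get(w, w) for w in sentence.split())

model = train(['ọmọ', 'ọmọ', 'oko', 'ọkọ'])   # toy training data
print(restore('omo oko', model))              # -> "ọmọ oko"
```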


English Today ◽  
2020 ◽  
pp. 1-6
Author(s):  
Eun-Young (Julia) Kim

Korea is probably one of the few countries, if not the only one, that observes a holiday in honor of the national language's alphabet. Hangulnal, which falls on October 9, is the Korean Alphabet Day. Each year, the government hosts events to celebrate one of the most prized possessions of the country, Hangul – the writing system of the national language. Created by King Sejong and his Royal Academy Scholars in the 15th century, Hangul is recognized as one of ‘the world's most scientific writing systems ever created by man’ (Sohn, 2001: 13). To outsiders, such pride may appear somewhat overblown, but Koreans do take great pride in Hangul.


Author(s):  
Longtu Zhang ◽  
Mamoru Komachi

Logographic and alphabetic languages (e.g., Chinese vs. English) use linguistically different writing systems. Languages belonging to the same writing system usually share more information, which can be exploited to facilitate natural language processing tasks such as neural machine translation (NMT). This article takes advantage of the logographic characters in Chinese and Japanese by decomposing them into smaller units, thereby making better use of the information these characters share during both the encoding and decoding stages of NMT training. Experiments show that the proposed method robustly improves NMT performance for both "logographic" language pairs (JA–ZH) and "logographic + alphabetic" language pairs (JA–EN and ZH–EN), in both supervised and unsupervised NMT scenarios. Moreover, as the decomposed sequences are usually very long, extra position features for the Transformer encoder help with the modeling of these long sequences. The results also indicate that, in principle, linguistic features can be manipulated to obtain higher shared-token rates and further improve the performance of natural language processing systems.
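The abstract does not show the decomposition itself; the following is a minimal sketch of the general idea only, with a tiny hand-made component table standing in for a full ideograph-decomposition database (it is not the authors' actual pipeline or data).

```python
# Hypothetical sketch of sub-character decomposition: each character is
# replaced by a sequence of components before subword segmentation, so
# related Chinese and Japanese characters map to overlapping tokens.
DECOMPOSITION = {
    '好': ['女', '子'],
    '明': ['日', '月'],
    '休': ['亻', '木'],
    '林': ['木', '木'],
    '森': ['木', '木', '木'],
}

def decompose(sentence):
    """Replace each character by its component sequence when one is known."""
    out = []
    for ch in sentence:
        out.extend(DECOMPOSITION.get(ch, [ch]))
    return out

# ZH 林 and JA 森 now share the component token 木, raising the shared-token
# rate between the two sides of a parallel corpus.
print(decompose('明日休'))   # ['日', '月', '日', '亻', '木']
```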


Author(s):  
Norhazlina Husin ◽  
Nuranisah Tan Abdullah ◽  
Aini Aziz

Abstract The teaching of Japanese as a third language to foreign students has its own issues and challenges. It does not merely involve teaching the four language skills. The Japanese language has its own unique features, and these features tend to distinguish the teaching of Japanese as a third language from other third-language acquisition. Teaching Japanese as a third language to foreign students also involves teaching its writing system, which makes the task rather complicated because Japanese has three forms of writing, namely Hiragana, Katakana and Kanji. Students are required to fully understand the Hiragana writing system before proceeding to learn the other two. The main challenge in teaching the Japanese writing systems is the very limited time allocated, as other language aspects need to be taught too. This relates directly to student factors and contributes greatly to the challenges foreseen: students are likely to face problems in understanding and using the writing systems while keeping up with the teaching and learning schedule. This article discusses an analysis of the learning of the Hiragana and Katakana writing systems among foreign students. The discussion is based on the teaching of Japanese to students of Universiti Teknologi MARA (UiTM), Shah Alam. Keywords: Third language, Hiragana, Katakana, Kanji


2005 ◽  
Vol 7 (2) ◽  
pp. 139-163 ◽  
Author(s):  
Richard L. Venezky

Philologists, linguists, and educators have insisted for several centuries that the ideal orthography has a one-to-one correspondence between grapheme and phoneme. Others, however, have suggested deviations for such functions as distinguishing homophones, displaying popular alternative spellings, and retaining morpheme identity. If, indeed, the one-to-one ideal were accepted, the International Phonetic Alphabet should become the orthographic standard for all enlightened nations, yet the failure of even a single country to adopt it for practical writing suggests that other factors besides phonology are considered important for a writing system. Whatever the ideal orthography might be, the practical writing systems adopted upon this earth reflect linguistic, psychological, and cultural considerations. Knowingly or unknowingly, countries have adopted orthographies that favour either the early stages of learning to read or the advanced stages, that is, the experienced reader. The more a system tends towards a one-to-one relationship between graphemes and phonemes, the more it assists the new reader and the non-speaker of the language, while the more it marks etymology and morphology, the more it favours the experienced reader. The study of psychological processing in reading demonstrates that human capacities for processing print are so powerful that complex patterns and irregularities pose only a small challenge. Orthographic regularity is extracted from lexical input and used to recognise words during reading. To understand how such a system develops, researchers should draw on the general mechanisms of perceptual learning.


2005 ◽  
Vol 7 (2) ◽  
pp. 205-234
Author(s):  
Martin Neef

Assuming that a writing system is inevitably dependent on a language system, the main function of written representations is to give access to the basic representations of the language system. In this paper, I want to deal with graphematic phenomena, i.e. the relations of written representations to corresponding phonological representations. In particular, I will delve into the relation of written representations to the phonological factor of the number of syllables, based on data from English and German. Though in these languages there is neither a specific written element relating to the syllable number nor an isomorphic relation between vowel letters and the number of syllables, two questions are worth examining: Can a word have more syllables than vowel letters? Can a word have fewer syllables than uninterrupted sequences of vowel letters? The first question will be answered positively for both languages, although there are some severe differences to be stated; the second question will be answered positively only for English. I will show that these results are side-effects of more basic regularities of the writing systems under consideration.
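To make the two questions concrete, the sketch below (not from the paper) compares vowel-letter counts and uninterrupted vowel-letter sequences against hand-annotated syllable counts for a few English examples; the syllable counts assume common reduced pronunciations, and 'y' is left out of the vowel-letter set for simplicity.

```python
# Illustrative sketch: count vowel letters and uninterrupted vowel-letter
# sequences, then compare with hand-annotated syllable counts.
import re

VOWEL_LETTERS = set('aeiou')   # 'y' deliberately excluded for simplicity

def vowel_letter_count(word):
    return sum(ch in VOWEL_LETTERS for ch in word)

def vowel_sequence_count(word):
    return len(re.findall('[aeiou]+', word))

examples = {          # word: syllable count (hand-annotated)
    'prism': 2,       # more syllables than vowel letters (syllabic /m/)
    'lion': 2,        # more syllables than vowel-letter sequences
    'vegetable': 3,   # fewer syllables than vowel-letter sequences
}
for word, syllables in examples.items():
    print(word, syllables, vowel_letter_count(word), vowel_sequence_count(word))
```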


2019 ◽  
Vol 3 (1) ◽  
pp. 53-67
Author(s):  
Duncan Poupard

A script can be a window into a language and all the culture contained within it. China’s minority peoples have a multitude of scripts, but many are in danger of falling out of use, a decline spurred by the adoption and promotion of standard Chinese across the country. Nevertheless, efforts are being made to preserve minority writing systems. This article reveals how the primarily logographic Naxi dongba script (often labelled the world’s ‘last living pictographs’), used in China’s southwestern Yunnan province to record the Naxi language, can be practically used as a modern writing system alongside its more widely known traditional role as a means of recording religious rites, and what exactly separates these two styles of writing. The efforts that have been made to achieve the goal of modernisation over the past decades are reviewed, including the longstanding attempts at Unicode encoding. I make some suggestions for the future development of the script, and employ plenty of examples from recent publications, alongside phonetic renderings and English translations. It is hoped that overall awareness of this unique script can be raised, and that it can develop into a vernacular script with everyday applications.


2019 ◽  
Author(s):  
Li Liu ◽  
James R. Booth

An important issue in dyslexia research is whether developmental dyslexia has a common neurocognitive basis across writing systems or whether there are writing-system-specific neurocognitive alterations. In this chapter, we review studies that investigate the neurocognitive basis of dyslexia in Chinese, a logographic writing system, and compare their findings with those on dyslexia in alphabetic writing systems. We begin with a brief review of the characteristics of the Chinese writing system, because to fully understand the commonality and specificity in the neural basis of Chinese dyslexia one must understand how logographic writing systems are structured differently from alphabetic systems.


2018 ◽  
Vol 2 (1) ◽  
pp. 75-85
Author(s):  
Rouly Doharma Sihite ◽  
Aditya Wikan Mahastama

Transliteration is still a challenge in helping people read or write from one writing system to another. Korean transliteration has been a topic of research aiming to automate the conversion between Hangul (the Korean writing system) and Latin characters. Previous work has transliterated Hangul to Latin using a statistical approach (72.2% accuracy) and Extended Markov Models (54.9% accuracy). This research focuses on transliterating Latin (romanised) Korean words into Hangul, as many learners of Korean begin with the Latin script. The selected method models the probable vowel and consonant forms and the probable vowel and consonant sequences using finite state automata, avoiding the need for training. These models are coded into rules, which are then applied to and tested on 100 random Korean words. Initial tests yielded only a 40% success rate, because consonants must be labelled as the initial or final of a syllable and some consonants were missed by the modelled rules. Additional rules were then added to catch these consonants and merge them into proper existing syllables, which increased the success rate to 92%. Further analysis found that certain consonant sequences caused syllabification problems when they occurred in certain positions. Other additional rules were inserted, yielding a final success rate of 99%, which is also the accuracy of transliterating Korean words written in Latin into Hangul characters in compound syllables.
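As an illustration of the target representation, the sketch below composes Hangul syllable blocks from already-syllabified romanized jamo using the standard Unicode composition formula. It is not the authors' finite-state implementation: the syllabification step their rules address (deciding where syllables begin and end in a flat romanized string) is assumed to be done by hand here, and the jamo tables are deliberately partial.

```python
# Minimal sketch: compose Hangul syllables from pre-syllabified romanized
# jamo via the Unicode formula 0xAC00 + (initial*21 + medial)*28 + final.
# Partial Revised-Romanization tables; the full sets have 19 initials,
# 21 medials, and 28 finals (including the empty final).
CHOSEONG = {'g': 0, 'n': 2, 'd': 3, 'r': 5, 'm': 6, 'b': 7,
            's': 9, '': 11, 'j': 12, 'h': 18}
JUNGSEONG = {'a': 0, 'eo': 4, 'o': 8, 'u': 13, 'eu': 18, 'i': 20}
JONGSEONG = {'': 0, 'k': 1, 'n': 4, 'l': 8, 'm': 16, 'b': 17, 'ng': 21}

def compose(initial, medial, final=''):
    """Compose one Hangul syllable block from romanized jamo."""
    code = (0xAC00
            + (CHOSEONG[initial] * 21 + JUNGSEONG[medial]) * 28
            + JONGSEONG[final])
    return chr(code)

# "han-geul" -> 한글; syllable boundaries are supplied by hand here,
# which is exactly the part the paper's rules automate.
word = [('h', 'a', 'n'), ('g', 'eu', 'l')]
print(''.join(compose(*syllable) for syllable in word))  # 한글
```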

