Unconventional Methods in Voynich Manuscript Analysis

MENDEL ◽  
2019 ◽  
Vol 25 (1) ◽  
pp. 1-14 ◽  
Author(s):  
Ivan Zelinka ◽  
Oldrich Zmeskal ◽  
Leah Windsor ◽  
Zhiqiang Cai

This paper discusses the possible use of unconventional algorithms in the analysis and categorization of unknown text, including documents written in unknown languages. Scholars have identified about ten famous manuscripts, mostly encrypted or written in an unknown language. The most famous is the Voynich manuscript, an illustrated codex hand-written in an unknown language or writing system. Using carbon-dating methods, researchers determined its age as the early 15th century (between 1404 and 1438). Many professional and amateur cryptographers have studied the Voynich manuscript, including American and British code-breakers and cryptologists, and none has yet deciphered its meaning. While many hypotheses about the meaning and structure of the document exist, they have yet to be confirmed empirically. In this paper, we discuss two different kinds of unconventional approaches for handling manuscripts with unidentified writing systems and for determining whether their properties are characteristic of a natural language or only of a historical fake text.
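The abstract does not name the algorithms the authors use. Purely as a generic illustration of the kind of statistic often applied to the "natural language or fake?" question (not the authors' method), the sketch below computes character-level entropy and a Zipf-style rank-frequency profile for a transliterated sample; the sample tokens are EVA-style Voynich transliterations chosen for illustration.

```python
# Generic illustration only -- not the method described in the paper.
# Two common checks for whether an unknown transcription behaves like
# natural language: character-level entropy and a Zipf-style
# rank-frequency profile of word types.
import math
from collections import Counter

def char_entropy(text):
    """Shannon entropy (bits per character) of the character distribution."""
    counts = Counter(text.replace(' ', ''))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def zipf_profile(text, top=10):
    """Rank-frequency pairs of the most common word types."""
    freqs = Counter(text.split()).most_common(top)
    return [(rank, count) for rank, (word, count) in enumerate(freqs, start=1)]

sample = "daiin daiin chedy qokeedy daiin shedy chedy qokain"  # EVA-style tokens
print(char_entropy(sample))
print(zipf_profile(sample))
```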

2017 ◽  
Vol 24 (1) ◽  
pp. 123-154 ◽  
Author(s):  
FRANKLIN ỌLÁDIÍPỌ̀ ASAHIAH ◽  
ỌDẸ́TÚNJÍ ÀJÀDÍ ỌDẸ́JỌBÍ ◽  
EMMANUEL RÓTÌMÍ ADÁGÚNODÒ

Abstract A diacritic is a mark placed near or through a character to alter its original phonetic or orthographic value. Many languages around the world use diacritics in their orthography, whatever writing system the orthography is based on. In many languages, diacritics are ignored either by convention or as a matter of convenience. For users who are not familiar with the text domain, the absence of diacritics in text has been known to cause mild to serious readability and comprehension problems. However, the absence of diacritics in text causes near-intractable problems for natural language processing systems. This situation has led to extensive research on diacritization. Several techniques have been applied to diacritic restoration (or diacritization), but existing surveys of techniques have been restricted to certain languages and hence have left gaps for practitioners to fill. Our survey examines diacritization from the angle of the resources deployed and the various formulations employed for diacritization. It concludes by recommending that (a) any proposed technique for diacritization should consider the language's features and the purpose served by its diacritics, and (b) evaluation metrics need to be more rigorously defined to allow easy comparison of model performance.
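As context for the techniques such surveys compare, the sketch below shows the simplest diacritic-restoration baseline: restore each bare word to its most frequent diacritized form seen in training data. The tiny Yoruba-flavoured lexicon is illustrative only and not drawn from the paper.

```python
# Minimal dictionary/unigram baseline for diacritic restoration (a sketch,
# not the survey's recommended technique). Training data is a toy list of
# correctly diacritized words; restoration looks up each bare form.
import unicodedata
from collections import Counter, defaultdict

def strip_diacritics(word):
    """Remove combining marks, keeping the base characters."""
    decomposed = unicodedata.normalize('NFD', word)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

def train(diacritized_corpus):
    """Map each bare form to its most frequent diacritized realisation."""
    table = defaultdict(Counter)
    for word in diacritized_corpus:
        table[strip_diacritics(word)][word] += 1
    return {bare: forms.most_common(1)[0][0] for bare, forms in table.items()}

def restore(sentence, table):
    return ' '.join(table.get(w, w) for w in sentence.split())

model = train(['ọmọ', 'ọmọ', 'oko', 'ọkọ'])   # toy training data
print(restore('omo oko', model))              # -> "ọmọ oko"
```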


English Today ◽  
2020 ◽  
pp. 1-6
Author(s):  
Eun-Young (Julia) Kim

Korea is probably one of the few countries, if not the only one, that observes a holiday in honor of the national language's alphabet. Hangulnal, which falls on October 9, is the Korean Alphabet Day. Each year, the government hosts events to celebrate one of the most prized possessions of the country, Hangul – the writing system of the national language. Created by King Sejong and his Royal Academy Scholars in the 15th century, Hangul is recognized as one of ‘the world's most scientific writing systems ever created by man’ (Sohn, 2001: 13). To outsiders, such pride may appear somewhat overblown, but Koreans do take great pride in Hangul.


Author(s):  
Longtu Zhang ◽  
Mamoru Komachi

Logographic and alphabetic languages (e.g., Chinese vs. English) use linguistically different writing systems. Languages belonging to the same writing system usually share more information, which can be exploited to facilitate natural language processing tasks such as neural machine translation (NMT). This article takes advantage of the logographic characters in Chinese and Japanese by decomposing them into smaller units, thereby making better use of the information these characters share during both the encoding and decoding stages of NMT training. Experiments show that the proposed method robustly improves NMT performance for both "logographic" language pairs (JA–ZH) and "logographic + alphabetic" language pairs (JA–EN and ZH–EN), in both supervised and unsupervised NMT scenarios. Moreover, as the decomposed sequences are usually very long, extra position features for the Transformer encoder help with the modeling of these long sequences. The results also indicate that, in principle, linguistic features can be manipulated to obtain higher shared-token rates and further improve the performance of natural language processing systems.
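The abstract does not show the decomposition itself; the following is a minimal sketch of the general idea only, with a tiny hand-made component table standing in for a full ideograph-decomposition database (it is not the authors' actual pipeline or data).

```python
# Hypothetical sketch of sub-character decomposition: each character is
# replaced by a sequence of components before subword segmentation, so
# related Chinese and Japanese characters map to overlapping tokens.
DECOMPOSITION = {
    '好': ['女', '子'],
    '明': ['日', '月'],
    '休': ['亻', '木'],
    '林': ['木', '木'],
    '森': ['木', '木', '木'],
}

def decompose(sentence):
    """Replace each character by its component sequence when one is known."""
    out = []
    for ch in sentence:
        out.extend(DECOMPOSITION.get(ch, [ch]))
    return out

# ZH 林 and JA 森 now share the component token 木, raising the shared-token
# rate between the two sides of a parallel corpus.
print(decompose('明日休'))   # ['日', '月', '日', '亻', '木']
```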


Author(s):  
Norhazlina Husin ◽  
Nuranisah Tan Abdullah ◽  
Aini Aziz

Abstract The teaching of Japanese as a third language to foreign students has its own issues and challenges. It does not merely involve teaching the four language skills. The Japanese language has its own unique features, and these features tend to distinguish the teaching of Japanese as a third language from other third-language acquisition. Teaching Japanese as a third language to foreign students also involves teaching its writing system, which makes the task rather complicated because Japanese has three forms of writing, namely Hiragana, Katakana and Kanji. Students are required to fully understand the Hiragana writing system before proceeding to learn the other two. The main challenge in teaching the Japanese writing systems is the very limited time allocated, as other language aspects need to be taught too. This relates directly to student factors and contributes greatly to the challenges foreseen: students are likely to face problems in understanding and using the writing systems while keeping up with the teaching and learning schedule. This article discusses an analysis of the learning of the Hiragana and Katakana writing systems among foreign students. The discussion is based on the teaching of Japanese to students of Universiti Teknologi MARA (UiTM), Shah Alam. Keywords: Third language, Hiragana, Katakana, Kanji


2005 ◽  
Vol 7 (2) ◽  
pp. 139-163 ◽  
Author(s):  
Richard L. Venezky

Philologists, linguists, and educators have insisted for several centuries that the ideal orthography has a one-to-one correspondence between grapheme and phoneme. Others, however, have suggested deviations for such functions as distinguishing homophones, displaying popular alternative spellings, and retaining morpheme identity. If, indeed, the one-to-one ideal were accepted, the International Phonetic Alphabet should become the orthographic standard for all enlightened nations, yet the failure of even a single country to adopt it for practical writing suggests that other factors besides phonology are considered important for a writing system. Whatever the ideal orthography might be, the practical writing systems adopted upon this earth reflect linguistic, psychological, and cultural considerations. Knowingly or unknowingly, countries have adopted orthographies that favour either the early stages of learning to read or the advanced stages, that is, the experienced reader. The more a system tends towards a one-to-one relationship between graphemes and phonemes, the more it assists the new reader and the non-speaker of the language, while the more it marks etymology and morphology, the more it favours the experienced reader. The study of psychological processing in reading demonstrates that human capacities for processing print are so powerful that complex patterns and irregularities pose only a small challenge. Orthographic regularity is extracted from lexical input and used to recognise words during reading. To understand how such a system develops, researchers should draw on the general mechanisms of perceptual learning.


2005 ◽  
Vol 7 (2) ◽  
pp. 205-234
Author(s):  
Martin Neef

Assuming that a writing system is inevitably dependent on a language system, the main function of written representations is to give access to the basic representations of the language system. In this paper, I want to deal with graphematic phenomena, i.e. the relations of written representations to corresponding phonological representations. In particular, I will delve into the relation of written representations to the phonological factor of the number of syllables, based on data from English and German. Though in these languages there is neither a specific written element relating to the syllable number nor an isomorphic relation between vowel letters and the number of syllables, two questions are worth examining: Can a word have more syllables than vowel letters? Can a word have fewer syllables than uninterrupted sequences of vowel letters? The first question will be answered positively for both languages, although there are some severe differences to be stated; the second question will be answered positively only for English. I will show that these results are side-effects of more basic regularities of the writing systems under consideration.
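To make the two questions concrete, the sketch below (not from the paper) compares vowel-letter counts and uninterrupted vowel-letter sequences against hand-annotated syllable counts for a few English examples; the syllable counts assume common reduced pronunciations, and 'y' is left out of the vowel-letter set for simplicity.

```python
# Illustrative sketch: count vowel letters and uninterrupted vowel-letter
# sequences, then compare with hand-annotated syllable counts.
import re

VOWEL_LETTERS = set('aeiou')   # 'y' deliberately excluded for simplicity

def vowel_letter_count(word):
    return sum(ch in VOWEL_LETTERS for ch in word)

def vowel_sequence_count(word):
    return len(re.findall('[aeiou]+', word))

examples = {          # word: syllable count (hand-annotated)
    'prism': 2,       # more syllables than vowel letters (syllabic /m/)
    'lion': 2,        # more syllables than vowel-letter sequences
    'vegetable': 3,   # fewer syllables than vowel-letter sequences
}
for word, syllables in examples.items():
    print(word, syllables, vowel_letter_count(word), vowel_sequence_count(word))
```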


2019 ◽  
Vol 3 (1) ◽  
pp. 53-67
Author(s):  
Duncan Poupard

A script can be a window into a language and all the culture contained within it. China’s minority peoples have a multitude of scripts, but many are in danger of falling out of use, a decline spurred by the adoption and promotion of standard Chinese across the country. Nevertheless, efforts are being made to preserve minority writing systems. This article reveals how the primarily logographic Naxi dongba script (often labelled the world’s ‘last living pictographs’), used in China’s southwestern Yunnan province to record the Naxi language, can be practically used as a modern writing system alongside its more widely known traditional role as a means of recording religious rites, and what exactly separates these two styles of writing. The efforts that have been made to achieve the goal of modernisation over the past decades are reviewed, including the longstanding attempts at Unicode encoding. I make some suggestions for the future development of the script, and employ plenty of examples from recent publications, alongside phonetic renderings and English translations. It is hoped that overall awareness of this unique script can be raised, and that it can develop into a vernacular script with everyday applications.


2019 ◽  
Author(s):  
Li Liu ◽  
James R. Booth

An important issue in dyslexia research is whether developmental dyslexia has a common neurocognitive basis across writing systems or whether there are writing-system-specific neurocognitive alterations. In this chapter, we review studies that investigate the neurocognitive basis of dyslexia in Chinese, a logographic writing system, and compare their findings with those on dyslexia in alphabetic writing systems. We begin with a brief review of the characteristics of the Chinese writing system, because to fully understand the commonality and specificity in the neural basis of Chinese dyslexia one must understand how logographic writing systems are structured differently from alphabetic systems.


2018 ◽  
Vol 2 (1) ◽  
pp. 75-85
Author(s):  
Rouly Doharma Sihite ◽  
Aditya Wikan Mahastama

Transliteration is still a challenge in helping people read or write from one writing system to another. Korean transliteration has been a topic of research aiming to automate the conversion between Hangul (the Korean writing system) and Latin characters. Previous work has transliterated Hangul to Latin using a statistical approach (72.2% accuracy) and Extended Markov Models (54.9% accuracy). This research focuses on transliterating Latin (romanised) Korean words into Hangul, as many learners of Korean begin with the Latin script. The selected method models the probable vowel and consonant forms and the probable vowel and consonant sequences using finite state automata, avoiding the need for training. These models are coded into rules, which are then applied to and tested on 100 random Korean words. Initial tests yielded only a 40% success rate, because consonants must be labelled as the initial or final of a syllable and some consonants were missed by the modelled rules. Additional rules were then added to catch these consonants and merge them into proper existing syllables, which increased the success rate to 92%. Further analysis found that certain consonant sequences caused syllabification problems when they occurred in certain positions. Other additional rules were inserted, yielding a final success rate of 99%, which is also the accuracy of transliterating Korean words written in Latin into Hangul characters in compound syllables.
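As an illustration of the target representation, the sketch below composes Hangul syllable blocks from already-syllabified romanized jamo using the standard Unicode composition formula. It is not the authors' finite-state implementation: the syllabification step their rules address (deciding where syllables begin and end in a flat romanized string) is assumed to be done by hand here, and the jamo tables are deliberately partial.

```python
# Minimal sketch: compose Hangul syllables from pre-syllabified romanized
# jamo via the Unicode formula 0xAC00 + (initial*21 + medial)*28 + final.
# Partial Revised-Romanization tables; the full sets have 19 initials,
# 21 medials, and 28 finals (including the empty final).
CHOSEONG = {'g': 0, 'n': 2, 'd': 3, 'r': 5, 'm': 6, 'b': 7,
            's': 9, '': 11, 'j': 12, 'h': 18}
JUNGSEONG = {'a': 0, 'eo': 4, 'o': 8, 'u': 13, 'eu': 18, 'i': 20}
JONGSEONG = {'': 0, 'k': 1, 'n': 4, 'l': 8, 'm': 16, 'b': 17, 'ng': 21}

def compose(initial, medial, final=''):
    """Compose one Hangul syllable block from romanized jamo."""
    code = (0xAC00
            + (CHOSEONG[initial] * 21 + JUNGSEONG[medial]) * 28
            + JONGSEONG[final])
    return chr(code)

# "han-geul" -> 한글; syllable boundaries are supplied by hand here,
# which is exactly the part the paper's rules automate.
word = [('h', 'a', 'n'), ('g', 'eu', 'l')]
print(''.join(compose(*syllable) for syllable in word))  # 한글
```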

