RULE-BASED SYLLABIFICATION OF KOREAN WORDS WRITTEN IN LATIN USING DETERMINISTIC FINITE AUTOMATA MODELS

2018 ◽  
Vol 2 (1) ◽  
pp. 75-85
Author(s):  
Rouly Doharma Sihite ◽  
Aditya Wikan Mahastama

Transliteration remains a challenge in helping people read or write across writing systems. Korean transliteration has been a research topic aimed at automating the conversion between Hangul (the Korean writing system) and Latin characters. Previous work on transliterating Hangul to Latin used a statistical approach (72.2% accuracy) and Extended Markov Models (54.9% accuracy). This research focuses on transliterating Latin (romanised) Korean words into Hangul, as many learners of Korean begin with Latin script. The selected method models the probable vowel and consonant forms and the probable vowel and consonant sequences using Finite State Automata, which avoids training. These models are then coded into rules, which were applied to and tested on 100 random Korean words. The initial test yielded only a 40% success rate, because consonants have to be labelled as the initial or final of a syllable and some consonants were missed by the modelled rules. Additional rules were then added to catch these consonants and merge them into existing proper syllables, which increased the success rate to 92%. Further analysis of this result showed that certain consonant sequences cause syllabification problems when they occur in certain positions. Further rules were inserted, yielding a 99% final success rate, which is also the accuracy of transliterating Korean words written in Latin into Hangul characters in compound syllables.
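The general idea can be illustrated with a minimal Python sketch. It is not the authors' rule set: the Revised Romanization spellings, the restriction to simple finals, the one-token lookahead rule, and all function names (`tokenize`, `syllabify`, `to_hangul`) are assumptions made for illustration. Only the Unicode composition formula for precomposed Hangul syllables is standard.

```python
# Jamo romanisations in Unicode order (Revised Romanization is assumed here).
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s", "ss", "",
            "j", "jj", "ch", "k", "t", "p", "h"]
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
# Simple finals only, mapped to their index in the Unicode final-jamo table.
FINALS = {"": 0, "g": 1, "k": 1, "n": 4, "d": 7, "t": 7, "r": 8, "l": 8,
          "m": 16, "b": 17, "p": 17, "ng": 21}
CONSONANTS = {c for c in INITIALS if c} | {"l", "ng"}
VOWEL_START = set("aeiouwy")


def tokenize(word):
    """Split a romanised word into ("V", text) / ("C", text) tokens, longest match first."""
    tokens, i = [], 0
    while i < len(word):
        for size in (3, 2, 1):
            chunk = word[i:i + size]
            if chunk in VOWELS:
                tokens.append(("V", chunk))
                break
            if chunk in CONSONANTS:
                # "ng" cannot start a syllable, so before a vowel read it as n + g.
                if chunk == "ng" and word[i + 2:i + 3] in VOWEL_START:
                    continue
                tokens.append(("C", chunk))
                break
        else:
            raise ValueError(f"cannot tokenize {word!r} at position {i}")
        i += len(chunk)
    return tokens


def syllabify(word):
    """Walk the tokens through onset -> nucleus -> coda states."""
    tokens, syllables, i = tokenize(word), [], 0
    while i < len(tokens):
        onset = ""
        if tokens[i][0] == "C":                      # optional onset
            onset, i = tokens[i][1], i + 1
        if i >= len(tokens) or tokens[i][0] != "V":  # nucleus is mandatory
            raise ValueError(f"no vowel after onset in {word!r}")
        vowel, i = tokens[i][1], i + 1
        coda = ""
        # A consonant is a final only if no vowel follows it (one-token lookahead).
        if i < len(tokens) and tokens[i][0] == "C":
            if i + 1 == len(tokens) or tokens[i + 1][0] == "C":
                coda, i = tokens[i][1], i + 1
        syllables.append((onset, vowel, coda))
    return syllables


def to_hangul(word):
    """Compose each (onset, vowel, coda) triple into a precomposed Hangul syllable."""
    out = []
    for onset, vowel, coda in syllabify(word):
        initial = INITIALS.index("r" if onset == "l" else onset)
        medial = VOWELS.index(vowel)
        out.append(chr(0xAC00 + (initial * 21 + medial) * 28 + FINALS[coda]))
    return "".join(out)
```

Under these assumptions, `to_hangul("hanguk")` gives 한국 and `to_hangul("saram")` gives 사람. The lookahead rule is where an initial-versus-final labelling decision has to be made; consonant clusters and ambiguous spellings that this sketch rejects or misreads are the kind of case that would need additional rules.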

Author(s):  
Norhazlina Husin ◽  
Nuranisah Tan Abdullah ◽  
Aini Aziz

The teaching of Japanese as a third language to foreign students has its own issues and challenges. It does not merely involve teaching the four language skills. The Japanese language has its own unique values, which also tend to differentiate its teaching from that of other third languages. Teaching Japanese as a third language to foreign students also involves teaching its writing system. This makes teaching Japanese rather complicated, because the language has three forms of writing, namely Hiragana, Katakana and Kanji. Students are required to fully understand the Hiragana writing system before proceeding to learn the other two. The main challenge in teaching the Japanese writing systems is that the time allocated is very limited, as other aspects of the language need to be taught too. This constraint, which relates directly to student factors, contributes greatly to the challenges foreseen. Students are likely to face problems in understanding and using the scripts, as they must simultaneously adhere to the teaching and learning schedules. This article discusses an analysis of the learning of the Hiragana and Katakana writing systems among foreign students. The discussion is based on the teaching of Japanese to students of Universiti Teknologi MARA (UiTM), Shah Alam. Keywords: Third language, Hiragana, Katakana, Kanji


2005 ◽  
Vol 7 (2) ◽  
pp. 139-163 ◽  
Author(s):  
Richard L. Venezky

Philologists, linguists, and educators have insisted for several centuries that the ideal orthography has a one-to-one correspondence between grapheme and phoneme. Others, however, have suggested deviations for such functions as distinguishing homophones, displaying popular alternative spellings, and retaining morpheme identity. If, indeed, the one-to-one ideal were accepted, the International Phonetic Alphabet should become the orthographic standard for all enlightened nations, yet the failure of even a single country to adopt it for practical writing suggests that other factors besides phonology are considered important for a writing system. Whatever the ideal orthography might be, the practical writing systems adopted upon this earth reflect linguistic, psychological, and cultural considerations. Knowingly or unknowingly, countries have adopted orthographies that favour either the early stages of learning to read or the advanced stages, that is, the experienced reader. The more a system tends towards a one-to-one relationship between graphemes and phonemes, the more it assists the new reader and the non-speaker of the language while the more it marks etymology and morphology, the more it favours the experienced reader. The study of psychological processing in reading demonstrates that human capacities for processing print are so powerful that complex patterns and irregularities pose only a small challenge. Orthographic regularity is extracted from lexical input and used to recognise words during reading. To understand how such a system develops, researchers should draw on the general mechanisms of perceptual learning.


2007 ◽  
Vol 18 (04) ◽  
pp. 859-871
Author(s):  
MARTIN ŠIMŮNEK ◽  
BOŘIVOJ MELICHAR

A border of a string is a prefix of the string that is simultaneously its suffix. It is one of the basic keystones of stringology, used as part of many algorithms in pattern matching, molecular biology, computer-assisted music analysis and other fields. The paper offers an automata-theoretical description of Iliopoulos's ALL_BORDERS algorithm, which finds all borders of a string with don't care symbols. We show that the ALL_BORDERS algorithm is an implementation of a finite state transducer of a specific form. We describe how such a transducer can be constructed and what the input string should look like. The described transducer finds the set of lengths of all borders. Last but not least, we define approximate borders and show how to find all approximate borders of a string under the Hamming distance definition. Our solution to this problem is again based on transducers, which allows us to use an analogy with automata-based pattern matching methods. Finally, we discuss the conditions under which the same principle can be used for other distance measures.
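To make the definitions concrete, here is a minimal Python sketch that lists border lengths. It is not the ALL_BORDERS algorithm or the transducer construction from the paper, and it does not handle don't care symbols. It assumes proper, non-empty borders, uses the classic prefix function for exact borders, and uses a direct quadratic-time check for Hamming-distance borders. The function names are hypothetical.

```python
def border_lengths(s):
    """Lengths of all proper, non-empty borders of s, via the prefix function."""
    n = len(s)
    pi = [0] * n  # pi[i] = length of the longest proper border of s[:i + 1]
    for i in range(1, n):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    lengths = []
    k = pi[-1] if n else 0
    while k > 0:  # every border of the longest border is also a border of s
        lengths.append(k)
        k = pi[k - 1]
    return sorted(lengths)


def approximate_border_lengths(s, max_mismatches):
    """Lengths L where prefix s[:L] and suffix s[-L:] differ in at most max_mismatches positions."""
    n = len(s)
    lengths = []
    for length in range(1, n):
        prefix, suffix = s[:length], s[n - length:]
        distance = sum(a != b for a, b in zip(prefix, suffix))  # Hamming distance
        if distance <= max_mismatches:
            lengths.append(length)
    return lengths
```

For example, `border_lengths("abacaba")` returns `[1, 3]` (the borders "a" and "aba"), while `approximate_border_lengths("abcabd", 1)` returns `[1, 3]` even though the string has no exact border.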


2003 ◽  
Vol 15 (8) ◽  
pp. 1931-1957 ◽  
Author(s):  
Peter Tiňo ◽  
Barbara Hammer

We have recently shown that when initialized with “small” weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased toward Markov models; even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tiňo, 2002; Tiňo, Čerňanský, & Beňušková, 2002a, 2002b). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this article, we extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has been frequently considered in the past, for example, when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be explored to build Markovian predictive models, but also the activations form fractal clusters, the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.


2005 ◽  
Vol 7 (2) ◽  
pp. 205-234
Author(s):  
Martin Neef

Assuming that a writing system is inevitably dependent on a language system, the main function of written representations is to give access to the basic representations of the language system. In this paper, I want to deal with graphematic phenomena, i.e. the relations of written representations to corresponding phonological representations. In particular, I will delve into the relation of written representations to the phonological factor of the number of syllables, based on data from English and German. Though in these languages there is neither a specific written element relating to the syllable number nor an isomorphic relation between vowel letters and the number of syllables, two questions are worth examining: Can a word have more syllables than vowel letters? Can a word have fewer syllables than uninterrupted sequences of vowel letters? The first question will be answered positively for both languages, although there are some severe differences to be stated; the second question will be answered positively only for English. I will show that these results are side-effects of more basic regularities of the writing systems under consideration.


2019 ◽  
Vol 3 (1) ◽  
pp. 53-67
Author(s):  
Duncan Poupard

A script can be a window into a language and all the culture contained within it. China’s minority peoples have a multitude of scripts, but many are in danger of falling out of use, a decline spurred by the adoption and promotion of standard Chinese across the country. Nevertheless, efforts are being made to preserve minority writing systems. This article reveals how the primarily logographic Naxi dongba script (often labelled the world’s ‘last living pictographs’), used in China’s southwestern Yunnan province to record the Naxi language, can be practically used as a modern writing system alongside its more widely known traditional role as a means of recording religious rites, and what exactly separates these two styles of writing. The efforts that have been made to achieve the goal of modernisation over the past decades are reviewed, including the longstanding attempts at Unicode encoding. I make some suggestions for the future development of the script, and employ plenty of examples from recent publications, alongside phonetic renderings and English translations. It is hoped that overall awareness of this unique script can be raised, and that it can develop into a vernacular script with everyday applications.


Energies ◽  
2019 ◽  
Vol 12 (4) ◽  
pp. 684 ◽  
Author(s):  
Dong Kim ◽  
Beom Chung ◽  
Young Chung

In this paper, we propose a method to estimate communication performance for the advanced metering infrastructure that employs the power line communication (PLC) technology. Using bit-per-symbol signals from the PLC network management system, we estimate a PLC model quality in terms of packet success rate based on statistical learning. We also verify the accuracy of the estimations by comparing them with measured communication test results at test sites. Finally, from the packet success rate estimate, the qualities of services, such as meter readings and time-of-use pricing data downloading under several metering protocol sequences, are investigated through a mathematical analysis, and numerical results are provided.


2019 ◽  
Author(s):  
Li Liu ◽  
James R. Booth

An important issue in dyslexia research is whether developmental dyslexia in different writing systems has a common neurocognitive basis across writing systems or whether there are specific neurocognitive alterations. In this chapter, we review studies that investigate the neurocognitive basis of dyslexia in Chinese, a logographic writing system, and compare the findings of these studies with dyslexia in alphabetic writing systems. We begin with a brief review of the characteristics of the Chinese writing system because to fully understand the commonality and specificity in the neural basis of Chinese dyslexia one must understand how logographic writing systems are structured differently than alphabetic systems.

