Ergodic multigram HMM integrating word segmentation and class tagging for Chinese language modeling

Author(s):  
Hubert Hin-Cheung Law ◽  
Chorkin Chan
2008 ◽  
Vol 53 (3) ◽  
pp. 630-647 ◽  
Author(s):  
Zhijie Wu

The Chinese language, unlike some Western languages, is written without spaces between words, which poses a distinctive problem for machine translation: how should Chinese text be segmented into words? Current word-segmentation systems in machine translation are either linguistically oriented or statistically oriented. Both types, however, have inherent defects that cannot be overcome because of the pragmatically oriented nature of the Chinese language. This research addresses the problem of Chinese word segmentation in machine translation in light of a language investigation consisting of two surveys and eight interviews.


Author(s):  
Kunyu Lian ◽  
Jie Ma ◽  
Feifei Liang ◽  
Ling Wei ◽  
Shuwei Zhang ◽  
...  

How frequently a character appears in a word (positional character frequency) is used as a cue in word segmentation when reading aloud in the Chinese language. In this study we created 176 sentences with a target word in the center of each. Participants were 76 college students (mature readers) and 76 third-grade students (beginner readers). Results show an interaction effect of age and positional frequency of the initial character in the word on gaze duration. Further analysis shows that the third-grade students’ gaze duration was significantly longer in high, relative to low, positional character frequency of the target words. This trend was consistent with refixation duration, and there was a marginally significant interaction between age and total fixation time. Overall, positional character frequency was an important cue for word segmentation in oral reading in the Chinese language, and third-grade students relied more heavily on this cue than did college students.


2012 ◽  
Vol 56 (3) ◽  
pp. 631-644
Author(s):  
Zhijie Wu

The Chinese language, unlike English, is written without marked word boundaries, and Chinese word segmentation is often referred to as the bottleneck for Chinese-English machine translation. Current word-segmentation systems in machine translation are either linguistically oriented or statistically oriented. Chinese, however, is a pragmatically oriented language, which explains why existing Chinese word-segmentation systems in machine translation have not been successful in dealing with the language. Based on a language investigation consisting of two surveys and eight interviews, and on its findings about how Chinese readers segment a sentence into words while reading, we have developed a new word-segmentation model that aims to address the word-segmentation problem in machine translation from a cognitive perspective.


2017 ◽  
Vol 10 (2) ◽  
pp. 165-173 ◽  
Author(s):  
Xinxin Shu ◽  
Junhui Wang ◽  
Xiaotong Shen ◽  
Annie Qu
