An Improved Method of Applying a Machine Translation Model to a Chinese Word Segmentation Task

Author(s): Yuekun Wei, Binbin Qu, Nan Hu, Liu Han
2013, Vol. 791-793, pp. 1622-1625
Author(s): Dan Han, Zhi Han Yu

In this article, we introduce some basic concepts of machine translation. Machine translation is the use of software to translate text from one natural language into another. It can be divided into two categories: rule-based and corpus-based. IBM's statistical machine translation, Microsoft's multilingual machine translation project, AT&T's voice translation system and CMU's PANGLOSS system are four typical machine translation systems. Because Chinese sentences are written as continuous sequences of words, with no spaces between them, Chinese word segmentation is essential. There are three methods of Chinese word segmentation: methods based on string matching, methods based on understanding, and methods based on statistics.
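String matching, the first of these methods, is commonly illustrated by forward maximum matching: scan the text from left to right and, at each position, take the longest word found in a dictionary. The Python sketch below is illustrative only and is not taken from the article; the function names and the toy dictionary are hypothetical. It also includes the backward (right-to-left) variant, since comparing the two shows how greedy matching can disagree on ambiguous input.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Hypothetical helper: greedy left-to-right segmentation that always
    takes the longest dictionary word starting at the current position."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is accepted even if it is not in the
            # dictionary, so the scan always makes progress.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def backward_max_match(text, dictionary, max_len=None):
    """Hypothetical helper: the same greedy idea, scanning right to left."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    words = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            candidate = text[j - size:j]
            if size == 1 or candidate in dictionary:
                words.insert(0, candidate)  # keep words in reading order
                j -= size
                break
    return words


# Toy dictionary (an assumption for illustration, not a real lexicon).
toy_dict = {"研究", "研究生", "生命", "起源"}
print(forward_max_match("研究生命起源", toy_dict))   # ['研究生', '命', '起源']
print(backward_max_match("研究生命起源", toy_dict))  # ['研究', '生命', '起源']
```

Forward matching greedily takes 研究生 ("graduate student") and is left with a stray 命, whereas backward matching recovers 研究 / 生命 / 起源 ("study / life / origin"). String-matching segmenters often add ambiguity checks or statistical scoring on top of this basic greedy loop.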


2008, Vol. 53 (3), pp. 630-647
Author(s): Zhijie Wu

The Chinese language, unlike some Western languages, is written without a space between words, which poses a unique problem for machine translation: how should Chinese text be segmented into words? Current word-segmentation systems in machine translation are either linguistically oriented or statistically oriented. Both types, however, have inherent defects that cannot be overcome because of the pragmatically oriented nature of the Chinese language. This research addresses the problem of Chinese word segmentation in machine translation in light of a language investigation consisting of two surveys and eight interviews.


2012, Vol. 56 (3), pp. 631-644
Author(s): Zhijie Wu

The Chinese language, unlike English, is written without marked word boundaries, and Chinese word segmentation is often referred to as the bottleneck for Chinese-English machine translation. Current word-segmentation systems in machine translation are either linguistically oriented or statistically oriented. Chinese, however, is a pragmatically oriented language, which explains why existing Chinese word segmentation systems in machine translation do not handle the language successfully. Drawing on a language investigation consisting of two surveys and eight interviews, and on its findings about how Chinese people segment a Chinese sentence into words when reading, we have developed a new word-segmentation model that aims to address the word-segmentation problem in machine translation from a cognitive perspective.

