Improving Statistical Parser by Recognition of Chinese Number and Quantifier Prefix in Machine Translation
By studying the Chinese number and quantifier prefix (CNQP) as a special language phenomenon in machine translation, this paper presents a CNQP recognition method, which is rule based and independent of word segmentation. The method expressed CNQPs compositions using Backus-Naur Form (BNF), and took the numeral as the active information and the quantifiers as the boundaries of the CNQPs. To avoid the word segmentation noise, a forward maximum matching method was used for obtaining the compositions of the CNQPs, which can be fed into the statistical parser for the analysis of the Chinese sentences. The experimental results indicate the proposed method as a pre-processing module can effectively improve the parsing results of the statistical parser without retraining on experimental data constructed manually, which can further enhance the translation qualities.