Building a Collation Element Table for a Large Chinese Character Set in YES

Author(s): Xiaoheng Zhang, Xiaotong Li
2008, Vol 59 (9), pp. 1528-1530
Author(s): Loet Leydesdorff, Ping Zhou

Author(s): Ruifeng Xu, Daniel Yeung, Wenhao Shu, Jiafeng Liu

This paper presents a hybrid post-processing system for improving the performance of Handwritten Chinese Character Recognition (HCCR). To remove two kinds of frequently encountered errors in the recognition result, namely misrecognized characters and unrecognized characters, the three-stage system uses both the recognizer's character-confusion characteristics and contextual linguistic information. In the first stage, given a candidate sequence, the confusing character set and a statistical noisy-channel model are used to identify the most promising candidate character and to append possibly unrecognized, similar-shaped characters to the candidate character set. In the second stage, dictionary-based approximate word matching appends further characters suggested by the linguistic context to the candidate character set and binds the candidate characters into a word lattice. In the third stage, a Chinese word bigram Markov model identifies the most promising sentence by selecting plausible words from the word lattice. On average, the system improves the first-candidate recognition rate by 5.1% when the original character recognition rate of an online HCCR engine is 90% for the first candidate and 95% for the top-10 candidates.
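The third stage described in the abstract can be illustrated with a minimal sketch: a Viterbi-style search that picks the highest-scoring word sequence from a word lattice under a bigram model. This is not the authors' implementation. The function name `best_sentence`, the lattice layout, the toy candidates, and all log-probabilities below are assumptions made up for illustration; a real system would estimate the bigram scores from a corpus.

```python
def best_sentence(lattice, n, bigram_logp, start="<s>"):
    """Return the best word sequence covering character positions 0..n.

    lattice: dict mapping a start position to a list of (word, end_position)
             candidates, with end_position > start position.
    bigram_logp: function (previous_word, word) -> log probability.
    """
    # best[(position, last_word)] = (score, back-pointer to previous state)
    best = {(0, start): (0.0, None)}
    for pos in range(n):
        # Snapshot the states ending at this position before extending them.
        states = [(key, val) for key, val in best.items() if key[0] == pos]
        for (p, prev), (score, _) in states:
            for word, end in lattice.get(pos, []):
                cand = score + bigram_logp(prev, word)
                key = (end, word)
                if key not in best or cand > best[key][0]:
                    best[key] = (cand, (p, prev))

    # Choose the best state that covers the whole input.
    finals = [(val[0], key) for key, val in best.items() if key[0] == n]
    if not finals:
        return []
    _, key = max(finals)

    # Follow back-pointers to recover the word sequence.
    words = []
    while key != (0, start):
        words.append(key[1])
        key = best[key][1]
    return words[::-1]


# Hypothetical toy lattice over four character positions. "太学" stands in
# for a similar-shaped candidate that an earlier stage might have appended.
toy_lattice = {
    0: [("北京", 2), ("北", 1)],
    1: [("京", 2)],
    2: [("大学", 4), ("太学", 4), ("大", 3)],
    3: [("学", 4)],
}

# Hypothetical bigram log-probabilities; unseen pairs get a flat penalty.
toy_bigrams = {("<s>", "北京"): -1.0, ("北京", "大学"): -1.0}


def toy_logp(prev, word):
    return toy_bigrams.get((prev, word), -5.0)


result = best_sentence(toy_lattice, 4, toy_logp)
print(result)
```

Because every lattice edge moves strictly forward, processing positions in increasing order visits each state only after all of its predecessors have been scored, so one left-to-right pass is enough.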

