A part of speech estimation method for Japanese unknown words using a statistical model of morphology and context

Author(s):  
Masaaki Nagata
2008 ◽  
Vol 13 (2) ◽  
pp. 169-193 ◽  
Author(s):  
Xiaofei Lu

This paper presents a hybrid model for part-of-speech (POS) guessing of Chinese unknown words. Most previous studies on this task have developed a unified statistical model for all Chinese unknown words and have rejected rule-based models without testing. We argue that models that use different sources of information about unknown words, both structural and contextual, can be effective for handling different types of unknown words. We propose a rule-based model that uses information about the type, length, and internal structure of unknown words, and we combine it with two existing statistical models that use information about the POS context and the component characters of unknown words, respectively. By combining the complementary strengths of the three models that use different sources of information, the hybrid model achieves an accuracy of 89%, a significant improvement over the best result reported in previous studies.


1996 ◽  
Vol 2 (2) ◽  
pp. 111-136 ◽  
Author(s):  
ANDREI MIKHEEV

Words unknown to the lexicon present a substantial problem to part-of-speech tagging. In this paper we present a technique for fully unsupervised acquisition of rules which guess possible parts of speech for unknown words. This technique does not require specially prepared training data, and uses instead the lexicon supplied with a tagger and word frequencies collected from a raw corpus. Three complementary sets of word-guessing rules are statistically induced: prefix morphological rules, suffix morphological rules and ending guessing rules. The acquisition process is closely tied to a guessing-rule evaluation methodology which is solely dedicated to the performance of part-of-speech guessers. Using the proposed technique, a guessing-rule induction experiment was performed on the Brown Corpus data; the resulting rule-sets showed highly competitive performance and were compared with the state of the art. To evaluate the impact of the word-guessing component on the overall tagging performance, it was integrated into a stochastic and a rule-based tagger and applied to texts with unknown words.
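As an illustration of the ending guessing rules mentioned in this abstract, the following is a minimal Python sketch, not the paper's procedure. It assumes a toy lexicon with one tag per word and replaces the paper's statistical scoring with a simple count and precision threshold; the names `induce_ending_rules`, `guess_pos`, `TOY_LEXICON` and all threshold values are hypothetical.

```python
from collections import Counter, defaultdict


def induce_ending_rules(lexicon, max_len=3, min_count=2, min_precision=0.8):
    """Map word endings to the POS tag they most reliably predict.

    lexicon: dict of word -> POS tag (toy single-tag lexicon, an assumption).
    An ending becomes a rule only if it is seen at least min_count times and
    its most frequent tag covers at least min_precision of those cases.
    """
    counts = defaultdict(Counter)
    for word, tag in lexicon.items():
        for n in range(1, max_len + 1):
            if len(word) > n:  # leave at least one character before the ending
                counts[word[-n:]][tag] += 1

    rules = {}
    for ending, tags in counts.items():
        tag, hits = tags.most_common(1)[0]
        total = sum(tags.values())
        if total >= min_count and hits / total >= min_precision:
            rules[ending] = tag
    return rules


def guess_pos(word, rules, max_len=3, default="NN"):
    """Guess a tag for an unknown word, trying the longest ending first."""
    for n in range(min(max_len, len(word) - 1), 0, -1):
        tag = rules.get(word[-n:])
        if tag is not None:
            return tag
    return default  # fallback tag when no rule applies (an assumption)


# Hypothetical toy lexicon; a real one would come with the tagger.
TOY_LEXICON = {
    "walking": "VBG", "talking": "VBG", "reading": "VBG",
    "quickly": "RB", "slowly": "RB", "happily": "RB",
    "nation": "NN", "station": "NN", "motion": "NN",
}
RULES = induce_ending_rules(TOY_LEXICON)
```

Calling `guess_pos("jumping", RULES)` looks up the endings "ing", "ng" and "g" in that order and returns the tag of the first matching rule. Trying the longest ending first is a design choice of this sketch: longer endings are usually more specific, so they are given priority over shorter ones.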


2012 ◽  
Vol 48 (12) ◽  
pp. 727 ◽  
Author(s):  
T. Moazzeni ◽  
A. Amei ◽  
J. Ma ◽  
Y. Jiang

Robotica ◽  
1999 ◽  
Vol 17 (6) ◽  
pp. 649-660 ◽  
Author(s):  
Alireza Bab-Hadiashar ◽  
David Suter

A method of data segmentation, based upon robust least K-th order statistical model fitting (LKS), is proposed and applied to image motion and range data segmentation. The estimation method differs from other approaches using versions of LKS in a number of important ways. Firstly, the value of K is not determined by a complex optimization routine. Secondly, having chosen a fit, the estimation of scale of the noise is not based upon the K-th order statistic of the residuals. Other aspects of the full segmentation scheme include the use of segment contiguity to (a) reduce the number of random sample fits used in the LKS stage and (b) “fill in” holes caused by isolated misclassified data.
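The basic least K-th order statistic criterion can be sketched in Python as follows. This is a generic illustration with random two-point line fits, assuming a fixed K chosen by hand; it does not reproduce the authors' way of choosing K, their scale estimate, or the contiguity-based steps. The names `lks_line`, `fit_line` and `POINTS`, and the data values, are hypothetical.

```python
import random


def fit_line(p, q):
    """Return (slope, intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def lks_line(points, k, trials=200, seed=0):
    """Pick the candidate line whose K-th smallest squared residual is least.

    Candidates come from random pairs of points, so a pair of inliers
    yields a fit that most of the data agrees with.
    """
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(trials):
        p, q = rng.sample(points, 2)
        if p[0] == q[0]:
            continue  # vertical line, not representable as y = a*x + b
        slope, intercept = fit_line(p, q)
        squared = sorted((y - (slope * x + intercept)) ** 2 for x, y in points)
        score = squared[k - 1]  # K-th order statistic of squared residuals
        if score < best_score:
            best, best_score = (slope, intercept), score
    return best, best_score


# Ten inliers on y = 2x + 1 plus four gross outliers (toy data).
POINTS = [(x, 2 * x + 1) for x in range(10)]
POINTS += [(1, 30), (3, -20), (5, 50), (7, -40)]
```

A call such as `lks_line(POINTS, k=8)` scores each candidate by its eighth-smallest squared residual, so up to six of the fourteen points can be arbitrarily far from the line without affecting the score. This is what makes the criterion robust to outliers, in contrast to an ordinary least-squares sum over all residuals.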


Author(s):  
Tobias Schnabel ◽  
Hinrich Schütze

We present FLORS, a new part-of-speech tagger for domain adaptation. FLORS uses robust representations that work especially well for unknown words and for known words with unseen tags. FLORS is simpler and faster than previous domain adaptation methods, yet it has significantly better accuracy than several baselines.


1997 ◽  
Vol 29 (4) ◽  
pp. 531-553 ◽  
Author(s):  
Paula J. Schwanenflugel ◽  
Steven A. Stahl ◽  
Elisabeth L. McFalls

The experiment investigated the development of vocabulary knowledge in elementary school children as a function of story reading for partially known and unknown words. Fourth graders completed a vocabulary checklist in which they provided definitions or sentences for words they knew (known words) and checked off words that were familiar but whose meaning they did not know (partially known words). Children then read stories containing some of these words. The remaining words served as a control. Vocabulary growth was small but even for both partially known and unknown words. However, the characteristics of the words themselves (particularly part of speech and concreteness) were more important in determining this growth than aspects of the texts.

