Statistical Evidence for Learnable Lexical Subclasses in Japanese
It has been proposed that the Japanese lexicon can be divided into etymologically defined sublexica on phonotactic and other grounds. However, the psychological reality of this sublexical analysis has been challenged by some authors, who have appealed to putative problems with the learnability of the system. In this study, we apply a commonly used clustering method to Japanese words and show that there is robust statistical evidence for the sublexica and, thereby, that such sublexica are learnable. The model recovers phonotactic properties of sublexica previously discussed in the literature, and also reveals hitherto unnoticed phonotactic properties that are characteristic of sublexical membership and that can serve as a basis for future experimental investigations. Because the proposed approach is general and relies purely on phonotactic information, it can be applied to other languages.