Learning a Frequency-Matching Grammar together with Lexical Idiosyncrasy: MaxEnt versus Hierarchical Regression
Experimental research has uncovered language learners’ ability to frequency-match to statistical generalizations across the lexicon, while also acquiring the idiosyncratic behavior of individual attested words. How can we model the learning of a frequency-matching grammar together with lexical idiosyncrasy? A recent approach based in Maximum Entropy Harmonic Grammar (MaxEnt), a single-level regression model, makes use of general constraints that putatively capture statistical generalizations across the lexicon, as well as lexical constraints governing the behavior of individual words. I argue on the basis of learning simulations that the approach fails to learn statistical generalizations across the lexicon, running into what I call the GRAMMAR-LEXICON BALANCING PROBLEM: lexical constraints are so powerful that the learner comes to acquire the behavior of each attested form using only these constraints, at which point the general constraints are rendered superfluous and ineffective. I propose replacing MaxEnt with the HIERARCHICAL REGRESSION MODEL: multiple layers of regression structure, corresponding to different levels of a hierarchy of generalizations. Hierarchical regression is shown to surmount the grammar-lexicon balancing problem, learning a frequency-matching grammar together with lexical idiosyncrasy, by encoding general constraints as fixed effects and lexical constraints as a random effect. The model is applied to variable Slovenian palatalization, with promising results.
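The contrast between the two architectures can be illustrated with a minimal Python sketch. It is not the simulation reported here: the toy counts, the function names, and the fixed standard deviation `sigma` are all assumptions made for illustration, and penalizing only the lexical weights is a rough MAP stand-in for a random intercept, whose variance a true hierarchical model would estimate from the data. The first check shows that, without any penalty, shifting weight between the general constraint and the lexical constraints leaves the likelihood unchanged, so the data alone do not force the general constraint to carry the generalization.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_likelihood(general, lexical, data):
    """Log-likelihood of per-word counts under logit P = general + lexical[i]."""
    total = 0.0
    for (k, n), u in zip(data, lexical):
        p = sigmoid(general + u)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

def fit(data, sigma, steps=5000, rate=0.01):
    """Gradient ascent on the penalized log-likelihood.

    Only the lexical weights carry a Gaussian penalty with a fixed,
    assumed standard deviation sigma; the general weight is unpenalized.
    """
    general = 0.0
    lexical = [0.0] * len(data)
    for _ in range(steps):
        # Observed minus expected count of the variant for each word.
        resid = [k - n * sigmoid(general + u) for (k, n), u in zip(data, lexical)]
        general += rate * sum(resid)
        lexical = [u + rate * (r - u / sigma ** 2) for u, r in zip(lexical, resid)]
    return general, lexical

# Hypothetical toy lexicon: (variant count, total tokens) for each word.
data = [(9, 10), (8, 10), (7, 10), (2, 10), (9, 10), (8, 10)]

# Unpenalized single-level model: moving one unit of weight from every
# lexical constraint onto the general constraint changes nothing.
ll_a = log_likelihood(0.0, [1.0] * len(data), data)
ll_b = log_likelihood(1.0, [0.0] * len(data), data)

# Penalizing only the lexical weights resolves the trade-off.
general, lexical = fit(data, sigma=1.0)
print("general weight:", round(general, 3))
print("lexical offsets:", [round(u, 3) for u in lexical])
```

Under these assumptions, the penalty makes the lexical offsets sum to approximately zero at convergence, so the general weight absorbs the lexicon-wide tendency and each lexical weight records only a word's departure from it. The exceptional fourth word keeps a negative offset, but its predicted rate is pulled toward the general pattern rather than matching its observed rate exactly.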