The car pet in the carpet. On the interaction of computer-linguistic methodology and manual refinement in researching noun compounds
<p>Why does football combine productively with further nouns to form more complex expressions like football game, whereas seemingly comparable compounds like keyword only rarely expand into more complex sequences? This project explores why some two-noun compounds are more readily available for forming triconstituent constructions than others. I hypothesize that the productivity of a two-noun compound in the formation of triconstituent sequences depends on its degree of entrenchment: only compounds that are entrenched to a certain degree are productive in forming more complex constructions. To test this hypothesis, a list of three-noun compounds in English had to be compiled. The obvious approach is to search for sequences of three nouns in POS-tagged corpora. However, such automated searches do not retrieve all relevant instances (limited recall), and they often return hits that are not the intended compounds (limited precision), so the results require substantial manual screening. Furthermore, to operationalize the concepts of entrenchment and productivity, the usage frequencies of the noun constructions had to be counted. Here, too, the automatic extraction of the data had to be complemented by manual selection in order to obtain correct usage frequencies. Both the automatic and the manual work processes involved in collecting the data will be presented in detail to give an impression of the scope of such a project.</p>
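<p>The automatic search step can be illustrated with a minimal sketch. It is not the pipeline used in the project: the tag set (Penn Treebank NN and NNS), the function name three_noun_sequences, and the toy sample are all assumptions made for illustration. The sketch collects maximal runs of exactly three noun-tagged tokens and counts them.</p>

```python
from collections import Counter

# Assumption: Penn Treebank common-noun tags; other corpora use other tag sets.
NOUN_TAGS = {"NN", "NNS"}


def three_noun_sequences(tagged):
    """Yield each maximal run of exactly three consecutive nouns, lowercased.

    `tagged` is an iterable of (token, tag) pairs (hypothetical input format).
    """
    run = []
    # A sentinel non-noun pair flushes a run that ends the input.
    for token, tag in list(tagged) + [("", "")]:
        if tag in NOUN_TAGS:
            run.append(token.lower())
        else:
            if len(run) == 3:
                yield tuple(run)
            run = []


# Toy data, invented for illustration only.
sample = [
    ("The", "DT"), ("credit", "NN"), ("card", "NN"), ("fraud", "NN"),
    ("was", "VBD"), ("reported", "VBN"), (".", "."),
    ("Credit", "NN"), ("card", "NN"), ("fraud", "NN"),
    ("rose", "VBD"), (".", "."),
    ("The", "DT"), ("football", "NN"), ("game", "NN"),
    ("ended", "VBD"), (".", "."),
]

candidates = Counter(three_noun_sequences(sample))
print(candidates.most_common())
```

<p>In this toy sample the sequence football game is not retrieved, because the solid spelling of football makes it a two-token sequence. A purely token-based search would therefore miss such cases, which is one plausible reason why manual screening remains necessary.</p>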