Variety, Idiosyncrasy, and Complexity in Language and Language Technologies
This paper addresses three issues in language technologies. For each issue, it recommends an area of linguistics that is easily accessible to computer scientists and provides some examples that may be thought-provoking. The first issue is linguistic diversity, which is addressed by language typology. Typology provides an insightful view of the syntax and semantics of word order, as presented in Section 2.2. The second issue is the long tail of sparse phenomena. Section 3.3 uses Construction Grammar as a framework for addressing the details of definiteness and modality. The third issue is how to make error analysis fun. Section 4 moves beyond monoclausal sentences and revives some rules from 1970s-style transformational grammar as an enjoyable way to analyze complex sentences.