RuThes Thesaurus for Natural Language Processing
AbstractThis chapter describes the Russian RuThes thesaurus created as a linguistic and terminological resource for automatic document processing. Its structure utilizes two popular paradigms for computer thesauri: concept-based units, a small set of relation types, rules for including multiword expression as in information retrieval thesauri; and language-motivated units, detailed sets of synonyms, description of ambiguous words as in WordNet-like thesauri. The development of the RuThes thesaurus is supported for many years: new concepts, new senses, and multiword expressions found in contemporary texts are introduced regularly. The chapter shows some examples of representing newly appeared concepts related to important internal and international events.