Using data-mining to identify and study patterns in lexical
innovation on the web
Abstract This paper presents the NeoCrawler – a tailor-made webcrawler, which identifies and retrieves neologisms from the Internet and systematically monitors the use of detected neologisms on the web by means of weekly searches. It enables researchers to use the web as a corpus in order to investigate the dynamics of lexical innovation on a large-scale and systematic basis. The NeoCrawler represents an innovative web-mining tool which opens up new opportunities for linguists to tackle a number of unresolved and under-researched issues in the field of lexical innovation. This paper presents the design as well as the most important characteristics of two modules, the Discoverer and the Observer, with regard to the usage-based study of lexical innovation and diffusion.