Inducing and Refining Topics for Web Query Classification Using a Semantic Network
Web query classification, the task of inferring topical categories from a web search query, is a non-trivial problem in information retrieval. The topic categories inferred by a Web query classification system can provide a rich set of features for improving query expansion and web advertising. Conventional methods for Web query classification derive corpus statistics from the web and employ machine-learning techniques to infer Open Directory Project categories. They suffer from two major drawbacks: the computational overhead of deriving corpus statistics, and the inference of topic categories that are too abstract for semantic discrimination because of polysemy. When the wrong senses of the query terms coalesce with the correct senses, the resulting concepts lie too shallow or too deep in the semantic gradient. This paper proposes and demonstrates a succinct solution to these problems: a method based on the Tree Cut Model and the WordNet thesaurus that infers fine-grained topic categories for Web query classification. It also suggests an enhancement to the Tree Cut Model to resolve sense ambiguities.
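To make the idea of a tree cut concrete, the following is a minimal sketch of selecting a cut in a small concept hierarchy by minimising a description length, in the spirit of MDL-based tree cut selection. It is an illustration under stated assumptions, not the method of this paper: the hierarchy `TREE`, the counts in `COUNTS`, and the helper names `leaves`, `description_length`, and `find_cut` are all hypothetical, and the toy tree merely stands in for a WordNet fragment. The abstract does not say how query terms are turned into counts, so the counts here are invented.

```python
import math

# Hypothetical toy hierarchy standing in for a WordNet hypernym fragment.
TREE = {
    "entity": ["animal", "artifact"],
    "animal": ["bird", "insect"],
    "bird": ["swallow", "crow", "eagle"],
    "insect": ["bee", "ant"],
    "artifact": ["car", "bike"],
}

# Invented observation counts for leaf concepts (assumption, not real data).
COUNTS = {"swallow": 4, "crow": 4, "eagle": 4, "bee": 8}
TOTAL = sum(COUNTS.values())  # 20 observations


def leaves(node):
    """Return the leaf concepts dominated by `node`."""
    children = TREE.get(node, [])
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves(child)]


def description_length(cut, counts, total):
    """Parameter cost plus data cost (in bits) of describing the counts with `cut`."""
    # Each cut node carries one probability; one of them is determined by the rest.
    param_cost = (len(cut) - 1) / 2 * math.log2(total)
    data_cost = 0.0
    for node in cut:
        members = leaves(node)
        freq = sum(counts.get(leaf, 0) for leaf in members)
        if freq == 0:
            continue  # unobserved classes contribute no data cost
        # Class probability is shared uniformly among the leaves under the node.
        p_leaf = freq / total / len(members)
        data_cost -= freq * math.log2(p_leaf)
    return param_cost + data_cost


def find_cut(node, counts, total):
    """Recursively keep `node` or the best cuts of its children, whichever is shorter."""
    children = TREE.get(node, [])
    if not children:
        return [node]
    below = [n for child in children for n in find_cut(child, counts, total)]
    if description_length([node], counts, total) <= description_length(below, counts, total):
        return [node]
    return below


print(find_cut("entity", COUNTS, TOTAL))  # ['bird', 'bee', 'ant', 'artifact']
```

With these invented counts, the three evenly observed bird species are generalised to `bird`, the skewed insect counts stay at the leaf level, and the unobserved `artifact` subtree collapses to a single node. This shows how a cut can settle on categories that are neither too abstract nor too specific.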