Inducing and Refining Topics for Web Query Classification Using a Semantic Network

2018 ◽  
Vol 15 (2) ◽  
pp. 595-600
Author(s):  
R. Sathish Kumar ◽  
M. Chandrasekaran

Web query classification, the task of inferring topical categories from a web search query, is a non-trivial problem in the Information Retrieval domain. The topic categories inferred by a Web query classification system may provide a rich set of features for improving query expansion and web advertising. Conventional methods for Web query classification derive corpus statistics from the web and employ machine-learning techniques to infer Open Directory Project categories. However, they suffer from two major drawbacks: the computational overhead of deriving corpus statistics, and the inference of topic categories that are too abstract for semantic discrimination due to polysemy. Concepts that are too shallow or too deep in the semantic gradient are produced when wrong senses of the query terms coalesce with the correct senses. This paper proposes and demonstrates a succinct solution to these problems through a method based on the Tree Cut Model and the WordNet thesaurus to infer fine-grained topic categories for Web query classification, and also suggests an enhancement to the Tree Cut Model to resolve sense ambiguities.
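The WordNet side of this idea can be illustrated with a minimal sketch that walks hypernym paths for a query term via NLTK's WordNet interface. The fixed depth cutoff below is only a crude stand-in for the paper's Tree Cut Model, and the example term is hypothetical.

```python
# Minimal sketch: candidate topic categories for a query term, collected by
# walking WordNet hypernym paths. The depth cutoff is a simple stand-in for a
# tree cut; it is NOT the paper's Tree Cut Model.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def candidate_topics(term, max_depth=4):
    """Collect hypernyms up to max_depth for every noun sense of a query term."""
    topics = set()
    for synset in wn.synsets(term, pos=wn.NOUN):
        for path in synset.hypernym_paths():
            # Keep concepts that are neither too abstract (near the root)
            # nor too specific (the leaf sense itself).
            for node in path[1:max_depth]:
                topics.add(node.name())
    return topics

print(candidate_topics("jaguar"))  # mixes animal and vehicle senses, so disambiguation is needed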

2021 ◽  
pp. 1-11
Author(s):  
Zhinan Gou ◽  
Yan Li

With the development of Web 2.0 communities, information retrieval based on collaborative tagging systems has been widely applied. However, users often issue brief queries with only one or two keywords, which leads to problems such as inaccurate query words, information overload and information disorientation. Query expansion addresses this issue by reformulating each search query with additional words. By analyzing the limitations of existing query expansion methods in folksonomy, this paper proposes a novel query expansion method for search in folksonomy based on a user profile and a topic model. In detail, the topic model is first constructed by a variational autoencoder combined with Word2Vec; query expansion is then conducted using the user profile and the topic model. Finally, the proposed method is evaluated on a real dataset. Evaluation results show that the proposed method outperforms the baseline methods.
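A minimal sketch of the expansion step is given below: Word2Vec similarity scores are re-weighted by a user profile over tags. The toy tag corpus and the profile weights are invented for illustration, and the paper's variational-autoencoder topic model is not reproduced here.

```python
# Sketch of tag-based query expansion: Word2Vec similarity re-weighted by a
# (hypothetical) user profile of tag preferences.
from gensim.models import Word2Vec

tagged_posts = [["python", "web", "crawler"], ["semantic", "web", "ontology"],
                ["python", "machine", "learning"]]        # toy folksonomy tag sets
model = Word2Vec(tagged_posts, vector_size=50, min_count=1, seed=1)

user_profile = {"python": 0.8, "ontology": 0.2}           # hypothetical tag weights

def expand(query_term, topn=3):
    """Rank expansion candidates by Word2Vec similarity times profile affinity."""
    candidates = model.wv.most_similar(query_term, topn=10)
    scored = [(w, sim * (1.0 + user_profile.get(w, 0.0))) for w, sim in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:topn]

print(expand("web"))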


2019 ◽  
Vol 11 (2) ◽  
pp. 20-34
Author(s):  
Meenakshi Sharma ◽  
Anshul Garg

The World Wide Web is immensely rich in knowledge. The knowledge comes from both the content and the distinctive characteristics of the web, such as its hyperlink structure. The difficulty lies in extracting the relevant data from the web and arriving at the most appropriate decision for the given problem, which can then be used to improve a business organisation. An effective solution depends on how efficiently and effectively the web data is analysed. In analysing data on the web, the analysis of web structure is as essential as the analysis of relevant content. This article gives a brief introduction to the various terminologies and measures, such as centrality, PageRank, and density, used in web network analysis. It also briefly introduces supervised machine learning techniques such as classification and regression, and unsupervised machine learning techniques such as clustering, which are useful in analysing the web network so that users can make quick and effective decisions.
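The structural measures the article surveys can be computed directly with networkx; the sketch below uses a hypothetical toy link graph to show PageRank, in-degree centrality, and density.

```python
# Minimal sketch of the structural measures on a toy hyperlink graph.
import networkx as nx

# Hypothetical hyperlink structure: an edge A -> B means page A links to page B.
links = [("home", "about"), ("home", "blog"), ("blog", "home"),
         ("about", "blog"), ("blog", "post1"), ("post1", "home")]
G = nx.DiGraph(links)

print(nx.pagerank(G, alpha=0.85))   # link-based importance of each page
print(nx.in_degree_centrality(G))   # how often each page is linked to
print(nx.density(G))                # how close the graph is to fully connected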


2020 ◽  
Vol 10 (18) ◽  
pp. 6527 ◽  
Author(s):  
Omar Sharif ◽  
Mohammed Moshiul Hoque ◽  
A. S. M. Kayes ◽  
Raza Nowrozy ◽  
Iqbal H. Sarker

Due to the substantial growth of internet users and spontaneous access via electronic devices, the amount of electronic content has been growing enormously in recent years through instant messaging, social networking posts, blogs, online portals and other digital platforms. Unfortunately, the misapplication of technologies has increased with this rapid growth of online content, leading to a rise in suspicious activities. People misuse web media to disseminate malicious content, carry out illegal activities, abuse other people, and publicize suspicious content on the web. Suspicious content is usually available in the form of text, audio, or video, and text has been the medium in most cases of suspicious activity. Thus, one of the most challenging issues for NLP researchers is to develop a system that can identify suspicious text efficiently from specific contents. In this paper, a Machine Learning (ML)-based classification model (hereafter called STD) is proposed to classify Bengali text into non-suspicious and suspicious categories based on its contents. A set of ML classifiers with various features has been used on our developed corpus, consisting of 7000 Bengali text documents, of which 5600 are used for training and 1400 for testing. The performance of the proposed system is compared with a human baseline and existing ML techniques. The SGD classifier with tf-idf and the combination of unigram and bigram features achieves the highest accuracy of 84.57%.
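The reported best configuration, an SGD classifier over tf-idf unigram and bigram features, can be sketched in scikit-learn as follows. The two placeholder documents stand in for the authors' 7000-document Bengali corpus, which is not reproduced here.

```python
# Sketch of an SGD classifier over tf-idf unigram + bigram features.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

texts = ["example of an ordinary post", "example of a threatening post"]  # placeholder documents
labels = ["non-suspicious", "suspicious"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),   # unigram + bigram features
    ("clf", SGDClassifier(loss="hinge", random_state=0)),
])
pipeline.fit(texts, labels)
print(pipeline.predict(["another threatening post"]))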


Recommender systems are everywhere; streaming platforms, among others, rely on them to guide users through a maze of available information, products and services. Unfortunately, these black box systems lack transparency, as they provide little explanation of their predictions. In contrast, white box systems can by their nature produce a brief explanation, but their predictions are less accurate than those of complex black box models. Recent research has shown that explanations are an important component in bringing powerful big data predictions and machine learning techniques to a mass audience without compromising trust. This paper proposes a new approach that uses semantic web technology to generate an explanation for the output of a black box recommender system. The developed model is trained to make predictions accompanied by explanations that are automatically extracted from the semantic network.
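The flavour of extracting an explanation from a semantic network can be sketched with rdflib: find a property whose value a liked item and a recommended item share, and verbalise it. The graph, namespace and items below are invented for illustration; the paper works against a full semantic network rather than this toy example.

```python
# Sketch: derive a textual explanation from a shared RDF property.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Inception, EX.director, Literal("Christopher Nolan")))
g.add((EX.Interstellar, EX.director, Literal("Christopher Nolan")))

def explain(liked, recommended):
    """Find predicates whose value the two items share and turn them into text."""
    for pred, obj in g.predicate_objects(subject=liked):
        if (recommended, pred, obj) in g:
            return (f"Recommended {recommended.split('/')[-1]} because, like "
                    f"{liked.split('/')[-1]}, its {pred.split('/')[-1]} is {obj}.")
    return "No explanation found."

print(explain(EX.Inception, EX.Interstellar))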


AI Magazine ◽  
2008 ◽  
Vol 29 (3) ◽  
pp. 35 ◽  
Author(s):  
Filippo Menczer ◽  
Le-Shin Wu ◽  
Ruj Akavipat

Collaborative query routing is a new paradigm for Web search that treats both established search engines and other publicly available indices as intelligent peer agents in a search network. The approach makes it transparent for anyone to build their own (micro) search engine, by integrating established Web search services, desktop search, and topical crawling techniques. The challenge in this model is that each of these agents must learn about its environment (the existence, knowledge, diversity, reliability, and trustworthiness of other agents) by analyzing the queries received from and results exchanged with these other agents. We present the 6S peer network, which uses machine learning techniques to learn about the changing query environment. We show that simple reinforcement learning algorithms are sufficient to detect and exploit semantic locality in the network, resulting in efficient routing and high-quality search results. A prototype of 6S is available for public use and is intended to assist in the evaluation of different AI techniques employed by the networked agents.
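The kind of simple reinforcement learning the abstract refers to can be sketched as a per-topic score for each neighbour that is nudged toward the quality of the results it returns. This is only an illustration of the flavour of the approach, not the actual 6S algorithm, and all names and parameters are hypothetical.

```python
# Sketch of reinforcement-style neighbour learning for query routing.
from collections import defaultdict

ALPHA = 0.3                              # learning rate
scores = defaultdict(float)              # (neighbour, topic) -> estimated quality

def update(neighbour, topic, result_quality):
    """Reinforce the neighbour's score with the observed result quality (0..1)."""
    key = (neighbour, topic)
    scores[key] += ALPHA * (result_quality - scores[key])

def route(topic, neighbours, k=2):
    """Send the query to the k neighbours currently believed best for the topic."""
    return sorted(neighbours, key=lambda n: scores[(n, topic)], reverse=True)[:k]

update("peerA", "music", 0.9)
update("peerB", "music", 0.4)
print(route("music", ["peerA", "peerB", "peerC"]))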


2017 ◽  
pp. 71-93 ◽  
Author(s):  
I. Goloshchapova ◽  
M. Andreev

The paper proposes a new approach to measuring the inflation expectations of the Russian population, based on text mining of information on the Internet with the help of machine learning techniques. Two indicators were constructed from readers' comments on inflation news in major Russian economic media available on the web from 2014 through 2016: one based on word frequency and the other on sentiment analysis of comment content. Over the whole period considered, both indicators showed dynamics consistent with the development of the macroeconomic situation and were also able to forecast the dynamics of the Bank of Russia's official indicators of population inflation expectations approximately one month in advance.
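A minimal sketch of the two indicator ideas follows: one counts inflation-related words in comments, the other scores comments against a small sentiment lexicon. The word lists are invented English placeholders, not the authors' Russian-language resources or models.

```python
# Sketch of a word-frequency indicator and a lexicon-based sentiment indicator.
INFLATION_WORDS = {"prices", "inflation", "expensive"}
NEGATIVE_WORDS = {"worse", "expensive", "afraid"}
POSITIVE_WORDS = {"stable", "cheaper", "fine"}

def frequency_indicator(comments):
    tokens = [w for c in comments for w in c.lower().split()]
    return sum(w in INFLATION_WORDS for w in tokens) / max(len(tokens), 1)

def sentiment_indicator(comments):
    neg = sum(w in NEGATIVE_WORDS for c in comments for w in c.lower().split())
    pos = sum(w in POSITIVE_WORDS for c in comments for w in c.lower().split())
    return (pos - neg) / max(pos + neg, 1)

comments = ["everything is more expensive now", "prices seem stable to me"]
print(frequency_indicator(comments), sentiment_indicator(comments))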


2014 ◽  
Vol 977 ◽  
pp. 464-467
Author(s):  
Li Xin Gan ◽  
Wei Tu

Query expansion is one of the key technologies for improving precision and recall in information retrieval. To overcome the limitations of a single corpus, in this paper the semantic characteristics of the Wikipedia corpus are combined with a standard corpus to extract richer relationships between terms for the construction of a steady Markov semantic network. Information from the entity pages and disambiguation pages in Wikipedia is comprehensively utilized to classify query terms and improve query classification accuracy. High-quality related candidates, selected by semantic pruning, can then be used for query expansion. The proposed approach improves retrieval performance and saves search computational cost.
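In the spirit of a Markov semantic network, expansion candidates can be ranked by a random walk over a term graph; the sketch below uses a personalised PageRank over a toy weighted graph whose edges stand in for the Wikipedia and corpus term relations the paper actually extracts.

```python
# Sketch: rank expansion candidates by a random walk (personalised PageRank)
# over a toy term graph.
import networkx as nx

term_graph = nx.Graph()
term_graph.add_weighted_edges_from([
    ("jaguar", "car", 0.8), ("jaguar", "cat", 0.6),
    ("car", "engine", 0.9), ("cat", "feline", 0.9),
])

def expansion_candidates(query_terms, topn=3):
    """Personalised PageRank seeded on the query terms ranks related terms."""
    personalization = {t: 1.0 for t in query_terms if t in term_graph}
    ranks = nx.pagerank(term_graph, personalization=personalization, weight="weight")
    ranked = sorted(ranks.items(), key=lambda kv: kv[1], reverse=True)
    return [t for t, _ in ranked if t not in query_terms][:topn]

print(expansion_candidates(["jaguar", "car"]))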


Author(s):  
Sang Thanh Thi Nguyen ◽  
Tuan Thanh Nguyen

With the rapid advancement of ICT, the World Wide Web (referred to as the Web) has become the biggest information repository, whose volume keeps growing on a daily basis. The challenge is how to find the most wanted information from the Web with minimum effort. This paper presents a novel ontology-based framework for searching for web pages related to a given term within a few given specific websites. With this framework, a web crawler first learns the content of the web pages within the given websites; then the topic modeller finds the relations between web pages and topics via keywords found on the web pages, using the Latent Dirichlet Allocation (LDA) technique. After that, the ontology builder establishes an ontology, a semantic network of web pages, based on the topic model. Finally, a reasoner can find the web pages related to a given term by making use of the ontology. The framework and related modelling techniques have been verified on a few test websites, and the results demonstrate its superiority over existing web search tools.
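The topic-modelling step can be sketched with gensim's LDA implementation; the tokenised page texts below are placeholders, and the ontology-building and reasoning stages of the framework are not shown.

```python
# Sketch: fit LDA over crawled page texts and read off page-topic relations.
from gensim import corpora
from gensim.models import LdaModel

pages = [["web", "search", "query", "ranking"],
         ["ontology", "semantic", "network", "reasoner"],
         ["topic", "model", "lda", "semantic"]]          # placeholder tokenised pages

dictionary = corpora.Dictionary(pages)
corpus = [dictionary.doc2bow(tokens) for tokens in pages]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=1, passes=10)

# The topic distribution of each page links pages to topics, as the topic modeller does.
for i, bow in enumerate(corpus):
    print(f"page {i}:", lda.get_document_topics(bow))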

