Inducing and Refining Topics for Web Query Classification Using a Semantic Network

2018 ◽  
Vol 15 (2) ◽  
pp. 595-600
Author(s):  
R. Sathish Kumar ◽  
M. Chandrasekaran

Web query classification, the task of inferring topical categories from a web search query, is a non-trivial problem in the Information Retrieval domain. The topic categories inferred by a Web query classification system may provide a rich set of features for improving query expansion and web advertising. Conventional methods for Web query classification derive corpus statistics from the web and employ machine-learning techniques to infer Open Directory Project categories. However, they suffer from two major drawbacks: the computational overhead of deriving corpus statistics, and the inference of topic categories that are too abstract for semantic discrimination due to polysemy. Concepts that are too shallow or too deep in the semantic gradient are produced when wrong senses of the query terms coalesce with the correct senses. This paper proposes and demonstrates a succinct solution to these problems through a method based on the Tree Cut Model and the WordNet thesaurus to infer fine-grained topic categories for Web query classification, and also suggests an enhancement to the Tree Cut Model to resolve sense ambiguities.
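The WordNet side of this idea can be illustrated with a minimal sketch that walks hypernym paths for a query term via NLTK's WordNet interface. The fixed depth cutoff below is only a crude stand-in for the paper's Tree Cut Model, and the example term is hypothetical.

```python
# Minimal sketch: candidate topic categories for a query term, collected by
# walking WordNet hypernym paths. The depth cutoff is a simple stand-in for a
# tree cut; it is NOT the paper's Tree Cut Model.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def candidate_topics(term, max_depth=4):
    """Collect hypernyms up to max_depth for every noun sense of a query term."""
    topics = set()
    for synset in wn.synsets(term, pos=wn.NOUN):
        for path in synset.hypernym_paths():
            # Keep concepts that are neither too abstract (near the root)
            # nor too specific (the leaf sense itself).
            for node in path[1:max_depth]:
                topics.add(node.name())
    return topics

print(candidate_topics("jaguar"))  # mixes animal and vehicle senses, so disambiguation is needed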

2021 ◽  
pp. 1-11
Author(s):  
Zhinan Gou ◽  
Yan Li

With the development of Web 2.0 communities, information retrieval based on collaborative tagging systems has been widely applied. However, users often issue brief queries with only one or two keywords, which leads to problems such as inaccurate query words, information overload and information disorientation. Query expansion addresses this issue by reformulating each search query with additional words. By analyzing the limitations of existing query expansion methods in folksonomy, this paper proposes a novel query expansion method for search in folksonomy based on a user profile and a topic model. In detail, the topic model is first constructed by a variational autoencoder combined with Word2Vec; query expansion is then conducted using the user profile and the topic model. Finally, the proposed method is evaluated on a real dataset. Evaluation results show that the proposed method outperforms the baseline methods.
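A minimal sketch of the expansion step is given below: Word2Vec similarity scores are re-weighted by a user profile over tags. The toy tag corpus and the profile weights are invented for illustration, and the paper's variational-autoencoder topic model is not reproduced here.

```python
# Sketch of tag-based query expansion: Word2Vec similarity re-weighted by a
# (hypothetical) user profile of tag preferences.
from gensim.models import Word2Vec

tagged_posts = [["python", "web", "crawler"], ["semantic", "web", "ontology"],
                ["python", "machine", "learning"]]        # toy folksonomy tag sets
model = Word2Vec(tagged_posts, vector_size=50, min_count=1, seed=1)

user_profile = {"python": 0.8, "ontology": 0.2}           # hypothetical tag weights

def expand(query_term, topn=3):
    """Rank expansion candidates by Word2Vec similarity times profile affinity."""
    candidates = model.wv.most_similar(query_term, topn=10)
    scored = [(w, sim * (1.0 + user_profile.get(w, 0.0))) for w, sim in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:topn]

print(expand("web"))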


2019 ◽  
Vol 11 (2) ◽  
pp. 20-34
Author(s):  
Meenakshi Sharma ◽  
Anshul Garg

The World Wide Web is immensely rich in knowledge. The knowledge comes from both the content and the distinctive characteristics of the web, such as its hyperlink structure. The difficulty lies in extracting the relevant data from the web and arriving at the most appropriate decision for the given problem, which can then be used to improve a business organisation. An effective solution depends on how efficiently and effectively the web data is analysed. In analysing data on the web, the analysis of web structure is as essential as the analysis of relevant content. This article gives a brief introduction to the various terminologies and measures, such as centrality, PageRank, and density, used in web network analysis. It also briefly introduces supervised machine learning techniques such as classification and regression, and unsupervised machine learning techniques such as clustering, which are useful in analysing the web network so that users can make quick and effective decisions.
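The structural measures the article surveys can be computed directly with networkx; the sketch below uses a hypothetical toy link graph to show PageRank, in-degree centrality, and density.

```python
# Minimal sketch of the structural measures on a toy hyperlink graph.
import networkx as nx

# Hypothetical hyperlink structure: an edge A -> B means page A links to page B.
links = [("home", "about"), ("home", "blog"), ("blog", "home"),
         ("about", "blog"), ("blog", "post1"), ("post1", "home")]
G = nx.DiGraph(links)

print(nx.pagerank(G, alpha=0.85))   # link-based importance of each page
print(nx.in_degree_centrality(G))   # how often each page is linked to
print(nx.density(G))                # how close the graph is to fully connected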


2020 ◽  
Vol 10 (18) ◽  
pp. 6527 ◽  
Author(s):  
Omar Sharif ◽  
Mohammed Moshiul Hoque ◽  
A. S. M. Kayes ◽  
Raza Nowrozy ◽  
Iqbal H. Sarker

Due to the substantial growth of internet users and spontaneous access via electronic devices, the amount of electronic content has been growing enormously in recent years through instant messaging, social networking posts, blogs, online portals and other digital platforms. Unfortunately, the misapplication of technologies has increased with this rapid growth of online content, leading to a rise in suspicious activities. People misuse web media to disseminate malicious content, carry out illegal activities, abuse other people, and publicize suspicious content on the web. Suspicious content is usually available in the form of text, audio, or video, and text has been the medium in most cases of suspicious activity. Thus, one of the most challenging issues for NLP researchers is to develop a system that can identify suspicious text efficiently from specific contents. In this paper, a Machine Learning (ML)-based classification model (hereafter called STD) is proposed to classify Bengali text into non-suspicious and suspicious categories based on its contents. A set of ML classifiers with various features has been used on our developed corpus, consisting of 7000 Bengali text documents, of which 5600 are used for training and 1400 for testing. The performance of the proposed system is compared with a human baseline and existing ML techniques. The SGD classifier with tf-idf and the combination of unigram and bigram features achieves the highest accuracy of 84.57%.
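The reported best configuration, an SGD classifier over tf-idf unigram and bigram features, can be sketched in scikit-learn as follows. The two placeholder documents stand in for the authors' 7000-document Bengali corpus, which is not reproduced here.

```python
# Sketch of an SGD classifier over tf-idf unigram + bigram features.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

texts = ["example of an ordinary post", "example of a threatening post"]  # placeholder documents
labels = ["non-suspicious", "suspicious"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),   # unigram + bigram features
    ("clf", SGDClassifier(loss="hinge", random_state=0)),
])
pipeline.fit(texts, labels)
print(pipeline.predict(["another threatening post"]))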


Recommender systems are everywhere; streaming platforms, among others, rely on them to guide users through a maze of available information, products and services. Unfortunately, these black box systems lack transparency, as they provide little explanation of their predictions. In contrast, white box systems can by their nature produce a brief explanation, but their predictions are less accurate than those of complex black box models. Recent research has shown that explanations are an important component in bringing powerful big data predictions and machine learning techniques to a mass audience without compromising trust. This paper proposes a new approach that uses semantic web technology to generate an explanation for the output of a black box recommender system. The developed model is trained to make predictions accompanied by explanations that are automatically extracted from the semantic network.
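The flavour of extracting an explanation from a semantic network can be sketched with rdflib: find a property whose value a liked item and a recommended item share, and verbalise it. The graph, namespace and items below are invented for illustration; the paper works against a full semantic network rather than this toy example.

```python
# Sketch: derive a textual explanation from a shared RDF property.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.Inception, EX.director, Literal("Christopher Nolan")))
g.add((EX.Interstellar, EX.director, Literal("Christopher Nolan")))

def explain(liked, recommended):
    """Find predicates whose value the two items share and turn them into text."""
    for pred, obj in g.predicate_objects(subject=liked):
        if (recommended, pred, obj) in g:
            return (f"Recommended {recommended.split('/')[-1]} because, like "
                    f"{liked.split('/')[-1]}, its {pred.split('/')[-1]} is {obj}.")
    return "No explanation found."

print(explain(EX.Inception, EX.Interstellar))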


AI Magazine ◽  
2008 ◽  
Vol 29 (3) ◽  
pp. 35 ◽  
Author(s):  
Filippo Menczer ◽  
Le-Shin Wu ◽  
Ruj Akavipat

Collaborative query routing is a new paradigm for Web search that treats both established search engines and other publicly available indices as intelligent peer agents in a search network. The approach makes it transparent for anyone to build their own (micro) search engine, by integrating established Web search services, desktop search, and topical crawling techniques. The challenge in this model is that each of these agents must learn about its environment (the existence, knowledge, diversity, reliability, and trustworthiness of other agents) by analyzing the queries received from and results exchanged with these other agents. We present the 6S peer network, which uses machine learning techniques to learn about the changing query environment. We show that simple reinforcement learning algorithms are sufficient to detect and exploit semantic locality in the network, resulting in efficient routing and high-quality search results. A prototype of 6S is available for public use and is intended to assist in the evaluation of different AI techniques employed by the networked agents.
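The kind of simple reinforcement learning the abstract refers to can be sketched as a per-topic score for each neighbour that is nudged toward the quality of the results it returns. This is only an illustration of the flavour of the approach, not the actual 6S algorithm, and all names and parameters are hypothetical.

```python
# Sketch of reinforcement-style neighbour learning for query routing.
from collections import defaultdict

ALPHA = 0.3                              # learning rate
scores = defaultdict(float)              # (neighbour, topic) -> estimated quality

def update(neighbour, topic, result_quality):
    """Reinforce the neighbour's score with the observed result quality (0..1)."""
    key = (neighbour, topic)
    scores[key] += ALPHA * (result_quality - scores[key])

def route(topic, neighbours, k=2):
    """Send the query to the k neighbours currently believed best for the topic."""
    return sorted(neighbours, key=lambda n: scores[(n, topic)], reverse=True)[:k]

update("peerA", "music", 0.9)
update("peerB", "music", 0.4)
print(route("music", ["peerA", "peerB", "peerC"]))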


2017 ◽  
pp. 71-93 ◽  
Author(s):  
I. Goloshchapova ◽  
M. Andreev

The paper proposes a new approach to measuring the inflation expectations of the Russian population, based on text mining of information on the Internet with the help of machine learning techniques. Two indicators were constructed from readers' comments on inflation news in major Russian economic media available on the web from 2014 through 2016: one based on word frequency and the other on sentiment analysis of comment content. Over the whole period considered, both indicators showed dynamics consistent with the development of the macroeconomic situation and were also able to forecast the dynamics of the Bank of Russia's official indicators of population inflation expectations approximately one month in advance.
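A minimal sketch of the two indicator ideas follows: one counts inflation-related words in comments, the other scores comments against a small sentiment lexicon. The word lists are invented English placeholders, not the authors' Russian-language resources or models.

```python
# Sketch of a word-frequency indicator and a lexicon-based sentiment indicator.
INFLATION_WORDS = {"prices", "inflation", "expensive"}
NEGATIVE_WORDS = {"worse", "expensive", "afraid"}
POSITIVE_WORDS = {"stable", "cheaper", "fine"}

def frequency_indicator(comments):
    tokens = [w for c in comments for w in c.lower().split()]
    return sum(w in INFLATION_WORDS for w in tokens) / max(len(tokens), 1)

def sentiment_indicator(comments):
    neg = sum(w in NEGATIVE_WORDS for c in comments for w in c.lower().split())
    pos = sum(w in POSITIVE_WORDS for c in comments for w in c.lower().split())
    return (pos - neg) / max(pos + neg, 1)

comments = ["everything is more expensive now", "prices seem stable to me"]
print(frequency_indicator(comments), sentiment_indicator(comments))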


2014 ◽  
Vol 977 ◽  
pp. 464-467
Author(s):  
Li Xin Gan ◽  
Wei Tu

Query expansion is one of the key technologies for improving precision and recall in information retrieval. To overcome the limitations of a single corpus, in this paper the semantic characteristics of the Wikipedia corpus are combined with a standard corpus to extract richer relationships between terms for the construction of a steady Markov semantic network. Information from the entity pages and disambiguation pages in Wikipedia is comprehensively utilized to classify query terms and improve query classification accuracy. High-quality related candidates, selected by semantic pruning, can then be used for query expansion. The proposed approach improves retrieval performance and saves search computational cost.
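In the spirit of a Markov semantic network, expansion candidates can be ranked by a random walk over a term graph; the sketch below uses a personalised PageRank over a toy weighted graph whose edges stand in for the Wikipedia and corpus term relations the paper actually extracts.

```python
# Sketch: rank expansion candidates by a random walk (personalised PageRank)
# over a toy term graph.
import networkx as nx

term_graph = nx.Graph()
term_graph.add_weighted_edges_from([
    ("jaguar", "car", 0.8), ("jaguar", "cat", 0.6),
    ("car", "engine", 0.9), ("cat", "feline", 0.9),
])

def expansion_candidates(query_terms, topn=3):
    """Personalised PageRank seeded on the query terms ranks related terms."""
    personalization = {t: 1.0 for t in query_terms if t in term_graph}
    ranks = nx.pagerank(term_graph, personalization=personalization, weight="weight")
    ranked = sorted(ranks.items(), key=lambda kv: kv[1], reverse=True)
    return [t for t, _ in ranked if t not in query_terms][:topn]

print(expansion_candidates(["jaguar", "car"]))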


Author(s):  
Sang Thanh Thi Nguyen ◽  
Tuan Thanh Nguyen

With the rapid advancement of ICT, the World Wide Web (referred to as the Web) has become the biggest information repository, whose volume keeps growing on a daily basis. The challenge is how to find the most wanted information from the Web with minimum effort. This paper presents a novel ontology-based framework for searching for web pages related to a given term within a few given specific websites. With this framework, a web crawler first learns the content of the web pages within the given websites; then the topic modeller finds the relations between web pages and topics via keywords found on the web pages, using the Latent Dirichlet Allocation (LDA) technique. After that, the ontology builder establishes an ontology, a semantic network of web pages, based on the topic model. Finally, a reasoner can find the web pages related to a given term by making use of the ontology. The framework and related modelling techniques have been verified on a few test websites, and the results demonstrate its superiority over existing web search tools.
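The topic-modelling step can be sketched with gensim's LDA implementation; the tokenised page texts below are placeholders, and the ontology-building and reasoning stages of the framework are not shown.

```python
# Sketch: fit LDA over crawled page texts and read off page-topic relations.
from gensim import corpora
from gensim.models import LdaModel

pages = [["web", "search", "query", "ranking"],
         ["ontology", "semantic", "network", "reasoner"],
         ["topic", "model", "lda", "semantic"]]          # placeholder tokenised pages

dictionary = corpora.Dictionary(pages)
corpus = [dictionary.doc2bow(tokens) for tokens in pages]
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=1, passes=10)

# The topic distribution of each page links pages to topics, as the topic modeller does.
for i, bow in enumerate(corpus):
    print(f"page {i}:", lda.get_document_topics(bow))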

