Semantic Question Answering Using Wikipedia Categories Clustering

We describe a system that performs semantic Question Answering by combining classic Information Retrieval methods with semantic ones. First, we use a search engine to gather web pages and then apply a noun phrase extractor to extract all candidate answer entities from them. Candidate entities are ranked using a linear combination of two IR measures to pick the most relevant ones. For each of the top-ranked candidate entities, we find the corresponding Wikipedia page. We then propose a novel way to exploit the semantic information contained in the structure of Wikipedia. A vector is built for every entity from its Wikipedia category names by splitting them into words and lemmatizing those words. These vectors preserve semantic information in the sense that they let us measure semantic closeness between entities. Based on this, we apply an intelligent clustering method to the candidate entities and show that the candidates in the biggest cluster are the most semantically related to the ideal answers to the query. Results on the topics of the TREC 2009 Related Entity Finding task dataset show promising performance.
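
The category-vector idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the category names are made-up toy data, the plural-stripping rule is a crude stand-in for real lemmatization, and single-link grouping with a cosine threshold of 0.3 is an assumed substitute for the unspecified clustering method.

```python
import math
import re
from collections import Counter


def category_vector(categories):
    """Bag-of-words vector built from the words of an entity's category names."""
    words = []
    for name in categories:
        for w in re.findall(r"[a-z]+", name.lower()):
            # Crude stand-in for lemmatization: strip a trailing plural "s".
            words.append(w[:-1] if w.endswith("s") and len(w) > 3 else w)
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold=0.3):
    """Single-link grouping (an assumption): similar entities share a cluster."""
    names = list(vectors)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Largest cluster first, mirroring the "biggest cluster" heuristic.
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical candidate entities with invented category names.
entities = {
    "Boeing": ["Aircraft manufacturers of the United States", "Aerospace companies"],
    "Airbus": ["Aircraft manufacturers of Europe", "Aerospace companies"],
    "Embraer": ["Aircraft manufacturers of Brazil", "Aerospace companies"],
    "Seattle": ["Cities in Washington", "Port cities"],
}
vectors = {name: category_vector(cats) for name, cats in entities.items()}
clusters = cluster(vectors)
```

Here `clusters[0]` holds the three manufacturers, which share most category words, while the unrelated entity ends up in a cluster of its own.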

Deep Learning Based Question Answering Search Engine

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit2172139 ◽

2021 ◽

pp. 25-32

Author(s):

Mrunal Malekar

Keyword(s):

Information Retrieval ◽

Deep Learning ◽

Natural Language ◽

Search Engine ◽

Language Processing ◽

Question Answering ◽

Research Work ◽

Construction Company ◽

Exact Answer ◽

Search For Information

Domain-based Question Answering is concerned with building systems that answer natural language questions specific to a domain. It falls under Information Retrieval and Natural Language Processing. Information Retrieval alone can find the relevant documents that may contain the answer, but it does not give the exact answer to the question asked. In the presented work, a question answering search engine has been developed that first finds the relevant documents in a large collection of textual documents from a construction company and then goes a step further to extract the answer from the retrieved document. The question answering system uses Elasticsearch for Information Retrieval (paragraph extraction) and Deep Learning to answer the question from the short extracted paragraph. It leverages the BERT Deep Learning model to capture the representations linking the question and the answer. The research work also focuses on improving the search accuracy of the Elasticsearch-based Information Retrieval engine, which returns the relevant documents that may contain the answer.

User Relevance Feedback in Semantic Information Retrieval

Emerging Topics and Technologies in Information Systems ◽

10.4018/978-1-60566-222-0.ch016 ◽

2009 ◽

pp. 270-281

Author(s):

Antonio Picariello ◽

Antonio M. Rinaldi

Keyword(s):

Information Retrieval ◽

Relevance Feedback ◽

Semantic Information ◽

Information Retrieval System ◽

Web Pages ◽

General Knowledge ◽

Retrieval Systems ◽

Crucial Component ◽

Information Retrieval Systems ◽

Semantic Information Retrieval

The user dimension is a crucial component of the information retrieval process, and for this reason it must be taken into account when planning and implementing techniques in information retrieval systems. In this paper we present a technique based on relevance feedback to improve the accuracy of an ontology-based information retrieval system. Our proposed method combines the semantic information in a general knowledge base with statistical information using relevance feedback. Several experiments and results are presented using a test set consisting of Web pages.

ALGORITHMS FOR COMBINING INFORMATION ABOUT WEB-PAGES WITH BACKGROUND ONTOLOGICAL KNOWLEDGE

Modeling of systems and processes ◽

10.12737/article_5db1e3e60bef13.23500137 ◽

2019 ◽

Vol 12 (2) ◽

pp. 32-37

Author(s):

Е. Коновальчук ◽

E. Konoval'chuk ◽

В. Лавлинский ◽

V. Lavlinskiy ◽

С. Яньшин ◽

...

Keyword(s):

Information Retrieval ◽

Information Technologies ◽

Semantic Information ◽

Semantic Networks ◽

Web Pages ◽

Web Data ◽

Web Information ◽

Semantic Information Retrieval ◽

Combining Information

Web information technologies are currently developing rapidly, driven by approaches that add semantics to web data and by the development of semantic network tools. This article proposes methods of semantic information retrieval based on algorithms for combining information. This approach makes it possible to analyze web pages together with background ontological knowledge and is more efficient than standard approaches.

Why-type Question to Query Reformulation for efficient Document Retrieval

International Journal of Information Retrieval Research ◽

10.4018/ijirr.289948 ◽

2022 ◽

Vol 12 (1) ◽

pp. 0-0

Keyword(s):

User Interface ◽

Search Engine ◽

Question Answering ◽

Document Retrieval ◽

Web Pages ◽

Query Reformulation ◽

Different Types ◽

Why Questions ◽

Google Search ◽

Type Question

Understanding the user's actual need from a question is crucial in non-factoid why-question answering, as why-questions are complex and involve ambiguity and redundancy. The key requirement is to determine the focus of the question and reformulate it accordingly to retrieve the expected answers. The paper analyzes different types of why-questions and proposes an algorithm for each class that determines the focus and reformulates the question into a query by appending focal terms and the cue phrase ‘because’. Further, a user interface is implemented that takes a why-question as input, applies the different question-processing components, reformulates it, and finally retrieves web pages by posing the query to the Google search engine. To measure the accuracy of the process, users are asked to score, from 1 to 10, how relevant the retrieved web pages are according to their understanding. The results show a maximum precision of 89% for informational why-questions and a minimum of 48% for opinionated why-questions.
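
As a rough illustration of the reformulation step, the sketch below turns a why-question into a keyword query ending in the cue phrase. It is a single generic heuristic, not the paper's per-class algorithms: the stop-word list and the rule "focal terms are whatever remains after stop-word removal" are assumptions made for this example.

```python
import re

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {
    "why", "do", "does", "did", "is", "are", "was", "were",
    "the", "a", "an", "to", "of", "in", "it",
}


def reformulate(question, cue="because"):
    """Keep the assumed focal terms of a why-question and append the cue phrase."""
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    focal_terms = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(focal_terms + [cue])


query = reformulate("Why is the sky blue?")
# query == "sky blue because"
```

The resulting string could then be posed to a web search engine, as the abstract describes.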

A Detection Method for Phishing Web Page Using DOM-Based Doc2Vec Model

Journal of Computing and Information Technology ◽

10.20532/cit.2020.1004899 ◽

2020 ◽

Vol 28 (1) ◽

pp. 19-31

Author(s):

Jian Feng ◽

Ying Zhang ◽

Yuqiang Qiao

Keyword(s):

Semantic Information ◽

Detection Method ◽

Structural Characteristics ◽

Web Pages ◽

Web Page ◽

Clustering Method ◽

Semantic Clustering ◽

Dom Tree ◽

Structural Semantics ◽

Linguistic Approach

Detecting phishing web pages is a challenging task. Existing detection methods for phishing web pages based on the DOM (Document Object Model) mainly aim to obtain structural characteristics but ignore the overall representation of web pages and the semantic information that HTML tags may carry. This paper treats DOMs as a natural language, using the Doc2Vec model to learn the structural semantics automatically and detect phishing web pages. First, the DOM structure of the obtained web page is parsed to construct the DOM tree; then the Doc2Vec model is used to vectorize the DOM tree, and the semantic similarity of web pages is measured by the distance between their DOM vectors. Finally, hierarchical clustering is used to cluster the web pages. Experiments show that the proposed method achieves higher recall and precision for phishing classification than the DOM-based structural clustering method and the TF-IDF-based semantic clustering method. The results show that applying Paragraph Vector to the DOM in a linguistic approach is effective.
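
The first step of this pipeline, treating a DOM as a sequence of "words", can be sketched with the Python standard library. This is only an assumed tokenization (tag name plus nesting depth), not the encoding used in the paper; the resulting token lists are the kind of input a Doc2Vec implementation would expect, and the vectorization and hierarchical clustering steps are not shown.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagSequence(HTMLParser):
    """Collects one token per opening tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tokens = []
        self.depth = 0

    def handle_starttag(self, tag, attrs):
        # Token format "tag@depth" is an illustrative choice.
        self.tokens.append(f"{tag}@{self.depth}")
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)


def dom_tokens(html):
    """Turn an HTML string into a list of structural tokens."""
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tokens


tokens = dom_tokens("<html><body><form><input></form><p>Hi</p></body></html>")
```

Each page then becomes one "document" of structural tokens, so pages with similar DOM layouts produce similar token sequences.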

Design of a Novel Search Engine for Prospective Question Answering

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2014040102 ◽

2014 ◽

Vol 4 (2) ◽

pp. 19-40

Author(s):

Rosy Madaan ◽

A.K. Sharma ◽

Ashutosh Dixit

Keyword(s):

Information Processing ◽

Search Engine ◽

Question Answering ◽

Experimental Results ◽

Web Pages ◽

Source Of Information ◽

Answering Questions ◽

Better Than

Question answering offers a more intuitive approach to information processing. A number of approaches have been used for answering questions. In this paper, we propose a question answering system that uses blogs as its source of information. The system crawls blog pages, summarizes them, and then indexes and ranks the summarized content. The user asks a question and gets one or more answers in response. The experimental results are promising: the responses given by the system are better than those given by existing QA systems that use general web pages for answering.

SEARCH AND INDEX DATA USING ELASTICSEARCH

Issues of radio electronics ◽

10.21778/2218-5453-2019-3-74-77 ◽

2019 ◽

pp. 74-77

Author(s):

V. A. Fedorova ◽

E. A. Efremov ◽

I. A. Kolyagina

Keyword(s):

Big Data ◽

Information Retrieval ◽

Search Engine ◽

Amount Of Information ◽

Index Data ◽

Speed Up ◽

Information Retrieval Methods ◽

Inverted Indexing

Traditional information retrieval methods are becoming ineffective for analyzing big data. Analyzing and processing large amounts of information requires completely new conceptual solutions, one of which is Elasticsearch, a search engine based on the Lucene library. Elasticsearch uses the concept of inverted indexing to speed up searches: a list of all unique words is created for each document, and a list of documents for each word. The paper considers the principles of the Elasticsearch search technology. The task at hand is to analyze and identify the specific capabilities of Elasticsearch associated with searching and processing large amounts of information. The paper also describes examples of Elasticsearch at work, which will help professionals solve problems inherent in systems for relevant and personalized information retrieval.
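
The inverted-indexing idea can be shown with a toy sketch. It is not how Elasticsearch or Lucene is implemented: the tokenizer, the in-memory dictionary, and the AND semantics of `search` are simplifying assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately naive analyzer)."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs):
    """Map each unique word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in set(tokenize(text)):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Toy corpus for the example.
docs = {
    1: "Elasticsearch is built on Lucene",
    2: "Lucene is a search library",
    3: "Big data needs fast search",
}
index = build_inverted_index(docs)
hits = search(index, "lucene search")
# hits == {2}
```

A lookup touches only the posting sets of the query terms rather than scanning every document, which is the source of the speed-up the abstract refers to.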

Algorithms of relationships and dependencies search in Web-pages

PROBLEMS IN PROGRAMMING ◽

10.15407/pp2016.01.044 ◽

2016 ◽

pp. 044-050 ◽

Cited By ~ 1

Author(s):

A.M. Glybovets

Keyword(s):

Information Retrieval ◽

Computer Science ◽

Search Engine ◽

Data Storage ◽

Data Selection ◽

Web Pages ◽

Retrieval Systems ◽

Data Store ◽

Information Retrieval Systems ◽

Existing Data

Methods of data extraction and analysis, a relatively new and promising branch of computer science, have found application in information retrieval systems. The paper presents an algorithm for searching relationships and dependencies in collections of Web pages. The algorithm does not itself search for relevant resources; this function is performed by the search engine. It also performs data cleaning, integration, and selection. Distinguishing features of the algorithm are its use of an existing data store (search engine or data storage), language independence, and ease of implementation.

Similarity Web Pages Retrieval Technologies on the Internet

Encyclopedia of Information Science and Technology, First Edition ◽

10.4018/978-1-59140-553-5.ch440 ◽

2005 ◽

pp. 2486-2491

Author(s):

Rung Ching Chen ◽

Ming Yung Tsai ◽

Chung Hsun Hsieh

Keyword(s):

Information Retrieval ◽

Search Engine ◽

Search Engines ◽

Fast Growth ◽

The Other ◽

The Internet ◽

Web Pages ◽

Query Term ◽

Web Page ◽

Critical Problems

In recent years, due to the fast growth of the Internet, the services and information it provides have been constantly expanding. Madria and Bhowmick (1999) and Baeza-Yates (2003) indicated that most large search engines need to handle, on average, at least millions of hits daily in order to satisfy users’ needs for information. Each search engine has its own sorting policy and keyword format for the query term, but there are some critical problems: a search may return too much or too little information. In the former case, the user gets buried in information; requiring only a little of it, they select the first few items from the large number of results returned. In the latter case, the user re-queries using another search keyword. The re-query operation also retrieves a great amount of information, much of it useless. This is a vicious cycle of information retrieval. Similarity Web page retrieval can help avoid browsing useless information. It takes a designated Web page and compares it with the other Web pages in the search engine's results. Similarity Web page retrieval allows users to save time by not browsing unrelated Web pages: it rejects non-similar Web pages, ranks Web pages in order of similarity, and clusters similar Web pages into the same class.
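
The compare, reject, and rank steps described above can be sketched as follows. The abstract does not name a similarity measure, so Jaccard similarity over word sets, the 0.2 cut-off, and the page identifiers are all assumptions made for this example; the clustering step is not shown.

```python
import re


def word_set(text):
    """Set of lowercase word tokens in a page's text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_similar(reference, results, min_similarity=0.2):
    """Score each result page against the reference page, drop dissimilar
    pages, and return (page_id, score) pairs in descending order of score."""
    ref_words = word_set(reference)
    scored = []
    for page_id, text in results.items():
        words = word_set(text)
        union = ref_words | words
        score = len(ref_words & words) / len(union) if union else 0.0
        if score >= min_similarity:
            scored.append((page_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical reference page and search results.
reference_page = "python web scraping tutorial"
search_results = {
    "page-a": "python web scraping guide",
    "page-b": "cooking pasta recipes",
    "page-c": "python tutorial",
}
ranked = rank_similar(reference_page, search_results)
```

Pages that share no vocabulary with the reference page fall below the threshold and are dropped, so the user only sees the ranked remainder.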

Application of OASys approaches for Prophetic Food Ontology

Malaysian Journal of Science Health & Technology ◽

10.33102/mjosht.v5i1.132 ◽

2021 ◽

Vol 5 (1) ◽

Author(s):

Siti Fatimah Mohd Tawil ◽

Rosita Ismail ◽

Fauziah Abdul Wahid ◽

Norita Md Norwawi ◽

Ahmad Akmaluddin Mazlan

Keyword(s):

Information Retrieval ◽

Knowledge Representation ◽

Data Integration ◽

Search Engine ◽

Semantic Information ◽

Medical Problem ◽

Semantic Interpretation ◽

Semantic Information Retrieval ◽

Query System ◽

Context Mining

Ontology is an established knowledge representation enriched with semantic interpretation that offers a mechanism for sharing ideas and mutual understanding among the members of a related domain. The semantic interpretation provided by an ontology has a structure that can facilitate the presentation of information to users. This paper presents the construction of a prophetic food ontology, specifically for dates and goat’s milk, using the OASys approaches. The ontology content for dates focuses on their attributes, developmental stages, defects and diseases, health benefits, composition, and chain of operation. The ontology content for goat’s milk covers its nutrition, its use as a cure for medical problems, and its production. The ontology can be used to answer user queries, to integrate data with other applications, and to extend to a context-mining semantic information retrieval search engine known as the Naqli Aqli Integrated Search Engine (NAISE). This system is a query system based on integrated, heterogeneous Naqli and Aqli knowledge sources on prophetic food.
