Improving Sentence Retrieval Using Sequence Similarity

Sentence retrieval is an information retrieval technique that aims to find sentences corresponding to an information need. It is used for tasks like question answering (QA) or novelty detection. Since it is similar to document retrieval but with a smaller unit of retrieval, document retrieval methods such as term frequency-inverse document frequency (TF-IDF), BM25, and language modeling-based methods are also used for sentence retrieval. The effect of partial matching of words on sentence retrieval has not been analyzed. We think there is substantial potential for improving sentence retrieval methods with this approach. We adapted TF-ISF (term frequency-inverse sentence frequency), BM25, and language modeling-based methods to test the partial matching of terms by combining sentence retrieval with sequence similarity, which allows matching of words that are similar but not identical. All tests were conducted using data from the novelty tracks of the Text Retrieval Conference (TREC). The scope of this paper was to find out whether such an approach is generally beneficial to sentence retrieval; we did not examine in depth how partial matching helps or hinders the finding of relevant sentences.
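
As a rough illustration of the idea in this abstract, the sketch below scores sentences with a TF-ISF-style sum in which a query term may also match similar, non-identical words. It is not the authors' implementation: the similarity measure (Python's `difflib.SequenceMatcher` ratio), the 0.8 threshold, the whitespace tokenizer, the exact weighting formula, and all function names are assumptions chosen for the example.

```python
import math
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption; no stemming or stop-word removal)."""
    return text.lower().split()


def word_similarity(a, b):
    """Sequence similarity of two words in [0, 1], via difflib's ratio."""
    if a == b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def soft_tf(term, tokens, threshold):
    """Soft term frequency: each token at or above the threshold adds its similarity."""
    total = 0.0
    for tok in tokens:
        sim = word_similarity(term, tok)
        if sim >= threshold:
            total += sim
    return total


def tf_isf_scores(query, sentences, threshold=0.8):
    """Score every sentence against the query with a TF-ISF-style sum (assumed formula)."""
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    scores = [0.0] * n
    for term in set(tokenize(query)):
        tfs = [soft_tf(term, tokens, threshold) for tokens in docs]
        sf = sum(1 for tf in tfs if tf > 0)  # sentences with an exact or partial match
        if sf == 0:
            continue
        isf = math.log((n + 1) / (0.5 + sf))  # inverse sentence frequency
        for i, tf in enumerate(tfs):
            scores[i] += math.log(1 + tf) * isf
    return scores


sentences = ["the retrieval of sentences", "cats sleep all day", "dogs bark loudly"]
partial = tf_isf_scores("retrieving sentence", sentences, threshold=0.8)
exact = tf_isf_scores("retrieving sentence", sentences, threshold=1.0)
```

With the threshold at 1.0 only identical tokens count, so the sketch falls back to exact matching and no sentence scores above zero here; at 0.8, "sentence" is allowed to match "sentences" and the first sentence receives a positive score.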

Effects of language modeling on speech-driven question answering

10.21437/interspeech.2004-370 ◽

2004 ◽

Author(s):

Katsunobu Itou ◽

Atsushi Fujii ◽

Tomoyosi Akiba

Keyword(s):

Question Answering ◽

Language Modeling

I Know What You Need: Investigating Document Retrieval Effectiveness with Partial Session Contexts

ACM Transactions on Information Systems ◽

10.1145/3488667 ◽

2022 ◽

Vol 40 (3) ◽

pp. 1-30

Author(s):

Procheta Sen ◽

Debasis Ganguly ◽

Gareth J. F. Jones

Keyword(s):

Relevant Information ◽

Document Retrieval ◽

Context Information ◽

Information Need ◽

Search System ◽

Query Log ◽

Sequence Modeling ◽

Joint Embedding ◽

One Step ◽

A Current

Reducing user effort in finding relevant information is one of the key objectives of search systems. Existing approaches have been shown to effectively exploit the context from a user's current search session to automatically suggest queries and reduce search effort. However, these approaches do not accomplish the end goal of a search system, that of retrieving a set of potentially relevant documents for the evolving information need during a search session. This article takes the problem of query prediction one step further by investigating the problem of contextual recommendation within a search session. More specifically, given the partial context information of a session in the form of a small number of queries, we investigate how a search system can effectively predict the documents that a user would have been presented with had they continued the search session by submitting subsequent queries. To address the problem, we propose a model of contextual recommendation that seeks to capture the underlying semantics of information need transitions of a current user’s search context. This model leverages information from a number of similar past interactions of other users in an existing search log. To identify similar interactions, as a novel contribution, we propose an embedding approach that jointly learns representations of both individual query terms and entire queries from search log data by leveraging session-level containment relationships. Our experiments conducted on a large query log, namely the AOL log, demonstrate that using a joint embedding of queries and their terms within our proposed framework of document retrieval outperforms a number of text-only and sequence-modeling-based baselines.

Semantic indexing and document retrieval for personalized language modeling

2017 International Symposium ELMAR ◽

10.23919/elmar.2017.8124458 ◽

2017 ◽

Cited By ~ 1

Author(s):

Jan Stas ◽

Daniel Hladek ◽

Jozef Juhar

Keyword(s):

Document Retrieval ◽

Language Modeling ◽

Semantic Indexing

Language modeling approaches to question answering

10.17918/etd-3126 ◽

2021 ◽

Author(s):

Protima Banerjee

Keyword(s):

Question Answering ◽

Language Modeling ◽

Modeling Approaches

Novelty Detection in Human Behavior through Analysis of Energy Utilization

Human Behavior Recognition Technologies ◽

10.4018/978-1-4666-3682-8.ch004 ◽

2013 ◽

pp. 65-85 ◽

Cited By ~ 2

Author(s):

Chao Chen ◽

Diane J. Cook

Keyword(s):

Energy Consumption ◽

Human Behavior ◽

Novelty Detection ◽

Electricity Consumption ◽

Energy Utilization ◽

Smart Environments ◽

Data Sets ◽

Smart Environment ◽

Energy Data ◽

Using Data

The value of smart environments in understanding and monitoring human behavior has become increasingly obvious in the past few years. Using data collected from sensors in these environments, scientists have been able to recognize activities that residents perform and use the information to provide context-aware services and information. However, less attention has been paid to monitoring and analyzing energy usage in smart homes, despite the fact that electricity consumption in homes has grown dramatically. In this chapter, the authors demonstrate how energy consumption relates to human activity through verifying that energy consumption can be predicted based on the activity that is being performed. The authors then automatically identify novelties in human behavior by recognizing outliers in energy consumption generated by the residents in a smart environment. To validate these approaches, they use real energy data collected in their CASAS smart apartment testbed and analyze the results for two different data sets collected in this smart home.

A Persian Medical Question Answering System

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213020500190 ◽

2020 ◽

Vol 29 (06) ◽

pp. 2050019

Author(s):

Hadi Veisi ◽

Hamed Fakour Shandi

Keyword(s):

Natural Language ◽

Language Processing ◽

Question Answering ◽

Document Retrieval ◽

Main Concept ◽

Question Answering System ◽

Detection Algorithms ◽

Persian Language ◽

Part Of Speech ◽

Answer Extraction

A question answering system is a type of information retrieval system that takes a question from a user in natural language as the input and returns the best answer to it as the output. In this paper, a medical question answering system in the Persian language is designed and implemented. During this research, a dataset of diseases and drugs was collected and structured. The proposed system includes three main modules: question processing, document retrieval, and answer extraction. For the question processing module, a sequential architecture is designed which retrieves the main concept of a question by using different components. These components use rule-based methods, natural language processing, and dictionary-based techniques. In the document retrieval module, the documents are indexed and searched using the Lucene library. The retrieved documents are ranked using similarity detection algorithms, and the highest-ranked document is selected to be used by the answer extraction module. This module is responsible for extracting the most relevant section of the text in the retrieved document. During this research, customized language processing tools such as a part-of-speech tagger and a lemmatizer were also developed for Persian. Evaluation results show that this system performs well for answering different questions about diseases and drugs. The accuracy of the system for 500 sample questions is 83.6%.

ONTOLOGY-BASED PARAGRAPH EXTRACTION AND CAUSALITY DETECTION-BASED SIMILARITY FOR ANSWERING WHY-QUESTION

Jurnal Ilmu Komputer ◽

10.24843/jik.2018.v11.i01.p02 ◽

2018 ◽

Vol 11 (1) ◽

pp. 9

Author(s):

A A I N Eka Karyawati

Keyword(s):

Knowledge Base ◽

Main Part ◽

Question Answering ◽

Document Retrieval ◽

Domain Ontology ◽

Typical Problem ◽

Scoring Method ◽

Method Performance ◽

Question Answering System ◽

Selection For

Paragraph extraction is a main part of an automatic question answering system, especially for answering why-questions, because the answer to a why-question is usually contained in a paragraph rather than in one or two sentences. There has been some research on paragraph extraction approaches, but few studies have focused on using a domain ontology as a knowledge base. Most paragraph extraction studies use keyword-based methods with only a small portion of semantic approaches. Thus, the question answering system faces a typical problem of keyword-based methods: word mismatch. The main contribution of this research is a paragraph scoring method that incorporates TF-IDF-based and causality-detection-based similarity. This research is part of an ontology-based why-question answering method, where the ontology is used as a knowledge base for each step of the method, including indexing, question analysis, document retrieval, and paragraph extraction/selection. To measure the method's performance, the proposed method was compared against two baseline methods that did not use causality-detection-based similarity. The proposed method showed improvements over the baseline methods in MRR (95%, 0.82-0.42), P@1 (105%, 0.78-0.38), P@5 (91%, 0.88-0.46), Precision (95%, 0.80-0.41), and Recall (66%, 0.88-0.53).
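
The reported figures can be checked with a little arithmetic. The sketch below assumes that each pair in parentheses is the proposed method's score followed by a baseline score, and that the percentage is the relative improvement over that baseline; the abstract does not state this explicitly, but the numbers are consistent with that reading. The metric helpers use the standard definitions of MRR and P@k; the function names and the toy relevance lists are hypothetical.

```python
def reciprocal_rank(ranked_relevance):
    """1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over one ranked relevance list per question."""
    return sum(reciprocal_rank(r) for r in runs) / len(runs)


def precision_at_k(ranked_relevance, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for rel in ranked_relevance[:k] if rel) / k


def relative_improvement(proposed, baseline):
    """Percentage gain of the proposed score over the baseline score."""
    return 100.0 * (proposed - baseline) / baseline


# Score pairs as reported in the abstract, read as (proposed, baseline).
reported = {
    "MRR": (0.82, 0.42),
    "P@1": (0.78, 0.38),
    "P@5": (0.88, 0.46),
    "Precision": (0.80, 0.41),
    "Recall": (0.88, 0.53),
}
gains = {name: round(relative_improvement(p, b)) for name, (p, b) in reported.items()}

# Toy relevance judgments (hypothetical): 1 marks a relevant paragraph.
toy_runs = [[0, 1, 0], [1, 0, 0]]
toy_mrr = mean_reciprocal_rank(toy_runs)
```

Under that reading, the rounded gains reproduce the percentages quoted in the abstract: 95, 105, 91, 95, and 66.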

Question Answering

10.1093/oxfordhb/9780199276349.013.0031 ◽

2012 ◽

Cited By ~ 1

Author(s):

Sanda Harabagiu ◽

Dan Moldovan

Keyword(s):

Language Processing ◽

Question Answering ◽

Document Retrieval ◽

Semantic Features ◽

Knowledge Based ◽

Answer Extraction ◽

Small Set ◽

Processing Module ◽

On Line ◽

Processing Techniques

Textual Question Answering (QA) identifies the answer to a question in large collections of on-line documents. By providing a small set of exact answers to questions, QA takes a step closer to information retrieval rather than document retrieval. A QA system comprises three modules: a question-processing module, a document-processing module, and an answer extraction and formulation module. Questions may be asked about any topic, in contrast with Information Extraction (IE), which identifies textual information relevant only to a predefined set of events and entities. The natural language processing (NLP) techniques used in open-domain QA systems may range from simple lexical and semantic disambiguation of question stems to complex processing that combines syntactic and semantic features of the questions with pragmatic information derived from the context of candidate answers. This article reviews current research in integrating knowledge-based NLP methods with shallow processing techniques for QA.
