A document classification and retrieval system for R&D in semiconductor industry – A hybrid approach

Query expansion (QE) is an effective method for improving the efficiency of information retrieval systems. In this work, we examine the limitations of the pseudo-feedback based QE approach and propose a hybrid approach that improves feedback based QE by combining corpus-based information, contextual information, and semantic knowledge about query terms. First, the paper explores different corpus-based lexical co-occurrence approaches for selecting an optimal combination of query terms from a pool of terms obtained using pseudo-feedback based QE. Next, we explore a semantic similarity approach based on word2vec for ranking the QE terms obtained from the top pseudo-feedback documents. Finally, we combine co-occurrence statistics, contextual window statistics, and semantic similarity to select the best expansion terms for query reformulation. The experiments were performed on the FIRE ad-hoc and TREC-3 benchmark datasets. The experimental results show significant improvement over the baseline method.
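
The combination described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the linear weighting `alpha`, the window size, and the toy two-dimensional vectors (standing in for word2vec embeddings) are all assumptions made for illustration.

```python
import math
from collections import Counter

def cooccurrence_scores(query_terms, feedback_docs, window=3):
    """Count candidate terms that appear within `window` tokens of a query term."""
    query = set(query_terms)
    counts = Counter()
    for doc in feedback_docs:
        for i, token in enumerate(doc):
            if token not in query:
                continue
            lo, hi = max(0, i - window), min(len(doc), i + window + 1)
            for neighbour in doc[lo:hi]:
                if neighbour not in query:
                    counts[neighbour] += 1
    return counts

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_scores(query_terms, candidates, vectors):
    """Mean cosine similarity between each candidate and the query terms."""
    scores = {}
    for cand in candidates:
        sims = [cosine(vectors[cand], vectors[q])
                for q in query_terms if cand in vectors and q in vectors]
        scores[cand] = sum(sims) / len(sims) if sims else 0.0
    return scores

def select_expansion_terms(query_terms, feedback_docs, vectors,
                           k=2, alpha=0.5, window=3):
    """Rank candidates by a weighted sum of normalised co-occurrence and similarity."""
    cooc = cooccurrence_scores(query_terms, feedback_docs, window)
    if not cooc:
        return []
    max_count = max(cooc.values())
    sem = semantic_scores(query_terms, cooc, vectors)
    combined = {t: alpha * cooc[t] / max_count + (1 - alpha) * sem[t] for t in cooc}
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(combined, key=lambda t: (-combined[t], t))[:k]

# Toy data (hypothetical): tokenised top-ranked feedback documents and
# hand-written vectors used in place of trained word2vec embeddings.
QUERY = ["solar", "energy"]
FEEDBACK_DOCS = [
    ["solar", "panel", "converts", "energy"],
    ["solar", "panel", "energy", "storage"],
    ["the", "weather", "is", "nice", "today", "solar"],
]
VECTORS = {
    "solar": [1.0, 0.0], "energy": [0.9, 0.1], "panel": [0.8, 0.2],
    "storage": [0.7, 0.3], "converts": [0.1, 0.9], "today": [0.0, 1.0],
}

print(select_expansion_terms(QUERY, FEEDBACK_DOCS, VECTORS, k=2, window=1))
```

In a real system the feedback documents would come from an initial retrieval run and the vectors from a trained embedding model; the weighting between the two signals would need tuning on held-out queries.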

Design and implementation of an effective web-based hybrid stemmer for Odia language

International Journal of Advances in Applied Sciences ◽

10.11591/ijaas.v9.i1.pp12-19 ◽

2020 ◽

Vol 9 (1) ◽

pp. 12

Author(s):

Gouranga Charan Jena ◽

Siddharth Swarup Rautaray

Keyword(s):

Information Retrieval ◽

Morphological Analysis ◽

Retrieval System ◽

Hybrid Approach ◽

Information Retrieval System ◽

Retrieval Process ◽

Web Based ◽

Retrieval Systems ◽

Information Retrieval Systems ◽

Domain Independent

A stemmer reduces an inflected or derived word to its stem. The technique involves removing the suffix or prefix attached to a word. It can be used in an information retrieval system to improve the overall retrieval process. Stemming is not equivalent to morphological analysis; it only finds the stem of a word. It also decreases the number of terms in an information retrieval system. Various stemming techniques exist. In this paper, a new web-based stemmer named “Mula” is proposed for the Odia language. It uses a hybrid approach (i.e., a combination of the brute-force and suffix-removal approaches). The new stemmer is both computationally faster and domain independent. The results are favourable and indicate that the proposed stemmer can be used effectively in Odia information retrieval systems. The stemmer also handles the problems of over-stemming and under-stemming to some extent.
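
A hybrid of brute-force lookup and suffix removal can be sketched as follows. This is only an illustration of the general idea, not the “Mula” stemmer itself: the function name, the minimum stem length, and the English lookup table and suffix list are hypothetical placeholders, since the abstract does not give the Odia rules.

```python
def hybrid_stem(word, lookup, suffixes, min_stem_len=3):
    """Return a stem using table lookup first, then longest-suffix removal."""
    # Brute-force step: an exact match in the lookup table wins outright.
    if word in lookup:
        return lookup[word]
    # Suffix-removal step: try longer suffixes first, and refuse to strip
    # when the remainder would be too short (a simple guard against over-stemming).
    for suffix in sorted(suffixes, key=len, reverse=True):
        if suffix and word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[:-len(suffix)]
    # No rule applied: return the word unchanged.
    return word

# Placeholder English data (hypothetical), used only to exercise the function.
LOOKUP = {"went": "go", "mice": "mouse"}
SUFFIXES = ["ing", "ed", "s", "ness"]

for w in ["went", "walking", "kindness", "bus"]:
    print(w, "->", hybrid_stem(w, LOOKUP, SUFFIXES))
```

The lookup table handles irregular forms that no suffix rule can cover, while the suffix rules generalise to words missing from the table; the minimum stem length is one simple way to limit over-stemming.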

A review on the Detection of Missing Content Queries in FAQ Retrieval Systems

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v16i2.5996 ◽

2017 ◽

Vol 16 (2) ◽

pp. 6203-6206

Author(s):

Edwin Thuma ◽

Moemedi Lefoane ◽

Gontlafetse Mosweunyane

Keyword(s):

Information Needs ◽

Retrieval System ◽

Binary Classification ◽

Hybrid Approach ◽

Information Need ◽

Comprehensive Review ◽

Classification Approach ◽

Relevant Question ◽

Retrieval Systems

When developing an automated FAQ retrieval system, the information supplier constructs question candidates in advance using their own knowledge. They then answer these question candidates to create question-answer pairs for use in the FAQ retrieval system. However, these question-answer pairs will not always satisfy the users’ information needs. When there is no question-answer pair relevant to a user’s query, the user may submit various query reformulations, browse a long results list, and abandon the search before their information need has been satisfied. Such users may never return to the system because of its inability to return relevant question-answer pairs for their query. To alleviate this, modern automated FAQ retrieval systems use a Missing Content Query (MCQ) detection subsystem to detect queries that have no relevant question-answer pair. In this article we review the different approaches proposed in the literature for detecting these MCQs. In particular, we provide a comprehensive review of the systems that deployed the binary classification approach, the thresholding approach, and the hybrid approach to MCQ detection. Moreover, we describe the strengths and weaknesses of each approach.
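
The thresholding approach mentioned above can be illustrated with a minimal sketch: a query is flagged as an MCQ when its best match against the stored questions scores below a cut-off. The Jaccard token overlap used as the score, the threshold value, and the function names are assumptions chosen for illustration; the abstract does not specify what the reviewed systems use.

```python
def jaccard(a, b):
    """Token-level Jaccard overlap between two strings (case-insensitive)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def detect_mcq(query, faq_questions, threshold=0.3):
    """Return (is_mcq, best_score); is_mcq is True when no question scores high enough."""
    best = max((jaccard(query, q) for q in faq_questions), default=0.0)
    return best < threshold, best

# Hypothetical FAQ questions used only to exercise the function.
FAQ = [
    "how do i reset my password",
    "how do i change my email address",
]

print(detect_mcq("how do i reset my password", FAQ))
print(detect_mcq("what is the refund policy", FAQ))
```

A binary classification approach would instead train a classifier on features of the query and the result list, and a hybrid approach would combine the two; the threshold here would normally be tuned on labelled queries.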
