A document classification and retrieval system for R&D in semiconductor industry – A hybrid approach

2009 ◽  
Vol 36 (3) ◽  
pp. 4753-4764 ◽  
Author(s):  
Shui-Shun Lin
Author(s):  
Muhammad Nabeel Asim ◽  
Muhammad Usman Ghani Khan ◽  
Muhammad Imran Malik ◽  
Andreas Dengel ◽  
Sheraz Ahmed

2017 ◽  
Vol 13 (3) ◽  
pp. 57-78 ◽  
Author(s):  
Jagendra Singh ◽  
Rakesh Kumar

Query expansion (QE) is an effective method for improving the efficiency of information retrieval systems. In this work, we examine the limitations of the pseudo-feedback-based QE approach and propose a hybrid approach that improves feedback-based QE by combining corpus-based information, contextual information, and semantic knowledge about the query terms. First, this paper explores different corpus-based lexical co-occurrence approaches for selecting an optimal combination of query terms from a pool of terms obtained using pseudo-feedback-based QE. Next, we explore a semantic similarity approach based on word2vec for ranking the QE terms obtained from the top pseudo-feedback documents. Finally, we combine co-occurrence statistics, contextual window statistics, and semantic-similarity-based approaches to select the best expansion terms for query reformulation. The experiments were performed on the FIRE ad-hoc and TREC-3 benchmark datasets. The experimental results show a significant improvement over the baseline method.
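
The co-occurrence step described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the function name `expand_query`, the document-level scoring rule, and the parameter `k` are all assumptions made for illustration, and the word2vec and contextual-window components are left out.

```python
from collections import Counter


def expand_query(query_terms, feedback_docs, k=2):
    """Pick k expansion terms from pseudo-feedback documents.

    Hypothetical scoring rule (an assumption, not from the paper):
    each candidate term earns one point per query term that appears
    in the same feedback document.
    """
    query = set(query_terms)
    scores = Counter()
    for doc in feedback_docs:
        tokens = set(doc)
        overlap = len(tokens & query)  # query terms present in this document
        if overlap == 0:
            continue  # document shares nothing with the query
        for term in tokens - query:
            scores[term] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query_terms) + ranked[:k]


# Toy feedback documents, already tokenized (illustrative data only).
docs = [
    ["query", "expansion", "retrieval"],
    ["query", "retrieval", "feedback"],
    ["stemming", "suffix"],
]
print(expand_query(["query"], docs, k=2))
```

In a fuller pipeline the co-occurrence score would presumably be combined with an embedding-based similarity score before the final ranking, as the abstract outlines.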


Author(s):  
Gouranga Charan Jena ◽  
Siddharth Swarup Rautaray

A stemmer reduces an inflected or derived word to its stem. The technique removes the suffix or prefix attached to a word. It can be used in an information retrieval system to improve the overall retrieval process. Stemming is not equivalent to morphological analysis; it only finds the stem of a word. It also decreases the number of terms in an information retrieval system. Various stemming techniques exist. In this paper, a new web-based stemmer named “Mula” is proposed for the Odia language. It uses a hybrid approach (i.e., a combination of brute-force and suffix-removal approaches). The new stemmer is both computationally fast and domain independent. The results are favourable and indicate that the proposed stemmer can be used effectively in Odia information retrieval systems. The stemmer also handles the problems of over-stemming and under-stemming to some extent.
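
A hybrid of brute-force lookup and suffix removal can be sketched as follows. This is a generic illustration, not the Mula stemmer: the function `hybrid_stem`, the `min_stem_len` guard, and the English toy data are assumptions, since the abstract does not give the Odia lookup table or suffix list.

```python
def hybrid_stem(word, lookup, suffixes, min_stem_len=3):
    """Stem a word by table lookup first, then by suffix stripping.

    `lookup` maps full word forms to stems (the brute-force step).
    `suffixes` is a list of strippable endings (the suffix-removal step).
    `min_stem_len` is a hypothetical guard against over-stemming.
    """
    # Brute-force step: an exact match in the table wins outright.
    if word in lookup:
        return lookup[word]
    # Suffix-removal step: try the longest suffix first.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: len(word) - len(suffix)]
    # No rule applied: return the word unchanged.
    return word


# English toy data for illustration only; a real Odia stemmer
# would use Odia word forms and suffixes.
table = {"went": "go"}
endings = ["s", "ing", "ings"]
for w in ["went", "readings", "books", "sing"]:
    print(w, "->", hybrid_stem(w, table, endings))
```

The lookup table handles irregular forms that no suffix rule can recover, while the suffix rules cover words missing from the table.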


2017 ◽  
Vol 16 (2) ◽  
pp. 6203-6206
Author(s):  
Edwin Thuma ◽  
Moemedi Lefoane ◽  
Gontlafetse Mosweunyane

When developing an automated FAQ retrieval system, the information supplier constructs question candidates in advance using their own knowledge. They then answer these question candidates to create question-answer pairs for use in the FAQ retrieval system. However, these question-answer pairs will not always satisfy users’ information needs. When there is no question-answer pair relevant to a user’s query, the user may submit various query reformulations, browse a long results list, and abandon the search before their information need has been satisfied. Such users may never return to the system because it failed to return question-answer pairs relevant to their query. To alleviate this, modern automated FAQ retrieval systems use a Missing Content Query (MCQ) detection subsystem to detect queries that have no relevant question-answer pair. In this article, we review the different approaches proposed in the literature for detecting these MCQs. In particular, we provide a comprehensive review of the systems that deployed the binary classification approach, the thresholding approach, and the hybrid approach to MCQ detection. Moreover, we describe the strengths and weaknesses of each approach.
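
One plausible reading of a thresholding approach is to flag a query as an MCQ when even the best-matching question-answer pair scores below a cutoff. The sketch below assumes that reading; the function name, the use of the top retrieval score, and the threshold value are hypothetical and are not taken from any of the reviewed systems.

```python
def is_missing_content_query(scores, threshold):
    """Return True when no retrieved pair scores at least `threshold`.

    `scores` holds the retrieval scores of the candidate
    question-answer pairs for one query (assumed higher = better).
    An empty result list is treated as missing content.
    """
    return max(scores, default=float("-inf")) < threshold


# Illustrative scores only; a real system would tune the threshold
# on held-out queries.
print(is_missing_content_query([0.82, 0.40], threshold=0.5))
print(is_missing_content_query([0.21, 0.18], threshold=0.5))
```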

