Analogy-based Matching Model for Domain-specific Information Retrieval

Author(s): Myriam Bounhas, Bilel Elayeb
Author(s): Shaiful Bakhtiar Bin Rodzman, Normaly Kamal Ismail, Nurazzah Abd Rahman, Syed Ahmad Aljunid, Zulhilmi Mohamed Nor, ...

<span>A ranking function is a predictive algorithm used to establish a simple ordering of documents according to their relevance. This step is critical because the quality of the results of a domain-specific Information Retrieval (IR) system, such as Hadith Information Retrieval, depends fundamentally on the ranking function. A Hierarchical Fuzzy Logic Controller using a <em>Mamdani</em>-type Fuzzy Inference System has been built on top of the BM25 model for Malay information retrieval to define the ranking function. The model takes three inputs (Ontology BM25 Score, Fabrication Rate of Hadith and Shia Rate of Hadith) and produces four output values of the Final Ranking Score, which consist of three triangular membership functions. The proposed system outperformed the original BM25 score and the Vector Space Model (VM) on 16 queries, whereas the original BM25 score and the Vector Space Model yielded better results on only 9 and 2 queries, respectively, on the P@10, %no and MAP measures. P@10 is the precision at rank 10, %no is the percentage of queries with no relevant documents in the top ten retrieved, and MAP is the Mean Average Precision over the queries. The results show that the proposed system can demote negative documents and move relevant documents up the ranking list, and that applying the ontology to text retrieval enables it to recall unseen documents. In future work, the researchers would like to use other Malay semantic elements and another corpus as positive ranking indicators.</span>
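The three evaluation measures named in the abstract have standard definitions in IR evaluation. The sketch below computes them over a set of queries, assuming each query is represented as a ranked list of document IDs plus the set of IDs judged relevant. The function names and toy document IDs are hypothetical, and this is not the authors' evaluation code; in particular, P@10 is divided by 10 even when fewer than ten documents are retrieved, which is one common convention.

```python
from typing import List, Sequence, Set, Tuple

# One query = (ranked document IDs, set of relevant document IDs).
Query = Tuple[Sequence[str], Set[str]]


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k positions occupied by relevant documents."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k  # divide by k even if fewer than k documents were retrieved


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average of the precision values at the ranks of relevant documents.

    Relevant documents that are never retrieved contribute 0, so the sum is
    divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / len(relevant)


def evaluate(queries: List[Query], k: int = 10) -> Tuple[float, float, float]:
    """Return (mean P@k, %no, MAP) over a list of queries."""
    n = len(queries)
    p_at_k = [precision_at_k(r, rel, k) for r, rel in queries]
    # %no: percentage of queries whose top k contains no relevant document.
    pct_no = 100.0 * sum(1 for p in p_at_k if p == 0.0) / n
    mean_ap = sum(average_precision(r, rel) for r, rel in queries) / n
    return sum(p_at_k) / n, pct_no, mean_ap


if __name__ == "__main__":
    toy = [
        (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # relevant docs at ranks 1 and 3
        (["d5", "d6"], {"d9"}),                    # relevant doc never retrieved
    ]
    p10, pct_no, map_score = evaluate(toy)
    print(p10, pct_no, map_score)
```

Comparing two ranking functions query by query, as the abstract does, amounts to calling these functions on each system's ranked list for the same query and relevance set.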


Author(s): Tao Peng, Lu Liu

The ever-growing volume of information on the Web makes it difficult to obtain domain-specific information, owing to the huge number of data sources and to keywords that carry few features. Anchor texts, which contain only a few features of a specific topic, play an important role in domain-specific information retrieval, especially in Web page classification. However, the features contained in anchor texts are not informative enough. This paper presents a novel incremental method for Web page classification enhanced by link-contexts and clustering. Directly feeding the anchor-text vector to a classifier may not yield good results because of the limited number of features. First, link-context is used to obtain contextual information about the anchor text. Then, a hierarchical clustering method is introduced to cluster feature vectors and content units, which increases the length of the feature vectors belonging to a specific class. Finally, an incremental SVM is proposed to obtain the final classifier and to improve its accuracy and efficiency. Experimental results show that the proposed method outperforms the conventional topical Web crawler in harvest rate and target recall.
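The crawler comparison above is reported in harvest rate and target recall. In the topical-crawling literature these are commonly defined as the fraction of fetched pages that are relevant, and the fraction of a known set of target pages that the crawler eventually fetches. The sketch below computes both under that assumption. The page identifiers, relevance set and target set are hypothetical, and how relevance is decided (here plain set membership, in practice often a classifier score above a threshold) is not specified by the abstract.

```python
from typing import List, Sequence, Set


def harvest_rate(crawled: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of crawled pages that are judged relevant to the topic."""
    if not crawled:
        return 0.0
    return sum(1 for url in crawled if url in relevant) / len(crawled)


def harvest_curve(crawled: Sequence[str], relevant: Set[str]) -> List[float]:
    """Harvest rate after each fetched page, as typically plotted for crawlers."""
    curve, hits = [], 0
    for i, url in enumerate(crawled, start=1):
        if url in relevant:
            hits += 1
        curve.append(hits / i)  # running fraction of relevant pages so far
    return curve


def target_recall(crawled: Sequence[str], targets: Set[str]) -> float:
    """Fraction of a held-out set of target pages that the crawler visited."""
    if not targets:
        return 0.0
    return len(set(crawled) & targets) / len(targets)


if __name__ == "__main__":
    crawled = ["a", "b", "c", "d"]  # hypothetical crawl order
    relevant = {"a", "c"}           # hypothetical pages judged on-topic
    targets = {"c", "x", "y"}       # hypothetical held-out target pages
    print(harvest_rate(crawled, relevant))
    print(harvest_curve(crawled, relevant))
    print(target_recall(crawled, targets))
```

Tracking the harvest curve rather than a single final value shows whether a better classifier keeps the crawler on-topic throughout the crawl, not only at its end.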

