Combining IR Models for Bengali Information Retrieval

2018 ◽  
Vol 8 (3) ◽  
pp. 68-83
Author(s):  
Soma Chatterjee ◽  
Kamal Sarkar

Word mismatch between queries and documents is a fundamental problem in the information retrieval domain. In this article, the authors present an approach to Bengali information retrieval that combines two IR models to tackle the word mismatch problem. The proposed hybrid model combines the traditional word-based IR model with another IR model that uses a semantic text similarity measure based on vector embeddings of words. Experimental results show that the proposed hybrid Bengali IR model significantly outperforms the baseline IR model.
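A minimal sketch of how a word-based score and an embedding-based score might be combined is given below. It is not the authors' model: the linear interpolation, the weight `alpha`, the averaging of word vectors, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros or empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexical_score(query, doc):
    """Word-based score: cosine similarity of raw term-frequency vectors."""
    q, d = Counter(query), Counter(doc)
    vocab = sorted(set(q) | set(d))
    return cosine([q[t] for t in vocab], [d[t] for t in vocab])


def mean_vector(tokens, emb):
    """Average the embeddings of the tokens found in `emb` (empty list if none)."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return []
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def semantic_score(query, doc, emb):
    """Embedding-based score: cosine similarity of averaged word vectors."""
    return cosine(mean_vector(query, emb), mean_vector(doc, emb))


def hybrid_score(query, doc, emb, alpha=0.5):
    """Hypothetical combination: linear interpolation of the two scores."""
    return alpha * lexical_score(query, doc) + (1 - alpha) * semantic_score(query, doc, emb)


# Toy embeddings, invented for this example only.
emb = {"car": [1.0, 0.0], "automobile": [0.9, 0.1], "banana": [0.0, 1.0]}
query = ["car"]
related_doc = ["automobile"]
unrelated_doc = ["banana"]
```

Here the query and `related_doc` share no words, so the word-based score is zero, yet the embedding-based term still ranks `related_doc` above `unrelated_doc`. That is the kind of word mismatch the abstract describes.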

Author(s):  
Budi Yulianto ◽  
Widodo Budiharto ◽  
Iman Herwidiana Kartowisastro

Boolean Retrieval (BR) and the Vector Space Model (VSM) are very popular methods in information retrieval for creating an inverted index and querying terms. The BR method returns exact matches for a textual query without ranking the results. The VSM method both searches and ranks the results. This study empirically compares the two methods. The research uses a sample of corpus data obtained from Reuters. The experimental results show that the times required to produce an inverted index by the two methods are nearly the same. However, a difference exists in querying the index. The results also show that the number of generated indexes, the sizes of the generated files, and the duration of reading and searching an index are proportional to the number of files in the corpus and the file size.
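The contrast between the two methods can be sketched over a single shared inverted index. This is an illustrative toy, not the study's implementation: the function names are hypothetical, the three documents are invented, tokenization is plain whitespace splitting, and the VSM scoring is a simplified tf-idf sum without document-length normalization.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def boolean_and(index, query):
    """Boolean retrieval: unranked set of documents containing every query term."""
    postings = [set(index.get(term, {})) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


def vsm_rank(index, query, n_docs):
    """Simplified VSM: rank documents by the sum of tf * idf over query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented documents for illustration only.
docs = {1: "oil prices rise", 2: "oil exports fall", 3: "grain prices fall"}
index = build_index(docs)
exact = boolean_and(index, "oil prices")
ranked = vsm_rank(index, "oil prices", len(docs))
```

With the query "oil prices", the Boolean query returns only document 1, while the ranked query also returns documents 2 and 3 with lower scores because each matches one of the two terms.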


Author(s):  
Rohan Nanda ◽  
Llio Humphreys ◽  
Lorenzo Grossio ◽  
Adebayo Kolawole John

This paper presents a multilingual legal information retrieval system for mapping recitals to articles in European Union (EU) directives and normative provisions in national legislation. Such a system could be useful for purposive interpretation of norms. A previous work on mapping recitals and normative provisions was limited to EU legislation in English and only one lexical text similarity technique. In this paper, we develop state-of-the-art text similarity models to investigate the interplay between directive recitals, directive (sub-)articles and provisions of national implementing measures (NIMs) on a multilingual corpus (from Ireland, Italy and Luxembourg). Our results indicate that directive recitals do not have a direct influence on NIM provisions, but they sometimes contain additional information that is not present in the transposed directive sub-article, and can therefore facilitate purposive interpretation.


2013 ◽  
Vol 748 ◽  
pp. 967-971
Author(s):  
Wei Gao ◽  
Tian Wei Xu ◽  
Li Liang ◽  
Jian Hou Gan

Ontology similarity calculation and ontology mapping are important research topics in information retrieval. One method for measuring ontology similarity is the multi-dividing approach. Assume that the notation and terminology used but undefined in this paper can be found in [4] and [5]. We show that the assumption of strict ordering of xi* can be relaxed to allow some ties in the likelihood ratio. The same proof remains true if we consider equivalence classes defined by the likelihood ratio and relabel xi* as its equivalence class, denoted by [xi*].


Author(s):  
Misturah Adunni Alaran ◽  
AbdulAkeem Adesina Agboola ◽  
Adio Taofiki Akinwale ◽  
Olusegun Folorunso

The reality of human existence and of people's interactions with the things that surround them reveals that the world is imprecise, incomplete, vague, and sometimes even indeterminate. Neutrosophic logic is the only theory that attempts to unify all previous logics in the same global theoretical framework. Extracting data from such an environment is becoming a problem as the volume of data keeps growing day in and day out. This chapter proposes a new neutrosophic string similarity measure based on the longest common subsequence (LCS) to address uncertainty in string information search. The new method is compared with four existing classical string similarity measures, using a wordlist as the data set. The analyses show that the proposed neutrosophic similarity measure performs better than the existing measures in an information retrieval task, with evaluation based on precision, recall, highest false match, lowest true match, and separation.
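For orientation, a classical LCS-based string similarity of the kind such a measure builds on can be sketched as follows. This is the ordinary (crisp) version, not the neutrosophic measure proposed in the chapter; the normalization `2 * LCS / (len(a) + len(b))` and the function names are assumptions chosen for illustration.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """Similarity in [0, 1]: twice the LCS length over the combined string length."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty strings are treated as identical
    return 2 * lcs_length(a, b) / total
```

For example, `lcs_length("AGGTAB", "GXTXAYB")` is 4 (the subsequence "GTAB"), so the similarity of those two strings under this normalization is 8/13.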


2014 ◽  
Vol 4 (3) ◽  
pp. 1-13
Author(s):  
Khadoudja Ghanem

In this paper the authors propose a semantic approach to document categorization. The idea is to create a semantic index (a representative term vector) for each category by performing a local Latent Semantic Analysis (LSA) followed by a clustering process. A second use of LSA (global LSA) is applied to a term-class matrix in order to retrieve the class most similar to the query (the document to classify), in the same way that LSA is used in information retrieval to retrieve the documents most similar to a query. The proposed system is evaluated on a popular dataset, the 20 Newsgroups corpus. The results show the effectiveness of the method compared with the classic KNN and SVM classifiers as well as with methods presented in the literature. Experimental results show that the new method has high precision and recall rates and that classification accuracy is significantly improved.

