What is the difference? A cognitive dissimilarity measure for information retrieval result sets

2011 ◽  
Vol 30 (2) ◽  
pp. 319-340 ◽  
Author(s):  
Carsten Keßler
2015 ◽  
Vol 103 (1) ◽  
pp. 131-138 ◽  
Author(s):  
Yves Bestgen

Average precision (AP) is one of the most widely used metrics in information retrieval and natural language processing research. It is usually thought that the expected AP of a system that ranks documents randomly is equal to the proportion of relevant documents in the collection. This paper shows that this value is only approximate, and provides a procedure for efficiently computing the exact value. An analysis of the difference between the approximate and the exact value shows that the discrepancy is large when the collection contains few documents, but becomes very small when it contains at least 600 documents.
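The gap between the proportion of relevant documents and the exact expected AP can be illustrated with a minimal sketch. It is not necessarily the procedure from the paper: the closed form below comes from a standard linearity-of-expectation argument over uniformly random rankings, and all function names are hypothetical. A brute-force enumeration for small collections serves as a cross-check.

```python
from fractions import Fraction
from itertools import combinations


def average_precision(relevant_ranks, num_relevant):
    """AP of one ranking, given the 1-based ranks of its relevant documents."""
    hits = 0
    total = Fraction(0)
    for rank in sorted(relevant_ranks):
        hits += 1
        total += Fraction(hits, rank)  # precision at this relevant rank
    return total / num_relevant


def expected_ap_bruteforce(n_docs, n_rel):
    """Exact expected AP by enumerating every placement of the relevant
    documents (only feasible for small collections)."""
    placements = list(combinations(range(1, n_docs + 1), n_rel))
    return sum(average_precision(p, n_rel) for p in placements) / len(placements)


def expected_ap_closed_form(n_docs, n_rel):
    """Assumed closed form: H_N / N + (R - 1) / (N (N - 1)) * (N - H_N),
    where H_N is the N-th harmonic number."""
    if n_docs == 1:
        return Fraction(1)
    harmonic = sum(Fraction(1, k) for k in range(1, n_docs + 1))
    return harmonic / n_docs + Fraction(n_rel - 1, n_docs * (n_docs - 1)) * (
        n_docs - harmonic
    )


if __name__ == "__main__":
    for n_docs, n_rel in [(5, 2), (10, 3), (600, 60)]:
        exact = expected_ap_closed_form(n_docs, n_rel)
        print(n_docs, n_rel, float(Fraction(n_rel, n_docs)), float(exact))
```

For a collection of 5 documents with 2 relevant ones, both functions give 237/400 = 0.5925, well above the proportion 2/5 = 0.4, which is consistent with the abstract's remark that the discrepancy is large for small collections.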


2020 ◽  
Vol 28 (3) ◽  
pp. 148-168
Author(s):  
Jin Zhang ◽  
Yuehua Zhao ◽  
Xin Cai ◽  
Taowen Le ◽  
Wei Fei ◽  
...  

Relevance judgment plays an extremely significant role in information retrieval. This study investigates the differences between American users and Chinese users in relevance judgment during the information retrieval process. A total of 384 sets of relevance scores, with 50 scores in each set, were collected from 16 American users and 16 Chinese users as they judged retrieval records from two major search engines based on 24 predefined search tasks from 4 domain categories. Statistical analyses reveal that there are significant differences between American assessors and Chinese assessors in relevance judgments. Significant gender differences also appear within both the American and the Chinese assessor groups. The study also reveals significant interactions among cultures, genders, and subject categories. These findings can enhance the understanding of cultural impact on information retrieval and can assist in the design of effective cross-language information retrieval systems.


10.29007/5zzj ◽  
2018 ◽  
Author(s):  
Masaharu Yoshioka ◽  
Daiki Onodera

In this paper, we introduce a system for COLIEE task phase 1 that retrieves the relevant civil code article(s) needed to make a correct entailment for questions from the Japanese Bar Exam. This system is an extended version of our previous system, which was based on legal terminology and civil code article structure. However, the performance of the previous system was not as good as that of the best-performing system in the task. In this paper, we introduce the concept of phrase alignment, which takes the civil code article structure into account. In addition, because question types vary, settings that work well for one type of question may not work well for others. Therefore, we propose to use systems with different settings and to generate the final answer by aggregating their outputs with an ensemble approach. Finally, we also discuss the difference between the English task and the Japanese task based on the retrieval results of Indri, one of the state-of-the-art information retrieval systems.


2015 ◽  
Vol 67 (4) ◽  
pp. 408-421
Author(s):  
Sri Devi Ravana ◽  
Masumeh Sadat Taheri ◽  
Prabha Rajagopal

Purpose – The purpose of this paper is to propose a method that compares the performance of paired information retrieval (IR) systems more accurately than the current method, which is based on the mean effectiveness scores of the systems across a set of identified topics/queries. Design/methodology/approach – In the proposed approach, document-level scores, rather than the classic set of topic scores, are used as the evaluation unit. These document scores are the defined document weights, which play the role of the systems' mean average precision (MAP) scores as the statistic for the significance test. The experiments were conducted using the TREC 9 Web track collection. Findings – The p-values generated by two types of significance tests, namely Student’s t-test and the Mann-Whitney test, show that using document-level scores as the evaluation unit makes the difference between IR systems more significant than using topic scores. Originality/value – Utilizing a suitable test collection is a primary prerequisite for the comparative evaluation of IR systems. However, in addition to reusable test collections, accurate statistical testing is a necessity for these evaluations. The findings of this study will assist IR researchers in evaluating their retrieval systems and algorithms more accurately.
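As a rough illustration of one of the two tests named above, the following sketch computes the Mann-Whitney U statistic for two lists of scores, plus a two-sided p-value from the normal approximation. It is an assumption-laden toy: the function names and the score values are hypothetical, the approximation omits tie and continuity corrections, and it is unreliable for very small samples. It does not reproduce the document-weighting scheme of the paper.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: the number of (a, b) pairs with a > b,
    counting each tie as 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def mann_whitney_p(sample_a, sample_b):
    """Two-sided p-value from the normal approximation of U
    (no tie correction, no continuity correction)."""
    n_a, n_b = len(sample_a), len(sample_b)
    u = mann_whitney_u(sample_a, sample_b)
    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical per-unit effectiveness scores for two systems.
    system_a = [0.42, 0.55, 0.61, 0.48, 0.70]
    system_b = [0.30, 0.35, 0.52, 0.28, 0.45]
    print(mann_whitney_u(system_a, system_b), mann_whitney_p(system_a, system_b))
```

The evaluation unit only changes what goes into the two lists: per-topic scores in the classic setup, per-document scores in the setup the abstract describes.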


Author(s):  
Nguyen Van Dinh ◽  
Nguyen Xuan Thao

To measure the difference between two fuzzy sets (FSs) or intuitionistic fuzzy sets (IFSs), we can use distance measures and dissimilarity measures. Characterizing distance/dissimilarity measures between fuzzy sets/intuitionistic fuzzy sets is important because they have applications in different areas: pattern recognition, image segmentation, and decision making. The picture fuzzy set (PFS) is a generalization of the fuzzy set and the intuitionistic fuzzy set, so it has many applications. In this paper, we introduce the concepts of the difference between picture fuzzy sets and of distance and dissimilarity measures between picture fuzzy sets, and we provide formulas for determining these values. We also present an application of dissimilarity measures to the sample recognition problem, which can also be considered a decision-making problem.


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Mengmeng Ma ◽  
Jiyao An

To solve the invalidation problem of Dempster-Shafer theory of evidence (DS) with high conflict in multisensor data fusion, this paper presents a novel combination approach of conflict evidence with different weighting factors using a new probabilistic dissimilarity measure. Firstly, an improved probabilistic transformation function is proposed to map basic belief assignments (BBAs) to probabilities. Then, a new dissimilarity measure integrating fuzzy nearness and an introduced correlation coefficient is proposed to characterize not only the difference between basic belief assignments (BBAs) but also the divergence degree of the hypotheses that two BBAs support. Finally, the weighting factors used to reassign conflicts on BBAs are developed and Dempster’s rule is chosen to combine the discounted sources. Simple numerical examples are employed to demonstrate the merit of the proposed method. Analysis and comparison of the results show that the new combination approach can effectively solve the problem of conflict management with better convergence performance and robustness.
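The high-conflict problem mentioned above can be seen in a minimal sketch of the classical Dempster's rule of combination. This shows only the unweighted rule, not the dissimilarity measure or weighting factors proposed in the paper. The function name and the dictionary representation of a BBA (hypothesis sets mapped to masses) are assumptions made for illustration.

```python
def dempster_combine(m1, m2):
    """Combine two basic belief assignments with the classical Dempster's rule.

    Each BBA maps a frozenset of hypotheses to its mass. Products whose focal
    sets do not intersect count as conflict and are normalized away.
    """
    combined = {}
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            intersection = set_a & set_b
            if intersection:
                combined[intersection] = (
                    combined.get(intersection, 0.0) + mass_a * mass_b
                )
    agreeing_mass = sum(combined.values())  # equals 1 - conflict
    if agreeing_mass == 0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {h: mass / agreeing_mass for h, mass in combined.items()}


if __name__ == "__main__":
    # Two highly conflicting sources (a well-known counterintuitive case).
    source_1 = {frozenset("A"): 0.99, frozenset("B"): 0.01}
    source_2 = {frozenset("C"): 0.99, frozenset("B"): 0.01}
    print(dempster_combine(source_1, source_2))
```

In this example both sources give hypothesis B almost no support, yet the combined result assigns all mass to B. Counterintuitive outcomes of this kind are what conflict-management approaches such as discounting the sources before combination aim to avoid.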


2012 ◽  
Vol 241-244 ◽  
pp. 3121-3124 ◽  
Author(s):  
Yang Luo

Information retrieval is an important direction in the area of natural language processing. This paper introduces semidiscrete matrix decomposition (SDD) in latent semantic indexing. We address its disadvantage in storage space and present SSDD, then compare the performance of SVD, SDD, and SSDD.


2018 ◽  
Vol 3 (1) ◽  
pp. 768
Author(s):  
Gema Castillo ◽  
Aránzazu Berbey Álvarez ◽  
Humberto Alvarez ◽  
Isabel De La Torre Diez

The goal is to present the main free and open access patent search engines, such as PATENTSCOPE and Google Patents. It also seeks to verify the information retrieval system, which transforms the user's information needs into a list or collection of documents whose content satisfies that need. We present a comparison of the two, first examining each one independently and then providing a summary table. Finally, it is concluded that the constant search for inventions can make the difference in the competitive positions of global companies; it is for this reason that patents prove to be a source of reliable information on the subjects of interest to people or companies. PATENTSCOPE and Google Patents allow users to download data as a table for future analysis of the information. Keywords: Information retrieval, Patents, Patentscope, Google Patents, Web


2007 ◽  
Vol 25 (1-2) ◽  
pp. 179-191 ◽  
Author(s):  
D. Lillis ◽  
F. Toolan ◽  
A. Mur ◽  
L. Peng ◽  
R. Collier ◽  
...  

2020 ◽  
Vol 3 (2) ◽  
pp. 106
Author(s):  
Aji Prasetya Wibawa ◽  
Hidayah Kariima Fithri ◽  
Ilham Ari Elbaith Zaeni ◽  
Andrew Nafalski

Stopword removal is necessary in information retrieval. It removes frequently occurring, general words to reduce memory storage. The algorithm eliminates each word that exactly matches a word in the stopword list. However, generating the list can be time-consuming: the words in a specific language and domain must be collected and validated by specialists. This research aims to develop a new way to generate a stopword list using the K-means clustering method. The proposed approach groups words based on their frequency. A confusion matrix quantifies the difference between the findings and a valid stopword list created by a Javanese linguist. The accuracy of the proposed method is 78.28% (K=7). The result shows that the generation of Javanese stopword lists using a clustering method is reliable.
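The idea of grouping words by frequency can be sketched with a small one-dimensional K-means. This is a toy under stated assumptions, not the study's pipeline: the function names are hypothetical, the initial centroids are spread evenly between the lowest and highest count, the tokens are English placeholders rather than Javanese text, and treating the highest-frequency cluster as the stopword candidates is a guess at how the clusters would be used.

```python
from collections import Counter


def kmeans_1d(values, k, iterations=100):
    """Plain 1-D K-means with deterministic, evenly spaced initial centroids."""
    lowest, highest = min(values), max(values)
    if k == 1:
        centroids = [sum(values) / len(values)]
    else:
        centroids = [lowest + (highest - lowest) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Empty clusters keep their previous centroid.
        updated = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def stopword_candidates(tokens, k=2):
    """Words whose frequency falls in the cluster with the highest centroid."""
    counts = Counter(tokens)
    centroids = kmeans_1d(list(counts.values()), k)
    top = max(range(k), key=lambda i: centroids[i])
    return {
        word
        for word, count in counts.items()
        if min(range(k), key=lambda i: abs(count - centroids[i])) == top
    }


if __name__ == "__main__":
    toy_tokens = "the cat the dog the bird the fish a a a cat".split()
    print(stopword_candidates(toy_tokens, k=2))
```

The resulting candidate set would then be compared against a linguist-validated list, for example through a confusion matrix as the abstract describes.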

