Document-based approach to improve the accuracy of pairwise comparison in evaluating information retrieval systems

2015 ◽  
Vol 67 (4) ◽  
pp. 408-421
Author(s):  
Sri Devi Ravana ◽  
Masumeh Sadat Taheri ◽  
Prabha Rajagopal

Purpose – The purpose of this paper is to propose a method that compares the performance of paired information retrieval (IR) systems more accurately than the current method, which is based on the systems' mean effectiveness scores across a set of identified topics/queries. Design/methodology/approach – In the proposed approach, document-level scores, rather than the classic set of topic scores, are used as the evaluation unit. These document scores are defined document weights, which take the role of the systems' mean average precision (MAP) scores as the significance test's statistic. The experiments were conducted using the TREC 9 Web track collection. Findings – The p-values generated through two types of significance tests, namely the Student's t-test and the Mann-Whitney test, show that when document-level scores are used as the evaluation unit, the difference between IR systems is more significant than when topic scores are used. Originality/value – Utilizing a suitable test collection is a primary prerequisite for the comparative evaluation of IR systems. However, in addition to reusable test collections, accurate statistical testing is a necessity for these evaluations. The findings of this study will assist IR researchers in evaluating their retrieval systems and algorithms more accurately.
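The core idea, testing per-document scores instead of per-topic means, can be sketched as follows. The scores below are hypothetical, and the paired Student's t statistic is computed from first principles (the paper also applies the Mann-Whitney test); this is an illustration of the evaluation-unit change, not the authors' exact weighting scheme.

```python
import math
from statistics import mean, stdev

def paired_t(xs, ys):
    """Paired Student's t statistic for two systems' matched scores."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Classic unit: one effectiveness (e.g. AP) score per topic.
sys_a_topics = [0.42, 0.31, 0.55, 0.28, 0.61]
sys_b_topics = [0.40, 0.30, 0.50, 0.27, 0.58]

# Proposed unit: one weight per judged document, a finer-grained
# sample that can expose differences the per-topic means average away.
sys_a_docs = [0.9, 0.7, 0.8, 0.6, 0.9, 0.5, 0.7, 0.8]
sys_b_docs = [0.8, 0.6, 0.7, 0.6, 0.8, 0.4, 0.6, 0.7]

print(round(paired_t(sys_a_topics, sys_b_topics), 2))  # 3.21
print(round(paired_t(sys_a_docs, sys_b_docs), 2))      # 7.0
```

With the same pair of systems, the larger document-level sample yields a larger t statistic, i.e. a smaller p-value, which is the effect the paper reports.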

2018 ◽  
Vol 36 (3) ◽  
pp. 445-456 ◽  
Author(s):  
Shahram Sedghi ◽  
Zeinab Shormeij ◽  
Iman Tahamtan

Purpose Information seeking is an interactive behaviour of end users with information systems, which occurs in a real environment known as context. Context affects information-seeking behaviour in many different ways. The purpose of this paper is to investigate the factors that potentially constitute the context of visual information seeking. Design/methodology/approach The authors used a Straussian version of grounded theory, a qualitative approach, to conduct the study. Using a purposive sampling method, 28 subjects participated in the study. The data were analysed using open, axial and selective coding in MAXQDA software. Findings The contextual factors influencing visual information seeking were classified into seven categories: “user characteristics”, “general search features”, “visual search features”, “display of results”, “accessibility of results”, “task type” and “environmental factors”. Practical implications This study contributes to a better understanding of how people conduct searches in and interact with visual search interfaces. The results have important implications for the designers of information retrieval systems. Originality/value This paper is among the pioneer studies investigating contextual factors influencing information seeking in visual information retrieval systems.


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Sanaz Manouchehri ◽  
Mahdieh Mirzabeigi ◽  
Tahere Jowkar

Purpose This paper aims to investigate the effectiveness of Farsi-English queries using an ontology. Design/methodology/approach The present study is quasi-experimental. The sample consisted of 60 students and graduate and doctoral staff from Shiraz University and the Regional Center for Science and Technology. A researcher-made questionnaire was used to assess users' level of English language proficiency, their background knowledge and their level of satisfaction with search results before and after using the ontology. Each user also evaluated the relevance of the top ten results on the Google search engine results page before and after using the ontology. Findings The findings showed that the level of task complexity, the use of the ontology, the interactive effect of task complexity with users' domain knowledge, and the interactive effect of task complexity with the ontology influence the effectiveness of retrieval results from the users' point of view. The results also showed that the level of task complexity, the use of the ontology, and the interactive effect of task complexity and ontology use affect the level of user satisfaction. Originality/value The results of this research are significant in both theoretical and practical respects. Theoretically, given the lack of research examining the interactive effect of ontology use with task complexity and users' domain knowledge, the present study can be considered an attempt to improve information retrieval systems. From a practical point of view, the results will help researchers and designers of information retrieval systems understand that ontologies can be used to improve queries and retrieval, to assess users' needs and satisfaction, and ultimately to make the information retrieval process more effective.
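The mechanism behind ontology-assisted querying can be sketched minimally: related concepts from the ontology are OR-ed into the user's query before it reaches the search engine. The tiny ontology, concepts and query below are hypothetical, not the study's actual instrument.

```python
# A miniature, hypothetical ontology: each concept maps to related
# concepts that can be OR-ed into the query to improve recall.
ONTOLOGY = {
    "information retrieval": ["document retrieval", "search engine"],
    "ontology": ["knowledge representation", "taxonomy"],
}

def expand_query(query):
    """Append the ontology neighbours of any concept found in the query."""
    terms = [query]
    for concept, related in ONTOLOGY.items():
        if concept in query.lower():
            terms.extend(related)
    return " OR ".join(terms)

print(expand_query("ontology evaluation"))
# ontology evaluation OR knowledge representation OR taxonomy
```

A query containing no known concept passes through unchanged, so expansion only ever broadens the search.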


2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Parnia Samimi ◽  
Sri Devi Ravana

Test collections are used to evaluate information retrieval systems in laboratory-based evaluation experiments. In the classic setting, generating relevance judgments involves human assessors and is a costly and time-consuming task. Researchers and practitioners are still challenged to perform reliable and low-cost evaluations of retrieval systems. Crowdsourcing, as a novel method of data acquisition, is broadly used in many research fields, and it has been proven to be an inexpensive and quick solution as well as a reliable alternative for creating relevance judgments. One application of crowdsourcing in IR is judging the relevance of query-document pairs. For a crowdsourcing experiment to succeed, the relevance judgment tasks should be designed precisely, with an emphasis on quality control. This paper explores the factors that influence the accuracy of relevance judgments made by crowd workers and how the reliability of judgments in crowdsourcing experiments can be strengthened.
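One standard quality-control step in such experiments is aggregating redundant crowd judgments per query-document pair and withholding items with weak consensus. A minimal majority-voting sketch, with hypothetical labels and an assumed agreement threshold:

```python
from collections import Counter

def aggregate(labels, min_agreement=0.6):
    """Majority-vote one document's crowd labels; items with weak
    consensus return None, to be re-judged rather than trusted."""
    label, votes = Counter(labels).most_common(1)[0]
    if votes / len(labels) >= min_agreement:
        return label
    return None  # consensus too weak: collect more judgments

print(aggregate(["relevant", "relevant", "not relevant"]))  # relevant
print(aggregate(["relevant", "not relevant", "relevant", "not relevant"]))  # None
```

More elaborate schemes weight workers by accuracy on gold-standard items, but the consensus threshold above already filters the noisiest judgments.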


2018 ◽  
Vol 36 (1) ◽  
pp. 55-70 ◽  
Author(s):  
Sanjeev K. Sunny ◽  
Mallikarjun Angadi

Purpose The purpose of this study is to carry out a systematic literature review for an evidence-based assessment of the effectiveness of thesauri in digital information retrieval systems. It also aims to identify the evaluation methods, evaluation measures and data collection tools that may be used in evaluating digital information retrieval systems. Design/methodology/approach A systematic literature review (SLR) of 344 publications from LISA and 238 from Scopus was carried out to identify evaluation studies for analysis, and 15 evaluation studies were analyzed. Findings This study presents evidence for the effectiveness of thesauri in digital information retrieval systems. Various methods for evaluating digital information systems were identified, as well as a wide range of evaluation measures and data collection tools. Research limitations/implications The study was limited to literature published in English and indexed in LISA and Scopus. The evaluation methods, evaluation measures and data collection tools identified in this study may be used to design more cognizant evaluation studies of digital information retrieval systems. Practical implications The findings have significant implications for the administrators of any type of digital information retrieval system in making more informed decisions about implementing thesauri in resource description and access to digital collections. Originality/value This study extends our knowledge of the potential of thesauri in digital information retrieval systems. It also provides cues for designing more cognizant evaluation studies of digital information systems.


2018 ◽  
Vol 36 (3) ◽  
pp. 430-444
Author(s):  
Sholeh Arastoopoor

Purpose The degree to which a text is considered readable depends on the capability of the reader. This assumption puts information retrieval systems at risk of retrieving relevant yet unreadable or hard-to-read documents for their users. This paper aims to examine the potential use of concept-based readability measures, along with classic measures, for re-ranking search results in information retrieval systems, specifically in the Persian language. Design/methodology/approach Flesch–Dayani as a classic readability measure, along with document scope (DS) and document cohesion (DC) as domain-specific measures, was applied to score the documents retrieved from Google (181 documents) and the RICeST database (215 documents) in the field of computer science and information technology (IT). The re-ranked results were compared with potential users' rankings of the documents' readability. Findings The results show that the subcategories of the computer science and IT field differ in their readability and understandability. This study also shows that it is possible to develop a hybrid score based on the DS and DC measures; among the four scores applied in re-ranking the documents, the list re-ranked by the DSDC score correlates with the participants' re-ranking in both groups. Practical implications The findings of this study offer a new option for re-ranking search results based on their difficulty for experts and non-experts in different fields. Originality/value The findings and the two-mode re-ranking model proposed in this paper, along with its primary focus on domain-specific readability in the Persian language, would help Web search engines and online databases further refine search results in pursuit of retrieving useful texts for users with differing expertise.
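Readability-based re-ranking can be illustrated with the classic Flesch reading-ease formula, which the Flesch-Dayani measure adapts for Persian. The sketch below is an English-language approximation (syllables counted as vowel groups, hypothetical documents) that re-orders results easiest-first; it omits the paper's DS/DC components.

```python
import re

def flesch(text):
    """Flesch reading-ease with syllables approximated by vowel groups
    (a rough stand-in; higher scores mean easier text)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower())))
                    for w in words)
    return 206.835 - 1.015 * len(words) / sentences - 84.6 * syllables / len(words)

docs = {
    "doc1": "The cat sat. It was warm.",
    "doc2": "Heterogeneous computational architectures necessitate sophisticated optimization.",
}
# Re-rank retrieved documents easiest-first for a non-expert user.
reranked = sorted(docs, key=lambda d: flesch(docs[d]), reverse=True)
print(reranked)  # ['doc1', 'doc2']
```

For an expert user the sort order could simply be reversed, which is the two-mode idea in the paper's model.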


1981 ◽  
Vol 3 (4) ◽  
pp. 177-183 ◽  
Author(s):  
Martin Lennon ◽  
David S. Peirce ◽  
Brian D. Tarry ◽  
Peter Willett

The characteristics of conflation algorithms are discussed and examples given of some algorithms which have been used for information retrieval systems. Comparative experiments with a range of keyword dictionaries and with the Cranfield document test collection suggest that there is relatively little difference in the performance of the algorithms despite the widely disparate means by which they have been developed and by which they operate.
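The family of algorithms compared can be illustrated with a deliberately simple suffix-stripping conflator; this toy rule list is illustrative only and is none of the study's dictionaries or algorithms.

```python
# Longest suffixes listed first so "ation" is tried before "s", etc.
SUFFIXES = ["ational", "ation", "ness", "ing", "ies", "ed", "es", "s"]

def conflate(word, min_stem=3):
    """Strip the first matching suffix, keeping a minimum stem
    length so short words are left untouched."""
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= min_stem:
            return stem
    return word

for w in ["retrieval", "retrieving", "retrieved", "computation"]:
    print(w, "->", conflate(w))
# retrieving and retrieved conflate to the same stem, "retriev"
```

Morphologically related query and index terms that map to the same stem will match each other, which is the retrieval effect conflation aims for regardless of how a particular algorithm was developed.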


2014 ◽  
Vol 9 (4) ◽  
pp. 47
Author(s):  
Joanne L. Jordan

A Review of: Mu, X., Lu, K., & Ryu, H. (2014). Explicitly integrating MeSH thesaurus help into health information retrieval systems: An empirical user study. Information Processing and Management, 50(1), 24-40. http://dx.doi.org/10.1016/j.ipm.2013.03.005 Abstract Objectives – To compare the effectiveness of a search interface with built-in thesaurus (MeSH) term and tree browsers (MeSHMed) with a simple search interface (SimpleMed) in supporting health information retrieval. Researchers also examined the contribution of the MeSH term and tree browser components towards effective information retrieval and assessed whether and how these elements influence users' search methods and strategies. Design – Empirical comparison study. Setting – A four-year university in the United States of America. Subjects – 45 undergraduate and postgraduate students from 12 different academic departments. Methods – Researchers recruited 55 students from a wide range of disciplines using flyers posted across a university campus; 10 were excluded. Participants were paid a small stipend for taking part in the study. The authors developed two information retrieval systems, SimpleMed and MeSHMed, to search across a test collection, OHSUMED, a database containing 348,566 Medline citations used in information retrieval research. SimpleMed includes a search browser and a popup window displaying record details. The MeSHMed search interface includes two additional browsers, one for looking up details of MeSH terms and another showing where the term fits into the tree structure. The search tasks had two parts: to define a key biomedical term, and to explore the association between concepts. After a brief tutorial covering the key functions of both systems, which avoided suggesting that one interface was better than the other, each participant searched for six topics, three on each interface, allocated randomly using a 6x6 Latin square design.
The study tracked participants' perceived topic familiarity using a 9-point Likert scale, measured before and after each search, with changes in score recorded. It examined the time spent in each search system, as recorded objectively by system logs, to measure engagement with the searching task. Finally, the study examined whether participants found an answer to the set question, and whether that response was wrong, partially correct or correct. Participants were asked about the portion of time they spent on each of the system components, and transaction log data were used to capture transitions between the search components. The participants also added their comments to a questionnaire after the search phase of the experiment. Main results – The baseline mean topic familiarity scores were similar for both interfaces: SimpleMed's mean was 2.01 (standard deviation 1.43), compared to MeSHMed's mean of 2.08 (standard deviation 1.60). The topic familiarity change scores were averaged over the three questions on each interface and compared using a paired-sample two-tailed t-test, which showed a statistically significant difference between the mean changes for SimpleMed and MeSHMed. In total, 46 (17%) of the questions were not answered: 34 (74%) when participants were using SimpleMed and 12 (26%) when using MeSHMed. A chi-squared test found an association between the interface and whether the answer was correct, suggesting that MeSHMed users were less likely to answer questions incorrectly. The question-answer scores were positively correlated with the topic familiarity change scores, indicating that the participants whose familiarity with the topic improved most were more likely to answer the question correctly. The mean time spent overall using the two interfaces was not significantly different, though the researchers provide only total times and test statistics, not mean times.
On the MeSHMed interface, participants on average found the Term Browser the most useful component and spent the most time in it. The Tree Browser was rated as contributing the least to the searching task, and participants spent the least time in this part of the interface. Patterns of transitions between the components are reported; the most common were from the Search Browser to the popup records, and from the Term Browser to the Search Browser and vice versa. These observations suggest that participants were verifying terms and clicking back and forth between the components to carry out iterative and more accurate searches. The authors identify seven typical patterns and describe four different combinations of transitions between components. Based on questionnaire feedback, participants found the Term Browser helpful for defining the medical terms used and for suggesting additional terms to add to their search. The Tree Browser allowed participants to see how terms relate to each other and helped identify related terms, despite many negative feedback comments about this feature. Almost all participants (43 of 45) preferred MeSHMed for searching, finding the extra components helpful for producing better results. Conclusion – MeSHMed was shown to be more effective than SimpleMed for improving topic familiarity and finding correct answers to the set questions. Most participants preferred the MeSHMed interface, which included a Term Browser and Tree Browser, to the straightforward SimpleMed interface. Both MeSHMed components contributed to the search process; the Term Browser was particularly helpful for defining and developing new concepts, and the Tree Browser added a view of the relationships between terms. The authors suggest that health information retrieval systems include visible and accessible thesaurus searching to assist with developing search strategies.
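The 6x6 Latin square design used to rotate the six topics across participants can be constructed minimally as below; this is an illustrative construction (simple cyclic rotation), not the authors' actual allocation.

```python
def latin_square(n):
    """Row i gives the topic order for participant group i, so every
    topic appears exactly once in each position across the groups."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

for row in latin_square(6):
    print(row)
```

Because each topic occupies each position exactly once, order and learning effects are balanced across the two interfaces rather than confounded with any single topic.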


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Magdalena Wójcik

Purpose The subject of this paper is the idea of the Brain–Computer Interface (BCI). The main goal is to assess the potential impact of BCI on the design, use and evaluation of information retrieval systems operating in libraries. Design/methodology/approach A literature review was used to establish the state of research. Searches based on the accepted queries were carried out in the Scopus database and, complementarily, in Google Scholar. To determine the state of research on BCI within library and information science, the specialist LISTA abstract database was also searched. The most current papers, published in 2015-2019 in English or having at least an abstract in English, were taken into account. Findings The analysis showed that BCI issues are extremely popular in the subject literature of various fields, mainly computer science, but are practically absent in the context of using this technology in information retrieval systems. Research limitations/implications Because BCI solutions are not yet implemented in libraries and are rarely the subject of scientific consideration in library and information science, this article is mainly based on literature from other disciplines. The goal was to consider how far BCI solutions can affect library information retrieval systems. The considerations presented in this article are theoretical in nature due to the lack of empirical materials on which to base them.
The author's assumption was to initiate a discussion about BCI on the basis of library and information science, not to propose final solutions. Practical implications The results can be widely used in practice as a framework for the implementation of BCI in libraries. Social implications The article can help facilitate the debate on the role of implementing new technologies in libraries. Originality/value The problem of BCI is very rarely addressed in the subject literature in the field of library and information science.


2019 ◽  
Vol 71 (1) ◽  
pp. 2-17
Author(s):  
Prabha Rajagopal ◽  
Sri Devi Ravana ◽  
Yun Sing Koh ◽  
Vimala Balakrishnan

Purpose Effort, in addition to relevance, is a major factor in a document's satisfaction and utility to the actual user. The purpose of this paper is to propose a method for generating relevance judgments that incorporate effort without involving human judges. The study then determines the variation in system rankings due to low-effort relevance judgments when evaluating retrieval systems at different depths of evaluation. Design/methodology/approach Effort-based relevance judgments are generated using a proposed boxplot approach over simple document features, HTML features and readability features. The boxplot approach is a simple yet repeatable way of classifying documents' effort while ensuring outlier scores do not skew the grading of the entire set of documents. Findings Evaluating retrieval systems with low-effort relevance judgments has a stronger influence at shallow depths of evaluation than at deeper depths. The difference in the system rankings is shown to be due to low-effort documents and not the number of relevant documents. Originality/value It is therefore crucial to evaluate retrieval systems at shallow depths using low-effort relevance judgments.
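A boxplot-style grading step of the kind described can be sketched with quartiles: documents are graded relative to the quartiles of the score distribution, so a few extreme scores cannot shift the grade boundaries for the whole set. The per-document effort scores below are hypothetical, and the three-way grading is an assumed simplification of the paper's scheme.

```python
from statistics import quantiles

def grade_effort(scores):
    """Grade each document's effort score via boxplot quartiles, so a
    few extreme outliers cannot skew the grading of the whole set."""
    q1, _, q3 = quantiles(scores, n=4)
    return ["low" if s <= q1 else "high" if s >= q3 else "medium"
            for s in scores]

# Hypothetical per-document effort scores (e.g. combining document
# length, HTML complexity and readability features).
print(grade_effort([2.0, 3.0, 3.5, 4.0, 9.0]))
# ['low', 'medium', 'medium', 'medium', 'high']
```

Note that the outlier 9.0 lands in the "high" grade without stretching the boundary between "low" and "medium", which a mean-and-standard-deviation scheme would not guarantee.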

