information retrieval
Recently Published Documents


TOTAL DOCUMENTS

13115
(FIVE YEARS 2766)

H-INDEX

120
(FIVE YEARS 27)

2022 ◽  
Vol 15 (1) ◽  
pp. 1-13
Author(s):  
David Otero ◽  
Patricia Martin-Rodilla ◽  
Javier Parapar

Social networks are a valuable source for documenting heritage constitution processes or for obtaining a real-time snapshot of a cultural heritage research topic. Many heritage researchers use social networks as a social thermometer to study these processes. For this purpose, they create collections that constitute born-digital archives that are potentially reusable, searchable, and of interest to other researchers or citizens. However, the retrieval and archiving techniques used for social networks in heritage studies are still semi-manual. This makes them time-consuming and hinders the reproducibility, evaluation, and opening up of the resulting collections. Combining Information Retrieval strategies with emerging archival techniques can overcome some of these weaknesses. Specifically, pooling is a well-known Information Retrieval method for extracting a sample of documents (posts, in the case of social networks) from an entire document set, with the aim of obtaining the most complete and unbiased set of relevant documents on a given topic. With this approach, researchers can create a reference collection without annotating the entire corpus of retrieved documents or posts. This is especially useful in social media, where the same user, thread, or post often touches on many topics. We present a platform for applying pooling strategies combined with expert judgment to create cultural heritage reference collections from social networks in a customisable, reproducible, documented, and shareable way. The platform is validated by building a reference collection from a social network about the recent attacks on heritage entities motivated by anti-racist protests. This reference collection and the results of its preliminary study are available for use. This real-world application allowed us to validate both the platform and the pooling strategies for building reference collections in heritage studies from social networks.
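Pooling is often implemented as depth-k pooling. The top k results of each participating retrieval run are merged, and only that union is shown to expert assessors. The sketch below assumes this depth-k variant; the abstract does not say which pooling strategies the platform implements. The run names and post IDs are hypothetical.

```python
from typing import Dict, List, Set


def depth_k_pool(runs: Dict[str, List[str]], k: int) -> Set[str]:
    """Depth-k pooling: the union of the top-k post IDs from each ranked run."""
    pool: Set[str] = set()
    for ranking in runs.values():
        # Only the k highest-ranked posts of this run enter the pool.
        pool.update(ranking[:k])
    return pool


# Hypothetical rankings produced by three retrieval strategies over posts.
runs = {
    "bm25": ["p1", "p2", "p3", "p4"],
    "lm_dir": ["p2", "p5", "p1", "p6"],
    "rm3": ["p7", "p2", "p8", "p1"],
}
print(sorted(depth_k_pool(runs, k=2)))  # ['p1', 'p2', 'p5', 'p7']
```

Here experts judge 4 posts instead of all 8 retrieved ones. Posts outside the pool stay unjudged, and TREC-style evaluation conventionally treats them as non-relevant. For this reason, pool depth and the diversity of the contributing runs determine how complete the reference collection can be.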


Author(s):  
Md. Saddam Hossain Mukta ◽  
Md. Adnanul Islam ◽  
Faisal Ahamed Khan ◽  
Afjal Hossain ◽  
Shuvanon Razik ◽  
...  

Sentiment Analysis (SA) is a Natural Language Processing (NLP) and Information Extraction (IE) task that primarily aims to determine, by analyzing a large number of documents, whether the feelings a writer expresses are positive or negative. SA is also widely studied in data mining, web mining, text mining, and information retrieval. The fundamental task in sentiment analysis is to classify the polarity of a given piece of content as Positive, Negative, or Neutral. Although extensive research has been conducted in this area of computational linguistics, most of it has focused on the English language. However, Bengali expresses sentiment in varying degrees, which may plausibly differ from English. It is therefore important to develop and carry out sentiment assessment for Bengali properly. In sentiment analysis, the predictive performance of an automatic model depends entirely on the quality of the dataset annotation. Bengali sentiment annotation is challenging because of the language's diverse syntactic structures and the different degrees of its innate sentiments (i.e., weakly and strongly positive/negative sentiments). Thus, in this article, we propose a novel and precise guideline that researchers, linguistic experts, and referees can use to annotate Bengali sentences accurately, with a view to building effective datasets for automatic sentiment prediction.
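An annotation scheme that records degree (weak or strong) as well as polarity can still feed a standard three-way Positive/Negative/Neutral classifier. The sketch below shows one such mapping. The five label names are hypothetical and are not taken from the article's guideline.

```python
from enum import Enum


class Polarity(Enum):
    POSITIVE = "Positive"
    NEGATIVE = "Negative"
    NEUTRAL = "Neutral"


# Hypothetical fine-grained labels; the guideline's actual label set may differ.
FINE_TO_POLARITY = {
    "strongly_positive": Polarity.POSITIVE,
    "weakly_positive": Polarity.POSITIVE,
    "neutral": Polarity.NEUTRAL,
    "weakly_negative": Polarity.NEGATIVE,
    "strongly_negative": Polarity.NEGATIVE,
}


def to_polarity(fine_label: str) -> Polarity:
    """Collapse a fine-grained annotation into the three-way polarity."""
    try:
        return FINE_TO_POLARITY[fine_label]
    except KeyError:
        # Reject labels outside the scheme instead of guessing a polarity.
        raise ValueError(f"unknown annotation label: {fine_label!r}") from None


print(to_polarity("weakly_negative").value)  # Negative
```

Storing the fine-grained label and deriving the polarity from it means the same annotated dataset can serve both three-way and degree-aware classification.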


2022 ◽  
Vol 54 (9) ◽  
pp. 1-40
Author(s):  
Chao Liu ◽  
Xin Xia ◽  
David Lo ◽  
Cuiyun Gao ◽  
Xiaohu Yang ◽  
...  

Code search is a core software engineering task. Effective code search tools can help developers substantially improve the efficiency and effectiveness of their software development. In recent years, many code search studies have leveraged different techniques, such as deep learning and information retrieval approaches, to retrieve the desired code from large-scale codebases. However, a comprehensive comparative summary of existing code search approaches is lacking. To understand the research trends in code search, we systematically reviewed 81 relevant studies. We investigated the publication trends of code search studies and analyzed the key components used to build code search tools, such as the codebase, the query, and the modeling technique. We also classified existing tools according to the seven different search tasks they support. Based on our findings, we identified a set of outstanding challenges in existing studies and outlined a research roadmap for future code search research.


2022 ◽  
Vol 24 (3) ◽  
pp. 1-17
Author(s):  
Sunita Tiwari ◽  
Saroj Kaushik

The cost-effectiveness and easy availability of handheld mobile devices, together with the ubiquity of location acquisition services such as GPS and GSM networks, have made it easy to log and share the location histories of mobile users. This work aims to measure the semantic similarity between users based on their past travel histories. Such a semantic similarity measure has applications in tourism-related recommender systems and information retrieval. The paper presents an Earth Mover's Distance (EMD)-based semantic user similarity measure computed from users' GPS logs. The measure is applied and evaluated on the GeoLife GPS dataset, which Microsoft collected from 182 users between April 2007 and August 2012. It is compared with conventional similarity measures from the literature, such as Jaccard, Dice, and Pearson's correlation. Relative to these existing approaches, the EMD-based approach improves average RMSE by 10.70% and average MAE by 5.73%.
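The sketch below makes the contrast with set-overlap baselines concrete. It computes EMD in the simplest one-dimensional case: two users' visit histograms over the same ordered bins, with ground distance |i - j|. It also computes Jaccard and Dice over the sets of visited bins. This is a simplification under stated assumptions. The abstract does not give the paper's semantic location categories, its ground distance, or how distances are turned into similarities, and the bins here are hypothetical.

```python
from itertools import accumulate
from typing import Sequence


def emd_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """EMD between two histograms on the same ordered bins, ground distance |i - j|."""
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    sp, sq = sum(p), sum(q)
    if sp == 0 or sq == 0:
        raise ValueError("histograms must have non-zero mass")
    # Normalise to unit mass, then build cumulative distributions.
    cp = accumulate(x / sp for x in p)
    cq = accumulate(x / sq for x in q)
    # On a line, the optimal transport cost equals the L1 distance between the CDFs.
    return sum(abs(a - b) for a, b in zip(cp, cq))


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a: set, b: set) -> float:
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def visited(h: Sequence[float]) -> set:
    """Indices of the bins a user visited at least once."""
    return {i for i, x in enumerate(h) if x > 0}


# Hypothetical users: all visits in bin 0, versus all visits in bin 1 or in bin 2.
u, near, far = [2, 0, 0], [0, 2, 0], [0, 0, 2]
print(emd_1d(u, near), emd_1d(u, far))  # 1.0 2.0
print(jaccard(visited(u), visited(near)), jaccard(visited(u), visited(far)))  # 0.0 0.0
```

Jaccard and Dice only see whether the visited sets overlap, so they score both pairs identically. EMD also accounts for how far the visit mass has to move, which is the property that motivates using it for user similarity.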


2022 ◽  
Vol 54 (7) ◽  
pp. 1-38
Author(s):  
Lynda Tamine ◽  
Lorraine Goeuriot

The explosive growth and widespread accessibility of medical information on the Internet have led to a surge of research activity in a wide range of scientific communities, including health informatics and information retrieval (IR). A common concern of this research, across these disciplines, is how to design clinical decision support systems or medical search engines that adequately support both novices (e.g., patients and their next of kin) and experts (e.g., physicians, clinicians) tackling complex tasks (e.g., searching for a diagnosis or a treatment). However, despite significant multidisciplinary research advances, current medical search systems still perform poorly. This survey provides an overview of the state of the art in IR and health informatics. By bridging these disciplines, it shows how semantic search techniques can facilitate medical IR. First, we give a broad picture of semantic search and medical IR and highlight the major scientific challenges. Second, focusing on the semantic gap challenge, we discuss representative state-of-the-art work on feature-based as well as semantic-based representation and matching models that support medical search systems. In addition to seminal works, we present recent works that build on advances in deep learning. Third, we carry out a thorough cross-model analysis and report findings and lessons learned. Finally, we discuss open issues and promising directions for future research.


Author(s):  
Abdullah Saleh Alqahtani ◽  
P. Saravanan ◽  
M. Maheswari ◽  
Sami Alshmrany

PeerJ ◽  
2022 ◽  
Vol 10 ◽  
pp. e12764
Author(s):  
Raul Rodriguez-Esteban

Delays in the propagation of scientific discoveries across scientific communities are an oft-maligned feature of scientific research because they introduce a bias towards knowledge produced within a scientist's closest community. The vastness of the scientific literature is commonly blamed for this phenomenon, despite recent improvements in information retrieval and text mining. However, the actual negative impact of these delays on scientific progress has never been quantified. This analysis attempts to do so by exploring their effects on biomedical discovery, particularly the discovery of relations between diseases, genes, and chemical compounds. Results indicate that the probability that two scientific facts will enable the discovery of a new fact depends on how far apart the two facts originally were within the scientific landscape. In particular, this probability decreases exponentially with citation distance. Thus, the direction of scientific progress is distorted by where each scientific fact is published. This is a path-dependent bias in which originally closely located discoveries drive the sequence of future discoveries. To counter this bias, scientists should broaden the scope of their work with modern information retrieval and extraction approaches.
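The exponential relationship reported above can be written as a simple decay law. The symbols below are assumptions for illustration, because the abstract reports no fitted parameters. Here d is the citation distance between the two facts, P_0 is the probability at distance zero, and λ is a decay rate.

```latex
P(d) = P_0 \, e^{-\lambda d}, \qquad \lambda > 0
\qquad\Longrightarrow\qquad
\frac{P(d+1)}{P(d)} = \frac{P_0 \, e^{-\lambda (d+1)}}{P_0 \, e^{-\lambda d}} = e^{-\lambda}
```

Each additional step of citation distance multiplies the probability by the same constant factor e^{-λ} < 1. With a hypothetical λ = ln 2, for example, the probability halves with every step.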


Author(s):  
Meng Yuan ◽  
Justin Zobel ◽  
Pauline Lin

Clustering the contents of a document corpus is used to create sub-corpora that are expected to consist of related documents. Clustering is used in a variety of ways in document applications such as information retrieval, and a range of methods have been applied to the task. However, there has been relatively little exploration of how well it works in practice. Indeed, given the high dimensionality of the data, clustering may not always produce meaningful outcomes. In this paper we use a well-known clustering method to explore a variety of techniques, existing and novel, for measuring clustering effectiveness. Our new extrinsic techniques are based on relevance judgements or retrieved documents. Results with these techniques demonstrate that retrieval-based information can be used to assess the quality of clustering. They also show that clustering can, to some extent, succeed in gathering similar material together. Further, they show that intrinsic measurement techniques that have proved informative in other domains do not work for information retrieval. Whether clustering is effective enough to have a significant impact on practical retrieval is unclear. However, the results show that our measurement techniques can effectively distinguish between clustering methods.
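One way to use relevance judgements as an extrinsic signal is to check whether the documents judged relevant to a query end up in the same cluster. The measure below is a hypothetical illustration of that idea, not the authors' technique. The document IDs, cluster labels, and judgements are made up.

```python
from collections import Counter
from typing import Mapping, Set


def relevant_concentration(clusters: Mapping[str, int],
                           qrels: Mapping[str, Set[str]]) -> float:
    """Hypothetical extrinsic measure: for each query, the share of its
    clustered relevant documents that fall in the single cluster holding
    the most of them, averaged over queries."""
    scores = []
    for relevant in qrels.values():
        # Cluster label of every relevant document that was clustered.
        assigned = [clusters[doc] for doc in relevant if doc in clusters]
        if not assigned:
            continue  # this query has no clustered relevant documents
        largest = Counter(assigned).most_common(1)[0][1]
        scores.append(largest / len(assigned))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical cluster assignment and relevance judgements.
clusters = {"d1": 0, "d2": 0, "d3": 1, "d4": 1, "d5": 2}
qrels = {"q1": {"d1", "d2"}, "q2": {"d3", "d4", "d5"}}
print(round(relevant_concentration(clusters, qrels), 3))  # 0.833
```

A degenerate clustering that puts every document in one cluster also scores 1.0. A measure like this should therefore be compared against a baseline, such as a random assignment with the same number and sizes of clusters, before it is used to rank clustering methods.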

