document retrieval
Recently Published Documents

2022 ◽  
Vol 40 (3) ◽  
pp. 1-30
Procheta Sen ◽  
Debasis Ganguly ◽  
Gareth J. F. Jones

Reducing user effort in finding relevant information is one of the key objectives of search systems. Existing approaches have been shown to effectively exploit the context of a user’s current search session to suggest queries automatically and thereby reduce search effort. However, these approaches do not accomplish the end goal of a search system, namely retrieving a set of potentially relevant documents for the information need as it evolves during a search session. This article takes the problem of query prediction one step further by investigating contextual recommendation within a search session. More specifically, given partial context information for a session in the form of a small number of queries, we investigate how a search system can effectively predict the documents that a user would have been presented with had they continued the search session by submitting subsequent queries. To address the problem, we propose a model of contextual recommendation that seeks to capture the underlying semantics of information need transitions in the current user’s search context. This model leverages information from past interactions of other users, drawn from an existing search log, that resemble the current one. To identify similar interactions, as a novel contribution, we propose an embedding approach that jointly learns representations of individual query terms and of entire queries from search log data by leveraging session-level containment relationships. Our experiments, conducted on a large query log (the AOL log), demonstrate that using a joint embedding of queries and their terms within our proposed document retrieval framework outperforms a number of text-only and sequence-modeling baselines.

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

Understanding the user’s actual need from a question is crucial in non-factoid why-question answering, because why-questions are complex and their interpretation involves ambiguity and redundancy. The precise requirement is to determine the focus of the question and reformulate it accordingly to retrieve the expected answers. The paper analyzes different types of why-questions and proposes an algorithm for each class that determines the focus and reformulates the question into a query by appending focal terms and the cue phrase ‘because’. Further, a user interface is implemented that accepts a why-question as input, applies the different components to the question, reformulates it, and finally retrieves web pages by posing the query to the Google search engine. To measure the accuracy of the process, users are asked to score, from 1 to 10, how relevant the retrieved web pages are according to their understanding. The results show a maximum precision of 89% for informational why-questions and a minimum of 48% for opinionated why-questions.
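
As a rough illustration of the reformulation step described in this abstract, the following Python sketch turns a why-question into a keyword query ending in the cue phrase "because". It is not the paper's per-class algorithm: focus detection is reduced here to simple stop-word removal, and the `STOP_WORDS` list and the `reformulate` function are hypothetical names introduced only for this example.

```python
import re

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {
    "why", "do", "does", "did", "is", "are", "was", "were",
    "the", "a", "an", "to", "of", "in", "on",
}


def reformulate(question: str, cue: str = "because") -> str:
    """Keep the non-stop-word terms of a why-question and append the cue phrase."""
    # Lowercase and split into word tokens, dropping punctuation.
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    # Assumption: every remaining term counts as a focal term.
    focal_terms = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(focal_terms + [cue])


print(reformulate("Why is the sky blue?"))
```

A real system would presumably replace the stop-word filter with class-specific focus detection, as the abstract indicates.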

2022 ◽  
Vol 12 (1) ◽  
pp. 0-0

In the era of big data and the fourth industrial revolution, improving the efficiency of document/information retrieval frameworks is essential for handling the ever‐growing volume of text data in an increasingly digital world. This article describes a two-stage system for document/information retrieval. First, a Lucene-based document retrieval tool is implemented, and two query expansion techniques, one using a comparable corpus (Wikipedia) and one using word embeddings, are proposed and tested. Second, a retention-fidelity summarization protocol is applied to the retrieved documents to create a short, accurate, and fluent extract of a single longer retrieved document (or of a set of top retrieved documents). The results show that word embeddings yield higher precision and retrieve more accurate documents. The resulting summaries also satisfy the retention and fidelity criteria for relevant summaries.
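
The idea of embedding-based query expansion mentioned in this abstract can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `EMBEDDINGS` table holds invented three-dimensional toy vectors rather than trained embeddings, and `expand_query`, its `k` parameter, and its `threshold` parameter are hypothetical names, not part of the described system or of Lucene.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
    "fruit": [0.05, 0.2, 0.85],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, k=2, threshold=0.9):
    """Append up to k nearest vocabulary words per query term."""
    expanded = list(terms)
    for term in terms:
        if term not in EMBEDDINGS:
            continue  # out-of-vocabulary terms are left unexpanded
        scored = sorted(
            (
                (cosine(EMBEDDINGS[term], vec), word)
                for word, vec in EMBEDDINGS.items()
                if word != term and word not in expanded
            ),
            reverse=True,
        )
        # Keep only sufficiently similar neighbours.
        expanded.extend(word for score, word in scored[:k] if score >= threshold)
    return expanded


print(expand_query(["car"]))
```

The expanded term list would then be passed to the retrieval engine in place of the original query; the similarity threshold guards against adding unrelated words.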

2021 ◽  
Vol 5 (12) ◽  
pp. 82-87
Haixia He

With the development of big data, organizations in all sectors have begun to adopt it to serve their own enterprises and departments, and university digital libraries are among them. The most cumbersome task in managing university libraries is document retrieval. This article uses a Hadoop-based algorithm to extract semantic keywords and then calculates semantic similarity based on the literature retrieval keyword calculation process. The fast-matching method is used to determine the weight of each keyword, so as to ensure efficient and accurate document retrieval in digital libraries, thus completing the design of a document retrieval method for university digital libraries based on Hadoop technology.

2021 ◽  
Vol 11 (24) ◽  
pp. 12040
Mustafa A. Al Sibahee ◽  
Ayad I. Abdulsada ◽  
Zaid Ameen Abduljabbar ◽  
Junchao Ma ◽  
Vincent Omollo Nyangaresi ◽  

Applications for document similarity detection are widespread in diverse communities, including institutions and corporations. However, currently available detection systems fail to take into account the private nature of material or documents that have been outsourced to remote servers. None of the existing solutions can be described as lightweight techniques that are compatible with lightweight client implementation, and this deficiency can limit the effectiveness of these systems. For instance, the discovery of similarity between two conferences or journals must maintain the privacy of the submitted papers in a lightweight manner to ensure that the security and application requirements for limited-resource devices are fulfilled. This paper considers the problem of lightweight similarity detection between document sets while preserving the privacy of the material. The proposed solution permits documents to be compared without disclosing the content to untrusted servers. The fingerprint set for each document is computed efficiently, and an inverted index is built over the whole set of fingerprints. Before being uploaded to the untrusted server, this index is secured with the Paillier cryptosystem. This study develops a secure yet efficient method for scalable encrypted document comparison. To evaluate the computational performance of this method, the paper carries out several comparative assessments against other major approaches.
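
The fingerprint-plus-inverted-index structure mentioned in this abstract can be illustrated with a plaintext Python sketch. Several assumptions apply: the abstract does not specify the fingerprinting scheme, so hashed character 5-grams are used here as a stand-in; the Paillier encryption of the index is omitted entirely; and `fingerprints`, `build_index`, and `similar_documents` are hypothetical names introduced for this example.

```python
import hashlib
from collections import defaultdict


def fingerprints(text, k=5):
    """Hash every character k-gram of the normalized text (an assumed scheme)."""
    normalized = " ".join(text.lower().split())
    shingles = {
        normalized[i:i + k] for i in range(max(len(normalized) - k + 1, 1))
    }
    # Truncated SHA-1 gives a deterministic 32-bit fingerprint per shingle.
    return {int(hashlib.sha1(s.encode("utf-8")).hexdigest()[:8], 16) for s in shingles}


def build_index(docs):
    """Inverted index: fingerprint -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for fp in fingerprints(text):
            index[fp].add(doc_id)
    return index


def similar_documents(query, index, docs):
    """Jaccard similarity between the query and each document sharing a fingerprint."""
    query_fps = fingerprints(query)
    shared = defaultdict(int)
    for fp in query_fps:
        for doc_id in index.get(fp, ()):
            shared[doc_id] += 1
    scores = {}
    for doc_id, count in shared.items():
        union = len(query_fps) + len(fingerprints(docs[doc_id])) - count
        scores[doc_id] = count / union
    return scores


docs = {"a": "the quick brown fox", "b": "completely different text"}
index = build_index(docs)
scores = similar_documents("the quick brown fox", index, docs)
print(scores)
```

In the setting the abstract describes, the index would be encrypted before upload so that the server can compare documents without seeing their content; this sketch shows only the unencrypted data structure.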

2021 ◽  
pp. 096100062110632
Wancheng Yang ◽  
Hailin Ning

Thousands of new publications appear every day in bibliometric databases, so the demand for document retrieval technology is growing. Bibliometrics makes it possible to perform a quantitative analysis of text publications; however, the problem of classifying complex videos with a high level of semantics remains unsolved. Meanwhile, short-form videos gain popularity and attract more researchers. Knowledge Graph seems to be a promising technology in this area. This technology makes it possible to modernize the information search infrastructure. The experiment involved 461 short-video studies. The material for the experiment was collected from the Chinese Social Sciences Citation Index (CSSCI) database. The bibliometric method was recognized as expedient for the analysis. The keyword mapping and clustering operations were performed using the CiteSpace software. The results demonstrate that short-form video research has been popular among Chinese scientists since 2017. Short-form video research focuses on five major topics, that is, development trends, modern media convergence, video production, visual content management, and short-form videos in the public sector. The present findings may be employed in future research to collect relevant samples with exact semantic relationships. The technology is not limited to specific applications and, therefore, may be useful in any field of research.

2021 ◽  
pp. 999-1007
Saicharan Gadamshetti ◽  
Gerard Deepak ◽  
A. Santhanavijayan ◽  
K. R. Venugopal

Mohamed Trabelsi ◽  
Zhiyu Chen ◽  
Brian D. Davison ◽  
Jeff Heflin

Ranking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms using a set of hand-crafted features. Recently, researchers have leveraged deep learning models in information retrieval. These models are trained end-to-end to extract features from the raw data for ranking tasks, so that they overcome the limitations of hand-crafted features. A variety of deep learning models have been proposed, and each model presents a set of neural network components to extract features that are used for ranking. In this paper, we compare the proposed models in the literature along different dimensions in order to understand the major contributions and limitations of each model. In our discussion of the literature, we analyze the promising neural components, and propose future research directions. We also show the analogy between document retrieval and other retrieval tasks where the items to be ranked are structured documents, answers, images and videos.
