document retrieval Latest Research Papers

2022 ◽

Vol 40 (3) ◽

pp. 1-30

Author(s):

Procheta Sen ◽

Debasis Ganguly ◽

Gareth J. F. Jones

Keyword(s):

Relevant Information ◽

Document Retrieval ◽

Context Information ◽

Information Need ◽

Search System ◽

Query Log ◽

Sequence Modeling ◽

Joint Embedding ◽

One Step ◽

A Current

Reducing user effort in finding relevant information is one of the key objectives of search systems. Existing approaches have been shown to effectively exploit the context from the current search session of users for automatically suggesting queries to reduce their search efforts. However, these approaches do not accomplish the end goal of a search system—that of retrieving a set of potentially relevant documents for the evolving information need during a search session. This article takes the problem of query prediction one step further by investigating the problem of contextual recommendation within a search session. More specifically, given the partial context information of a session in the form of a small number of queries, we investigate how a search system can effectively predict the documents that a user would have been presented with had he continued the search session by submitting subsequent queries. To address the problem, we propose a model of contextual recommendation that seeks to capture the underlying semantics of information need transitions of a current user’s search context. This model leverages information from a number of past interactions of other users with similar interactions from an existing search log. To identify similar interactions, as a novel contribution, we propose an embedding approach that jointly learns representations of both individual query terms and also those of queries (in their entirety) from a search log data by leveraging session-level containment relationships. Our experiments conducted on a large query log, namely the AOL, demonstrate that using a joint embedding of queries and their terms within our proposed framework of document retrieval outperforms a number of text-only and sequence modeling based baselines.

Download Full-text

Why-type Question to Query Reformulation for efficient Document Retrieval

International Journal of Information Retrieval Research ◽

10.4018/ijirr.289948 ◽

2022 ◽

Vol 12 (1) ◽

pp. 0-0

Keyword(s):

User Interface ◽

Search Engine ◽

Question Answering ◽

Document Retrieval ◽

Web Pages ◽

Query Reformulation ◽

Different Types ◽

Why Questions ◽

Google Search ◽

Type Question

Understanding the actual need of user from a question is very crucial in non-factoid why-question answering as Why-questions are complex and involve ambiguity and redundancy in their understanding. The precise requirement is to determine the focus of question and reformulate them accordingly to retrieve expected answers to a question. The paper analyzes different types of why-questions and proposes an algorithm for each class to determine the focus and reformulate it into a query by appending focal terms and cue phrase ‘because’ with it. Further, a user interface is implemented which asks input why-question, applies different components of question , reformulates it and finally retrieve web pages by posing query to Google search engine. To measure the accuracy of the process, user feedback is taken which asks them to assign scoring from 1 to 10, on how relevant are the retrieved web pages according to their understanding. The results depict that maximum precision of 89% is achieved in Informational type why-questions and minimum of 48% in opinionated type why-questions.

Download Full-text

An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval

International Journal of Information Retrieval Research ◽

10.4018/ijirr.289950 ◽

2022 ◽

Vol 12 (1) ◽

pp. 0-0

Keyword(s):

Big Data ◽

Information Retrieval ◽

Query Expansion ◽

Industrial Revolution ◽

Document Retrieval ◽

Word Embeddings ◽

Text Data ◽

Digital World ◽

Stage System ◽

End To End

In the context of big data and the 4.0 industrial revolution era, enhancing document/information retrieval frameworks efficiency to handle the ever‐growing volume of text data in an ever more digital world is a must. This article describes a double-stage system of document/information retrieval. First, a Lucene-based document retrieval tool is implemented, and a couple of query expansion techniques using a comparable corpus (Wikipedia) and word embeddings are proposed and tested. Second, a retention-fidelity summarization protocol is performed on top of the retrieved documents to create a short, accurate, and fluent extract of a longer retrieved single document (or a set of top retrieved documents). Obtained results show that using word embeddings is an excellent way to achieve higher precision rates and retrieve more accurate documents. Also, obtained summaries satisfy the retention and fidelity criteria of relevant summaries.

Download Full-text

Designing a Document Retrieval Method for University Digital Libraries Based on Hadoop Technology

Journal of Contemporary Educational Research ◽

10.26689/jcer.v5i12.2821 ◽

2021 ◽

Vol 5 (12) ◽

pp. 82-87

Author(s):

Haixia He

Keyword(s):

Big Data ◽

Semantic Similarity ◽

Digital Libraries ◽

Document Retrieval ◽

University Libraries ◽

Matching Method ◽

Retrieval Method ◽

Calculation Process

With the development of big data, all walks of life in society have begun to venture into big data to serve their own enterprises and departments. Big data has been embraced by university digital libraries. The most cumbersome work for the management of university libraries is document retrieval. This article uses Hadoop algorithm to extract semantic keywords and then calculates semantic similarity based on the literature retrieval keyword calculation process. The fast-matching method is used to determine the weight of each keyword, so as to ensure an efficient and accurate document retrieval in digital libraries, thus completing the design of the document retrieval method for university digital libraries based on Hadoop technology.

Download Full-text

Lightweight, Secure, Similar-Document Retrieval over Encrypted Data

Applied Sciences ◽

10.3390/app112412040 ◽

2021 ◽

Vol 11 (24) ◽

pp. 12040

Author(s):

Mustafa A. Al Sibahee ◽

Ayad I. Abdulsada ◽

Zaid Ameen Abduljabbar ◽

Junchao Ma ◽

Vincent Omollo Nyangaresi ◽

...

Keyword(s):

Limited Resource ◽

Document Retrieval ◽

Efficient Manner ◽

Document Similarity ◽

Encrypted Data ◽

Detection Systems ◽

Similarity Detection ◽

Computational Performance ◽

Paillier Cryptosystem ◽

Application Requirements

Applications for document similarity detection are widespread in diverse communities, including institutions and corporations. However, currently available detection systems fail to take into account the private nature of material or documents that have been outsourced to remote servers. None of the existing solutions can be described as lightweight techniques that are compatible with lightweight client implementation, and this deficiency can limit the effectiveness of these systems. For instance, the discovery of similarity between two conferences or journals must maintain the privacy of the submitted papers in a lightweight manner to ensure that the security and application requirements for limited-resource devices are fulfilled. This paper considers the problem of lightweight similarity detection between document sets while preserving the privacy of the material. The proposed solution permits documents to be compared without disclosing the content to untrusted servers. The fingerprint set for each document is determined in an efficient manner, also developing an inverted index that uses the whole set of fingerprints. Before being uploaded to the untrusted server, this index is secured by the Paillier cryptosystem. This study develops a secure, yet efficient method for scalable encrypted document comparison. To evaluate the computational performance of this method, this paper carries out several comparative assessments against other major approaches.

Download Full-text

Knowledge graph technology application in Chinese SSCI: An example of short videos research

Journal of Librarianship and Information Science ◽

10.1177/09610006211063201 ◽

2021 ◽

pp. 096100062110632

Author(s):

Wancheng Yang ◽

Hailin Ning

Keyword(s):

Information Search ◽

Short Form ◽

Document Retrieval ◽

Content Management ◽

Video Production ◽

Future Research ◽

Knowledge Graph ◽

Bibliometric Method ◽

Technology Application ◽

Video Research

Thousands of new publications appear every day in bibliometric databases, so the demand for document retrieval technology is growing. Bibliometrics makes it possible to perform a quantitative analysis of text publications; however, the problem of classifying complex videos with a high level of semantics remains unsolved. Meanwhile, short-form videos gain popularity and attract more researchers. Knowledge Graph seems to be a promising technology in this area. This technology makes it possible to modernize the information search infrastructure. The experiment involved 461 short-video studies. The material for the experiment was collected from the Chinese Social Sciences Citation Index (CSSCI) database. The bibliometric method was recognized as expedient for the analysis. The keyword mapping and clustering operations were performed using the CiteSpace software. The results demonstrate that short-form video research has been popular among Chinese scientists since 2017. Short-form video research focuses on five major topics, that is, development trends, modern media convergence, video production, visual content management, and short-form videos in the public sector. The present findings may be employed in future research to collect relevant samples with exact semantic relationships. The technology is not limited to specific applications and, therefore, may be useful in any field of research.

Download Full-text

Document Retrieval and Ranking using Similarity Graph Mean Hitting Times.

10.2172/1835671 ◽

2021 ◽

Author(s):

Daniel Dunlavy ◽

Peter Chew

Keyword(s):

Document Retrieval ◽

Hitting Times ◽

Similarity Graph

Download Full-text

RDRLLJ: Integrating Deep Learning Approach with Latent Semantic Analysis for Document Retrieval

10.1007/978-981-16-1342-5_79 ◽

2021 ◽

pp. 999-1007

Author(s):

Saicharan Gadamshetti ◽

Gerard Deepak ◽

A. Santhanavijayan ◽

K. R. Venugopal

Keyword(s):

Deep Learning ◽

Latent Semantic Analysis ◽

Semantic Analysis ◽

Document Retrieval ◽

Learning Approach

Download Full-text

Neural ranking models for document retrieval

Information Retrieval ◽

10.1007/s10791-021-09398-0 ◽

2021 ◽

Author(s):

Mohamed Trabelsi ◽

Zhiyu Chen ◽

Brian D. Davison ◽

Jeff Heflin

Keyword(s):

Information Retrieval ◽

Deep Learning ◽

Document Retrieval ◽

Machine Learning Algorithms ◽

Future Research ◽

Learning Models ◽

Ranking Models ◽

Information Retrieval Systems ◽

Main Components ◽

Ranking Tasks

AbstractRanking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms using a set of hand-crafted features. Recently, researchers have leveraged deep learning models in information retrieval. These models are trained end-to-end to extract features from the raw data for ranking tasks, so that they overcome the limitations of hand-crafted features. A variety of deep learning models have been proposed, and each model presents a set of neural network components to extract features that are used for ranking. In this paper, we compare the proposed models in the literature along different dimensions in order to understand the major contributions and limitations of each model. In our discussion of the literature, we analyze the promising neural components, and propose future research directions. We also show the analogy between document retrieval and other retrieval tasks where the items to be ranked are structured documents, answers, images and videos.

Download Full-text

Combining semi-supervised and active learning to rank algorithms: application to Document Retrieval

Information Retrieval ◽

10.1007/s10791-021-09396-2 ◽

2021 ◽

Author(s):

Faiza Dammak ◽

Hager Kammoun

Keyword(s):

Active Learning ◽

Learning To Rank ◽

Document Retrieval

Download Full-text

document retrieval
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

I Know What You Need: Investigating Document Retrieval Effectiveness with Partial Session Contexts

Why-type Question to Query Reformulation for efficient Document Retrieval

An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval

Designing a Document Retrieval Method for University Digital Libraries Based on Hadoop Technology

Lightweight, Secure, Similar-Document Retrieval over Encrypted Data

Knowledge graph technology application in Chinese SSCI: An example of short videos research

Document Retrieval and Ranking using Similarity Graph Mean Hitting Times.

RDRLLJ: Integrating Deep Learning Approach with Latent Semantic Analysis for Document Retrieval

Neural ranking models for document retrieval

Combining semi-supervised and active learning to rank algorithms: application to Document Retrieval

Export Citation Format

document retrievalRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

I Know What You Need: Investigating Document Retrieval Effectiveness with Partial Session Contexts

Why-type Question to Query Reformulation for efficient Document Retrieval

An End-to-End Efficient Lucene-Based Framework of Document/Information Retrieval

Designing a Document Retrieval Method for University Digital Libraries Based on Hadoop Technology

Lightweight, Secure, Similar-Document Retrieval over Encrypted Data

Knowledge graph technology application in Chinese SSCI: An example of short videos research

Document Retrieval and Ranking using Similarity Graph Mean Hitting Times.

RDRLLJ: Integrating Deep Learning Approach with Latent Semantic Analysis for Document Retrieval

Neural ranking models for document retrieval

Combining semi-supervised and active learning to rank algorithms: application to Document Retrieval

document retrieval
Recently Published Documents