A Chinese Document Retrieval Method Enhanced by Concept Base

Document retrieval is an important means of academic exchange and of acquiring new knowledge. The traditional retrieval method is to choose the corresponding database category and match the input keywords. This method returns a large number of documents, which makes it hard for users to find the most relevant one. This paper proposes a text quantification method that mines the features of each element in a document: word concept, position-function weight value, improved characteristic weight value, text distribution function weight value, and text element length. Each word's contribution to the document is then obtained by combining these five features, and every document in the database is stored digitally as the contributions of its elements. The paper also designs a subject mapping scheme. A similarity calculation method based on contribution and association rules is designed first; the documents in the database are clustered with this method, and a feature extraction method is then used to find the subject of each class. At search time, the description entered by the user is quantified and automatically mapped to a class by subject mapping, and a ranked sequence of documents is retrieved by computing the similarity between the description and the features of the documents in that class. Experiments show that the scheme is intelligent and accurate and that it improves retrieval speed.
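The two-step lookup described in this abstract (map the query to a class, then rank only that class's documents) can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not give the contribution formula or the association-rule similarity, so the sketch assumes documents are already stored as hypothetical term-to-contribution dictionaries, uses plain cosine similarity, and uses a class centroid as a stand-in for the extracted class subject. All names and values are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Average a list of sparse vectors (assumed stand-in for a class subject)."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def retrieve(query, classes):
    """Map the query to the closest class, then rank that class's documents."""
    subjects = {name: centroid(list(docs.values())) for name, docs in classes.items()}
    best = max(subjects, key=lambda name: cosine(query, subjects[name]))
    docs = classes[best]
    ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
    return best, ranked

# Hypothetical contribution values; the real ones would come from the
# five-feature quantification step, which is not specified in the abstract.
classes = {
    "retrieval": {
        "doc1": {"retrieval": 0.9, "index": 0.4},
        "doc2": {"retrieval": 0.3, "cluster": 0.8},
    },
    "imaging": {
        "doc3": {"wavelet": 0.9, "image": 0.5},
    },
}
query = {"retrieval": 1.0, "index": 0.5}
best_class, ranking = retrieve(query, classes)
print(best_class, ranking)
```

Restricting the ranking to one class is what would reduce the number of similarity computations per query, at the cost of missing relevant documents that were clustered elsewhere.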

A novel document retrieval method using the discrete wavelet transform

ACM Transactions on Information Systems ◽

10.1145/1080343.1080345 ◽

2005 ◽

Vol 23 (3) ◽

pp. 267-298 ◽

Cited By ~ 23

Author(s):

Laurence A. F. Park ◽

Kotagiri Ramamohanarao ◽

Marimuthu Palaniswami

Keyword(s):

Wavelet Transform ◽

Discrete Wavelet Transform ◽

Document Retrieval ◽

Discrete Wavelet ◽

Retrieval Method

A fuzzy document retrieval method based on two-valued indexing

Fuzzy Sets and Systems ◽

10.1016/0165-0114(89)90074-2 ◽

1989 ◽

Vol 30 (2) ◽

pp. 103-120 ◽

Cited By ~ 21

Author(s):

Tetsuya Murai ◽

Masaaki Miyakoshi ◽

Masaru Shimbo

Keyword(s):

Document Retrieval ◽

Retrieval Method

Designing a Document Retrieval Method for University Digital Libraries Based on Hadoop Technology

Journal of Contemporary Educational Research ◽

10.26689/jcer.v5i12.2821 ◽

2021 ◽

Vol 5 (12) ◽

pp. 82-87

Author(s):

Haixia He

Keyword(s):

Big Data ◽

Semantic Similarity ◽

Digital Libraries ◽

Document Retrieval ◽

University Libraries ◽

Matching Method ◽

Retrieval Method ◽

Calculation Process

With the development of big data, organizations in every sector have begun to adopt it to serve their own enterprises and departments, and university digital libraries are no exception. The most cumbersome task in managing a university library is document retrieval. This article uses a Hadoop-based algorithm to extract semantic keywords and then calculates semantic similarity following the keyword calculation process for literature retrieval. A fast-matching method is used to determine the weight of each keyword, so as to ensure efficient and accurate document retrieval in digital libraries. This completes the design of a document retrieval method for university digital libraries based on Hadoop technology.

Document retrieval method using random walk with restart on weighted co-citation network

Proceedings of the American Society for Information Science and Technology ◽

10.1002/meet.2014.14505101126 ◽

2014 ◽

Vol 51 (1) ◽

pp. 1-4 ◽

Cited By ~ 3

Author(s):

Masaki Eto

Keyword(s):

Random Walk ◽

Citation Network ◽

Document Retrieval ◽

Random Walk With Restart ◽

Retrieval Method

Two-Phase Path Retrieval Method for Similar XML Document Retrieval

Lecture Notes in Computer Science - Knowledge-Based Intelligent Information and Engineering Systems ◽

10.1007/11552413_138 ◽

2005 ◽

pp. 967-971 ◽

Cited By ~ 1

Author(s):

Jae-Min Lee ◽

Byung-Yeon Hwang

Keyword(s):

Document Retrieval ◽

Two Phase ◽

Retrieval Method ◽

Xml Document ◽

Phase Path

A document retrieval method from handwritten characters based on OCR and character shape information

Proceedings of Sixth International Conference on Document Analysis and Recognition ◽

10.1109/icdar.2001.953859 ◽

2002 ◽

Cited By ~ 3

Author(s):

T. Kameshiro ◽

T. Hirano ◽

Y. Okada ◽

F. Yoda

Keyword(s):

Document Retrieval ◽

Shape Information ◽

Retrieval Method

A TREATMENT OF USEFULNESS OF KEYWORDS IN FUZZY REQUESTS FOR AN INFORMATION RETRIEVAL SYSTEM WITH BAYESIAN NETWORKS

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488599000350 ◽

1999 ◽

Vol 07 (04) ◽

pp. 399-406

Author(s):

KENJI SAITO ◽

HIROYUKI SHIOYA ◽

TSUTOMU DA-TE

Keyword(s):

Information Retrieval ◽

Bayesian Network ◽

Maximum Entropy ◽

Probabilistic Model ◽

Retrieval System ◽

Maximum Entropy Principle ◽

Document Retrieval ◽

Information Retrieval System ◽

Retrieval Method ◽

Entropy Principle

We improve a document retrieval method based on the maximum entropy principle proposed by Cooper, and show how to implement this system on a Bayesian network. A Bayesian network is a probabilistic model for expressing probabilistic relations among random variables. We show the advantages of a document retrieval system on a Bayesian network in comparison with Cooper's system. The original document retrieval system based on the maximum entropy principle has a drawback: in some cases no retrieval result can be obtained. In this paper, we resolve this drawback by fuzzification of user retrieval requests.

A new Passage Retrieval Method in Arabic Question Answering Systems

10.21203/rs.3.rs-119562/v1 ◽

2020 ◽

Author(s):

Lana Alsabbagh ◽

Oumayma AlDakkak ◽

Nada Ghneim

Keyword(s):

Query Expansion ◽

Question Answering ◽

Document Retrieval ◽

Open Domain ◽

Retrieval Method ◽

Passage Retrieval ◽

Question Analysis ◽

The Core ◽

Question Answering Systems ◽

Retrieval Phase

In this paper, we present our approach to improving the performance of open-domain Arabic Question Answering systems. We focus on the passage retrieval phase, which aims to retrieve the passages most related to the correct answer. To extract passages that are related to the question, the system passes through three phases: Question Analysis, Document Retrieval and Passage Retrieval. We define a passage as a sentence that ends with a dot ".". In the Question Processing phase, we applied the traditional NLP steps of tokenization, removal of stopwords and unrelated symbols, and replacement of the question words with their stems. We also applied Query Expansion by adding synonyms to the question words. In the Document Retrieval phase, we used the Vector Space Model (VSM) with a TF-IDF vectorizer and cosine similarity. For the Passage Retrieval phase, which is the core of our system, we measured the similarity between passages and the question by a combination of the BM25 ranker and a Word Embedding approach. We tested our system on the ACRD dataset, which contains 1395 questions in different domains, and the system achieved a precision of 92.2% and a recall of 79.9% in finding the top-3 related passages for the query.
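The document and passage retrieval phases named in this abstract can be sketched roughly as below. This is only an illustration under stated assumptions, not the authors' system: it uses a toy English corpus, a smoothed IDF variant and the common BM25 defaults `k1=1.5`, `b=0.75` (none of which the abstract specifies), and it leaves out stemming, query expansion and the word-embedding component that the paper combines with BM25.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on word characters (no stemming or stopword removal)."""
    return re.findall(r"\w+", text.lower())

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def tfidf_rank(question, documents):
    """Document Retrieval: TF-IDF vectors compared with cosine similarity."""
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    df = Counter(t for toks in docs for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}  # smoothed IDF
    vecs = [{t: f * idf[t] for t, f in Counter(toks).items()} for toks in docs]
    q = {t: f * idf[t] for t, f in Counter(tokenize(question)).items() if t in idf}
    return [cosine(q, v) for v in vecs]

def bm25_rank(question, passages, k1=1.5, b=0.75):
    """Passage Retrieval: Okapi BM25 score of each passage for the question."""
    toks = [tokenize(p) for p in passages]
    n = len(toks)
    avg_len = (sum(len(p) for p in toks) / n) or 1.0
    df = Counter(t for p in toks for t in set(p))
    scores = []
    for p in toks:
        tf = Counter(p)
        score = 0.0
        for t in tokenize(question):
            if t in tf:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                norm = tf[t] + k1 * (1 - b + b * len(p) / avg_len)
                score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus, invented for illustration only.
documents = [
    "Cairo is the capital of Egypt. The Nile flows through Cairo.",
    "Wavelets compress images. Fourier transforms analyse signals.",
]
question = "What is the capital of Egypt"

doc_scores = tfidf_rank(question, documents)
best_doc = max(range(len(documents)), key=lambda i: doc_scores[i])

# A passage is a sentence ending with a dot, as defined in the abstract.
passages = [s.strip() for s in documents[best_doc].split(".") if s.strip()]
passage_scores = bm25_rank(question, passages)
best_passage = passages[max(range(len(passages)), key=lambda i: passage_scores[i])]
print(best_doc, best_passage)
```

Running document retrieval first keeps the BM25 pass small, since only the sentences of the top-ranked documents need to be scored.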

An efficient document retrieval method using n-gram indexing

Systems and Computers in Japan ◽

10.1002/scj.1106 ◽

2002 ◽

Vol 33 (2) ◽

pp. 54-63 ◽

Cited By ~ 5

Author(s):

Yasushi Ogawa ◽

Toru Matsuda

Keyword(s):

Document Retrieval ◽

Retrieval Method ◽

N Gram
