A Chinese Document Retrieval Method Enhanced by Concept Base

Author(s): Jian Su, Wenyong Weng, Zebing Wang
2012, Vol 605-607, pp. 2561-2568
Author(s): Qin Wang, Shou Ning Qu, Tao Du, Ming Jing Zhang

Document retrieval is an important means of academic exchange and of acquiring new knowledge. The traditional retrieval method selects a corresponding database category and matches the input keywords against it. This approach returns a large number of documents, which makes it hard for users to find the most relevant one. This paper proposes a text quantification method that mines five features of each element in a document: word concept, position-function weight value, an improved characteristic weight value, text-distribution-function weight value, and text element length. Each word's contribution to the document is obtained by combining these five features, and every document in the database is stored numerically as the contributions of its elements. The paper also designs a subject mapping scheme. A similarity calculation method based on contribution and association rules is designed first; the documents in the database are then clustered using this method, and a feature extraction method is used to find the subject of each class. When a user searches for a document, the description the user enters is quantified and automatically mapped to a class by subject mapping, and the document sequence is then retrieved by computing the similarity between the description and the features of the other documents in that class. Experiments show that the scheme is intelligent and accurate and that it improves retrieval speed.


2005, Vol 23 (3), pp. 267-298
Author(s): Laurence A. F. Park, Kotagiri Ramamohanarao, Marimuthu Palaniswami

1989, Vol 30 (2), pp. 103-120
Author(s): Tetsuya Murai, Masaaki Miyakoshi, Masaru Shimbo

2021, Vol 5 (12), pp. 82-87
Author(s): Haixia He

With the development of big data, all sectors of society have begun adopting big data to serve their own enterprises and departments, and university digital libraries have embraced it as well. The most cumbersome task in managing a university library is document retrieval. This article uses a Hadoop-based algorithm to extract semantic keywords and then calculates semantic similarity based on the literature retrieval keyword calculation process. A fast-matching method is used to determine the weight of each keyword, so as to ensure efficient and accurate document retrieval in digital libraries, thus completing the design of a Hadoop-based document retrieval method for university digital libraries.


Author(s): KENJI SAITO, HIROYUKI SHIOYA, TSUTOMU DA-TE

We improve a document retrieval method based on the so-called maximum entropy principle proposed by Cooper, and show how to implement this system on a Bayesian network. A Bayesian network is a probabilistic model for expressing probabilistic relations among random variables. We show the advantages of a document retrieval system on a Bayesian network in comparison with Cooper's system. The original document retrieval system based on the maximum entropy principle has a drawback: in some cases no retrieval result can be obtained. In this paper, we resolve this drawback by fuzzification of user retrieval requests.


2020
Author(s): Lana Alsabbagh, Oumayma AlDakkak, Nada Ghneim

In this paper, we present our approach to improving the performance of open-domain Arabic Question Answering systems. We focus on the passage retrieval phase, which aims to retrieve the passages most related to the correct answer. To extract passages that are related to the question, the system passes through three phases: Question Analysis, Document Retrieval and Passage Retrieval. We define a passage as a sentence that ends with a dot ".". In the Question Processing phase, we applied the traditional NLP steps of tokenization, removal of stopwords and unrelated symbols, and replacement of the question words with their stems. We also applied Query Expansion by adding synonyms to the question words. In the Document Retrieval phase, we used the Vector Space Model (VSM) with a TF-IDF vectorizer and cosine similarity. For the Passage Retrieval phase, which is the core of our system, we measured the similarity between passages and the question by a combination of the BM25 ranker and a Word Embedding approach. We tested our system on the ACRD dataset, which contains 1395 questions in different domains, and the system achieved a precision of 92.2% and a recall of 79.9% in finding the top-3 related passages for the query.
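
The Document Retrieval phase described above (Vector Space Model with TF-IDF weights and cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, and the function names (tokenize, build_idf, tfidf_vector, cosine, rank_documents) are all assumptions made for illustration, and stemming, stopword removal and query expansion are omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace only.
    # The paper's pipeline also removes stopwords and stems the words.
    return text.lower().split()


def build_idf(docs_tokens):
    # Document frequency: number of documents that contain each term.
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    # Smoothed IDF (an assumed variant) so every known term has a positive weight.
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    # Sparse TF-IDF vector; terms unseen in the collection are ignored.
    tf = Counter(t for t in tokens if t in idf)
    return {term: freq * idf[term] for term, freq in tf.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query, docs):
    # Return (document index, score) pairs sorted by descending similarity.
    docs_tokens = [tokenize(d) for d in docs]
    idf = build_idf(docs_tokens)
    q_vec = tfidf_vector(tokenize(query), idf)
    scores = [(i, cosine(q_vec, tfidf_vector(toks, idf)))
              for i, toks in enumerate(docs_tokens)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling rank_documents(question, documents) returns the document indices ordered by similarity to the question. In a pipeline like the one described, the top-ranked documents would then be split into passages and re-scored in the Passage Retrieval phase, which this sketch does not cover.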


2002, Vol 33 (2), pp. 54-63
Author(s): Yasushi Ogawa, Toru Matsuda
