Designing a Document Retrieval Method for University Digital Libraries Based on Hadoop Technology

2021 ◽  
Vol 5 (12) ◽  
pp. 82-87
Author(s):  
Haixia He

With the development of big data, organizations across society have begun applying it in service of their own enterprises and departments, and university digital libraries are no exception. The most cumbersome task in managing a university library is document retrieval. This article uses a Hadoop-based algorithm to extract semantic keywords and then calculates semantic similarity following the keyword-based literature retrieval process. A fast-matching method determines the weight of each keyword, ensuring efficient and accurate document retrieval in digital libraries and thus completing the design of a document retrieval method for university digital libraries based on Hadoop technology.
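The abstract describes scoring documents by semantic similarity over weighted keywords. A minimal sketch of that idea, assuming (the paper does not specify) that each document and query is reduced to a keyword-to-weight map and compared with cosine similarity:

```python
# Hypothetical sketch only: weighted-keyword cosine similarity.
# The keyword weights and the similarity measure are assumptions,
# not the paper's actual Hadoop-based formulation.
import math

def cosine_similarity(query_weights, doc_weights):
    """Cosine similarity between two keyword -> weight maps."""
    shared = set(query_weights) & set(doc_weights)
    dot = sum(query_weights[k] * doc_weights[k] for k in shared)
    nq = math.sqrt(sum(w * w for w in query_weights.values()))
    nd = math.sqrt(sum(w * w for w in doc_weights.values()))
    return dot / (nq * nd) if nq and nd else 0.0

# Example: a query and one document, each as keyword weights.
query = {"hadoop": 0.8, "retrieval": 0.6}
doc = {"hadoop": 0.5, "library": 0.4, "retrieval": 0.7}
score = cosine_similarity(query, doc)
```

Ranking the library's documents by this score would yield the retrieval order; how the paper actually assigns keyword weights via fast matching is not specified here.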

2012 ◽  
Vol 605-607 ◽  
pp. 2561-2568
Author(s):  
Qin Wang ◽  
Shou Ning Qu ◽  
Tao Du ◽  
Ming Jing Zhang

Nowadays, document retrieval is an important means of academic exchange and of acquiring new knowledge. The traditional retrieval method is to choose the corresponding database category and match the input keywords; it returns a mass of documents, making it hard for users to find the most relevant one. This paper puts forward a text quantification method: for each element in a document, five features are mined, namely word concept, positional weight, improved characteristic weight, text distribution weight, and element length. A word's contribution to the document is obtained by combining these five feature characteristics, and every document in the database is stored digitally through the contributions of its elements. The paper also designs a subject mapping scheme. First, a similarity calculation method based on contribution and association rules is defined; the documents in the database are then clustered by this method, and feature extraction identifies the subject of each class. When a user searches, the input description is quantified and automatically mapped to a class by subject mapping, and the document ranking is produced by computing the similarity between the description and the features of the other documents in that class. Experiments show that the scheme is intelligent and accurate, and improves retrieval speed.
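The five-feature contribution described above can be illustrated with a toy combination. The coefficients and the linear form below are assumptions for illustration; the paper's actual combination rule is not given in the abstract:

```python
# Illustrative sketch: combine five per-word feature values into one
# "contribution" score. The linear weighting and its coefficients are
# assumed, not taken from the paper.
def word_contribution(concept, position_weight, char_weight,
                      dist_weight, length):
    feats = [concept, position_weight, char_weight, dist_weight, length]
    coeffs = [0.3, 0.2, 0.2, 0.2, 0.1]  # assumed coefficients
    return sum(c * f for c, f in zip(coeffs, feats))

# Example: a word with all five features normalized into [0, 1].
contribution = word_contribution(0.9, 0.8, 0.7, 0.6, 0.5)
```

A document would then be represented as a vector of such contributions, over which the clustering and similarity steps described above operate.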


2014 ◽  
Vol 556-562 ◽  
pp. 4959-4962
Author(s):  
Sai Qiao

The traditional database information retrieval method works by matching simple associations between attributes, which requires that an image have only a single characteristic. As images grow more complex, further feature extraction becomes difficult, greatly increasing the time consumed by large-scale image database retrieval. A fast retrieval method for large-scale image databases is proposed. Texture features are extracted from the images in the database to support retrieval, and a constraint-matching method is introduced that refers to these texture features to complete the target retrieval. The experimental results show that the proposed algorithm, applied to large-scale image database retrieval, increases retrieval speed and thereby improves the performance of large-scale image databases.
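The constraint-matching idea above can be sketched as a cheap pre-filter on a texture descriptor followed by a full distance ranking. The toy descriptor (intensity mean and variance) and the tolerance constraint are assumptions; the paper's actual texture features are not specified in the abstract:

```python
# Hedged sketch: a toy texture descriptor with a constraint pre-filter
# before full matching. Descriptor and threshold are illustrative only.
def texture_descriptor(image):
    """Mean and variance of pixel intensities for a 2D list of pixels."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return (mean, var)

def constrained_search(query_desc, database, mean_tol=10.0):
    """Constraint step: discard images whose mean intensity differs by
    more than mean_tol, then rank survivors by squared descriptor distance."""
    candidates = [(name, d) for name, d in database.items()
                  if abs(d[0] - query_desc[0]) <= mean_tol]
    return sorted(candidates,
                  key=lambda nd: (nd[1][0] - query_desc[0]) ** 2
                               + (nd[1][1] - query_desc[1]) ** 2)

# Example: precomputed descriptors for two database images.
db = {"img_a": (100.0, 5.0), "img_b": (200.0, 5.0)}
results = constrained_search((102.0, 4.0), db)
```

The pre-filter is what buys the speed-up on a large database: only images passing the cheap constraint pay the cost of the full comparison.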


2019 ◽  
Vol 2019 ◽  
pp. 1-20 ◽  
Author(s):  
Ameera M. Almasoud ◽  
Hend S. Al-Khalifa ◽  
Abdulmalik S. Al-Salman

In the field of biology, researchers need to compare genes or gene products using semantic similarity measures (SSM). Continuous data growth and diversity in data characteristics comprise what is called big data; current biological SSMs cannot handle big data. Therefore, these measures need the ability to control the size of big data. We used parallel and distributed processing by splitting data into multiple partitions and applied SSM measures to each partition; this approach helped manage big data scalability and computational problems. Our solution involves three steps: splitting gene ontology (GO), data clustering, and semantic similarity calculation. To test this method, split GO and data clustering algorithms were defined and assessed for performance in the first two steps. Three of the best SSMs in biology (Resnik, Shortest Semantic Differentiation Distance (SSDD), and SORA) are enhanced by introducing threaded parallel processing, which is used in the third step. Our results demonstrate that introducing threads in SSMs reduced the time of calculating semantic similarity between gene pairs and improved the performance of the three SSMs. Average time was reduced by 24.51% for Resnik, 22.93% for SSDD, and 33.68% for SORA. Total time was reduced by 8.88% for Resnik, 23.14% for SSDD, and 39.27% for SORA. Using these threaded measures in the distributed system, combined with using split GO and data clustering algorithms to split input data based on their similarity, reduced the average time more than did the approach of equally dividing input data. Time reduction increased with the number of splits. The time reduction percentage was 24.1%, 39.2%, and 66.6% for Threaded SSDD and 33.0%, 78.2%, and 93.1% for Threaded SORA in the case of 2, 3, and 4 slaves, respectively, and 92.04% for Threaded Resnik in the case of four slaves.
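The partition-and-thread step described above can be sketched as follows. The similarity function here is a placeholder, not Resnik, SSDD, or SORA, and the partitioning scheme is an assumption for illustration:

```python
# Sketch of the threading idea: split gene pairs into partitions and
# score each partition in a worker thread. placeholder_similarity stands
# in for the real SSMs (Resnik / SSDD / SORA), which are not shown here.
from concurrent.futures import ThreadPoolExecutor

def placeholder_similarity(pair):
    a, b = pair
    return 1.0 if a == b else 0.5

def score_partition(pairs):
    return [placeholder_similarity(p) for p in pairs]

def parallel_scores(pairs, n_workers=4):
    """Split the pair list into roughly equal partitions and score
    them concurrently, preserving input order in the output."""
    chunk = max(1, len(pairs) // n_workers)
    partitions = [pairs[i:i + chunk] for i in range(0, len(pairs), chunk)]
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        results = ex.map(score_partition, partitions)
    return [s for part in results for s in part]

scores = parallel_scores([("g1", "g1"), ("g1", "g2"), ("g2", "g2")], 2)
```

The paper's reported gains come from similarity-based splitting (split GO plus clustering) rather than the equal-size split used in this sketch; the threading mechanics are the same either way.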


2005 ◽  
Vol 23 (3) ◽  
pp. 267-298 ◽  
Author(s):  
Laurence A. F. Park ◽  
Kotagiri Ramamohanarao ◽  
Marimuthu Palaniswami
