inverted indexing
Recently Published Documents


TOTAL DOCUMENTS

32
(FIVE YEARS 7)

H-INDEX

7
(FIVE YEARS 0)

2021 ◽  
pp. 016555152110184
Author(s):  
Gunjan Chandwani ◽  
Anil Ahlawat ◽  
Gaurav Dubey

Document retrieval plays an important role in knowledge management because it helps us discover relevant information in existing data. This article proposes a cluster-based inverted indexing algorithm for document retrieval. First, pre-processing removes unnecessary and redundant words from the documents. The documents are then indexed by the cluster-based inverted indexing algorithm, which integrates the piecewise fuzzy C-means (piFCM) clustering algorithm with inverted indexing. Once the documents are indexed, user queries are matched against the index using the Bhattacharyya distance. Finally, the query is optimised using the Pearson correlation coefficient, and the relevant documents are retrieved. The performance of the proposed algorithm is evaluated on the WebKB and Twenty Newsgroups data sets. The analysis shows that the proposed algorithm achieves a precision of 1, a recall of 0.70 and an F-measure of 0.8235. The proposed document retrieval system retrieves the most relevant documents and speeds up the storing and retrieval of information.
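
The reported F-measure can be checked against the reported precision and recall. The snippet below is a minimal sketch that assumes the F-measure is the usual harmonic mean of precision and recall (F1); the abstract does not state the weighting, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1).

    Assumption: the abstract's "F-measure" is the balanced F1 score.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: precision 1, recall 0.70.
score = f_measure(1.0, 0.70)
print(round(score, 4))  # prints 0.8235
```

Under that assumption the three reported figures are mutually consistent.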


2021 ◽  
Vol 11 (4) ◽  
pp. 1781
Author(s):  
Peng Wang ◽  
Xiang Li

Recent years have seen an increasing emphasis on information security, and various encryption methods have been proposed. However, well-known symmetric encryption techniques still rely on the key space to guarantee security and suffer from frequent key updating. To address these problems, this paper proposes a novel symmetric-key method for text encryption based on deep learning, called TEDL, in which the secret key includes the hyperparameters of the deep learning model and the core step of encryption is transforming input data into weights trained under those hyperparameters. First, both communicating parties establish a word vector table by training a deep learning model according to the specified hyperparameters. Then, a self-updating codebook is constructed on the word vector table with the SHA-256 function and other tricks. When communication starts, encryption and decryption are equivalent to indexing and inverted indexing on the codebook, respectively, thus achieving the transformation between plaintext and ciphertext. Experimental results and related analyses show that TEDL performs well in terms of security, efficiency and generality, and requires less frequent key redistribution. In particular, as a supplement to current encryption methods, the time-consuming process of constructing a codebook increases the difficulty of brute-force attacks without degrading the efficiency of communications.
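
The idea that encryption is a forward lookup in a codebook and decryption is the inverted lookup can be illustrated with a toy example. This is only a sketch and is not TEDL: it has no word vectors, no trained model and no self-update step, and it offers no real security. The vocabulary, the shared secret string and all function names are hypothetical; the only real API used is `hashlib.sha256` from the Python standard library.

```python
import hashlib


def build_codebook(vocabulary, secret):
    """Map each word to a code derived with SHA-256.

    Toy stand-in for the paper's codebook; the derivation here is an assumption.
    """
    return {
        word: hashlib.sha256((secret + word).encode("utf-8")).hexdigest()
        for word in vocabulary
    }


def invert(codebook):
    """Build the inverted table: code -> word."""
    return {code: word for word, code in codebook.items()}


def encrypt(words, codebook):
    """Encryption as indexing: look up each plaintext word's code."""
    return [codebook[word] for word in words]


def decrypt(codes, inverse):
    """Decryption as inverted indexing: look up each code's word."""
    return [inverse[code] for code in codes]


# Hypothetical vocabulary and shared secret known to both parties.
vocab = ["meet", "at", "noon", "dawn"]
codebook = build_codebook(vocab, secret="hypothetical-shared-secret")
inverse = invert(codebook)

message = ["meet", "at", "dawn"]
cipher = encrypt(message, codebook)
recovered = decrypt(cipher, inverse)
```

Both parties can rebuild the same two tables independently as long as they share the inputs that generate the codebook, which is the role the hyperparameters play in the abstract's description.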


2020 ◽  
Vol 54 (4) ◽  
pp. 921-941
Author(s):  
Marwah Alian ◽  
Ghazi Al-Naymat ◽  
Banda Ramadan

Recent advances in technology are generating huge amounts of data, and data accumulation is outpacing the extraction of information from it. Hybrid approaches that combine different algorithms to extract the required information from this stockpile of data are urgently needed. One such algorithm is the vector space model for inverted indexing, which has traditionally been used for search engine indexing. In bioinformatics it has also been used for the assembly of DNA fragments generated by sequencing, but it has not been applied to retrieving protein sequences relevant to a query based on the presence or absence of motifs and domains. In this paper the concept of inverted indexing is applied to the small motif/domain data of proteins contained in the Motivated Proteins database at http://motif.gla.ac.uk/motif/index.html. The index was built using 17 small hydrogen-bonded motifs present in a dataset of 430 proteins. The entire dataset of 430 proteins was divided into 19 classes. Seven classes, for example cyanovirin, antibiotic and concanavalin, had very few instances (1 or 2) and were therefore omitted from further study. The remaining 12 classes, each with more than 10 proteins, were used to test the information retrieval (IR) strategy. The document vectors of all the proteins belonging to one class were averaged, and 12 queries with averaged vectors were prepared for testing. The similarity coefficient (SC) was then computed between each query and all the proteins of the dataset. This approach successfully classified each query as belonging to the class from which it was derived. To further validate the importance of the document vector as a novel attribute for classification, the entire dataset of document vectors was clustered into ten (10) clusters. Testing was then performed using the similarity coefficient (SC) between each query and the clusters obtained above.
The allocation of clusters to the 12 query sequences followed the same pattern as the relevant document search using the inverted indexing approach, but clustering allocated the queries to only four (4) classes. The largest number of query proteins (7 proteins, or 58%) were found to belong to cluster 5.
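
The class-averaged query procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not define the similarity coefficient, so cosine similarity is assumed here; the four proteins, two classes and four-motif vectors are made-up toy data (the study used 17 motifs and 430 proteins); all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity, used here as an assumed similarity coefficient."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def class_centroid(vectors):
    """Average the document vectors of one class to form a query vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(query, proteins):
    """Return the class of the protein most similar to the query."""
    best = max(proteins, key=lambda name: cosine(query, proteins[name][1]))
    return proteins[best][0]


# Toy data: protein -> (class label, motif presence/absence vector).
proteins = {
    "p1": ("classA", [1, 1, 0, 0]),
    "p2": ("classA", [1, 0, 0, 0]),
    "p3": ("classB", [0, 0, 1, 1]),
    "p4": ("classB", [0, 1, 1, 1]),
}

# One averaged query per class, as described in the abstract.
query_a = class_centroid([v for c, v in proteins.values() if c == "classA"])
query_b = class_centroid([v for c, v in proteins.values() if c == "classB"])

print(classify(query_a, proteins))  # prints classA
print(classify(query_b, proteins))  # prints classB
```

In this toy setting each averaged query is assigned back to the class it was derived from, which mirrors the outcome the abstract reports for its 12 queries.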


2019 ◽  
pp. 74-77
Author(s):  
V. A. Fedorova ◽  
E. A. Efremov ◽  
I. A. Kolyagina

Traditional information retrieval methods are becoming ineffective for analyzing big data. Analyzing and processing large amounts of information requires completely new conceptual solutions, one of which is Elasticsearch, a search engine based on the Lucene library. Elasticsearch uses inverted indexing to speed up searches: a list of all unique words is created for each document, and a list of documents is created for each word. The paper considers the principles of the Elasticsearch search technology. The task at hand is to analyze and identify the specific capabilities of Elasticsearch for searching and processing large amounts of information. The paper also describes examples of Elasticsearch in use, which will help professionals solve problems inherent in systems for relevant and personalized information retrieval.
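
The word-to-documents mapping described above can be sketched in a few lines of plain Python. This is a generic illustration of an inverted index, not Elasticsearch or Lucene code: it uses naive whitespace tokenization and lowercasing, with no analyzers, scoring or compression. The sample documents and function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents.
docs = {
    1: "inverted indexing speeds up search",
    2: "search engines use an inverted index",
    3: "clustering groups similar documents",
}

index = build_inverted_index(docs)
print(sorted(search(index, "inverted search")))  # prints [1, 2]
```

A query only touches the posting sets of its own terms rather than scanning every document, which is where the speed-up comes from.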


2018 ◽  
Vol 78 (6) ◽  
pp. 7727-7747
Author(s):  
Kai Zhang ◽  
Wengang Zhou ◽  
Shaoyan Sun ◽  
Bin Li

Author(s):  
Shweta Malhotra ◽  
Mohammad Najmud Doja ◽  
Bashir Alam ◽  
Mansaf Alam

This article describes how data indexing plays a crucial role in query processing. Systems based on traditional indexing techniques such as B-tree, R-tree, bitmap and inverted indexes are not suitable for efficient query evaluation, as they are based on simple key-value pairs and are used only for point queries. In cloud data repositories, point queries are not sufficient because cloud data is multidimensional. Many techniques have been developed for multidimensional query processing. In this article, a dynamic double-layer indexing structure is proposed that uses a Skipnet overlay for global indexing and an Octree index for local indexing. The experiments show that Skipnet-Octree performs better than the previous double-layer indexing technique for complex queries.


2018 ◽  
pp. 1307-1321
Author(s):  
Vinh-Tiep Nguyen ◽  
Thanh Duc Ngo ◽  
Minh-Triet Tran ◽  
Duy-Dinh Le ◽  
Duc Anh Duong

Large-scale image retrieval has shown remarkable potential in real-life applications. The standard approach is based on Inverted Indexing, with images represented using the Bag-of-Words model. However, one major limitation of both the Inverted Index and the Bag-of-Words representation is that they ignore the spatial information of visual words in image representation and comparison. As a result, retrieval accuracy is decreased. In this paper, the authors investigate an approach to integrate spatial information into the Inverted Index to improve accuracy while maintaining short retrieval time. Experiments conducted on several benchmark datasets (Oxford Building 5K, Oxford Building 5K+100K and Paris 6K) demonstrate the effectiveness of the proposed approach.

