Using Inverted Index for Fingerprint Search

2021 ◽  
Vol 12 (5) ◽  
Author(s):  
Johnny Marcos S. Soares ◽  
Luciano Barbosa ◽  
Paulo Antonio Leal Rego ◽  
Regis Pires Magalhães ◽  
Jose Antônio F. de Macêdo

Fingerprints are the most widely used biometric information for identifying people. As fingerprint data grows, indexing techniques become essential for efficient search. In this work, we devise a solution that applies the traditional inverted index, widely used in textual information retrieval, to fingerprint search. It first converts fingerprints into text documents using techniques such as Minutia Cylinder-Code and Locality-Sensitive Hashing, and then indexes them in inverted files. In the experimental evaluation, our approach achieved a 0.42% error rate at a 10% penetration rate on the FVC2002 DB1a data set, surpassing some established methods.
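As a rough illustration of the idea, the sketch below (not the authors' exact pipeline) treats each minutia descriptor as a binary vector, hashes it into textual "words" with bit-sampling LSH, and stores those words in an inverted index; the descriptor length, hash counts, and helper names are all assumptions.

```python
import random
from collections import defaultdict

random.seed(42)

DESC_BITS = 128     # assumed length of a binarized minutia descriptor (MCC-like)
NUM_HASHES = 16     # assumed number of LSH functions; each yields one "word"
BITS_PER_HASH = 12  # assumed bits sampled per hash function

# Each LSH function is a fixed random subset of descriptor bit positions.
lsh_functions = [random.sample(range(DESC_BITS), BITS_PER_HASH)
                 for _ in range(NUM_HASHES)]

def descriptor_to_words(descriptor):
    """Map one binary minutia descriptor to NUM_HASHES textual 'words'.

    Similar descriptors collide on many words, so the inverted index
    retrieves them together, mimicking term matching in text IR."""
    words = []
    for i, positions in enumerate(lsh_functions):
        key = ''.join(str(descriptor[p]) for p in positions)
        words.append(f"h{i}_{key}")  # prefix keeps hash families distinct
    return words

inverted_index = defaultdict(set)    # word -> set of fingerprint ids

def index_fingerprint(fp_id, descriptors):
    for d in descriptors:
        for w in descriptor_to_words(d):
            inverted_index[w].add(fp_id)

def search(descriptors):
    """Score candidates by how many LSH words they share with the query."""
    votes = defaultdict(int)
    for d in descriptors:
        for w in descriptor_to_words(d):
            for fp_id in inverted_index[w]:
                votes[fp_id] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])

# Toy usage: index one fingerprint and query with the same descriptors.
fp = [[random.randint(0, 1) for _ in range(DESC_BITS)] for _ in range(5)]
index_fingerprint("fp-001", fp)
print(search(fp)[:1])   # [('fp-001', <vote count>)]
```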

Author(s):  
Budi Yulianto ◽  
Widodo Budiharto ◽  
Iman Herwidiana Kartowisastro

Boolean Retrieval (BR) and the Vector Space Model (VSM) are very popular methods in information retrieval for creating an inverted index and querying terms. The BR method returns the exact matches of a textual query without ranking the results; the VSM method both retrieves and ranks them. This study empirically compares the two methods using a sample corpus obtained from Reuters. The experimental results show that the time required to produce an inverted index is nearly the same for both methods; the difference lies in querying the index. The results also show that the number of generated indexes, the sizes of the generated files, and the time to read and search an index are proportional to the number and size of the files in the corpus.
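A minimal sketch of the contrast between the two methods over a shared inverted index, assuming whitespace tokenization and a toy corpus; the simple TF-IDF weighting stands in for whatever scheme the study actually used.

```python
import math
from collections import defaultdict, Counter

docs = {
    1: "oil prices rise in global markets",
    2: "global trade talks stall over oil",
    3: "markets rally as trade resumes",
}

# Build one inverted index with term frequencies; both methods share it.
index = defaultdict(dict)               # term -> {doc_id: tf}
for doc_id, text in docs.items():
    for term, tf in Counter(text.split()).items():
        index[term][doc_id] = tf

def boolean_and(query):
    """Boolean retrieval: exact conjunctive match, no ranking."""
    postings = [set(index.get(t, {})) for t in query.split()]
    return set.intersection(*postings) if postings else set()

def vsm(query):
    """Vector space model: rank docs by TF-IDF score against the query."""
    n = len(docs)
    scores = defaultdict(float)
    for t in query.split():
        postings = index.get(t, {})
        if not postings:
            continue
        idf = math.log(n / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(boolean_and("oil markets"))   # {1} -- unranked exact match
print(vsm("oil markets"))           # ranked list, partial matches included
```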


2020 ◽  
Vol 10 (7) ◽  
pp. 2539 ◽  
Author(s):  
Toan Nguyen Mau ◽  
Yasushi Inoguchi

It is challenging to build a real-time information retrieval system, especially for systems with high-dimensional big data. To structure big data, many hashing algorithms have been proposed that map similar data items to the same bucket to speed up the search. Locality-Sensitive Hashing (LSH) is a common approach for reducing the number of dimensions of a data set, using a family of hash functions and a hash table. The LSH hash table is an additional component that supports the indexing of hash values (keys) for the corresponding data items. We previously proposed the Dynamic Locality-Sensitive Hashing (DLSH) algorithm with a dynamically structured hash table, optimized for storage in main memory and in the memory of General-Purpose computation on Graphics Processing Units (GPGPU) devices. This supports constantly updated data sets, such as song, image, or text databases. The DLSH algorithm works effectively with data sets that are updated at high frequency and is compatible with parallel processing. However, a single GPGPU device is inadequate for processing big data, owing to the small memory capacity of GPGPU devices. When using multiple GPGPU devices for searching, an effective search algorithm is needed to balance the jobs. In this paper, we propose an extension of DLSH to big data sets using multiple GPGPUs, in order to increase the capacity and performance of the information retrieval system. Different search strategies on multiple DLSH clusters are also proposed to suit our parallelized system. With significant results in terms of performance and accuracy, we show that DLSH can be applied to real-life dynamic database systems.
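The sketch below illustrates the general pattern rather than the authors' DLSH: sign-random-projection LSH keys shard a hash table across several "devices" (plain dictionaries here standing in for GPGPU memories), so each query only touches the shard that owns its bucket. All sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM, NUM_PLANES, NUM_DEVICES = 64, 10, 4
planes = rng.normal(size=(NUM_PLANES, DIM))   # random hyperplanes for LSH

def lsh_key(vector):
    """Sign-random-projection LSH: one bit per hyperplane."""
    bits = (planes @ vector) > 0
    return int(''.join('1' if b else '0' for b in bits), 2)

# Shard the hash table across devices by bucket key, so a query routes
# to exactly one shard instead of scanning all devices.
shards = [dict() for _ in range(NUM_DEVICES)]  # key -> list of (id, vector)

def insert(item_id, vector):
    key = lsh_key(vector)
    shards[key % NUM_DEVICES].setdefault(key, []).append((item_id, vector))

def query(vector, k=5):
    key = lsh_key(vector)
    bucket = shards[key % NUM_DEVICES].get(key, [])
    # Re-rank candidates within the bucket by exact cosine similarity.
    scored = [(i, float(v @ vector /
                        (np.linalg.norm(v) * np.linalg.norm(vector) + 1e-9)))
              for i, v in bucket]
    return sorted(scored, key=lambda kv: -kv[1])[:k]

vecs = rng.normal(size=(100, DIM))
for i, v in enumerate(vecs):
    insert(i, v)
print(query(vecs[0]))   # item 0 should rank first in its own bucket
```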


Author(s):  
Orabe Almanaseer

The huge volume of text documents available on the internet has made it difficult for users to find valuable information. Efficient applications to extract knowledge of interest from textual documents are therefore vitally important. This paper addresses the problem of responding to user queries by fetching the most relevant documents from a clustered set of documents. For this purpose, a cluster-based information retrieval framework is proposed for analysing and extracting useful patterns from text documents. In this approach, a preprocessing step first finds frequent and high-utility patterns in the data set, and a Vector Space Model (VSM) is then used to represent the data set. The system was implemented in two main phases: in phase 1, a clustering analysis groups documents into several clusters; in phase 2, an information retrieval process ranks clusters against the user query and retrieves the relevant documents from the clusters deemed relevant. The results are evaluated using recall and precision (P@5, P@10) of the retrieved documents: P@5 was 0.660 and P@10 was 0.655.
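A minimal sketch of the two phases, assuming TF-IDF vectors and k-means clustering (the paper's actual clustering and pattern-mining steps may differ); scikit-learn names are used for brevity, and the corpus is invented.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "stock markets and trading strategies",
    "equity trading and market indices",
    "deep learning for image recognition",
    "neural networks in computer vision",
]

# Phase 1: represent documents in a VSM and group them into clusters.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

def retrieve(query, top_docs=2):
    """Phase 2: rank clusters against the query, then rank documents
    inside the most relevant cluster only."""
    q = vectorizer.transform([query])
    cluster_scores = cosine_similarity(q, kmeans.cluster_centers_)[0]
    best = int(np.argmax(cluster_scores))
    members = [i for i, c in enumerate(kmeans.labels_) if c == best]
    doc_scores = cosine_similarity(q, X[members])[0]
    ranked = sorted(zip(members, doc_scores), key=lambda kv: -kv[1])
    return [(docs[i], round(float(s), 3)) for i, s in ranked[:top_docs]]

print(retrieve("market trading"))
```

Restricting the document-level ranking to the best-matching cluster is what makes the cluster-based approach cheaper than scoring the whole collection.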


2021 ◽  
Vol 55 (1) ◽  
pp. 1-2
Author(s):  
Bhaskar Mitra

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents---or short passages---in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms---such as a person's name or a product model number---not seen during training, and avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, retrieval involves extremely large collections---such as the document index of a commercial Web search engine---containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018]. Our key contribution towards improving the effectiveness of deep ranking models is the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework that incorporates query term independence [Mitra et al., 2019] into any arbitrary deep model, enabling large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
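To make the query-term-independence idea concrete, here is a toy sketch (not the models from the thesis): per-term document scores are precomputed offline and stored as postings, so online scoring reduces to summing inverted-index lookups. `term_doc_score` is a hypothetical stand-in for a learned neural scorer restricted to one (term, document) pair.

```python
from collections import defaultdict

docs = {
    1: "wireless earbuds with noise cancelling",
    2: "noise cancelling headphones for travel",
    3: "travel adapter with usb ports",
}

# Stand-in for a learned neural scorer over a single (term, document)
# pair; any model with this signature could be precomputed the same way.
def term_doc_score(term, doc_tokens):
    return doc_tokens.count(term) * 0.5   # placeholder weighting

# Offline: precompute per-term impact scores and store them as postings,
# exactly like term weights in a classical inverted index.
index = defaultdict(dict)
for doc_id, text in docs.items():
    tokens = text.split()
    for term in set(tokens):
        index[term][doc_id] = term_doc_score(term, tokens)

# Online: under query term independence, a document's score is just the
# sum of its precomputed per-term scores -- no joint forward pass needed.
def score(query):
    totals = defaultdict(float)
    for term in query.split():
        for doc_id, s in index.get(term, {}).items():
            totals[doc_id] += s
    return sorted(totals.items(), key=lambda kv: -kv[1])

print(score("noise cancelling earbuds"))   # [(1, 1.5), (2, 1.0)]
```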


Author(s):  
Yuqian Xu ◽  
Mor Armony ◽  
Anindya Ghose

Social media platforms for healthcare services are changing how patients choose physicians. The digitization of healthcare reviews provides additional information to patients when choosing their physicians. At the same time, the growing volume of online information introduces more uncertainty among providers regarding expected future demand and how different service features affect patient decisions. In this paper, we derive various service-quality proxies from online reviews and show that leveraging textual information yields useful operational measures for understanding patient choices. To do so, we study a unique data set from one of the leading appointment-booking websites in the United States. From the text reviews we derive the seven most frequently mentioned topics among patients---bedside manner, diagnosis accuracy, waiting time, service time, insurance process, physician knowledge, and office environment---and then incorporate these service features into a random-coefficient choice model to quantify their economic value. By introducing quality proxies from text reviews, we find that the predictive power of the patient-choice model increases significantly, for example, a 6%–12% improvement measured by mean squared error in both in-sample and out-of-sample tests. In addition, our estimation results indicate that contextual descriptions may characterize users' perceived quality better than numerical ratings of the same service feature. Broadly speaking, this paper shows how to incorporate textual information into an econometric model to understand patient choice in healthcare delivery. Our interdisciplinary approach provides a framework that combines machine learning and structural modeling techniques to advance the literature in empirical operations management, information systems, and marketing. This paper was accepted by David Simchi-Levi, operations management.
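As a stylized illustration (a plain conditional logit, simpler than the paper's random-coefficient model), the snippet below turns review-derived quality proxies into choice probabilities; all numbers are invented and the three features are a subset of the seven topics.

```python
import numpy as np

# Assumed inputs: per-physician service-quality proxies extracted from
# review text (columns: bedside manner, waiting time, diagnosis accuracy).
quality = np.array([
    [0.8, 0.2, 0.5],   # physician A
    [0.4, 0.7, 0.6],   # physician B
    [0.6, 0.5, 0.9],   # physician C
])
beta = np.array([1.2, -0.8, 1.5])   # illustrative taste coefficients

# Conditional logit: choice probability is a softmax over utilities.
utility = quality @ beta
prob = np.exp(utility) / np.exp(utility).sum()
print(prob.round(3))   # predicted share of patients choosing each physician
```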


2015 ◽  
Vol 39 (6) ◽  
pp. 779-794 ◽  
Author(s):  
Mustafa Utku Özmen

Purpose – The purpose of this paper is to analyse users' attitudes towards online information retrieval and processing. The aim is to identify the characteristics of information that better capture the attention of users and to provide evidence on users' information retrieval behaviour by studying online photo archives as information units.

Design/methodology/approach – The paper analyses a unique quasi-experimental data set of photo archive access counts collected by the author from an online newspaper. In addition to access counts for each photo in 500 randomly chosen photo galleries, characteristics of the photo galleries are also recorded. Survival (duration) analysis is used to analyse the factors affecting the share of a photo gallery viewed by a certain proportion of its initial viewers.

Findings – The results of the survival analysis indicate that users are impatient with longer photo galleries; they lose attention faster and stop viewing earlier when gallery length is uncertain; they are attracted by keywords and initial presentation; and they give more credit to specific rather than general information categories.

Practical implications – The results offer applicable implications for information providers, especially in the online domain. To attract more attention, entities can engage in targeted information provision by taking into account people's attitudes towards information retrieval and processing as presented in this paper.

Originality/value – This paper uses a unique data set in a quasi-experimental setting to identify the characteristics of online information that users are attracted to.
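A hedged sketch of the kind of duration analysis described, using the `lifelines` Cox proportional-hazards model on invented gallery-level data; the column names and the choice of "share viewed" as the duration variable are assumptions, not the paper's exact specification.

```python
import pandas as pd
from lifelines import CoxPHFitter

# Hypothetical data: 'viewed_share' plays the role of survival time (how
# far into the gallery viewers got) and 'dropped' marks whether attention
# was lost before the end (the event of interest).
df = pd.DataFrame({
    "viewed_share":   [0.3, 0.9, 0.5, 1.0, 0.4, 0.7, 0.6, 0.8],
    "dropped":        [1,   0,   1,   0,   1,   1,   0,   1],
    "gallery_length": [40,  12,  30,  10,  35,  20,  25,  15],
    "length_shown":   [0,   1,   1,   1,   0,   0,   1,   0],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="viewed_share", event_col="dropped")
cph.print_summary()   # hazard ratios: which factors speed up drop-off
```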


2017 ◽  
Vol 10 (2) ◽  
pp. 311-325
Author(s):  
Suruchi Chawla

The main challenge for effective web Information Retrieval (IR) is to infer the information need from the user's query and retrieve relevant documents. The precision of search results is low because vague and imprecise user queries fail to retrieve enough relevant documents. Fuzzy-set-based query expansion deals with imprecise and vague queries to infer the user's information need, while trust-based web page recommendation retrieves search results matching that need. In this paper, an algorithm for intelligent information retrieval is designed using a hybrid of fuzzy sets and trust in web query session mining: fuzzy query expansion infers the user's information need, and trust is used to recommend web pages accordingly. Experiments were performed on a data set collected in the Academics, Entertainment, and Sports domains, and the search results confirm an improvement in precision.
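A minimal sketch of fuzzy query expansion under assumed membership values: an alpha-cut over a fuzzy term-term relation (hard-coded here; in the paper it would be mined from web query sessions) selects the expansion terms. The trust-based recommendation step is omitted.

```python
# Assumed fuzzy relation: membership of each candidate term in the
# concept expressed by a query term. Values are purely illustrative.
fuzzy_relation = {
    "football": {"soccer": 0.9, "league": 0.7, "goal": 0.6, "stadium": 0.4},
    "score":    {"result": 0.8, "goal": 0.7, "points": 0.6, "match": 0.5},
}

def expand(query_terms, alpha=0.6):
    """Alpha-cut fuzzy expansion: keep candidates whose membership,
    aggregated by max over the query terms, meets the threshold."""
    memberships = {}
    for qt in query_terms:
        for term, mu in fuzzy_relation.get(qt, {}).items():
            memberships[term] = max(memberships.get(term, 0.0), mu)
    expansion = [t for t, mu in memberships.items() if mu >= alpha]
    return list(query_terms) + expansion

print(expand(["football", "score"]))
# ['football', 'score', 'soccer', 'league', 'goal', 'result', 'points']
```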


Author(s):  
Humberto Oliveira Serra ◽  
Lucas Bezerra Maia ◽  
Alexis Salomon ◽  
Nigel da Silva Lima ◽  
Rubem de Sousa Silva ◽  
...  

Author(s):  
Misturah Adunni Alaran ◽  
AbdulAkeem Adesina Agboola ◽  
Adio Taofiki Akinwale ◽  
Olusegun Folorunso

The reality of human existence and our interactions with the things that surround us reveal that the world is imprecise, incomplete, vague, and sometimes even indeterminate. Neutrosophic logic is the only theory that attempts to unify all previous logics in a single global theoretical framework. As the volume of data keeps growing day by day, extracting data from such an environment is becoming a problem. This chapter proposes a new neutrosophic string similarity measure based on the longest common subsequence (LCS) to address uncertainty in string information search. The new method is compared with four existing classical string similarity measures using a wordlist as the data set. The analyses show that the proposed neutrosophic similarity measure performs better than the existing ones in information retrieval tasks, with evaluation based on precision, recall, highest false match, lowest true match, and separation.
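A sketch of an LCS-based neutrosophic similarity, with component definitions that are illustrative assumptions rather than the chapter's actual formulas: truth from the normalized LCS length, indeterminacy from the length mismatch, and falsity as the complement of truth.

```python
def lcs_length(a, b):
    """Classic dynamic-programming longest common subsequence."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, 1):
        for j, cb in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if ca == cb else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def neutrosophic_similarity(a, b):
    """Return a (truth, indeterminacy, falsity) triple for two strings.

    The component definitions are illustrative, not the chapter's:
    truth = normalized LCS, indeterminacy = length mismatch,
    falsity = complement of truth."""
    if not a or not b:
        return (0.0, 1.0, 1.0)
    l = lcs_length(a, b)
    truth = l / max(len(a), len(b))
    indeterminacy = abs(len(a) - len(b)) / max(len(a), len(b))
    falsity = 1.0 - truth
    return (round(truth, 3), round(indeterminacy, 3), round(falsity, 3))

print(neutrosophic_similarity("retrieval", "retrieve"))  # (0.778, 0.111, 0.222)
```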

