A data structure for information retrieval

1970 ◽  
Vol 21 (2) ◽  
pp. 145-148
Author(s):  
J. W. Winings
2021 ◽  
Author(s):  
Konstantin Pogorelko

The information system “Scientific Heritage of Russia” has been developed in stages since 2007. The existing software no longer meets the needs of the system and complicates its further development. It was therefore decided to implement the new version of the software in the cross-platform ASP.NET Core environment. The article describes the decisions made in implementing the software and modernizing the data structure. Particular attention is paid to the development of information retrieval tools.


Author(s):  
Mohammed Erritali

The growth over the centuries in the volume of text data held in libraries, such as books and articles, has made it necessary to establish effective mechanisms for locating them. Early techniques such as abstracting, indexing and the use of classification categories marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models and systems whose purpose is to facilitate access to a set of documents in electronic form (a corpus), so that a user can find the documents relevant to them, that is, the contents that match the user's information needs. Most information retrieval models index a corpus with a specific data structure called an "inverted file" or "reverse index". The inverted file collects information on every term in the corpus: the identifiers of the documents that contain the term, the frequency of the term in each of those documents, and the positions of its occurrences. In this paper we use an object-oriented database (db4o) instead of the inverted file; that is, instead of searching for a term in the inverted file, we search for it in the db4o database. The purpose of this work is a comparative study to determine whether object-oriented databases can compete with the inverted file in terms of access speed and resource consumption on a large volume of data.
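The inverted file described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `build_inverted_index` and `lookup`, the whitespace tokenization, and the toy corpus are all assumptions made for the example, and db4o is not involved.

```python
from collections import defaultdict


def build_inverted_index(corpus):
    """Map each term to {doc_id: [positions of the term in that document]}.

    `corpus` is a hypothetical dict of doc_id -> text; tokenization is a
    plain lowercase whitespace split, chosen only for illustration.
    """
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for position, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(position)
    return index


def lookup(index, term):
    """Return {doc_id: (frequency, positions)} for a term, or {} if absent."""
    postings = index.get(term.lower(), {})
    return {doc_id: (len(pos), pos) for doc_id, pos in postings.items()}


corpus = {
    1: "information retrieval uses an inverted file",
    2: "an inverted file maps each term to documents",
    3: "retrieval of information about retrieval",
}
index = build_inverted_index(corpus)
print(lookup(index, "retrieval"))
```

Each posting carries the three pieces of information the abstract lists: the document identifier, the term frequency in that document, and the positions of the occurrences.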


2020 ◽  
Author(s):  
Bernhard Rieder

This chapter investigates early attempts in information retrieval to tackle the full text of document collections. Underpinning a large number of contemporary applications, from search to sentiment analysis, the concepts and techniques pioneered by Hans Peter Luhn, Gerard Salton, Karen Spärck Jones, and others involve particular framings of language, meaning, and knowledge. They also introduce some of the fundamental mathematical formalisms and methods running through information ordering, preparing the extension to digital objects other than text documents. The chapter discusses the considerable technical expressivity that comes out of the sprawling landscape of research and experimentation that characterizes the early decades of information retrieval. This includes the emergence of the conceptual construct and intermediate data structure that is fundamental to most algorithmic information ordering: the feature vector.


Author(s):  
K. P. Pogorelko

The data structure and software of the Scientific Heritage of Russia electronic library were created in 2007 and no longer meet the needs of the system. The article describes the decisions made when implementing a new version of the software. These decisions affect both the organization of the database structure and the protocols for interacting with the system. Particular attention is paid to the development of information retrieval tools.


Algorithms ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 276
Author(s):  
Paniz Abedin ◽  
Arnab Ganguly ◽  
Solon P. Pissis ◽  
Sharma V. Thankachan

Let T[1,n] be a string of length n and T[i,j] be the substring of T starting at position i and ending at position j. A substring T[i,j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T that efficiently answers online queries of the following type: given a range [α,β], return a shortest substring T[i,j] of T with exactly one occurrence in [α,β]. We present an O(n log n)-word data structure with O(log_w n) query time, where w = Ω(log n) is the word size. Our construction is based on a non-trivial reduction allowing us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an O(n)-word data structure with O(√n log^ε n) query time, where ε > 0 is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].
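To make the query in the abstract above concrete, here is a brute-force sketch that answers a single range query directly. It is not the data structure from the paper and has none of its time or space guarantees. The function name is hypothetical, positions are 1-indexed and inclusive as in the abstract, and an occurrence is assumed to count only when it lies entirely inside [α,β].

```python
def range_shortest_unique_substring(text, alpha, beta):
    """Return (i, j), 1-indexed inclusive, of a shortest substring that
    occurs exactly once inside text[alpha..beta].

    Brute force: try lengths 1, 2, ... and count every substring of that
    length inside the range. Assumes an occurrence must lie fully within
    the range; ties are broken by leftmost first occurrence.
    """
    if not (1 <= alpha <= beta <= len(text)):
        raise ValueError("range must satisfy 1 <= alpha <= beta <= len(text)")
    window = text[alpha - 1:beta]
    m = len(window)
    for length in range(1, m + 1):
        counts = {}  # substring -> number of occurrences in the window
        first = {}   # substring -> 0-based start of its first occurrence
        for start in range(m - length + 1):
            piece = window[start:start + length]
            counts[piece] = counts.get(piece, 0) + 1
            first.setdefault(piece, start)
        for piece, count in counts.items():
            if count == 1:
                i = alpha + first[piece]
                return i, i + length - 1
    return None  # not reached: the whole window occurs exactly once


print(range_shortest_unique_substring("abcabca", 1, 7))
print(range_shortest_unique_substring("abcabca", 2, 4))
```

Over the full range of "abcabca" every substring of length 1 or 2 repeats, so the answer has length 3; over the range [2,4] a single character already suffices.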


Author(s):  
Richard E. Hartman ◽  
Roberta S. Hartman ◽  
Peter L. Ramos

We have long felt that, for various reasons, some form of electronic information retrieval would be more desirable than conventional photographic methods in a high-vacuum electron microscope. The most obvious of these is that electronic data retrieval removes the major source of gas load from the instrument. An equally important reason is that, if any subsequent analysis of the data is to be made, a continuous record on magnetic tape gives a much larger quantity of data and gives it in a form far more satisfactory for subsequent processing.


Author(s):  
Hilton H. Mollenhauer

Many factors (e.g., resolution of microscope, type of tissue, and preparation of sample) affect electron microscopical images and alter the amount of information that can be retrieved from a specimen. Of interest in this report are those factors associated with the evaluation of epoxy-embedded tissues. In this context, information retrieval is dependent, in part, on the ability to “see” sample detail (e.g., contrast) and, in part, on the quality of sample preservation. Two aspects of this problem will be discussed: 1) epoxy resins and their effect on image contrast, information retrieval, and sample preservation; and 2) the interaction between some stains commonly used for enhancing contrast and information retrieval.


Author(s):  
T. R. Fox ◽  
R. Levi-Setti

At an earlier meeting [1], we discussed information retrieval in the scanning transmission ion microscope (STIM) compared with the electron microscope at the same energy. We treated elastic scattering contrast, using total elastic cross sections; relative damage was estimated from energy loss data. This treatment is valid for “thin” specimens, where the incident particles suffer only single scattering. Since proton cross sections exceed electron cross sections, a given specimen (e.g., 1 μg/cm² of carbon at 25 keV) may be thin for electrons but “thick” for protons. Therefore, we now extend our previous analysis to include multiple scattering. Our proton results are based on the calculations of Sigmund and Winterbon [2], for 25 keV protons on carbon, using a Thomas-Fermi screened potential with a screening length of 0.0226 nm. The electron results are from Crewe and Groves [3] at 30 keV.

