A data structure for information retrieval

The information system “Scientific Heritage of Russia” has been developed in stages since 2007. The existing software no longer meets the needs of the system and complicates its further development. It was decided to implement the new version of the software on the cross-platform ASP.NET Core framework. The article describes the decisions made in implementing the software and modernizing the data structure. Particular attention is paid to the development of information retrieval tools.


Information Retrieval: Textual Indexing Using an Oriented Object Database

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v2.i1.pp205-214 ◽

2016 ◽

Vol 2 (1) ◽

pp. 205 ◽

Cited By ~ 1

Author(s):

Mohammed Erritali

Keyword(s):

Information Retrieval ◽

Data Structure ◽

Large Volume ◽

Information Needs ◽

Text Data ◽

Specific Data ◽

Inverted File ◽

Object Database ◽

Object Databases ◽

Oriented Object

The growth over centuries in the volume of text data, such as books and articles in libraries, has made it necessary to establish effective mechanisms for locating them. Early techniques such as abstracting, indexing, and the use of classification categories marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of defining models and systems that facilitate access to a set of documents in electronic form (a corpus), allowing a user to find the relevant ones, that is, the contents that match the user's information needs. Most information retrieval models index a corpus with a specific data structure called an "inverted file" or "reverse index". The inverted file collects information on all terms in the corpus documents: the identifiers of the documents that contain each term, the frequency of each term in the documents of the corpus, and the positions of its occurrences. In this paper we use an object-oriented database (db4o) instead of the inverted file; that is, instead of searching for a term in the inverted file, we search for it in the db4o database. The purpose of this work is a comparative study to see whether object-oriented databases can compete with the inverted file in terms of access speed and resource consumption on a large volume of data.
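As a rough illustration of the inverted file described in this abstract, the following minimal sketch maps each term to the documents that contain it, with per-document frequency and positions. The function names, the whitespace tokenizer, and the sample documents are assumptions made for this example; they do not come from the paper, and the db4o variant it studies is not shown.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def lookup(index, term):
    """Return (doc_id, frequency, positions) for each document containing term."""
    postings = index.get(term.lower(), {})
    return [(doc_id, len(positions), positions)
            for doc_id, positions in sorted(postings.items())]

# Hypothetical two-document corpus.
docs = {
    1: "information retrieval uses an inverted file",
    2: "an object database can store an index",
}
index = build_inverted_index(docs)
print(lookup(index, "an"))
```

A lookup returns one tuple per matching document, so the three kinds of information the abstract lists (document identifiers, term frequency, occurrence positions) are available from a single dictionary access.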


From Frequencies to Vectors

Engines of Order ◽

10.5117/9789462986190_ch05 ◽

2020 ◽

Author(s):

Bernhard Rieder

Keyword(s):

Information Retrieval ◽

Data Structure ◽

Sentiment Analysis ◽

Full Text ◽

Text Documents ◽

Document Collections ◽

Intermediate Data ◽

Algorithmic Information ◽

Digital Objects ◽

Language Meaning

This chapter investigates early attempts in information retrieval to tackle the full text of document collections. Underpinning a large number of contemporary applications, from search to sentiment analysis, the concepts and techniques pioneered by Hans Peter Luhn, Gerard Salton, Karen Spärck Jones, and others involve particular framings of language, meaning, and knowledge. They also introduce some of the fundamental mathematical formalisms and methods running through information ordering, preparing the extension to digital objects other than text documents. The chapter discusses the considerable technical expressivity that comes out of the sprawling landscape of research and experimentation that characterizes the early decades of information retrieval. This includes the emergence of the conceptual construct and intermediate data structure that is fundamental to most algorithmic information ordering: the feature vector.
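To make the notion of a feature vector concrete, here is a minimal sketch, assuming a fixed vocabulary and raw term counts as features, with cosine similarity as the comparison measure. The vocabulary, the sample texts, and the function names are hypothetical and are not taken from the chapter.

```python
import math
from collections import Counter

def term_frequency_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical vocabulary and texts.
vocabulary = ["data", "index", "retrieval", "vector"]
doc = term_frequency_vector("retrieval of data with a data index", vocabulary)
query = term_frequency_vector("data retrieval", vocabulary)
print(doc, query, cosine_similarity(doc, query))
```

Once documents and queries are both vectors over the same vocabulary, ordering documents reduces to comparing numbers, which is why the same structure extends to digital objects other than text.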


A data structure for cognitive information retrieval

International Journal of Computer & Information Sciences ◽

10.1007/bf01108516 ◽

1972 ◽

Vol 1 (1) ◽

pp. 17-27 ◽

Cited By ~ 1

Author(s):

K. O. Biss ◽

R. T. Chien ◽

F. A. Stahl

Keyword(s):

Information Retrieval ◽

Data Structure ◽

Cognitive Information


Development of the Software for the Electronic Library «Scientific Heritage of Russia»

Единое цифровое пространство научных знаний: проблемы и решения (Unified Digital Space of Scientific Knowledge: Problems and Solutions) ◽

10.51218/978-5-4499-1905-2-2021-199-207 ◽

2021 ◽

Author(s):

K. P. Pogorelko

Keyword(s):

Information Retrieval ◽

Data Structure ◽

Electronic Library ◽

Scientific Heritage ◽

Database Structure

The data structure and software of the Scientific Heritage of Russia electronic library were created in 2007 and currently do not meet the needs of the system. The article describes the decisions made when implementing a new version of the software. These decisions affect both the organization of the database structure and the protocols for interacting with the system. Particular attention is paid to the development of information retrieval tools.


Efficient Data Structures for Range Shortest Unique Substring Queries

Algorithms ◽

10.3390/a13110276 ◽

2020 ◽

Vol 13 (11) ◽

pp. 276

Author(s):

Paniz Abedin ◽

Arnab Ganguly ◽

Solon P. Pissis ◽

Sharma V. Thankachan

Keyword(s):

Information Retrieval ◽

Data Structure ◽

Data Structures ◽

Query Time ◽

Geometric Data ◽

Small Constant ◽

Efficient Data ◽

Online Queries ◽

Efficient Data Structures ◽

Unique Substrings

Let T[1,n] be a string of length n and T[i,j] be the substring of T starting at position i and ending at position j. A substring T[i,j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T answering the following type of online queries efficiently. Given a range [α,β], return a shortest substring T[i,j] of T with exactly one occurrence in [α,β]. We present an O(n log n)-word data structure with O(log_w n) query time, where w = Ω(log n) is the word size. Our construction is based on a non-trivial reduction allowing us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an O(n)-word data structure with O(√n log^ε n) query time, where ε > 0 is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].
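The query itself can be illustrated with a brute-force baseline. The sketch below is not the data structure from the paper. It assumes that an occurrence counts only when it lies entirely inside [α,β], it breaks ties by returning the leftmost shortest answer, and it rescans the range on every query. The function name is hypothetical.

```python
from collections import Counter

def range_sus_bruteforce(t, alpha, beta):
    """Return 1-based inclusive (i, j) of a shortest substring occurring exactly
    once inside t[alpha..beta], or None if the range is empty."""
    window = t[alpha - 1:beta]  # convert the 1-based inclusive range to a slice
    m = len(window)
    for length in range(1, m + 1):
        # Count every substring of this length, overlapping occurrences included.
        counts = Counter(window[s:s + length] for s in range(m - length + 1))
        for s in range(m - length + 1):
            if counts[window[s:s + length]] == 1:
                return (alpha + s, alpha + s + length - 1)
    return None

print(range_sus_bruteforce("abracadabra", 1, 11))
print(range_sus_bruteforce("abracadabra", 1, 4))
```

Each query here costs time polynomial in the range length, which is the cost that the indexed structures in the abstract are designed to avoid.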


Demonstration of the Use of Tape-Recorded Information from a High Vacuum Electron Microscope

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100063007 ◽

1969 ◽

Vol 27 ◽

pp. 218-219

Author(s):

Richard E. Hartman ◽

Roberta S. Hartman ◽

Peter L. Ramos

Keyword(s):

Information Retrieval ◽

Electron Microscope ◽

Magnetic Tape ◽

High Vacuum ◽

Data Retrieval ◽

Electronic Information ◽

Electronic Data ◽

Subsequent Processing ◽

Continuous Record

We have long felt that some form of electronic information retrieval would be more desirable than conventional photographic methods in a high vacuum electron microscope for various reasons. The most obvious of these is the fact that with electronic data retrieval the major source of gas load is removed from the instrument. An equally important reason is that if any subsequent analysis of the data is to be made, a continuous record on magnetic tape gives a much larger quantity of data and gives it in a form far more satisfactory for subsequent processing.


Stain contamination and embedding in electron microscopy

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100141986 ◽

1986 ◽

Vol 44 ◽

pp. 50-53

Author(s):

Hilton H. Mollenhauer

Keyword(s):

Electron Microscopy ◽

Information Retrieval ◽

Epoxy Resins ◽

Image Contrast ◽

Sample Preservation ◽

Factors Associated ◽

Amount Of Information ◽

Electron Microscopical ◽

Contrast Information

Many factors (e.g., resolution of microscope, type of tissue, and preparation of sample) affect electron microscopical images and alter the amount of information that can be retrieved from a specimen. Of interest in this report are those factors associated with the evaluation of epoxy embedded tissues. In this context, information retrieval is dependent, in part, on the ability to “see” sample detail (e.g., contrast) and, in part, on the quality of sample preservation. Two aspects of this problem will be discussed: 1) epoxy resins and their effect on image contrast, information retrieval, and sample preservation; and 2) the interaction between some stains commonly used for enhancing contrast and information retrieval.


Information and Dose in the Scanning Transmission Ion Microscope

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s1431927600001021 ◽

1980 ◽

Vol 38 ◽

pp. 232-233

Author(s):

T. R. Fox ◽

R. Levi-Setti

Keyword(s):

Information Retrieval ◽

Electron Microscope ◽

Elastic Scattering ◽

Energy Loss ◽

Cross Sections ◽

Previous Analysis ◽

Single Scattering ◽

Screening Length ◽

Relative Damage ◽

Scanning Transmission

At an earlier meeting [1], we discussed information retrieval in the scanning transmission ion microscope (STIM) compared with the electron microscope at the same energy. We treated elastic scattering contrast, using total elastic cross sections; relative damage was estimated from energy loss data. This treatment is valid for “thin” specimens, where the incident particles suffer only single scattering. Since proton cross sections exceed electron cross sections, a given specimen (e.g., 1 μg/cm² of carbon at 25 keV) may be thin for electrons but “thick” for protons. Therefore, we now extend our previous analysis to include multiple scattering. Our proton results are based on the calculations of Sigmund and Winterbon [2], for 25 keV protons on carbon, using a Thomas-Fermi screened potential with a screening length of 0.0226 nm. The electron results are from Crewe and Groves [3] at 30 keV.
