A Practical q-Gram Index for Text Retrieval Allowing Errors

Gonzalo Navarro; Ricardo Baeza-Yates

doi:10.19153/cleiej.1.2.3

CLEI electronic journal

10.19153/cleiej.1.2.3

2018

Vol 1 (2)

Cited By ~ 12

Author(s):

Gonzalo Navarro

Ricardo Baeza-Yates

Keyword(s):

Natural Language

Text Retrieval

Search Pattern

Natural Language Text

Time Space

On Line

Text Searching

Space Requirements

Indexing Technique

Language Text

We propose an indexing technique for approximate text searching that is practical and powerful, and especially optimized for natural language text. Unlike other indices of this kind, it can retrieve any string that approximately matches the search pattern, not only whole words. Every text substring of a fixed length q is stored in the index, together with pointers to all the text positions where it appears. The search pattern is partitioned into pieces that are searched in the index, and all their occurrences in the text are verified for a complete match. To reduce space requirements, pointers to blocks can be used instead of exact positions, at the price of higher querying costs. We design an algorithm that optimizes the partition of the pattern into pieces so that the total number of verifications is minimized. This is especially well suited to natural language texts, and makes it possible to know in advance the expected cost of the search and the expected relevance of the query to the user. We show experimentally the building time, space requirements and querying time of our index, finding that it is a practical alternative for text retrieval. Retrieval times are reduced to between 10% and 60% of those of the best on-line algorithm.
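The search strategy in this abstract can be illustrated with a small Python sketch. It assumes the standard pigeonhole split: the pattern is cut into k + 1 equal pieces, each at least q characters long, so any occurrence with at most k errors contains at least one piece unchanged. Each piece is located through the position list of its first q-gram, and candidate text windows are verified with Sellers' edit-distance dynamic programming. The function names and the equal-length split are illustrative assumptions, not the authors' implementation, which chooses the pieces with an optimization algorithm.

```python
from collections import defaultdict


def build_qgram_index(text: str, q: int) -> dict[str, list[int]]:
    """Map every length-q substring of `text` to the sorted list of its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return dict(index)


def find_exact(piece: str, text: str, index: dict[str, list[int]], q: int) -> list[int]:
    """Exact occurrences of a piece (len >= q): filter the position list of its first q-gram."""
    return [p for p in index.get(piece[:q], []) if text.startswith(piece, p)]


def approx_ends(pattern: str, window: str, k: int) -> list[int]:
    """Sellers' DP: offsets in `window` where a substring matching `pattern`
    with at most k insertions, deletions or substitutions ends."""
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = []
    for j, c in enumerate(window):
        prev_diag, col[0] = col[0], 0  # row 0 stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            prev_diag, col[i] = col[i], min(col[i] + 1, col[i - 1] + 1, prev_diag + cost)
        if col[m] <= k:
            ends.append(j)
    return ends


def search(pattern: str, text: str, index: dict[str, list[int]], q: int, k: int) -> list[int]:
    """End positions in `text` of substrings within edit distance k of `pattern`."""
    m = len(pattern)
    size = m // (k + 1)
    if size < q:
        raise ValueError("each of the k + 1 pieces must be at least q long")
    ends = set()
    for n in range(k + 1):
        start = n * size
        stop = m if n == k else start + size  # the last piece absorbs the remainder
        for p in find_exact(pattern[start:stop], text, index, q):
            # Any occurrence containing this piece unchanged lies inside [lo, hi).
            lo = max(0, p - start - k)
            hi = min(len(text), p - start + m + k)
            ends.update(lo + e for e in approx_ends(pattern, text[lo:hi], k))
    return sorted(ends)
```

With block addressing, the lists would hold block numbers instead of positions. Each candidate block, plus enough neighbouring text to cover an occurrence, would then be scanned with `approx_ends`. This shrinks the index but increases the amount of text verified per query.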

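The optimization of the pattern partition can likewise be sketched as a dynamic program. This is an illustrative formulation, not necessarily the authors' algorithm. It assumes the pattern is cut into k + 1 consecutive pieces of length at least q, and that the cost of a piece is the number of its exact text occurrences, since each occurrence triggers one verification. `best_partition` and `piece_cost` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Callable


def build_qgram_index(text: str, q: int) -> dict[str, list[int]]:
    """Map every length-q substring of `text` to the list of its start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return dict(index)


def piece_cost(piece: str, text: str, index: dict[str, list[int]], q: int) -> int:
    """Exact occurrences of `piece` (len >= q) in `text`, i.e. verifications it would trigger."""
    return sum(1 for p in index.get(piece[:q], []) if text.startswith(piece, p))


def best_partition(pattern: str, k: int, q: int,
                   cost: Callable[[str], int]) -> tuple[int, list[str]]:
    """Split `pattern` into k + 1 consecutive pieces of length >= q minimizing total cost."""
    m = len(pattern)
    # best[j][i]: cheapest split of pattern[:i] into j pieces; cut[j][i]: start of the last piece
    best = [[math.inf] * (m + 1) for _ in range(k + 2)]
    cut = [[0] * (m + 1) for _ in range(k + 2)]
    best[0][0] = 0
    for j in range(1, k + 2):
        for i in range(j * q, m + 1):
            for s in range((j - 1) * q, i - q + 1):
                if best[j - 1][s] == math.inf:
                    continue
                total = best[j - 1][s] + cost(pattern[s:i])
                if total < best[j][i]:
                    best[j][i], cut[j][i] = total, s
    if best[k + 1][m] == math.inf:
        raise ValueError("pattern too short for k + 1 pieces of length >= q")
    pieces, i = [], m
    for j in range(k + 1, 0, -1):  # walk the cut points back from the end of the pattern
        pieces.append(pattern[cut[j][i]:i])
        i = cut[j][i]
    return best[k + 1][m], pieces[::-1]
```

As an example, take the misspelled pattern `"brwn fox jumps"` with k = 1 and q = 3, against the text `"the quick brown fox jumps over the lazy dog"`. The cheapest split places the misspelling in a piece that never occurs, so only one piece triggers a verification (total cost 1). The DP calls `cost` repeatedly on the same substrings, so in practice it is worth memoizing.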

Document representation in natural language text retrieval

Proceedings of the workshop on Human Language Technology - HLT '94

10.3115/1075812.1075896

1994

Cited By ~ 1

Author(s):

Tomek Strzalkowski

Keyword(s):

Natural Language

Text Retrieval

Document Representation

Natural Language Text

Language Text


Social Marketplace Monitoring and Sentiment Analysis

International Journal of Scientific Research in Computer Science Engineering and Information Technology

10.32628/cseit1952136

2019

pp. 127-133

Author(s):

P. Monisha

R. Rubanya

N. Malarvizhi

Keyword(s):

Natural Language

Language Processing

Natural Language Text

Domain Specific

Keyboard Input

On Line

The Given

Domain Independent

Language Text

Candidate Feature

The overwhelming majority of existing approaches to opinion feature extraction rely on mining patterns from a single review corpus. They ignore the nontrivial disparities in the word distribution characteristics of opinion features across different corpora. This research proposes a technique to identify opinion features in online reviews by exploiting the difference in opinion feature statistics across two corpora: a domain-specific corpus (i.e., the given review corpus) and a domain-independent corpus (i.e., the contrasting corpus). This disparity is captured by a measure called domain relevance (DR), which characterizes how relevant a term is to a text collection. First, a list of candidate opinion features is extracted from the domain review corpus by defining a set of syntactic dependency rules. For each extracted candidate feature, its intrinsic-domain relevance (IDR) and extrinsic-domain relevance (EDR) scores are then estimated on the domain-dependent and domain-independent corpora, respectively. Natural language processing (NLP) refers to computer systems that analyze, attempt to understand, or produce one or more human languages, such as English, Japanese, Italian, or Russian, and process the information contained in natural language text. The input might be text, spoken language, or keyboard input. The field of NLP is primarily concerned with getting computers to perform useful and interesting tasks with human languages. It is secondarily concerned with helping us come to a better understanding of human language.
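The contrast between the two corpora can be illustrated with a toy Python sketch. The abstract does not give the IDR and EDR formulas, so this sketch substitutes document frequency (the fraction of documents containing a term) as a hypothetical relevance score. The thresholds `idr_min` and `edr_max` are likewise assumptions, and candidate extraction with dependency rules is not shown.

```python
def doc_frequency(term: str, corpus: list[list[str]]) -> float:
    """Hypothetical relevance stand-in: fraction of documents (token lists) containing `term`."""
    if not corpus:
        return 0.0
    return sum(term in doc for doc in corpus) / len(corpus)


def select_features(candidates: list[str], domain_corpus: list[list[str]],
                    general_corpus: list[list[str]],
                    idr_min: float = 0.5, edr_max: float = 0.2) -> list[str]:
    """Keep candidates that are common in the domain corpus (high IDR stand-in)
    but rare in the domain-independent corpus (low EDR stand-in)."""
    selected = []
    for term in candidates:
        idr = doc_frequency(term, domain_corpus)   # relevance inside the review domain
        edr = doc_frequency(term, general_corpus)  # relevance outside the domain
        if idr >= idr_min and edr <= edr_max:
            selected.append(term)
    return selected
```

Consider a handful of phone reviews and a few general sentences. A term like "battery", which appears in most reviews but nowhere in the general text, passes both thresholds. A generic word like "great" is rejected because it is also frequent in the general corpus.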


Natural Language Text Retrieval Using a Large Semantic Network

10.21236/ada274319

1993

Author(s):

Paul Nelson

Keyword(s):

Natural Language

Semantic Network

Text Retrieval

Natural Language Text

Language Text


Morality Classification in Natural Language Text

IEEE Transactions on Affective Computing

10.1109/taffc.2020.3034050

2020

pp. 1-1

Author(s):

Matheus C. Pavan

Vitor G. Santos

Alex G. J. Lan

Joao Martins

Wesley Ramos Santos

...

Keyword(s):

Natural Language

Natural Language Text

Language Text


Accurate fact harvesting from natural language text in Wikipedia with Lector

Proceedings of the 19th International Workshop on Web and Databases - WebDB '16

10.1145/2932194.2932203

2016

Cited By ~ 2

Author(s):

Matteo Cannaviccio

Denilson Barbosa

Paolo Merialdo

Keyword(s):

Natural Language

Natural Language Text

Language Text


Generation of Natural Language Text using Perspective Descriptor in Frames

IETE Journal of Research

10.1080/03772063.2001.11416202

2001

Vol 47 (1-2)

pp. 43-57

Author(s):

G V Uma

T V Geetha

Keyword(s):

Natural Language

Natural Language Text

Language Text


Word-based self-indexes for natural language text

ACM Transactions on Information Systems

10.1145/2094072.2094073

2012

Vol 30 (1)

pp. 1-34

Cited By ~ 27

Author(s):

Antonio Fariña

Nieves R. Brisaboa

Gonzalo Navarro

Francisco Claude

Ángeles S. Places

...

Keyword(s):

Natural Language

Natural Language Text

Language Text


Process Model Generation from Natural Language Text

Advanced Information Systems Engineering (CAiSE 2011), Lecture Notes in Computer Science

10.1007/978-3-642-21640-4_36

2011

pp. 482-496

Cited By ~ 62

Author(s):

Fabian Friedrich

Jan Mendling

Frank Puhlmann

Keyword(s):

Natural Language

Process Model

Model Generation

Natural Language Text

Language Text


Wordform as the main basis for analysis of natural language text

Informatization and communication

10.34219/2078-8320-2021-12-2-101-108

2021

pp. 101-108

Author(s):

S.G. Antonov

Keyword(s):

Natural Language

Natural Language Text

Probability Approach

Correction Problem

Main Basis

Language Corpus

Language Text

The article discusses how the wordforms of natural language text can be applied to solving the mistake correction problem. It discusses the merits and demerits of the two known approaches to this problem: the deterministic approach and the probabilistic approach. It describes the principles for constructing the natural language corpus used in the probabilistic approach. It concludes that the two approaches need to be used in combination, depending on the properties of the texts.


A Review on Question Generation from Natural Language Text

ACM Transactions on Information Systems

10.1145/3468889

2022

Vol 40 (1)

pp. 1-43

Author(s):

Ruqing Zhang

Jiafeng Guo

Lu Chen

Yixing Fan

Xueqi Cheng

Keyword(s):

Natural Language

Question Answering

Data Augmentation

Text Structure

Current Status

Question Generation

Natural Language Text

Question Answering Systems

The Right

Language Text

Question generation is an important yet challenging problem in Artificial Intelligence (AI), which aims to generate natural and relevant questions from various input formats, e.g., natural language text, structured databases, knowledge bases, and images. In this article, we focus on question generation from natural language text, which has received tremendous interest in recent years due to its widespread applications, such as data augmentation for question answering systems. During the past decades, many different question generation models have been proposed, from traditional rule-based methods to advanced neural network-based methods. Since a large variety of research works have been proposed, we believe it is the right time to summarize the current status, learn from existing methodologies, and gain some insights for future development. In contrast to existing reviews, in this survey we try to provide a more comprehensive taxonomy of question generation tasks from three different perspectives, i.e., the types of the input context text, the target answer, and the generated question. We take a deep look into existing models from different dimensions to analyze their underlying ideas, major design principles, and training strategies. We compare these models through benchmark tasks to obtain an empirical understanding of the existing techniques. Moreover, we discuss what is missing in the current literature and what the promising and desired future directions are.
