Database Principles and Challenges in Text Analysis

2021 ◽  
Vol 50 (2) ◽  
pp. 6-17
Author(s):  
Johannes Doleschal ◽  
Benny Kimelfeld ◽  
Wim Martens

A common conceptual view of text analysis is that of a two-step process, where we first extract relations from text documents and then apply a relational query over the result. Hence, text analysis shares technical challenges with, and can draw ideas from, relational databases. A framework that formally instantiates this connection is that of document spanners. In this article, we review recent advances in various research efforts that adapt fundamental database concepts to text analysis through the lens of document spanners. Among other topics, we discuss aspects of query evaluation, aggregate queries, provenance, and distributed query planning.
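The two-step process the abstract describes can be sketched in a few lines: a regex with named capture groups stands in for a (much more expressive) document spanner, extracting a relation from raw text, after which an ordinary relational selection runs over the tuples. The text, pattern, and attribute names below are illustrative assumptions, not from the article.

```python
import re

# Hypothetical input text; the pattern below is a toy stand-in for a
# document spanner's extraction step.
text = "Contact Alice at alice@example.org or Bob at bob@example.org."

# Step 1: extract a relation with attributes (name, email) from the text.
pattern = re.compile(r"(?P<name>[A-Z][a-z]+) at (?P<email>[\w.]+@[\w.]+\w)")
relation = [m.groupdict() for m in pattern.finditer(text)]

# Step 2: apply a relational query over the extracted tuples,
# here a simple selection on the 'name' attribute.
result = [t["email"] for t in relation if t["name"] == "Alice"]
print(result)  # ['alice@example.org']
```

Real document spanners generalize this idea with a formal semantics for spans and relational-algebra operators over them; the sketch only conveys the extract-then-query shape of the framework.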

2021 ◽  
Author(s):  
César E. Montiel Olea ◽  
Leonardo R. Corral

Project Completion Reports (PCRs) are the main instrument through which different multilateral organizations measure the success of a project once it closes. PCRs are important for development effectiveness because they serve to understand achievements, failures, and challenges within the project cycle, which can feed back into the design and execution of new projects. The aim of this paper is to introduce text analysis tools for the exploration of PCR documents. We describe and apply different text analysis tools to explore the content of a sample of PCRs. We seek to illustrate a way in which PCRs can be summarized and analyzed using innovative tools applied to a unique dataset. We believe that the methods presented in this investigation have numerous potential applications to different types of text documents routinely prepared within the Inter-American Development Bank (IDB).


1996 ◽  
Vol 3 (41) ◽  
Author(s):  
Stefan Dziembowski

We study the complexity of the evaluation of fixpoint bounded-variable queries in relational databases. We exhibit a finite database such that the problem of whether a closed fixpoint formula using only 2 individual variables is satisfied in this database is PSPACE-complete. This clarifies the issues raised by Moshe Vardi in [Var95]. We also study the complexity of query evaluation for a number of restrictions of fixpoint logic. In particular, we exhibit a sublogic for which the upper bound postulated by Vardi holds.


Author(s):  
Norman May ◽  
Guido Moerkotte

Early approaches to XQuery processing proposed proprietary techniques to optimize and evaluate XQuery statements. In this chapter, the authors argue for an algebraic optimization and evaluation technique for XQuery, as it allows one to benefit from experience gained with relational databases. An algebraic XQuery processing method requires a translation into an algebra representation. While many publications already exist on algebraic optimizations and evaluation techniques for XQuery, an assessment of translation techniques is required. Consequently, the authors give a comprehensive survey of techniques for translating XQuery into various query representations. They relate these approaches to the way normalization and translation are implemented in Natix and discuss these two steps in detail. In their experience, their translation method is a good basis for further optimizations and query evaluation.


2011 ◽  
pp. 2203-2217
Author(s):  
Qing Zhang

In this article we investigate how approximate query processing (AQP) can be used in medical multidatabase systems. We identify two areas where this estimation technique will be of use. First, approximate query processing can be used to preprocess medical record linking in the multidatabase. Second, approximate answers can be given for aggregate queries. In the case of multidatabase systems used to link health and health-related data sources, preprocessing can be used to find records related to the same patient. This may be the first step in the linking strategy. If the aim is to gather aggregate statistics, then the approximate answers may be enough to provide the required answers. At the least, they may provide initial answers that encourage further investigation. This estimation may also be used for general query planning and optimization, which are important in multidatabase systems. In this article we propose two techniques for the estimation. These techniques enable synopses of component local databases to be precalculated and then used for obtaining approximate results for linking records and for aggregate queries. The synopses are constructed with restrictions on the storage space. We report on experiments which show that good approximate results can be obtained in a much shorter time than the exact query requires.
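The abstract's core idea, a precalculated, space-bounded synopsis answering aggregate queries approximately, can be illustrated with the simplest kind of synopsis: a uniform random sample whose aggregate is scaled up by the sampling fraction. The table, column names, and sizes below are hypothetical; the article's actual synopsis constructions are more sophisticated.

```python
import random

# Hypothetical local table in one component database of the multidatabase.
random.seed(0)
table = [{"patient_id": i, "cost": (i % 50) + 1} for i in range(100_000)]

# Precalculate a space-bounded synopsis: a uniform sample of 1,000 rows.
SAMPLE_SIZE = 1_000
synopsis = random.sample(table, SAMPLE_SIZE)

# Approximate SUM(cost): scale the sample's sum by the sampling fraction,
# avoiding a full scan of the 100,000-row table at query time.
estimate = sum(r["cost"] for r in synopsis) * (len(table) / SAMPLE_SIZE)
exact = sum(r["cost"] for r in table)
print(f"estimate={estimate:.0f}, exact={exact}")
```

The estimate typically lands within a few percent of the exact sum while touching only 1% of the rows, which is the trade-off that makes synopses attractive for query planning and for initial answers to aggregate queries.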


Author(s):  
Min Song ◽  
Il-Yeol Song ◽  
Xiaohua Hu ◽  
Hyoil Han

Information extraction (IE) technology has been defined and developed through the US DARPA Message Understanding Conferences (MUCs). IE refers to the identification of instances of particular events and relationships in unstructured natural language text documents and their conversion into a structured representation or relational table in databases. It has proved successful at extracting information in various domains; for example, it was used to identify patterns related to terrorist activities in reports on Latin American terrorism (MUC-4). More broadly, in light of the wealth of natural language documents, IE extracts knowledge or information from unstructured plain-text files into a structured or relational form. This form is suitable for sophisticated query processing, for integration with relational databases, and for data mining. Thus, IE is a crucial step toward making text files fully accessible.
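The conversion the abstract describes, from unstructured sentences to a relational table suitable for query processing, can be sketched end to end with a toy pattern and an in-memory SQLite table. The sentences, pattern, and schema here are illustrative assumptions, not from the chapter.

```python
import re
import sqlite3

# Hypothetical unstructured documents mentioning acquisition events.
docs = [
    "ACME acquired Widget Co in 2019.",
    "Globex acquired Initech in 2021.",
]

# A toy extraction pattern for (buyer, target, year) event tuples.
pattern = re.compile(
    r"(?P<buyer>[A-Z][\w ]*?) acquired (?P<target>[A-Z][\w ]*?) in (?P<year>\d{4})"
)

# Load the extracted tuples into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acquisition (buyer TEXT, target TEXT, year INTEGER)")
for doc in docs:
    for m in pattern.finditer(doc):
        conn.execute(
            "INSERT INTO acquisition VALUES (?, ?, ?)",
            (m["buyer"], m["target"], int(m["year"])),
        )

# Once structured, the text supports sophisticated query processing.
rows = conn.execute("SELECT buyer FROM acquisition WHERE year > 2020").fetchall()
print(rows)  # [('Globex',)]
```

Production IE systems replace the regex with trained extractors, but the pipeline shape, extract event tuples and then query them relationally, is the same.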


2018 ◽  
Vol 32 (4) ◽  
pp. e3941 ◽  
Author(s):  
Alard Roebroeck ◽  
Karla L. Miller ◽  
Manisha Aggarwal

2019 ◽  
Vol 7 (12) ◽  
pp. 6596-6615 ◽  
Author(s):  
Masud Rana ◽  
Ming Li ◽  
Xia Huang ◽  
Bin Luo ◽  
Ian Gentle ◽  
...  

Different classes of coating materials with their functional groups and mechanism of interaction with PSs.

