Database Principles and Challenges in Text Analysis

2021 ◽  
Vol 50 (2) ◽  
pp. 6-17
Author(s):  
Johannes Doleschal ◽  
Benny Kimelfeld ◽  
Wim Martens

A common conceptual view of text analysis is that of a two-step process, where we first extract relations from text documents and then apply a relational query over the result. Hence, text analysis shares technical challenges with, and can draw ideas from, relational databases. A framework that formally instantiates this connection is that of document spanners. In this article, we review recent advances in various research efforts that adapt fundamental database concepts to text analysis through the lens of document spanners. Among other topics, we discuss aspects of query evaluation, aggregate queries, provenance, and distributed query planning.
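The two-step process the abstract describes can be sketched in a few lines: a regex with named capture groups stands in for a (much more expressive) document spanner, extracting a relation from raw text, after which an ordinary relational selection runs over the tuples. The text, pattern, and attribute names below are illustrative assumptions, not from the article.

```python
import re

# Hypothetical input text; the pattern below is a toy stand-in for a
# document spanner's extraction step.
text = "Contact Alice at alice@example.org or Bob at bob@example.org."

# Step 1: extract a relation with attributes (name, email) from the text.
pattern = re.compile(r"(?P<name>[A-Z][a-z]+) at (?P<email>[\w.]+@[\w.]+\w)")
relation = [m.groupdict() for m in pattern.finditer(text)]

# Step 2: apply a relational query over the extracted tuples,
# here a simple selection on the 'name' attribute.
result = [t["email"] for t in relation if t["name"] == "Alice"]
print(result)  # ['alice@example.org']
```

Real document spanners generalize this idea with a formal semantics for spans and relational-algebra operators over them; the sketch only conveys the extract-then-query shape of the framework.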

2021 ◽  
Author(s):  
César E. Montiel Olea ◽  
Leonardo R. Corral

Project Completion Reports (PCRs) are the main instrument through which different multilateral organizations measure the success of a project once it closes. PCRs are important for development effectiveness because they serve to understand achievements, failures, and challenges within the project cycle, which can feed back into the design and execution of new projects. The aim of this paper is to introduce text analysis tools for the exploration of PCR documents. We describe and apply different text analysis tools to explore the content of a sample of PCRs. We seek to illustrate a way in which PCRs can be summarized and analyzed using innovative tools applied to a unique dataset. We believe that the methods presented in this investigation have numerous potential applications to different types of text documents routinely prepared within the Inter-American Development Bank (IDB).


1996 ◽  
Vol 3 (41) ◽  
Author(s):  
Stefan Dziembowski

We study the complexity of the evaluation of fixpoint bounded-variable queries in relational databases. We exhibit a finite database such that the problem of whether a closed fixpoint formula using only 2 individual variables is satisfied in this database is PSPACE-complete. This clarifies the issues raised by Moshe Vardi in [Var95]. We also study the complexity of query evaluation for a number of restrictions of fixpoint logic. In particular, we exhibit a sublogic for which the upper bound postulated by Vardi holds.


Author(s):  
Norman May ◽  
Guido Moerkotte

Early approaches to XQuery processing proposed proprietary techniques to optimize and evaluate XQuery statements. In this chapter, the authors argue for an algebraic optimization and evaluation technique for XQuery, as it allows one to benefit from experience gained with relational databases. An algebraic XQuery processing method requires a translation into an algebra representation. While many publications already exist on algebraic optimizations and evaluation techniques for XQuery, an assessment of translation techniques is required. Consequently, the authors give a comprehensive survey of techniques for translating XQuery into various query representations. They relate these approaches to the way normalization and translation are implemented in Natix and discuss these two steps in detail. In their experience, their translation method is a good basis for further optimizations and query evaluation.


2011 ◽  
pp. 2203-2217
Author(s):  
Qing Zhang

In this article we investigate how approximate query processing (AQP) can be used in medical multidatabase systems. We identify two areas where this estimation technique will be of use. First, approximate query processing can be used to preprocess medical record linking in the multidatabase. Second, approximate answers can be given for aggregate queries. In the case of multidatabase systems used to link health and health-related data sources, preprocessing can be used to find records related to the same patient. This may be the first step in the linking strategy. If the aim is to gather aggregate statistics, then the approximate answers may be enough to provide the required answers. At the least, they may provide initial answers that encourage further investigation. This estimation may also be used for general query planning and optimization, which are important in multidatabase systems. In this article we propose two techniques for the estimation. These techniques enable synopses of component local databases to be precalculated and then used for obtaining approximate results for linking records and for aggregate queries. The synopses are constructed with restrictions on the storage space. We report on experiments which show that good approximate results can be obtained in a much shorter time than the exact query requires.
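The abstract's core idea, a precalculated, space-bounded synopsis answering aggregate queries approximately, can be illustrated with the simplest kind of synopsis: a uniform random sample whose aggregate is scaled up by the sampling fraction. The table, column names, and sizes below are hypothetical; the article's actual synopsis constructions are more sophisticated.

```python
import random

# Hypothetical local table in one component database of the multidatabase.
random.seed(0)
table = [{"patient_id": i, "cost": (i % 50) + 1} for i in range(100_000)]

# Precalculate a space-bounded synopsis: a uniform sample of 1,000 rows.
SAMPLE_SIZE = 1_000
synopsis = random.sample(table, SAMPLE_SIZE)

# Approximate SUM(cost): scale the sample's sum by the sampling fraction,
# avoiding a full scan of the 100,000-row table at query time.
estimate = sum(r["cost"] for r in synopsis) * (len(table) / SAMPLE_SIZE)
exact = sum(r["cost"] for r in table)
print(f"estimate={estimate:.0f}, exact={exact}")
```

The estimate typically lands within a few percent of the exact sum while touching only 1% of the rows, which is the trade-off that makes synopses attractive for query planning and for initial answers to aggregate queries.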


Author(s):  
Min Song ◽  
Il-Yeol Song ◽  
Xiaohua Hu ◽  
Hyoil Han

Information extraction (IE) technology has been defined and developed through the US DARPA Message Understanding Conferences (MUCs). IE refers to the identification of instances of particular events and relationships in unstructured natural language text documents and their conversion into a structured representation or relational table in databases. It has proved successful at extracting information in various domains; for example, it was used to identify patterns related to terrorist activities in reports on Latin American terrorism (MUC-4). More broadly, in light of the wealth of natural language documents, IE extracts knowledge or information from unstructured plain-text files into a structured or relational form. This form is suitable for sophisticated query processing, for integration with relational databases, and for data mining. Thus, IE is a crucial step toward making text files fully accessible.
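The conversion the abstract describes, from unstructured sentences to a relational table suitable for query processing, can be sketched end to end with a toy pattern and an in-memory SQLite table. The sentences, pattern, and schema here are illustrative assumptions, not from the chapter.

```python
import re
import sqlite3

# Hypothetical unstructured documents mentioning acquisition events.
docs = [
    "ACME acquired Widget Co in 2019.",
    "Globex acquired Initech in 2021.",
]

# A toy extraction pattern for (buyer, target, year) event tuples.
pattern = re.compile(
    r"(?P<buyer>[A-Z][\w ]*?) acquired (?P<target>[A-Z][\w ]*?) in (?P<year>\d{4})"
)

# Load the extracted tuples into a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acquisition (buyer TEXT, target TEXT, year INTEGER)")
for doc in docs:
    for m in pattern.finditer(doc):
        conn.execute(
            "INSERT INTO acquisition VALUES (?, ?, ?)",
            (m["buyer"], m["target"], int(m["year"])),
        )

# Once structured, the text supports sophisticated query processing.
rows = conn.execute("SELECT buyer FROM acquisition WHERE year > 2020").fetchall()
print(rows)  # [('Globex',)]
```

Production IE systems replace the regex with trained extractors, but the pipeline shape, extract event tuples and then query them relationally, is the same.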


2018 ◽  
Vol 32 (4) ◽  
pp. e3941 ◽  
Author(s):  
Alard Roebroeck ◽  
Karla L. Miller ◽  
Manisha Aggarwal

2019 ◽  
Vol 7 (12) ◽  
pp. 6596-6615 ◽  
Author(s):  
Masud Rana ◽  
Ming Li ◽  
Xia Huang ◽  
Bin Luo ◽  
Ian Gentle ◽  
...  

Different classes of coating materials with their functional groups and mechanism of interaction with PSs.

