Time-stratified sampling for approximate answers to aggregate queries

Approximate Processing for Medical Record Linking and Multidatabase Analysis

Medical Informatics ◽

10.4018/978-1-60566-050-9.ch167 ◽

2011 ◽

pp. 2203-2217

Author(s):

Qing Zhang

Keyword(s):

Query Processing ◽

Medical Record ◽

Approximate Query Processing ◽

Related Data ◽

Multidatabase Systems ◽

Aggregate Queries ◽

Health Related ◽

Approximate Query ◽

Approximate Answers ◽

Query Planning

In this article we investigate how approximate query processing (AQP) can be used in medical multidatabase systems. We identify two areas where this estimation technique will be of use. First, approximate query processing can be used to preprocess medical record linking in the multidatabase. Second, approximate answers can be given for aggregate queries. In the case of multidatabase systems used to link health and health related data sources, preprocessing can be used to find records related to the same patient. This may be the first step in the linking strategy. If the aim is to gather aggregate statistics, then the approximate answers may be enough to provide the required answers. At least they may provide initial answers to encourage further investigation. This estimation may also be used for general query planning and optimization, important in multidatabase systems. In this article we propose two techniques for the estimation. These techniques enable synopses of component local databases to be precalculated and then used for obtaining approximate results for linking records and for aggregate queries. The synopses are constructed with restrictions on the storage space. We report on experiments which show that good approximate results can be obtained in a much shorter time than performing the exact query.

Download Full-text

Models and Techniques for Approximate Queries in OLAP

Encyclopedia of Information Science and Technology, Second Edition ◽

10.4018/978-1-60566-026-4.ch425 ◽

2011 ◽

pp. 2665-2670

Author(s):

Alfredo Cuzzocrea

Keyword(s):

Decision Support ◽

Query Processing ◽

Trend Analysis ◽

Computational Cost ◽

The Other ◽

Qualitative Trend Analysis ◽

Aggregate Queries ◽

Processing Algorithms ◽

Approximate Answers ◽

Customer Services

Since the size of the underlying data warehouse server (DWS) is usually very large, response time needed for computing queries is the main issue in decision support systems (DSS). Business analysis is the main application field in the context of DSS, as well as OLAP queries being the most useful ones; in fact, these queries allow us to support different kinds of analysis based on a multi-resolution and a multi-dimensional view of the data. By performing OLAP queries, business analysts can efficiently extract summarized knowledge, by means of SQL aggregation operators, from very large repositories of data like those stored in massive DWSs. Then, the extracted knowledge is exploited to support decisions in strategic fields of the target business, thus efficiently taking advantage from the amenity of exploring and mining massive data via OLAP technologies. The negative aspect of such an approach is just represented by the size of the data, which is enormous, currently being tera-bytes and peta-bytes the typical orders of data magnitude for enterprise DWSs, and, as a consequence, data processing costs are explosive. Despite the complexity and the resource-intensiveness of processing OLAP queries against massive DWSs, client-side systems performing OLAP and data mining, the most common application interfaces versus DWSs, are often characterized by small amount of memory, small computational capability, and customized tools with interactive, graphical user interface supporting qualitative, trend analysis. For instance, consider the context of retail systems. Here, managers and analysts are very often more interested in the product-sale plot in a fixed time window rather than to know the sale of a particular product in a particular day of the year. In others words, managers and analysts are more interested in the trend analysis rather than in the punctual, quantitative analysis, which is, indeed, more proper for OLTP systems. This consideration makes it more convent and efficient to compute approximate answers rather than exact answers. In fact, typical decision-support queries can be very resource intensive in terms of spatial and temporal computational needs. Obviously, the other issue that must be faced is the accuracy of the answers, as providing fast and totally wrong answers is deleterious. All considering, the key is proving fast, exploratory answers with some guarantees on their degree of approximation. On the other hand, in the last few years, DSS have become very popular: for example, sales transaction databases, call detail repositories, customer services historical data, and so forth. As a consequence, providing fast, even if approximate, answers to aggregate queries has become a tight requirement to make DSS-based applications efficient, and, thus, has been addressed in research in the vest of the so-called approximate query answering (AQA) techniques. Furthermore, in such data warehousing environments, executing multi-steps, query-processing algorithms is particularly hard because the computational cost for accessing multi-dimensional data would be enormous. Therefore, the most important issues for enabling DSS-based applications are: (1) minimizing the time complexity of query processing algorithms by decreasing the number of the needed disk I/Os, and (2) ensuring the quality of the approximate answers with respect to the exact ones by providing some guarantees on the accuracy of the approximation. Nevertheless, proposals existent in literature devote little attention to the point (2), which is indeed critical for the investigated context.

Download Full-text

Fast approximate answers to aggregate queries on a data cube

Proceedings. Eleventh International Conference on Scientific and Statistical Database Management ◽

10.1109/ssdm.1999.787618 ◽

2003 ◽

Cited By ~ 35

Author(s):

V. Poosala ◽

V. Ganti

Keyword(s):

Data Cube ◽

Aggregate Queries ◽

Approximate Answers

Download Full-text

Allocation in stratified sampling based on preliminary tests of significance

10.31274/rtd-180813-4580 ◽

1971 ◽

Author(s):

Victor Kuang-Tao Tang

Keyword(s):

Stratified Sampling ◽

Tests Of Significance

Download Full-text

Stratified Sampling Applied to Estimation of the Volumetric Fraction of Alumina in Cast Metal Matrix Composites

Practical Metallography ◽

10.3139/147.110021 ◽

2009 ◽

Vol 46 (8) ◽

pp. 408-423 ◽

Cited By ~ 1

Author(s):

K. Ranieri ◽

A.F.B. Costa ◽

C. Kiyan

Keyword(s):

Metal Matrix Composites ◽

Metal Matrix ◽

Stratified Sampling ◽

Matrix Composites ◽

Volumetric Fraction ◽

Cast Metal

Download Full-text

Data Extraction Techniques for Spreadsheet Records

AIS Educator Journal ◽

10.3194/aise.2007.2.1.119 ◽

2007 ◽

Vol 2 (1) ◽

pp. 119-129 ◽

Cited By ~ 1

Author(s):

Mark G. Simkin

Keyword(s):

Data Extraction ◽

Stratified Sampling ◽

Specific Information ◽

Extraction Techniques ◽

Filtering Techniques

Abstract Many accounting applications use spreadsheets as repositories of accounting records, and a common requirement is the need to extract specific information from them. This paper describes a number of techniques that accountants can use to perform such tasks directly using common spreadsheet tools. These techniques include (1) simple and advanced filtering techniques, (2) database functions, (3) methods for both simple and stratified sampling, and, (4) tools for finding duplicate or unmatched records.

Download Full-text