TREE BASED INDEXES VERSUS BITMAP INDEXES: A PERFORMANCE STUDY

Data warehouses are used to store large amounts of data which is often used for On-Line Analytical Processing (OLAP). Short response times are essential for on-line decision support. Common approaches to reach this goal in read-mostly environments are the precomputation of materialized views and the use of index structures. This paper focuses on the use of index structures for supporting fast access to data. The performance of index structures depends on many different parameters. Here, we focus on a set of nine parameters. Two approaches are presented to support the decision making process which index structure should be applied. The first approach is based on classification trees. The second approach uses an aggregation and scatter diagram method. Both approaches are applied to four distinct index structures: a tree-based index structure without aggregated data, a tree-based index structure with aggregated data and two bitmap index structures. This paper presents results of the comparison with both approaches.

Download Full-text

Efficient Retrieval of Music Recordings Using Graph-Based Index Structures

Signals ◽

10.3390/signals2020021 ◽

2021 ◽

Vol 2 (2) ◽

pp. 336-352

Author(s):

Frank Zalkow ◽

Julian Brandner ◽

Meinard Müller

Keyword(s):

Nearest Neighbor ◽

Response Times ◽

Negative Impact ◽

Nearest Neighbor Search ◽

Index Structure ◽

Search Problem ◽

Index Structures ◽

Music Retrieval ◽

Retrieval Systems ◽

Music Recordings

Flexible retrieval systems are required for conveniently browsing through large music collections. In a particular content-based music retrieval scenario, the user provides a query audio snippet, and the retrieval system returns music recordings from the collection that are similar to the query. In this scenario, a fast response from the system is essential for a positive user experience. For realizing low response times, one requires index structures that facilitate efficient search operations. One such index structure is the K-d tree, which has already been used in music retrieval systems. As an alternative, we propose to use a modern graph-based index, denoted as Hierarchical Navigable Small World (HNSW) graph. As our main contribution, we explore its potential in the context of a cross-version music retrieval application. In particular, we report on systematic experiments comparing graph- and tree-based index structures in terms of the retrieval quality, disk space requirements, and runtimes. Despite the fact that the HNSW index provides only an approximate solution to the nearest neighbor search problem, we demonstrate that it has almost no negative impact on the retrieval quality in our application. As our main result, we show that the HNSW-based retrieval is several orders of magnitude faster. Furthermore, the graph structure also works well with high-dimensional index items, unlike the tree-based structure. Given these merits, we highlight the practical relevance of the HNSW graph for music information retrieval (MIR) applications.

Download Full-text

Optimized execution method for queries with materialized views: Design and implementation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202821 ◽

2021 ◽

pp. 1-15

Author(s):

Abhijeet R. Raipurkar ◽

Manoj B. Chandak

Keyword(s):

Simulated Annealing ◽

Data Storage ◽

Response Times ◽

Materialized Views ◽

Power Calculation ◽

Energy Aware ◽

On Line ◽

Database Operations ◽

The Impact ◽

And Storage

A query application for On-Line Analytical Processing (OLAP) examines various kinds of data stored in a Data Warehouse (DW). There have been no systematic studies that look at the impact of query optimizations on performance and energy consumption in relational and NoSQL databases. Indeed, due to a lack of precise power calculation techniques in various databases and queries, the energy activity of several basic database operations is mostly unknown, as are the queries themselves, which are very complicated, extensive, and exploratory. As a result of the rapidly growing size of the DW system, query response times are regularly increasing. To improve decision-making performance, the response time of such queries should be as short as possible. To resolve these issues, multiple materialized views from individual database tables have been collected, and queries have been handled. Similarly, due to overall maintenance and storage expenses, as well as the selection of an optimal view set to increase the data storage facility’s efficacy, materializing all conceivable views is not viable. Thus, to overcome these issues, this paper proposed the method of energy-aware query optimization and processing, on materialized views using enhanced simulated annealing (EAQO-ESA). This work was carried out in four stages. First, a Simulated Annealing (SA) based meta-heuristic approach was used to pre-process the query and optimize the scheduling performance. Second, the optimal sets of views were materialized, resulting in enhanced query response efficiency. Third, the authors assessed the performance of the query execution time and computational complexity with and without optimization. Finally, based on processing time, efficiency, and computing cost, the system’s performance was validated and compared to the traditional technique.

Download Full-text

On-line Tropical Epidemiology -A Case-Study from The Gambia

Methods of Information in Medicine ◽

10.1055/s-0038-1635490 ◽

1987 ◽

Vol 26 (02) ◽

pp. 73-76 ◽

Cited By ~ 3

Author(s):

Kathryn Rowan ◽

P. Byass ◽

R. W. Snow

Keyword(s):

Data Collection ◽

Field Trial ◽

Error Rates ◽

Database Management System ◽

Bed Nets ◽

Completion Rates ◽

On Line ◽

Access To Data ◽

Rapid Processing

SummaryThis paper reports on a computerised approach to the management of an epidemiological field trial, which aimed at determining the effects of insecticide-impregnated bed nets on the incidence of malaria in children. The development of a data system satisfying the requirements of the project and its implementation using a database management system are discussed. The advantages of this method of management in terms of rapid processing of and access to data from the study are described, together with the completion rates and error rates observed in data collection.

Download Full-text

Selective caching: a persistent memory approach for multi-dimensional index structures

Distributed and Parallel Databases ◽

10.1007/s10619-021-07327-0 ◽

2021 ◽

Author(s):

Muhammad Attahir Jibril ◽

Philipp Götze ◽

David Broneske ◽

Kai-Uwe Sattler

Keyword(s):

Main Memory ◽

Index Structure ◽

Index Structures ◽

Cloud Infrastructure ◽

General Technique ◽

Persistent Memory ◽

The Cost ◽

Cloud Applications ◽

Memory Layout ◽

Analytical Index

AbstractAfter the introduction of Persistent Memory in the form of Intel’s Optane DC Persistent Memory on the market in 2019, it has found its way into manifold applications and systems. As Google and other cloud infrastructure providers are starting to incorporate Persistent Memory into their portfolio, it is only logical that cloud applications have to exploit its inherent properties. Persistent Memory can serve as a DRAM substitute, but guarantees persistence at the cost of compromised read/write performance compared to standard DRAM. These properties particularly affect the performance of index structures, since they are subject to frequent updates and queries. However, adapting each and every index structure to exploit the properties of Persistent Memory is tedious. Hence, we require a general technique that hides this access gap, e.g., by using DRAM caching strategies. To exploit Persistent Memory properties for analytical index structures, we propose selective caching. It is based on a mixture of dynamic and static caching of tree nodes in DRAM to reach near-DRAM access speeds for index structures. In this paper, we evaluate selective caching on the OLAP-optimized main-memory index structure Elf, because its memory layout allows for an easy caching. Our experiments show that if configured well, selective caching with a suitable replacement strategy can keep pace with pure DRAM storage of Elf while guaranteeing persistence. These results are also reflected when selective caching is used for parallel workloads.

Download Full-text

Electronic Access to Factual Materials Information: The State of the Art

MRS Bulletin ◽

10.1557/s0883769400045097 ◽

1995 ◽

Vol 20 (8) ◽

pp. 40-48 ◽

Cited By ~ 5

Author(s):

J.H. Westbrook ◽

J.G. Kaufman ◽

F. Cverna

Keyword(s):

State Of The Art ◽

The State ◽

Easy Access ◽

End User ◽

Cd Rom ◽

Component Design ◽

Property Data ◽

On Line ◽

Electronic Access ◽

Access To Data

Over the past 30 years we have seen a strong but uncoordinated effort to both increase the availability of numeric materials-property data in electronic media and to make the resultant mass of data more readily accessible and searchable for the end-user engineer. The end user is best able to formulate the question and to judge the utility of the answer for numeric property data inquiries, in contrast to textual or bibliographic data for which information specialists can expeditiously carry out searches.Despite the best efforts of several major programs, there remains a shortfall with respect to comprehensiveness and a gap between the goal of easy access to all the world's numeric databases and what can presently be achieved. The task has proven thornier and therefore much more costly than anyone envisioned, and computer access to data for materials scientists and engineers is still inadequate compared, for example, to the situation for molecular biologists or astronomers. However, progress has been made. More than 100 materials databases are listed and categorized by Wawrousek et al. that address several types of applications including: fundamental research, materials selection, component design, process control, materials identification and equivalency, expert systems, and education. Standardization is improving and access has been made more easy.In the discussion that follows, we will examine several characteristics of available information and delivery systems to assess their impact on the successes and limitations of the available products. The discussion will include the types and uses of the data, issues around data reliability and quality, the various formats in which data need to be accessed, and the various media available for delivery. Then we will focus on the state of the art by giving examples of the three major media through which broad electronic access to numeric properties has emerged: on-line systems, workstations, and disks, both floppy and CD-ROM. We will also cite some resources of where to look for numeric property data.

Download Full-text

Sentence processing in language-impaired children under conditions of filtering and time compression

Applied Psycholinguistics ◽

10.1017/s0142716400007050 ◽

1995 ◽

Vol 16 (2) ◽

pp. 137-154 ◽

Cited By ~ 37

Author(s):

Rachel E. Stark ◽

James W. Montgomery

Keyword(s):

Sentence Processing ◽

High Frequency ◽

Response Times ◽

Language Impaired ◽

Monitoring Task ◽

Time Compression ◽

Lower Accuracy ◽

On Line ◽

Impaired Children ◽

Monitoring Performance

ABSTRACTNineteen language-impaired (LI) and 20 language-normal (LN) children participated in an on-line word-monitoring task. Words were presented in lists and in sentences readily comprehended by younger children. The sentences were unaltered, tow-pass filtered, and time- compressed. Both groups had shorter mean response times (MRTs), but lower accuracy, for words in sentences than words in lists. The LI children had significantly longer MRTs under sentence conditions and lower accuracy overall than the LN children. Filtering had an adverse effect upon accuracy and MRT for both subject groups. Time compression did not, suggesting that the reduction in high-frequency information and the rate of presentation exert different effects. Subject differences in attention, as well as in linguistic competence and motor control, may have influenced word-monitoring performance.

Download Full-text

Adapting materialized views after redefinitions: techniques and a performance study

Information Systems ◽

10.1016/s0306-4379(01)00024-2 ◽

2001 ◽

Vol 26 (5) ◽

pp. 323-362 ◽

Cited By ~ 24

Author(s):

Ashish Gupta ◽

Inderpal S. Mumick ◽

Jun Rao ◽

Kenneth A. Ross

Keyword(s):

Materialized Views ◽

Performance Study ◽

A Performance

Download Full-text

Extraction-Transformation-Loading Processes

Encyclopedia of Database Technologies and Applications ◽

10.4018/978-1-59140-560-3.ch041 ◽

2005 ◽

pp. 240-245 ◽

Cited By ~ 1

Author(s):

Alkis Simitsis ◽

Panos Vassiliadis ◽

Timos Sellis

Keyword(s):

Lower Layer ◽

Knowledge Worker ◽

Database Systems ◽

Historical Record ◽

Data Sources ◽

Aggregated Data ◽

Executive Manager ◽

On Line ◽

Multidimensional Data Structures ◽

Relational Database Systems

A data warehouse (DW) is a collection of technologies aimed at enabling the knowledge worker (executive, manager, analyst, etc.) to make better and faster decisions. The architecture of a DW exhibits various layers of data in which data from one layer are derived from data of the lower layer (see Figure 1). The operational databases, also called data sources, form the starting layer. They may consist of structured data stored in open database and legacy systems, or even in files. The central layer of the architecture is the global DW. The global DW keeps a historical record of data that result from the transformation, integration, and aggregation of detailed data found in the data sources. An auxiliary area of volatile data, data staging area (DSA) is employed for the purpose of data transformation, reconciliation, and cleaning. The next layer of data involves client warehouses, which contain highly aggregated data, directly derived from the global warehouse. There are various kinds of local warehouses, such as data mart or on-line analytical processing (OLAP) databases, which may use relational database systems or specific multidimensional data structures. The whole environment is described in terms of its components, metadata, and processes in a central metadata repository, located at the DW site.

Download Full-text

Efficient Computation of Data Cubes and Aggregate Views

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch080 ◽

2011 ◽

pp. 421-426

Author(s):

Leonardo Tininini

Keyword(s):

Response Times ◽

Materialized Views ◽

Data Warehouses ◽

Efficient Computation ◽

Huge Amount ◽

Data Cubes ◽

Multidimensional Database ◽

Efficient Calculation ◽

Fundamental Requirement

This paper reviews the main techniques for the efficient calculation of aggregate multidimensional views and data cubes, possibly using specifically designed indexing structures. The efficient evaluation of aggregate multidimensional queries is obviously one of the most important aspects in data warehouses (OLAP systems). In particular, a fundamental requirement of such systems is the ability to perform multidimensional analyses in online response times. As multidimensional queries usually involve a huge amount of data to be aggregated, the only way to achieve this is by pre-computing some queries, storing the answers permanently in the database and reusing these almost exclusively when evaluating queries in the multidimensional database. These pre-computed queries are commonly referred to as materialized views and carry several related issues, particularly how to efficiently compute them (the focus of this paper), but also which views to materialize and how to maintain them.

Download Full-text

Language context effects on interlingual homograph recognition: evidence from event-related potentials and response times in semantic priming

Bilingualism Language and Cognition ◽

10.1017/s1366728901000256 ◽

2001 ◽

Vol 4 (2) ◽

pp. 155-168 ◽

Cited By ~ 49

Author(s):

Ellen R. A. de Bruijn ◽

Ton Dijkstra ◽

Dorothee J. Chwilla ◽

Herbert J. Schriefers

Keyword(s):

Semantic Priming ◽

Context Effects ◽

Response Times ◽

Event Related Potentials ◽

Semantic Integration ◽

Decision Task ◽

Priming Effects ◽

Bottom Up ◽

Related Potentials ◽

On Line

Dutch–English bilinguals performed a generalized lexical decision task on triplets of items, responding with “yes” if all three items were correct Dutch and/or English words, and with “no” if one or more of the items was not a word in either language. Sometimes the second item in a triplet was an interlingual homograph whose English meaning was semantically related to the third item of the triplet (e.g., HOUSE – ANGEL – HEAVEN, where ANGEL means “sting” in Dutch). In such cases, the first item was either an exclusively English (HOUSE) or an exclusively Dutch (ZAAK) word. Semantic priming effects were found in on-line response times. Event-related potentials that were recorded simultaneously showed N400 priming effects thought to reflect semantic integration processes. The response time and N400 priming effects were not affected by the language of the first item in the triplets, providing evidence in support of a strong bottom-up role with respect to bilingual word recognition. The results are interpreted in terms of the Bilingual Interactive Activation model, a language nonselective access model assuming bottom-up priority.

Download Full-text