Index and Materialized View Selection in Data Warehouses

Database management systems (DBMSs) require an administrator whose principal tasks are data management, at both the logical and physical levels, and performance optimization. With the widespread deployment of databases and data warehouses, minimizing the administration function is crucial. This function includes selecting suitable physical structures to improve system performance. View materialization and indexing are arguably among the most effective optimization techniques adopted in relational implementations of data warehouses. Materialized views are physical structures that improve data access time by precomputing intermediate results. End-user queries can therefore be processed efficiently from the data stored in views, without accessing the original data. Indexes are also physical structures; they allow direct data access, avoid sequential scans, and thereby reduce query response time. Nevertheless, both solutions require additional storage space and entail maintenance overhead. The issue is then to select a configuration of materialized views and indexes that minimizes both query response time and maintenance cost within a limited storage space. This problem is NP-hard (Gupta & Mumick, 2005).
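
The selection problem described above can be illustrated with a minimal greedy sketch. It is not the method of the cited work: the `Candidate` fields, the `greedy_select` function, and the assumption that each structure's benefit is independent of the others are all hypothetical simplifications (in practice the benefit of one view or index depends on which others are selected).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate view or index with estimated figures."""
    name: str
    size: float         # storage space required (must be > 0)
    benefit: float      # estimated reduction in workload query cost
    maintenance: float  # estimated maintenance cost


def greedy_select(candidates, budget):
    """Pick candidates by net benefit per unit of storage until the
    storage budget is exhausted. A heuristic, not an optimal solver."""
    ranked = sorted(
        candidates,
        key=lambda c: (c.benefit - c.maintenance) / c.size,
        reverse=True,
    )
    chosen, remaining = [], budget
    for c in ranked:
        net = c.benefit - c.maintenance
        # Skip structures that cost more to maintain than they save,
        # and structures that no longer fit in the remaining space.
        if net > 0 and c.size <= remaining:
            chosen.append(c)
            remaining -= c.size
    return chosen


if __name__ == "__main__":
    pool = [
        Candidate("mv_sales_by_month", size=4, benefit=11, maintenance=2),
        Candidate("idx_customer_id", size=2, benefit=5, maintenance=1),
        Candidate("mv_rarely_used", size=5, benefit=3, maintenance=4),
        Candidate("idx_date_key", size=3, benefit=9, maintenance=0),
    ]
    print([c.name for c in greedy_select(pool, budget=6)])
```

Because the problem is NP-hard, a greedy ranking like this gives no optimality guarantee; it only shows how storage, benefit, and maintenance cost enter the decision.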

Query Performance Optimization in XML Data Warehouses

E-Strategies for Resource Management Systems ◽

10.4018/978-1-61692-016-6.ch014 ◽

2010 ◽

pp. 232-253

Author(s):

Hadj Mahboubi ◽

Jérôme Darmont

Keyword(s):

Decision Support ◽

Data Warehouse ◽

Performance Optimization ◽

Database Management ◽

Optimization Techniques ◽

Materialized Views ◽

Complex Data ◽

Data Warehouses ◽

Xml Data ◽

Xml Database

XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native-XML database management systems (DBMSs) currently offer limited performance, and ways to optimize them need to be investigated. In this chapter, the authors present two such techniques. First, they propose an XML join index that is specifically adapted to the multidimensional architecture of XML warehouses. It eliminates join operations while preserving the information contained in the original warehouse. Second, the authors present a strategy for selecting XML materialized views by clustering the query workload. To validate these proposals, the authors measure the response time of a set of decision-support XQueries over an XML data warehouse, with and without their optimization techniques. The authors' experimental results demonstrate the efficiency of these techniques, even when queries are complex and data are voluminous.

Cost Models for Selecting Materialized Views in Public Clouds

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2014100101 ◽

2014 ◽

Vol 10 (4) ◽

pp. 1-25 ◽

Cited By ~ 5

Author(s):

Romain Perriot ◽

Jérémy Pfeifer ◽

Laurent d'Orazio ◽

Bruno Bachelet ◽

Sandro Bimonte ◽

...

Keyword(s):

Response Time ◽

Data Structures ◽

Performance Optimization ◽

Optimization Problem ◽

Materialized Views ◽

Cost Models ◽

Public Cloud ◽

Total Response ◽

Query Response Time ◽

Total Response Time

Data warehouse performance is usually achieved through physical data structures such as indexes or materialized views. In this context, cost models can help select a relevant set of such performance optimization structures. Nevertheless, selection becomes more complex in the cloud. The criterion to optimize is indeed at least two-dimensional, with monetary cost balancing overall query response time. This paper introduces new cost models that fit into the pay-as-you-go paradigm of cloud computing. Based on these cost models, an optimization problem is defined to discover, among candidate views, those to be materialized to minimize both the overall cost of using and maintaining the database in a public cloud and the total response time of a given query workload. It experimentally shows that maintaining materialized views is always advantageous, both in terms of performance and cost.

Materialized View Selection in the Data Warehouse

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.29-32.1133 ◽

2010 ◽

Vol 29-32 ◽

pp. 1133-1138 ◽

Cited By ~ 1

Author(s):

Li Juan Zhou ◽

Hai Jun Geng ◽

Ming Sheng Xu

Keyword(s):

Decision Support ◽

Data Warehouse ◽

Materialized Views ◽

Storage Space ◽

View Selection ◽

Materialized View ◽

Query Response Time ◽

Materialized View Selection ◽

Optimal Efficiency ◽

The Cost

A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. Materialized view selection is one of the crucial decisions in designing a data warehouse for optimal efficiency. The goal is to select an appropriate set of views that minimizes the sum of the query response time and the cost of maintaining the selected views, given a limited amount of resources, e.g., materialization time, storage space, etc. In this article, we present an improved PGA algorithm for the view selection problem; the experiments show that the proposed algorithm is superior.

Thermal Conductivity Measurements and Modeling of Phase-Change GST Materials

ASME/JSME 2007 Thermal Engineering Heat Transfer Summer Conference, Volume 1 ◽

10.1115/ht2007-32830 ◽

2007 ◽

Author(s):

Yizhang Yang ◽

Taehee Jeong ◽

Hendrik F. Hamann ◽

Jimmy Zhu ◽

Mehdi Asheghi

Keyword(s):

Thermal Conductivity ◽

Phase Change ◽

Data Storage ◽

Flash Memory ◽

Performance Optimization ◽

Phase Change Materials ◽

Memory Cell ◽

Data Access ◽

Optical Recording ◽

Access Time

Phase-change technology has been widely used in rewritable disks for optical recording applications. Recently, it has also received attention as a candidate for future high storage density non-volatile random access memory, due to its much longer cycle life (∼10^13) and fast data access time (∼100 ns) compared with the existing Flash memory technology. In this paper, we present thermal conductivity data and models for phase-change GeSbTe material that would be helpful in performance optimization and improvement in the reliability (i.e., enhancement of data rate, cyclability, control of mark-edge jitter) of phase-change-based data storage devices and systems. We perform the thermal characterization of Ge4Sb1Te5 and Ge2Sb2Te5 phase-change materials for the application of optical recording and phase-change memory cell using the techniques of thermoreflectance and electrical resistance thermometry. The limits of lattice and electronic thermal conductivities are investigated to determine their relative contributions as a function of tellurium concentration at different crystalline structures.

MATERIALIZED VIEWS QUANTUM OPTIMIZED PICKING for INDEPENDENT DATA MARTS QUALITY

Iraqi Journal of Information & Communications Technology ◽

10.31987/ijict.3.1.88 ◽

2020 ◽

Vol 3 (1) ◽

pp. 26-39

Author(s):

Refed Adnan ◽

Talib M. J. Abbas

Keyword(s):

Response Time ◽

Data Warehouse ◽

Maintenance Cost ◽

Materialized Views ◽

Stochastic Algorithm ◽

Independent Data ◽

Materialized View ◽

Query Response Time ◽

Data Marts ◽

Better Than

Precise, timely, and unified information, together with fast and effective query response times, is a fundamental requirement for the success of any collection of independent data marts (data warehouse) that forms a Fact Constellation Schema or Galaxy Schema. Because the storage area for materialized views is limited, materializing all views is practically impossible; selecting suitable materialized views (MVs) is therefore one of the key decisions in designing a Fact Constellation Schema for optimal efficiency. This study presents a framework for selecting the best materialized views using the Quantum Particle Swarm Optimization (QPSO) algorithm, a stochastic algorithm, in order to achieve an effective combination of good query response time, low query handling cost, and low view maintenance cost. The results indicate that the proposed QPSO-based method is better than other techniques, as measured by comparing the response time of queries on the base tables with the response time of the same queries on the materialized views. Executing a query on the base table takes about five times longer than executing it on the materialized views: the response time of queries through MV access was 0.084 seconds, while that of direct-access queries was 0.422 seconds. Query performance through materialized view access is thus 402.38% better than through direct access to the logical data warehouse.
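
The figures reported in this abstract can be checked with a short calculation. The sketch below is only an arithmetic illustration; the function names are hypothetical, and it assumes the 402.38% figure is the time saved relative to the materialized-view response time.

```python
def speedup_ratio(direct_seconds, mv_seconds):
    """How many times longer the direct query takes than the MV query."""
    return direct_seconds / mv_seconds


def improvement_percent(direct_seconds, mv_seconds):
    """Time saved, expressed as a percentage of the MV response time."""
    return (direct_seconds - mv_seconds) / mv_seconds * 100


if __name__ == "__main__":
    direct, mv = 0.422, 0.084  # seconds, as reported in the abstract
    print(f"ratio: {speedup_ratio(direct, mv):.2f}x")
    print(f"improvement: {improvement_percent(direct, mv):.2f}%")
```

Under that assumption the two statements agree: a ratio of roughly 5.02 corresponds to an improvement of roughly 402.38%.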

On the Efficiency of Querying and Storing RDF Documents

Advances in Data Mining and Database Management - Graph Data Management ◽

10.4018/978-1-61350-053-8.ch016 ◽

2011 ◽

pp. 354-385

Author(s):

Maria-Esther Vidal ◽

Amadís Martínez ◽

Edna Ruckhaus ◽

Tomas Lampo ◽

Javier Sierra

Keyword(s):

Query Processing ◽

Execution Time ◽

Time Complexity ◽

Low Cost ◽

Data Access ◽

Optimization Techniques ◽

Experimental Results ◽

Access Time ◽

Alternative Representation ◽

Rdf Data

In the context of the Semantic Web, different approaches have been defined to represent RDF documents, and the selected representation affects the storage and time complexity of RDF data recovery and query processing tasks. This chapter addresses the problem of efficiently querying and storing RDF documents, and presents an alternative representation of RDF data, Bhyper, which is based on hypergraphs. Additionally, access and optimization techniques to efficiently execute queries at low cost are defined on top of this hypergraph-based representation. The chapter's authors have empirically studied the performance of the Bhyper-based techniques, and their experimental results show that the proposed hypergraph-based formalization reduces the RDF data access time as well as the space needed to store the Bhyper structures, while the query execution time of state-of-the-art RDF engines can be sped up by up to two orders of magnitude.

Multi-Objective Big Data View Materialization Using NSGA-II

Information Resources Management Journal ◽

10.4018/irmj.2021040101 ◽

2021 ◽

Vol 34 (2) ◽

pp. 1-28

Author(s):

Akshay Kumar ◽

T. V. Vijay Kumar

Keyword(s):

Big Data ◽

Response Time ◽

Unstructured Data ◽

Materialized Views ◽

Nsga Ii ◽

View Selection ◽

Multi Objective ◽

Query Response Time ◽

View Materialization ◽

Data View

Big data views, in the context of a distributed file system (DFS), are defined over voluminous structured, semi-structured, and unstructured data with the purpose of reducing the response time of queries over Big data. As the size of semi-structured and unstructured data in Big data is very large compared to structured data, a framework based on query attributes on Big data can be used to identify Big data views. Materializing Big data views can improve query response time and facilitate efficient distribution of data over a DFS-based application. Since not all Big data views can be materialized, a subset of them should be selected for materialization. The purpose of view selection for materialization is to improve query response time subject to resource constraints. The Big data view materialization problem is defined as a bi-objective problem with two objectives, minimization of the query evaluation cost and minimization of the update processing cost, with a constraint on the total size of the materialized views. This paper addresses the problem using the multi-objective genetic algorithm NSGA-II. The experimental results show that the proposed NSGA-II-based Big data view selection algorithm is able to select views of reasonably good quality for materialization.
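
The bi-objective formulation above can be illustrated with a small sketch of Pareto dominance, the comparison on which NSGA-II ranks candidate solutions. This is not NSGA-II itself (there is no population, crossover, mutation, or crowding distance) and not the authors' implementation; the dictionary layout and the names `dominates` and `pareto_front` are assumptions made for illustration.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b in every objective and
    strictly better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(
        x < y for x, y in zip(a, b)
    )


def pareto_front(solutions, max_size):
    """Keep feasible view sets that no other feasible set dominates.

    Each solution is a dict with hypothetical keys:
      "costs": (query_evaluation_cost, update_processing_cost)
      "size":  total size of the materialized views in the set
    """
    feasible = [s for s in solutions if s["size"] <= max_size]
    return [
        s
        for s in feasible
        if not any(dominates(o["costs"], s["costs"]) for o in feasible)
    ]


if __name__ == "__main__":
    candidates = [
        {"name": "A", "costs": (10, 5), "size": 3},
        {"name": "B", "costs": (8, 7), "size": 4},
        {"name": "C", "costs": (12, 6), "size": 2},   # dominated by A
        {"name": "D", "costs": (5, 5), "size": 10},   # exceeds the size limit
    ]
    print([s["name"] for s in pareto_front(candidates, max_size=5)])
```

A and B survive because neither is better than the other in both objectives at once; choosing between them is the trade-off a multi-objective method leaves to the decision maker.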

Optimasi Kinerja Web Menggunakan Application-Level Cache di Sisi Server dan Browser

Respati ◽

10.35842/jtir.v13i1.217 ◽

2018 ◽

Vol 13 (1) ◽

Author(s):

Widhiarta Widhiarta ◽

Arief Setyanto ◽

Ferry Wahyu Wibowo

Keyword(s):

Response Time ◽

Performance Optimization ◽

Web Application ◽

Web Server ◽

Access Time ◽

Web Performance ◽

Server Side ◽

Web Cache ◽

Cache Configuration ◽

Load Average

This research aims to optimize web performance using an application-level cache on the server side and the browser side. The experiment used two VPS instances, one for the web server and one for the database, each with a 1-core processor, 512MB of RAM, and 40GB of disk storage, running the Apache 2.4 web server with PHP 7.1 and a MariaDB v.10 database populated with 20 tables and 10 million tuples. Each measurement was repeated 5 times across different combinations of query level and concurrency level. Data were collected using the Apica Zebra Tester application. The analysis shows that different combinations of cache configurations affect web performance differently. Without a cache, web access time degraded sharply to 27,078.91 milliseconds at 50 concurrent accesses and 100 repeated queries returning 100,000 records per query, with a 5-second delay per concurrency. The browser-side cache configuration improved access time by 79.61% on average and reduced CPU load by 80.83%, but was unstable when concurrent accesses used different browser profiles. The server-side cache configuration improved access time by 79.83% on average and reduced CPU load by 79.88%, and was stable when concurrent accesses used different browser profiles. The combined server-side and browser-side cache configuration achieved the highest average improvement in access time, 80.07%, and the largest reduction in CPU load, 82.64%, and was very stable when concurrent accesses used different browser profiles. The tests indicate that the optimal application-level cache configuration combines server-side and browser-side caching. Keywords: web performance optimization, application-level cache, web cache, browser-side cache, server-side cache

Decreasing the Miss Rate and Eliminating the Performance Penalty of a Data Filter Cache

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3449043 ◽

2021 ◽

Vol 18 (3) ◽

pp. 1-22

Author(s):

Michael Stokes ◽

David Whalley ◽

Soner Onder

Keyword(s):

Energy Efficient ◽

Data Access ◽

Performance Degradation ◽

Access Time ◽

Data Cache ◽

Energy Usage ◽

Single Cycle ◽

Performance Penalty

While data filter caches (DFCs) have been shown to be effective at reducing data access energy, they have not been adopted in processors due to the associated performance penalty caused by high DFC miss rates. In this article, we present a design that both decreases the DFC miss rate and completely eliminates the DFC performance penalty even for a level-one data cache (L1 DC) with a single cycle access time. First, we show that a DFC that lazily fills each word in a DFC line from an L1 DC only when the word is referenced is more energy-efficient than eagerly filling the entire DFC line. For a 512B DFC, we are able to eliminate loads of words into the DFC that are never referenced before being evicted, which occurred for about 75% of the words in 32B lines. Second, we demonstrate that a lazily word filled DFC line can effectively share and pack data words from multiple L1 DC lines to lower the DFC miss rate. For a 512B DFC, we completely avoid accessing the L1 DC for loads about 23% of the time and avoid a fully associative L1 DC access for loads 50% of the time, where the DFC only requires about 2.5% of the size of the L1 DC. Finally, we present a method that completely eliminates the DFC performance penalty by speculatively performing DFC tag checks early and only accessing DFC data when a hit is guaranteed. For a 512B DFC, we improve data access energy usage for the DTLB and L1 DC by 33% with no performance degradation.

Materialized views and data warehouses

ACM SIGMOD Record ◽

10.1145/273244.273253 ◽

1998 ◽

Vol 27 (1) ◽

pp. 21-26 ◽

Cited By ~ 48

Author(s):

Nick Roussopoulos

Keyword(s):

Materialized Views ◽

Data Warehouses
