Index and Materialized View Selection in Data Warehouses

Author(s):  
Kamel Aouiche ◽  
Jérôme Darmont

Database management systems (DBMSs) require an administrator whose principal tasks are data management, at both the logical and physical levels, and performance optimization. With the widespread development of databases and data warehouses, minimizing the administration function is crucial. This function includes the selection of suitable physical structures to improve system performance. View materialization and indexing are presumably among the most effective optimization techniques adopted in relational implementations of data warehouses. Materialized views are physical structures that improve data access time by precomputing intermediary results. End-user queries can therefore be processed efficiently from the data stored in views, without accessing the original data. Indexes are also physical structures; they allow direct data access, avoid sequential scans, and thereby reduce query response time. Nevertheless, these solutions require additional storage space and entail maintenance overhead. The issue is then to select an appropriate configuration of materialized views and indexes that minimizes both query response time and maintenance cost given limited storage space. This problem is NP-hard (Gupta & Mumick, 2005).
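Because the selection problem is NP-hard, practical approaches usually rely on heuristics. The following is a minimal sketch of one common heuristic, a greedy ranking by net benefit per unit of storage; it is not the method of the cited authors. The `Candidate` fields, the candidate names, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate index or materialized view (all figures are hypothetical)."""
    name: str
    size: float          # storage space the structure would occupy
    benefit: float       # estimated reduction in workload response time
    maintenance: float   # estimated cost of keeping the structure up to date


def greedy_select(candidates, budget):
    """Pick candidates by net benefit per unit of space while they fit the budget."""
    # Keep only structures whose benefit outweighs their maintenance cost.
    useful = [c for c in candidates if c.benefit > c.maintenance and c.size > 0]
    # Rank by net benefit density, highest first.
    useful.sort(key=lambda c: (c.benefit - c.maintenance) / c.size, reverse=True)

    chosen, used = [], 0.0
    for c in useful:
        if used + c.size <= budget:
            chosen.append(c)
            used += c.size
    return chosen


candidates = [
    Candidate("mv_sales_by_month", size=40, benefit=120, maintenance=20),
    Candidate("idx_customer_id", size=10, benefit=35, maintenance=5),
    Candidate("mv_sales_by_store", size=60, benefit=90, maintenance=45),
    Candidate("idx_rarely_used", size=15, benefit=4, maintenance=6),
]
selected = greedy_select(candidates, budget=60)
print([c.name for c in selected])
```

A greedy pass like this is fast but offers no optimality guarantee in general. It also treats each candidate's benefit as independent, which ignores the interactions between views and indexes that make the real problem hard.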

Author(s):  
Hadj Mahboubi ◽  
Jérôme Darmont

XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native-XML database management systems (DBMSs) currently offer limited performance, and ways to optimize them need to be investigated. In this chapter, the authors present two such techniques. First, they propose an XML join index that is specifically adapted to the multidimensional architecture of XML warehouses. It eliminates join operations while preserving the information contained in the original warehouse. Second, the authors present a strategy for selecting XML materialized views by clustering the query workload. To validate these proposals, the authors measure the response time of a set of decision-support XQueries over an XML data warehouse, with and without their optimization techniques. The authors’ experimental results demonstrate the efficiency of these techniques, even when queries are complex and data are voluminous.


2014 ◽  
Vol 10 (4) ◽  
pp. 1-25 ◽  
Author(s):  
Romain Perriot ◽  
Jérémy Pfeifer ◽  
Laurent d'Orazio ◽  
Bruno Bachelet ◽  
Sandro Bimonte ◽  
...  

Data warehouse performance is usually achieved through physical data structures such as indexes or materialized views. In this context, cost models can help select a relevant set of such performance optimization structures. Nevertheless, selection becomes more complex in the cloud. The criterion to optimize is indeed at least two-dimensional, with monetary cost balancing overall query response time. This paper introduces new cost models that fit into the pay-as-you-go paradigm of cloud computing. Based on these cost models, an optimization problem is defined to discover, among candidate views, those to be materialized to minimize both the overall cost of using and maintaining the database in a public cloud and the total response time of a given query workload. The paper shows experimentally that maintaining materialized views is always advantageous, both in terms of performance and cost.


2010 ◽  
Vol 29-32 ◽  
pp. 1133-1138 ◽  
Author(s):  
Li Juan Zhou ◽  
Hai Jun Geng ◽  
Ming Sheng Xu

A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. Materialized view selection is one of the crucial decisions in designing a data warehouse for optimal efficiency. The goal is to select an appropriate set of views that minimizes the sum of the query response time and the cost of maintaining the selected views, given a limited amount of resources, e.g., materialization time or storage space. In this article, we present an improved PGA algorithm to solve the view selection problem; the experiments show that our proposed algorithm is superior.


Author(s):  
Yizhang Yang ◽  
Taehee Jeong ◽  
Hendrik F. Hamann ◽  
Jimmy Zhu ◽  
Mehdi Asheghi

Phase-change technology has been widely used in rewritable disks for optical recording applications. Recently, it has also received attention as a candidate for future high storage density non-volatile random access memory, due to its much longer cycle life (∼10^13) and fast data access time (∼100 ns) compared with the existing Flash memory technology. In this paper, we present thermal conductivity data and models for phase-change GeSbTe material that would be helpful in performance optimization and improvement in the reliability (i.e., enhancement of data rate, cyclability, control of mark-edge jitter) of phase-change-based data storage devices and systems. We perform the thermal characterization of Ge4Sb1Te5 and Ge2Sb2Te5 phase-change materials for the application of optical recording and phase-change memory cells using the techniques of thermoreflectance and electrical resistance thermometry. The limits of lattice and electronic thermal conductivities are investigated to determine their relative contributions as a function of tellurium concentration at different crystalline structures.


2020 ◽  
Vol 3 (1) ◽  
pp. 26-39
Author(s):  
Refed Adnan ◽  
Talib M. J. Abbas

Specific, timely, and unified information, together with quick and effective query response times, is a fundamental requirement for the success of any collection of independent data marts (data warehouse) that forms a fact constellation schema, or galaxy schema. Because the storage area for materialized views is limited, materializing all views is practically impossible, so picking suitable materialized views (MVs) is one of the key decisions in designing a fact constellation schema for optimal efficiency. This study presents a framework for picking the best materialized views using the Quantum Particle Swarm Optimization (QPSO) algorithm, a stochastic algorithm, in order to achieve an effective combination of good query response time, low query handling cost, and low view maintenance cost. The results reveal that the proposed QPSO-based method for picking materialized views is better than other techniques, as measured by comparing the response time of queries on the base tables with the response time of the same queries on the materialized views. Running a query on the base tables takes about five times longer than running it on the materialized views: the response time of queries through MV access was 0.084 seconds, whereas through direct access it was 0.422 seconds. Query performance through materialized views is thus 402.38% better than through direct access to the logical data warehouse.
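The two response times quoted above are enough to check the reported ratio and percentage. The short calculation below is only an arithmetic check on those published figures; the variable names are ours.

```python
# Response times reported in the abstract, in seconds.
direct_access = 0.422   # queries run directly against the data warehouse
via_views = 0.084       # the same queries run against the materialized views

# How many times longer direct access takes.
speedup = direct_access / via_views

# Relative improvement, expressed as a percentage of the materialized-view time.
improvement_pct = (direct_access - via_views) / via_views * 100

print(f"speedup: {speedup:.2f}x")
print(f"improvement: {improvement_pct:.2f}%")
```

The percentage is therefore taken relative to the materialized-view time, which is why a roughly fivefold ratio corresponds to an improvement of about 402%.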


Author(s):  
Maria-Esther Vidal ◽  
Amadís Martínez ◽  
Edna Ruckhaus ◽  
Tomas Lampo ◽  
Javier Sierra

In the context of the Semantic Web, different approaches have been defined to represent RDF documents, and the selected representation affects the storage and time complexity of RDF data recovery and query processing tasks. This chapter addresses the problem of efficiently querying and storing RDF documents, and presents an alternative representation of RDF data, Bhyper, which is based on hypergraphs. Additionally, access and optimization techniques for executing queries efficiently and at low cost are defined on top of this hypergraph-based representation. The chapter’s authors have empirically studied the performance of the Bhyper-based techniques, and their experimental results show that the proposed hypergraph-based formalization reduces the RDF data access time as well as the space needed to store the Bhyper structures, while the query execution time of state-of-the-art RDF engines can be sped up by up to two orders of magnitude.


2021 ◽  
Vol 34 (2) ◽  
pp. 1-28
Author(s):  
Akshay Kumar ◽  
T. V. Vijay Kumar

Big data views, in the context of a distributed file system (DFS), are defined over voluminous structured, semi-structured, and unstructured data with the purpose of reducing the response time of queries over Big data. As the size of semi-structured and unstructured data in Big data is very large compared to structured data, a framework based on query attributes on Big data can be used to identify Big data views. Materializing Big data views can improve query response time and facilitate efficient distribution of data over DFS-based applications. Since not all Big data views can be materialized, a subset of them should be selected for materialization. The purpose of view selection for materialization is to improve query response time subject to resource constraints. The Big data view materialization problem was defined as a bi-objective problem with two objectives, minimization of the query evaluation cost and minimization of the update processing cost, with a constraint on the total size of the materialized views. This paper addresses the problem using the multi-objective genetic algorithm NSGA-II. The experimental results show that the proposed NSGA-II-based Big data view selection algorithm is able to select views of reasonably good quality for materialization.
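The bi-objective formulation can be illustrated with a small sketch of Pareto dominance under a size constraint. This is not NSGA-II itself: it enumerates every subset of a toy candidate set and keeps the non-dominated ones, whereas NSGA-II searches larger spaces with an evolutionary procedure built on the same dominance relation. The view names, sizes, and costs below are hypothetical.

```python
from itertools import combinations

# Hypothetical candidate views: name -> (size, query cost saved, update cost added)
VIEWS = {
    "v1": (30, 50, 10),
    "v2": (20, 30, 4),
    "v3": (50, 70, 25),
}
BASE_QUERY_COST = 200   # hypothetical workload cost with no view materialized
SIZE_LIMIT = 60         # hypothetical bound on total materialized size


def evaluate(subset):
    """Return (query evaluation cost, update processing cost) for a subset of views."""
    query_cost = BASE_QUERY_COST - sum(VIEWS[v][1] for v in subset)
    update_cost = sum(VIEWS[v][2] for v in subset)
    return query_cost, update_cost


def dominates(a, b):
    """True if objective vector a is no worse than b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(size_limit):
    """Enumerate feasible subsets and keep those no other feasible subset dominates."""
    feasible = []
    for r in range(len(VIEWS) + 1):
        for subset in combinations(sorted(VIEWS), r):
            if sum(VIEWS[v][0] for v in subset) <= size_limit:
                feasible.append((subset, evaluate(subset)))
    return [
        (s, obj) for s, obj in feasible
        if not any(dominates(other, obj) for _, other in feasible)
    ]


front = pareto_front(SIZE_LIMIT)
for subset, (query_cost, update_cost) in front:
    print(subset, query_cost, update_cost)
```

With these toy numbers, materializing `v3` alone is dominated by materializing `v1` and `v2` together, which gives both a lower query cost and a lower update cost within the same size limit. The remaining subsets are trade-offs between the two objectives, and a decision maker would pick one of them.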


Respati ◽  
2018 ◽  
Vol 13 (1) ◽  
Author(s):  
Widhiarta Widhiarta ◽  
Arief Setyanto ◽  
Ferry Wahyu Wibowo

This research aims to optimize web performance using an application-level cache on the server side and the browser side. The test environment consisted of two VPSs, one for the web server and one for the database, each with a 1-core processor, 512 MB of RAM, and 40 GB of storage, running an Apache 2.4 web server with PHP 7.1 and a MariaDB v.10 database populated with 20 tables and 10 million tuples. Samples were taken using 5 repetitions with various combinations of query level and concurrency level. Data were collected using the Apica Zebra Tester application. The data analysis shows that the different cache configurations have different effects on web performance. Without a cache, web access time slowed dramatically, to 27,078.91 milliseconds, at 50 concurrent accesses and 100 repeated queries returning 100,000 records per query, with a delay of 5 seconds per concurrency. The results show that the browser-side cache configuration improves access time by 79.61% on average and decreases CPU load by 80.83%, but is unstable when concurrent accesses are made with different browser profiles. The server-side cache configuration improves access time by 79.83% on average and decreases CPU load by 79.88%, and is stable when concurrent accesses are made with different browser profiles. The combined server-side and browser-side cache configuration gives the highest average improvement in access time, 80.07%, and the highest decrease in CPU load, 82.64%, and is very stable when concurrent accesses are made with different browser profiles. The test results show that the optimal application-level cache configuration combines server-side and browser-side caching. Keywords: web performance optimization, application-level cache, web cache, browser-side cache, server-side cache


2021 ◽  
Vol 18 (3) ◽  
pp. 1-22
Author(s):  
Michael Stokes ◽  
David Whalley ◽  
Soner Onder

While data filter caches (DFCs) have been shown to be effective at reducing data access energy, they have not been adopted in processors due to the associated performance penalty caused by high DFC miss rates. In this article, we present a design that both decreases the DFC miss rate and completely eliminates the DFC performance penalty even for a level-one data cache (L1 DC) with a single cycle access time. First, we show that a DFC that lazily fills each word in a DFC line from an L1 DC only when the word is referenced is more energy-efficient than eagerly filling the entire DFC line. For a 512B DFC, we are able to eliminate loads of words into the DFC that are never referenced before being evicted, which occurred for about 75% of the words in 32B lines. Second, we demonstrate that a lazily word filled DFC line can effectively share and pack data words from multiple L1 DC lines to lower the DFC miss rate. For a 512B DFC, we completely avoid accessing the L1 DC for loads about 23% of the time and avoid a fully associative L1 DC access for loads 50% of the time, where the DFC only requires about 2.5% of the size of the L1 DC. Finally, we present a method that completely eliminates the DFC performance penalty by speculatively performing DFC tag checks early and only accessing DFC data when a hit is guaranteed. For a 512B DFC, we improve data access energy usage for the DTLB and L1 DC by 33% with no performance degradation.
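The difference between eager line fills and lazy word fills can be sketched with a toy simulation. This is a simplified model, not the design from the article: it assumes a direct-mapped DFC and 4-byte words, and it omits the sharing and packing of words from multiple L1 DC lines as well as the speculative tag check. The load trace is hypothetical.

```python
WORD_BYTES = 4          # assumed word size
LINE_BYTES = 32         # line size mentioned in the abstract
DFC_BYTES = 512         # DFC capacity mentioned in the abstract
WORDS_PER_LINE = LINE_BYTES // WORD_BYTES
NUM_LINES = DFC_BYTES // LINE_BYTES


def count_word_fills(addresses, lazy):
    """Count words copied from the L1 DC into a direct-mapped DFC for a load trace."""
    tags = [None] * NUM_LINES
    valid = [[False] * WORDS_PER_LINE for _ in range(NUM_LINES)]
    fills = 0
    for addr in addresses:
        line_addr = addr // LINE_BYTES
        index = line_addr % NUM_LINES
        word = (addr % LINE_BYTES) // WORD_BYTES
        if tags[index] != line_addr:
            # Line miss: claim the line and invalidate its words.
            tags[index] = line_addr
            valid[index] = [False] * WORDS_PER_LINE
            if not lazy:
                # Eager policy: bring in every word of the line at once.
                valid[index] = [True] * WORDS_PER_LINE
                fills += WORDS_PER_LINE
        if not valid[index][word]:
            # Lazy policy: fill only the word that is actually referenced.
            valid[index][word] = True
            fills += 1
    return fills


# Hypothetical trace: two words touched in each of four different lines.
trace = [0, 4, 32, 36, 64, 68, 96, 100, 0, 4]
print("eager fills:", count_word_fills(trace, lazy=False))
print("lazy fills:", count_word_fills(trace, lazy=True))
```

On this trace the eager policy copies all eight words of each of the four lines, while the lazy policy copies only the two words per line that are referenced. The words copied but never referenced are the wasted fills the abstract describes.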


1998 ◽  
Vol 27 (1) ◽  
pp. 21-26 ◽  
Author(s):  
Nick Roussopoulos
