Preservation of Data Warehouses

Author(s):  
Carlos Aldeias ◽  
Gabriel David ◽  
Cristina Ribeiro

Data warehouses are used in many application domains, and there is no established method for their preservation. A data warehouse can be implemented in multidimensional structures or in relational databases that represent the dimensional model concepts in the relational model. The focus of this work is on describing the dimensional model of a data warehouse and migrating it to an XML model, in order to achieve a long-term preservation format. This chapter presents the definition of the XML structure that extends the SIARD format used for the description and archive of relational databases, enriching it with a layer of metadata for the data warehouse components. Data Warehouse Extensible Markup Language (DWXML) is the XML language proposed to describe the data warehouse. An application that combines the SIARD format and the DWXML metadata layer supports the XML language and helps to acquire the relevant metadata for the warehouse and to build the archival format.
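The chapter defines the full DWXML structure; this abstract does not reproduce it, so the following Python sketch only illustrates the general idea of a data-warehouse metadata layer expressed in XML alongside a SIARD-style archive. The element and attribute names (dataWarehouse, fact, measure, dimension, hierarchy, level) are assumptions for illustration, not the published DWXML vocabulary.

```python
# Illustrative sketch only: builds a small XML metadata layer in the spirit of
# DWXML, describing one fact table and its dimensions on top of a SIARD-style
# archive. Element and attribute names are assumptions, not the published schema.
import xml.etree.ElementTree as ET

warehouse = ET.Element("dataWarehouse", name="sales_dw")

fact = ET.SubElement(warehouse, "fact", table="fact_sales")
ET.SubElement(fact, "measure", column="amount", aggregation="SUM")
ET.SubElement(fact, "measure", column="quantity", aggregation="SUM")

for dim_name, table, level_cols in [
    ("time", "dim_time", ["day", "month", "year"]),
    ("product", "dim_product", ["product", "category"]),
]:
    dim = ET.SubElement(warehouse, "dimension", name=dim_name, table=table)
    hierarchy = ET.SubElement(dim, "hierarchy", name=f"{dim_name}_hierarchy")
    for col in level_cols:
        ET.SubElement(hierarchy, "level", column=col)

# Serialize the metadata layer so it can sit alongside the archived table data.
print(ET.tostring(warehouse, encoding="unicode"))
```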

2021 ◽  
Author(s):  
Naveen Kunnathuvalappil Hariharan

Financial data volumes are increasing, and this appears to be a long-term trend, implying that data management development will be crucial over the next few decades. Because financial data is often real-time data, it is constantly generated, resulting in a massive amount of financial data produced in a short period of time. The volume, diversity, and velocity of Big Financial Data highlight the significant limitations of traditional Data Warehouses (DWs). Their rigid relational model, high scalability costs, and sometimes inefficient performance pave the way for new methods and technologies. Most of the technologies now used for background processing and storage were, until recently, at an early research stage, with the Apache Foundation and Google behind the two most important initiatives. For dealing with large financial data, three techniques outperform relational databases and traditional ETL processing: NoSQL and NewSQL storage, and MapReduce processing.
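The abstract contrasts MapReduce processing with traditional relational ETL; as a minimal sketch of that processing style, the Python example below aggregates traded volume per symbol through explicit map, shuffle, and reduce phases. The record layout and field names are invented for illustration.

```python
# Minimal MapReduce-style sketch (assumed record format): aggregate traded
# volume per ticker symbol without a relational schema or an ETL step.
from collections import defaultdict
from functools import reduce

trades = [
    {"symbol": "AAPL", "volume": 100},
    {"symbol": "MSFT", "volume": 250},
    {"symbol": "AAPL", "volume": 300},
]

# Map phase: emit (key, value) pairs.
mapped = [(t["symbol"], t["volume"]) for t in trades]

# Shuffle phase: group values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: fold each group down to a single aggregate.
totals = {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}
print(totals)  # {'AAPL': 400, 'MSFT': 250}
```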


2008 ◽  
pp. 2364-2370
Author(s):  
Janet Delve

Data Warehousing is now a well-established part of the business and scientific worlds. However, up until recently, data warehouses were restricted to modeling essentially numerical data – examples being sales figures in the business arena (e.g. Wal-Mart’s data warehouse) and astronomical data (e.g. SKICAT) in scientific research, with textual data providing a descriptive rather than a central role. The inability of data warehouses to cope with mainly non-numeric data is particularly problematic for humanities research utilizing material such as memoirs and trade directories. Recent innovations have opened up possibilities for non-numeric data warehouses, making them widely accessible to humanities research for the first time. Due to its irregular and complex nature, humanities research data is often difficult to model, and manipulating time shifts in a relational database is problematic, as is fitting such data into a normalized data model. History and linguistics are exemplars of areas where relational databases are cumbersome and which would benefit from the greater freedom afforded by data warehouse dimensional modeling.


2009 ◽  
pp. 702-724
Author(s):  
Colleen Cunningham ◽  
Il-Yeol Song ◽  
Peter P. Chen

CRM is a strategy that integrates concepts of knowledge management, data mining, and data warehousing in order to support an organization’s decision-making process to retain long-term and profitable relationships with its customers. This research is part of a long-term study to systematically examine the CRM factors that affect design decisions for CRM data warehouses, in order to build a taxonomy of CRM analyses and to determine the impact of those analyses on CRM data warehousing design decisions. This article presents the design implications that CRM poses to data warehousing and then proposes a robust multidimensional starter model that supports CRM analyses. Additional research contributions include the introduction of two new measures, the percent success ratio and the CRM suitability ratio, by which CRM models can be evaluated, the identification and classification of CRM queries, and a preliminary heuristic for designing data warehouses to support CRM analyses.
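The measures themselves are defined in the full article; since their formulas are not given in this abstract, the sketch below only conveys the general idea of a percent-success-ratio-style score, i.e., the share of required CRM analyses a candidate model can answer. The query catalogue and the scoring rule are assumptions, not the authors' published definitions.

```python
# Hypothetical illustration of a "percent success ratio"-style score: the
# fraction of required CRM analyses a candidate data warehouse model supports.
# The query catalogue and scoring rule are assumptions, not the article's
# published definitions.
required_queries = {
    "customer_lifetime_value",
    "churn_by_segment",
    "campaign_response_rate",
    "profitability_by_channel",
}

supported_by_model = {
    "customer_lifetime_value",
    "churn_by_segment",
    "campaign_response_rate",
}

def percent_success_ratio(required: set, supported: set) -> float:
    """Share of required CRM queries answerable by the model, in percent."""
    return 100.0 * len(required & supported) / len(required)

print(f"{percent_success_ratio(required_queries, supported_by_model):.1f}%")  # 75.0%
```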


Author(s):  
Janelle Jenstad ◽  
Tracey El Hajj

In late 2018, the Internet Shakespeare Editions (ISE) software experienced catastrophic code failure. In this paper, we describe the boutique markup language used by the ISE (known as IML for ISE Markup Language), various fundamental differences between IML and TEI, and the challenging work of converting and remediating the ISEʼs IML-encoded files. Our central question is how to do this work in a principled, efficient, well documented, replicable, and transferable way. We conclude with recommendations for re-encoding legacy projects and stabilizing them for long-term preservation.
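The paper describes the conversion work rather than publishing its converter here, so the fragment below is only a toy Python sketch of tag-level remediation, mapping invented IML-style elements onto TEI equivalents. The tag names and the mapping table are assumptions, not the ISE's actual IML vocabulary or the project's tooling.

```python
# Toy sketch of markup conversion: rewrite invented IML-style tags into TEI
# equivalents. Tag names and mappings are assumptions for illustration only.
import re

IML_TO_TEI = {
    "S": "sp",        # speech
    "SP": "speaker",  # speaker label
    "L": "l",         # verse line
    "SD": "stage",    # stage direction
}

def iml_to_tei(text: str) -> str:
    """Replace <TAG ...> / </TAG> pairs according to the mapping above."""
    def rewrite(match):
        closing, tag, attrs = match.group(1), match.group(2), match.group(3)
        tei_tag = IML_TO_TEI.get(tag.upper(), tag.lower())
        return f"<{closing}{tei_tag}{attrs}>"
    return re.sub(r"<(/?)([A-Za-z]+)([^>]*)>", rewrite, text)

sample = "<S><SP>Hamlet</SP><L>To be, or not to be</L></S>"
print(iml_to_tei(sample))
# <sp><speaker>Hamlet</speaker><l>To be, or not to be</l></sp>
```

In practice such a conversion also has to handle attributes, overlapping hierarchies, and project-specific encoding decisions, which is why the paper stresses documentation and replicability rather than a one-off script.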


2011 ◽  
pp. 731-752
Author(s):  
Colleen Cunningham ◽  
Il-Yeol Song ◽  
Peter P. Chen

CRM is a strategy that integrates concepts of knowledge management, data mining, and data warehousing in order to support an organization’s decision-making process to retain long-term and profitable relationships with its customers. This research is part of a long-term study to systematically examine the CRM factors that affect design decisions for CRM data warehouses, in order to build a taxonomy of CRM analyses and to determine the impact of those analyses on CRM data warehousing design decisions. This article presents the design implications that CRM poses to data warehousing and then proposes a robust multidimensional starter model that supports CRM analyses. Additional research contributions include the introduction of two new measures, the percent success ratio and the CRM suitability ratio, by which CRM models can be evaluated, the identification and classification of CRM queries, and a preliminary heuristic for designing data warehouses to support CRM analyses.


Author(s):  
Elzbieta Malinowski ◽  
Esteban Zimányi

The advantages of using conceptual models for database design are well known. In particular, they facilitate the communication between users and designers since they do not require knowledge of specific features of the underlying implementation platform. Further, schemas developed using conceptual models can be mapped to different logical models, such as the relational, object-relational, or object-oriented models, thus simplifying technological changes. Finally, the logical model is translated into a physical one according to the underlying implementation platform. Nevertheless, the domain of conceptual modeling for data warehouse applications is still at a research stage. The current state of affairs is that logical models are used for designing data warehouses, i.e., star and snowflake schemas in the relational model. These schemas provide a multidimensional view of data where measures (e.g., quantity of products sold) are analyzed from different perspectives or dimensions (e.g., by product) and at different levels of detail with the help of hierarchies. On-line analytical processing (OLAP) systems allow users to perform automatic aggregations of measures while traversing hierarchies: the roll-up operation transforms detailed measures into aggregated values (e.g., daily into monthly sales), while the drill-down operation does the contrary. Star and snowflake schemas have several disadvantages, such as the inclusion of implementation details and their inadequacy for representing the different kinds of hierarchies found in real-world applications. In order to help users express their analysis needs, it is necessary to represent data requirements for data warehouses at the conceptual level. A conceptual multidimensional model should provide graphical support (Rizzi, 2007) and allow representing facts, measures, dimensions, and different kinds of hierarchies.
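As a concrete illustration of the roll-up operation described above, the following Python sketch aggregates a detailed measure (daily sales) into monthly totals along a date hierarchy; the data and level names are invented.

```python
# Roll-up sketch: aggregate a detailed measure (daily sales) to a coarser
# hierarchy level (month). Data and level names are invented for illustration.
from collections import defaultdict
from datetime import date

daily_sales = [
    (date(2024, 1, 3), 120.0),
    (date(2024, 1, 17), 80.0),
    (date(2024, 2, 5), 200.0),
]

def roll_up_to_month(rows):
    """Roll up (day, amount) facts to (year, month) totals."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)

print(roll_up_to_month(daily_sales))
# {(2024, 1): 200.0, (2024, 2): 200.0}
```

Drill-down is the inverse step: returning from the monthly totals to the underlying daily facts.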


2008 ◽  
pp. 408-428
Author(s):  
Manuel Serrano ◽  
Coral Calero ◽  
Mario Piattini

Data warehouses are large repositories that integrate data from several sources for analysis and decision support. Data warehouse quality is crucial, because a bad data warehouse design may lead to the rejection of the decision support system or may result in non-productive decisions. In recent years, we have been working on the definition and validation of software metrics in order to assure data warehouse quality. Some of the metrics are adapted directly from previous ones defined for relational databases, and others are specific to data warehouses. In this paper, we present part of the empirical work we have carried out to determine whether the proposed metrics can be used as indicators of data warehouse quality. We had previously developed an experiment and a first replication; here we present a second replication conducted with the purpose of assessing data warehouse maintainability. As a result of the whole empirical work, we have obtained a subset of the proposed metrics that seem to be good indicators of data warehouse quality.
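The validated metric suite is defined in the full paper; the sketch below merely illustrates the flavour of structural, schema-level metrics for a star schema, counting fact tables, dimension tables, foreign keys, and columns. The metric names, their definitions, and the schema description format are assumptions, not the authors' validated set.

```python
# Illustrative schema-level metrics for a star schema, in the spirit of
# structural data-warehouse metrics; the metric names, definitions, and the
# schema description format are assumptions, not the authors' validated suite.
schema = {
    "fact_sales": {"type": "fact", "columns": 8, "foreign_keys": 4},
    "dim_time": {"type": "dimension", "columns": 6, "foreign_keys": 0},
    "dim_product": {"type": "dimension", "columns": 9, "foreign_keys": 1},
    "dim_store": {"type": "dimension", "columns": 7, "foreign_keys": 0},
}

def schema_metrics(tables: dict) -> dict:
    facts = [t for t in tables.values() if t["type"] == "fact"]
    dims = [t for t in tables.values() if t["type"] == "dimension"]
    return {
        "fact_tables": len(facts),
        "dimension_tables": len(dims),
        "foreign_keys": sum(t["foreign_keys"] for t in tables.values()),
        "total_columns": sum(t["columns"] for t in tables.values()),
    }

print(schema_metrics(schema))
```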


Author(s):  
Janet Delve

Data Warehousing is now a well-established part of the business and scientific worlds. However, up until recently, data warehouses were restricted to modeling essentially numerical data – examples being sales figures in the business arena (in, say, Wal-Mart’s data warehouse (Westerman, 2000)) and astronomical data (for example SKICAT) in scientific research, with textual data providing a descriptive rather than a central analytic role. The inability of data warehouses to cope with mainly non-numeric data is particularly problematic for humanities research utilizing material such as memoirs and trade directories. Recent innovations have opened up possibilities for ‘non-numeric’ data warehouses, making them widely accessible to humanities research for the first time. Due to its irregular and complex nature, humanities research data is often difficult to model, and manipulating time shifts in a relational database is problematic, as is fitting such data into a normalized data model. History and linguistics are exemplars of areas where relational databases are cumbersome and which would benefit from the greater freedom afforded by data warehouse dimensional modeling.


Author(s):  
Wilfred Ng ◽  
Mark Levene

Data warehousing is a corporate strategy that needs to integrate information from several sources of separately developed Database Management Systems (DBMSs). A future DBMS of a data warehouse should provide adequate facilities to manage a wide range of information arising from such integration. We propose that the capabilities of database languages should be enhanced to manipulate user-defined data orderings, since business queries in an enterprise usually involve order. We extend the relational model to incorporate partial orderings into data domains and describe the ordered relational model. We have already defined and implemented a minimal extension of SQL, called OSQL, which allows querying over ordered relational databases. One of the important facilities provided by OSQL is that it allows users to capture the underlying semantics of the ordering of the data for a given application. Herein we demonstrate that OSQL, aided with a package discipline, can be an effective means to manage the inter-related operations and the underlying data domains of a wide range of advanced applications that are vital in data warehousing, such as temporal, incomplete, and fuzzy information. We present the details of the generic operations arising from these applications in the form of three OSQL packages: OSQL_TIME, OSQL_INCOMP, and OSQL_FUZZY.
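OSQL syntax is not reproduced in this abstract, so the Python sketch below only illustrates the underlying idea of a user-defined ordering on a data domain: query results follow an application-supplied semantic order rather than the default lexicographic one. The domain, its ordering, and the data are invented.

```python
# Sketch of a user-defined (semantic) ordering on a data domain, the idea
# underlying the ordered relational model: results follow an application-defined
# order rather than the default lexicographic one. Domain values, their
# ordering, and the data are invented for illustration.
credit_rating_order = {"AAA": 0, "AA": 1, "A": 2, "BBB": 3, "BB": 4}

accounts = [
    {"name": "Fund X", "rating": "BBB"},
    {"name": "Fund Y", "rating": "AAA"},
    {"name": "Fund Z", "rating": "AA"},
]

# Default string order would put "AA" before "AAA"; the semantic order does not.
by_semantic_order = sorted(accounts, key=lambda row: credit_rating_order[row["rating"]])
for row in by_semantic_order:
    print(row["name"], row["rating"])
# Fund Y AAA
# Fund Z AA
# Fund X BBB
```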

