Core Methodologies in Data Warehouse Design and Development

Author(s):  
James Yao ◽  
John Wang ◽  
Qiyang Chen ◽  
Ruben Xing

A data warehouse is a system that integrates heterogeneous data sources to support the decision-making process. Data warehouse design is a lengthy, time-consuming, and costly process, and data warehouse development projects have had a high failure rate. How to design and develop a data warehouse has therefore become an important issue for information systems designers and developers. This paper reviews and discusses some of the core data warehouse design and development methodologies in information system development. In particular, it presents the most recent and much-debated hybrid approach, which combines the data-driven and requirement-driven approaches.

2020 ◽  
Vol 6 (1) ◽  
pp. 111-120
Author(s):  
Rahmat Tri Yunandar ◽  
Amir Amir ◽  
Khairul Rizal

Educational institutions need more knowledge for evaluating, designing, and making decisions. Such knowledge can be obtained by moving the data stored in an institution's operational databases into a data warehouse, where it can be used to support the decision-making process. A data warehouse plays a major role in providing strategic information that meets the needs of management in a business context. This study examines the development of a data warehouse for new-admissions data at STIE Binaniaga Bogor, from which important information can be extracted to help make strategic decisions in support of promotional activities at STIE Binaniaga Bogor. The final goal of the study was to produce a data warehouse design that supports management decision making by providing strategic information on new admissions, yielding accurate and useful summary information as input for determining promotion strategies.


Author(s):  
Oscar Romero ◽  
Alberto Abelló

In recent years, data warehousing systems have gained relevance in supporting decision making within organizations. The core component of these systems is the data warehouse, and it is now widely assumed that data warehouse design must follow the multidimensional paradigm. Thus, many methods have been presented to support the multidimensional design of the data warehouse. The first methods introduced were requirement-driven, but the semantics of the data warehouse (since the data warehouse is the result of homogenizing and integrating relevant data of the organization in a single, detailed view of the organization's business) require that the data sources also be considered during the design process. Considering the data sources gave rise to several data-driven methods that automate the data warehouse design process, mainly from relational data sources. Currently, research on multidimensional modeling is still a hot topic, and there are two main research lines. On the one hand, new hybrid automatic methods have been introduced that combine data-driven and requirement-driven approaches. These methods focus on automating the whole process and improving the feedback retrieved by each approach to produce better results. On the other hand, some new approaches consider scenarios other than relational sources. These methods also consider (semi-)structured data sources, such as ontologies or XML, that have gained relevance in recent years. Thus, they introduce innovative solutions for overcoming the heterogeneity of the data sources. All in all, we discuss the current state of multidimensional modeling by carrying out a survey of multidimensional design methods. We present the most relevant methods introduced in the literature and a detailed comparison showing the main features of each approach.


2008 ◽  
pp. 3116-3141
Author(s):  
Shi-Ming Huang ◽  
David C. Yen ◽  
Hsiang-Yuan Hsueh

The materialized view approach is widely adopted in implementations of data warehouse systems for efficiency purposes. In the construction of a materialized data warehouse system, some managerial problems still exist for most developers and users, particularly in the area of view resource maintenance. Resource redundancy and data inconsistency among materialized views in a data warehouse system are problems that many developers and users struggle with. In this article, a space-efficient protocol for materialized view maintenance with a global data view on data warehouses with embedded proxies is proposed. In the protocol set, multilevel proxy-based protocols with a data compensating mechanism are provided to certify the consistency and uniqueness of materialized data among data resources and materialized views. The authors also provide a set of evaluation experiences and derivations to verify the feasibility of the proposed protocols and mechanisms. With such protocols as proxy services, the performance and space utilization of the materialized view approach will be improved. Furthermore, the consistency issue among materialized data warehouses and heterogeneous data sources can be properly addressed by applying a dynamic compensating and synchronization mechanism. The trade-off between efficiency, storage consumption, and data validity for view maintenance tasks can be properly balanced.


Author(s):  
Nawfal El Moukhi ◽  
Ikram El Azami ◽  
Abdelaaziz Mouloudi ◽  
Abdelali Elmounadi

Data warehouse design is currently recognized as the most important and complicated phase in any decision support system implementation project. Its complexity is primarily due to the proliferation of data source types and the lack of a standardized and well-structured method, hence the increasing interest from researchers who have tried to develop new methods for the automation and standardization of this critical stage of the project. In this paper, the authors present the set of developed methods that follow the data-driven paradigm, and they propose a new data-driven method called X-ETL. This method aims to automate data warehouse design by generating star models from relational data. It is mainly based on a set of rules derived from related works, the Model-Driven Architecture (MDA), and the XML language.


Author(s):  
Cécile Favre ◽  
Fadila Bentayeb ◽  
Omar Boussaid

A data warehouse allows the integration of heterogeneous data sources for analysis purposes. One of the key points for the success of the data warehousing process is the design of the model according to the available data sources and the analysis needs (Nabli, Soussi, Feki, Ben-Abdallah & Gargouri, 2005). However, as the business environment evolves, several changes in the content and structure of the underlying data sources may occur. In addition to these changes, analysis needs may also evolve, requiring an adaptation to the existing data warehouse’s model. In this chapter, we provide an overall view of the state of the art in data warehouse model evolution. We present a set of comparison criteria and compare the various works. Moreover, we discuss the future trends in data warehouse model evolution.


2018 ◽  
Vol 44 (2) ◽  
pp. 16-26
Author(s):  
Alaa Hamoud ◽  
Ali Hashim ◽  
Wid Awadh

Clinical decisions are crucial because they are related to human lives. Thus, managers and decision makers in the clinical environment seek new solutions that can support their decisions. A clinical data warehouse (CDW) is an important solution that is used to achieve clinical stakeholders’ goals by merging heterogeneous data sources in a central repository and using this repository to find answers related to the strategic clinical domain, thereby supporting clinical decisions. CDW implementation faces numerous obstacles, starting with the data sources and ending with the tools that view the clinical information. This paper presents a systematic overview of the purpose of CDWs as well as the characteristics; requirements; data sources; extract, transform and load (ETL) process; security and privacy concerns; design approach; architecture; and challenges and difficulties related to implementing a successful CDW. PubMed and Google Scholar are used to find papers related to CDW. Among the total of 784 papers, only 42 are included in the literature review. These papers are classified based on five perspectives, namely methodology, data, system, ETL tool and purpose, to find insights related to aspects of CDW. This review can contribute answers to questions related to CDW and provide recommendations for implementing a successful CDW.


Author(s):  
Ivan Bojicic ◽  
Zoran Marjanovic ◽  
Nina Turajlic ◽  
Marko Petrovic ◽  
Milica Vuckovic ◽  
...  

In order for a data warehouse (DW) to adequately fulfill its integrative and historical purpose, its data model must enable the appropriate and consistent representation of the different states of a system. In effect, a DW data model, representing the physical structure of the DW, must be general enough to consume data from heterogeneous data sources and reconcile the semantic differences of the data source models, and, at the same time, be resilient to constant changes in the structure of the data sources. One of the main problems in DW development is the absence of a standardized DW data model. This paper gives a comparative analysis of the four most prominent DW data models (namely the relational/normalized model, data vault model, anchor model, and dimensional model). On the basis of the results of [1], the new DW data model (the Domain/Mapping model, DMM), which would more adequately fulfill the posed requirements, is presented.


Water ◽  
2018 ◽  
Vol 10 (10) ◽  
pp. 1342
Author(s):  
Yong Qiu ◽  
Ji Li ◽  
Xia Huang ◽  
Hanchang Shi

Achieving low costs and high efficiency in wastewater treatment plants (WWTPs) is a common challenge in developing countries, although many optimization tools for process design and operation are well established. A data-driven optimal strategy that does not require expensive instruments and skilled engineers is thus attractive in practice. In this study, a data mining system was implemented to optimize process design and operation in WWTPs in China, following an integral procedure that includes data collection and cleaning, a data warehouse, data mining, and a web user interface. A data warehouse was demonstrated and analyzed using one year of process data from 30 WWTPs in China. Six sludge removal loading rates on water quality indices, such as chemical oxygen demand (COD), total nitrogen (TN), and total phosphorus (TP), were calculated as derived parameters and organized into fact sheets. A searching algorithm was programmed to find the five records most similar to the target scenario. A web interface was developed for users to input scenarios, view outputs, and update the database. Two case WWTPs were investigated to verify the data mining system. The results indicated that the effluent quality of the Case-1 WWTP was improved to meet the discharge criteria through optimal operations, and the process design of the Case-2 WWTP could be refined in a feedback loop. A discussion of the gaps, potential, and challenges of data mining in practice was provided. The data mining system in this study is a good candidate for engineers to understand and control their processes in WWTPs.
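
The abstract does not say how similarity between records is measured. As a rough illustration only, the sketch below assumes each record is a vector of numeric derived parameters and that "most similar" means smallest Euclidean distance after min-max scaling; both choices, along with the function names and sample values, are hypothetical and are not taken from the paper.

```python
import math


def make_scaler(rows):
    """Build a function that min-max scales a row to [0, 1] per column.

    Columns with a single repeated value are mapped to 0.0.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]

    def scale(row):
        return [(v - lo) / span if span else 0.0
                for v, lo, span in zip(row, lows, spans)]

    return scale


def most_similar(records, target, k=5):
    """Return the indices of the k records closest to `target`.

    Assumption: similarity is Euclidean distance on min-max scaled
    values; the paper's actual measure is not stated in the abstract.
    """
    scale = make_scaler(records + [target])
    scaled_target = scale(target)
    distances = [(math.dist(scale(row), scaled_target), index)
                 for index, row in enumerate(records)]
    return [index for _, index in sorted(distances)[:k]]


# Hypothetical records: one row per plant-period, one column per parameter.
records = [[0, 0], [1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [10, 10]]
target = [2, 2]
print(most_similar(records, target))
```

Scaling before measuring distance keeps a parameter with a large numeric range from dominating the comparison; a different normalization or distance measure could be substituted without changing the overall structure.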


Author(s):  
John M. Artz

Although data warehousing theory and technology have been around for well over a decade, they may well be the next hot technologies. How can it be that a technology sleeps for so long and then begins to move rapidly to the foreground? This question can have several answers. Perhaps the technology had not yet caught up to the theory, or computer technology 10 years ago did not have the capacity to deliver what the theory promised. Perhaps the ideas and the products were just ahead of their time. All these answers are true to some extent. But the real answer, I believe, is that data warehousing is in the process of undergoing a radical theoretical and paradigmatic shift, and that shift will reposition data warehousing to meet future demands.

