Core Methodologies in Data Warehouse Design and Development

A data warehouse is a system that integrates heterogeneous data sources to support the decision-making process. Data warehouse design is a lengthy, time-consuming, and costly process, and data warehouse development projects have had a high failure rate. How to design and develop a data warehouse has therefore become an important issue for information systems designers and developers. This paper reviews and discusses some of the core data warehouse design and development methodologies in information system development. In particular, the paper presents the most recent and much-debated hybrid approach, which combines the data-driven and requirement-driven approaches.

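Perancangan Data Warehouse Untuk Informasi Strategi Studi Kasus Penerimaan Siswa Baru STIE Binaniaga Bogor [Data Warehouse Design for Strategic Information: A Case Study of New Student Admissions at STIE Binaniaga Bogor]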

Jurnal Teknik Komputer ◽

10.31294/jtk.v6i1.6861 ◽

2020 ◽

Vol 6 (1) ◽

pp. 111-120

Author(s):

Rahmat Tri Yunandar ◽

Amir Amir ◽

Khairul Rizal

Keyword(s):

Decision Making ◽

Data Warehouse ◽

Strategic Decision ◽

Educational Institutions ◽

Decision Making Process ◽

Process Data ◽

Strategic Information ◽

Business Context ◽

Final Goal ◽

Operational Activities

Educational institutions need more knowledge for evaluating, designing, and making decisions. Such knowledge can be obtained by loading the data stored in the institution's operational databases into a data warehouse, where it can support the decision-making process. A data warehouse plays a major role in providing strategic information that meets the needs of management in a business context. This study examines the development of a data warehouse for new-admissions data at STIE Binaniaga Bogor, which can surface important information that helps in making strategic decisions to support promotional activities at STIE Binaniaga Bogor. The final goal of this study was to produce a data warehouse design that supports management decision making by providing strategic information on new admissions, yielding accurate and useful summary information as input for determining promotion strategies.

Multidimensional Design Methods for Data Warehousing

Integrations of Data Warehousing, Data Mining and Database Technologies ◽

10.4018/978-1-60960-537-7.ch005 ◽

2011 ◽

pp. 78-105 ◽

Cited By ~ 1

Author(s):

Oscar Romero ◽

Alberto Abelló

Keyword(s):

Data Warehouse ◽

Design Process ◽

Data Warehousing ◽

Design Methods ◽

Data Sources ◽

Data Driven ◽

Main Research ◽

Multidimensional Modeling ◽

Warehouse Design ◽

Warehousing Systems

In recent years, data warehousing systems have gained relevance in supporting decision making within organizations. The core component of these systems is the data warehouse, and it is now widely assumed that data warehouse design must follow the multidimensional paradigm. Thus, many methods have been presented to support the multidimensional design of the data warehouse. The first methods introduced were requirement-driven, but the semantics of the data warehouse (which is the result of homogenizing and integrating the relevant data of the organization into a single, detailed view of the organization's business) require that the data sources also be considered during the design process. Considering the data sources gave rise to several data-driven methods that automate the data warehouse design process, mainly from relational data sources. Research on multidimensional modeling remains a hot topic, with two main research lines. On the one hand, new hybrid automatic methods have been introduced that combine data-driven and requirement-driven approaches. These methods focus on automating the whole process and improving the feedback retrieved by each approach to produce better results. On the other hand, some new approaches consider scenarios other than relational sources. These methods also consider (semi-)structured data sources, such as ontologies or XML, that have gained relevance in recent years. Thus, they introduce innovative solutions for overcoming the heterogeneity of the data sources. All in all, we discuss the current scenario of multidimensional modeling by carrying out a survey of multidimensional design methods. We present the most relevant methods introduced in the literature and a detailed comparison showing the main features of each approach.

A Space-Efficient Protocol for Consistency of External View Maintenance on Data Warehouse Systems

Data Warehousing and Mining ◽

10.4018/978-1-59904-951-9.ch198 ◽

2008 ◽

pp. 3116-3141

Author(s):

Shi-Ming Huang ◽

David C. Yen ◽

Hsiang-Yuan Hsueh

Keyword(s):

Data Warehouse ◽

Heterogeneous Data ◽

Materialized Views ◽

Data Warehouses ◽

View Maintenance ◽

Synchronization Mechanism ◽

Materialized View ◽

Heterogeneous Data Sources ◽

Data Validity ◽

Data View

The materialized view approach is widely adopted in implementations of data warehouse systems for efficiency. In constructing a materialized data warehouse system, some managerial problems still exist for most developers and users, particularly in the area of view resource maintenance. Resource redundancy and data inconsistency among materialized views in a data warehouse system are problems that many developers and users struggle with. In this article, a space-efficient protocol for materialized view maintenance with a global data view on data warehouses with embedded proxies is proposed. In the protocol set, multilevel proxy-based protocols with a data compensating mechanism are provided to certify the consistency and uniqueness of materialized data among data resources and materialized views. The authors also provide a set of evaluation experiences and derivations to verify the feasibility of the proposed protocols and mechanisms. With such protocols as proxy services, the performance and space utilization of the materialized view approach will be improved. Furthermore, consistency between materialized data warehouses and heterogeneous data sources can be properly achieved by applying a dynamic compensating and synchronization mechanism. The trade-off between efficiency, storage consumption, and data validity for view maintenance tasks can be properly balanced.

X-ETL: A Data-Driven Approach for Designing Star Schemas

International Journal of Recent Contributions from Engineering Science & IT (iJES) ◽

10.3991/ijes.v7i1.10009 ◽

2019 ◽

Vol 7 (1) ◽

pp. 4

Author(s):

Nawfal El Moukhi ◽

Ikram El Azami ◽

Abdelaaziz Mouloudi ◽

Abdelali Elmounadi

Keyword(s):

Decision Support ◽

Data Warehouse ◽

Data Driven ◽

Critical Stage ◽

Model Driven ◽

New Methods ◽

Star Models ◽

Data Driven Approach ◽

Data Source ◽

Warehouse Design

Data warehouse design is currently recognized as the most important and complicated phase in any decision support system implementation project. Its complexity is primarily due to the proliferation of data source types and the lack of a standardized and well-structured method, hence the increasing interest from researchers who have tried to develop new methods for the automation and standardization of this critical stage of the project. In this paper, the authors present the set of developed methods that follow the data-driven paradigm, and they propose a new data-driven method called X-ETL. This method aims to automate data warehouse design by generating star models from relational data. It is mainly based on a set of rules derived from related work, the Model-Driven Architecture (MDA), and the XML language.

A Survey of Data Warehouse Model Evolution

Handbook of Research on Innovations in Database Technologies and Applications ◽

10.4018/978-1-60566-242-8.ch015 ◽

2009 ◽

pp. 129-136

Author(s):

Cécile Favre ◽

Fadila Bentayeb ◽

Omar Boussaid

Keyword(s):

Data Warehouse ◽

Data Warehousing ◽

Business Environment ◽

Heterogeneous Data ◽

Data Sources ◽

Future Trends ◽

Model Evolution ◽

Heterogeneous Data Sources ◽

Key Points ◽

Existing Data

A data warehouse allows the integration of heterogeneous data sources for analysis purposes. One of the key points for the success of the data warehousing process is the design of the model according to the available data sources and the analysis needs (Nabli, Soussi, Feki, Ben-Abdallah & Gargouri, 2005). However, as the business environment evolves, several changes in the content and structure of the underlying data sources may occur. In addition to these changes, analysis needs may also evolve, requiring an adaptation to the existing data warehouse’s model. In this chapter, we provide an overall view of the state of the art in data warehouse model evolution. We present a set of comparison criteria and compare the various works. Moreover, we discuss the future trends in data warehouse model evolution.

Semantic Knowledge Discovery and Data-Driven Logical Reasoning from Heterogeneous Data Sources

Uncertainty Reasoning for the Semantic Web III - Lecture Notes in Computer Science ◽

10.1007/978-3-319-13413-0_9 ◽

2014 ◽

pp. 163-183 ◽

Cited By ~ 1

Author(s):

Claudia d’Amato ◽

Volha Bryl ◽

Luciano Serafini

Keyword(s):

Knowledge Discovery ◽

Heterogeneous Data ◽

Semantic Knowledge ◽

Logical Reasoning ◽

Data Sources ◽

Data Driven ◽

Heterogeneous Data Sources

CLINICAL DATA WAREHOUSE: A REVIEW

Iraqi Journal for Computers and Informatics ◽

10.25195/ijci.v44i2.53 ◽

2018 ◽

Vol 44 (2) ◽

pp. 16-26 ◽

Cited By ~ 1

Author(s):

Alaa Hamoud ◽

Ali Hashim ◽

Wid Awadh

Keyword(s):

Data Warehouse ◽

Clinical Data ◽

Clinical Information ◽

Heterogeneous Data ◽

Data Sources ◽

Security And Privacy ◽

Clinical Environment ◽

Privacy Concerns ◽

Clinical Data Warehouse ◽

Heterogeneous Data Sources

Clinical decisions are crucial because they are related to human lives. Thus, managers and decision makers in the clinical environment seek new solutions that can support their decisions. A clinical data warehouse (CDW) is an important solution that is used to achieve clinical stakeholders’ goals by merging heterogeneous data sources in a central repository and using this repository to find answers related to the strategic clinical domain, thereby supporting clinical decisions. CDW implementation faces numerous obstacles, starting with the data sources and ending with the tools that view the clinical information. This paper presents a systematic overview of purpose of CDWs as well as the characteristics; requirements; data sources; extract, transform and load (ETL) process; security and privacy concerns; design approach; architecture; and challenges and difficulties related to implementing a successful CDW. PubMed and Google Scholar are used to find papers related to CDW. Among the total of 784 papers, only 42 are included in the literature review. These papers are classified based on five perspectives, namely methodology, data, system, ETL tool and purpose, to find insights related to aspects of CDW. This review can contribute answers to questions related to CDW and provide recommendations for implementing a successful CDW.

Domain/Mapping Model: A Novel Data Warehouse Data Model

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2017.2.2876 ◽

2017 ◽

Vol 12 (2) ◽

pp. 166 ◽

Cited By ~ 2

Author(s):

Ivan Bojicic ◽

Zoran Marjanovic ◽

Nina Turajlic ◽

Marko Petrovic ◽

Milica Vuckovic ◽

...

Keyword(s):

Data Warehouse ◽

Data Model ◽

Heterogeneous Data ◽

Physical Structure ◽

Data Sources ◽

Mapping Model ◽

Heterogeneous Data Sources ◽

Domain Mapping ◽

Source Models ◽

Data Source

In order for a data warehouse (DW) to adequately fulfill its integrative and historical purpose, its data model must enable the appropriate and consistent representation of the different states of a system. In effect, a DW data model, representing the physical structure of the DW, must be general enough to consume data from heterogeneous data sources and reconcile the semantic differences of the data source models and, at the same time, be resilient to constant changes in the structure of the data sources. One of the main problems related to DW development is the absence of a standardized DW data model. In this paper, a comparative analysis of the four most prominent DW data models (namely the relational/normalized model, data vault model, anchor model, and dimensional model) is given. On the basis of the results of [1], a new DW data model (the Domain/Mapping model, DMM) that would more adequately fulfill the posed requirements is presented.

A Feasible Data-Driven Mining System to Optimize Wastewater Treatment Process Design and Operation

Water ◽

10.3390/w10101342 ◽

2018 ◽

Vol 10 (10) ◽

pp. 1342 ◽

Cited By ~ 4

Author(s):

Yong Qiu ◽

Ji Li ◽

Xia Huang ◽

Hanchang Shi

Keyword(s):

Data Mining ◽

Wastewater Treatment ◽

Data Warehouse ◽

Process Design ◽

High Efficiency ◽

Data Driven ◽

Process Data ◽

Mining System ◽

Wastewater Treatment Process ◽

Data Mining System

Achieving low costs and high efficiency in wastewater treatment plants (WWTPs) is a common challenge in developing countries, although many optimizing tools for process design and operation have been well established. A data-driven optimal strategy without the prerequisite of expensive instruments and skilled engineers is thus attractive in practice. In this study, a data mining system was implemented to optimize the process design and operation in WWTPs in China, following an integral procedure including data collection and cleaning, data warehouse, data mining, and web user interface. A data warehouse was demonstrated and analyzed using one-year process data in 30 WWTPs in China. Six sludge removal loading rates on water quality indices, such as chemical oxygen demand (COD), total nitrogen (TN), and total phosphorus (TP), were calculated as derived parameters and organized into fact sheets. A searching algorithm was programmed to find the five records most similar to the target scenario. A web interface was developed for users to input scenarios, view outputs, and update the database. Two case WWTPs were investigated to verify the data mining system. The results indicated that the effluent quality of the Case-1 WWTP was improved to meet the discharging criteria through optimal operations, and the process design of the Case-2 WWTP could be refined in a feedback loop. A discussion on the gaps, potential, and challenges of data mining in practice was provided. The data mining system in this study is a good candidate for engineers to understand and control their processes in WWTPs.

Data Driven vs. Metric Driven Data Warehouse Design

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch043 ◽

2011 ◽

pp. 223-227 ◽

Cited By ~ 2

Author(s):

John M. Artz

Keyword(s):

Data Warehouse ◽

Computer Technology ◽

Data Warehousing ◽

Data Driven ◽

Paradigmatic Shift ◽

The Real ◽

Warehouse Design ◽

Future Demands

Although data warehousing theory and technology have been around for well over a decade, they may well be the next hot technologies. How can it be that a technology sleeps for so long and then begins to move rapidly to the foreground? This question can have several answers. Perhaps the technology had not yet caught up to the theory, or perhaps computer technology 10 years ago did not have the capacity to deliver what the theory promised. Perhaps the ideas and the products were just ahead of their time. All these answers are true to some extent. But the real answer, I believe, is that data warehousing is in the process of undergoing a radical theoretical and paradigmatic shift, and that shift will reposition data warehousing to meet future demands.
