A Proposed DDS Enabled Model for Data Warehouses with Real Time Updates

Author(s):  
Munesh Chandra Trivedi ◽  
Virendra Kumar Yadav ◽  
Avadhesh Kumar Gupta

A data warehouse generally contains both historical and current data drawn from various data sources. In computing, a data warehouse can be defined as a system created for the analysis and reporting of both kinds of data. The resulting analysis reports are then used by an organization to make decisions that support its growth. Constructing a data warehouse appears simple: data is collected from the data sources into one place (after extraction, transformation, and loading). Construction, however, involves several issues, such as inconsistent data, logic conflicts, user acceptance, cost, quality, security, stakeholders' contradictions, REST alignment, etc. These issues need to be overcome; otherwise, they will lead to unfortunate consequences affecting the organization's growth. The proposed model tries to solve issues such as REST alignment and stakeholders' contradictions by involving experts from various domains (technical, analytical, decision makers, management representatives, etc.) during the initialization phase to better understand the requirements, and by mapping these requirements to the data sources during the design phase of the data warehouse.

Author(s):  
Doulkifli Boukraa ◽  
Riadh Ben Messaoud ◽  
Omar Boussaid

Current data warehouses deal for the most part with numerical data. However, decision makers need to analyze data presented in all formats, which one can qualify as complex data. Warehousing complex data is a new challenge for the scientific community. Indeed, it requires revisiting the whole warehousing process in order to take into account the complex structure of the data; therefore, many concepts of data warehousing will need to be redefined. In particular, modeling complex data in a unique format for analysis purposes is a challenge. In this chapter, the authors present a complex data warehouse model at both the conceptual and logical levels. They show how XML is suitable for capturing the main concepts of their model, and present the main issues related to these data warehouses.


2008 ◽  
pp. 2749-2761
Author(s):  
Hugh J. Watson ◽  
Barbara H. Wixom ◽  
Dale L. Goodhue

Data warehouses are helping resolve a major problem that has plagued decision support applications over the years: a lack of good data. Top management at 3M realized that the company had to move from being product-centric to being customer savvy. In response, 3M built a terabyte data warehouse (global enterprise data warehouse) that provides thousands of 3M employees with real-time access to accurate, global, detailed information. The data warehouse underlies new Web-based customer services that are dynamically generated based on warehouse information. Useful lessons were learned at 3M during its years of developing the data warehouse.


Author(s):  
Omar Boussaid ◽  
Doulkifli Boukraa

While classical databases aimed at managing data within enterprises, data warehouses help enterprises analyze data in order to drive their activities (Inmon, 2005). Data warehouses have proven their usefulness in the decision-making process by presenting valuable data to users and allowing them to analyze it online (Rafanelli, 2003). Current data warehouse and OLAP tools deal, for the most part, with numerical data, which is usually structured using the relational model. Therefore, considerable amounts of unstructured or semi-structured data are left unexploited. We qualify such data as “complex data” because they originate in different sources, have multiple forms, and have complex relationships among them. Warehousing and exploiting such data raise many issues. In particular, modeling a complex data warehouse using the traditional star schema is no longer adequate, for several reasons (Boussaïd, Ben Messaoud, Choquet, & Anthoard, 2006; Ravat, Teste, Tournier, & Zurfluh, 2007b). First, the complex structure of the data needs to be preserved rather than flattened into a linear set of attributes. Second, the relationships that exist between data need to be preserved and exploited when performing the analysis. Finally, a need may arise for new aggregation modes (Ben Messaoud, Boussaïd, & Loudcher, 2006; Ravat, Teste, Tournier, & Zurfluh, 2007a) that are based on textual rather than numerical data. The design and modeling of decision support systems based on complex data is a very exciting scientific challenge (Pedersen & Jensen, 1999; Jones & Song, 2005; Luján-Mora, Trujillo, & Song, 2006). In particular, modeling a complex data warehouse at the conceptual level and then at the logical level are not straightforward activities, and little work has been done on them. At the conceptual level, most of the proposed models are object-oriented (Ravat et al., 2007a; Nassis, Rajugan, Dillon, & Rahayu, 2004), and some of them use UML as a notation language. At the logical level, XML has been used in many models because of its adequacy for modeling both structured and semi-structured data (Pokorný, 2001; Baril & Bellahsène, 2003; Boussaïd et al., 2006). In this chapter, we propose an approach to the multidimensional modeling of complex data at both the conceptual and logical levels. Our conceptual model answers some modeling requirements that we believe are not fulfilled by current models. These requirements are exemplified by a case study based on the Digital Bibliography & Library Project (DBLP).
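As a rough illustration of how XML can preserve a complex object's structure and relationships rather than flattening it into columns, the sketch below builds a DBLP-style publication as nested elements. All element and attribute names are hypothetical and are not taken from the authors' schema.

```python
# A minimal sketch of capturing a complex "publication" object in XML, in the
# spirit of the chapter's logical model. Element/attribute names are invented.
import xml.etree.ElementTree as ET

article = ET.Element("complexObject", {"class": "Article", "id": "conf/example/Paper04"})
ET.SubElement(article, "attribute", {"name": "title"}).text = "Modeling Complex Data for Analysis"

# The complex structure is preserved: authors stay nested objects, not flat columns.
authors = ET.SubElement(article, "relationship", {"name": "writtenBy"})
for name in ["D. Boukraa", "R. Ben Messaoud", "O. Boussaid"]:
    author = ET.SubElement(authors, "complexObject", {"class": "Author"})
    ET.SubElement(author, "attribute", {"name": "name"}).text = name

# Inter-object relationships (e.g., citations) are kept for later analysis.
cites = ET.SubElement(article, "relationship", {"name": "cites"})
ET.SubElement(cites, "objectRef", {"target": "journals/example/Cited01"})

print(ET.tostring(article, encoding="unicode"))
```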


2013 ◽  
Vol 9 (2) ◽  
pp. 21-38 ◽  
Author(s):  
Florian Waas ◽  
Robert Wrembel ◽  
Tobias Freudenreich ◽  
Maik Thiele ◽  
Christian Koncilia ◽  
...  

In a typical BI infrastructure, data extracted from operational data sources is transformed, cleansed, and loaded into a data warehouse by a periodic ETL process, typically executed on a nightly basis, i.e., a full day’s worth of data is processed and loaded during off-hours. However, fresher data is desirable for business insights in near real time. To this end, the authors propose to leverage a data warehouse’s capability to directly import raw, unprocessed records and to defer transformation and data cleansing until needed by pending reports. At that time, the database’s own processing mechanisms can be deployed to process the data on demand. Event-processing capabilities are seamlessly woven into the proposed architecture. Besides outlining the overall architecture, the authors also develop a roadmap for implementing a complete prototype using conventional database technology in the form of hierarchical materialized views.
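The deferred-transformation idea can be sketched with SQLite from Python's standard library: raw records are bulk-loaded untouched, and a view applies the cleansing logic only when a report actually queries it. The table, columns, and cleansing rules below are invented for illustration, and a plain view (re-evaluated at query time) stands in for the paper's hierarchical materialized views.

```python
# A minimal sketch of deferring ETL work until query time, assuming an
# illustrative raw_sales staging table; not the paper's actual prototype.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_sales (ts TEXT, region TEXT, amount TEXT)")

# Fast load path: no transformation or cleansing at ingest time.
con.executemany(
    "INSERT INTO raw_sales VALUES (?, ?, ?)",
    [("2024-01-01", " north ", "100.5"),
     ("2024-01-01", "SOUTH", "n/a"),      # dirty record, handled later
     ("2024-01-02", "north", "200")],
)

# Deferred ETL: the database's own engine normalizes and filters on demand.
con.execute("""
    CREATE VIEW sales AS
    SELECT ts,
           LOWER(TRIM(region)) AS region,
           CAST(amount AS REAL) AS amount
    FROM raw_sales
    WHERE amount GLOB '[0-9]*'
""")

# A pending report triggers the transformation only for the data it needs.
for row in con.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(row)
```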


Author(s):  
Francesco Di Tria ◽  
Ezio Lefons ◽  
Filippo Tangorra

Big Data warehouses are a new class of databases that largely use unstructured and volatile data for analytical purposes. Examples of such data sources are those coming from the Web, such as social networks and blogs, or from sensor networks, where huge amounts of data may be available only for short intervals of time. In order to manage massive data sources, a strategy must be adopted for defining multidimensional schemas in the presence of fast-changing situations or even undefined business requirements. In this paper, we propose a design methodology that adopts agile and automatic approaches in order to reduce the time necessary to integrate new data sources and to include new business requirements on the fly. The data are immediately available for analysis, since the underlying architecture is based on a virtual data warehouse that does not require an importing phase. Examples of the methodology’s application are presented throughout the paper to show the validity of this approach compared to a traditional one.
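A toy sketch of the virtual data warehouse idea follows, with invented sources and field names (not the paper's methodology): adapters project each source onto a shared (dimension, measure) schema only at query time, so no importing phase is needed.

```python
# A minimal sketch of querying heterogeneous sources virtually, on the fly.
# The two sources and their mapping rules are purely illustrative.
from collections import defaultdict

web_posts = [{"user": "a", "topic": "tv", "likes": 3},
             {"user": "b", "topic": "tv", "likes": 5}]
sensor_rows = [("tv_aisle", 17), ("tv_aisle", 9)]

def facts():
    """Lazily project each source onto (dimension, measure); nothing is imported."""
    for p in web_posts:
        yield p["topic"], p["likes"]          # social-network source
    for zone, visits in sensor_rows:
        yield zone.split("_")[0], visits      # sensor-network source

def rollup(fact_stream):
    totals = defaultdict(int)
    for dim, measure in fact_stream:
        totals[dim] += measure
    return dict(totals)

# The data is available for analysis immediately, straight from the sources.
print(rollup(facts()))   # {'tv': 34}
```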


2021 ◽  
Author(s):  
Jingwei Li ◽  
Wei Huang ◽  
Choon Ling Sia ◽  
Zhuo Chen ◽  
Tailai Wu ◽  
...  

BACKGROUND
The SARS-CoV-2 virus and its variants are posing extraordinary challenges for public health worldwide. More timely and accurate forecasting of COVID-19 epidemics is key to timely interventions and policies and to efficient resource allocation. Internet-based data sources have shown great potential to supplement traditional infectious disease surveillance, and combining different Internet-based data sources has shown greater power to enhance epidemic forecasting accuracy than using a single source. However, existing methods incorporating multiple Internet-based data sources used only real-time data from these sources as exogenous inputs and did not take all the historical data into account. Moreover, the predictive power of different Internet-based data sources in providing early warning of COVID-19 outbreaks has not been fully explored.

OBJECTIVE
The main aim of our study was to explore whether combining real-time and historical data from multiple Internet-based sources could improve COVID-19 forecasting accuracy over existing baseline models. A secondary aim was to explore COVID-19 forecasting timeliness based on different Internet-based data sources.

METHODS
We first used core-term and symptom-related keyword-based methods to extract COVID-19-related Internet-based data from December 21, 2019, to February 29, 2020. The Internet-based data we explored included 90,493,912 online news articles, 37,401,900 microblogs, and all Baidu search query data during that period. We then proposed an autoregressive model with exogenous inputs, incorporating real-time and historical data from multiple Internet-based sources. Our proposed model was compared with baseline models, and all models were tested during the first wave of the COVID-19 epidemic in Hubei province and in the rest of mainland China separately. We also used lagged Pearson correlations for the forecasting timeliness analysis.

RESULTS
Our proposed model achieved the highest accuracy on all five accuracy measures, compared with all the baseline models, in both Hubei province and the rest of mainland China. In mainland China except Hubei, the forecasting accuracy differences between our proposed model (model i) and the baseline models were statistically significant (model 1, t=–8.722, P<.001; model 2, t=–5.000, P<.001; model 3, t=–1.882, P=.063; model 4, t=–4.644, P<.001; model 5, t=–4.488, P<.001). In Hubei province, our proposed model's forecasting accuracy improved significantly compared with the baseline model using only historical COVID-19 new confirmed case counts (model 1, t=–1.732, P=.086). Our results also showed that Internet-based sources could provide a warning 2-6 days earlier for COVID-19 outbreaks.

CONCLUSIONS
Our approach, incorporating real-time and historical data from multiple Internet-based sources, could improve forecasting accuracy for COVID-19 epidemics and its variants, which may help public health agencies improve their interventions and resource allocation in mitigating and controlling new waves of COVID-19 or other epidemics.
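The two core ingredients can be sketched in a few lines, under assumed lag orders and synthetic data (the paper's exact specification and estimation procedure are not reproduced here): an autoregressive model with exogenous Internet signals fit by ordinary least squares, and a lagged Pearson correlation used to gauge how far the Internet signal leads the case series.

```python
# A minimal ARX sketch with synthetic data; lag orders p, q are assumptions.
import numpy as np

rng = np.random.default_rng(0)
T = 70
search = rng.poisson(50, T).astype(float)               # e.g., search query volume
cases = 0.8 * np.roll(search, 4) + rng.normal(0, 2, T)  # cases trail searches ~4 days

p, q = 2, 2  # assumed autoregressive and exogenous lag orders
rows, targets = [], []
for t in range(max(p, q), T):
    # Regressors: intercept, own lags of cases, lags of the Internet signal.
    rows.append(np.r_[1.0, cases[t - p:t], search[t - q:t]])
    targets.append(cases[t])
X, y = np.array(rows), np.array(targets)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

x_next = np.r_[1.0, cases[T - p:T], search[T - q:T]]    # forecast the next day
print("one-step-ahead forecast:", x_next @ beta)

def lagged_corr(signal, target, lag):
    """Pearson correlation between the signal shifted 'lag' days ahead of target."""
    return np.corrcoef(signal[:-lag], target[lag:])[0, 1]

# Timeliness analysis: which lead time correlates most strongly?
best = max(range(1, 8), key=lambda k: lagged_corr(search, cases, k))
print("strongest lead time (days):", best)
```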


2011 ◽  
pp. 277-297 ◽  
Author(s):  
Carlo Combi ◽  
Barbara Oliboni

This chapter describes a graph-based approach to representing information stored in a data warehouse by means of a temporal semistructured data model. We consider issues related to the representation of semistructured data warehouses and discuss the set of constraints needed to correctly manage warehouse time, i.e., the time dimension under which data is stored in the data warehouse itself. We use a temporal semistructured data model because a data warehouse can contain data coming from different and heterogeneous data sources. This means that the data stored in a data warehouse is semistructured in nature: the same information can be represented in different ways in different documents, and the document schemata may or may not be available. Moreover, the information stored in a data warehouse is often time-varying; thus, as with semistructured data in general, it is useful to consider time in the data warehouse context as well.
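One way to picture such a model is as a labeled graph whose edges carry warehouse-time intervals. The sketch below shows a single plausible consistency constraint (non-overlapping versions of the same labeled edge), offered as an example of the kind of rule involved, not the chapter's actual constraint set.

```python
# A minimal sketch of a temporal semistructured graph; names are illustrative.
INF = float("inf")

nodes = {"c1": "customer", "c1_addr_v1": "address", "c1_addr_v2": "address"}
edges = [
    ("c1", "c1_addr_v1", "has-address", (0, 10)),    # valid from t=0 to t=10
    ("c1", "c1_addr_v2", "has-address", (10, INF)),  # current version
]

def overlapping(a, b):
    return a[0] < b[1] and b[0] < a[1]

def check_edge_versions(edges):
    """Example warehouse-time constraint: versions of the same labeled edge
    from one node must not overlap in time."""
    for i, (src1, _, lab1, t1) in enumerate(edges):
        for src2, _, lab2, t2 in edges[i + 1:]:
            if src1 == src2 and lab1 == lab2 and overlapping(t1, t2):
                return False
    return True

print(check_edge_versions(edges))  # True: the two address versions meet at t=10
```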


2019 ◽  
Vol 15 (3) ◽  
pp. 46-62
Author(s):  
Canan Eren Atay ◽  
Georgia Garani

A data warehouse is considered a key aspect of success for any decision support system. Research on temporal databases has produced important results in this field, and data warehouses, which store historical data, can clearly benefit from such studies. A slowly changing dimension is a dimension in a data warehouse whose attributes can change infrequently over time. Although different solutions have been proposed, each has its own particular disadvantages. In this research work, the authors propose the Object-Relational Temporal Data Warehouse (O-RTDW) model for slowly changing dimensions. Using this approach, it is possible to efficiently keep track of the whole history of an object in a data warehouse. The proposed model has been implemented on a real data set and tested successfully. Several limitations of other solutions, such as redundancy, surrogate keys, incomplete historical data, and the creation of additional tables, are not present in this solution.
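The gist of keeping an object's whole history without surrogate keys or extra tables can be sketched by nesting each slowly changing attribute's history inside the single dimension record; the structure and field names below are illustrative, not the O-RTDW model's exact definition.

```python
# A minimal sketch of a nested, temporally versioned dimension attribute.
from dataclasses import dataclass, field

INF = "9999-12-31"

@dataclass
class TemporalAttribute:
    history: list = field(default_factory=list)  # (value, start, end) triples

    def set(self, value, on):
        if self.history:
            v, s, _ = self.history[-1]
            self.history[-1] = (v, s, on)        # close the current version
        self.history.append((value, on, INF))

    def at(self, date):
        for value, start, end in self.history:
            if start <= date < end:
                return value

# One natural-keyed customer row carries its own complete attribute history.
customer = {"customer_id": 42, "city": TemporalAttribute()}
customer["city"].set("Izmir", "2015-01-01")
customer["city"].set("Athens", "2019-06-01")

print(customer["city"].at("2016-03-01"))  # Izmir
print(customer["city"].at("2020-01-01"))  # Athens
```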


2021 ◽  
Vol 3 (2) ◽  
pp. 82
Author(s):  
Mohammed Muddasir ◽  
Raghuveer K ◽  
Dayanand R

Data warehouses are loaded with data from sources such as operational databases. Failure of the loading process, or of any subprocess such as extraction or transformation, is expensive because of the resulting non-availability of data for analysis. With the advent of e-commerce and many real-time applications, analyzing data in real time has become the norm, and hence any misses while data is being loaded into the data warehouse need to be handled in an efficient and optimized way. The techniques for handling failures of the processes that populate the data warehouse are as important as the loading process itself. Alternative arrangements need to be made for failures so that the processes populating the data warehouse still complete in time. This paper explores the various ways in which a failed process of populating the data warehouse can be resumed. Various resumption techniques are compared, and a novel block-based technique is proposed that improves one of the existing resumption techniques.
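A block-based resumption scheme can be sketched as follows: the load commits one block at a time and checkpoints the id of the last durable block, so a restart skips completed work instead of re-extracting everything. This illustrates the general idea only and is not the paper's algorithm.

```python
# A minimal sketch of checkpointed, block-wise loading with resumption.
import json, os

CHECKPOINT, BLOCK_SIZE = "load.ckpt", 1000

def load_blocks(records, write_block):
    last_done = -1
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            last_done = json.load(f)["last_block"]     # resume point
    for b, start in enumerate(range(0, len(records), BLOCK_SIZE)):
        if b <= last_done:
            continue                                   # block already loaded
        write_block(records[start:start + BLOCK_SIZE])
        with open(CHECKPOINT, "w") as f:               # commit the block id
            json.dump({"last_block": b}, f)

warehouse = []
load_blocks(list(range(3500)), warehouse.extend)
print(len(warehouse))   # 3500; a rerun after a crash skips finished blocks
```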


2010 ◽  
pp. 894-928 ◽  
Author(s):  
Robert Wrembel

Methods of designing a data warehouse (DW) usually assume that its structure is static. In practice, however, a DW structure changes, among other reasons, as a result of the evolution of the external data sources and of changes in the real world represented in the DW. The most advanced research approaches to this problem are based on temporal extensions and versioning techniques. This article surveys challenges in designing, building, and managing data warehouses whose structure and content evolve over time. The survey is based on the so-called Multiversion Data Warehouse (MVDW). In detail, this article presents the following: the concept of the MVDW, a language for querying the MVDW, a framework for detecting changes in data sources, a structure for sharing data in the MVDW, and index structures for indexing data in the MVDW.
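The flavor of cross-version querying can be sketched with two toy DW versions whose schemas differ: the query rewrites the measure name per version before merging results. The versions and the rewrite table are invented for illustration; the MVDW query language itself is far richer.

```python
# A minimal sketch of a query spanning schema versions of a warehouse.
versions = {
    "v1": {"schema": {"turnover": "turnover"},   # original attribute name
           "rows": [{"shop": "A", "turnover": 10}]},
    "v2": {"schema": {"turnover": "revenue"},    # attribute renamed in v2
           "rows": [{"shop": "A", "revenue": 12}, {"shop": "B", "revenue": 7}]},
}

def query_all_versions(measure):
    """Sum a measure across DW versions, rewriting its name per version."""
    out = {}
    for vid, v in versions.items():
        col = v["schema"][measure]               # version-local name
        out[vid] = sum(r[col] for r in v["rows"])
    return out

print(query_all_versions("turnover"))  # {'v1': 10, 'v2': 19}
```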

