Weather Data Warehouse: An Agent-Based Data Warehousing System

Author(s):  
G. Kalra ◽  
D. Steiner
2003 ◽  
Vol 12 (03) ◽  
pp. 325-363 ◽  
Author(s):  
Joseph Fong ◽  
Qing Li ◽  
Shi-Ming Huang

A data warehouse contains vast amounts of data to support the complex queries of various Decision Support Systems (DSSs). It needs to store materialized views of data, which must be consistently and instantaneously available. Using a frame metadata model, this paper presents an architecture for universal data warehousing across different data models. The frame metadata model represents the metadata of a data warehouse: it structures an application domain into classes and integrates the schemas of heterogeneous databases by capturing their semantics. A star schema is derived from user requirements based on the integrated schema and catalogued in the metadata, which stores the schemas of the relational database (RDB) and the object-oriented database (OODB). Data materialization between the RDB and the OODB is achieved by unloading the source database into a sequential file and reloading it into the target database; through this, an object-relational view can be defined that allows users to obtain the same warehouse view in different data models simultaneously. We describe our procedures for building the relational view of the star schema by multidimensional SQL query, and the object-oriented view of the data warehouse for Online Analytical Processing (OLAP) through method calls derived from the integrated schema. To validate our work, a prototype application system has been developed for a product sales data warehousing domain based on this approach.
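To illustrate the kind of multidimensional SQL query the abstract describes, the following sketch builds a toy star schema (one fact table joined to two dimension tables) and rolls the measure up by category and year. All table and column names here are hypothetical, not taken from the paper's prototype.

```python
# Minimal star-schema sketch: one fact table, two dimensions,
# and a multidimensional roll-up query over both.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE time_dim    (time_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE sales_fact  (product_id INTEGER, time_id INTEGER, amount REAL);
INSERT INTO product_dim VALUES (1, 'Books'), (2, 'Games');
INSERT INTO time_dim    VALUES (10, 2002), (11, 2003);
INSERT INTO sales_fact  VALUES (1, 10, 100.0), (1, 11, 150.0), (2, 11, 80.0);
""")

# Roll-up over two dimensions: total sales per category per year.
rows = cur.execute("""
    SELECT p.category, t.year, SUM(f.amount)
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN time_dim    t ON t.time_id    = f.time_id
    GROUP BY p.category, t.year
    ORDER BY p.category, t.year
""").fetchall()
print(rows)
```

The dimension tables carry the descriptive attributes; the fact table carries only foreign keys and measures, which is what makes such GROUP BY roll-ups cheap to express.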


2016 ◽  
Vol 12 (3) ◽  
pp. 32-50
Author(s):  
Xiufeng Liu ◽  
Nadeem Iftikhar ◽  
Huan Huo ◽  
Per Sieverts Nielsen

In data warehousing, data from source systems are populated into a central data warehouse (DW) through extraction, transformation and loading (ETL). The standard ETL approach usually uses sequential jobs to process data with dependencies, such as dimension and fact data. It is a non-trivial task to process so-called early-/late-arriving data, which arrive out of order. This paper proposes a two-level data staging area method to optimize ETL. The proposed method is an all-in-one solution that supports processing different types of data from operational systems, including early-/late-arriving data and fast-/slowly-changing data. The additional staging area decouples the loading process from data extraction and transformation, which improves ETL flexibility and minimizes intervention in the data warehouse. The paper evaluates the proposed method empirically and shows that it is more efficient and less intrusive than the standard ETL method.
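A minimal sketch of the early-arriving-data problem the abstract mentions (not the authors' implementation): a fact row whose dimension key has not yet been loaded is parked in a second-level staging buffer and flushed once the dimension row arrives. All names are illustrative.

```python
# Two-level staging sketch: facts that reference a missing dimension
# wait in `staging` instead of being rejected or loaded inconsistently.
known_dims = {"cust-1"}     # dimension keys already in the warehouse
warehouse_facts = []        # loaded fact rows
staging = []                # second-level staging for early-arriving facts

def load_fact(fact):
    """Load a fact now if its dimension exists, otherwise stage it."""
    if fact["cust"] in known_dims:
        warehouse_facts.append(fact)
    else:
        staging.append(fact)          # early-arriving fact: dimension not here yet

def load_dimension(key):
    """Load a dimension row, then flush any facts that were waiting on it."""
    known_dims.add(key)
    ready = [f for f in staging if f["cust"] == key]
    for f in ready:
        staging.remove(f)
        warehouse_facts.append(f)

load_fact({"cust": "cust-1", "amount": 10})   # loads immediately
load_fact({"cust": "cust-2", "amount": 20})   # staged: dimension missing
load_dimension("cust-2")                      # arriving dimension flushes it
```

Keeping the out-of-order rows in a separate staging level is what lets the loader run continuously without touching warehouse tables until the data are consistent.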


2001 ◽  
Vol 10 (03) ◽  
pp. 377-397 ◽  
Author(s):  
LUCA CABIBBO ◽  
RICCARDO TORLONE

We report on the design of a novel architecture for data warehousing based on the introduction of an explicit "logical" layer into the traditional data warehousing framework. This layer serves to guarantee complete independence of OLAP applications from the physical storage structure of the data warehouse and thus allows users and applications to manipulate multidimensional data while ignoring implementation details. For example, it makes it possible to modify the data warehouse organization (e.g. MOLAP or ROLAP implementation, star schema or snowflake schema structure) without affecting the high-level description of multidimensional data or the programs that use the data. It also supports the integration of multidimensional data stored in heterogeneous OLAP servers. We propose [Formula: see text], a simple data model for multidimensional databases, as the reference for the logical layer. [Formula: see text] provides an abstract formalism to describe the basic concepts that can be found in any OLAP system (fact, dimension, level of aggregation, and measure). We show that [Formula: see text] databases can be implemented in both relational and multidimensional storage systems. We also show that [Formula: see text] can be profitably used in OLAP applications as a front-end. We finally describe the design of a practical system that supports the above logical architecture; this system is used to show in practice how the architecture we propose can hide implementation details and provide support for interoperability between different and possibly heterogeneous data warehouse applications.
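The logical-layer idea can be sketched as one abstract cube interface with interchangeable physical backends: the same aggregate query answered by a ROLAP-style row store and a MOLAP-style array. This is an illustrative sketch of the principle only; class and method names are invented, not the paper's model.

```python
# One logical interface, two physical layouts: applications query `Cube`
# and never see whether storage is relational rows or a dense array.
from abc import ABC, abstractmethod

class Cube(ABC):
    @abstractmethod
    def total(self, dim_value):
        """Aggregate the measure for one coordinate of a dimension."""

class RelationalCube(Cube):          # ROLAP-style: facts as rows
    def __init__(self, rows):
        self.rows = rows             # list of (dim_value, measure)
    def total(self, dim_value):
        return sum(m for d, m in self.rows if d == dim_value)

class ArrayCube(Cube):               # MOLAP-style: pre-aggregated dense cells
    def __init__(self, index, cells):
        self.index, self.cells = index, cells
    def total(self, dim_value):
        return self.cells[self.index[dim_value]]

rolap = RelationalCube([("east", 3.0), ("east", 4.0), ("west", 5.0)])
molap = ArrayCube({"east": 0, "west": 1}, [7.0, 5.0])
# Applications get identical answers regardless of the physical layout.
assert rolap.total("east") == molap.total("east") == 7.0
```

Swapping the backend (star to snowflake, ROLAP to MOLAP) then only requires a new `Cube` implementation, which is exactly the independence property the logical layer is meant to guarantee.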


Organization ◽  
2018 ◽  
Vol 26 (4) ◽  
pp. 537-552 ◽  
Author(s):  
Helene Ratner ◽  
Christopher Gad

Organization is increasingly entwined with databased governance infrastructures. Developing the idea of 'infrastructure as partial connection' with inspiration from Marilyn Strathern and Science and Technology Studies, this article proposes that database infrastructures are intrinsic to processes of organizing intra- and inter-organizational relations. Seeing infrastructure as partial connection brings our attention to the ontological experimentation with knowing organizations through the work of establishing and cutting relations. We illustrate this claim through a multi-sited ethnographic study of 'The Data Warehouse', an important infrastructural component in the current reorganization of Danish educational governance which makes schools' performance public and comparable. We suggest that 'The Data Warehouse' materializes different, but overlapping, infrastructural experiments with governing education at different organizational sites enacting a governmental hierarchy. Each site can be seen as belonging to the same governance infrastructure but also as constituting a 'centre' in its own right. 'The Data Warehouse' participates in the always-unfinished business of organizational world making and is made to (partially) relate to different organizational concerns and practices. This argument has implications for how we analyze the organizational effects of pervasive databased governance infrastructures and invites exploration of their multiple organizing effects.


In standard ETL (Extract, Transform, Load), data warehouse refreshment must be performed outside peak hours, which implies that loading and analysis are suspended during that window. As a consequence, the data warehouse does not reflect the latest operational transactions. This issue is known as data latency. Near real-time data warehousing is employed as a remedy for this issue: it updates the data warehouse almost immediately after data appear at the data source, so data latency can be reduced. However, near real-time data warehousing introduces issues that were not present in traditional ETL. This paper aims to communicate the issues and the available options at every point in the near real-time data warehousing pipeline; the issues and available alternatives are based on a literature review of studies that focus on near real-time data warehousing.
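A minimal sketch of the near real-time refresh idea described above: changes captured at the source are drained into the warehouse in small micro-batches rather than one nightly batch, so latency shrinks to roughly one polling interval. The queue and function names are illustrative, not taken from any surveyed system, and the transform step is omitted.

```python
# Micro-batch refresh sketch: each captured change is loaded as soon as
# the next poll runs, instead of waiting for an off-peak batch window.
import collections

source_changes = collections.deque()   # change-data-capture queue at the source
warehouse = {}                         # stand-in for a warehouse table

def capture(key, value):
    """Record a source-system change for later loading."""
    source_changes.append((key, value))

def refresh_micro_batch():
    """Drain whatever has arrived since the last call into the warehouse."""
    while source_changes:
        key, value = source_changes.popleft()
        warehouse[key] = value         # transform step omitted for brevity

capture("order-1", 100)
refresh_micro_batch()                  # warehouse now reflects the change
capture("order-1", 120)                # stale only until the next poll
```

The trade-off the paper surveys follows directly from this loop: the shorter the polling interval, the lower the latency, but the more the continuous loading interferes with concurrent queries.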


2021 ◽  
Author(s):  
Diana Suleimenova ◽  
Alireza Jahani ◽  
Hamid Arabnejad ◽  
Derek Groen

<p>There are nearly 80 million people forcibly displaced worldwide, of which 26 million are refugees and 45 million are internally displaced people (IDPs) (UNHCR, 2020). It is difficult to foresee and accurately forecast forced migration trends due to the severity and instability of conflicts or crises. However, it is possible to capture relevant aspects of this complex phenomenon and propose an approach to forecasting future migration trends. Hence, we present an agent-based modelling approach, namely FLEE, that predicts the distribution of incoming refugees from a conflict origin to neighbouring countries (Suleimenova et al., 2017). Our aim is to assist governments, organisations and NGOs to efficiently allocate humanitarian resources, manage crises and save lives.</p><p>To construct a forced migration model, we obtain relevant data from three sources: the United Nations High Commissioner for Refugees (UNHCR, https://data2.unhcr.org), providing the number of forcibly displaced people in the conflict, the camp locations in neighbouring countries and their population capacities; the Armed Conflict Location and Event Data Project (ACLED, https://acled-data.com) for conflict locations and dates of battles; and the OpenStreetMap platform (https://openstreetmap.org) to geospatially interconnect camp and conflict locations with other major settlements that reside en route between these locations. We then simulate the constructed model using the FLEE code (https://github.com/djgroen/flee-release) and obtain the distribution of incoming forced displacement across destination camps. 
We were able to reproduce key trends in refugee counts found in the UNHCR data across Burundi, the Central African Republic and Mali (Suleimenova et al., 2017), and investigated the impact of policy decisions, such as camp and border closures, in the South Sudan conflict (Suleimenova and Groen, 2020).</p><p>In our recent collaboration with Save the Children, we focus on the ongoing conflict in Ethiopia's Tigray region and forecast IDP numbers within the region and refugee arrival counts in Sudan. We found that the number of arrivals in Sudan seems to depend strongly on whether the conflict erupts in the east or in the west of Tigray; this appears to be a larger factor than the actual intensity of the conflict.</p><p>Moreover, our modelling approach allows us to investigate possible effects of weather conditions on forcibly displaced people by coupling FLEE with precipitation data, seasonal flood and river discharge levels. The purpose of coupling with European Centre for Medium-Range Weather Forecasts (ECMWF) data is to identify the effect of weather conditions on the behaviour and movement speed of forced migrants.</p><p>The overall strategy is a static coupling of weather data: we have analysed 40 years of precipitation data for South Sudan to identify precipitation ranges (minimum and maximum levels) as triggers by which the agents' movement speed changes. In addition, we have used daily river discharge data from the Global Flood Awareness System (GloFAS) to explore the threshold for closing a link, considering river discharge values for return periods of 2, 5 and 20 years. Currently, we only use a simple rule with one threshold to define the river distance for a given link, which we aim to investigate further.</p><p><strong>References</strong><br>1. UNHCR (2020). Figures at a Glance, Available at: https://www.unhcr.org/figures-at-a-glance.html.<br>2. Suleimenova D., Bell D. and Groen D. (2017) "A generalized simulation development approach for predicting refugee destinations". Scientific Reports 7:13377. (https://doi.org/10.1038/s41598-017-13828-9).<br>3. Suleimenova D. and Groen D. (2020) "How policy decisions affect refugee journeys in South Sudan: A study using automated ensemble simulations". Journal of Artificial Societies and Social Simulation 23(1)2, pp. 1-17. (https://doi.org/10.18564/jasss.4193).</p>
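The static weather coupling described above can be sketched as two simple rules: movement speed scaled by the precipitation band a location falls in, and a link closed when river discharge exceeds a return-period threshold. This is an illustrative sketch only; the numeric thresholds and scaling factors are placeholders, not values from the FLEE study.

```python
# Weather-coupling sketch: precipitation bands slow agents down;
# a single discharge threshold opens or closes a route link.
# All thresholds and factors below are hypothetical placeholders.
def movement_speed(base_speed_km_day, precipitation_mm,
                   low=5.0, high=25.0):
    """Scale movement speed down as precipitation crosses the trigger bands."""
    if precipitation_mm < low:
        return base_speed_km_day            # dry: full speed
    if precipitation_mm < high:
        return base_speed_km_day * 0.5      # wet: slowed
    return base_speed_km_day * 0.25         # extreme rain: near standstill

def link_open(river_discharge, discharge_20yr_threshold):
    """Single-threshold rule: close the route above the 20-year flood level."""
    return river_discharge < discharge_20yr_threshold

assert movement_speed(40.0, 2.0) == 40.0    # below the low trigger
assert movement_speed(40.0, 30.0) == 10.0   # above the high trigger
assert link_open(100.0, 500.0) is True      # discharge well under threshold
```

In a full coupling, the precipitation and discharge values would be read per location and per timestep from the ECMWF and GloFAS datasets rather than passed in by hand.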


Author(s):  
Nenad Jukic ◽  
Miguel Velasco

Defining data warehouse requirements is widely recognized as one of the most important steps in the larger data warehouse system development process. This paper examines the potential risks and pitfalls within the data warehouse requirement collection and definition process. A real scenario of a large-scale data warehouse implementation is given, and details of this project, which ultimately failed due to an inadequate requirement collection and definition process, are described. The presented case underscores and illustrates the impact of the requirement collection and definition process on data warehouse implementation, and the case is analyzed within the context of existing approaches, methodologies, and best practices for the prevention and avoidance of typical data warehouse requirement errors and oversights.


Author(s):  
Oscar Romero ◽  
Alberto Abelló

In recent years, data warehousing systems have gained relevance in supporting decision making within organizations. The core component of these systems is the data warehouse, and it is now widely assumed that data warehouse design must follow the multidimensional paradigm. Thus, many methods have been presented to support the multidimensional design of the data warehouse. The first methods introduced were requirement-driven, but the semantics of the data warehouse (since the data warehouse is the result of homogenizing and integrating relevant data of the organization into a single, detailed view of the organization's business) require that the data sources also be considered during the design process. Considering the data sources gave rise to several data-driven methods that automate the data warehouse design process, mainly from relational data sources. Currently, research on multidimensional modeling is still a hot topic, with two main research lines. On the one hand, new hybrid automatic methods have been introduced that combine data-driven and requirement-driven approaches; these methods focus on automating the whole process and improving the feedback retrieved by each approach to produce better results. On the other hand, some new approaches consider scenarios other than relational sources, such as (semi-)structured data sources (e.g. ontologies or XML) that have gained relevance in recent years, and introduce innovative solutions for overcoming the heterogeneity of the data sources. All in all, we discuss the current scenario of multidimensional modeling by carrying out a survey of multidimensional design methods. We present the most relevant methods introduced in the literature and a detailed comparison showing the main features of each approach.

