On-Demand ELT Architecture for Right-Time BI

2013 ◽  
Vol 9 (2) ◽  
pp. 21-38 ◽  
Author(s):  
Florian Waas ◽  
Robert Wrembel ◽  
Tobias Freudenreich ◽  
Maik Thiele ◽  
Christian Koncilia ◽  
...  

In a typical BI infrastructure, data extracted from operational data sources is transformed, cleansed, and loaded into a data warehouse by a periodic ETL process, typically executed nightly, i.e., a full day’s worth of data is processed and loaded during off-hours. However, fresher data are desirable for business insights in near real time. To this end, the authors propose to leverage a data warehouse’s capability to directly import raw, unprocessed records and to defer transformation and data cleaning until the data are needed by pending reports. At that time, the database’s own processing mechanisms can be deployed to process the data on demand. Event-processing capabilities are seamlessly woven into the proposed architecture. Besides outlining the overall architecture, the authors also develop a roadmap for implementing a complete prototype using conventional database technology in the form of hierarchical materialized views.
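The core idea, loading raw records first and letting the database transform them only when a report demands it, can be pictured with ordinary views. Below is a minimal Python/SQLite sketch; the table, the cleansing rules, and the on-read view are illustrative assumptions, not the authors' design (SQLite has no materialized views, so a plain view stands in for a hierarchical materialized one).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: raw operational records go straight into a staging table,
# untransformed, so loading stays cheap and can run continuously.
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, " 19.99 ", "us"), (2, "5,00", "DE"), (3, "42.10", "Us")],
)

# Transform on demand: the warehouse's own engine cleans the data at read
# time; a real system would materialize this view and refresh it only when
# pending reports need fresh data.
conn.execute("""
    CREATE VIEW clean_orders AS
    SELECT id,
           CAST(REPLACE(TRIM(amount), ',', '.') AS REAL) AS amount,
           UPPER(country) AS country
    FROM raw_orders
""")

# A report query triggers the deferred transformation implicitly.
for row in conn.execute(
        "SELECT country, SUM(amount) FROM clean_orders GROUP BY country"):
    print(row)
```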

Author(s):  
Kheri Arionadi Shobirin ◽  
Adi Panca Saputra Iskandar ◽  
Ida Bagus Alit Swamardika

A data warehouse is a central repository of integrated data from one or more disparate sources, bringing operational data from On-Line Transaction Processing (OLTP) systems into a form usable for decision-making strategy and business intelligence via On-Line Analytical Processing (OLAP) techniques. Data warehouses support OLAP applications by storing and maintaining data in multidimensional format. Multidimensional data models, an integral part of OLAP, are designed to solve complex query analysis in real time.
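As a concrete illustration of the multidimensional model, the short Python sketch below computes every group-by (cuboid) of a tiny star-schema fact table; the dimension and measure names are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

# Toy fact table of a star schema: three dimensions and one measure.
facts = [
    {"region": "EU", "product": "A", "year": 2013, "sales": 100},
    {"region": "EU", "product": "B", "year": 2013, "sales": 60},
    {"region": "US", "product": "A", "year": 2014, "sales": 80},
]
dimensions = ("region", "product", "year")

# The full data cube: aggregate the measure over every subset of the
# dimensions, i.e. all 2^3 = 8 cuboids, from (region, product, year)
# down to the grand total.
for r in range(len(dimensions) + 1):
    for dims in combinations(dimensions, r):
        cuboid = defaultdict(int)
        for fact in facts:
            cuboid[tuple(fact[d] for d in dims)] += fact["sales"]
        print(dims or ("ALL",), dict(cuboid))
```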


2014 ◽  
Vol 11 (1) ◽  
pp. 1-29 ◽  
Author(s):  
Xiuquan Qiao ◽  
Budan Wu ◽  
Yulong Liu ◽  
Zhao Xue ◽  
Junliang Chen

As heating-supply management becomes more refined, district heating systems deploy large numbers of meters and sensors to monitor and control the operating status of the heating network. Such a system often needs to process real-time streaming data and coordinate the related enterprise business systems to make low-latency, intelligent decisions in response to changes in the heating network. Therefore, the automatic collection, on-demand dissemination, and fusion of real-time sensing data play an increasingly important role in district heating systems. This article proposes an event-driven, SOA-based district heating system architecture with complex event processing capability, which can easily support the on-demand dissemination and aggregation of monitoring information and realize event-driven service coordination across different service domains. Finally, a District Heating Control and Information Service System (DHCISS) deployed in Beijing validates the effectiveness of the approach.
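The complex-event-processing part of such an architecture can be pictured in a few lines of Python: raw meter readings stream in, a windowed rule derives a composite event, and subscribed business services are notified. The rule, threshold, and service names are illustrative assumptions, not details of DHCISS.

```python
from collections import deque

# Subscribed business services (stand-ins for SOA service endpoints).
subscribers = [lambda event: print("notify maintenance service:", event)]

def publish(event):
    for notify in subscribers:
        notify(event)

window = deque(maxlen=3)  # sliding window over the last three readings

def on_reading(sensor_id, pressure_bar):
    """CEP rule: three consecutive readings below 2.0 bar raise a
    composite PressureDrop event for downstream coordination."""
    window.append(pressure_bar)
    if len(window) == window.maxlen and all(p < 2.0 for p in window):
        publish({"type": "PressureDrop", "sensor": sensor_id,
                 "readings": list(window)})
        window.clear()

for p in (2.4, 1.9, 1.8, 1.7, 2.2):
    on_reading("substation-7", p)
```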


Sensors ◽  
2018 ◽  
Vol 18 (12) ◽  
pp. 4125 ◽  
Author(s):  
Pedro Clemente ◽  
Adolfo Lozano-Tello

Nowadays, data are being produced like never before, because the use of the Internet of Things, social networks, and communication in general is increasing exponentially. Many of these data, especially those from public administrations, are freely offered under the open data concept, where data are published to improve their reutilisation and transparency. Initially, such data comprised information that is not updated continuously, such as budgets, tourist information, office information, pharmacy information, etc. This kind of information does not change over long periods of time, such as days, weeks, or months. However, when open data are produced in near real time, as with air-quality sensors or people counters, suitable methodologies and tools to identify, consume, and analyse them are lacking. This work presents a methodology to tackle the analysis of open data sources using Model-Driven Development (MDD) and Complex Event Processing (CEP), which helps users raise the abstraction level at which open data sources are managed and analysed. That means users can handle heterogeneous and complex technology through domain concepts defined by a model, from which platform-specific code can be generated. This methodology is supported by a domain-specific language (DSL) called OpenData2CEP, which includes a metamodel, a graphical concrete syntax, and a model-to-text transformation to specific platforms, such as complex event processing engines. Finally, the methodology and the DSL have been applied to two near real-time contexts: the analysis of air quality for citizens’ proposals and the analysis of earthquake data.
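The model-to-text step can be illustrated briefly: a declarative model of an open-data stream and an event pattern is turned into a rule for a target CEP engine. The model fields and the EPL-like output in this Python sketch are illustrative assumptions, not the actual OpenData2CEP metamodel or syntax.

```python
# Declarative model of an open-data source plus the event to detect.
model = {
    "source": {"name": "AirQuality",
               "url": "https://example.org/air.json",   # hypothetical feed
               "fields": {"station": "string", "no2": "double"}},
    "event": {"name": "HighPollution",
              "condition": "no2 > 200",
              "window_minutes": 10},
}

def generate_rule(m):
    """Model-to-text transformation: emit an EPL-like CEP rule."""
    src, ev = m["source"], m["event"]
    fields = ", ".join(f"{name} {typ}" for name, typ in src["fields"].items())
    return (
        f"create schema {src['name']} ({fields});\n"
        f"insert into {ev['name']} "
        f"select * from {src['name']}#time({ev['window_minutes']} min) "
        f"where {ev['condition']};"
    )

print(generate_rule(model))
```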


2021 ◽  
Vol 3 (2) ◽  
pp. 82
Author(s):  
Mohammed Muddasir ◽  
Raghuveer K ◽  
Dayanand R

Data warehouses are loaded with data from sources such as operational databases. Failure of the loading process, or of any of its constituent processes such as extraction or transformation, is expensive because data become unavailable for analysis. With the advent of e-commerce and many real-time applications, analysing data in real time has become the norm, and hence any failure while data are being loaded into the data warehouse needs to be handled in an efficient and optimized way. Techniques for handling failures of the processes that populate the warehouse are as important as the loading process itself: alternative arrangements must be made so that, in case of failure, the warehouse is still populated in time. This paper explores the various ways in which a failed process of populating the data warehouse can be resumed. Various resumption techniques are compared, and a novel block-based technique is proposed that improves one of the existing resumption techniques.
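A block-based resumption scheme of the kind the paper proposes can be sketched as follows: rows are loaded in fixed-size blocks, the last committed block is checkpointed, and a restarted load skips the blocks already in the warehouse. The file name, block size, and load step in this Python sketch are illustrative assumptions.

```python
import json
import os

CHECKPOINT = "load.ckpt"   # records the last block committed to the warehouse
BLOCK_SIZE = 2

def load_block(block_id, rows):
    print(f"loading block {block_id}: {rows}")   # stand-in for the DB write

def resume_load(rows):
    last_done = -1
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            last_done = json.load(f)["last_block"]
    blocks = [rows[i:i + BLOCK_SIZE] for i in range(0, len(rows), BLOCK_SIZE)]
    for block_id, block in enumerate(blocks):
        if block_id <= last_done:
            continue                 # already committed before the failure
        load_block(block_id, block)
        with open(CHECKPOINT, "w") as f:
            json.dump({"last_block": block_id}, f)   # commit the checkpoint

resume_load(["r1", "r2", "r3", "r4", "r5"])
```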


2008 ◽  
pp. 1355-1375
Author(s):  
Christie I. Ezeife ◽  
Timothy E. Ohanekwu

Identifying integrated records that represent the same real-world object in numerous ways is just one form of data disparity (dirt) to be resolved in a data warehouse. Data cleaning is a complex process, which uses multidisciplinary techniques to resolve conflicts in data drawn from different data sources. There is a need for initial cleaning at the time a data warehouse is built, and for incremental cleaning whenever new records are brought into the data warehouse during refreshing. Existing work on data cleaning has used pre-specified record-match thresholds and multiple scans of records to determine matching records in integrated data. Little attention has been paid to incremental matching of records. Determining the optimal record-match score threshold in a domain is hard. Also, direct comparison of long record strings is highly inefficient and intolerant of typing errors. Thus, this article proposes two algorithms, the first of which uses smart tokens defined from integrated records to match and identify duplicate records during initial warehouse cleaning. The second algorithm uses these tokens for fast, incremental cleaning during warehouse refreshing. Every attribute value forms either a special token, like a birth date, or an ordinary token, which can be alphabetic, numeric, or alphanumeric. Rules are applied for forming tokens belonging to each of these four classes. These tokens are sorted and used for record matching. The tokens also form very good warehouse identifiers, enabling faster incremental warehouse cleaning in the future. This approach eliminates the need for a match threshold and for multiple passes over the data. Experiments show that using tokens for record comparison produces far better results than using the entire record or a greater part of it.
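The token idea can be sketched briefly in Python: each attribute value is reduced to class-tagged tokens (a special rule for dates; alphabetic, numeric, or alphanumeric otherwise), the tokens are sorted, and two records match exactly when their token sets agree, with no tuned threshold. The rules below are simplified illustrations of the paper's token classes, not its exact algorithms.

```python
import re

def tokens(record):
    """Reduce a record to a sorted set of class-tagged smart tokens."""
    out = set()
    for field, value in record.items():
        if field == "birth_date":
            # Special token: order-insensitive date components survive
            # format differences such as 1970-01-02 vs. 02/01/1970.
            out.add(("date", tuple(sorted(re.findall(r"\d+", value)))))
            continue
        for tok in re.findall(r"\w+", value.lower()):
            cls = ("numeric" if tok.isdigit()
                   else "alpha" if tok.isalpha() else "alnum")
            out.add((cls, tok))
    return tuple(sorted(out))

a = {"name": "J. Smith", "birth_date": "1970-01-02", "city": "Windsor"}
b = {"name": "Smith J", "birth_date": "02/01/1970", "city": "windsor"}
print(tokens(a) == tokens(b))   # True: same smart tokens, no threshold
```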


Author(s):  
Munesh Chandra Trivedi ◽  
Virendra Kumar Yadav ◽  
Avadhesh Kumar Gupta

A data warehouse generally contains both historical and current data drawn from various data sources. In the world of computing, a data warehouse can be defined as a system created for the analysis and reporting of both types of data. The resulting analysis reports are then used by an organization to make decisions that support its growth. Constructing a data warehouse appears simple: collect data from data sources into one place (after extraction, transformation, and loading). But construction involves several issues, such as inconsistent data, logic conflicts, user acceptance, cost, quality, security, stakeholder contradictions, REST alignment, etc. These issues need to be overcome; otherwise they will lead to unfortunate consequences affecting the organization's growth. The proposed model tries to solve issues such as REST alignment and stakeholder contradictions by involving experts from various domains (technical, analytical, decision makers, management representatives, etc.) during the initialization phase to better understand the requirements, and by mapping these requirements to data sources during the design phase of the data warehouse.


2011 ◽  
Vol E94-B (2) ◽  
pp. 569-572
Author(s):  
Soochang PARK ◽  
Euisin LEE ◽  
Juhyun JUNG ◽  
Sang-Ha KIM
