HYBRIDJOIN for Near-Real-Time Data Warehousing

M. Asif Naeem; Gillian Dobbie; Gerald Weber

doi:10.4018/jdwm.2011100102


International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2011100102 ◽

2011 ◽

Vol 7 (4) ◽

pp. 21-42 ◽

Cited By ~ 13

Author(s):

M. Asif Naeem ◽

Gillian Dobbie ◽

Gerald Weber

Keyword(s):

Real Time ◽

Data Stream ◽

Time Integration ◽

Synthetic Data ◽

Performance Measurements ◽

Time Data ◽

Join Algorithm ◽

Real Time Data ◽

Set Up ◽

Nested Loop

An important component of near-real-time data warehouses is the near-real-time integration layer. One important element in near-real-time data integration is the join of a continuous input data stream with a disk-based relation. For high-throughput streams, stream-based algorithms such as Mesh Join (MESHJOIN) can be used. However, the performance of MESHJOIN is inversely proportional to the size of the disk-based relation. The Index Nested Loop Join (INLJ) can be set up so that it processes stream input and can deal with intermittency in the update stream, but it has low throughput. This paper introduces a robust stream-based join algorithm called Hybrid Join (HYBRIDJOIN), which combines the two approaches. A theoretical result shows that HYBRIDJOIN is asymptotically as fast as the faster of the two algorithms. The authors present performance measurements of the implementation. In experiments using synthetic data based on a Zipfian distribution, HYBRIDJOIN performs significantly better for typical parameters of the Zipfian distribution and in general performs in accordance with the theoretical model, while the other two algorithms are each unacceptably slow under different settings.
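To make the stream-relation join concrete, here is a minimal Python sketch of the INLJ idea mentioned in the abstract: each stream tuple triggers one index lookup into the relation. It is an illustration under stated assumptions, not the authors' implementation and not HYBRIDJOIN itself. The function name, the tuple layout, and the in-memory dict standing in for an indexed disk-based relation are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumption: an in-memory dict stands in for an index on the join key
# of the disk-based relation. A real system would probe a disk index.
RelationIndex = Dict[int, str]


def index_nested_loop_join(
    stream: Iterable[Tuple[int, str]],
    relation_index: RelationIndex,
) -> Iterator[Tuple[int, str, str]]:
    """Join each (key, payload) stream tuple with the relation by key."""
    for key, payload in stream:
        # One index probe per stream tuple; this per-tuple lookup is the
        # cost that limits throughput when the relation lives on disk.
        match = relation_index.get(key)
        if match is not None:
            yield key, payload, match
```

Because the join is a generator, it consumes the stream tuple by tuple and tolerates gaps in arrival, which loosely mirrors the abstract's point that INLJ copes with an intermittent update stream while paying for a lookup on every tuple.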


HYBRIDJOIN for Near-Real-Time Data Warehousing

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch013 ◽

2013 ◽

pp. 280-302

Author(s):

M. Asif Naeem ◽

Gillian Dobbie ◽

Gerald Weber

Keyword(s):

Real Time ◽

Input Data ◽

Time Integration ◽

Synthetic Data ◽

Performance Measurements ◽

Time Data ◽

Join Algorithm ◽

Real Time Data ◽

Set Up ◽

Nested Loop


Real-time data stream analysis and entire process quality monitoring based on plant information

Journal of Computer Applications ◽

10.3724/sp.j.1087.2012.02935 ◽

2013 ◽

Vol 32 (10) ◽

pp. 2935-2939

Author(s):

Xiao-yong BIAN ◽

Xiao-long ZHANG ◽

Hai YU

Keyword(s):

Real Time ◽

Data Stream ◽

Process Quality ◽

Quality Monitoring ◽

Time Data ◽

Entire Process ◽

Real Time Data ◽

Data Stream Analysis


Real-time Data Stream Processing - Challenges and Perspectives

International Journal of Computer Science Issues ◽

10.20943/01201705.612 ◽

2017 ◽

Vol 14 (5) ◽

pp. 6-12 ◽

Cited By ~ 3

Keyword(s):

Real Time ◽

Data Stream ◽

Stream Processing ◽

Time Data ◽

Data Stream Processing ◽

Real Time Data


Implementing a real-time data stream for time-series stellar photometry

10.1117/12.2232248 ◽

2016 ◽

Author(s):

M. Bogosavljevic ◽

Z. Ioannou

Keyword(s):

Time Series ◽

Real Time ◽

Data Stream ◽

Time Data ◽

Stellar Photometry ◽

Real Time Data


Dynamic Load Balancing and Channel Strategy for Apache Flume Collecting Real-Time Data Stream

2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC) ◽

10.1109/ispa/iucc.2017.00089 ◽

2017 ◽

Cited By ~ 1

Author(s):

Buqing Shu ◽

Haopeng Chen ◽

Meng Sun

Keyword(s):

Load Balancing ◽

Real Time ◽

Dynamic Load ◽

Data Stream ◽

Dynamic Load Balancing ◽

Time Data ◽

Channel Strategy ◽

Real Time Data


Performance Improvement IoT Applications Through Multimedia Analytics Using Big Data Stream Computing Platforms

Exploring the Convergence of Big Data and the Internet of Things - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-2947-7.ch015 ◽

2018 ◽

pp. 200-221

Author(s):

Rizwan Patan ◽

Rajasekhara Babu M ◽

Suresh Kallam

Keyword(s):

Big Data ◽

Real Time ◽

Performance Improvement ◽

Data Stream ◽

Real Data ◽

Stream Computing ◽

Time Data ◽

Real Time Data ◽

Computing Platforms ◽

Time And Energy

A Big Data Stream Computing (BDSC) platform handles real-time data from applications such as risk management, marketing management, and business intelligence. Internet of Things (IoT) deployments are now increasing massively in all areas, and these IoT devices generate real-time data for analysis. Existing BDSC platforms handle real-time data streams from the IoT inefficiently because such streams are unstructured and arrive at an inconstant velocity, which makes them challenging to process. This work proposes a framework that handles real-time data streams through device control techniques to improve performance. The framework includes three layers. The first layer deals with Big Data platforms that handle real data streams based on area of importance. The second layer is the performance layer, which deals with performance issues such as response time and energy efficiency. The third layer applies the developed method to an existing BDSC platform. The experimental results show a performance improvement of 20%-30% for real-time data streams from IoT applications.


An Architecture for the Real-Time Data Stream Monitoring in IoT

Intelligent Systems Reference Library - Multimedia Big Data Computing for IoT Applications ◽

10.1007/978-981-13-8759-3_3 ◽

2019 ◽

pp. 59-100 ◽

Cited By ~ 2

Author(s):

Mario José Diván ◽

María Laura Sánchez Reynoso

Keyword(s):

Real Time ◽

Data Stream ◽

Time Data ◽

The Real ◽

Stream Monitoring ◽

Real Time Data


An Approach to Handle Overload in Real-Time Data Stream Management System

2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery ◽

10.1109/fskd.2008.17 ◽

2008 ◽

Cited By ~ 1

Author(s):

Li Ma ◽

Xin Li ◽

Yongyan Wang ◽

Hong-an Wang

Keyword(s):

Real Time ◽

Management System ◽

Data Stream ◽

Time Data ◽

Data Stream Management ◽

Stream Management ◽

Real Time Data ◽

Data Stream Management System


Feature-based high-availability mechanism for quantile tasks in real-time data stream processing

Software Practice and Experience ◽

10.1002/spe.2244 ◽

2013 ◽

Vol 44 (7) ◽

pp. 855-871 ◽

Cited By ~ 10

Author(s):

Weilong Ding ◽

Yanbo Han ◽

Jing Wang ◽

Zhuofeng Zhao

Keyword(s):

Real Time ◽

Data Stream ◽

Stream Processing ◽

High Availability ◽

Time Data ◽

Data Stream Processing ◽

Real Time Data ◽

Feature Based


Implementation of Real-time Data Stream Processing for Predictive Maintenance of Offshore Plants

Journal of KIISE ◽

10.5626/jok.2015.42.7.840 ◽

2015 ◽

Vol 42 (7) ◽

pp. 840-845

Author(s):

Sung-Soo Kim ◽

Jongho Won

Keyword(s):

Real Time ◽

Data Stream ◽

Stream Processing ◽

Predictive Maintenance ◽

Time Data ◽

Data Stream Processing ◽

Real Time Data ◽

Offshore Plants
