Toward a Grid-Based Zero-Latency Data Warehousing Implementation for Continuous Data Streams Processing

2008 ◽  
pp. 755-786
Author(s):  
Tho Manh Nguyen ◽  
Peter Brezany ◽  
A. Min Tjoa ◽  
Edgar Weippl

Continuous data streams are information sources in which data arrives in high volume in unpredictable rapid bursts. Processing data streams is a challenging task due to (1) the problem of random access to fast and large data streams using present storage technologies and (2) the fact that exact answers from data streams are often too expensive to compute. A framework for building a Grid-based Zero-Latency Data Stream Warehouse (GZLDSWH), which overcomes the resource-limitation issues in data stream processing without resorting to approximation, is specified. The GZLDSWH is built upon a set of Open Grid Service Infrastructure (OGSI)-based services and Globus Toolkit 3 (GT3), with the capability of capturing and storing continuous data streams, performing analytical processing, and reacting autonomously in near real time to certain kinds of events based on a well-established knowledge base. The requirements of a GZLDSWH, its Grid-based conceptual architecture, and the operations of its services are described in this paper. Furthermore, several challenges and issues in building a GZLDSWH, such as the dynamic collaboration model between the Grid services, the analytical model, and the design and evaluation of the knowledge-base rules, are discussed and investigated.
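The knowledge-base-driven reaction described above (rules that fire autonomously on certain events) can be sketched roughly as follows. This is a minimal illustration, not the GZLDSWH service API; the `Rule` and `KnowledgeBase` names are assumptions for the sake of the example:

```python
# Hypothetical sketch of a rule base reacting to stream events in near real time.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]   # predicate over an incoming stream tuple
    action: Callable[[dict], str]       # reaction fired when the condition holds

class KnowledgeBase:
    def __init__(self):
        self.rules: list[Rule] = []

    def add_rule(self, rule: Rule) -> None:
        self.rules.append(rule)

    def react(self, event: dict) -> list[str]:
        # Fire every rule whose condition matches the incoming event.
        return [r.action(event) for r in self.rules if r.condition(event)]

kb = KnowledgeBase()
kb.add_rule(Rule("high_temp",
                 lambda e: e.get("temp", 0) > 90,
                 lambda e: f"alert: temperature {e['temp']}"))

print(kb.react({"temp": 95}))   # the high_temp rule fires
print(kb.react({"temp": 20}))   # no rule fires
```

In a real deployment, the conditions and actions would be evaluated inside the Grid services as tuples are captured, rather than in a single process as here.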

2005 ◽  
Vol 1 (4) ◽  
pp. 22-55 ◽  
Author(s):  
Tho Manh Nguyen ◽  
Peter Brezany ◽  
A. Min Tjoa ◽  
Edgar Weippl

2009 ◽  
Vol 7 ◽  
pp. 133-137 ◽  
Author(s):  
A. Guntoro ◽  
M. Glesner

Abstract. Although DSP performance continues to increase, the sequential nature of a DSP's execution prevents it from performing high-speed processing on a continuous data stream. In this paper we discuss the hardware implementation of the amplitude and phase detector and the validation block on an FPGA. In contrast to the software implementation, which can only process a data stream of up to 1.5 MHz, the hardware approach is 225 times faster and introduces far less latency.
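For reference, amplitude and phase detection over I/Q sample pairs is typically computed as the magnitude and angle of the complex sample. The sketch below is a software baseline in that spirit, assuming an I/Q representation; the paper's actual detector may differ:

```python
import math

def amplitude_phase(i_samples, q_samples):
    """Per-sample amplitude and phase of an I/Q stream (software reference)."""
    amps = [math.hypot(i, q) for i, q in zip(i_samples, q_samples)]      # sqrt(I^2 + Q^2)
    phases = [math.atan2(q, i) for i, q in zip(i_samples, q_samples)]    # angle in radians
    return amps, phases

amps, phases = amplitude_phase([3.0, 0.0], [4.0, 1.0])
print(amps)    # [5.0, 1.0]
print(phases)  # [0.927..., 1.570...]
```

An FPGA implementation would replace the `sqrt`/`atan2` calls with pipelined CORDIC units, which is what makes per-cycle throughput (and the reported 225x speedup over the 1.5 MHz software limit) achievable.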


Author(s):  
MOHAMED MEDHAT GABER ◽  
PHILIP S. YU

Data stream mining has attracted considerable attention over the past few years owing to the significance of its applications. Streaming data often evolves over time, and capturing such changes can be used to detect an event or a phenomenon. Weather conditions, economic changes, and astronomical and scientific phenomena are among a wide range of such applications. Because of the high volume and speed of data streams, it is computationally hard to capture these changes from raw data in real time. In this paper, we propose a novel algorithm, termed STREAM-DETECT, that captures changes in the distribution and/or domain of a data stream using deviation in clustering results. STREAM-DETECT is followed by an offline classification process, CHANGE-CLASS, which associates the history of change characteristics with the observed event or phenomenon. Experimental results show the efficiency of the proposed framework in both detecting the changes and classification accuracy.
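The core idea, detecting distribution change via deviation in clustering results between windows, can be illustrated with a deliberately simplified one-dimensional sketch (a single "centroid" per window standing in for a full clustering; the thresholds and window size are illustrative, not those of STREAM-DETECT):

```python
def detect_changes(stream, window=5, threshold=1.0):
    """Flag a change when a window's centroid (here, the mean) deviates
    from the previous window's centroid by more than a threshold."""
    changes = []
    prev_centroid = None
    for start in range(0, len(stream) - window + 1, window):
        win = stream[start:start + window]
        centroid = sum(win) / len(win)
        if prev_centroid is not None and abs(centroid - prev_centroid) > threshold:
            changes.append(start)  # record where the distribution shifted
        prev_centroid = centroid
    return changes

data = [1.0] * 5 + [1.1] * 5 + [5.0] * 5   # small drift, then a real shift
print(detect_changes(data))  # [10] -- only the third window is flagged
```

A real deployment would compare full cluster sets (multiple centroids, weights) rather than a single mean, and feed the recorded change characteristics to an offline classifier as the paper's CHANGE-CLASS step does.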


Author(s):  
Aderonke B. Sakpere ◽  
Anne V. D. M. Kayem

Streaming data emerges from different electronic sources and needs to be processed in real time with minimal delay. Data streams can yield hidden and useful knowledge patterns when mined and analyzed. In spite of these benefits, the issue of privacy needs to be addressed before streaming data is released for mining and analysis purposes. Several techniques have emerged to address data privacy concerns. K-anonymity has received considerable attention over other privacy-preserving techniques because of its simplicity and efficiency in protecting data. Yet k-anonymity cannot be applied directly to continuous data (data streams) because of their transient nature. In this chapter, the authors discuss the challenges faced by k-anonymity algorithms in enforcing privacy on data streams and review existing privacy techniques for handling data streams.
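One common way stream anonymizers adapt k-anonymity is to buffer tuples until at least k share a generalized quasi-identifier, then release them together. The sketch below assumes a single quasi-identifier (`age`) generalized to a range; it is an illustration of the buffering idea, not any specific algorithm from the chapter:

```python
def release_k_anonymous(buffer, k=3):
    """Release buffered tuples only when at least k are available,
    generalizing the quasi-identifier (age) into a shared range."""
    if len(buffer) < k:
        return None  # keep buffering; releasing now would break k-anonymity
    lo = min(t["age"] for t in buffer)
    hi = max(t["age"] for t in buffer)
    return [{"age_range": f"{lo}-{hi}", "diagnosis": t["diagnosis"]}
            for t in buffer]

stream_buffer = [{"age": 23, "diagnosis": "flu"},
                 {"age": 31, "diagnosis": "cold"},
                 {"age": 27, "diagnosis": "flu"}]
print(release_k_anonymous(stream_buffer))  # all three share age_range "23-31"
```

The tension the chapter highlights is visible even here: buffering until k tuples arrive introduces delay, which conflicts with the real-time, transient nature of the stream.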


2018 ◽  
Vol 7 (3.12) ◽  
pp. 411
Author(s):  
P Chandrakanth ◽  
Anbarasi M.S

The problem of data privacy in streams has hitherto been viewed myopically by researchers. Research and experimentation are well established for static data, where privacy is predominantly achieved through perturbation approaches based on random data values; such approaches do not yield adequate results on large or high-dimensional data sets. By exploiting the autocorrelation of multivariate streams and their underlying structure, we identify the areas in which adding noise maximally preserves privacy, and does so in an irreversible manner. Drift checking and ensemble classifier building are the basic requirements for privacy-preserving data streams, which we demonstrate experimentally with the support of sensitivity analysis. In this paper we present the results of experimentation at all stages.
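The central idea, using autocorrelation structure to decide how to add noise, can be sketched in one dimension. The scaling rule below (gentler noise where the series is strongly autocorrelated, so the perturbation tracks the stream's structure) is an illustrative assumption, not the paper's exact scheme:

```python
import random

def lag1_autocorr(xs):
    """Lag-1 autocorrelation of a numeric sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var

def perturb(xs, base_scale=0.1, seed=42):
    """Add Gaussian noise whose scale is shaped by the series'
    autocorrelation (illustrative rule, not the paper's)."""
    rho = abs(lag1_autocorr(xs))
    scale = base_scale * (1.0 - rho)   # strong structure -> gentler noise
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, scale) for x in xs]

series = [1.0, 2.0, 3.0, 4.0, 5.0]
noisy = perturb(series)
print(noisy)  # close to the original, perturbed per the autocorrelation rule
```

A multivariate version would work with the full autocorrelation (or cross-correlation) structure of the stream, and the added noise would be calibrated so the perturbation cannot be filtered back out, i.e. it is irreversible.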


2015 ◽  
Vol 2015 ◽  
pp. 1-17 ◽  
Author(s):  
Abril Valeria Uriarte-Arcia ◽  
Itzamá López-Yáñez ◽  
Cornelio Yáñez-Márquez ◽  
João Gama ◽  
Oscar Camacho-Nieto

Ever-increasing data generation confronts us with the problem of handling massive amounts of information online. One of the biggest challenges is how to extract valuable information from these massive continuous data streams in a single scan. In a data stream context, data arrive continuously at high speed; therefore the algorithms developed for this context must be efficient in memory and time management and capable of detecting changes over time in the underlying distribution that generated the data. This work describes a novel method for pattern classification over a continuous data stream based on an associative model. The proposed method is based on the Gamma classifier, which is inspired by the Alpha-Beta associative memories; both are supervised pattern recognition models. The proposed method is capable of handling the space and time constraints inherent in data stream scenarios. The Data Streaming Gamma classifier (DS-Gamma classifier) implements a sliding-window approach to provide concept drift detection and a forgetting mechanism. To test the classifier, several experiments were performed over different scenarios with real and synthetic data streams. The experimental results show that the method exhibits competitive performance when compared to other state-of-the-art algorithms.
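The sliding-window-with-forgetting idea can be illustrated with a deliberately simple stand-in: a nearest-mean classifier over a bounded window. This is not the associative Gamma operator itself, only a sketch of the window/forgetting mechanism the DS-Gamma classifier builds on:

```python
from collections import deque

class SlidingWindowClassifier:
    """Nearest-mean classifier over a bounded window; old samples are
    forgotten automatically (stand-in for DS-Gamma's window mechanism)."""

    def __init__(self, window_size=100):
        self.window = deque(maxlen=window_size)  # forgetting: oldest drop out

    def learn(self, x: float, label: str) -> None:
        self.window.append((x, label))

    def predict(self, x: float) -> str:
        # Classify by the closest per-label mean within the current window.
        sums, counts = {}, {}
        for xi, label in self.window:
            sums[label] = sums.get(label, 0.0) + xi
            counts[label] = counts.get(label, 0) + 1
        means = {label: sums[label] / counts[label] for label in sums}
        return min(means, key=lambda label: abs(means[label] - x))

clf = SlidingWindowClassifier(window_size=4)
for xi, y in [(1.0, "low"), (2.0, "low"), (9.0, "high"), (10.0, "high")]:
    clf.learn(xi, y)
print(clf.predict(1.5))   # "low"
clf.learn(11.0, "high")   # window full: the oldest "low" sample is forgotten
print(clf.predict(1.5))   # still "low" while one "low" sample remains
```

Because the window is bounded, memory use is constant regardless of stream length, and a drift in the input distribution shows up as a shift in the per-label means within the window.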


2021 ◽  
pp. 391-410
Author(s):  
Shinichi Yamagiwa

Abstract. In this chapter, we introduce aspects of applying data-compression techniques. First, we study the background of recent communication data paths. The focus of this chapter is a fast lossless data-compression mechanism that handles data streams completely. A data stream comprises continuous, unterminated data generated in volume by sources such as video and sensors. We introduce LCA-SLT and LCA-DLT, which accept such data streams, as well as several implementations of these stream-based compression techniques, and we show techniques for optimal implementation in hardware.
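To make the notion of stream-based lossless compression concrete, the toy encoder below processes symbols one at a time without ever needing the whole input; it is a simple run-length scheme, not the LCA-SLT/LCA-DLT table-based mechanism described in the chapter:

```python
def rle_compress(stream):
    """Incremental run-length encoder over an iterable of symbols.
    Toy stand-in for stream compression: each symbol is consumed once,
    and output can be emitted as runs close (no full-input buffering)."""
    out = []
    prev, count = None, 0
    for sym in stream:
        if sym == prev:
            count += 1
        else:
            if prev is not None:
                out.append((prev, count))
            prev, count = sym, 1
    if prev is not None:
        out.append((prev, count))
    return out

def rle_decompress(pairs):
    return [s for s, n in pairs for _ in range(n)]

data = list("aaabccccd")
packed = rle_compress(data)
print(packed)                           # [('a', 3), ('b', 1), ('c', 4), ('d', 1)]
assert rle_decompress(packed) == data   # lossless round trip
```

The properties that matter for the chapter's setting are visible here: the encoder is lossless and single-pass with bounded state, which is what makes a hardware pipeline over an unterminated stream feasible; LCA-SLT/LCA-DLT achieve the same properties with dynamically managed lookup tables instead of run counting.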

