CONTINUOUS MULTIPLE OLAP QUERIES FOR DATA STREAMS

2012 ◽  
Vol 21 (02) ◽  
pp. 141-164 ◽  
Author(s):  
N. PARIMALA ◽  
S. BHAWNA

Continuous querying of data streams has been addressed mostly through transactional queries, with some attempts at analytical processing. In most of these proposals, however, a single query is executed for a given window of data. In this paper, we propose to continuously execute multiple related OLAP queries (CMOLAP) on data chosen from a data stream. The chosen data defines the context. The context data is temporarily stored as a multidimensional cube so that OLAP operations can be performed on it. Three sets of operations are defined: the first converts the data in a stream to a context, the second allows the context to be altered, and the third is analytical, operating on the context and producing an output stream. More than one related analytic operation can be performed on the data in a context. The sequence of operations, referred to as context queries, is continuously executed for a time-based window. As a result, enhanced related analysis of the data becomes possible. We have also developed a GUI in which the queries can be expressed in a user-friendly manner.
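The idea of holding a window of stream data as a small cube and running several related analytic operations on it can be sketched as follows. This is only an illustrative sketch, not the CMOLAP operators themselves: the function names (`build_context`, `roll_up`, `slice_cube`) and the record fields (`region`, `product`, `sales`) are hypothetical and do not come from the paper.

```python
from collections import defaultdict

def build_context(window, dims, measure):
    """Aggregate the records of one window into a cube keyed by dimension values."""
    cube = defaultdict(float)
    for rec in window:
        cube[tuple(rec[d] for d in dims)] += rec[measure]
    return dict(cube)

def roll_up(cube, dims, keep):
    """Sum the measure over every dimension that is not listed in `keep`."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(float)
    for key, value in cube.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def slice_cube(cube, dims, dim, value):
    """Keep only the cells whose `dim` coordinate equals `value`."""
    i = dims.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

# Hypothetical contents of one time-based window.
window = [
    {"region": "north", "product": "a", "sales": 10},
    {"region": "north", "product": "b", "sales": 5},
    {"region": "south", "product": "a", "sales": 7},
    {"region": "north", "product": "a", "sales": 3},
]
dims = ["region", "product"]

# Build the context once, then run two related queries against it.
context = build_context(window, dims, "sales")
sales_by_region = roll_up(context, dims, ["region"])
product_a_only = slice_cube(context, dims, "product", "a")
```

Building the context once per window and reusing it for every query in the sequence is the design choice being illustrated; in a continuous setting the same three calls would be repeated each time the window moves.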

Author(s):  
Parimala N.

A data stream is a real-time, continuous sequence that may comprise data or events. Processing a data stream differs from processing static data that resides in a database: stream data is seen only once and is too voluminous to store statically. A small portion of the data, called a window, is considered at a time for querying, computing aggregates, etc. In this chapter, the authors explain the different types of window movement over incoming data. A query on a stream is repeatedly executed on the new data created by the movement of the window. SQL extensions for handling continuous queries are also addressed in this chapter. Streams that contain transactional data as well as those that contain events are considered.
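Window movement and repeated query execution can be illustrated with a short sketch. The abstract does not name the window types it covers, so the two shown here (tumbling and sliding, both count-based) are assumptions chosen because they are common in stream processing; the function names are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator, List

def tumbling_windows(stream: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive, non-overlapping windows of `size` items."""
    window: List[int] = []
    for item in stream:
        window.append(item)
        if len(window) == size:
            yield window
            window = []  # start a fresh window; old items are discarded

def sliding_windows(stream: Iterable[int], size: int, slide: int = 1) -> Iterator[List[int]]:
    """Yield overlapping windows of `size` items, advancing by `slide` items."""
    window: deque = deque(maxlen=size)  # the oldest item drops out automatically
    for i, item in enumerate(stream, start=1):
        window.append(item)
        if i >= size and (i - size) % slide == 0:
            yield list(window)

# A continuous query (here, a sum) re-executed on each new window.
tumbling_sums = [sum(w) for w in tumbling_windows(range(1, 7), 3)]
sliding_sums = [sum(w) for w in sliding_windows(range(1, 6), 3)]
```

In both cases each item is read once from the stream and only the current window is held in memory, which matches the constraint that stream data is too voluminous to store.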


2020 ◽  
Vol 1 (1) ◽  
pp. 1-21
Author(s):  
Devesh Kumar Lal ◽  
Ugrasen Suman

Processing real-time data streams is complex because of their large volume and variety. The volume and variety of data streams increase the number of processing units that must run in real time. The number of processing units required can be lowered by using a windowing mechanism, so selecting an appropriate window size is vital for stream data processing. A coarsely sized window directly affects the overall processing time, whereas a finely sized window incurs increased management costs. To manage such streams of data, we propose the SBASH architecture, which can help determine a unipartite size of a sheer window. The sheer window reduces the overall latency of data stream processing to a certain extent. The time complexity to process such a sheer window is equivalent to w log n w. These windows are allocated and retrieved in a stack-based manner, where stacks ≥ n, which helps reduce the number of comparisons made during retrieval.


Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 859
Author(s):  
Abdulaziz O. AlQabbany ◽  
Aqil M. Azmi

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams. It requires learners to be adaptive to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. The Adaptive Random Forest (ARF), in turn, is a stream learning algorithm that has shown promising results in terms of its accuracy and ability to deal with various types of drift. The incoming instances’ continuity allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms’ efficiency by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed enhancement method yielded considerable improvement in most situations.
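The role of the Poisson parameter λ in resampling can be illustrated with a minimal sketch of Poisson-based online resampling for an ensemble. This is not the authors' implementation or their ρ measure: the function names are hypothetical, the sampler is Knuth's multiplication method, and the choice of λ values is an assumption made only for illustration.

```python
import math
import random

def poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def resample_weights(n_learners: int, lam: float, rng: random.Random) -> list:
    """For one incoming instance, draw a training weight for each ensemble member.

    A weight of 0 means that member skips the instance; a weight k > 0 means
    the member trains on the instance as if it had been seen k times.
    """
    return [poisson(lam, rng) for _ in range(n_learners)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = resample_weights(n_learners=10, lam=1.0, rng=rng)

# Empirical check that the sampler's mean tracks lambda.
samples = [poisson(1.0, rng) for _ in range(20000)]
mean_at_1 = sum(samples) / len(samples)
```

A larger λ makes each learner train on each instance more often, which tends to trade execution time for accuracy; that trade-off is what a combined measure of accuracy and execution time is meant to capture.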


2020 ◽  
Vol 2 (1) ◽  
pp. 26-37
Author(s):  
Dr. Pasumponpandian

The rapid progress of the Internet of Things (IoT), together with the development of supporting technologies and processing capabilities, has paved the way for decentralized systems that rely on cloud services. Although these decentralized systems are founded on the cloud, complexities still prevail in transferring all the information sensed by IoT devices to the cloud. This is because of the huge streams of information gathered by certain applications and the expectation of a timely response with minimal delay, low computing energy, and enhanced reliability. This kind of decentralization has led to the development of a middle layer between the cloud and the IoT, termed the edge layer, which brings cloud services down to the user edge. The paper analyzes data stream processing in the edge layer, taking into account the complexities involved in computing IoT data streams there, and puts forth real-time analytics in the edge layer to examine IoT data streams, offering data-driven insight for a parking system in smart cities.


Author(s):  
Prasanna Lakshmi Kompalli

Data coming from different sources is referred to as data streams. Data stream mining is an online learning technique in which each data point must be processed as it arrives and discarded once processing is complete. Progress in technology has made it possible to monitor these data streams in real time, which has created many new challenges for researchers. The main features of this type of data are that it is fast flowing, large in volume, continuous, and growing, and that its characteristics may change over time, which is termed concept drift. This chapter addresses the problems in mining data streams with concept drift. Because of these problems, isolating the correct literature is a grueling task for researchers and practitioners. This chapter tries to provide a solution by serving as an amalgamation of all techniques used for data stream mining with concept drift.


Author(s):  
Rodrigo Salvador Monteiro ◽  
Geraldo Zimbrão ◽  
Holger Schwarz ◽  
Bernhard Mitschang ◽  
Jano Moreira de Souza

Calendar-based pattern mining aims at identifying patterns on specific calendar partitions. Potential calendar partitions include, for example, every Monday, every first working day of each month, and every holiday. Providing flexible mining capabilities for calendar-based partitions is especially challenging in a data stream scenario. The calendar partitions of interest are not known a priori, and at each point in time only a subset of the detailed data is available. The authors show how a data warehouse approach can be applied to this problem. The data warehouse, which keeps track of frequent itemsets holding on different partitions of the original stream, has low storage requirements. Nevertheless, it allows sets of patterns that are complete and precise to be derived. Furthermore, the authors demonstrate the effectiveness of their approach through a series of experiments.


2011 ◽  
Vol 24 (3) ◽  
pp. 45-60
Author(s):  
Ben Ali ◽  
Samar Mouakket

E-business domains have been considered killer domains for different data analysis techniques. Most researchers have examined data mining (DM) techniques to analyze the databases behind E-business websites. DM has shown interesting results, but this technique presents some restrictions concerning the content of the database and the level of expertise of the users interpreting the results. In this paper, the authors show that successful and more sophisticated results can be obtained using other analysis techniques, such as Online Analytical Processing (OLAP) and Spatial OLAP (SOLAP). Thus, the authors propose a framework that fuses or integrates OLAP with SOLAP techniques in an E-business domain to perform easier and more user-friendly data analysis (non-spatial and spatial) and improve decision making. In addition, the authors apply the framework to an E-business website related to online job seekers in the United Arab Emirates (UAE). The results can be used effectively by decision makers to make crucial decisions in the job market of the UAE.


Author(s):  
Muhammad Abdul Tawab Khalil ◽  
Saifullah Jan ◽  
Wajid Ali ◽  
Adnan Khan

Pregnancy is always physically and emotionally challenging for women. Rapid physical changes as the baby grows in the womb expose the mother to severe mood swings, from short spells of merriment to long spells of anxiety and depression about the upcoming child's health, its wellbeing, and so on. Most third world countries, with their struggling economies, have a patriarchal social fabric, a fact that makes it harder for women in these societies to cope healthily or seek help during gestation. The main goal of the proposed application, MothersCare, is to help expecting mothers when they need it most. It will help them choose the right physician and request appointments from the comfort of their homes, avoiding the cumbersome wait in long queues during rush hours for appointments with doctors at hospitals. The app is user-friendly in terms of its simplicity of use and the wide spectrum of maternal healthcare services it offers.


2013 ◽  
Vol 284-287 ◽  
pp. 3507-3511 ◽  
Author(s):  
Edgar Chia Han Lin

Owing to the great progress of computer technology and the mature development of networks, more and more data, called data streams, are generated and distributed through the network. Over the last couple of years, a number of researchers have turned their attention to data stream management, which differs from conventional database management. At present, this new type of data management system, called a data stream management system (DSMS), has become one of the most popular research areas in the data engineering field, and many research projects have made great progress in this area. Since current DSMSs do not support queries on sequence data, this project will study the issues related to two types of data. First, we will focus on content filtering on single-attribute streams, such as sensor data. Second, we will focus on multi-attribute streams, such as video films. We will discuss related issues such as how to build an efficient index for all queries of different streams and the corresponding query processing mechanisms.

