Recurring concept memory management in data streams: exploiting data stream concept evolution to improve performance and transparency

Author(s):  
Ben Halstead ◽  
Yun Sing Koh ◽  
Patricia Riddle ◽  
Russel Pears ◽  
Mykola Pechenizkiy ◽  
...  
Author(s):  
Mohammad G. Dezfuli ◽  
Mostafa S. Haghjoo

The inherent imprecision of data in many applications motivates us to support uncertainty as a first-class concept. Data streams and probabilistic data have recently received considerable attention, but largely in isolation. However, many applications, including sensor data management systems and object monitoring systems, need both in tandem. Our main contribution is the design of a probabilistic data stream management system, called Sarcheshmeh, for continuous querying over probabilistic data streams. Sarcheshmeh supports uncertainty from input data to final query results. In this paper, after reviewing the requirements and applications of probabilistic data streams, we present our new data model for probabilistic data streams and formally define our main logical operators. We then present our query language and physical operators. In addition, we introduce the architecture of Sarcheshmeh and describe major challenges, such as memory management and our floating precision mechanism, toward designing a more robust system. Finally, we report an evaluation of our system and the effect of floating precision on the tradeoff between accuracy and efficiency.
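
A minimal sketch of the kind of mechanism the abstract calls "floating precision", under assumed details: each tuple carries a discrete distribution over alternative values, and a precision threshold prunes unlikely alternatives to trade accuracy for efficiency. The tuple format and pruning rule are illustrative assumptions, not Sarcheshmeh's actual design.

```python
from dataclasses import dataclass

@dataclass
class ProbabilisticTuple:
    """A stream tuple whose value is a discrete distribution over alternatives."""
    alternatives: dict  # value -> probability, summing to 1

def prune_precision(t: ProbabilisticTuple, epsilon: float) -> ProbabilisticTuple:
    """Drop alternatives with probability below epsilon and renormalize.

    A coarser epsilon shrinks per-tuple state (cheaper to process) at the
    cost of accuracy, which is the tradeoff the abstract describes.
    """
    kept = {v: p for v, p in t.alternatives.items() if p >= epsilon}
    if not kept:  # if everything was pruned, keep the most likely alternative
        v, p = max(t.alternatives.items(), key=lambda vp: vp[1])
        kept = {v: p}
    total = sum(kept.values())
    return ProbabilisticTuple({v: p / total for v, p in kept.items()})

# Example: an uncertain sensor reading with three alternative values
reading = ProbabilisticTuple({20.1: 0.60, 20.5: 0.35, 31.0: 0.05})
print(prune_precision(reading, epsilon=0.10).alternatives)
```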


2020 ◽  
Vol 2 (1) ◽  
pp. 26-37
Author(s):  
Dr. Pasumponpandian

The rapid progress of the Internet of Things (IoT), together with the simultaneous development of network technologies and processing capabilities, has paved the way for decentralized systems that rely on cloud services. Although these decentralized systems are founded on the cloud, complexities still prevail in transferring all the information sensed by IoT devices to the cloud. This is because certain applications gather huge streams of information yet expect a timely response, with minimal delay, low computing energy, and enhanced reliability. This kind of decentralization has led to the development of a middle layer between the cloud and the IoT, termed the edge layer, which brings cloud services down to the user's edge. This paper analyzes data stream processing in the edge layer, taking into account the complexities involved in computing IoT data streams there, and puts forth real-time analytics in the edge layer that examine IoT data streams to offer data-driven insight for a parking system in smart cities.
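
As a concrete illustration of the edge layer's role, the sketch below reduces a raw stream of parking-sensor events to compact per-window occupancy summaries, so only aggregates travel to the cloud. The (spot_id, occupied) event format and the window size are hypothetical, not taken from the paper.

```python
import random

def edge_summarize(events, window=100):
    """Aggregate raw parking events at the edge: keep only the latest status
    per spot and emit one compact summary per window, so the cloud receives
    aggregates rather than every sensor reading."""
    latest = {}
    for i, (spot_id, occupied) in enumerate(events, 1):
        latest[spot_id] = occupied
        if i % window == 0:
            free = sum(1 for occ in latest.values() if not occ)
            yield {"spots_seen": len(latest), "free": free}

# Example: 300 simulated sensor events become just 3 upstream summaries
events = ((f"spot-{random.randrange(50)}", random.random() < 0.7) for _ in range(300))
for summary in edge_summarize(events, window=100):
    print(summary)
```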


Author(s):  
Prasanna Lakshmi Kompalli

Data coming from different sources is referred to as a data stream. Data stream mining is an online learning technique in which each data point must be processed as it arrives and discarded once processing is complete. Advances in technology have made it possible to monitor these data streams in real time, which has created many new challenges for researchers. The main features of this type of data are that it is fast flowing, large in volume, continuous and growing in nature, and that its characteristics may change over time, a phenomenon termed concept drift. This chapter addresses the problems in mining data streams with concept drift. Because the relevant literature is scattered, isolating it can be a grueling task for researchers and practitioners; this chapter tries to provide a solution by amalgamating the techniques used for data stream mining under concept drift.
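
Drift detection is a recurring ingredient of the techniques such a chapter surveys. The sketch below is a simplified, DDM-style heuristic, not any specific method from the chapter: it flags drift when an online learner's recent error rate rises well above its long-run error rate.

```python
from collections import deque

class WindowDriftDetector:
    """Flag concept drift when the recent error rate rises well above the
    long-run error rate (a simplified, DDM-style heuristic)."""

    def __init__(self, window=100, threshold=0.15):
        self.recent = deque(maxlen=window)  # last `window` 0/1 error flags
        self.errors = 0                     # errors since the beginning
        self.n = 0                          # examples since the beginning
        self.threshold = threshold

    def update(self, correct: bool) -> bool:
        """Record one prediction outcome; return True if drift is suspected."""
        err = 0 if correct else 1
        self.recent.append(err)
        self.errors += err
        self.n += 1
        long_run = self.errors / self.n
        recent = sum(self.recent) / len(self.recent)
        return recent - long_run > self.threshold

# Typical use: feed it whether the online model predicted each label correctly;
# on True, retrain or replace the model.
```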


Author(s):  
Rodrigo Salvador Monteiro ◽  
Geraldo Zimbrão ◽  
Holger Schwarz ◽  
Bernhard Mitschang ◽  
Jano Moreira de Souza

Calendar-based pattern mining aims at identifying patterns on specific calendar partitions. Potential calendar partitions are, for example: every Monday, every first working day of each month, every holiday. Providing flexible mining capabilities for calendar-based partitions is especially challenging in a data stream scenario: the calendar partitions of interest are not known a priori, and at each point in time only a subset of the detailed data is available. The authors show how a data warehouse approach can be applied to this problem. The data warehouse that keeps track of frequent itemsets holding on different partitions of the original stream has low storage requirements; nevertheless, it allows complete and precise sets of patterns to be derived. Furthermore, the authors demonstrate the effectiveness of their approach by a series of experiments.
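
A rough sketch of the general idea under assumed details: each transaction timestamp is mapped to the calendar partitions it belongs to, and a small warehouse keeps per-partition itemset counts from which frequent itemsets on any partition can later be derived. The partition rules and the two-item cap are illustrative, not the authors' scheme.

```python
from collections import Counter, defaultdict
from datetime import datetime
from itertools import combinations

def calendar_partitions(ts: datetime):
    """Map a transaction timestamp to the calendar partitions it belongs to."""
    parts = [f"weekday:{ts.strftime('%A')}", f"month:{ts.month:02d}"]
    if ts.weekday() < 5 and ts.day <= 3:  # crude proxy for 'first working day'
        parts.append("first-working-days")
    return parts

warehouse = defaultdict(Counter)  # partition -> itemset -> frequency

def record_transaction(ts: datetime, items: set, max_size: int = 2):
    """Update itemset counts on every partition the transaction falls into."""
    for part in calendar_partitions(ts):
        for k in range(1, max_size + 1):
            for itemset in combinations(sorted(items), k):
                warehouse[part][itemset] += 1

record_transaction(datetime(2024, 3, 4), {"bread", "milk"})  # a Monday
print(warehouse["weekday:Monday"].most_common(3))
```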


2013 ◽  
Vol 284-287 ◽  
pp. 3507-3511 ◽  
Author(s):  
Edgar Chia Han Lin

Due to the great progress of computer technology and the mature development of networks, more and more data are generated and distributed through networks in the form of data streams. Over the last couple of years, a number of researchers have turned their attention to data stream management, which differs from conventional database management. The resulting type of data management system, called a data stream management system (DSMS), has become one of the most popular research areas in the data engineering field, and many research projects have made great progress in this area. Since current DSMSs do not support queries on sequence data, this project studies issues related to two types of data. First, we focus on content filtering over single-attribute streams, such as sensor data. Second, we focus on multi-attribute streams, such as video films. We discuss related issues such as how to build an efficient index for all queries over different streams and the corresponding query processing mechanisms.
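
For the single-attribute case, content filtering amounts to matching each arriving value against many standing range queries at once. Below is a minimal sketch of such a query index, with a hypothetical API rather than the project's actual design: query intervals are kept sorted by lower bound, so an arriving value is only checked against queries whose lower bound it reaches.

```python
import bisect

class RangeQueryIndex:
    """Index standing range queries over a single-attribute stream so each
    arriving value is matched against them without a full linear scan."""

    def __init__(self):
        self.lows, self.highs, self.ids = [], [], []  # kept sorted by low

    def register(self, qid, low, high):
        i = bisect.bisect(self.lows, low)
        self.lows.insert(i, low)
        self.highs.insert(i, high)
        self.ids.insert(i, qid)

    def match(self, value):
        # only queries with low <= value can match; they form a prefix
        end = bisect.bisect_right(self.lows, value)
        return [self.ids[i] for i in range(end) if value <= self.highs[i]]

idx = RangeQueryIndex()
idx.register("overheat", 30.0, 100.0)
idx.register("comfort", 18.0, 24.0)
print(idx.match(21.5))  # ['comfort']
```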


2012 ◽  
Vol 433-440 ◽  
pp. 4457-4462 ◽  
Author(s):  
Jun Shan Tan ◽  
Zhu Fang Kuang ◽  
Guo Gui Yang

The design of a synopsis structure is an important issue in frequent pattern mining over data streams. This paper proposes a data stream synopsis structure, FPD-Graph, based on a directed graph. The FPD-Graph contains list head nodes (FPDG-Head) and list nodes (FPDG-Node), and its operations consist of insertion and deletion. The paper also proposes DGFPM, a frequent pattern mining algorithm based on a sliding window over the data stream. Synthetic customer shopping data produced by the IBM data generator is adopted as the experimental data. The DGFPM algorithm not only achieves high precision in mining frequent patterns but also has low processing time.
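
The sketch below mirrors the two FPD-Graph operations at a high level over a sliding window: itemset counts are incremented on insertion and decremented when a transaction expires. It uses a plain counter instead of the paper's directed-graph structure and caps itemset size for brevity.

```python
from collections import Counter, deque
from itertools import combinations

class SlidingWindowFPM:
    """Sliding-window frequent-pattern counting with explicit insert and
    delete operations (a counter-based stand-in for the FPD-Graph synopsis)."""

    def __init__(self, window_size, max_size=2):
        self.window = deque()
        self.counts = Counter()
        self.window_size = window_size
        self.max_size = max_size  # largest itemset size tracked

    def _itemsets(self, txn):
        for k in range(1, self.max_size + 1):
            yield from combinations(sorted(txn), k)

    def insert(self, txn):
        self.window.append(txn)
        for s in self._itemsets(txn):
            self.counts[s] += 1
        if len(self.window) > self.window_size:
            self.delete(self.window.popleft())

    def delete(self, txn):
        for s in self._itemsets(txn):
            self.counts[s] -= 1
            if self.counts[s] == 0:
                del self.counts[s]

    def frequent(self, min_support):
        return {s: c for s, c in self.counts.items() if c >= min_support}

fpm = SlidingWindowFPM(window_size=3)
for txn in [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"b", "c"}]:
    fpm.insert(txn)
print(fpm.frequent(min_support=2))  # singletons a, b, c each appear twice
```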


Author(s):  
Ronald Stevens ◽  
Trysha Galloway ◽  
Ann Willemson-Dunlap

The information within the neurodynamic data streams of teams engaged in naturalistic decision making was separated into information unique to each team member, information shared by two or more team members, and team-specific information related to interactions with the task and other team members. Most of the team information consisted of the information contained in an individual's neurodynamic data stream. The information in an individual's data stream that was shared with another team member was highly variable, ranging from 1% to 60% of the total information in the other person's data stream. From the shared, individual, and team information it becomes possible to assign quantitative values both to the neurodynamics of each team member during the task and to the interactions among the members of the team.
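
The pairwise "shared information" between two team members' data streams can be read as the mutual information of their discretized signals. The sketch below estimates it from empirical joint counts; it covers only this pairwise term, not the full unique/shared/team-specific decomposition the study works with.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Estimate the information (in bits) that one discretized data stream
    shares with another, from their empirical joint distribution."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

# Two symbolized neurodynamic streams; higher values mean more shared information
xs = [0, 0, 1, 1, 0, 1, 0, 1]
ys = [0, 0, 1, 1, 0, 1, 1, 0]
print(f"{mutual_information(xs, ys):.3f} bits shared")
```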


2020 ◽  
Vol 8 (4) ◽  
pp. 63-73
Author(s):  
Sikha Bagui ◽  
Katie Jin

This survey performs a thorough enumeration and analysis of existing methods for data stream processing, framed around the challenges facing streaming data: preprocessing of streaming data, detecting and dealing with concept drift, data reduction in the face of data streams, and approximate queries and blocking operations over streaming data.
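
As one concrete instance of the data-reduction challenge the survey covers, reservoir sampling keeps a fixed-size uniform random sample of an unbounded stream. A minimal sketch of the classic Algorithm R, not tied to any particular method in the survey:

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Maintain a uniform random sample of k items from a stream of unknown
    and possibly unbounded length (classic Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)    # fill the reservoir first
        else:
            j = rng.randint(0, i)  # keep item with probability k / (i + 1)
            if j < k:
                sample[j] = item
    return sample

print(reservoir_sample(range(1_000_000), k=5, seed=42))
```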


2016 ◽  
Vol 2 (1) ◽  
pp. 39-52
Author(s):  
G. Holmes ◽  
B. Pfahringer ◽  
R. Kirkby

We present an architecture for data streams based on structures typically found in web cache hierarchies. The main idea is to build a meta-level analyser from a number of levels constructed over time from a data stream. We present the general architecture for such a system and an application to classification. This architecture is an instance of the general wrapper idea, allowing us to reuse standard batch learning algorithms in an inherently incremental learning environment. By artificially generating data sources, we demonstrate that a hierarchy containing a mixture of models is able to adapt over time to the source of the data. In these experiments the hierarchies use an elementary performance-based replacement policy and unweighted voting for making classification decisions.
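
A minimal sketch of the wrapper idea under assumed interfaces: batch models are trained on successive chunks of the stream, re-scored on each fresh chunk, evicted worst-first when the hierarchy is full, and combined by unweighted voting. The train_fn hook and the scikit-learn-style predict call are assumptions, not the authors' implementation.

```python
from collections import Counter

class ModelHierarchy:
    """A cache-like hierarchy of batch-trained models with a performance-based
    replacement policy and unweighted voting for classification."""

    def __init__(self, capacity, train_fn):
        self.capacity = capacity
        self.train_fn = train_fn  # batch learner: (X, y) -> fitted model
        self.models = []          # entries of [model, latest accuracy]

    def add_chunk(self, X, y):
        # re-score existing models on the fresh chunk (performance-based policy)
        for entry in self.models:
            model = entry[0]
            entry[1] = sum(model.predict([x])[0] == t for x, t in zip(X, y)) / len(y)
        self.models.append([self.train_fn(X, y), 1.0])  # new model, untested score
        if len(self.models) > self.capacity:
            self.models.remove(min(self.models, key=lambda e: e[1]))  # evict worst

    def predict(self, x):
        # unweighted vote across every model currently in the hierarchy
        votes = Counter(entry[0].predict([x])[0] for entry in self.models)
        return votes.most_common(1)[0][0]
```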

