Clustering over uncertain data stream

Author(s):  
Li Tu ◽  
Peng Cui
2019 ◽  
Vol 1189 ◽  
pp. 012025
Author(s):  
A Makhmutova ◽  
I Anikin

2016 ◽  
Vol 9 (9) ◽  
pp. 83-96
Author(s):  
Tang Xianghong ◽  
Yang Quanwei ◽  
Zheng Yang

2016 ◽  
Vol 13 (10) ◽  
pp. 7519-7525
Author(s):  
Zhang Xing ◽  
Wang MeiLi ◽  
Zhang Yang ◽  
Ning Jifeng

To build a classifier for uncertain data streams, an Ensemble of Uncertain Decision Tree Algorithm (EDTU) is proposed. First, the decision tree algorithm for uncertain data (DTU) was improved by changing how its information gain is calculated and by increasing its efficiency, so that it can keep up with high-speed data streams. Then, using this base classifier, a dynamic classifier ensemble algorithm was applied, and the classifiers that classified effectively were selected to form the ensemble. Experimental results on the SEA and Forest Covertype datasets demonstrate that the proposed EDTU algorithm is efficient at classifying data streams with uncertain attributes, and that its performance is stable under different parameter settings.
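The abstract does not say how EDTU changes the information gain calculation. As a point of reference only, the sketch below shows the common fractional-tuple approach used by decision trees for uncertain data: each uncertain numeric attribute is assumed to be a list of (value, probability) samples, a split sends each sample's probability mass to the left or right branch, and entropy is computed over these weighted class counts. The tuple representation and function names are hypothetical, and this is not the authors' modified formula.

```python
import math
from collections import defaultdict

# Hypothetical representation: an uncertain tuple is (samples, label), where
# samples is a list of (value, probability) pairs whose probabilities sum to 1.

def entropy(class_weights):
    """Entropy (bits) of a class distribution given fractional counts."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    h = 0.0
    for w in class_weights.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h

def split_weights(tuples, threshold):
    """Send each sample's probability mass left (value <= threshold) or right."""
    left, right = defaultdict(float), defaultdict(float)
    for samples, label in tuples:
        for value, prob in samples:
            if value <= threshold:
                left[label] += prob
            else:
                right[label] += prob
    return left, right

def information_gain(tuples, threshold):
    """Parent entropy minus the weighted entropy of the two fractional branches."""
    parent = defaultdict(float)
    for samples, label in tuples:
        parent[label] += sum(p for _, p in samples)
    n = sum(parent.values())
    if n == 0:
        return 0.0
    left, right = split_weights(tuples, threshold)
    nl, nr = sum(left.values()), sum(right.values())
    children = (nl / n) * entropy(left) + (nr / n) * entropy(right)
    return entropy(parent) - children

# Usage: one tuple straddles the threshold, so the split is no longer pure.
data = [
    ([(1.0, 1.0)], "a"),
    ([(2.0, 1.0)], "a"),
    ([(2.5, 0.5), (3.5, 0.5)], "b"),  # uncertain value: half its mass falls left
    ([(6.0, 1.0)], "b"),
]
print(information_gain(data, 3.0))
```

Because a straddling tuple contributes to both branches, the gain of a threshold degrades smoothly with uncertainty instead of flipping between pure and impure splits.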


2021 ◽  
Vol 16 ◽  
pp. 261-269
Author(s):  
Raja Azhan Syah Raja Wahab ◽  
Siti Nurulain Mohd Rum ◽  
Hamidah Ibrahim ◽  
Fatimah Sidi ◽  
Iskandar Ishak

A data stream is a sequence of data generated at successive times from different sources. Processing such data is important in many contemporary applications, such as sensor networks, RFID technology, and mobile computing. The huge amount of data generated, and its frequent changes within short periods, make conventional processing methods insufficient. The Sliding Window Model (SWM) was introduced by Datar et al. to handle this problem. Its main challenges are avoiding multiple scans of the whole data set, optimizing memory usage, and processing only the most recent tuples. In uncertain data, the number of possible world instances grows exponentially, which makes it very difficult to answer Top-k queries in the shortest possible time. Building on the rule generation and probability theory of this model, a framework was proposed to sustain a top-k processing algorithm over the SWM approach until the candidates expire. Based on the literature review, no existing work has tackled the issues arising from top-k query processing over the possible world instances of uncertain data streams within the SWM. These issues need to be addressed, especially computational redundancy, which increases computational cost within the SWM. Therefore, the main objective of this research is to propose top-k query processing methods over uncertain data streams in the SWM using the score and the Possible World (PW) setting. In this study, a novel expiration and object indexing method is introduced to address the computational redundancy issue. We believe the proposed method can reduce computational costs by managing insertion and exit policies for the right tuple candidates within a specified window frame. This research will contribute to the area of computational query processing.
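To make the possible-world setting concrete, the sketch below computes, for every tuple in a count-based sliding window, the probability that it ranks among the top-k tuples across all possible worlds. It assumes tuple independence (each tuple has a score and an existence probability) and uses the well-known probabilistic-threshold top-k (PT-k) dynamic program rather than the authors' expiration and indexing method; the class and parameter names are hypothetical. The dynamic program avoids enumerating the exponentially many worlds by tracking only how many higher-scored tuples exist.

```python
from collections import deque

def topk_probabilities(tuples, k):
    """Pr(tuple is in the top-k) under tuple independence.

    tuples: list of (tuple_id, score, existence_prob); requires k >= 1.
    """
    ranked = sorted(tuples, key=lambda t: t[1], reverse=True)
    # dist[j] = Pr(exactly j of the higher-scored tuples seen so far exist),
    # truncated at k - 1 because only "fewer than k" matters.
    dist = [1.0] + [0.0] * (k - 1)
    result = {}
    for tid, _score, p in ranked:
        # The tuple is in the top-k if it exists and fewer than k higher ones exist.
        result[tid] = p * sum(dist)
        new = [0.0] * k
        for j in range(k):
            new[j] += dist[j] * (1 - p)      # this tuple is absent
            if j + 1 < k:
                new[j + 1] += dist[j] * p    # this tuple is present
        dist = new
    return result

class SlidingWindowTopK:
    """Hypothetical count-based window: the oldest tuple expires on each new arrival."""

    def __init__(self, size, k, threshold):
        self.window = deque(maxlen=size)  # deque silently drops the expired tuple
        self.k = k
        self.threshold = threshold

    def insert(self, tid, score, prob):
        self.window.append((tid, score, prob))
        # Recomputed from scratch on every arrival: simple, but exactly the
        # kind of redundant computation the abstract aims to eliminate.
        probs = topk_probabilities(list(self.window), self.k)
        return sorted(t for t, pr in probs.items() if pr >= self.threshold)

w = SlidingWindowTopK(size=2, k=1, threshold=0.5)
print(w.insert("a", 10, 0.5))
print(w.insert("b", 8, 1.0))
print(w.insert("c", 5, 0.9))  # "a" expires here
```

Recomputing the whole window on each arrival costs O(wk) per tuple for a window of w tuples; an expiration-aware index, as the abstract proposes, would instead update only the candidates affected by the inserted and expired tuples.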


Author(s):  
Mohammad G. Dezfuli ◽  
Mostafa S. Haghjoo

The inherent imprecision of data in many applications motivates us to support uncertainty as a first-class concept. Data streams and probabilistic data have recently received considerable attention, but largely in isolation. However, many applications, including sensor data management systems and object monitoring systems, need both in tandem. Our main contribution is the design of a probabilistic data stream management system, called Sarcheshmeh, for continuous querying over probabilistic data streams. Sarcheshmeh supports uncertainty from the input data through to the final query results. In this paper, after reviewing the requirements and applications of probabilistic data streams, we present our new data model for probabilistic data streams and formally define our main logical operators. Then, we present the query language and physical operators. In addition, we introduce the architecture of Sarcheshmeh and describe some major challenges, such as memory management, as well as our floating precision mechanism for building a more robust system. Finally, we report an evaluation of our system and the effect of floating precision on the tradeoff between accuracy and efficiency.
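The abstract does not describe Sarcheshmeh's data model, operators, or floating precision mechanism in detail. The sketch below is only a generic illustration of a probabilistic selection operator over a stream, assuming tuple-level existence probabilities and an uncertain sensor reading modeled as a Gaussian. The hypothetical `min_prob` cut-off shows one simple way to trade accuracy for efficiency: low-probability results are dropped so downstream operators do less work. It is not the system's actual mechanism.

```python
import math
from dataclasses import dataclass

@dataclass
class ProbTuple:
    value: dict   # attribute name -> value
    prob: float   # existence probability of this tuple

def prob_above(mean, std, limit):
    """Pr(X > limit) for X ~ Normal(mean, std**2)."""
    return 0.5 * (1 - math.erf((limit - mean) / (std * math.sqrt(2))))

def prob_select(stream, predicate_prob, min_prob=0.0):
    """Probabilistic selection: scale each tuple's existence probability by
    Pr(predicate holds) and drop results below the hypothetical min_prob cut-off."""
    for t in stream:
        p = t.prob * predicate_prob(t.value)
        if p > 0 and p >= min_prob:
            yield ProbTuple(t.value, p)

readings = [
    ProbTuple({"id": "s1", "mean": 30.0, "std": 1.0}, 0.8),
    ProbTuple({"id": "s2", "mean": 25.0, "std": 1.0}, 1.0),
    ProbTuple({"id": "s3", "mean": 35.0, "std": 1.0}, 0.9),
]
hot = prob_select(readings, lambda v: prob_above(v["mean"], v["std"], 30.0), min_prob=0.01)
for t in hot:
    print(t.value["id"], round(t.prob, 4))
```

Raising `min_prob` shrinks the result stream (here, reading `s2` survives only with a vanishingly small probability and is pruned), which illustrates the general accuracy-versus-efficiency tradeoff the abstract evaluates.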


2013 ◽  
Vol 380-384 ◽  
pp. 1529-1532
Author(s):  
Shuang Zhang ◽  
Shi Xiong Zhang

This paper presents P-Stream, an effective clustering algorithm for probabilistic data streams, developed here for the first time. For the uncertain tuples in the data stream, P-Stream introduces the concepts of strong clusters, transitional clusters, and weak clusters. Using these concepts, an effective strategy for choosing candidate clusters is designed, which can find the appropriate cluster for every continuously arriving data point. In this paper, we also systematically define the data space and uncertain data, and propose an updated algorithm for queries on uncertain data based on the Effective Clustering Algorithm.
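The abstract names strong, transitional, and weak clusters without defining them. The sketch below is a hypothetical reading, borrowing the decayed-weight thresholds familiar from density-based stream clustering: a cluster's category depends on its weight, an uncertain point is a mean vector with per-dimension variances, and a point joins the nearest in-range cluster, preferring strong over transitional over weak clusters. The thresholds, names, and update rule are assumptions, not the P-Stream algorithm.

```python
from dataclasses import dataclass

ORDER = {"strong": 0, "transitional": 1, "weak": 2}

@dataclass
class MicroCluster:
    center: list
    weight: float

def expected_sq_distance(mean, var, center):
    """E||X - c||^2 = ||mu - c||^2 + sum(var), assuming independent dimensions."""
    return sum((m - c) ** 2 for m, c in zip(mean, center)) + sum(var)

class PStreamSketch:
    def __init__(self, radius, strong_w, weak_w, decay=0.99):
        self.clusters = []
        self.radius = radius
        self.strong_w = strong_w  # hypothetical: weight >= strong_w -> strong
        self.weak_w = weak_w      # hypothetical: weight < weak_w -> weak
        self.decay = decay        # per-arrival fading factor for old clusters

    def category(self, c):
        if c.weight >= self.strong_w:
            return "strong"
        if c.weight >= self.weak_w:
            return "transitional"
        return "weak"

    def insert(self, mean, var, prob=1.0):
        """Assign an uncertain point (mean, var, existence prob) to a cluster."""
        for c in self.clusters:
            c.weight *= self.decay
        candidates = []
        for c in self.clusters:
            d = expected_sq_distance(mean, var, c.center)
            if d <= self.radius ** 2:
                candidates.append((ORDER[self.category(c)], d, c))
        if candidates:
            # Prefer the strongest category, then the smallest expected distance.
            _, _, best = min(candidates, key=lambda x: (x[0], x[1]))
            total = best.weight + prob
            best.center = [(cc * best.weight + m * prob) / total
                           for cc, m in zip(best.center, mean)]
            best.weight = total
            return best
        new = MicroCluster(center=list(mean), weight=prob)
        self.clusters.append(new)
        return new

s = PStreamSketch(radius=1.0, strong_w=2.0, weak_w=0.5, decay=1.0)
s.insert([0.0, 0.0], [0.0, 0.0])
s.insert([0.5, 0.0], [0.01, 0.01])   # joins the first cluster
s.insert([10.0, 10.0], [0.0, 0.0])   # too far: starts a new cluster
print([(c.center, s.category(c)) for c in s.clusters])
```

Using the expected squared distance means that a point with large variance is less likely to fall inside any cluster's radius, so highly uncertain points tend to seed new, initially weak clusters instead of distorting established ones.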

