Delayed labelling evaluation for data streams

2019 ◽  
Vol 34 (5) ◽  
pp. 1237-1266 ◽  
Author(s):  
Maciej Grzenda ◽  
Heitor Murilo Gomes ◽  
Albert Bifet

Abstract A large portion of the stream mining studies on classification rely on the availability of true labels immediately after making predictions. This approach is well exemplified by the test-then-train evaluation, where predictions immediately precede true label arrival. However, in many real scenarios, labels arrive with non-negligible latency. This raises the question of how to evaluate classifiers trained in such circumstances. This question is of particular importance when stream mining models are expected to refine their predictions between acquiring instance data and receiving its true label. In this work, we propose a novel evaluation methodology for data streams when verification latency takes place, namely continuous re-evaluation. It is applied to reference data streams and it is used to differentiate between stream mining techniques in terms of their ability to refine predictions based on newly arriving instances. Our study points out, discusses and shows empirically the importance of considering the delay of instance labels when evaluating classifiers for data streams.
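The contrast between test-then-train and delayed labelling can be illustrated with a minimal sketch. This is not the authors' continuous re-evaluation procedure: it only compares, for each instance, the prediction made on arrival with the prediction made just before its label arrives. The `MajorityClass` model, the `evaluate_with_delay` function, and the fixed `delay` measured in instances are all assumptions made for illustration.

```python
from collections import Counter, deque


class MajorityClass:
    """Toy incremental classifier (hypothetical): predicts the most frequent label so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        # max() returns the first maximal key, so ties resolve to the label seen first.
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn(self, x, y):
        self.counts[y] += 1


def evaluate_with_delay(stream, model, delay):
    """Return (first_acc, last_acc) for a stream of (x, y) pairs.

    first_acc: accuracy of predictions made when each instance arrives.
    last_acc:  accuracy of predictions made just before each label arrives.
    The label of an instance arrives after `delay` further instances;
    delay=0 reduces to plain test-then-train.
    """
    pending = deque()  # (x, y, first_prediction) for instances still awaiting labels
    first_hits = last_hits = n = 0

    def release():
        nonlocal first_hits, last_hits, n
        x, y, first = pending.popleft()
        last = model.predict(x)  # re-predict with everything learned in the meantime
        first_hits += first == y
        last_hits += last == y
        n += 1
        model.learn(x, y)  # train only once the true label is available

    for x, y in stream:
        pending.append((x, y, model.predict(x)))
        if len(pending) > delay:
            release()
    while pending:  # assumption: every outstanding label eventually arrives
        release()
    return (first_hits / n, last_hits / n) if n else (0.0, 0.0)


stream = [(i, label) for i, label in enumerate("abbbb")]
print(evaluate_with_delay(stream, MajorityClass(), delay=0))
print(evaluate_with_delay(stream, MajorityClass(), delay=2))
```

With no delay the two accuracies coincide. With a delay, the later prediction can benefit from labels that arrived in the meantime, which is the kind of refinement the abstract refers to.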

Author(s):  
Prasanna Lakshmi Kompalli

Data arriving continuously from different sources is referred to as a data stream. Data stream mining is an online learning technique in which each data point must be processed as it arrives and discarded once processing is complete. Advances in technology have made it possible to monitor these data streams in real time, which has created many new challenges for researchers. Such data is fast flowing, large in volume, continuous and growing, and its characteristics may change over time, a phenomenon termed concept drift. This chapter addresses the problems in mining data streams with concept drift. As a result, isolating the relevant literature is a grueling task for researchers and practitioners. This chapter tries to provide a solution, as it is an amalgamation of all techniques used for data stream mining with concept drift.


Author(s):  
Prasanna Lakshmi Kompalli

In recent years, advances in technology have made it possible for most present-day organizations to store and record large streams of data. Such data sets, which grow continuously and rapidly over time, are referred to as data streams. Mining such data streams is a unique opportunity and also a challenging task. Data stream mining is the process of gaining knowledge from continuous and rapid records of data. With the increase in streaming information, data stream mining has attracted the research community in the recent past. A voluminous literature has been published in this domain over the past few years. Because of this, isolating the relevant studies is a grueling task for researchers and practitioners. When addressing a real-world problem, it is difficult to find relevant information, as it is hidden in data streams. This chapter tries to provide a solution, as it is an amalgamation of all techniques used for data stream mining.


2020 ◽  
Author(s):  
Yuhao Zhao

Abstract With the advancement of network technology and large-scale computing, distributed data streams have been widely used in financial risk analysis. However, while data mining reveals financial models, it also increasingly poses a threat to privacy. Preventing privacy leakage while mining efficiently therefore poses new challenges for data mining technology. This article addresses privacy leakage in financial data mining, drawing on existing data mining technology to study data mining together with privacy protection. First, a data mining model for dual privacy protection is defined, which better matches the characteristics of distributed data streams while achieving privacy protection. Second, a privacy-oriented data stream mining algorithm is proposed, which uses random interference technology to protect the original sensitive data. Finally, simulation experiments show that the algorithm is feasible and effective, that it adapts better to the distribution and dynamic characteristics of distributed data streams, and that it achieves better privacy protection while effectively reducing communication load.


Author(s):  
Prasanna Lakshmi Kompalli

In recent years, advances in technology have made it possible for most present-day organizations to store and record large streams of data. Such data sets, which grow continuously and rapidly over time, are referred to as data streams. Mining such data streams is a unique opportunity and also a challenging task. Data stream mining is the process of gaining knowledge from continuous and rapid records of data. With the increase in streaming information, data stream mining has attracted the research community in the recent past. A voluminous literature has been published in this domain over the past few years. Because of this, isolating the relevant literature is a grueling task for researchers and practitioners. When addressing a real-world problem, it is even more difficult to find relevant information, as it is hidden in data streams. This chapter tries to provide a solution, as it is an amalgamation of all techniques used for data stream mining.


SCITECH Nepal ◽  
2019 ◽  
Vol 14 (1) ◽  
pp. 36-43
Author(s):  
Rojina Deuja ◽  
Krishna Bikram Shah

Data stream mining is one of the fields gaining the upper hand over traditional data mining methods. Unbounded volumes of data, termed data streams, are generated by Internet traffic, communication networks, online banking and ATM transactions, and similar sources. The streams are dynamic and ever-shifting and need to be analysed online as they are obtained. Social media is one of the notable sources of such data streams. While social media streaming has received a lot of attention over the past decade, the ever-expanding streams of data present huge challenges for learning and maintaining control. Dealing with data from billions of users, measured in petabytes, is a demanding task in itself. It is indeed a challenge to mine such dynamic data from social networks in an uninterrupted and competent way. This paper introduces social data streams and the mining techniques involved in processing them. We analyse the most recent trends in social media data stream mining to support a detailed study of the matter. We also review innovative implementations of social media stream mining that are currently prevalent.


Author(s):  
Asha P. V. ◽  
Anju M. Sukumar

A data stream is a continuous sequence of data generated from various sources and continuously transferred from source to target. Streaming data needs to be processed without access to all of the data. Sources that generate data streams include social networks, geospatial services, weather monitoring, and e-commerce purchases. Data stream mining is the process of acquiring knowledge structures from continuously arriving data. Clustering is an unsupervised machine learning technique that can be used to extract knowledge patterns from a data stream. Mining streaming data is challenging because the data is huge in volume and arrives continuously, so traditional algorithms are not suitable for it. Data stream mining requires fast algorithms that use a single scan and a limited amount of memory. Micro-clustering plays a useful role here, and density-based micro-clustering has its own distinct place in data stream mining. This paper presents a survey of different data clustering algorithms and highlights the use of density-based micro-clusters.
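The single-scan, limited-memory requirement can be made concrete with a minimal micro-clustering sketch. It is a simplified assumption, not any of the surveyed algorithms: each micro-cluster keeps only a count, a linear sum and a squared sum per dimension, and a point is absorbed by the nearest cluster if it lies within a fixed `radius`. Density-based variants additionally weight or decay clusters over time, which this sketch omits. The names `MicroCluster` and `cluster_stream` are hypothetical.

```python
import math


class MicroCluster:
    """Compact summary of nearby points: count, linear sum and squared sum per dimension."""

    def __init__(self, point):
        self.n = 1
        self.ls = list(point)
        self.ss = [v * v for v in point]

    def center(self):
        return [s / self.n for s in self.ls]

    def spread(self):
        # Root of the summed per-dimension variance, computed from the summaries alone.
        var = sum(ss / self.n - (ls / self.n) ** 2 for ls, ss in zip(self.ls, self.ss))
        return math.sqrt(max(var, 0.0))

    def absorb(self, point):
        self.n += 1
        for i, v in enumerate(point):
            self.ls[i] += v
            self.ss[i] += v * v


def cluster_stream(points, radius):
    """Single pass over the stream; raw points are never stored."""
    clusters = []
    for p in points:
        nearest = min(clusters, key=lambda c: math.dist(c.center(), p), default=None)
        if nearest is not None and math.dist(nearest.center(), p) <= radius:
            nearest.absorb(p)
        else:
            clusters.append(MicroCluster(p))
    return clusters


points = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (0, 0.1)]
for c in cluster_stream(points, radius=1.0):
    print(c.n, c.center())
```

Memory grows with the number of micro-clusters rather than the number of points. A practical algorithm would also cap or prune the cluster list, which is left out here.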


2017 ◽  
Vol 01 (01) ◽  
pp. 1630011
Author(s):  
Cem Tekin ◽  
Mihaela van der Schaar

As the world becomes more connected and instrumented, high-dimensional, heterogeneous and time-varying data streams are collected and need to be analyzed on the fly to extract actionable intelligence and make timely decisions based on this knowledge. This requires that appropriate classifiers are invoked to process the incoming streams and find the relevant knowledge. Thus, a key challenge becomes choosing online, at run-time, which classifier should be deployed to make the best possible predictions on the incoming streams. In this paper, we survey a class of methods capable of performing online learning in stream-based semantic computing tasks: multi-armed bandits (MABs). Adopting MABs for stream mining poses numerous new challenges and requires many new innovations. Most importantly, the MABs will need to explicitly consider and track online the time-varying characteristics of the data streams, and to learn quickly which information is relevant in the vast, heterogeneous and possibly high-dimensional data streams. In this paper, we discuss contextual MAB methods, which use similarities in context (meta-data) information to make decisions, and discuss their advantages when applied to stream mining for semantic computing. These methods can be adapted to discover in real time the relevant contexts guiding the stream mining decisions, and to track the best classifier in the presence of concept drift. Moreover, we also discuss how stream mining of multiple data sources can be performed by deploying cooperative MAB solutions and ensemble learning. We conclude the paper by discussing the numerous other advantages of MABs that will benefit semantic computing applications.
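The idea of choosing a classifier online can be sketched with a plain epsilon-greedy bandit. This is a deliberately simpler stand-in for the contextual MAB methods the paper surveys: it ignores context entirely, and the constant step size used to follow drifting accuracy is an assumption of this sketch. The class name `EpsilonGreedySelector` and its parameters are hypothetical.

```python
import random


class EpsilonGreedySelector:
    """Picks one of n classifiers per instance; each classifier is one bandit arm."""

    def __init__(self, n_arms, epsilon=0.1, step=0.1, seed=0):
        self.values = [0.0] * n_arms  # running estimate of each arm's recent accuracy
        self.epsilon = epsilon        # probability of exploring a random arm
        self.step = step              # constant step size, so old rewards fade under drift
        self.rng = random.Random(seed)

    def select(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        # Exploit: arm with the highest estimate (lowest index wins ties).
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Exponential recency-weighted average of observed rewards.
        self.values[arm] += self.step * (reward - self.values[arm])


# Two toy "classifiers"; the stream's labels switch from 1 to 0 halfway (concept drift).
classifiers = [lambda x: 0, lambda x: 1]
labels = [1] * 200 + [0] * 200
selector = EpsilonGreedySelector(n_arms=2, epsilon=0.1)
for t, y in enumerate(labels):
    arm = selector.select()
    reward = 1.0 if classifiers[arm](t) == y else 0.0
    selector.update(arm, reward)
print(selector.values)
```

Because the estimates are recency-weighted, an arm that was best before the drift loses its advantage once its rewards drop, and exploration gives the other arm a chance to take over.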

