Sentiment Classification over Opinionated Data Streams Through Informed Model Adaptation

Author(s):  
Vasileios Iosifidis ◽  
Annina Oelschlager ◽  
Eirini Ntoutsi
Author(s):  
Sahil Garg ◽  
Irina Rish ◽  
Guillermo Cecchi ◽  
Aurelie Lozano

We address the problem of online model adaptation when learning representations from non-stationary data streams. Specifically, we focus on online dictionary learning (i.e., a sparse linear autoencoder) and propose a simple but effective online model selection approach involving “birth” (addition) and “death” (removal) of hidden units representing dictionary elements, in response to changing inputs. We draw inspiration from the adult neurogenesis phenomenon in the dentate gyrus of the hippocampus, which is known to be associated with better adaptation to new environments. Empirical evaluation on real-life datasets (images and text), as well as on synthetic data, demonstrates that the proposed approach can considerably outperform the state-of-the-art non-adaptive online sparse coding of [Mairal et al., 2009] in the presence of non-stationary data. Moreover, we identify certain data and model properties associated with such improvements.
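The "birth" and "death" idea in the abstract above can be illustrated with a small NumPy sketch. This is not the authors' algorithm: the ISTA sparse coder, the relative-error trigger for birth, the usage-based criterion for death, and every name and threshold (`birth_death_step`, `err_threshold`, `n_new`, `usage_eps`) are assumptions chosen only to make the mechanism concrete.

```python
import numpy as np

def sparse_codes(X, D, lam=0.1, n_iter=50):
    """ISTA for min_A 0.5*||X - D A||_F^2 + lam*||A||_1, with X (m, n) and D (m, k)."""
    L = np.linalg.norm(D, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_iter):
        Z = A - D.T @ (D @ A - X) / L                            # gradient step
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)    # soft threshold
    return A

def birth_death_step(X, D, rng, err_threshold=0.5, n_new=2, usage_eps=1e-8, lam=0.1):
    """One hypothetical adaptation step on a batch X; returns (new_D, relative_error)."""
    A = sparse_codes(X, D, lam)
    rel_err = np.linalg.norm(X - D @ A) / (np.linalg.norm(X) + 1e-12)

    # "Death" (assumed criterion): drop atoms that no sample in the batch used.
    used = np.abs(A).sum(axis=1) > usage_eps
    if used.any():  # never empty the dictionary completely
        D = D[:, used]

    # "Birth" (assumed criterion): add random unit-norm atoms when the batch
    # is poorly reconstructed by the current dictionary.
    if rel_err > err_threshold:
        new_atoms = rng.standard_normal((D.shape[0], n_new))
        new_atoms /= np.linalg.norm(new_atoms, axis=0, keepdims=True)
        D = np.hstack([D, new_atoms])
    return D, rel_err
```

In a full online learner, each such step would be followed by a dictionary update on the batch; that update is omitted here to keep the sketch focused on how the dictionary grows and shrinks.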


2021 ◽  
Vol 10 (6) ◽  
pp. 3361-3368
Author(s):  
Ibnu Daqiqil Id ◽  
Pardomuan Robinson Sihombing ◽  
Supratman Zakir

When predicting data streams, changes in data distribution may decrease model accuracy over time, thereby making the model obsolete. This phenomenon is known as concept drift. Detecting concept drifts and then adapting to them are critical operations for maintaining model performance. However, model adaptation is only possible if labeled data is available. Labeling data is both costly and time-consuming because it has to be done by humans, and only part of a data stream can be labeled because the data is massive and arrives at high speed. To address these problems together, we apply a technique that updates the model using both labeled and unlabeled instances. The experimental results show that our proposed method can adapt to concept drift with pseudo-labels and maintain its accuracy even when label availability is drastically reduced from 95% to 5%. The proposed method also has the highest overall accuracy and outperforms other methods in 5 of 10 datasets.
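Updating a stream model from both labeled and unlabeled instances, as the abstract above describes, might look like the following sketch. The abstract does not specify the classifier or the acceptance rule, so everything here is an assumption made for illustration: a nearest-centroid model, a distance-margin confidence test, exponential forgetting, and the names `PseudoLabelCentroids`, `margin`, and `decay`. Drift detection itself is not shown.

```python
import numpy as np

class PseudoLabelCentroids:
    """Hypothetical nearest-centroid stream classifier with pseudo-label updates."""

    def __init__(self, n_classes, n_features, margin=0.5, decay=0.9):
        self.sums = np.zeros((n_classes, n_features))
        self.counts = np.zeros(n_classes)
        self.margin = margin  # minimum distance gap needed to trust a prediction
        self.decay = decay    # forgetting factor so centroids can follow drift

    def _centroids(self):
        safe = np.where(self.counts > 0, self.counts, 1.0)
        return self.sums / safe[:, None]

    def predict_with_margin(self, x):
        x = np.asarray(x, dtype=float)
        d = np.linalg.norm(self._centroids() - x, axis=1)
        d[self.counts == 0] = np.inf  # classes that were never observed
        order = np.argsort(d)
        return int(order[0]), float(d[order[1]] - d[order[0]])

    def update(self, x, y=None):
        """Update with a true label, or with a pseudo-label when y is None.

        Returns the label that was used, or None if the instance was skipped.
        """
        x = np.asarray(x, dtype=float)
        if y is None:
            if (self.counts > 0).sum() < 2:
                return None  # not enough classes seen to judge confidence
            y, gap = self.predict_with_margin(x)
            if gap < self.margin:
                return None  # prediction too uncertain to use as a label
        self.sums[y] = self.decay * self.sums[y] + x
        self.counts[y] = self.decay * self.counts[y] + 1.0
        return int(y)
```

Calling `update(x, y)` for the few labeled instances and `update(x)` for the rest lets confident predictions stand in for missing labels, while uncertain instances are skipped rather than risk reinforcing an error.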


Author(s):  
LAKSHMI PRANEETHA

Nowadays, data streams (or information streams) are massive and change quickly. Their uses range from basic scientific applications to critical business and financial ones. In the online phase, useful information is extracted from the stream and represented in the form of micro-clusters. In the offline phase, micro-clusters are merged to form macro-clusters. The DBSTREAM technique captures the density between micro-clusters by means of a shared density graph in the online phase. The density data in this graph is then used in reclustering to improve the formation of clusters, but DBSTREAM takes more time when handling corrupted data points. In this paper, an early pruning algorithm is applied before pre-processing, and a Bloom filter is used to recognize corrupted information. Our experiments on real-time datasets show that this approach improves the efficiency of macro-clusters by 90% and generates more micro-clusters within a short time.
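The abstract above does not say how the Bloom filter is built or how corrupted records are identified. As a rough sketch under stated assumptions, the filter below stores string signatures of records already known to be corrupted and prunes matching records before clustering. The class, the `prune_stream` helper, the SHA-256 based hashing, and the sizes are all hypothetical.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter over strings (illustrative sizes, not tuned)."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        data = item.encode("utf-8")
        for i in range(self.n_hashes):
            # Salt the hash with the index to derive several bit positions.
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def prune_stream(records, corrupted):
    """Drop records whose signature the filter reports as corrupted."""
    return [r for r in records if r not in corrupted]
```

A Bloom filter never misses an item that was added, but it can occasionally flag a clean record as corrupted, so in this setup a small fraction of valid points may be pruned in exchange for constant memory and fast lookups.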


2018 ◽  
Vol 4 (26) ◽  
pp. 5534-5538
Author(s):  
Semra AKTAŞ POLAT
