Online Clustering Algorithms and Reinforcement Learning

Author(s):  
Wesam Ashour Barbakh ◽  
Ying Wu ◽  
Colin Fyfe
1999 ◽  
Vol 11 (8) ◽  
pp. 1915-1932 ◽  
Author(s):  
Aristidis Likas

A general technique is proposed for embedding online clustering algorithms based on competitive learning in a reinforcement learning framework. The basic idea is that the clustering system can be viewed as a reinforcement learning system that learns through reinforcements to follow the clustering strategy we wish to implement. In this sense, the reinforcement guided competitive learning (RGCL) algorithm is proposed that constitutes a reinforcement-based adaptation of learning vector quantization (LVQ) with enhanced clustering capabilities. In addition, we suggest extensions of RGCL and LVQ that are characterized by the property of sustained exploration and significantly improve the performance of those algorithms, as indicated by experimental tests on well-known data sets.
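As a point of reference for the abstract above, the following is a minimal sketch of plain online competitive learning (winner-take-all prototype updates), the kind of rule that LVQ-style clustering is built on. It does not reproduce RGCL or its reinforcement signal; the function name `online_competitive_learning` and the parameters `lr`, `epochs`, and `seed` are assumptions introduced for illustration only.

```python
import numpy as np

def online_competitive_learning(data, n_units, lr=0.05, epochs=20, seed=0):
    """Illustrative winner-take-all clustering (not the RGCL algorithm itself)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Assumption: prototypes start at randomly chosen samples.
    start = rng.choice(len(data), size=n_units, replace=False)
    weights = data[start].copy()
    for _ in range(epochs):
        # Present the samples one at a time, in a fresh random order.
        for x in data[rng.permutation(len(data))]:
            # The winner is the prototype closest to the current sample.
            winner = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Only the winner moves, a small step toward the sample.
            weights[winner] += lr * (x - weights[winner])
    return weights

# Usage on two synthetic, well-separated blobs (hypothetical data).
rng = np.random.default_rng(1)
blob_a = rng.normal([0.0, 0.0], 0.3, size=(50, 2))
blob_b = rng.normal([10.0, 10.0], 0.3, size=(50, 2))
prototypes = online_competitive_learning(np.vstack([blob_a, blob_b]), n_units=2)
print(prototypes)
```

A reinforcement-based variant would replace the deterministic winner update with one driven by a reward signal; the exact form of that signal is defined in the paper and is not assumed here.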


2011 ◽  
Vol 5 (4) ◽  
pp. 346-353 ◽  
Author(s):  
Gook-Pil Roh ◽  
Seung-Won Hwang

2021 ◽  
Author(s):  
Tiantian Zhang ◽  
Xueqian Wang ◽  
Bin Liang ◽  
Bo Yuan

The powerful learning ability of deep neural networks enables reinforcement learning (RL) agents to learn competent control policies directly from high-dimensional and continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does not hold in the general RL paradigm, where the training data is temporally correlated and non-stationary. This issue may lead to the phenomenon of "catastrophic interference" (a.k.a. "catastrophic forgetting") and a collapse in performance, as later training is likely to overwrite and interfere with previously learned good policies. In this paper, we introduce the concept of "context" into single-task RL and develop a novel scheme, termed Context Division and Knowledge Distillation (CDaKD) driven RL, to divide all states experienced during training into a series of contexts. Its motivation is to mitigate the aforementioned catastrophic interference in deep RL, thereby improving the stability and plasticity of RL models. At the heart of CDaKD is a value function, parameterized by a neural network feature extractor shared across all contexts, and a set of output heads, each specializing in an individual context. In CDaKD, we exploit online clustering to achieve context division, and interference is further alleviated by a knowledge distillation regularization term on the output layers for learned contexts. In addition, to effectively obtain the context division in high-dimensional state spaces (e.g., image inputs), we perform clustering in the lower-dimensional representation space of a randomly initialized convolutional encoder, which is fixed throughout training. Our results show that, with various replay memory capacities, CDaKD can consistently improve the performance of existing RL algorithms on classic OpenAI Gym tasks and the more complex high-dimensional Atari tasks, incurring only moderate computational overhead.


Author(s):  
S.B. Muravyov ◽  
V.A. Efimova ◽  
V.V. Shalamov ◽  
A.A. Filchenkov ◽  
I.B. Smetannikov

2018 ◽  
Vol 6 (1) ◽  
pp. 1-21
Author(s):  
Yesta Medya Mahardhika ◽  
Amang Sudarsono ◽  
Ali Ridho Barakbah

Botnets are a prevalent form of malicious software that can perform malicious activities such as DDoS, spamming, phishing, keylogging, click fraud, and theft of personal information and important data. Botnets can replicate themselves without user consent. Several botnet detection systems have been built using classification methods. Classification methods have high precision, but they require more effort to determine an appropriate classification model. In this paper, we propose a reinforced approach to detecting botnets with online clustering using reinforcement learning. Reinforcement learning involves interaction with the environment and has become a new paradigm in machine learning. The reinforcement learning is implemented together with rule-based detection, because the ISCX botnet dataset is unbalanced, with widely varying numbers of samples per class. We therefore detect botnets using Pursuit Reinforcement Competitive Learning (PRCL) with additional rule detection, which uses reward and punishment rules to reach the solution. Based on the experimental results, PRCL can detect botnets in real time with high accuracy (100% for Neris, 99.9% for Rbot, 78% for SMTP_Spam, 80.9% for Nsis, 80.7% for Virut, and 96.0% for Zeus) and a fast processing time of up to 176 ms. CPU and memory usage are 78% and 4.3 GB for pre-processing, 34% and 3.18 GB for online clustering with PRCL, and 23% and 3.11 GB for evaluation. The proposed method is one solution for network administrators to detect botnets, which have unpredictable behavior in network traffic.


2005 ◽  
Vol 27 (8) ◽  
pp. 1185-1196 ◽  
Author(s):  
Jun Liu ◽  
J.P.Y. Lee ◽  
Lingjie Li ◽  
Zhi-Quan Luo ◽  
K.M. Wong

2008 ◽  
Vol 18 (03) ◽  
pp. 185-194 ◽  
Author(s):  
WESAM BARBAKH ◽  
COLIN FYFE

We introduce a set of clustering algorithms whose performance function is such that the algorithms overcome one of the weaknesses of K-means: its sensitivity to initial conditions, which leads it to converge to a local optimum rather than the global optimum. We derive online learning algorithms and illustrate their convergence to optimal solutions which K-means fails to find. We then extend the algorithm by underpinning it with a latent space which enables a topology-preserving mapping to be found. We show visualisation results on some standard data sets.
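The sensitivity to initial conditions mentioned above can be seen on a tiny example. The sketch below runs standard batch K-means (Lloyd's iterations) from two different initializations on four points; it illustrates the weakness only and is not the algorithm proposed by the authors. The function name `kmeans` and the toy data are assumptions made for illustration.

```python
import numpy as np

def kmeans(data, init_centers, n_iter=100):
    """Plain batch K-means started from the given centers (illustrative)."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each non-empty center to the mean of its members.
        new_centers = centers.copy()
        for k in range(len(centers)):
            members = data[labels == k]
            if len(members) > 0:
                new_centers[k] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute assignments so labels and cost match the final centers.
    dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    sse = float(((data - centers[labels]) ** 2).sum())
    return centers, labels, sse

# Four corners of a 4 x 1 rectangle: the natural split is left vs. right.
points = [[0, 0], [0, 1], [4, 0], [4, 1]]
_, _, sse_good = kmeans(points, [[0, 0.5], [4, 0.5]])
_, _, sse_bad = kmeans(points, [[2, 0], [2, 1]])
print(sse_good)  # 1.0  (left/right split)
print(sse_bad)   # 16.0 (stuck in the top/bottom split, a local optimum)
```

The second initialization is already a fixed point of the K-means updates, so the iterations never leave it even though its cost is sixteen times higher.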


Author(s):  
Omar A. Ibrahim ◽  
Yiqing Wang ◽  
James M. Keller

Online clustering has attracted attention due to the explosion of ubiquitous continuous sensing. Streaming clustering algorithms need to look for new structures and adapt as the data evolves, so that outliers are detected and newly emerging clusters are formed automatically. The performance of a streaming clustering algorithm needs to be monitored over time to understand the behavior of the streaming data in terms of newly emerging clusters and the number of outlier data points. Small datasets with 2 or 3 dimensions can be monitored by plotting the clustering results as the data evolves. However, as the size and dimensionality of streaming data increase, plotting the clustering result becomes infeasible. Therefore, incremental internal cluster validity indices (iCVIs) could be applied for monitoring the performance of an online clustering algorithm. In this paper, we study the internal incremental Davies-Bouldin (iDB) cluster validity index in the context of big streaming data analysis. We also study the effect of a large number of samples on the values of the iCVI (iDB). Finally, we propose a way to project streaming data into a lower-dimensional space for cases where the distance measure does not perform as expected in the high-dimensional space.
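For readers unfamiliar with the index named above, here is a minimal sketch of the standard batch Davies-Bouldin index, which averages, over clusters, the worst ratio of within-cluster scatter to between-centroid separation (lower is better). The incremental iDB variant studied in the paper is not reproduced; the function name `davies_bouldin` and the toy data are assumptions, and the sketch presumes at least two clusters with distinct centroids.

```python
import numpy as np

def davies_bouldin(data, labels):
    """Batch Davies-Bouldin index (illustrative, not the incremental iDB)."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([data[labels == k].mean(axis=0) for k in ids])
    # Scatter: mean Euclidean distance from cluster members to their centroid.
    scatter = np.array([
        np.linalg.norm(data[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    n = len(ids)
    worst = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                separation = np.linalg.norm(centroids[i] - centroids[j])
                ratio = (scatter[i] + scatter[j]) / separation
                worst[i] = max(worst[i], ratio)
    return float(worst.mean())

# Two tight clusters 10 units apart, each with scatter 1.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]
print(davies_bouldin(points, [0, 0, 1, 1]))  # 0.2
```

An incremental version would update the centroids and scatter terms sample by sample instead of recomputing them from the full dataset.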


2013 ◽  
Vol 5 (1) ◽  
pp. 5-15
Author(s):  
Tamás Németh ◽  
Sándor Nagy ◽  
Csanád Imreh

This paper proposes and evaluates solutions for an online clustering problem and gives a mathematical model for it. The problem at hand occurs often in the fusion of data streams, for example in real-time locating systems (RTLS). The goal is to gather as much incoming information from several sources as possible while also minimizing the delay before the next processing step can be executed. The key characteristic is that the data arrives in bursts, in the special case of an RTLS according to the locating cycles. After an introduction to the background, a general mathematical model for the problem is given, and then two basic algorithms, referred to as NWT and CWT, are analyzed by the method of competitive analysis. Each turns out to deliver an optimal solution under different constraints. An experimental evaluation follows, based on a simulation involving CWT and an algorithm referred to as VWT. The latter gives a configuration-free solution to the problem.

