Unsupervised Learning and Clustering Algorithms

Unsupervised Learning: Using Clustering Algorithms to Detect Peer to Peer Botnet Flows

Advances in Intelligent Systems and Computing - Security with Intelligent Computing and Big-Data Services 2019 ◽

10.1007/978-3-030-46828-6_26 ◽

2020 ◽

pp. 299-311

Author(s):

Andrea E. Medina Paredes ◽

Hung-Min Sun

Keyword(s):

Unsupervised Learning ◽

Clustering Algorithms ◽

Peer To Peer

Proficient Normalised Fuzzy K-Means With Initial Centroids Methodology

International Journal of Knowledge Discovery in Bioinformatics ◽

10.4018/ijkdb.2018010104 ◽

2018 ◽

Vol 8 (1) ◽

pp. 42-59

Author(s):

Deepali Virmani ◽

Nikita Jain ◽

Ketan Parikh ◽

Shefali Upadhyaya ◽

Abhishek Srivastav

Keyword(s):

Unsupervised Learning ◽

Real World ◽

Learning Algorithms ◽

Clustering Algorithms ◽

Real World Data ◽

World Data ◽

Clustering Problem ◽

Time Required ◽

Selection Of

This article examines how data can be organized, linked with other data, and grouped into clusters. Clustering is the process of organizing a given set of objects into a set of disjoint groups called clusters. There are a number of clustering algorithms, such as k-means, k-medoids, and normalized k-means. The focus is on the efficiency and accuracy of these algorithms, on the time clustering takes, and on reducing overlap between clusters. K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. The k-means algorithm partitions data into K clusters with randomly chosen initial centroids; because it operates only on numeric values, it cannot be used to cluster real-world data containing categorical values. Poor selection of initial centroids can result in poor clustering. This article proposes a variant of k-means that selects the initial centres deliberately and normalizes the data, resulting in better clustering, reduced overlap, and less time required for clustering.
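The two ideas the abstract names, normalizing the data and choosing initial centres deliberately rather than at random, can be illustrated with a minimal sketch. The abstract does not give the authors' selection rule, so the farthest-first seeding used here is an assumption, as are the function names and the toy data.

```python
import math

def min_max_normalize(points):
    """Rescale every feature to [0, 1]; constant features map to 0."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    return [
        tuple(
            (p[d] - lows[d]) / (highs[d] - lows[d]) if highs[d] > lows[d] else 0.0
            for d in range(dims)
        )
        for p in points
    ]

def farthest_first_centroids(points, k):
    """Deterministic seeding (an assumption, not the paper's rule):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids

def k_means(points, k, max_iter=100):
    """Plain Lloyd iterations starting from the seeded centroids."""
    centroids = farthest_first_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Toy data with very different feature scales (hypothetical values).
raw = [(1.0, 200.0), (1.2, 220.0), (0.9, 210.0),
       (8.0, 900.0), (8.3, 950.0), (7.9, 880.0)]
labels, centroids = k_means(min_max_normalize(raw), k=2)
```

Without the normalization step the second feature would dominate every distance; after rescaling, both features contribute on the same scale.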

CLUSTERING-BASED NETWORK INTRUSION DETECTION

International Journal of Reliability Quality and Safety Engineering ◽

10.1142/s0218539307002568 ◽

2007 ◽

Vol 14 (02) ◽

pp. 169-187 ◽

Cited By ~ 55

Author(s):

SHI ZHONG ◽

TAGHI M. KHOSHGOFTAAR ◽

NAEEM SELIYA

Keyword(s):

Data Mining ◽

Network Security ◽

Intrusion Detection ◽

Unsupervised Learning ◽

Network Traffic ◽

Clustering Algorithms ◽

Network Intrusion Detection ◽

Learning Methods ◽

High Detection Rate ◽

Network Intrusion

Recently, data mining methods have gained importance in addressing network security issues, including network intrusion detection, a challenging task in network security. Intrusion detection systems aim to identify attacks with a high detection rate and a low false alarm rate. Classification-based data mining models for intrusion detection are often ineffective in dealing with dynamic changes in intrusion patterns and characteristics. Consequently, unsupervised learning methods have been given a closer look for network intrusion detection. We investigate multiple centroid-based unsupervised clustering algorithms for intrusion detection, and propose a simple yet effective self-labeling heuristic for detecting attack and normal clusters of network traffic audit data. The clustering algorithms investigated include k-means, Mixture-Of-Spherical Gaussians, Self-Organizing Map, and Neural-Gas. The network traffic datasets provided by the DARPA 1998 offline intrusion detection project are used in our empirical investigation, which demonstrates the feasibility and promise of unsupervised learning methods for network intrusion detection. In addition, a comparative analysis shows the advantage of clustering-based methods over supervised classification techniques in identifying new or unseen attack types.

Unsupervised learning of Swiss population spatial distribution

PLoS ONE ◽

10.1371/journal.pone.0246529 ◽

2021 ◽

Vol 16 (2) ◽

pp. e0246529

Author(s):

Mikhail Kanevski

Keyword(s):

Spatial Distribution ◽

Unsupervised Learning ◽

Population Distribution ◽

Expert Knowledge ◽

Clustering Algorithms ◽

Growth Curves ◽

Feature Space ◽

Point Patterns ◽

Swiss Population ◽

Spatially Distributed

The paper deals with the analysis of the spatial distribution of the Swiss population using fractal concepts and unsupervised learning algorithms. The research methodology is based on the development of a high-dimensional feature space by calculating local growth curves, widely used in fractal dimension estimation, and on the application of clustering algorithms in order to reveal the patterns of spatial population distribution. The notion “unsupervised” also means that only some general criteria (density, dimensionality, homogeneity) are used to construct an input feature space, without adding any supervised or expert knowledge. The approach is very powerful and provides comprehensive local information about the density and homogeneity/fractality of spatially distributed point patterns.
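A local growth curve can be read as the number of neighbours N(r) found within radius r of a point, evaluated over several radii; the slope of log N(r) against log r then gives a local dimension estimate. The sketch below follows that generic sandbox-counting reading, which is an assumption about the paper's features rather than its actual procedure; the function names, radii, and toy points are hypothetical.

```python
import math

def local_growth_curves(points, radii):
    """For each point, count the other points within each radius."""
    curves = []
    for i, p in enumerate(points):
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        curves.append([sum(1 for d in dists if d <= r) for r in radii])
    return curves

def local_dimension(curve, radii):
    """Least-squares slope of log N(r) versus log r, skipping radii
    at which no neighbour was found."""
    pairs = [(math.log(r), math.log(n)) for r, n in zip(radii, curve) if n > 0]
    if len(pairs) < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx

# Points on a straight line: the local dimension should be close to 1.
line = [(float(x), 0.0) for x in range(21)]
radii = [1.0, 2.0, 4.0]
curves = local_growth_curves(line, radii)
middle_dim = local_dimension(curves[10], radii)
```

Each row of `curves` is one point's feature vector, so the rows could be passed to any clustering algorithm to group locations with similar local density and dimensionality.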

Unsupervised Learning Model for Fault Prediction Using Representative Clustering Algorithms

KIPS Transactions on Software and Data Engineering ◽

10.3745/ktsde.2014.3.2.57 ◽

2014 ◽

Vol 3 (2) ◽

pp. 57-64 ◽

Cited By ~ 2

Author(s):

Euyseok Hong ◽

Mikyeong Park

Keyword(s):

Unsupervised Learning ◽

Clustering Algorithms ◽

Learning Model ◽

Fault Prediction

A review of clustering algorithms: Comparison of DBSCAN and K-mean with oversampling and t-SNE

Recent Patents on Engineering ◽

10.2174/1872212115666210208222231 ◽

2021 ◽

Vol 15 ◽

Author(s):

Eshan Bajal ◽

Vipin Katara ◽

Madhulika Bhatia ◽

Madhurima Hooda

Keyword(s):

Cluster Analysis ◽

Logistic Regression ◽

Unsupervised Learning ◽

Spatial Clustering ◽

Clustering Algorithms ◽

Controlled Environment ◽

Renal Adenocarcinoma ◽

Regression Algorithms ◽

Clustering And Classification

The two most widely used and easily implementable algorithms for clustering and classification-based analysis of data in the unsupervised learning domain are Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and K-mean cluster analysis. These two techniques can handle most cases effectively when the data has a lot of randomness with no clear set to use as a parameter, as in the case of linear or logistic regression algorithms. However, few papers exist that pit these two against each other in a controlled environment to observe which one performs better and under which conditions. In this paper, a renal adenocarcinoma dataset is analyzed, and thereafter both DBSCAN and K-mean are applied to the dataset with subsequent examination of the results. The efficacy of both techniques in this study is compared, and the merits and demerits observed are enumerated. Further, the interaction of t-SNE with the generated clusters is explored.
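One practical difference between the two algorithms is that DBSCAN can mark isolated points as noise, whereas k-means assigns every point to some cluster. The following is a minimal sketch of the standard DBSCAN procedure on toy data; the parameter values and points are hypothetical and unrelated to the renal adenocarcinoma dataset used in the paper.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE (-1).
    A point is a core point if at least min_pts points (itself
    included) lie within distance eps of it."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(nbrs)
    return labels

# Two dense squares and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Here the isolated point at (5, 5) receives the label -1, while a k-means run with k = 2 would have to attach it to one of the two squares.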

Single- and Multi-order Neurons for recursive unsupervised learning

Artificial Intelligence for Advanced Problem Solving Techniques ◽

10.4018/978-1-59904-705-8.ch008 ◽

2008 ◽

pp. 217-233

Author(s):

Kiruthika Ramanathan ◽

Sheng Uei Guan

Keyword(s):

Neural Networks ◽

Unsupervised Learning ◽

Clustering Algorithms ◽

Ensemble Clustering ◽

Clustering Ensemble ◽

Training Time ◽

Empirical Results

In this chapter we present a recursive approach to unsupervised learning. The proposed algorithm, while similar to ensemble clustering, does not need to execute several clustering algorithms and find consensus between them. Instead, grouping is done between two subsets of data at a time, thereby saving training time. Also, only two kinds of clustering algorithms are used in creating the recursive clustering ensemble, as opposed to the multitude of clusterers required by ensemble clusterers. A recursive clusterer is proposed for both single- and multi-order neural networks. Empirical results show as much as 50% improvement in clustering accuracy when compared to benchmark clustering algorithms.

Unsupervised Learning from Multi-Dimensional Data: A Fast Clustering Algorithm Utilizing Canopies and Statistical Information

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622018500141 ◽

2018 ◽

Vol 17 (03) ◽

pp. 841-856 ◽

Cited By ~ 4

Author(s):

Giyasettin Ozcan

Keyword(s):

Unsupervised Learning ◽

Performance Improvement ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Statistical Information ◽

Early Termination ◽

Statistical Techniques ◽

Intermediate Result ◽

Long Duration ◽

Speed Up

In this study, we consider the problem of unsupervised learning from multi-dimensional datasets. In particular, we consider [Formula: see text]-means clustering, which requires long execution times on multi-dimensional datasets. In order to speed up clustering without losing accuracy, we introduce a new algorithm that we term Canopy[Formula: see text]. The algorithm utilizes canopies and statistical techniques. Its efficient initiation and normalization methodologies also contribute to the improvement. Furthermore, we consider early termination of the clustering computation, provided that an intermediate result of the computation is accurate enough. We compared our algorithm with four popular clustering algorithms. Results indicate that our algorithm speeds up the clustering computation by at least 2X. We also analyzed the contribution of early termination. Results show that a further 2X improvement can be obtained while incurring a 0.1% error rate. We also observe that our Canopy[Formula: see text] algorithm benefits from early termination, which yields an extra 1.2X performance improvement.
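The abstract does not describe the internals of the proposed algorithm, so the sketch below shows only the generic canopy technique it builds on: a cheap pre-clustering pass with a loose threshold and a tight threshold. The names `t1` and `t2`, the choice to draw members from the points still remaining, and the toy data are assumptions for illustration.

```python
import math

def canopies(points, t1, t2):
    """Generic canopy pre-clustering with loose threshold t1 and tight
    threshold t2 (0 <= t2 < t1). Returns (center_index, member_indices)
    pairs; a point may belong to more than one canopy."""
    if not 0 <= t2 < t1:
        raise ValueError("thresholds must satisfy 0 <= t2 < t1")
    remaining = list(range(len(points)))
    result = []
    while remaining:
        center = remaining[0]
        # Every remaining point within the loose threshold joins the canopy.
        members = [i for i in remaining
                   if math.dist(points[center], points[i]) <= t1]
        result.append((center, members))
        # Points within the tight threshold can no longer start or join canopies.
        remaining = [i for i in remaining
                     if math.dist(points[center], points[i]) > t2]
    return result

# One-dimensional toy data embedded in the plane (hypothetical values).
points = [(0, 0), (1, 0), (3, 0), (10, 0), (11, 0)]
groups = canopies(points, t1=4.0, t2=2.0)
```

A subsequent exact clustering step only needs to compare points that share a canopy, which is where the speed-up of canopy-based methods generally comes from.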

Advanced Exploratory Analysis of Air Pollution Multivariate Spatio-Temporal Data

10.5194/egusphere-egu2020-11461 ◽

2020 ◽

Author(s):

Mikhail Kanevski ◽

Federico Amato ◽

Fabian Guignard

Keyword(s):

Land Use ◽

Air Pollution ◽

Time Series ◽

Unsupervised Learning ◽

Multivariate Time Series ◽

Clustering Algorithms ◽

Monitoring Network ◽

Fractal Dimensions ◽

First Case ◽

Spatio Temporal

The research deals with the application of advanced exploratory tools to study hourly spatio-temporal air pollution data collected by the NABEL monitoring network in Switzerland. The data analyzed consist of several pollutants, mainly NO2, O3, and PM2.5, measured during the last two years at 16 stations distributed over the country. The data are considered in two different ways: 1) as multivariate time series measured at the same station (different pollutants and environmental variables, such as temperature), and 2) as spatially distributed time series of the same pollutant. In the first case, it is interesting to study both univariate and multivariate time series and their complexity. In the second case, similarity between time series distributed in space can signify similar underlying phenomena and environmental conditions giving rise to the pollution. An important aspect of the data is that they are collected at places of different land use classes (urban, suburban, rural, etc.), which helps in understanding and interpreting the results.

Nowadays, unsupervised learning algorithms are widely applied in intelligent exploratory data analysis. Well-known tasks of unsupervised learning include manifold learning, dimensionality reduction, and clustering. In the present research, intrinsic and fractal dimensions, measures characterizing the similarity and redundancy in data, and machine learning clustering algorithms were adapted and applied. The results obtained give new and important information on the spatio-temporal patterns of air pollution. The following results, among others, can be mentioned: 1) some measures of similarity (e.g., complexity-independent distance) are efficient in discriminating between time series; 2) the intrinsic dimension, characterizing the ensemble of monitoring data, is pollutant dependent; 3) the clustering of the observed time series can be interpreted using the available information on land use.

Prior knowledge and correlational structure in unsupervised learning.

Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale ◽

10.1037/cjep20070012 ◽

2007 ◽

Vol 61 (2) ◽

pp. 109-127 ◽

Cited By ~ 3

Author(s):

John P. Clapper

Keyword(s):

Unsupervised Learning ◽

Prior Knowledge
