Document Clustering using Self-Organizing Maps

MENDEL
2017
Vol 23 (1)
pp. 111-118
Author(s):  
Muhammad Rafi ◽  
Muhammad Waqar ◽  
Hareem Ajaz ◽  
Umar Ayub ◽  
Muhammad Danish

Cluster analysis of textual documents is a common technique for better filtering, navigation, understanding, and comprehension of large document collections. Document clustering is an autonomous method that separates a large, heterogeneous document collection into smaller, more homogeneous sub-collections called clusters. A self-organizing map (SOM) is a type of artificial neural network (ANN) that can be used to perform autonomous self-organization of a high-dimensional feature space into low-dimensional projections called maps. It is considered a good method for clustering, as both require unsupervised processing. In this paper, we propose a multi-layer, multi-feature SOM to cluster documents. The paper implements a SOM with four layers, containing lexical terms, phrases, and sequences in the bottom layers, respectively, and combining them all at the top layer. The documents are processed to extract these features, which are fed to the SOM. The internal weights and interconnections between the features (neurons) of these layers settle automatically through iterations with a small learning rate to discover the actual clusters. We performed an extensive set of experiments on standard text-mining datasets (NEWS20, Reuters, and WebKB) with the evaluation measures F-measure and purity. The evaluation gives encouraging results and outperforms some existing approaches. We conclude that a SOM with multiple features (lexical terms, phrases, and sequences) and multiple layers can be very effective in producing high-quality clusters on large document collections.
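
As a rough illustration of the pipeline described above, the sketch below clusters TF-IDF document vectors with a small self-organizing map written in NumPy. It is a minimal sketch, not the authors' implementation: it uses a single lexical-term feature layer instead of the four-layer term/phrase/sequence design, and the grid size, learning rate, and toy documents are assumptions chosen only for illustration.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def train_som(data, grid=(4, 4), iters=2000, lr0=0.1, sigma0=1.5, seed=0):
    # Train a rectangular SOM and return the weight grid (rows x cols x dim).
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.normal(scale=0.01, size=(rows, cols, data.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        # best-matching unit = neuron whose weight vector is closest to x
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(dists.argmin(), dists.shape)
        # learning rate and neighborhood radius both decay over the iterations
        frac = t / iters
        lr = lr0 * (1.0 - frac)
        sigma = sigma0 * (1.0 - frac) + 0.5
        h = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1) / (2 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights

docs = ["neural networks learn representations",
        "self organizing maps cluster documents",
        "stock markets fell sharply today",
        "interest rates rose after the report"]
X = TfidfVectorizer().fit_transform(docs).toarray()
W = train_som(X)
# each document is assigned to its best-matching neuron, i.e. its cluster
labels = [np.unravel_index(np.linalg.norm(W - x, axis=-1).argmin(), W.shape[:2]) for x in X]
print(labels)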

2011
Vol 74 (17)
pp. 3125-3141
Author(s):  
Derek Beaton ◽  
Iren Valova ◽  
Daniel MacLean

Author(s):  
Timo Honkela ◽  
Krista Lagus ◽  
Samuel Kaski

2021 ◽  
Author(s):  
Noureddine Kermiche

Using data augmentation techniques, unsupervised representation learning methods extract features from data by training artificial neural networks to recognize that different views of an object are just different instances of the same object. We extend current unsupervised representation learning methods to networks that can self-organize data representations into two-dimensional (2D) maps. The proposed method combines ideas from Kohonen's original self-organizing maps (SOM) and recent developments in unsupervised representation learning. A ResNet backbone with an added 2D Softmax output layer is used to organize the data representations. A new loss function with linear complexity is proposed to enforce the SOM requirements of winner-take-all (WTA) and competition between neurons while explicitly avoiding collapse into trivial solutions. We show that the SOM topological-neighborhood requirement can be enforced by a fixed radial convolution at the 2D output layer, without resorting to the actual radial activation functions that prevented the original SOM algorithm from being extended to modern neural network architectures. We demonstrate that, when combined with data augmentation techniques, self-organization is a simple emergent property of the 2D output layer, arising from neighborhood recruitment combined with WTA competition between neurons. The proposed methodology is demonstrated on the SVHN and CIFAR10 datasets. The proposed algorithm is the first end-to-end unsupervised learning method that combines data self-organization and visualization as integral parts of unsupervised representation learning.
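
The fixed-radial-convolution idea lends itself to a short sketch. The PyTorch code below is a sketch under assumptions, not the paper's implementation: it projects backbone features onto an H x W grid, takes a softmax over all grid cells (the 2D Softmax), and then smooths the activation map with a fixed, non-trainable Gaussian kernel so that a winning neuron also recruits its topological neighbors. The stand-in for the ResNet backbone, the grid size, the kernel width, and the name SOMHead are assumptions; the paper's linear-complexity loss is not reproduced here.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SOMHead(nn.Module):
    # Hypothetical output head: 2D softmax over a neuron grid followed by a
    # fixed radial (Gaussian) convolution that spreads activation to neighbors.
    def __init__(self, feat_dim=512, grid=(10, 10), sigma=1.0, kernel_size=5):
        super().__init__()
        self.grid = grid
        self.proj = nn.Linear(feat_dim, grid[0] * grid[1])
        ax = torch.arange(kernel_size, dtype=torch.float32) - kernel_size // 2
        yy, xx = torch.meshgrid(ax, ax, indexing="ij")
        kernel = torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
        kernel = kernel / kernel.sum()
        # registered as a buffer: saved and moved with the module, never trained
        self.register_buffer("kernel", kernel.view(1, 1, kernel_size, kernel_size))

    def forward(self, feats):
        h, w = self.grid
        logits = self.proj(feats)                       # (B, h*w)
        p = F.softmax(logits, dim=1).view(-1, 1, h, w)  # 2D softmax over the map
        # neighborhood recruitment: convolve with the fixed radial kernel
        p = F.conv2d(p, self.kernel, padding=self.kernel.shape[-1] // 2)
        return p.view(-1, h * w)

feats = torch.randn(8, 512)   # stand-in for ResNet backbone features
probs = SOMHead()(feats)      # (8, 100) smoothed map activations
print(probs.shape)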


2009
Vol 14 (8)
pp. 857-867
Author(s):  
Francisco P. Romero ◽  
Arturo Peralta ◽  
Andres Soto ◽  
Jose A. Olivas ◽  
Jesus Serrano-Guerrero

Data Mining
2011
pp. 199-219
Author(s):  
Hsin-Chang Yang ◽  
Chung-Hong Lee

Recently, many approaches have been devised for mining various kinds of knowledge from texts. One important application of text mining is to identify themes and the semantic relations among these themes for text categorization. Traditionally, these themes were arranged in a hierarchical manner to achieve effective searching and indexing as well as easy comprehension for human beings. The determination of category themes and their hierarchical structures was mostly done by human experts. In this work, we developed an approach to automatically generate category themes and reveal the hierarchical structure among them. We also used the generated structure to categorize text documents. A self-organizing map was trained on the document collection to form two feature maps. We then analyzed these maps to obtain the category themes and their structure. Although the test corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language, provided that such documents can be transformed into a list of separated terms.
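
As a hedged illustration of reading category themes off a trained map (not the authors' analysis procedure), each SOM neuron can be labeled with the highest-weighted vocabulary terms in its weight vector, and those labels taken as candidate themes. The neuron_themes helper and the tiny hand-made grid below are hypothetical.

import numpy as np

def neuron_themes(weights, vocab, top_k=3):
    # weights: (rows, cols, vocab_size) SOM weight grid; vocab maps index -> term
    rows, cols, _ = weights.shape
    themes = {}
    for i in range(rows):
        for j in range(cols):
            top = np.argsort(weights[i, j])[::-1][:top_k]
            themes[(i, j)] = [vocab[t] for t in top]
    return themes

# tiny 1 x 2 grid over a four-term vocabulary
vocab = ["neural", "network", "market", "rates"]
W = np.array([[[0.9, 0.8, 0.0, 0.1],
               [0.0, 0.1, 0.9, 0.7]]])
print(neuron_themes(W, vocab, top_k=2))
# {(0, 0): ['neural', 'network'], (0, 1): ['market', 'rates']}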


Author(s):  
Andre Burkovski ◽  
Wiltrud Kessler ◽  
Gunther Heidemann ◽  
Hamidreza Kobdani ◽  
Hinrich Schütze

1999
Vol 09 (03)
pp. 195-202
Author(s):  
JOSÉ ALFREDO FERREIRA COSTA ◽  
MÁRCIO LUIZ DE ANDRADE NETTO

Determining the structure of data without prior knowledge of the number of clusters or any information about their composition is a problem of interest in many fields, such as image analysis, astrophysics, and biology. Partitioning a set of n patterns in a p-dimensional feature space must be done such that those in a given cluster are more similar to each other than to the rest. As there are approximately K^n/K! possible ways of partitioning the patterns among K clusters, finding the best solution is very hard when n is large. The search space grows further when the number of partitions is not known a priori. Although the self-organizing feature map (SOM) can be used to visualize clusters, automating knowledge discovery with the SOM is a difficult task. This paper proposes region-based image processing methods to post-process the U-matrix obtained after the unsupervised learning performed by the SOM. Mathematical morphology is applied to identify regions of similar neurons. The number of regions and their labels are found automatically and are related to the number of clusters in a multivariate data set. New data can be classified by labeling it according to the best-matching neuron. Simulations using data sets drawn from finite mixtures of p-variate normal densities are presented, along with the advantages and drawbacks of the method.
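
A simplified sketch of the U-matrix post-processing idea follows. The paper applies mathematical morphology to segment the U-matrix; the threshold-and-label shortcut below, using SciPy connected-component labeling, is an assumption that only illustrates the principle: low-distance (similar-neuron) regions are kept and each connected region is counted as one cluster.

import numpy as np
from scipy import ndimage

def clusters_from_umatrix(u_matrix, quantile=0.5):
    # Keep the low-distance regions of the U-matrix and label connected components;
    # the number of labeled regions is taken as the number of clusters.
    low = u_matrix < np.quantile(u_matrix, quantile)
    labels, n_regions = ndimage.label(low)
    return labels, n_regions

# toy 5 x 5 U-matrix: four low-distance regions separated by high "walls"
u = np.array([[0.1, 0.2, 0.9, 0.2, 0.1],
              [0.2, 0.3, 0.8, 0.3, 0.2],
              [0.9, 0.8, 0.9, 0.8, 0.9],
              [0.2, 0.3, 0.8, 0.2, 0.1],
              [0.1, 0.2, 0.9, 0.3, 0.2]])
labels, n = clusters_from_umatrix(u)
print(n)       # 4
print(labels)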

