An Ontology Based Model for Document Clustering

Clustering is an important technique for finding relevant content in a document collection, and it also reduces the search space. Current clustering research emphasizes the development of more efficient clustering methods without considering domain knowledge or users’ needs. In recent years, the semantics of documents have been exploited in document clustering. This work focuses on a clustering model to which an ontology-based approach is applied. The major challenge is incorporating background knowledge into the similarity measure. This paper presents an ontology-based document annotation and clustering system. A semi-automatic document annotation and concept weighting scheme is used to build an ontology-based knowledge base, and the Particle Swarm Optimization (PSO) clustering algorithm is then applied to obtain the clustering solution. Clustering accuracy was computed before and after combining the ontology with the Vector Space Model (VSM). The proposed ontology-based framework performs better and produces better clusters than the traditional vector space model, and the results obtained with the ontology were significant and promising.
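The abstract does not specify the PSO variant or the concept weighting scheme, so the following is only a minimal sketch of generic PSO document clustering, assuming documents are already represented as weighted vectors (plain VSM or ontology-augmented). Each particle encodes k candidate centroids, and its fitness is the mean distance from each document to its nearest centroid. The inertia weight `w`, the acceleration constants `c1` and `c2`, and all function names are illustrative assumptions, not the authors' settings.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def quantization_error(centroids, docs):
    """Mean distance from each document to its nearest centroid (lower is better)."""
    return sum(min(euclidean(d, c) for c in centroids) for d in docs) / len(docs)


def pso_cluster(docs, k, n_particles=10, iters=100, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Hypothetical generic PSO clustering; returns (labels, best_fitness)."""
    rng = random.Random(seed)
    dim = len(docs[0])
    # Each particle is one candidate solution: k centroids seeded from random documents.
    pos = [[list(d) for d in rng.sample(docs, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[list(c) for c in p] for p in pos]
    pbest_fit = [quantization_error(p, docs) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [list(c) for c in pbest[g]], pbest_fit[g]

    for _ in range(iters):
        for i in range(n_particles):
            # Velocity update: inertia + pull toward personal best + pull toward global best.
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            fit = quantization_error(pos[i], docs)
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [list(c) for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [list(c) for c in pos[i]], fit

    # Assign every document to its nearest centroid in the best solution found.
    labels = [min(range(k), key=lambda j: euclidean(doc, gbest[j])) for doc in docs]
    return labels, gbest_fit
```

Because the global best is replaced only when a particle reaches a lower quantization error, the returned solution is never worse than the best initial one.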

DGFCM: A New Dynamic Clustering Algorithm

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.556-562.3945 ◽

2014 ◽

Vol 556-562 ◽

pp. 3945-3948

Author(s):

Xin Qing Geng ◽

Hong Yan Yang ◽

Feng Mei Tao

Keyword(s):

Vector Space ◽

Fuzzy Clustering ◽

Clustering Algorithm ◽

Vector Space Model ◽

Dynamic Clustering ◽

Self Organizing Maps ◽

Space Model ◽

Fuzzy Clustering Algorithm ◽

Present Algorithm ◽

Self Organizing

This paper applies the dynamic self-organizing map algorithm to determining the number of clusters. Text feature vectors are obtained using the vector space model (VSM) and the TF-IDF method, and the number of clusters is then determined by the dynamic self-organizing map, in which the threshold GT controls the network’s growth. Compared with the traditional fuzzy clustering algorithm, the present algorithm achieves higher precision. An example demonstrates its effectiveness.
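The TF-IDF text vectors mentioned in the abstract can be sketched as follows. The authors' exact TF and IDF variants are not stated; this sketch assumes raw term counts for TF, log(N/df) for IDF, and whitespace tokenization. The helper names `tfidf_vectors` and `cosine` are hypothetical, and the dynamic self-organizing map with its growth threshold GT is not reproduced.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors); weight = raw term count * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

Under this IDF, a term that occurs in every document gets weight 0, so very common words contribute nothing to the similarity between documents.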

A Sequence Clustering Algorithm for Detecting Software Vulnerabilities Based on Vector Space Model

International Journal on Advances in Information Sciences and Service Sciences ◽

10.4156/aiss.vol4.issue16.30 ◽

2012 ◽

Vol 4 (16) ◽

pp. 258-264

Author(s):

Yanyan WANG ◽

Yanning WANG ◽

Jiadong REN

Keyword(s):

Vector Space ◽

Clustering Algorithm ◽

Vector Space Model ◽

Space Model ◽

Software Vulnerabilities ◽

Sequence Clustering

K-Means Document Clustering using Vector Space Model

Bonfring International Journal of Data Mining ◽

10.9756/bijdm.8076 ◽

2015 ◽

Vol 5 (2) ◽

pp. 10-14 ◽

Cited By ~ 3

Author(s):

R. Malathi Ravindran ◽

Dr. Antony Selvadoss Thanamani

Keyword(s):

Vector Space ◽

Document Clustering ◽

Vector Space Model ◽

Space Model

A Similarity Rough Set Model for Document Representation and Document Clustering

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2011.p0125 ◽

2011 ◽

Vol 15 (2) ◽

pp. 125-133 ◽

Cited By ~ 3

Author(s):

Nguyen Chi Thanh ◽

Koichi Yamada ◽

Muneyuki Unehara

Keyword(s):

Vector Space ◽

Rough Set ◽

Document Clustering ◽

Vector Space Model ◽

Document Representation ◽

Space Model ◽

Large Sets ◽

Tolerance Rough Set ◽

Document Organization

Document clustering is a text-mining technique for unsupervised document organization. It helps users browse and navigate large sets of documents. Ho et al. proposed a Tolerance Rough Set Model (TRSM) [1] to improve the vector space model, which represents documents as vectors of terms, and applied it to document clustering. In this paper, we analyze their model and propose a new model for efficient document clustering. We introduce the Similarity Rough Set Model (SRSM) as an alternative model for representing documents in document clustering. The model is evaluated by experiments on test collections. The experimental results show that the SRSM-based document clustering method outperforms the TRSM-based one, and that SRSM results are less sensitive to the parameter value than TRSM results.
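To make the TRSM idea concrete, the sketch below follows the commonly cited formulation of Ho et al.'s model: two terms are tolerant of each other when they co-occur in at least θ documents, each term's tolerance class contains the term itself plus all such terms, and a document's upper approximation collects every term whose tolerance class overlaps the document (the lower approximation keeps terms whose whole class lies inside it). The threshold value and function names are illustrative assumptions. The abstract does not define SRSM's own similarity relation, so SRSM is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def tolerance_classes(docs, theta):
    """docs: list of term sets. Term j joins the class of term i if they co-occur in >= theta docs."""
    co = Counter()
    for d in docs:
        for a, b in combinations(sorted(d), 2):
            co[(a, b)] += 1
    # Every term is always in its own tolerance class.
    classes = {t: {t} for t in set().union(*docs)}
    for (a, b), count in co.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Terms whose tolerance class overlaps the document's terms."""
    return {t for t, cls in classes.items() if cls & doc}


def lower_approximation(doc, classes):
    """Terms whose whole tolerance class lies inside the document's terms."""
    return {t for t, cls in classes.items() if cls <= doc}
```

The upper approximation is what enriches a sparse document vector: a document mentioning only "rough" also picks up "set" when the two co-occur often enough in the collection.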

A Text Categorization Method Based on SVM and Improved K-Means

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.427-429.2449 ◽

2013 ◽

Vol 427-429 ◽

pp. 2449-2453

Author(s):

Rong Ze Xia ◽

Yan Jia ◽

Hu Li

Keyword(s):

Support Vector Machine ◽

Vector Space ◽

High Performance ◽

Supervised Classification ◽

Text Categorization ◽

Clustering Algorithm ◽

Vector Space Model ◽

Classification Method ◽

Support Vector ◽

Space Model

Traditional supervised classification methods such as the support vector machine (SVM) can achieve high performance in text categorization. However, the samples must first be labeled by hand before classification, which is a time-consuming task. Unsupervised methods such as k-means can also be used for text categorization, but traditional k-means is easily affected by a few isolated observations. In this paper, we propose a new text categorization method. First, we improve the traditional k-means clustering algorithm and use the improved k-means to cluster the vectors in our vector space model. We then use the SVM to categorize the vectors preprocessed by the improved k-means. The experiments show that our algorithm outperforms the traditional SVM text categorization method.
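The abstract does not describe how the improved k-means handles isolated observations, so the sketch below shows only the standard Lloyd's k-means that it builds on: assign each vector to its nearest centroid, recompute each centroid as the mean of its members, and stop when no assignment changes. Euclidean distance, random initialization from the data, and the function name are assumptions; the SVM stage is omitted.

```python
import math
import random


def kmeans(vectors, k, max_iter=100, seed=0):
    """Standard Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: each centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Because each centroid is a plain mean, a single far-off vector drags its cluster's centroid toward it in the update step. This is the sensitivity to isolated observations that the abstract says the improved version addresses.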

Best Approximate of Vector Space Model by Using SVD

Al-Mustansiriyah Journal of Science ◽

10.23851/mjs.v28i2.509 ◽

2018 ◽

Vol 28 (2) ◽

pp. 143

Author(s):

Raghad M. Hadi

Keyword(s):

Text Mining ◽

Vector Space ◽

Document Clustering ◽

Vector Space Model ◽

Internet Technology ◽

Low Rank ◽

Space Model ◽

Text Document ◽

Space Technique ◽

Text Mining Application

The rapid growth of internet technology makes it easy to accumulate huge volumes of data as text documents, e.g., journals, blogs, web pages, articles, and email messages. In text mining applications, the growing text space of datasets makes it hard to preprocess documents efficiently for tasks such as document clustering. The proposed system focuses on document preprocessing and a document-space reduction technique that prepare documents for clustering. The common approach to text mining problems is the vector space model (VSM), in which each term represents a feature. The proposed system therefore builds the vector space model using a preprocessing method that removes trivial data from the dataset, and the huge dimensionality of the VSM is resolved using low-rank SVD. Experimental results show that the proposed system gives document representation results about 10% better than the previous approach in preparing documents for clustering.
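Low-rank SVD replaces the term-document matrix A with its rank-k truncation A_k = U_k Σ_k V_k^T. By the Eckart–Young theorem, this is the best rank-k approximation of A in the Frobenius norm, which is presumably the sense of "best approximate" in the title. In practice one would call a library routine such as `numpy.linalg.svd`. To stay dependency-free, the sketch below instead finds the top k singular triplets by power iteration on A^T A with deflation. The iteration count, seed, and function names are assumptions.

```python
import math
import random


def _matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def truncated_svd(A, k, iters=200, seed=0):
    """Top-k singular triplets (sigma, u, v) of A via power iteration on R^T R with deflation."""
    rng = random.Random(seed)
    R = [list(map(float, row)) for row in A]  # residual matrix, starts as A
    triplets = []
    for _ in range(k):
        Rt = [list(col) for col in zip(*R)]
        v = [rng.random() + 0.1 for _ in range(len(Rt))]
        for _ in range(iters):
            w = _matvec(Rt, _matvec(R, v))  # w = (R^T R) v
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        u = _matvec(R, v)
        sigma = math.sqrt(sum(x * x for x in u))
        if sigma == 0.0:
            break  # residual is zero: the rank of A is below k
        u = [x / sigma for x in u]
        triplets.append((sigma, u, v))
        # Deflation: subtract sigma * u v^T so the next pass finds the next singular value.
        R = [[R[i][j] - sigma * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]
    return triplets


def low_rank_approx(A, k):
    """Rank-k approximation: sum of sigma * u v^T over the top-k triplets."""
    triplets = truncated_svd(A, k)
    return [[sum(s * u[i] * v[j] for s, u, v in triplets) for j in range(len(A[0]))]
            for i in range(len(A))]
```

For clustering, each document (a column of A) can then be represented by its k coordinates in the reduced space instead of one coordinate per term.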

A Clustering Algorithm towards Microblogs based on Vector Space Model

Proceedings of 2012 National Conference on Information Technology and Computer Science ◽

10.2991/citcs.2012.243 ◽

2012 ◽

Author(s):

Guoyou Chen ◽

Jiajia Miao ◽

Handong Mao ◽

Le Wang ◽

Siyu Jiang

Keyword(s):

Vector Space ◽

Clustering Algorithm ◽

Vector Space Model ◽

Space Model

Extended Vector Space Model with Semantic Relatedness on Java Archive Search Engine

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v1i2.372 ◽

2015 ◽

Vol 1 (2) ◽

Cited By ~ 2

Author(s):

Oscar Karnalim

Keyword(s):

Vector Space ◽

Search Engine ◽

Vector Space Model ◽

Semantic Relatedness ◽

Space Model

Aplikasi Deteksi Kemiripan Tugas Paper (Similarity Detection Application for Paper Assignments)

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v15i2.39 ◽

2017 ◽

Vol 15 (2) ◽

pp. 5

Author(s):

Anthony Anggrawan ◽

Azhari

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Mean Average Precision ◽

Average Precision ◽

Information Searching ◽

Space Model ◽

Model Method

Searching for information based on a user’s query, with the aim of finding documents that match the user’s needs, is known as Information Retrieval. This research uses the Vector Space Model method to determine the similarity percentage of each student’s assignment. The application is built with PHP and a MySQL database. The results are presented as a ranking of documents by their similarity to the query, with a mean average precision (MAP) of 0.874. This value reflects how closely the application agrees with the experts’ assessment and was obtained from an evaluation using 5 queries compared against 25 sample documents. The more assignments are submitted, the longer the similarity computation takes, since processing time depends on the number of submitted assignments.
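The reported MAP of 0.874 averages per-query average precision over the 5 evaluation queries. The sketch below uses the standard definition: precision at each rank that holds a relevant document, summed and divided by the number of relevant documents. Whether the authors used exactly this variant is an assumption, and the function names and document IDs are hypothetical.

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision@i over the ranks i that hold a relevant document."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i  # precision at rank i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """MAP: average of AP over queries; runs is a list of (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, the ranking `["d1", "d2", "d3"]` with relevant set `{"d1", "d3"}` gives (1/1 + 2/3) / 2 ≈ 0.833.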
