A Similarity Rough Set Model for Document Representation and Document Clustering

Clustering is an important topic to find relevant content from a document collection and it also reduces the search space. The current clustering research emphasizes the development of a more efficient clustering method without considering the domain knowledge and user’s need. In recent years the semantics of documents have been utilized in document clustering. The discussed work focuses on the clustering model where ontology approach is applied. The major challenge is to use the background knowledge in the similarity measure. This paper presents an ontology based annotation of documents and clustering system. The semi-automatic document annotation and concept weighting scheme is used to create an ontology based knowledge base. The Particle Swarm Optimization (PSO) clustering algorithm can be applied to obtain the clustering solution. The accuracy of clustering has been computed before and after combining ontology with Vector Space Model (VSM). The proposed ontology based framework gives improved performance and better clustering compared to the traditional vector space model. The result using ontology was significant and promising.

Download Full-text

K-Means Document Clustering using Vector Space Model

Bonfring International Journal of Data Mining ◽

10.9756/bijdm.8076 ◽

2015 ◽

Vol 5 (2) ◽

pp. 10-14 ◽

Cited By ~ 3

Author(s):

R. Malathi Ravindran ◽

Dr. Antony Selvadoss Thanamani

Keyword(s):

Vector Space ◽

Document Clustering ◽

Vector Space Model ◽

Space Model

Download Full-text

An Ontology Based Model for Document Clustering

Organizational Efficiency through Intelligent Information Technologies ◽

10.4018/978-1-4666-2047-6.ch013 ◽

2012 ◽

pp. 199-215

Author(s):

Sridevi U. K. ◽

Nagaveni N.

Keyword(s):

Vector Space ◽

Domain Knowledge ◽

Clustering Algorithm ◽

Document Clustering ◽

Vector Space Model ◽

Search Space ◽

Space Model ◽

Before And After ◽

Document Collection ◽

Improved Performance

Clustering is an important topic to find relevant content from a document collection and it also reduces the search space. The current clustering research emphasizes the development of a more efficient clustering method without considering the domain knowledge and user’s need. In recent years the semantics of documents have been utilized in document clustering. The discussed work focuses on the clustering model where ontology approach is applied. The major challenge is to use the background knowledge in the similarity measure. This paper presents an ontology based annotation of documents and clustering system. The semi-automatic document annotation and concept weighting scheme is used to create an ontology based knowledge base. The Particle Swarm Optimization (PSO) clustering algorithm can be applied to obtain the clustering solution. The accuracy of clustering has been computed before and after combining ontology with Vector Space Model (VSM). The proposed ontology based framework gives improved performance and better clustering compared to the traditional vector space model. The result using ontology was significant and promising.

Download Full-text

Best Approximate of Vector Space Model by Using SVD

Al-Mustansiriyah Journal of Science ◽

10.23851/mjs.v28i2.509 ◽

2018 ◽

Vol 28 (2) ◽

pp. 143

Author(s):

Raghad M. Hadi

Keyword(s):

Text Mining ◽

Vector Space ◽

Document Clustering ◽

Vector Space Model ◽

Internet Technology ◽

Low Rank ◽

Space Model ◽

Text Document ◽

Space Technique ◽

Text Mining Application

A quick growth of internet technology makes it easy to assemble a huge volume of data as text document; e. g., journals, blogs, network pages, articles, email letters. In text mining application, increasing text space of datasets represent excessive task which makes it hard to pre-processing documents in efficient way to prepare it for text mining application like document clustering. The proposed system focuses on pre-processing document and reduction document space technique to prepare it for clustering technique. The mutual method for text mining problematic is vector space model (VSM), each term represent a features. Thus the proposed system create vector-space mod-el by using pre-processing method to reduce of trivial data from dataset. While the hug dimen-sionality of VSM is resolved by using low-rank SVD. Experiment results show that the proposed system give better document representation results about 10% from previous approach to prepare it for document clustering

Download Full-text

Extended Vector Space Model with Semantic Relatedness on Java Archive Search Engine

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v1i2.372 ◽

2015 ◽

Vol 1 (2) ◽

Cited By ~ 2

Author(s):

Oscar Karnalim

Keyword(s):

Vector Space ◽

Search Engine ◽

Vector Space Model ◽

Semantic Relatedness ◽

Space Model

Download Full-text

Aplikasi Deteksi Kemiripan Tugas Paper

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽

10.30812/matrik.v15i2.39 ◽

2017 ◽

Vol 15 (2) ◽

pp. 5

Author(s):

Anthony Anggrawan ◽

Azhari

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Mean Average Precision ◽

Average Precision ◽

Information Searching ◽

Space Model ◽

Model Method

Information searching based on users’ query, which is hopefully able to find the documents based on users’ need, is known as Information Retrieval. This research uses Vector Space Model method in determining the similarity percentage of each student’s assignment. This research uses PHP programming and MySQL database. The finding is represented by ranking the similarity of document with query, with mean average precision value of 0,874. It shows how accurate the application with the examination done by the experts, which is gained from the evaluation with 5 queries that is compared to 25 samples of documents. If the number of counted assignments has higher similarity, thus the process of similarity counting needs more time, it depends on the assignment’s number which is submitted.

Download Full-text

Aplikasi Rekomendasi Buku Pada Katalog Perpustakaan Universitas Multimedia Nusantara Menggunakan Vector Space Model

Jurnal ULTIMATICS ◽

10.31937/ti.v9i2.639 ◽

2018 ◽

Vol 9 (2) ◽

pp. 97-105

Author(s):

Richard Firdaus Oeyliawan ◽

Dennis Gunawan

Keyword(s):

Vector Space ◽

Vector Space Model ◽

Vector Model ◽

Library Management ◽

Space Model ◽

Library Management System ◽

Index Terms ◽

Library Catalogue ◽

Language Sample ◽

F Measure

Library is one of the facilities which provides information, knowledge resource, and acts as an academic helper for readers to get the information. The huge number of books which library has, usually make readers find the books with difficulty. Universitas Multimedia Nusantara uses the Senayan Library Management System (SLiMS) as the library catalogue. SLiMS has many features which help readers, but there is still no recommendation feature to help the readers finding the books which are relevant to the specific book that readers choose. The application has been developed using Vector Space Model to represent the document in vector model. The recommendation in this application is based on the similarity of the books description. Based on the testing phase using one-language sample of the relevant books, the F-Measure value gained is 55% using 0.1 as cosine similarity threshold. The books description and variety of languages affect the F-Measure value gained. Index Terms—Book Recommendation, Porter Stemmer, SLiMS Universitas Multimedia Nusantara, TF-IDF, Vector Space Model

Download Full-text