Using Latent Semantic Indexing to Improve the Accuracy of Document Clustering

Document clustering is an important research problem in information retrieval and text mining. Traditionally, most clustering methods have been based on the vector space model, which has limitations such as high dimensionality and weak handling of synonymy and polysemy. Latent semantic indexing (LSI) can address these problems to some extent. Previous studies have shown that LSI can reduce the time needed to cluster a large document set while having little effect on clustering accuracy. However, when clustering a small document set, accuracy matters more than efficiency. In this paper, we demonstrate that LSI can improve the clustering accuracy of a small document set, and we recommend the number of dimensions needed to achieve the best clustering performance.


Analysis of a Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for Information Retrieval

Cybernetics and Information Technologies ◽ 10.2478/cait-2012-0003 ◽ 2012 ◽ Vol 12 (1) ◽ pp. 34-48 ◽ Cited By ~ 11

Author(s): Ch. Aswani Kumar ◽ M. Radvansky ◽ J. Annapurna

Keyword(s): Information Retrieval ◽ Vector Space ◽ Formal Concept Analysis ◽ Vector Space Model ◽ Latent Semantic Indexing ◽ Concept Analysis ◽ Formal Concept ◽ Semantic Indexing ◽ Space Model ◽ Classical Vector

Latent Semantic Indexing (LSI), a variant of the classical Vector Space Model (VSM), is an Information Retrieval (IR) model that attempts to capture the latent semantic relationships between data items. Mathematical lattices, under the framework of Formal Concept Analysis (FCA), represent conceptual hierarchies in data and can be used to retrieve information. Both LSI and FCA operate on data represented as matrices. The objective of this paper is to systematically analyze VSM, LSI and FCA for the task of IR using standard and real-life datasets.


Optimising the Heuristics in Latent Semantic Indexing for Effective Information Retrieval

Journal of Information & Knowledge Management ◽ 10.1142/s0219649206001359 ◽ 2006 ◽ Vol 05 (02) ◽ pp. 97-105 ◽ Cited By ~ 3

Author(s): S. Srinivas ◽ Ch. Aswani Kumar

Keyword(s): Information Retrieval ◽ Vector Space ◽ Vector Space Model ◽ Latent Semantic Indexing ◽ Semantic Indexing ◽ Retrieval Performance ◽ Term Weighting ◽ Space Model ◽ Rank Approximation

Latent Semantic Indexing (LSI) is a well-known Information Retrieval (IR) technique that tries to overcome the problems of lexical matching by using conceptual indexing. LSI is a variant of the vector space model and has proved to be 30% more effective. Many studies have reported that good retrieval performance depends on the use of various retrieval heuristics. In this paper, we focus on optimising two LSI retrieval heuristics: term weighting and rank approximation. The results demonstrate that LSI performance improves significantly when optimised term weighting is combined with optimised rank approximation.
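For readers unfamiliar with the two heuristics, they can be written out as follows. This is the standard textbook formulation of LSI, not notation taken from the paper; the symbols $A$, $l_{ij}$, $g_i$, $k$ and $\hat{q}$ are assumptions introduced here for illustration.

```latex
% Term weighting: each entry of the m x n term-document matrix A is the
% product of a local weight (term i in document j) and a global weight (term i).
\[ a_{ij} = l_{ij} \, g_i \]

% Rank approximation: take the SVD of A and keep only the k largest singular values.
\[ A = U \Sigma V^{T}, \qquad A_k = U_k \Sigma_k V_k^{T}, \qquad k \ll \min(m, n) \]

% A query vector q is folded into the same k-dimensional space before matching.
\[ \hat{q} = \Sigma_k^{-1} U_k^{T} q \]
```

Under this reading, optimising the heuristics amounts to choosing the weighting functions $l_{ij}$ and $g_i$ and the retained rank $k$.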


Comparing and combining the effectiveness of latent semantic indexing and the ordinary vector space model for information retrieval

Information Processing & Management ◽ 10.1016/0306-4573(89)90100-3 ◽ 1989 ◽ Vol 25 (6) ◽ pp. 665-676 ◽ Cited By ~ 31

Author(s): Karen E. Lochbaum ◽ Lynn A. Streeter

Keyword(s): Information Retrieval ◽ Vector Space ◽ Vector Space Model ◽ Latent Semantic Indexing ◽ Semantic Indexing ◽ Space Model


Latent Semantic Indexing for Indonesian Text Similarity

International Journal of Engineering & Technology ◽ 10.14419/ijet.v7i2.3.12619 ◽ 2018 ◽ Vol 7 (2.3) ◽ pp. 73 ◽ Cited By ~ 3

Author(s): Robbi Rahim ◽ Nuning Kurniasih ◽ Muhammad Dedi Irawan ◽ Yustria Handika Siregar ◽ Abdurrozzaq Hasibuan ◽ ...

Keyword(s): Vector Space ◽ Vector Space Model ◽ Scientific Work ◽ Latent Semantic Indexing ◽ Semantic Indexing ◽ Text Similarity ◽ Space Model ◽ Indexing Method

A document is a written record that can be used as evidence of information. Plagiarism is a deliberate or unintentional act of obtaining or attempting to obtain credit or value for a scientific work by citing some or all of another party's scientific work as one's own without stating the source properly and adequately. The Latent Semantic Indexing method is used to find text that is similar to the text of a given document. The algorithm used is TF/IDF, in which the weight of a term in a document is the product of its TF value and its IDF value, while the Vector Space Model (VSM) is a method for measuring the closeness or similarity of words by weighting terms.
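The TF/IDF weighting and VSM similarity described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes raw term counts for TF, `log(N / df)` for IDF, and cosine similarity as the closeness measure, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes TF = raw count and IDF = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical sample documents, already tokenized by whitespace.
docs = [
    "latent semantic indexing finds similar text".split(),
    "semantic indexing finds similar documents".split(),
    "waterfall model divides development stages".split(),
]
vecs = tf_idf_vectors(docs)
sim_close = cosine(vecs[0], vecs[1])  # documents sharing several terms
sim_far = cosine(vecs[0], vecs[2])    # documents sharing no terms
```

Here the first two documents share several terms and score higher than the first and third, which share none.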


A Web Service Recommender System Using Vector Space Model and Latent Semantic Indexing

2011 IEEE International Conference on Advanced Information Networking and Applications ◽ 10.1109/aina.2011.99 ◽ 2011 ◽ Cited By ~ 9

Author(s): Nguyen Ngoc Chan ◽ Walid Gaaloul ◽ Samir Tata

Keyword(s): Vector Space ◽ Web Service ◽ Recommender System ◽ Vector Space Model ◽ Latent Semantic Indexing ◽ Semantic Indexing ◽ Space Model


Web-Based Information Search System Development Using a Semantic Network

KnE Social Sciences ◽ 10.18502/kss.v5i6.9223 ◽ 2021 ◽ pp. 347-352

Author(s): Joko Samodra ◽ Primardiana Hermilia Wijayati ◽ . Rosyidah ◽ Andika Agung Sutrisno

Keyword(s): Information Retrieval ◽ Vector Space ◽ Semantic Network ◽ Vector Space Model ◽ Latent Semantic Indexing ◽ Search System ◽ Web Based ◽ Space Model ◽ Retrieval Systems ◽ Information Retrieval Systems

Finding information in a large collection of documents is a complicated task, so an information retrieval system is needed. Models that have been used in information retrieval systems include the Vector Space Model (VSM), DICE Similarity, Latent Semantic Indexing (LSI), the Generalized Vector Space Model (GVSM), and semantic-based information retrieval systems. The purpose of this study was to develop a semantic network-based search system that finds information based on keywords and on the semantic relationships of the keywords provided by users. Most search systems cannot do this, because they work only on keyword matching or similarity. The Waterfall development model was used, which divides development into five stages: (1) requirements analysis and definition; (2) system and software design; (3) implementation and unit testing; (4) integration and system testing; and (5) operation and maintenance. The developed application was tested by searching for information with various combinations of user-provided keywords. The results showed that the system can find information that matches the keywords, as well as other relevant information based on the semantic relationships of those keywords.

Keywords: information retrieval, search system, semantic network, web-based application


Recommending Library Methods: An Evaluation of the Vector Space Model (VSM) and Latent Semantic Indexing (LSI)

Lecture Notes in Computer Science - Reuse of Off-the-Shelf Components ◽ 10.1007/11763864_16 ◽ 2006 ◽ pp. 217-230 ◽ Cited By ~ 8

Author(s): Frank McCarey ◽ Mel Ó Cinnéide ◽ Nicholas Kushmerick

Keyword(s): Vector Space ◽ Vector Space Model ◽ Latent Semantic Indexing ◽ Semantic Indexing ◽ Space Model


Aplikasi Deteksi Kemiripan Tugas Paper

Matrik Jurnal Manajemen Teknik Informatika dan Rekayasa Komputer ◽ 10.30812/matrik.v15i2.39 ◽ 2017 ◽ Vol 15 (2) ◽ pp. 5

Author(s): Anthony Anggrawan ◽ Azhari

Keyword(s): Information Retrieval ◽ Vector Space ◽ Vector Space Model ◽ Mean Average Precision ◽ Average Precision ◽ Information Searching ◽ Space Model ◽ Model Method

Information searching based on a user’s query, intended to find the documents that meet the user’s need, is known as Information Retrieval. This research uses the Vector Space Model method to determine the similarity percentage of each student’s assignment. The application is implemented in PHP with a MySQL database. The results are presented by ranking documents by their similarity to the query, with a mean average precision of 0.874. This value indicates how closely the application agrees with the experts’ examination, and was obtained from an evaluation in which 5 queries were compared against 25 sample documents. The time needed to compute similarity increases with the number of assignments submitted.
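For reference, mean average precision can be computed as in the sketch below. It uses the common definition (average of precision values at each rank where a relevant document appears, averaged over queries); the abstract does not say which variant the authors used, and the document IDs and relevance judgments here are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list against a relevant set."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank: relevant documents seen so far / rank.
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Hypothetical rankings and expert judgments for two queries.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6", "d7"], {"d6"}),
]
map_score = mean_average_precision(runs)
```

A score near 1 means relevant documents are consistently ranked at the top of each result list.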


A relational vector-space model of information retrieval adapted to images

ACM SIGIR Forum ◽ 10.1145/1067268.1067292 ◽ 2005 ◽ Vol 39 (1) ◽ pp. 62-62

Author(s): Jean Martinet

Keyword(s): Information Retrieval ◽ Vector Space ◽ Vector Space Model ◽ Space Model


On Generalized Vector Space Model in Information Retrieval

Fundamenta Informaticae ◽ 10.3233/fi-1985-8207 ◽ 1985 ◽ Vol 8 (2) ◽ pp. 253-267

Author(s): S.K.M. Wong ◽ Wojciech Ziarko

Keyword(s): Information Retrieval ◽ Vector Space ◽ A Priori ◽ Vector Space Model ◽ Smart System ◽ Space Model ◽ Retrieval Systems ◽ Information Retrieval Systems ◽ Index Terms ◽ Minimal Modification

In information retrieval, it is common to model index terms and documents as vectors in a suitably defined vector space. The main difficulty with this approach is that the explicit representation of term vectors is not known a priori. For this reason, the vector space model adopted by Salton for the SMART system treats the terms as a set of orthogonal vectors. In such a model it is often necessary to adopt a separate, corrective procedure to take into account the correlations between terms. In this paper, we propose a systematic method (the generalized vector space model) to compute term correlations directly from an automatic indexing scheme. We also demonstrate how such correlations can be included with minimal modification in existing vector-based information retrieval systems.
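The effect of dropping the orthogonality assumption can be illustrated with the sketch below. It does not reproduce the paper's method for deriving term correlations; the correlation matrix `G` (with `G[i][j]` standing for the inner product of term vectors i and j) is simply assumed to be given, and all names and values are hypothetical.

```python
import math

def gvsm_similarity(d, q, G):
    """Cosine-style similarity when term vectors need not be orthogonal.

    d, q: term-weight lists of equal length.
    G: assumed term-term correlation matrix (symmetric, positive semidefinite).
    """
    def inner(x, y):
        # Generalized inner product: sum over i, j of x_i * (t_i . t_j) * y_j.
        return sum(x[i] * G[i][j] * y[j]
                   for i in range(len(x)) for j in range(len(y)))

    denom = math.sqrt(inner(d, d) * inner(q, q))
    return inner(d, q) / denom if denom > 0 else 0.0

# Orthogonal terms, as in the classical model: G is the identity matrix.
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]
# Hypothetical correlations: terms 0 and 1 are near-synonyms.
correlated = [[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]]

doc = [1.0, 0.0, 0.0]    # document uses only term 0
query = [0.0, 1.0, 0.0]  # query uses only term 1

sim_orthogonal = gvsm_similarity(doc, query, identity)
sim_correlated = gvsm_similarity(doc, query, correlated)
```

With the identity matrix the document and query share no terms and do not match at all; with the correlated matrix the same pair receives a nonzero score, which is the behaviour the separate corrective procedure is otherwise needed for.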
