Analysis on the use of Latent Semantic Indexing (LSI) for document classification and retrieval system of PNP files

Document classification is the process of automatically categorizing documents drawn from a mixed collection of files [1]. This paper proposes an approach to classifying administrative-case files of the Philippine National Police (PNP) using the Latent Semantic Indexing (LSI) method. The model applied represents term-to-term, document-to-document, and term-to-document relationships. Regular expressions are also used to define search patterns over character strings, which LSI then uses to establish the semantic relevance of those strings to the search term or keyword. The aim of the study is to evaluate the performance of LSI in classifying PNP documents; experiments were run in software to test the capability of LSI for text retrieval. Indexing follows the patterns matched in the text collection and uses a singular value decomposition (SVD) model. In the tests, documents were indexed according to file relationships, and the system was able to return search results retrieved from the PNP files. Weights are used to check the accuracy of the method: positive query-similarity values are regarded as the most relevant among the related searches, meaning that the query word matches words in a text file and a query result is returned.
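The retrieval pipeline described in this abstract (regular-expression matching, an SVD-based index, and query similarity scores) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the names `tokenize`, `term_vector`, and `query_similarity`, the rank `K = 2`, and the use of raw term counts with cosine similarity are all assumptions made for illustration.

```python
import re
import numpy as np

# Hypothetical toy corpus (not taken from the PNP files).
DOCS = [
    "officer filed an administrative case for neglect of duty",
    "administrative complaint against officer for misconduct",
    "budget report on station vehicle maintenance",
    "vehicle maintenance schedule and fuel budget",
]

def tokenize(text):
    # Regular expression that pulls out lowercase alphabetic tokens.
    return re.findall(r"[a-z]+", text.lower())

VOCAB = sorted({tok for doc in DOCS for tok in tokenize(doc)})
TERM_ROW = {tok: i for i, tok in enumerate(VOCAB)}

def term_vector(text):
    # Raw term counts over the corpus vocabulary; unknown words are ignored.
    vec = np.zeros(len(VOCAB))
    for tok in tokenize(text):
        if tok in TERM_ROW:
            vec[TERM_ROW[tok]] += 1.0
    return vec

# Term-by-document matrix A: one row per term, one column per document.
A = np.column_stack([term_vector(doc) for doc in DOCS])

# Rank-K truncated SVD; only the left singular vectors are needed here.
K = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :K]

# Project every document into the K-dimensional latent space.
DOC_VECS = (U_k.T @ A).T  # shape: (number of documents, K)

def query_similarity(query):
    # Project the query the same way, then take the cosine per document.
    q = U_k.T @ term_vector(query)
    norms = np.linalg.norm(DOC_VECS, axis=1) * np.linalg.norm(q)
    safe = np.where(norms > 0, norms, 1.0)
    return np.where(norms > 0, (DOC_VECS @ q) / safe, 0.0)

sims = query_similarity("officer misconduct case")
for doc, sim in zip(DOCS, sims):
    print(f"{sim:+.3f}  {doc}")
```

In this toy corpus the two topic groups share no terms, so the two case-related documents receive clearly positive similarity for the query while the two budget-related documents score near zero. That mirrors the abstract's reading of positive similarity values as the relevant matches.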

Document Classification Method based on Latent Semantic Indexing

International Journal of Grid and Distributed Computing ◽ 10.14257/ijgdc.2018.11.4.09 ◽ 2018 ◽ Vol 11 (4) ◽ pp. 97-112

Author(s): Jeong-Joon Kim ◽ Yong-Soo Lee ◽ Jin-Yong Moon ◽ Jeong-Min Park

Keyword(s): Latent Semantic Indexing ◽ Document Classification ◽ Classification Method ◽ Semantic Indexing

Artificial Neural Network for Document Classification Using Latent Semantic Indexing

2007 International Symposium on Information Technology Convergence (ISITC 2007) ◽ 10.1109/isitc.2007.69 ◽ 2007 ◽ Cited By ~ 6

Author(s): Cheng Hua Li ◽ Soon Cheol Park

Keyword(s): Neural Network ◽ Artificial Neural Network ◽ Latent Semantic Indexing ◽ Document Classification ◽ Semantic Indexing ◽ Artificial Neural

Support vector machine for customized email filtering based on improving latent semantic indexing

2005 International Conference on Machine Learning and Cybernetics ◽ 10.1109/icmlc.2005.1527599 ◽ 2005 ◽ Cited By ~ 1

Author(s): Qing Yang ◽ Fang-Min Li

Keyword(s): Support Vector Machine ◽ Latent Semantic Indexing ◽ Support Vector ◽ Semantic Indexing ◽ Email Filtering

Analyzing Large-Scale Proteomics Projects with Latent Semantic Indexing

Journal of Proteome Research ◽ 10.1021/pr070461k ◽ 2008 ◽ Vol 7 (1) ◽ pp. 182-191 ◽ Cited By ~ 33

Author(s): Sebastian Klie ◽ Lennart Martens ◽ Juan Antonio Vizcaíno ◽ Richard Côté ◽ Phil Jones ◽ ...

Keyword(s): Large Scale ◽ Latent Semantic Indexing ◽ Semantic Indexing

A Method Based on Support Vector Machine for Feature Selection of Latent Semantic Features

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.181-182.830 ◽ 2011 ◽ Vol 181-182 ◽ pp. 830-835

Author(s): Min Song Li

Keyword(s): Support Vector Machine ◽ Text Categorization ◽ Latent Semantic Indexing ◽ Classification Performance ◽ Compact Representation ◽ Support Vector ◽ Semantic Features ◽ Semantic Indexing ◽ Feature Extraction Method ◽ Feature Subspace

Latent Semantic Indexing (LSI) is an effective feature extraction method that can capture the underlying latent semantic structure between words in documents. However, it is probably not the most appropriate method for selecting a feature subspace for text categorization, since it orders the extracted features by their variance, not by their classification power. We propose a method based on support vector machines to extract features and select latent semantic features that are suited to classification. Experimental results indicate that the method improves classification performance with a more compact representation.
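The point that variance ordering and classification power can disagree is easy to see on a small example. The sketch below is only an illustration and does not reproduce the paper's method: the paper uses a support vector machine, whereas this sketch swaps in a much simpler Fisher-style score (squared distance between class means over within-class variance). The data set and the name `fisher_scores` are hypothetical.

```python
import numpy as np

# Hypothetical data: 8 samples, 2 features, binary labels.
# Feature 0 varies a lot but is distributed identically in both classes;
# feature 1 varies little but separates the classes.
X = np.array([
    [-4.0, -0.6], [-2.0, -0.4], [2.0, -0.4], [4.0, -0.6],  # class 0
    [-4.0,  0.6], [-2.0,  0.4], [2.0,  0.4], [4.0,  0.6],  # class 1
])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# SVD of the (already centered) data. Z holds the latent features,
# ordered by singular value, that is, by the variance they capture.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = U * s

def fisher_scores(Z, y):
    # Squared distance between class means over summed within-class variance.
    z0, z1 = Z[y == 0], Z[y == 1]
    between = (z0.mean(axis=0) - z1.mean(axis=0)) ** 2
    within = z0.var(axis=0) + z1.var(axis=0)
    return between / within

scores = fisher_scores(Z, y)
variance_order = np.argsort(-s)          # order produced by the SVD
class_power_order = np.argsort(-scores)  # order by class separation

print("order by variance:        ", variance_order)
print("order by class separation:", class_power_order)
```

Here the first latent feature carries most of the variance yet says nothing about the label, while the second one separates the classes. A selection step driven by classification power would therefore keep the second feature first, which is the motivation the abstract gives for replacing pure variance ordering.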

Adaptive label-driven scaling for latent semantic indexing

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '08 ◽ 10.1145/1390334.1390525 ◽ 2008 ◽ Cited By ~ 1

Author(s): Xiaojun Quan ◽ Enhong Chen ◽ Qiming Luo ◽ Hui Xiong

Keyword(s): Latent Semantic Indexing ◽ Semantic Indexing

Biomedical Document Clustering Based on Accelerated Symbiotic Organisms Search Algorithm

International Journal of Swarm Intelligence Research ◽ 10.4018/ijsir.2021100109 ◽ 2021 ◽ Vol 12 (4) ◽ pp. 169-185

Author(s): Saida Ishak Boushaki ◽ Omar Bendjeghaba ◽ Nadjet Kamel

Keyword(s): Clustering Algorithm ◽ Search Algorithm ◽ Clustering Algorithms ◽ Document Clustering ◽ Latent Semantic Indexing ◽ Research Area ◽ Semantic Indexing ◽ Local Optima ◽ Symbiotic Organisms Search ◽ Symbiotic Organisms

Clustering is an important unsupervised analysis technique for big data mining. It has applications in several domains, including the biomedical documents of the MEDLINE database. Document clustering algorithms based on metaheuristics are an active research area. However, these algorithms tend to get trapped in local optima, require many parameters to be tuned, and need the documents to be indexed by a high-dimensional matrix under the traditional vector space model. To overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic, enhanced by an acceleration technique. Furthermore, the documents are represented through semantic indexing based on the well-known latent semantic indexing (LSI). Experiments on well-known biomedical document datasets show the significant superiority of ASOS-LSI over five well-known algorithms in terms of compactness, F-measure, purity, misclassified documents, entropy, and runtime.
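Two of the evaluation measures named in this abstract, purity and entropy, can be computed from cluster assignments and reference labels. The sketch below uses common textbook definitions; the abstract does not state the exact formulas the authors used (for example, whether entropy is normalized), so treat these as assumptions. The function names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

def _group_labels(clusters, labels):
    # Collect the reference labels of the documents in each cluster.
    groups = defaultdict(list)
    for cluster_id, label in zip(clusters, labels):
        groups[cluster_id].append(label)
    return groups

def purity(clusters, labels):
    # Fraction of documents that carry the majority label of their cluster.
    groups = _group_labels(clusters, labels)
    majority = sum(max(Counter(m).values()) for m in groups.values())
    return majority / len(labels)

def clustering_entropy(clusters, labels):
    # Size-weighted average of the label entropy (in bits) inside each cluster.
    groups = _group_labels(clusters, labels)
    total = 0.0
    for members in groups.values():
        n = len(members)
        h = -sum((c / n) * log2(c / n) for c in Counter(members).values())
        total += (n / len(labels)) * h
    return total

# Hypothetical assignment of six documents to two clusters.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, labels), clustering_entropy(clusters, labels))
```

Higher purity and lower entropy both indicate clusters that agree better with the reference labels. In the sample data, one cluster is pure and the other mixes two labels, so purity is 5/6 and the entropy is above zero.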
