Analysis on the use of Latent Semantic Indexing (LSI) for document classification and retrieval system of PNP files

2018 ◽  
Vol 189 ◽  
pp. 03009
Author(s):  
Angelica M. Aquino ◽  
Enrico P. Chavez

Document classification is the process of automatically categorizing documents drawn from a mixed collection of files [1]. This paper proposes an approach to classifying the admin-case files of the Philippine National Police (PNP) using Latent Semantic Indexing (LSI). The model represents term-to-term, document-to-document, and term-to-document relationships. Regular expressions are also used to define a search pattern over character strings, which LSI then uses to establish the semantic relevance of those strings to the search term or keyword. The aim of the study is to evaluate the performance of LSI in classifying PNP documents; experiments were run in software to test the capability of LSI for text retrieval. Indexing follows the patterns matched in the text collection, using a model based on singular value decomposition (SVD). In the tests, documents were indexed according to file relationships, and the system was able to return search results retrieved from the PNP files. Weights are used to check the accuracy of the method: positive query-similarity values are regarded as the most relevant among the related searches, meaning that the query word matches words in a text file and a query result is returned.
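The retrieval pipeline described in this abstract (regex-based pattern matching, a term-by-document matrix, a truncated SVD, and query similarity weights) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' software: the toy corpus, the token pattern, the rank `k = 2`, and the helper names `tokenize`, `term_vector`, and `query_similarity` are all hypothetical.

```python
import re
import numpy as np

# Hypothetical toy corpus; the PNP admin-case files are not available here.
docs = [
    "officer charged with grave misconduct in administrative case",
    "administrative case dismissed after misconduct hearing",
    "station budget report on vehicle maintenance",
    "vehicle maintenance schedule and fuel budget",
]

def tokenize(text):
    # Assumed token pattern: runs of lowercase letters.
    return re.findall(r"[a-z]+", text.lower())

vocab = sorted({t for d in docs for t in tokenize(d)})
index = {t: i for i, t in enumerate(vocab)}

def term_vector(text):
    # Raw term counts over the corpus vocabulary; unknown words are ignored.
    v = np.zeros(len(vocab))
    for t in tokenize(text):
        if t in index:
            v[index[t]] += 1.0
    return v

# Term-by-document matrix A (rows: terms, columns: documents).
A = np.column_stack([term_vector(d) for d in docs])

# Truncated SVD: keep the k largest singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
U_k, s_k, doc_vecs = U[:, :k], s[:k], Vt[:k, :].T  # one k-dim row per document

def query_similarity(query):
    # Fold the query into the latent space: q_k = q^T U_k S_k^{-1}.
    q_k = (term_vector(query) @ U_k) / s_k
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_k)
    # Cosine similarity between the query and every document.
    return (doc_vecs @ q_k) / np.where(norms == 0, 1.0, norms)

scores = query_similarity("misconduct case")
ranking = np.argsort(-scores)  # document indices, most similar first
```

In this toy corpus the two administrative-case documents share no vocabulary with the two budget documents, so the query scores clearly positive against the first pair and close to zero against the second. That mirrors the abstract's reading of positive similarity weights as the relevant matches.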

2018 ◽  
Vol 11 (4) ◽  
pp. 97-112
Author(s):  
Jeong-Joon Kim ◽  
Yong-Soo Lee ◽  
Jin-Yong Moon ◽  
Jeong-Min Park

2008 ◽  
Vol 7 (1) ◽  
pp. 182-191 ◽  
Author(s):  
Sebastian Klie ◽  
Lennart Martens ◽  
Juan Antonio Vizcaíno ◽  
Richard Côté ◽  
Phil Jones ◽  
...  

2011 ◽  
Vol 181-182 ◽  
pp. 830-835
Author(s):  
Min Song Li

Latent Semantic Indexing (LSI) is an effective feature extraction method that can capture the underlying latent semantic structure between words in documents. However, it is probably not the most appropriate way to select a feature subspace for text categorization, since it orders the extracted features by their variance, not by their classification power. We propose a method based on support vector machines to extract features and select an LSI subspace suited to classification. Experimental results indicate that the method improves classification performance with a more compact representation.
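The gap between variance ordering and classification power can be seen on a contrived example. The sketch below ranks latent dimensions by a simple Fisher score, which is a stand-in chosen for brevity and not the SVM-based criterion the abstract refers to. The matrix `X`, the labels `y`, and the helper `fisher_score` are assumptions made for illustration.

```python
import numpy as np

# Contrived document-by-term counts (rows: documents, columns: terms).
# Term 0 is frequent in both classes; terms 1 and 2 each occur in one class only.
X = np.array([
    [3, 0, 0],
    [3, 0, 0],
    [0, 1, 0],
    [0, 1, 0],
    [3, 0, 0],
    [3, 0, 0],
    [0, 0, 2],
    [0, 0, 2],
], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # hypothetical class labels

# SVD of the document-by-term matrix; columns of Z are the latent features,
# ordered by decreasing singular value (that is, by captured variance).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = U * s

def fisher_score(z, labels, eps=1e-12):
    # Squared difference of class means over the sum of class variances.
    a, b = z[labels == 0], z[labels == 1]
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var() + eps)

scores = np.array([fisher_score(Z[:, j], y) for j in range(Z.shape[1])])
by_variance = np.arange(len(s))      # the order LSI produces
by_class_power = np.argsort(-scores)  # the order a class-aware criterion prefers
```

Here the leading latent dimension carries the most variance but is distributed identically in both classes, so it ranks last once the class labels are taken into account.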


2021 ◽  
Vol 12 (4) ◽  
pp. 169-185
Author(s):  
Saida Ishak Boushaki ◽  
Omar Bendjeghaba ◽  
Nadjet Kamel

Clustering is an important unsupervised analysis technique for big data mining. It is applied in several domains, including biomedical documents from the MEDLINE database. Document clustering based on metaheuristics is an active research area. However, these algorithms tend to get trapped in local optima, require many parameters to be tuned, and need the documents to be indexed by a high-dimensional matrix under the traditional vector space model. To overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic, enhanced by an acceleration technique. Furthermore, the documents are represented by semantic indexing based on latent semantic indexing (LSI). Experiments on well-known biomedical document datasets show that ASOS-LSI significantly outperforms five well-known algorithms in terms of compactness, F-measure, purity, misclassified documents, entropy, and runtime.
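Of the evaluation measures listed above, purity is the simplest to state: each cluster is credited with its most frequent true label, and the credited documents are divided by the total. The sketch below uses the standard definition. The function name `purity` and the sample assignments are hypothetical and are not taken from the paper's experiments.

```python
from collections import Counter

def purity(cluster_ids, true_labels):
    # Group the true labels by the cluster each document was assigned to.
    clusters = {}
    for c, t in zip(cluster_ids, true_labels):
        clusters.setdefault(c, []).append(t)
    # Credit every cluster with the size of its majority label.
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(true_labels)

# Hypothetical assignment: cluster 0 holds two "a" and one "b"; cluster 1 holds three "b".
cluster_ids = [0, 0, 0, 1, 1, 1]
true_labels = ["a", "a", "b", "b", "b", "b"]
score = purity(cluster_ids, true_labels)  # (2 + 3) / 6
```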

