A new approach to build a geographical taxonomy of adjacency automatically using the latent semantic indexing method

Author(s):  
Omar El Midaoui ◽  
Abderrahim El Qadi ◽  
Moulay Driss Rahmani ◽  
Driss Aboutajdine
2013 ◽  
Vol 58 (3) ◽  
pp. 927-930 ◽  
Author(s):  
S. Kluska-Nawarecka ◽  
K. Regulski ◽  
M. Krzyżak ◽  
G. Leśniak ◽  
M. Gurda

Abstract This paper presents assumptions for a system of automatic cataloging and semantic text documents searching. As an example, a document repository for metals processing technology was used. The system by using ontological model provides the user with a new approach to the exploration of database resources - easier and more intuitive information search. In the current document storage systems, searching is often based only on keywords and descriptions created manually by the system administrator. The use of text mining methods, especially latent semantic indexing, allows automatic clustering of documents with respect to their content. The result of this clustering is integrated with the ontological model, making navigation through documents resources intuitive and does not require the manual creation of directories. Such an approach seems to be particularly useful in a situation where we are dealing with large repositories of unstructured documents from such sources as the Internet. This situation is very typical for cases of searching information and knowledge in the area of metallurgy, for example with regard to innovation and non-traditional suppliers of materials and equipment.


2018 ◽  
Vol 7 (2.3) ◽  
pp. 73 ◽  
Author(s):  
Robbi Rahim ◽  
Nuning Kurniasih ◽  
Muhammad Dedi Irawan ◽  
Yustria Handika Siregar ◽  
Abdurrozzaq Hasibuan ◽  
...  

Document is a written letter that can be used as evidence of information. Plagiarism is a deliberate or unintentional act of obtaining or attempting to obtain credit or value for a scientific work, citing some or all of the scientific work of another party acknowledged as a scientific work without stating the source properly and adequately. Latent Semantic Indexing method serves to find text that has the same text against from a document. The algorithm used is TF/IDF Algorithm that is the result of multiplication of TF value with IDF for a term in document while Vector Space Model (VSM) is method to see the level of closeness or similarity of word by way of weighting term.  


2017 ◽  
Vol 26 (2) ◽  
pp. 263-285
Author(s):  
Yanyan Xu ◽  
Dengfeng Ke ◽  
Kaile Su

AbstractThe writing part in Chinese language tests is badly in need of a mature automated essay scoring system. In this paper, we propose a new approach applied to automated Chinese essay scoring (ACES), called contextualized latent semantic indexing (CLSI), of which Genuine CLSI and Modified CLSI are two versions. The n-gram language model and the weighted finite-state transducer (WFST), two critical components, are used to extract context information in our ACES system. Not only does CLSI improve conventional latent semantic indexing (LSI), but bridges the gap between latent semantics and their context information, which is absent in LSI. Moreover, CLSI can score essays from the perspectives of language fluency and contents, and address the local overrating and underrating problems caused by LSI. Experimental results show that CLSI outperforms LSI, Regularized LSI, and latent Dirichlet allocation in many aspects, and thus, proves to be an effective approach.


2008 ◽  
Vol 7 (1) ◽  
pp. 182-191 ◽  
Author(s):  
Sebastian Klie ◽  
Lennart Martens ◽  
Juan Antonio Vizcaíno ◽  
Richard Côté ◽  
Phil Jones ◽  
...  

2011 ◽  
Vol 181-182 ◽  
pp. 830-835
Author(s):  
Min Song Li

Latent Semantic Indexing(LSI) is an effective feature extraction method which can capture the underlying latent semantic structure between words in documents. However, it is probably not the most appropriate for text categorization to use the method to select feature subspace, since the method orders extracted features according to their variance,not the classification power. We proposed a method based on support vector machine to extract features and select a Latent Semantic Indexing that be suited for classification. Experimental results indicate that the method improves classification performance with more compact representation.


2021 ◽  
Vol 12 (4) ◽  
pp. 169-185
Author(s):  
Saida Ishak Boushaki ◽  
Omar Bendjeghaba ◽  
Nadjet Kamel

Clustering is an important unsupervised analysis technique for big data mining. It finds its application in several domains including biomedical documents of the MEDLINE database. Document clustering algorithms based on metaheuristics is an active research area. However, these algorithms suffer from the problems of getting trapped in local optima, need many parameters to adjust, and the documents should be indexed by a high dimensionality matrix using the traditional vector space model. In order to overcome these limitations, in this paper a new documents clustering algorithm (ASOS-LSI) with no parameters is proposed. It is based on the recent symbiotic organisms search metaheuristic (SOS) and enhanced by an acceleration technique. Furthermore, the documents are represented by semantic indexing based on the famous latent semantic indexing (LSI). Conducted experiments on well-known biomedical documents datasets show the significant superiority of ASOS-LSI over five famous algorithms in terms of compactness, f-measure, purity, misclassified documents, entropy, and runtime.


Sign in / Sign up

Export Citation Format

Share Document