Text term weighting approach based on latent semantic indexing

Many document-ranking methods have been developed and implemented in information retrieval applications. One very popular method ranks documents using a vector space model based on TF.IDF term weights. This method weights terms only by their frequency of occurrence in documents, without considering the semantic relationships between terms. In practice, semantic relationships between terms play an important role in improving the relevance of document search results. This study extends the TF.IDF.ICF.IBF method by adding Latent Semantic Indexing to discover semantic relationships between terms, applied to the ranking of Arabic-language documents. The dataset is drawn from the document collection in the Maktabah Syamilah software. Test results show that the proposed method achieves better evaluation scores than the TF.IDF.ICF.IBF method. The f-measure of the TF.IDF.ICF.IBF.LSI method at cosine similarity thresholds of 0.3, 0.4, and 0.5 is 45%, 51%, and 60%, respectively. However, the average computation time of the proposed method is 2 minutes 8 seconds higher than that of the TF.IDF.ICF.IBF method.
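The general idea of ranking in a latent semantic space and filtering by a cosine similarity threshold can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses plain TF.IDF weights rather than TF.IDF.ICF.IBF, an English toy corpus rather than Maktabah Syamilah, and the names `tfidf_matrix`, `lsi_rank`, and `docs` are hypothetical.

```python
import numpy as np

def tfidf_matrix(docs, vocab):
    """Term-by-document TF.IDF matrix (rows: terms, columns: documents)."""
    tf = np.array([[doc.count(term) for doc in docs] for term in vocab], dtype=float)
    df = np.count_nonzero(tf, axis=1)   # number of documents containing each term
    idf = np.log(len(docs) / df)        # every vocabulary term occurs in >= 1 document
    return tf * idf[:, None], idf

def lsi_rank(docs, query, k=2, threshold=0.3):
    """Return (doc_index, cosine) pairs at or above the threshold, best first."""
    vocab = sorted({term for doc in docs for term in doc})
    A, idf = tfidf_matrix(docs, vocab)
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]                       # top-k left singular vectors (latent concepts)
    doc_vecs = (Uk.T @ A).T             # each document projected into the latent space
    q = np.array([query.count(term) for term in vocab], dtype=float) * idf
    q_vec = Uk.T @ q                    # the query projected the same way
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    # Cosine similarity; a zero-length vector gets similarity 0 instead of NaN.
    scores = np.divide(doc_vecs @ q_vec, norms, out=np.zeros_like(norms), where=norms > 0)
    hits = [(i, float(s)) for i, s in enumerate(scores) if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Toy corpus (an assumption for illustration): documents are token lists.
docs = [
    ["car", "engine", "road"],
    ["automobile", "engine", "road"],
    ["car", "automobile", "wheel"],
    ["fruit", "apple", "banana"],
    ["apple", "banana", "juice"],
]
ranked = lsi_rank(docs, ["car"], k=2, threshold=0.3)
print(ranked)
```

In this toy corpus the query "car" also retrieves the second document, which never contains the word "car" but shares "engine" and "road" with a document that does. Plain term matching would miss it, which is the kind of semantic relationship the abstract refers to.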

Optimising the Heuristics in Latent Semantic Indexing for Effective Information Retrieval

Journal of Information & Knowledge Management ◽

10.1142/s0219649206001359 ◽

2006 ◽

Vol 05 (02) ◽

pp. 97-105 ◽

Cited By ~ 3

Author(s):

S. Srinivas ◽

Ch. AswaniKumar

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Vector Space Model ◽

Latent Semantic Indexing ◽

Semantic Indexing ◽

Retrieval Performance ◽

Term Weighting ◽

Space Model ◽

Rank Approximation

Latent Semantic Indexing (LSI) is a well-known Information Retrieval (IR) technique that tries to overcome the problems of lexical matching by using conceptual indexing. LSI is a variant of the vector space model and has proved to be 30% more effective. Many studies have reported that good retrieval performance is related to the use of various retrieval heuristics. In this paper, we focus on optimising two LSI retrieval heuristics: term weighting and rank approximation. The results obtained demonstrate that LSI performance improves significantly with the combination of optimised term weighting and rank approximation.
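To make "rank approximation" concrete, the sketch below truncates the SVD of a small weighted term-by-document matrix at each rank k and reports the relative Frobenius error of the approximation. The matrix `A` and the function name `rank_k_errors` are assumptions for illustration; the abstract does not say which weighting scheme or rank-selection rule the paper settles on.

```python
import numpy as np

def rank_k_errors(A):
    """Relative Frobenius error of the best rank-k approximation, for k = 1..rank."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    total = np.linalg.norm(A)                  # Frobenius norm of the full matrix
    errors = []
    for k in range(1, len(s) + 1):
        A_k = (U[:, :k] * s[:k]) @ Vt[:k]      # keep only the k largest singular triplets
        errors.append(float(np.linalg.norm(A - A_k) / total))
    return errors, s

# Hypothetical weighted term-by-document matrix (4 terms, 3 documents).
A = np.array([
    [1.0, 0.9, 0.0],
    [0.8, 1.1, 0.1],
    [0.0, 0.2, 1.3],
    [0.1, 0.0, 0.7],
])
errors, s = rank_k_errors(A)
for k, err in enumerate(errors, start=1):
    print(f"k={k}: relative error {err:.3f}")
```

The error at rank k equals the square root of the sum of the squared discarded singular values divided by the norm of A, so it never increases as k grows. Choosing k is therefore a trade-off between compression and fidelity, and any term weighting applied to A changes the singular values and with them that trade-off.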

Source Code Classification using Latent Semantic Indexing with Structural and Frequency Term Weighting

Research Journal of Applied Sciences ◽

10.3923/rjasci.2012.266.271 ◽

2012 ◽

Vol 7 (5) ◽

pp. 266-271

Author(s):

Yuhanis Yusof ◽

Taha Alhersh ◽

Massudi Mahmuddin ◽

Aniza Mohamed Din

Keyword(s):

Source Code ◽

Latent Semantic Indexing ◽

Semantic Indexing ◽

Term Weighting ◽

Frequency Term

Latent semantic indexing and large dataset: Study of term-weighting schemes

2010 Fifth International Conference on Digital Information Management (ICDIM) ◽

10.1109/icdim.2010.5664669 ◽

2010 ◽

Cited By ~ 4

Author(s):

A N K Zaman ◽

C G Brown

Keyword(s):

Latent Semantic Indexing ◽

Semantic Indexing ◽

Term Weighting ◽

Large Dataset ◽

Weighting Schemes

Support vector machine for customized email filtering based on improving latent semantic indexing

2005 International Conference on Machine Learning and Cybernetics ◽

10.1109/icmlc.2005.1527599 ◽

2005 ◽

Cited By ~ 1

Author(s):

Qing Yang ◽

Fang-Min Li

Keyword(s):

Support Vector Machine ◽

Latent Semantic Indexing ◽

Support Vector ◽

Semantic Indexing ◽

Email Filtering

Analyzing Large-Scale Proteomics Projects with Latent Semantic Indexing

Journal of Proteome Research ◽

10.1021/pr070461k ◽

2008 ◽

Vol 7 (1) ◽

pp. 182-191 ◽

Cited By ~ 33

Author(s):

Sebastian Klie ◽

Lennart Martens ◽

Juan Antonio Vizcaíno ◽

Richard Côté ◽

Phil Jones ◽

...

Keyword(s):

Large Scale ◽

Latent Semantic Indexing ◽

Semantic Indexing

A Method Based on Support Vector Machine for Feature Selection of Latent Semantic Features

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.181-182.830 ◽

2011 ◽

Vol 181-182 ◽

pp. 830-835

Author(s):

Min Song Li

Keyword(s):

Support Vector Machine ◽

Text Categorization ◽

Latent Semantic Indexing ◽

Classification Performance ◽

Compact Representation ◽

Support Vector ◽

Semantic Features ◽

Semantic Indexing ◽

Feature Extraction Method ◽

Feature Subspace

Latent Semantic Indexing (LSI) is an effective feature extraction method that can capture the underlying latent semantic structure between words in documents. However, it is probably not the most appropriate method for selecting a feature subspace for text categorization, since it orders extracted features according to their variance, not their classification power. We propose a method based on support vector machines to extract features and select a Latent Semantic Indexing subspace that is suited for classification. Experimental results indicate that the method improves classification performance with a more compact representation.
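The gap between variance order and classification power can be seen on a toy example. The sketch below does not reproduce the paper's SVM-based selection; as a stand-in it ranks latent features by a simple Fisher score (squared difference of class means over summed class variances). The data, labels, and the name `fisher_scores` are assumptions made for illustration.

```python
import numpy as np

def fisher_scores(Z, y):
    """Per-feature Fisher score for two classes labelled 1 and 0."""
    Z, y = np.asarray(Z, dtype=float), np.asarray(y)
    pos, neg = Z[y == 1], Z[y == 0]
    between = (pos.mean(axis=0) - neg.mean(axis=0)) ** 2
    within = pos.var(axis=0) + neg.var(axis=0)
    return between / (within + 1e-12)    # small constant avoids division by zero

# Toy document-by-feature matrix (rows: documents). The first column has large
# variance but no class signal; the second has small variance and separates the classes.
X = np.array([
    [10.0, 1.0], [-10.0, 1.0], [9.0, 1.2], [-9.0, 0.8],
    [10.0, -1.0], [-10.0, -1.0], [9.0, -1.2], [-9.0, -0.8],
])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Z = U * s                                # latent features, ordered by singular value
variance_order = list(range(len(s)))     # the order the SVD itself gives
fisher_order = [int(i) for i in np.argsort(-fisher_scores(Z, y))]
print("order by singular value:", variance_order)
print("order by Fisher score:  ", fisher_order)
```

Here the latent feature with the largest singular value carries no class information, while the second one separates the two classes cleanly. Keeping only the top feature by variance would discard exactly the direction a classifier needs.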

Adaptive label-driven scaling for latent semantic indexing

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '08 ◽

10.1145/1390334.1390525 ◽

2008 ◽

Cited By ~ 1

Author(s):

Xiaojun Quan ◽

Enhong Chen ◽

Qiming Luo ◽

Hui Xiong

Keyword(s):

Latent Semantic Indexing ◽

Semantic Indexing

Biomedical Document Clustering Based on Accelerated Symbiotic Organisms Search Algorithm

International Journal of Swarm Intelligence Research ◽

10.4018/ijsir.2021100109 ◽

2021 ◽

Vol 12 (4) ◽

pp. 169-185

Author(s):

Saida Ishak Boushaki ◽

Omar Bendjeghaba ◽

Nadjet Kamel

Keyword(s):

Clustering Algorithm ◽

Search Algorithm ◽

Clustering Algorithms ◽

Document Clustering ◽

Latent Semantic Indexing ◽

Research Area ◽

Semantic Indexing ◽

Local Optima ◽

Symbiotic Organisms Search ◽

Symbiotic Organisms

Clustering is an important unsupervised analysis technique for big data mining. It is applied in several domains, including the biomedical documents of the MEDLINE database. Document clustering algorithms based on metaheuristics are an active research area. However, these algorithms tend to get trapped in local optima, require many parameters to be tuned, and need the documents to be indexed by a high-dimensional matrix using the traditional vector space model. To overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic and enhanced by an acceleration technique. Furthermore, the documents are represented by semantic indexing based on the well-known latent semantic indexing (LSI). Experiments conducted on well-known biomedical document datasets show the significant superiority of ASOS-LSI over five well-known algorithms in terms of compactness, f-measure, purity, misclassified documents, entropy, and runtime.

Bilingual Chunk Alignment Based on Interactional Matching and Probabilistic Latent Semantic Indexing

Natural Language Processing – IJCNLP 2004 - Lecture Notes in Computer Science ◽

10.1007/978-3-540-30211-7_44 ◽

2005 ◽

pp. 416-425

Author(s):

Feifan Liu ◽

Qianli Jin ◽

Jun Zhao ◽

Bo Xu

Keyword(s):

Latent Semantic Indexing ◽

Semantic Indexing
