Source Code Classification using Latent Semantic Indexing with Structural and Frequency Term Weighting

Various document ranking methods have been developed and implemented in Information Retrieval applications. One very popular method ranks documents with the vector space model based on TF.IDF term weighting. That method weights terms only by their frequency of occurrence in documents, without considering the semantic relationships between terms. In practice, semantic relationships between terms play an important role in improving the relevance of document search results. This research extends the TF.IDF.ICF.IBF method by adding Latent Semantic Indexing to discover semantic relationships between terms, applied to the ranking of Arabic-language documents. The dataset is drawn from the document collection of the Maktabah Syamilah software. Test results show that the proposed method achieves better evaluation scores than the TF.IDF.ICF.IBF method. The f-measure values of the TF.IDF.ICF.IBF.LSI method at cosine similarity thresholds of 0.3, 0.4, and 0.5 are 45%, 51%, and 60%, respectively. However, the proposed method has a higher average computation time than the TF.IDF.ICF.IBF method, by 2 minutes 8 seconds.
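
The general idea of ranking with term weighting, a latent semantic space, and a cosine similarity threshold can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the abstract: it uses plain TF.IDF (the ICF and IBF factors are not defined here, so they are omitted), one common way of projecting a query into the latent space, and a hypothetical toy corpus. The function name `lsi_rank` and its parameters are chosen for this example only.

```python
import numpy as np

def lsi_rank(counts, query_counts, k=2, threshold=0.3):
    """Rank documents against a query in a k-dimensional LSI space.

    counts:       (n_terms, n_docs) raw term counts
    query_counts: (n_terms,) raw term counts of the query
    Returns [(doc_index, cosine_similarity), ...] for documents whose
    similarity is at least `threshold`, best match first.
    """
    counts = np.asarray(counts, dtype=float)
    query_counts = np.asarray(query_counts, dtype=float)
    n_docs = counts.shape[1]

    # Plain TF.IDF weighting (assumed variant: raw tf * ln(N / df)).
    df = np.count_nonzero(counts, axis=1)
    idf = np.log(n_docs / np.maximum(df, 1))
    A = counts * idf[:, None]
    q = query_counts * idf

    # LSI: keep the k strongest latent dimensions of the weighted matrix.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]

    # Project documents and query onto the same latent axes.
    docs_k = A.T @ Uk          # shape (n_docs, k)
    q_k = q @ Uk               # shape (k,)

    # Cosine similarity, guarding against zero-length vectors.
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k)
    sims = (docs_k @ q_k) / np.maximum(norms, 1e-12)

    order = np.argsort(-sims)
    return [(int(i), float(sims[i])) for i in order if sims[i] >= threshold]

# Toy corpus (hypothetical): rows are terms, columns are documents.
# terms: car, automobile, engine, flower, petal
counts = np.array([
    [1, 0, 0, 0],   # car
    [0, 1, 0, 0],   # automobile
    [1, 1, 0, 0],   # engine
    [0, 0, 1, 1],   # flower
    [0, 0, 1, 2],   # petal
])
query = np.array([1, 0, 0, 0, 0])   # the query "car"
ranked = lsi_rank(counts, query, k=2, threshold=0.3)
```

In this toy corpus, document 1 never contains "car" but shares "engine" with document 0, so both land on the same latent axis and both pass the threshold, while the two "flower" documents do not. That is the kind of term relationship a purely frequency-based weighting cannot see.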

Text term weighting approach based on latent semantic indexing

Journal of Computer Applications ◽ 10.3724/sp.j.1087.2008.01460 ◽ 2008 ◽ Vol 28 (6) ◽ pp. 1460-1462

Author(s): Yuan-yuan LI

Keyword(s): Latent Semantic Indexing ◽ Semantic Indexing ◽ Term Weighting

Optimising the Heuristics in Latent Semantic Indexing for Effective Information Retrieval

Journal of Information & Knowledge Management ◽ 10.1142/s0219649206001359 ◽ 2006 ◽ Vol 05 (02) ◽ pp. 97-105 ◽ Cited By ~ 3

Author(s): S. Srinivas ◽ Ch. AswaniKumar

Keyword(s): Information Retrieval ◽ Vector Space ◽ Vector Space Model ◽ Latent Semantic Indexing ◽ Semantic Indexing ◽ Retrieval Performance ◽ Term Weighting ◽ Space Model ◽ Rank Approximation

Latent Semantic Indexing (LSI) is a well-known Information Retrieval (IR) technique that tries to overcome the problems of lexical matching by using conceptual indexing. LSI is a variant of the vector space model and has been reported to be 30% more effective. Many studies have reported that good retrieval performance depends on the use of various retrieval heuristics. In this paper, we focus on optimising two LSI retrieval heuristics: term weighting and rank approximation. The results demonstrate that LSI performance improves significantly when optimised term weighting and rank approximation are combined.
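
The two heuristics named above can be illustrated with a short sketch. The abstract does not say which weighting scheme or rank the authors settled on, so the choices here are assumptions: log-entropy weighting is used as one scheme commonly paired with LSI, and `rank_k` is a hypothetical helper that builds the rank-k approximation from a truncated SVD.

```python
import numpy as np

def log_entropy(counts):
    """Log-entropy term weighting for a (n_terms, n_docs) count matrix.

    Local weight:  ln(1 + tf)
    Global weight: 1 + sum_j p_ij * ln(p_ij) / ln(n_docs),
                   where p_ij = tf_ij / (total count of term i).
    Assumes n_docs > 1.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    gf = counts.sum(axis=1, keepdims=True)
    p = np.divide(counts, gf, out=np.zeros_like(counts), where=gf > 0)
    # Treat 0 * ln(0) as 0 by replacing zeros before taking the log.
    plogp = np.where(p > 0, p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    global_weight = 1.0 + plogp.sum(axis=1) / np.log(n_docs)
    return np.log1p(counts) * global_weight[:, None]

def rank_k(A, k):
    """Rank-k approximation of A built from its k largest singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def approximation_error(A, k):
    """Frobenius-norm error of the rank-k approximation of A."""
    return float(np.linalg.norm(A - rank_k(A, k)))
```

A term spread evenly over every document receives a global weight of zero under this scheme, while a term concentrated in one document keeps its full local weight. Comparing `approximation_error` across several values of `k` is one simple way to see how much of the weighted matrix a given rank retains.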

Latent semantic indexing and large dataset: Study of term-weighting schemes

2010 Fifth International Conference on Digital Information Management (ICDIM) ◽ 10.1109/icdim.2010.5664669 ◽ 2010 ◽ Cited By ~ 4

Author(s): A N K Zaman ◽ C G Brown

Keyword(s): Latent Semantic Indexing ◽ Semantic Indexing ◽ Term Weighting ◽ Large Dataset ◽ Weighting Schemes

Recovering documentation-to-source-code traceability links using latent semantic indexing

25th International Conference on Software Engineering, 2003. Proceedings. ◽ 10.1109/icse.2003.1201194 ◽ 2003 ◽ Cited By ~ 333

Author(s): A. Marcus ◽ J.I. Maletic

Keyword(s): Source Code ◽ Latent Semantic Indexing ◽ Semantic Indexing

RECOVERY OF TRACEABILITY LINKS BETWEEN SOFTWARE DOCUMENTATION AND SOURCE CODE

International Journal of Software Engineering and Knowledge Engineering ◽ 10.1142/s0218194005002543 ◽ 2005 ◽ Vol 15 (05) ◽ pp. 811-836 ◽ Cited By ~ 82

Author(s): ANDRIAN MARCUS ◽ JONATHAN I. MALETIC ◽ ANDREY SERGEYEV

Keyword(s): Information Retrieval ◽ Case Studies ◽ Semantic Information ◽ Source Code ◽ Latent Semantic Indexing ◽ Automatic Process ◽ Semantic Indexing ◽ Software Documentation ◽ Retrieval Technique ◽ Positive Results

An approach for the semi-automated recovery of traceability links between software documentation and source code is presented. The methodology is based on the application of information retrieval techniques to extract and analyze the semantic information from the source code and associated documentation. A semi-automatic process is defined based on the proposed methodology. The paper advocates the use of latent semantic indexing (LSI) as the supporting information retrieval technique. Two case studies using existing software are presented comparing this approach with others. The case studies show positive results for the proposed approach, especially considering the flexibility of the methods used.

Support vector machine for customized email filtering based on improving latent semantic indexing

2005 International Conference on Machine Learning and Cybernetics ◽ 10.1109/icmlc.2005.1527599 ◽ 2005 ◽ Cited By ~ 1

Author(s): Qing Yang ◽ Fang-Min Li

Keyword(s): Support Vector Machine ◽ Latent Semantic Indexing ◽ Support Vector ◽ Semantic Indexing ◽ Email Filtering

Analyzing Large-Scale Proteomics Projects with Latent Semantic Indexing

Journal of Proteome Research ◽ 10.1021/pr070461k ◽ 2008 ◽ Vol 7 (1) ◽ pp. 182-191 ◽ Cited By ~ 33

Author(s): Sebastian Klie ◽ Lennart Martens ◽ Juan Antonio Vizcaíno ◽ Richard Côté ◽ Phil Jones ◽ ...

Keyword(s): Large Scale ◽ Latent Semantic Indexing ◽ Semantic Indexing

A Method Based on Support Vector Machine for Feature Selection of Latent Semantic Features

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.181-182.830 ◽ 2011 ◽ Vol 181-182 ◽ pp. 830-835

Author(s): Min Song Li

Keyword(s): Support Vector Machine ◽ Text Categorization ◽ Latent Semantic Indexing ◽ Classification Performance ◽ Compact Representation ◽ Support Vector ◽ Semantic Features ◽ Semantic Indexing ◽ Feature Extraction Method ◽ Feature Subspace

Latent Semantic Indexing (LSI) is an effective feature extraction method that can capture the underlying latent semantic structure between words in documents. However, it is probably not the most appropriate way to select a feature subspace for text categorization, since it orders the extracted features by their variance, not by their classification power. We propose a method based on support vector machines to extract features and select a Latent Semantic Indexing subspace that is suited for classification. Experimental results indicate that the method improves classification performance with a more compact representation.

Adaptive label-driven scaling for latent semantic indexing

Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '08 ◽ 10.1145/1390334.1390525 ◽ 2008 ◽ Cited By ~ 1

Author(s): Xiaojun Quan ◽ Enhong Chen ◽ Qiming Luo ◽ Hui Xiong

Keyword(s): Latent Semantic Indexing ◽ Semantic Indexing
