Analysis on the Effect of Term-Document's Matrix to the Accuracy of Latent-Semantic-Analysis-Based Cross-Language Plagiarism Detection

Latent Semantic Analysis (LSA), known as Latent Semantic Indexing (LSI) when applied to information retrieval, has been a major analysis approach in text mining. It is an extension of the vector space method in information retrieval, representing documents as numerical vectors but using a more sophisticated mathematical approach to characterize the essential features of the documents and reduce the number of features in the search space. This chapter summarizes several major approaches to this dimensionality reduction, each of which has strengths and weaknesses, and it describes recent breakthroughs and advances. It shows how the constructs and products of LSA applications can be made user-interpretable and reviews applications of LSA beyond information retrieval, in particular to text information visualization. While the major application of LSA is text mining, it is also highly applicable to cross-language information retrieval, Web mining, and the analysis of text transcribed from speech and of textual information in video.
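The vector-space representation and dimensionality reduction described in this abstract can be illustrated with a minimal sketch. It assumes raw term counts and a truncated singular value decomposition via NumPy; the toy corpus, the function names, and the choice of k = 2 are illustrative assumptions and are not taken from any of the listed papers.

```python
import numpy as np

def term_document_matrix(docs):
    """Build a raw-count term-document matrix (rows: terms, columns: documents)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    return A, vocab

def lsa_document_vectors(A, k):
    """Project documents into a k-dimensional latent space using truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    # Each row of the result is one document, expressed as Sigma_k * V_k^T.
    return (np.diag(s[:k]) @ Vt[:k, :]).T

def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

# Hypothetical toy corpus, used only for illustration.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stocks fell on the market",
]
A, vocab = term_document_matrix(docs)
doc_vecs = lsa_document_vectors(A, k=2)

print("cat/mat vs cat/rug:", cosine(doc_vecs[0], doc_vecs[1]))
print("cat/mat vs stocks: ", cosine(doc_vecs[0], doc_vecs[2]))
```

In practice the raw counts are often replaced by a weighting such as tf-idf before the decomposition, and k is tuned on held-out data; both choices are outside the scope of this sketch.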


An Approach to Source-Code Plagiarism Detection and Investigation Using Latent Semantic Analysis

IEEE Transactions on Computers ◽ 10.1109/tc.2011.223 ◽ 2012 ◽ Vol 61 (3) ◽ pp. 379-394 ◽ Cited By ~ 75

Author(s): G. Cosma ◽ M. Joy

Keyword(s): Latent Semantic Analysis ◽ Semantic Analysis ◽ Source Code ◽ Plagiarism Detection


Development of Cross Language Clone Detector for C, C++ & Java Repositories using Natural Language Processing

International Journal of Engineering and Advanced Technology - Regular Issue ◽ 10.35940/ijeat.b3612.129219 ◽ 2019 ◽ Vol 9 (2) ◽ pp. 2289-2293

Keyword(s): Natural Language Processing ◽ Natural Language ◽ Language Processing ◽ Latent Semantic Analysis ◽ Semantic Analysis ◽ Code Clones ◽ Code Base ◽ Value Decomposition ◽ Cross Language ◽ Bug Fixes

Reusing code, with or without modification, is a common practice in building large system-software codebases such as Linux, gcc, and the JDK. This practice is referred to as software cloning or forking. Developers often find bug fixing difficult when porting a large code base from one language to another. Many approaches exist for identifying software clones within a single language, but they do not help developers involved in porting, so a cross-language clone detector is needed. This paper applies a Natural Language Processing (NLP) approach, latent semantic analysis based on singular value decomposition, to find cross-language clones in neighboring languages for all 4 types of clones. It takes code (C, C++ or Java) as input and matches the neighboring code clones in a static repository in terms of the frequency of lines matched.


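Perbandingan Hasil Deteksi Plagiarisme Dokumen dengan Metode Jaro-Winkler Distance dan Metode Latent Semantic Analysis (Comparison of Document Plagiarism Detection Results Using the Jaro-Winkler Distance Method and the Latent Semantic Analysis Method)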

Jurnal Teknologi dan Sistem Komputer ◽ 10.14710/jtsiskom.6.1.2018.7-12 ◽ 2018 ◽ Vol 6 (1) ◽ pp. 7-12 ◽ Cited By ~ 1

Author(s): Tinaliah Tinaliah ◽ Triana Elizabeth

Keyword(s): Test Data ◽ Latent Semantic Analysis ◽ Semantic Analysis ◽ Analysis Method ◽ Plagiarism Detection ◽ Distance Method ◽ Better Than

Various methods are used in plagiarism detection applications to check the similarity of documents. Jaro-Winkler Distance measures the distance between two strings, but it depends on the position of the words. Latent Semantic Analysis emphasizes the words contained in the document regardless of their linguistic characteristics. This study compares the plagiarism detection results of the Jaro-Winkler Distance method and the Latent Semantic Analysis method. On the same test data, the Jaro-Winkler Distance method performs better than the Latent Semantic Analysis method: Jaro-Winkler Distance gives a plagiarism result of 100%, while Latent Semantic Analysis gives 97.14%.
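For readers unfamiliar with the string measure named in this abstract, the following is a minimal sketch of the textbook character-level Jaro-Winkler similarity. It is not the study's implementation: the abstract does not say how the measure was applied to whole documents, and the prefix scale `p = 0.1` and maximum prefix length of 4 are the conventional defaults, assumed here.

```python
def jaro(s1, s2):
    """Jaro similarity between two strings, in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2  # number of transpositions
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro-Winkler similarity: Jaro score boosted for a shared prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

Because matches are only counted within a positional window and a shared prefix is rewarded, the score is sensitive to where characters occur, which is consistent with the position dependence the abstract mentions.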


Chinese-English Cross-Language Text Clustering Algorithm Based on Latent Semantic Analysis

10.22323/1.300.0007 ◽ 2018

Author(s): Huihong Lan ◽ Jinde Huang

Keyword(s): Latent Semantic Analysis ◽ Clustering Algorithm ◽ Semantic Analysis ◽ Text Clustering ◽ Cross Language ◽ Language Text
