Plagiarism Detection in Undergraduate Thesis Abstracts Using the Rabin-Karp Algorithm (Case Study: Faculty of Computer Science, Universitas Singaperbangsa Karawang)

Author(s):  
Indra Gunawan

Abstract: Plagiarism is common in education, particularly in higher education. It generally arises from laziness and a desire to finish assignments quickly. Rabin-Karp is a string-searching algorithm, used here to detect plagiarism in text. The aim of this study is to evaluate the results produced by the Rabin-Karp algorithm. The research data pass through either the full set of preprocessing stages (case folding, tokenizing, filtering, and stemming) or only part of them (case folding alone). The k-gram sizes tested are 2-gram, 3-gram, 4-gram, 5-gram, and 6-gram; the k-grams are then hashed to obtain fingerprint values, and similarity is measured with the Dice Similarity Coefficient. The research method is Text Mining, with the stages Acquisition, Text Preprocessing, Modeling, and Evaluation. The data yield an average total similarity of 86.84% for 2-gram, 69.56% for 3-gram, 56.06% for 4-gram, 48.71% for 5-gram, and 44.30% for 6-gram. Results from full preprocessing differ from those of partial preprocessing: full preprocessing gives lower similarity percentages, because words are removed in the filtering stage and altered in the stemming stage. It can be concluded that the data show evidence of plagiarism in the abstracts, supported by data with similarity values of up to 100%.
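The pipeline described in the abstract (case folding, k-gram extraction, hashing into fingerprints, Dice comparison) can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names, the hash base 256, the modulus 1_000_003, and the choice to compare sets of distinct hash values are all assumptions, and the filtering and stemming stages are omitted.

```python
def case_fold(text: str) -> str:
    """Lowercase the text and keep only letters and digits (assumed cleaning rule)."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def kgram_hashes(text: str, k: int, base: int = 256, mod: int = 1_000_003) -> set[int]:
    """Return the set of rolling-hash values of every k-gram in text."""
    if k <= 0 or len(text) < k:
        return set()
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:  # hash of the first window
        h = (h * base + ord(ch)) % mod
    hashes = {h}
    for i in range(k, len(text)):
        # Rabin-Karp rolling update: drop text[i - k], append text[i].
        h = ((h - ord(text[i - k]) * high) * base + ord(text[i])) % mod
        hashes.add(h)
    return hashes


def dice_similarity(a: set[int], b: set[int]) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) over two fingerprint sets."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def similarity(doc_a: str, doc_b: str, k: int) -> float:
    """Case-fold both documents, fingerprint their k-grams, and compare."""
    return dice_similarity(
        kgram_hashes(case_fold(doc_a), k),
        kgram_hashes(case_fold(doc_b), k),
    )
```

Under these assumptions, two texts that are identical after case folding score 1.0 for any k, and texts that share no k-gram score 0.0 (barring hash collisions). Multiplying the result by 100 gives a percentage comparable in form to the figures reported in the abstract.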

2018 ◽  
Vol 197 ◽  
pp. 03019
Author(s):  
Yan Watequlis Syaifudin ◽  
Pramana Yoga Saputra ◽  
Dwi Puspitasari

Plagiarism of scientific work, especially undergraduate theses, occurs mostly in colleges. In this research we used text mining, a new method that can be used for the checking procedure, to obtain the specific pattern of a document. After obtaining the document pattern, we compare it with the pattern of another document. If the level of pattern similarity is high, plagiarism can be suspected. This paper explains the development of text preprocessing, a part of text mining. We chose the Nazief and Adriani algorithm as the text preprocessing algorithm for this research. This research will result in a text preprocessing web service. The web service is expected to be used for further development of text mining.


2020 ◽  
pp. 109442812097168
Author(s):  
Louis Hickman ◽  
Stuti Thapa ◽  
Louis Tay ◽  
Mengyang Cao ◽  
Padmini Srinivasan

Recent advances in text mining have provided new methods for capitalizing on the voluminous natural language text data created by organizations, their employees, and their customers. Although often overlooked, decisions made during text preprocessing affect whether the content and/or style of language are captured, the statistical power of subsequent analyses, and the validity of insights derived from text mining. Past methodological articles have described the general process of obtaining and analyzing text data, but recommendations for preprocessing text data were inconsistent. Furthermore, primary studies use and report different preprocessing techniques. To address this, we conduct two complementary reviews of computational linguistics and organizational text mining research to provide empirically grounded text preprocessing decision-making recommendations that account for the type of text mining conducted (i.e., open or closed vocabulary), the research question under investigation, and the data set’s characteristics (i.e., corpus size and average document length). Notably, deviations from these recommendations will be appropriate and, at times, necessary due to the unique characteristics of one’s text data. We also provide recommendations for reporting text mining to promote transparency and reproducibility.


Author(s):  
Đorđe Petrović ◽  
Milena Stanković

Text mining to a great extent depends on the various text preprocessing techniques. The preprocessing methods and tools which are used to prepare texts for further mining can be divided into those which are and those which are not language-dependent. The subject matter of this research was the analysis of the influence of these methods and tools on further text mining. We first focused on the analysis of the influence on the reduction of the vector space model for the multidimensional representation of text documents. We then analyzed the influence on calculating text similarity, which is the focus of this research. The conclusion we reached is that the implementation of various text preprocessing methods in the Serbian language, which are used for the reduction of the vector space model for the multidimensional representation of text document, achieves the required results. But, the implementation of various text preprocessing methods specific to the Serbian language for the purpose of calculating text similarity can lead to great differences in the results.
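The two effects named in this abstract, a smaller vector space and a shifted similarity score, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the helper names, the whitespace tokenizer, the use of cosine similarity over raw term counts, and the English example sentences. None of it comes from the paper, which studies Serbian-specific preprocessing.

```python
import math
from collections import Counter


def tokenize(text: str, lowercase: bool = False,
             stop_words: frozenset = frozenset()) -> list[str]:
    """Split on whitespace, optionally lowercase, then drop stop words."""
    words = text.split()
    if lowercase:
        words = [w.lower() for w in words]
    return [w for w in words if w not in stop_words]


def cosine(a_tokens: list[str], b_tokens: list[str]) -> float:
    """Cosine similarity between two term-count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def vocabulary_size(*token_lists: list[str]) -> int:
    """Number of dimensions the vector space needs for these documents."""
    return len(set().union(*token_lists))


doc_a, doc_b = "The cat sat", "the cat slept"
stops = frozenset({"the"})

raw = (tokenize(doc_a), tokenize(doc_b))
folded = (tokenize(doc_a, lowercase=True), tokenize(doc_b, lowercase=True))
filtered = (tokenize(doc_a, True, stops), tokenize(doc_b, True, stops))

for name, (ta, tb) in [("raw", raw), ("folded", folded), ("filtered", filtered)]:
    print(name, vocabulary_size(ta, tb), round(cosine(ta, tb), 3))
```

In this toy case each preprocessing step shrinks the vocabulary (5, then 4, then 3 dimensions), while the similarity score moves in different directions: lowercasing raises it and stop-word removal lowers it again. That is the kind of sensitivity the abstract describes for similarity calculations.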


2013 ◽  
Author(s):  
Ronald N. Kostoff ◽  
◽  
Henry A. Buchtel ◽  
John Andrews ◽  
Kirstin M. Pfiel

2020 ◽  
Vol 42 (5) ◽  
pp. 279-307
Author(s):  
Yonglim Joe

2019 ◽  
Vol 19 (2) ◽  
pp. 29-38
Author(s):  
Young-Hee Kim ◽  
◽  
Taek-Hyun Lee ◽  
Jong-Myoung Kim ◽  
Won-Hyung Park ◽  
...  
