Plagiarism Detection in Thesis Abstracts Using the Rabin-Karp Algorithm (Case Study: Faculty of Computer Science, Universitas Singaperbangsa Karawang)

Indra Gunawan

doi:10.36805/technoxplore.v6i2.1227

Techno Xplore : Jurnal Ilmu Komputer dan Teknologi Informasi ◽

10.36805/technoxplore.v6i2.1227 ◽

2021 ◽

Vol 6 (2) ◽

pp. 75-81

Author(s):

Indra Gunawan

Keyword(s):

Text Mining ◽

Text Preprocessing

Abstract: Plagiarism is common in education, particularly in higher education. It generally stems from laziness and a wish to finish assignments quickly. The Rabin-Karp algorithm is a string-searching algorithm, used here to detect plagiarism in text. The aim of this research is to evaluate the results produced by the Rabin-Karp algorithm. The research data pass through either the full set of preprocessing stages (case folding, tokenizing, filtering, and stemming) or a partial set (case folding only). The k-grams tested are 2-grams, 3-grams, 4-grams, 5-grams, and 6-grams; the data then go through a hashing stage to obtain fingerprint values, and their similarity is measured with the Dice Similarity Coefficient. The research method is text mining, with the stages Acquisition, Text Preprocessing, Modeling, and Evaluation. The data yield average total similarity values of 86.84% for 2-grams, 69.56% for 3-grams, 56.06% for 4-grams, 48.71% for 5-grams, and 44.30% for 6-grams. The results of full preprocessing differ from those of partial preprocessing: full preprocessing gives lower similarity percentages, because words are removed in the filtering stage and altered in the stemming stage. It can be concluded that the data show signs of plagiarism in the abstracts, which is supported by the presence of data with similarity values of up to 100%.
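The k-gram, hashing, fingerprint, and Dice steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the hash base and modulus, the removal of non-alphanumeric characters, and the choice to treat a fingerprint as a set of distinct hash values are all assumptions made for illustration. Only case folding is applied, loosely mirroring the partial-preprocessing setting. Similarity is computed as 2|A ∩ B| / (|A| + |B|), where A and B are the two fingerprint sets.

```python
import re


def rolling_hashes(text, k, base=256, mod=1_000_003):
    """Return the Rabin-Karp rolling hash of every character k-gram in text.

    base and mod are illustrative choices, not values taken from the paper.
    """
    if k <= 0 or len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character that leaves the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        h = (h - ord(text[i - k]) * high) % mod  # drop the leftmost character
        h = (h * base + ord(text[i])) % mod      # append the next character
        hashes.append(h)
    return hashes


def fingerprint(text, k):
    """Case-fold, strip non-alphanumerics (an assumption), and hash all k-grams."""
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())
    return set(rolling_hashes(cleaned, k))


def dice_similarity(text_a, text_b, k):
    """Dice coefficient over the two fingerprint sets, in the range 0.0 to 1.0."""
    fa, fb = fingerprint(text_a, k), fingerprint(text_b, k)
    if not fa and not fb:
        return 0.0
    return 2 * len(fa & fb) / (len(fa) + len(fb))
```

For example, `dice_similarity("night", "nacht", 2)` compares the 2-grams {ni, ig, gh, ht} and {na, ac, ch, ht}, which share only `ht`, giving 2 × 1 / (4 + 4) = 0.25. Larger values of k leave fewer shared k-grams between loosely similar texts, which is in line with the falling averages the abstract reports from 2-grams to 6-grams.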

A Text Preprocessing Framework for Text Mining on Big Data Infrastructure

2018 2nd International Conference on Imaging, Signal Processing and Communication (ICISPC) ◽

10.1109/icispc44900.2018.9006718 ◽

2018 ◽

Author(s):

Watcharaporn Sriyanong ◽

Nunnapus Moungmingsuk ◽

Nattawat Khamphakdee

Keyword(s):

Big Data ◽

Text Mining ◽

Data Infrastructure ◽

Text Preprocessing

The implementation of web service based text preprocessing to measure Indonesian student thesis similarity level

MATEC Web of Conferences ◽

10.1051/matecconf/201819703019 ◽

2018 ◽

Vol 197 ◽

pp. 03019

Author(s):

Yan Watequlis Syaifudin ◽

Pramana Yoga Saputra ◽

Dwi Puspitasari

Keyword(s):

Text Mining ◽

Web Service ◽

Scientific Work ◽

Specific Pattern ◽

Pattern Similarity ◽

Text Preprocessing ◽

Checking Procedure ◽

Further Development ◽

Undergraduate Thesis ◽

Student Thesis

Plagiarism of scientific work, especially undergraduate theses, occurs mostly in colleges. In this research we used text mining, a new method that can be used for the checking procedure, to obtain a specific pattern for each document. After obtaining the document pattern, we compare it with the pattern of another document. If the level of pattern similarity is high, plagiarism can be suspected. This paper explains the development of text preprocessing, a part of text mining. We chose the Nazief and Adriani algorithm as the text preprocessing algorithm for this research. The research results in a text preprocessing web service, which is expected to be used for further development of text mining.

Text Preprocessing for Text Mining in Organizational Research: Review and Recommendations

Organizational Research Methods ◽

10.1177/1094428120971683 ◽

2020 ◽

pp. 109442812097168

Author(s):

Louis Hickman ◽

Stuti Thapa ◽

Louis Tay ◽

Mengyang Cao ◽

Padmini Srinivasan

Keyword(s):

Text Mining ◽

Computational Linguistics ◽

Statistical Power ◽

Research Question ◽

Research Review ◽

Text Data ◽

General Process ◽

Natural Language Text ◽

Corpus Size ◽

Text Preprocessing

Recent advances in text mining have provided new methods for capitalizing on the voluminous natural language text data created by organizations, their employees, and their customers. Although often overlooked, decisions made during text preprocessing affect whether the content and/or style of language are captured, the statistical power of subsequent analyses, and the validity of insights derived from text mining. Past methodological articles have described the general process of obtaining and analyzing text data, but recommendations for preprocessing text data were inconsistent. Furthermore, primary studies use and report different preprocessing techniques. To address this, we conduct two complementary reviews of computational linguistics and organizational text mining research to provide empirically grounded text preprocessing decision-making recommendations that account for the type of text mining conducted (i.e., open or closed vocabulary), the research question under investigation, and the data set’s characteristics (i.e., corpus size and average document length). Notably, deviations from these recommendations will be appropriate and, at times, necessary due to the unique characteristics of one’s text data. We also provide recommendations for reporting text mining to promote transparency and reproducibility.

THE INFLUENCE OF TEXT PREPROCESSING METHODS AND TOOLS ON CALCULATING TEXT SIMILARITY

Facta Universitatis Series Mathematics and Informatics ◽

10.22190/fumi1905973d ◽

2019 ◽

pp. 973

Author(s):

Đorđe Petrović ◽

Milena Stanković

Keyword(s):

Text Mining ◽

Vector Space ◽

Vector Space Model ◽

Text Similarity ◽

Text Documents ◽

Space Model ◽

Text Document ◽

The Subject ◽

Text Preprocessing ◽

Multidimensional Representation

Text mining to a great extent depends on the various text preprocessing techniques. The preprocessing methods and tools which are used to prepare texts for further mining can be divided into those which are and those which are not language-dependent. The subject matter of this research was the analysis of the influence of these methods and tools on further text mining. We first focused on the analysis of the influence on the reduction of the vector space model for the multidimensional representation of text documents. We then analyzed the influence on calculating text similarity, which is the focus of this research. The conclusion we reached is that the implementation of various text preprocessing methods in the Serbian language, which are used for the reduction of the vector space model for the multidimensional representation of text document, achieves the required results. But, the implementation of various text preprocessing methods specific to the Serbian language for the purpose of calculating text similarity can lead to great differences in the results.
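To make the link between preprocessing and similarity scores concrete, here is a small Python sketch of a term-frequency vector space model with cosine similarity. It is an illustration only, not the authors' setup: the function names, the whitespace tokenizer, the English sample sentences, and the use of case folding as the single preprocessing step are all assumptions.

```python
import math
from collections import Counter


def term_vector(text, lowercase=True):
    """Build a term-frequency vector; lowercasing is the only preprocessing step."""
    tokens = text.split()
    if lowercase:
        tokens = [t.lower() for t in tokens]
    return Counter(tokens)


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample documents that differ in capitalization and one word.
doc_a = "Text Mining depends on preprocessing"
doc_b = "text mining depends on tools"

with_folding = cosine_similarity(term_vector(doc_a), term_vector(doc_b))
without_folding = cosine_similarity(
    term_vector(doc_a, lowercase=False), term_vector(doc_b, lowercase=False)
)
```

With case folding the two documents share four of five terms (similarity about 0.8); without it they share only two (about 0.4). Folding also shrinks the combined vocabulary from eight distinct terms to six, a small instance of the vector space reduction the abstract mentions.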
