ANALISIS TINGKAT PLAGIASI DOKUMEN SKRIPSI DENGAN METODE COSINE SIMILARITY DAN PEMBOBOTAN TF-IDF (Analysis of the Plagiarism Level of Thesis Documents Using the Cosine Similarity Method and TF-IDF Weighting)

2022 ◽  
Vol 2 (2) ◽  
pp. 90-95
Author(s):  
Muhammad Azmi

Plagiarism is the act of duplicating or imitating someone else's work and presenting it as one's own without the author's permission or without citing the source. It is not difficult to do: applying a copy-paste-modify technique to part or all of a document is enough for the result to count as plagiarism or duplication. The practice occurs because students are accustomed to taking the writing of others without citing the source, sometimes copying it in its entirety, word for word. Plagiarism is most common among students, especially when they are completing a final project or thesis. It can be countered through prevention and detection. Detection based on document similarity is one way to catch both copy-and-paste plagiarism and disguised plagiarism. One suitable method is to analyze the level of document plagiarism using the Cosine Similarity method with TF-IDF weighting. This research produces an application that computes the similarity value of the document under test. Test results show that the manual calculations agree with the algorithm as implemented in the application. Use of the Literature Library is quite effective in the stemming process. Calculations that use stemming yield a higher similarity value than calculations without stemming.
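The combination of TF-IDF weighting and cosine similarity described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' application: the function names, the whitespace tokenizer, and the smoothed IDF variant are all assumptions, and stemming and stop-word removal are left out.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace only.
    # Stemming and stop-word removal are deliberately omitted.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: weight} dictionary per input document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed IDF variant (smoothed): log((1 + N) / (1 + df)) + 1.
        vectors.append(
            {t: tf[t] * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t in tf}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two documents with identical wording score close to 1, and documents that share no terms score 0. Applying a stemmer inside `tokenize` would merge inflected forms of a word into one term, which is consistent with the abstract's observation that stemming raises the similarity value.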

2019 ◽  
Vol 10 (2) ◽  
Author(s):  
Dhamayanti Dhamayanti ◽  
Lidia Permata Sari

<p align="center"><strong>ABSTRACT</strong></p><p><em>A thesis is a final project that students must complete to finish their studies at Indo Global Mandiri University in Palembang. Thesis data processing and storage, especially in the Information Systems department, are still handled conventionally, so similarity between thesis titles, and even thesis contents, is difficult to detect. This difficulty lets students freely plagiarize when preparing proposals and thesis reports, from beginning to end, without the knowledge of lecturers or the department. Plagiarism is a shortcut that steals ideas, takes another person's work, and claims it as one's own without citing the original source. This research addresses plagiarism in the Information Systems department by building an application that can detect plagiarism in thesis titles and contents. The application is built using the cosine similarity method. Cosine similarity is a method for calculating the similarity (degree of likeness) between two objects. In document similarity testing, the results of the study show that cosine similarity has a higher degree of accuracy. Cosine similarity calculates the similarity value by matching documents word by word, and it is one of the popular techniques for measuring text similarity. The plagiarism detection application, implemented with PHP and with MySQL as its database, can help reduce plagiarism in thesis titles and contents in the Information Systems department.</em></p>
<pre><strong><em>Keywords:</em></strong> <em>Plagiarism, Plagiarism Detection Application, Cosine Similarity, PHP</em></pre>


SinkrOn ◽  
2021 ◽  
Vol 5 (2) ◽  
pp. 305-313
Author(s):  
Oppi Anda Resta ◽  
Addin Aditya ◽  
Febry Eka Purwiantono

The main graduation requirement for students is to write a final scientific paper. One of the factors that determines the quality of a student's scientific work is its uniqueness and innovation. This research aims to apply data mining methods to detect similarities in the titles, abstracts, or topics of students' final scientific papers so that plagiarism does not occur. The cosine similarity method is combined with preprocessing and TF-IDF to calculate the level of similarity between the title and abstract of a student's final scientific paper and those in the existing final project repository. The result is then compared against a threshold value to decide whether the work is accepted or rejected. Applying the TF-IDF method to the test data and training data shows that the similarity between the training data document and the test data document is 8%. This indicates that the student thesis is still classified as unique and does not contain plagiarized content. The findings of this study can help the university manage the administration of student theses so that plagiarism does not occur. Further study is needed on additional methods that improve the accuracy and speed of the system.
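The threshold-based accept or reject step can be sketched as below. This is only an illustration under stated assumptions: the function name `screen_submission`, the dictionary input of precomputed similarity scores, and the default threshold of 0.5 are hypothetical, since the abstract does not state the threshold the authors used.

```python
def screen_submission(similarities, threshold=0.5):
    """Decide whether a submission is accepted or rejected.

    similarities: {repository_document_id: similarity in [0, 1]},
    assumed to be precomputed (for example with TF-IDF and cosine similarity).
    threshold: hypothetical cut-off; the value used in the paper is not given.
    Returns (decision, id_of_most_similar_document).
    """
    if not similarities:
        # Nothing to compare against, so nothing can be flagged.
        return "accepted", None
    closest = max(similarities, key=similarities.get)
    decision = "rejected" if similarities[closest] >= threshold else "accepted"
    return decision, closest
```

With this sketch, a submission whose highest similarity to the repository is 0.08 (the 8% reported in the abstract) falls below the assumed threshold and is accepted.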


2021 ◽  
Vol 5 (2) ◽  
pp. 726
Author(s):  
Indra Mawanta ◽  
T S Gunawan ◽  
Wanayumini Wanayumini

Deli Husada Health Institute is a health campus that has been established for 34 years and currently has 30000 students. Every year, each final-year student submits a final project to the study program, and before writing the final project report each student must submit the report's title to the study program. To reduce similarity among the titles of students' final reports, the study program usually checks them manually. This has proven ineffective for deciding on final project titles, so many titles resemble one another and many final project reports look the same. Given these conditions, a sentence similarity test of final project titles was carried out using the Cosine Similarity method and TF-IDF weighting at the Deli Husada Delitua Health Institute campus. In the final test of the training data against the training data, 43% of the submitted titles were not eligible to be submitted again because they had high similarity to existing final project report titles, and 53% were eligible to be submitted as final project titles. The average time was 0.12117 minutes.


Author(s):  
Rosihan Ari Yuana ◽  
Dewanto Harjunowibowo ◽  
Nugraha Arif Karyanta ◽  
Cucuk Wawan Budiyanto

The Wartegg test is a widely adopted personality evaluation instrument known for its drawing completion technique. Employee personality data, for instance, can be sorted by closest similarity to the expected characters, so the Wartegg test can play a significant role in data similarity filtering. Despite the potential contribution of personal character identification techniques, practical guidance is rarely found in the literature. This paper demonstrates the use of the cosine-similarity method for data similarity filtering on the Wartegg personality test. The study is a case study in which several Wartegg test subjects are selected. Using the value of each character aspect derived from the Wartegg test, the cosine-similarity value is calculated against the expected/ideal character aspects. The test subjects are then filtered by their similarity to the expected/ideal character aspects. A technical procedure for performing the method is also presented. To assess effectiveness, sample scores for each character aspect from five test subjects are given, along with the ideal scores of the expected characters. Using FWAT, a graphical representation of the test subjects' characters against the ideal characters is generated and compared with the results obtained from the cosine-similarity method. The results indicate that cosine similarity can be applied effectively to Wartegg test data similarity filtering.
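The filtering idea, ranking subjects by the cosine similarity of their aspect scores to an ideal profile, can be sketched as follows. The function names, the three-aspect profiles, and all scores are made up for illustration; they are not the data from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(subjects, ideal):
    """Return subject names ordered from most to least similar to `ideal`.

    subjects: {name: [score per character aspect]} (hypothetical layout).
    """
    return sorted(subjects, key=lambda name: cosine(subjects[name], ideal), reverse=True)


# Hypothetical example data: three subjects scored on three aspects.
ideal_profile = [4, 3, 5]
subjects = {"A": [4, 3, 5], "B": [1, 5, 1], "C": [5, 3, 3]}
ranking = rank_by_similarity(subjects, ideal_profile)
```

One design consequence worth keeping in mind is that cosine similarity compares the direction of the score vectors and ignores their magnitude, so a subject whose scores are a uniform multiple of the ideal profile is treated as a perfect match.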


2019 ◽  
Vol 8 (1) ◽  
pp. 27-35
Author(s):  
Jans Hendry ◽  
Aditya Rachman ◽  
Dodi Zulherman

In this study, a system was developed to help check the accuracy of Koran recitation for Surah Al-Kautsar, based on the number of words and the pronunciation of each word across the complete surah. The system depends heavily on the accuracy of word segmentation based on envelope signals. Mel Frequency Cepstrum Coefficients (MFCC) were used for feature extraction, and the Cosine Similarity method was used to judge the accuracy of the reading. Of the 60 data samples, 30 were used for training and the rest for testing. Each set of 30 contained 15 correct readings and 15 incorrect readings. System accuracy was measured by word-for-word recognition, which yielded 100% recall and 98.96% precision for the training word data, and 100% recall and 99.65% precision for the test word data. For the overall reading of the surah, 15 correct readings and 14 incorrect readings were recognized correctly.
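The precision and recall figures reported above follow the standard definitions, which can be sketched as below. The function name and the boolean label encoding are assumptions, and the example labels are invented rather than taken from the study.

```python
def precision_recall(predicted, actual):
    """Compute (precision, recall) for parallel lists of boolean labels.

    True means "recognized as a correct reading" in `predicted`
    and "really a correct reading" in `actual` (assumed encoding).
    """
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    false_neg = sum(1 for p, a in pairs if not p and a)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall
```

A recognizer that never misses a correct item but occasionally accepts an incorrect one has a recall of 1.0 and a precision below 1.0, which is the pattern the abstract reports.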


2021 ◽  
Vol 10 (6) ◽  
pp. 25347-25351
Author(s):  
Shashank Pola ◽  
Venkatesh M ◽  
Ravi Chandra Reddy K ◽  
Indira Priyadarsini P

With the rapid growth of the Internet and the continuous expansion of e-commerce, the quantity and variety of products are increasing quickly. Merchants offer many goods through shopping websites, and customers generally spend a great deal of time finding the products they want. On e-commerce sites, product rating is one of the key ingredients of a good user experience. Many methods exist to help users find the goods they want. Similar-item recommendation is one of the popular approaches, in which customers look for items based on item ratings. In general, the recommendations are not personalized to a particular user. Browsing a large number of products drives customers away because of information overload, and they do not provide proper reviews for products. Traditional algorithms suffer from data sparsity and cold-start issues. To overcome these problems, the proposed methodology records each user's product ratings as a vector and uses cosine similarity as a measure of the similarity between those vectors. The ratings of the nearest similar vector are then used to estimate the unknown ratings. With this approach, the above problems can be overcome and high efficiency and accuracy can be achieved in a simple manner.
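The rating estimation step can be sketched as a nearest-neighbour lookup over user rating vectors. This is a simplified reading of the abstract, not the authors' code: the function names, the use of 0 to mark an unrated product, and the choice to copy the single nearest user's rating are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_rating(ratings, user, item):
    """Estimate `user`'s rating of product index `item`.

    ratings: {user: [rating per product]}, where 0 marks "not rated"
    (assumed encoding). The estimate is the rating given by the most
    similar other user who has rated that product, or None if nobody has.
    """
    target = ratings[user]
    best_user, best_sim = None, -1.0
    for other, vector in ratings.items():
        if other == user or vector[item] == 0:
            continue  # skip the user themselves and users without a rating
        sim = cosine(target, vector)
        if sim > best_sim:
            best_user, best_sim = other, sim
    return None if best_user is None else ratings[best_user][item]
```

A brand-new user with an all-zero vector has a similarity of 0 to everyone in this sketch, so the cold-start case still needs separate handling.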


2021 ◽  
Vol 8 (2) ◽  
pp. 343
Author(s):  
Eka Larasati Amalia ◽  
Angelita Justien Jumadi ◽  
Irsyad Arif Mashudi ◽  
Dimas Wahyu Wibowo

<p><em>Abstract</em></p><p><em>In the e-learning concept, exams are conducted online, including essay exams. An online essay exam is delivered online and requires students to answer in their own sentences. However, correcting the answers manually takes a long time. To reduce the time spent correcting student answers, the system includes an answer similarity assessment for grading. In this study, an online essay exam system was built that assesses answer similarity using the Cosine Similarity method and the Term Frequency (TF) equation to equalize the frequency of each word contained in a sentence. Term Frequency is a factor that determines a word's weight based on how often the word occurs in a document. To test the accuracy of the method, precision, recall, and f-measure tests were carried out, and the analysis yielded an average of 81%.</em></p>
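Scoring an essay answer with raw term frequency and cosine similarity can be sketched as below. The comparison against a reference answer, the function names, and the whitespace tokenizer are assumptions; the abstract does not describe what each student answer is compared with.

```python
import math
from collections import Counter


def tf_vector(text):
    # Raw term frequency: count of each lowercase whitespace-separated word.
    return Counter(text.lower().split())


def answer_similarity(student_answer, reference_answer):
    """Cosine similarity between the TF vectors of two answers (0.0 to 1.0)."""
    a = tf_vector(student_answer)
    b = tf_vector(reference_answer)
    # Counter returns 0 for words missing from the other answer.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0
```

For instance, `answer_similarity("the cat sat", "the cat ran")` shares two of three words on each side and evaluates to 2/3. A grading system could map this value onto a mark, although how the authors do so is not stated in the abstract.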

