Combination of Levenshtein Distance and Rabin-Karp to Improve the Accuracy of Document Equivalence Level

2018
Vol 7 (2.27)
pp. 17
Author(s):
Andysah Putera Utama Siahaan
Solly Aryza
Eko Hariyanto
Rusiadi
Andre Hasudungan Lubis
...

The Rabin-Karp algorithm is a search algorithm that locates a substring pattern in a text using hashing, which makes it beneficial for matching words against many patterns. One of its practical applications is the detection of plagiarism. The algorithm was invented by Michael O. Rabin and Richard M. Karp. It performs string search by means of a hash function, whose values are compared between two documents to determine the documents' level of similarity. The Rabin-Karp algorithm is not very good for single-pattern text search, but it is well suited to multiple-pattern search. The Levenshtein algorithm can be used to replace the hash comparison in Rabin-Karp: the plain hash comparison only counts the hashes that have the same value in both documents, whereas calculating the Levenshtein distance between the hash sequences of the two documents yields better accuracy.
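A minimal sketch of how such a combination might look, assuming character-level k-grams and illustrative constants (k = 5, base 256, modulus 101, as in textbook Rabin-Karp); the paper's exact fingerprinting and normalization steps are not specified in the abstract:

```python
# Sketch: fingerprint two documents with a Rabin-Karp rolling hash over
# k-grams, then compare the fingerprint sequences with Levenshtein distance
# instead of only counting identical hashes. Constants are illustrative.
K, BASE, MOD = 5, 256, 101

def rabin_karp_fingerprints(text: str, k: int = K) -> list[int]:
    """Rolling hashes of every k-gram in the text."""
    if len(text) < k:
        return []
    h = 0
    high = pow(BASE, k - 1, MOD)               # weight of the outgoing char
    for ch in text[:k]:
        h = (h * BASE + ord(ch)) % MOD
    hashes = [h]
    for i in range(k, len(text)):
        h = (h - ord(text[i - k]) * high) % MOD  # drop leftmost character
        h = (h * BASE + ord(text[i])) % MOD      # append new character
        hashes.append(h)
    return hashes

def levenshtein(a: list[int], b: list[int]) -> int:
    """Edit distance between two hash sequences (two-row dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def similarity(doc1: str, doc2: str) -> float:
    """1.0 for identical fingerprint sequences, toward 0.0 as they diverge."""
    h1, h2 = rabin_karp_fingerprints(doc1), rabin_karp_fingerprints(doc2)
    longest = max(len(h1), len(h2)) or 1
    return 1.0 - levenshtein(h1, h2) / longest
```

Because Levenshtein distance is order-sensitive, this scores nearby rearrangements and partial overlaps more gradually than an exact hash-count match would.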

2020
Vol 3 (1)
pp. 9
Author(s):
Herman Herman
Lukman Syafie
Tasmil Tasmil
Muhammad Resha

Plagiarism is the use of data, language, or writing without crediting the original author or source. The setting where plagiarism occurs most often is the academic environment, and the material most frequently plagiarized there is scientific work, for example theses. Merely reminding students is not enough to minimize the practice; a system or application is needed that can measure the level of similarity between student thesis proposals. In computer science, the Rabin-Karp algorithm can be used to measure the similarity of texts. Rabin-Karp is a string-matching algorithm that uses a hash function to compare a search string of length m against the substrings of a text of length n, and it works well for large data sizes. The test results show that the value chosen for the k-gram size affects the measured similarity levels. In addition, a k-gram value of 5 executed faster than values of 4 and 6.
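A minimal sketch of the k-gram step, with Python's built-in hash standing in for the Rabin-Karp hash and Jaccard overlap as an illustrative similarity measure (the paper's exact similarity formula is not given in the abstract):

```python
import re

def kgrams(text: str, k: int) -> list[str]:
    """Normalize the text and slide a window of k characters across it."""
    cleaned = re.sub(r"[^a-z0-9]", "", text.lower())  # strip case/punctuation
    return [cleaned[i:i + k] for i in range(len(cleaned) - k + 1)]

def kgram_similarity(doc1: str, doc2: str, k: int = 5) -> float:
    """Overlap of k-gram hash sets (Jaccard, as an illustrative choice)."""
    h1 = {hash(g) for g in kgrams(doc1, k)}
    h2 = {hash(g) for g in kgrams(doc2, k)}
    if not h1 or not h2:
        return 0.0
    return len(h1 & h2) / len(h1 | h2)

# The choice of k trades sensitivity for speed: smaller k produces more
# (and more generic) grams. The paper reports k = 5 as fastest of 4, 5, 6.
print(kgram_similarity("the quick brown fox", "the quick brown dog", k=5))
```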


1988
Vol VIII (3)
pp. 87-97
Author(s):
P. Wood
D. Turcaso

2018
Vol 1 (1)
Author(s):
Danny Steveson
Halim Agung
Fendra Mulia

Plagiarism is a very frequent problem in many settings, including schools, where the content of papers or assignments collected from students is often plagiarized. This reflects declining creativity among students in contributing ideas and personal opinions to the tasks they are given. To address this problem, this research uses the Rabin-Karp algorithm, a string-search algorithm that uses hashing to find any of a set of string patterns in a text. Using this application, the user can compare one document with another; the application reports sentence similarity, then breaks the result down per word and per hash, and computes the average of the resulting percentages. Testing was done by taking 50 samples and comparing the percentage produced by the Rabin-Karp algorithm with a manually obtained percentage, each test comparing one document against another. Based on the results, it can be concluded that the Rabin-Karp algorithm can be implemented in a plagiarism application, as evidenced by the test using 50 samples, in which 43 samples succeeded with an average difference of 14.22%.
Keywords: document, Rabin-Karp algorithm, Dice-Sørensen index, plagiarism, sentence, word
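Since the keywords mention the Dice-Sørensen index, the similarity step might look like the following sketch, where fingerprints is a hypothetical stand-in for the application's Rabin-Karp hashing:

```python
def fingerprints(text: str, k: int = 5) -> set[int]:
    """Hypothetical fingerprint step: hash every k-gram of the text."""
    return {hash(text[i:i + k]) for i in range(len(text) - k + 1)}

def dice_sorensen(a: set[int], b: set[int]) -> float:
    """Dice-Sørensen index: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

doc1 = "students often reuse entire sentences in their assignments"
doc2 = "students often reuse whole sentences in their homework"
print(f"{dice_sorensen(fingerprints(doc1), fingerprints(doc2)):.2%}")
```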


Recent applications of conventional iterative coordinate descent (ICD) algorithms to multislice helical CT reconstruction have shown that conventional ICD can greatly improve image quality by increasing resolution as well as reducing noise and some artifacts. However, high computational cost and long reconstruction times remain a barrier to the use of the conventional algorithm in practical applications. Among the various iterative methods that have been studied for this problem, ICD has been found to have relatively low overall computational requirements due to its fast convergence. This paper presents a fast model-based iterative reconstruction algorithm using spatially nonhomogeneous ICD (NH-ICD) optimization. NH-ICD speeds up convergence by focusing computation where it is most needed, using a mechanism that adaptively selects voxels for update: first, a voxel selection criterion (VSC) determines the voxels in greatest need of update; then, a voxel selection algorithm (VSA) orders successive voxel updates so that some locations are updated repeatedly while the characteristics needed for global convergence are retained. To speed up each voxel update, the paper also proposes a fast 3-D optimization algorithm that upper-bounds the local 3-D objective function with a quadratic substitute function, so that a closed-form solution can be obtained instead of a computationally expensive line search. Experimental results show that the proposed method accelerates reconstruction by roughly a factor of three on average for typical 3-D multislice geometries.
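The paper's CT forward model is beyond the scope of this summary, but the nonhomogeneous update-ordering idea can be sketched on a toy quadratic objective, where each coordinate update has a closed form just as the quadratic substitute function gives a closed-form voxel update. The selection rule below is a crude stand-in for the paper's VSC/VSA, and all names and constants are illustrative:

```python
import numpy as np

# Toy objective: f(x) = 0.5 * x^T A x - b^T x, with A symmetric positive
# definite, so exact coordinate minimization has a closed form.
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)          # SPD system matrix (stand-in)
b = rng.standard_normal(50)

def nh_icd(A, b, sweeps=20, frac=0.2):
    """Nonhomogeneous coordinate descent: one full homogeneous pass, then
    sweeps that concentrate updates on the coordinates whose last update
    moved the most (a crude stand-in for the paper's VSC)."""
    n = len(b)
    x = np.zeros(n)
    change = np.full(n, np.inf)        # "need of update" score per coordinate
    for sweep in range(sweeps):
        k = max(1, int(frac * n))
        # Selection: top `frac` of coordinates by recent change.
        order = np.argsort(change)[::-1][:k] if sweep else range(n)
        for j in order:
            grad_j = A[j] @ x - b[j]   # derivative of f along coordinate j
            step = grad_j / A[j, j]    # closed-form coordinate minimizer
            x[j] -= step
            change[j] = abs(step)
    return x

x = nh_icd(A, b)
print("residual:", np.linalg.norm(A @ x - b))
```

The point of the ordering is the same as in the paper: computation is spent where the objective is still changing, rather than uniformly across all unknowns.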


Author(s):
S. Salehi
M. Karami
R. Fensholt

Lichens are the dominant autotrophs of polar and subpolar ecosystems and commonly encrust rock outcrops. Spectral mixing of lichens and bare rock can shift the diagnostic spectral features of materials of interest, leading to misinterpretation and false positives if mapping is based on exact spectral matching. The ability to distinguish lichen coverage from rock, and to decompose a mixed pixel into a collection of pure reflectance spectra, can therefore improve the applicability of hyperspectral methods for mineral exploration. The objective of this study is to propose a robust lichen index that can be used to estimate lichen coverage regardless of the mineral composition of the underlying rocks. The performance of three index structures (ratio, normalized ratio, and subtraction) is investigated using synthetic linear mixtures of pure rock and lichen spectra with prescribed mixing ratios. Laboratory spectroscopic data are obtained from lichen-covered samples collected from the Karrat, Liverpool Land, and Sisimiut regions in Greenland. The spectra are then resampled to Hyperspectral Mapper (HyMAP) resolution in order to further investigate the suitability of the indices for the airborne platform. At both resolutions, a Pattern Search (PS) algorithm is used to identify the optimal band wavelengths and bandwidths for the lichen index. The band optimization shows that the ratio between R894-1246 and R1110 explains most of the variability in the hyperspectral data at the original laboratory resolution (R² = 0.769), whereas the normalized index incorporating R1106-1121 and R904-1251 yields the best results at HyMAP resolution (R² = 0.765).
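As a sketch of how such an index could be evaluated, the following assumes the common (a - b) / (a + b) form for a normalized ratio and averages reflectance over the band windows reported for HyMAP resolution; the data and function names are illustrative, not from the paper:

```python
import numpy as np

def band_mean(wavelengths, reflectance, lo, hi):
    """Mean reflectance over a wavelength window [lo, hi] in nm."""
    mask = (wavelengths >= lo) & (wavelengths <= hi)
    return reflectance[..., mask].mean(axis=-1)

def normalized_lichen_index(wavelengths, reflectance):
    """Normalized ratio over the HyMAP-resolution bands reported in the
    abstract: R1106-1121 against R904-1251."""
    a = band_mean(wavelengths, reflectance, 1106, 1121)
    b = band_mean(wavelengths, reflectance, 904, 1251)
    return (a - b) / (a + b)

# Illustrative spectrum: 450-2450 nm at 10 nm spacing, random reflectance.
wl = np.arange(450, 2451, 10, dtype=float)
spectrum = np.random.default_rng(1).uniform(0.05, 0.6, wl.shape)
print(normalized_lichen_index(wl, spectrum))
```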

