Plagiarism Detection Algorithm for Source Code in Computer Science Education

2015 ◽  
Vol 13 (4) ◽  
pp. 29-39 ◽  
Author(s):  
Xin Liu ◽  
Chan Xu ◽  
Boyu Ouyang

Computer programming has become increasingly important in college program-design courses. However, some students plagiarize classmates' assignments and apply small modifications to disguise the copying, and it is difficult for teachers to judge whether source code has been plagiarized. Traditional detection algorithms do not fit this situation well. The authors designed an effective and complete method to detect source code plagiarism based on the common ways students plagiarize. The algorithm rests on two ideas. The first is to standardize the source code through filtering, removing most of the noise deliberately introduced by plagiarists. The second is an improved Longest Common Subsequence (LCS) algorithm for text matching that uses the statement as the unit of comparison. The authors also designed an appropriate hash function to increase the efficiency of matching. A system built on the algorithm proved practical and effective, running well and meeting real application requirements.
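A minimal sketch of the two ideas in this abstract, assuming C-like input: (1) normalize the source to strip plagiarist "noise" (comments, whitespace, identifier renaming), and (2) match at statement granularity with LCS, hashing each statement so comparisons become integer equality checks. All function names and the normalization rules are illustrative, not taken from the paper.

```python
import re

def normalize(source: str) -> list[str]:
    """Return a list of canonical statements from C-like source."""
    source = re.sub(r"//.*|/\*.*?\*/", "", source, flags=re.S)  # drop comments
    stmts = [s.strip() for s in re.split(r"[;{}]", source) if s.strip()]
    # rename every identifier to a placeholder so renaming is neutralized
    return [re.sub(r"\b[A-Za-z_]\w*\b", "ID", s.replace(" ", "")) for s in stmts]

def lcs_len(a: list[int], b: list[int]) -> int:
    """Classic dynamic-programming LCS over hashed statements."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def similarity(src_a: str, src_b: str) -> float:
    # hash() stands in for the paper's purpose-built HASH function
    ha = [hash(s) for s in normalize(src_a)]
    hb = [hash(s) for s in normalize(src_b)]
    return lcs_len(ha, hb) / max(len(ha), len(hb), 1)
```

With this normalization, a copy that only renames variables and reflows whitespace scores 1.0, since every statement maps to the same canonical form.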


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Daniel Straulino ◽  
Mattie Landman ◽  
Neave O’Clery

Abstract: Here we propose a new method to compare the modular structure of a pair of node-aligned networks. The majority of current methods, such as normalized mutual information, compare two node partitions derived from a community detection algorithm yet ignore the respective underlying network topologies. Addressing this gap, our method deploys a community detection quality function to assess the fit of each node partition with respect to the other network’s connectivity structure. Specifically, for two networks A and B, we project the node partition of B onto the connectivity structure of A. By evaluating the fit of B’s partition relative to A’s own partition on network A (using a standard quality function), we quantify how well network A describes the modular structure of B. Repeating this in the other direction, we obtain a two-dimensional distance measure, the bi-directional (BiDir) distance. The advantages of our methodology are three-fold. First, it is adaptable to a wide class of community detection algorithms that seek to optimize an objective function. Second, it takes into account the network structure, specifically the strength of the connections within and between communities, and can thus capture differences between networks with similar partitions but where one of them might have a more defined or robust community structure. Third, it can also identify cases in which dissimilar optimal partitions hide the fact that the underlying community structure of both networks is relatively similar. We illustrate our method for a variety of community detection algorithms, including multi-resolution approaches, and a range of both simulated and real world networks.
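As a hedged sketch of the projection idea, assume Newman modularity is the quality function (the paper allows any objective-based method). Score each network's own partition against the other network's partition on the same edges; the pair of differences is the BiDir distance. Graphs are plain edge lists over a shared node set, and all names are illustrative.

```python
def modularity(edges, partition):
    """Newman modularity of a node->community map on an undirected graph."""
    m = len(edges)
    within, deg_sum = {}, {}
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        deg_sum[cu] = deg_sum.get(cu, 0) + 1
        deg_sum[cv] = deg_sum.get(cv, 0) + 1
        if cu == cv:
            within[cu] = within.get(cu, 0) + 1
    # sum over communities: (within-edge fraction) - (expected fraction)^2
    return sum(within.get(c, 0) / m - (deg_sum[c] / (2 * m)) ** 2
               for c in deg_sum)

def bidir_distance(edges_a, part_a, edges_b, part_b):
    """Two-dimensional BiDir distance: how much worse the other network's
    partition fits each network than its own partition does."""
    d_ab = modularity(edges_a, part_a) - modularity(edges_a, part_b)
    d_ba = modularity(edges_b, part_b) - modularity(edges_b, part_a)
    return (d_ab, d_ba)
```

For identical networks with identical partitions the distance is (0.0, 0.0); a large first component means B's partition describes A's connectivity poorly.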


2013 ◽  
Vol 373-375 ◽  
pp. 1172-1177
Author(s):  
Bo Shu ◽  
Xiao Jun Du

Because of the complexity of software development, some developers plagiarize source code from other projects or from open source software in order to shorten the development cycle. Usually the copyist modifies and disguises the copied code to escape plagiarism detection. So far, most algorithms cannot fully detect source code disguised in this way, and in particular cannot reliably distinguish the original code from the plagiarized code. In this paper, we summarize and analyze the effect of disguised source code on the detection process, design a strategy to remove that effect, and propose a plagiarism detection algorithm for software source code based on the program dependence graph (PDG). The algorithm can detect disguised source code and thereby uncover plagiarism. We also propose a heuristic rule that gives the detection algorithm the ability to indicate the direction of plagiarism, a capability no existing algorithm provides. We demonstrate the viability of the algorithm by experiment.
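A toy illustration of why a PDG is robust to disguise: nodes are statements, edges are data/control dependences, and dependences survive renaming and statement reordering, so plagiarism shows up as one function's PDG being isomorphic to a subgraph of the other's. The brute-force matcher below is only viable for tiny graphs and is a generic sketch, not the paper's algorithm; real detectors use pruned subgraph search.

```python
from itertools import permutations

def subgraph_isomorphic(small, big):
    """Is `small` isomorphic to some induced subgraph of `big`?
    Graphs are dicts mapping node -> set of successor nodes."""
    s_nodes, b_nodes = list(small), list(big)
    if len(s_nodes) > len(b_nodes):
        return False
    for mapped in permutations(b_nodes, len(s_nodes)):
        m = dict(zip(s_nodes, mapped))
        # require edge in big exactly where small has an edge (induced match)
        if all((m[v] in big[m[u]]) == (v in small[u])
               for u in s_nodes for v in s_nodes):
            return True
    return False
```

Under this view, a suspiciously high PDG match between two submissions flags plagiarism even when identifiers, comments, and statement order have been rewritten.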


2014 ◽  
Vol 668-669 ◽  
pp. 899-902 ◽  
Author(s):  
Hong Mei Zhu ◽  
Liang Zhang ◽  
Wei Sun ◽  
Yong Xiang Sun

To help teachers identify plagiarism among students' source code submissions quickly and accurately, this paper discusses a method for measuring source code similarity. In the proposed algorithm, first, both the token-oriented edit distance (TD) and the token-oriented length of the longest common subsequence (TLCSLen) are calculated; second, a similarity formula combining TD and TLCSLen is given to measure the similarity of the source code; third, a dynamic, variable similarity threshold is set to decide whether plagiarism exists between source files, which ensures a relatively reasonable judgment. The method has been applied to the university's online submission system for programming course work and to its online examination system. Practical results show that the method identifies similar source code promptly, effectively, and accurately.
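The two ingredients can be sketched over token lists as follows. The paper's exact combination formula and its dynamic threshold are not given in the abstract, so the simple averaging below is an illustrative stand-in.

```python
def token_edit_distance(a, b):
    """Levenshtein distance (TD) over two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def token_lcs_len(a, b):
    """Length of the longest common subsequence (TLCSLen) of token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Fold TD and TLCSLen into one score in [0, 1] (illustrative formula)."""
    n = max(len(a), len(b), 1)
    sim_td = 1 - token_edit_distance(a, b) / n
    sim_lcs = token_lcs_len(a, b) / n
    return (sim_td + sim_lcs) / 2
```

A pair of submissions would then be flagged when `similarity` exceeds the (dynamically chosen) threshold.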


2020 ◽  
Author(s):  
Zahid Iqbal ◽  
Shakeeb Murtaza

Plagiarism detection is one of the challenging tasks in the academic research world for ensuring the integrity and authenticity of a document, and many efficient algorithms are available to detect plagiarism. Pre-processing of a document typically remains the master key to achieving stable results: before checking for plagiarism, algorithms perform some form of pre-processing that converts the document into a particular format, for example by removing whitespace and special characters. In this paper, we focus on two techniques that can be used to evade plagiarism detection and that existing detection algorithms overlook. The first is replacing the white space between consecutive words with a hidden character rendered in white (the background colour): the words still appear distinct to a reader, but the algorithm incorrectly treats them as a single word, so even a statement copied verbatim is not identified as plagiarised content. The second is hiding spam text behind images to inflate the reported word count of a document; because the text is hidden, the human eye cannot discover it, but the algorithm counts it as words, lowering the percentage score of the plagiarised document. Our proposed pre-processing technique efficiently handles these two problems, improving the accuracy and authenticity of plagiarism-checking algorithms. We compared the performance of our algorithm on these issues with state-of-the-art systems (particularly Turnitin), and our algorithm handles them efficiently.
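An illustrative pre-processing pass for the first attack: invisible "glue" characters inserted between words so a matcher sees one long token. A real detector would also inspect character colour in the document format (e.g. DOCX or PDF text runs); the sketch below handles only the plain-text analogue of zero-width and other invisible code points, and is not the paper's implementation.

```python
# Unicode code points that render as nothing and can fuse adjacent words
INVISIBLE = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero-width no-break space / BOM
}

def restore_word_boundaries(text: str) -> str:
    """Replace invisible glue characters with real spaces, then collapse runs."""
    for ch in INVISIBLE:
        text = text.replace(ch, " ")
    return " ".join(text.split())
```

After this pass, a verbatim copy whose spaces were swapped for zero-width characters tokenizes back into the original words and is matched normally.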


2019 ◽  
Vol 2019 ◽  
pp. 1-11
Author(s):  
Israr Haneef ◽  
Rao Muhammad Adeel Nawab ◽  
Ehsan Ullah Munir ◽  
Imran Sarwar Bajwa

Cross-lingual plagiarism occurs when the source (or original) text is in one language and the plagiarized text is in another. In recent years, cross-lingual plagiarism detection has attracted the attention of the research community because a large amount of digital text is easily accessible in many languages through online digital repositories, and machine translation systems are readily available, making it easier to perform cross-lingual plagiarism and harder to detect it. Developing and evaluating cross-lingual plagiarism detection systems requires standard evaluation resources. The majority of earlier studies have developed cross-lingual plagiarism corpora for English and other European language pairs. For the Urdu-English language pair, however, the problem of cross-lingual plagiarism detection has not been thoroughly explored, even though a large amount of digital text is readily available in Urdu and the language is spoken in many countries of the world (particularly in Pakistan, India, and Bangladesh). To fill this gap, this paper presents a large benchmark cross-lingual corpus for the Urdu-English language pair. The proposed corpus contains 2,395 source-suspicious document pairs (540 automatically translated, 539 artificially paraphrased, 508 manually paraphrased, and 808 nonplagiarized). Furthermore, the corpus contains three types of cross-lingual examples: artificial (automatic translation and artificial paraphrasing), simulated (manual paraphrasing), and real (nonplagiarized), a combination not previously reported in the development of cross-lingual corpora. Detailed analysis of the corpus was carried out using n-gram overlap and longest common subsequence approaches. Using word unigrams, mean similarity scores of 1.00, 0.68, 0.52, and 0.22 were obtained for automatically translated, artificially paraphrased, manually paraphrased, and nonplagiarized documents, respectively.
These results show that the documents in the proposed corpus were created using different obfuscation techniques, which makes the dataset more realistic and challenging. We believe that the corpus developed in this study will help foster research in Urdu, an under-resourced language, and will be useful in the development, comparison, and evaluation of cross-lingual plagiarism detection systems for the Urdu-English language pair. The corpus is free and publicly available for research purposes.
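The word n-gram overlap analysis above can be reproduced in outline. A common formulation is containment: the fraction of the suspicious document's n-grams that also occur in the source. This is a generic sketch of that measure, not the paper's exact scoring code.

```python
def ngrams(tokens, n):
    """Set of word n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def containment(source_tokens, suspicious_tokens, n=1):
    """Fraction of the suspicious document's n-grams found in the source."""
    src = ngrams(source_tokens, n)
    sus = ngrams(suspicious_tokens, n)
    return len(src & sus) / len(sus) if sus else 0.0
```

Under this measure, a verbatim automatic translation scores 1.0 on word unigrams (matching the reported 1.00), while paraphrasing and independent writing drive the score down, as in the 0.68/0.52/0.22 gradient reported above.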


Author(s):  
Zoran Vrucinic

The future of medicine belongs to immunology and allergology. I have tried not to be too broad in description, but on the other hand to mention the most important concepts of allergology, so that without complex pathophysiology and the mechanisms of the immune reaction we gain some basic insight into immunological principles, making these diseases more understandable, logical, and more useful to approach for our patients. The term allergy was introduced into medicine by Pirquet in 1906 and is of Greek origin (allos, other + ergon, act: a different reaction). It essentially denotes the reaction of an organism to a substance with which it has already been in contact, manifested as a specific response: either a heightened reaction (hypersensitivity) or a reduced reaction (immunity). Synonyms for hypersensitivity include altered reactivity and oversensitivity. The word sensitization comes from the Latin (sensibilitas, -atis, f.), meaning sensibility or sensitivity, and has retained that meaning in the medical vocabulary, while in immunology and allergology the term denotes the creation of hypersensitivity to an antigen. Antigen comes from the Greek words anti (against) + genos (origin): an anti-substance, that is, a substance that causes the body to produce antibodies.


Computers ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 47
Author(s):  
Fariha Iffath ◽  
A. S. M. Kayes ◽  
Md. Tahsin Rahman ◽  
Jannatul Ferdows ◽  
Mohammad Shamsul Arefin ◽  
...  

A programming contest generally involves the host presenting a set of logical and mathematical problems to the contestants, who are required to write computer programs capable of solving them. An online judge is a system that automates the judging procedure by reliably evaluating the source code submitted by users. Traditional online judging platforms are not ideally suited to programming labs, as they support neither partial scoring nor efficient detection of plagiarized code. With this in mind, we present in this paper an online judging framework capable of automatically scoring code by efficiently detecting plagiarized content and the code's level of accuracy. Our system detects plagiarism by extracting fingerprints of programs and comparing those fingerprints instead of whole files: we use winnowing to select fingerprints among the k-gram hash values of a source file, generated with the Rabin–Karp algorithm. The proposed system is compared with existing online judging platforms to show its superiority in time efficiency, correctness, and feature availability. In addition, we evaluated our system on large data sets, comparing its run time with MOSS, a widely used plagiarism detection tool.
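The fingerprinting pipeline named above can be sketched as follows: Rabin–Karp rolling hashes over the k-grams of a (normalized) source file, then winnowing keeps the minimum hash in each window of w consecutive hashes, so two files share fingerprints wherever they share a sufficiently long substring. The parameters k and w, and the Jaccard comparison at the end, are illustrative choices, not the paper's exact configuration.

```python
def kgram_hashes(text: str, k: int, base: int = 257, mod: int = (1 << 31) - 1):
    """Rolling Rabin–Karp hash of every k-gram of text."""
    if len(text) < k:
        return []
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    top = pow(base, k - 1, mod)  # weight of the outgoing character
    for i in range(k, len(text)):
        h = ((h - ord(text[i - k]) * top) * base + ord(text[i])) % mod
        hashes.append(h)
    return hashes

def winnow(hashes, w):
    """Select the minimum hash of each window of w consecutive hashes."""
    if not hashes:
        return set()
    fingerprints = set()
    for i in range(max(len(hashes) - w + 1, 1)):
        fingerprints.add(min(hashes[i:i + w]))
    return fingerprints

def fingerprint_similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Jaccard similarity of the two files' winnowed fingerprint sets."""
    fa = winnow(kgram_hashes(a, k), w)
    fb = winnow(kgram_hashes(b, k), w)
    return len(fa & fb) / max(len(fa | fb), 1)
```

Because only the selected fingerprints are stored and compared, this is far cheaper than comparing whole files, which is the efficiency argument made above.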

