Plagiarism Detection Algorithm for Source Code in Computer Science Education

2015 ◽  
Vol 13 (4) ◽  
pp. 29-39 ◽  
Author(s):  
Xin Liu ◽  
Chan Xu ◽  
Boyu Ouyang

Computer programming has become increasingly important in college program-design courses. However, some students plagiarize classmates' assignments and apply small modifications to disguise the copying, and it is difficult for teachers to judge whether source code has been plagiarized. Traditional detection algorithms do not fit this situation well. The authors designed an effective and complete method to detect source code plagiarism based on the common ways students plagiarize. The algorithm rests on two ideas. The first is to standardize the source code through filtering, removing most of the noise deliberately introduced by plagiarists. The second is an improved Longest Common Subsequence (LCS) algorithm for text matching that uses the statement as the unit of comparison. The authors also designed an appropriate hash function to increase the efficiency of matching. A system built on the algorithm proved practical and effective, running well and meeting real application requirements.
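A minimal sketch of the two ideas in this abstract, assuming C-like input: (1) normalize the source to strip plagiarist "noise" (comments, whitespace, identifier renaming), and (2) match at statement granularity with LCS, hashing each statement so comparisons become integer equality checks. All function names and the normalization rules are illustrative, not taken from the paper.

```python
import re

def normalize(source: str) -> list[str]:
    """Return a list of canonical statements from C-like source."""
    source = re.sub(r"//.*|/\*.*?\*/", "", source, flags=re.S)  # drop comments
    stmts = [s.strip() for s in re.split(r"[;{}]", source) if s.strip()]
    # rename every identifier to a placeholder so renaming is neutralized
    return [re.sub(r"\b[A-Za-z_]\w*\b", "ID", s.replace(" ", "")) for s in stmts]

def lcs_len(a: list[int], b: list[int]) -> int:
    """Classic dynamic-programming LCS over hashed statements."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def similarity(src_a: str, src_b: str) -> float:
    # hash() stands in for the paper's purpose-built HASH function
    ha = [hash(s) for s in normalize(src_a)]
    hb = [hash(s) for s in normalize(src_b)]
    return lcs_len(ha, hb) / max(len(ha), len(hb), 1)
```

With this normalization, a copy that only renames variables and reflows whitespace scores 1.0, since every statement maps to the same canonical form.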


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Daniel Straulino ◽  
Mattie Landman ◽  
Neave O’Clery

Abstract: Here we propose a new method to compare the modular structure of a pair of node-aligned networks. The majority of current methods, such as normalized mutual information, compare two node partitions derived from a community detection algorithm yet ignore the respective underlying network topologies. Addressing this gap, our method deploys a community detection quality function to assess the fit of each node partition with respect to the other network’s connectivity structure. Specifically, for two networks A and B, we project the node partition of B onto the connectivity structure of A. By evaluating the fit of B’s partition relative to A’s own partition on network A (using a standard quality function), we quantify how well network A describes the modular structure of B. Repeating this in the other direction, we obtain a two-dimensional distance measure, the bi-directional (BiDir) distance. The advantages of our methodology are three-fold. First, it is adaptable to a wide class of community detection algorithms that seek to optimize an objective function. Second, it takes into account the network structure, specifically the strength of the connections within and between communities, and can thus capture differences between networks with similar partitions but where one of them might have a more defined or robust community structure. Third, it can also identify cases in which dissimilar optimal partitions hide the fact that the underlying community structure of both networks is relatively similar. We illustrate our method for a variety of community detection algorithms, including multi-resolution approaches, and a range of both simulated and real world networks.
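As a hedged sketch of the projection idea, assume Newman modularity is the quality function (the paper allows any objective-based method). Score each network's own partition against the other network's partition on the same edges; the pair of differences is the BiDir distance. Graphs are plain edge lists over a shared node set, and all names are illustrative.

```python
def modularity(edges, partition):
    """Newman modularity of a node->community map on an undirected graph."""
    m = len(edges)
    within, deg_sum = {}, {}
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        deg_sum[cu] = deg_sum.get(cu, 0) + 1
        deg_sum[cv] = deg_sum.get(cv, 0) + 1
        if cu == cv:
            within[cu] = within.get(cu, 0) + 1
    # sum over communities: (within-edge fraction) - (expected fraction)^2
    return sum(within.get(c, 0) / m - (deg_sum[c] / (2 * m)) ** 2
               for c in deg_sum)

def bidir_distance(edges_a, part_a, edges_b, part_b):
    """Two-dimensional BiDir distance: how much worse the other network's
    partition fits each network than its own partition does."""
    d_ab = modularity(edges_a, part_a) - modularity(edges_a, part_b)
    d_ba = modularity(edges_b, part_b) - modularity(edges_b, part_a)
    return (d_ab, d_ba)
```

For identical networks with identical partitions the distance is (0.0, 0.0); a large first component means B's partition describes A's connectivity poorly.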


2013 ◽  
Vol 373-375 ◽  
pp. 1172-1177
Author(s):  
Bo Shu ◽  
Xiao Jun Du

Because of the complexity of software development, some developers plagiarize source code from other projects or from open source software in order to shorten the development cycle. Usually the copyist modifies and disguises the copied code to escape plagiarism detection. So far, most algorithms cannot fully detect source code disguised in this way, and in particular cannot reliably distinguish the original code from the plagiarized code. In this paper, we summarize and analyze the effect of disguised source code on the detection process, design a strategy to remove that effect, and propose a plagiarism detection algorithm for software source code based on the program dependence graph (PDG). The algorithm can detect disguised source code and thereby uncover plagiarism. We also propose a heuristic rule that gives the detection algorithm the ability to indicate the direction of plagiarism, a capability no existing algorithm provides. We demonstrate the viability of the algorithm by experiment.
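A toy illustration of why a PDG is robust to disguise: nodes are statements, edges are data/control dependences, and dependences survive renaming and statement reordering, so plagiarism shows up as one function's PDG being isomorphic to a subgraph of the other's. The brute-force matcher below is only viable for tiny graphs and is a generic sketch, not the paper's algorithm; real detectors use pruned subgraph search.

```python
from itertools import permutations

def subgraph_isomorphic(small, big):
    """Is `small` isomorphic to some induced subgraph of `big`?
    Graphs are dicts mapping node -> set of successor nodes."""
    s_nodes, b_nodes = list(small), list(big)
    if len(s_nodes) > len(b_nodes):
        return False
    for mapped in permutations(b_nodes, len(s_nodes)):
        m = dict(zip(s_nodes, mapped))
        # require edge in big exactly where small has an edge (induced match)
        if all((m[v] in big[m[u]]) == (v in small[u])
               for u in s_nodes for v in s_nodes):
            return True
    return False
```

Under this view, a suspiciously high PDG match between two submissions flags plagiarism even when identifiers, comments, and statement order have been rewritten.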


2014 ◽  
Vol 668-669 ◽  
pp. 899-902 ◽  
Author(s):  
Hong Mei Zhu ◽  
Liang Zhang ◽  
Wei Sun ◽  
Yong Xiang Sun

To help teachers identify plagiarism among students' source code submissions quickly and accurately, this paper discusses a method for measuring source code similarity. In the proposed algorithm, first, both the token-oriented edit distance (TD) and the token-oriented length of the longest common subsequence (TLCSLen) are calculated; second, a similarity formula combining TD and TLCSLen is given to measure the similarity of the source code; third, a dynamic, variable similarity threshold is set to decide whether plagiarism exists between source files, which ensures a relatively reasonable judgment. The method has been applied to the university's online submission system for programming course work and to its online examination system. Practical results show that the method identifies similar source code promptly, effectively, and accurately.
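The two ingredients can be sketched over token lists as follows. The paper's exact combination formula and its dynamic threshold are not given in the abstract, so the simple averaging below is an illustrative stand-in.

```python
def token_edit_distance(a, b):
    """Levenshtein distance (TD) over two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def token_lcs_len(a, b):
    """Length of the longest common subsequence (TLCSLen) of token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Fold TD and TLCSLen into one score in [0, 1] (illustrative formula)."""
    n = max(len(a), len(b), 1)
    sim_td = 1 - token_edit_distance(a, b) / n
    sim_lcs = token_lcs_len(a, b) / n
    return (sim_td + sim_lcs) / 2
```

A pair of submissions would then be flagged when `similarity` exceeds the (dynamically chosen) threshold.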


2020 ◽  
Author(s):  
Zahid Iqbal ◽  
Shakeeb Murtaza

Plagiarism detection is one of the challenging tasks in the academic research world for ensuring the integrity and authenticity of a document, and many efficient algorithms are available to detect plagiarism. Pre-processing of a document typically remains the master key to achieving stable results: before checking for plagiarism, algorithms perform some form of pre-processing that converts the document into a particular format, for example by removing whitespace and special characters. In this paper, we focus on two techniques that can be used to evade plagiarism detection and that existing detection algorithms overlook. The first is replacing the white space between consecutive words with a hidden character rendered in white (the background colour): the words still appear distinct to a reader, but the algorithm incorrectly treats them as a single word, so even a statement copied verbatim is not identified as plagiarised content. The second is hiding spam text behind images to inflate the reported word count of a document; because the text is hidden, the human eye cannot discover it, but the algorithm counts it as words, lowering the percentage score of the plagiarised document. Our proposed pre-processing technique efficiently handles these two problems, improving the accuracy and authenticity of plagiarism-checking algorithms. We compared the performance of our algorithm on these issues with state-of-the-art systems (particularly Turnitin), and our algorithm handles them efficiently.
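An illustrative pre-processing pass for the first attack: invisible "glue" characters inserted between words so a matcher sees one long token. A real detector would also inspect character colour in the document format (e.g. DOCX or PDF text runs); the sketch below handles only the plain-text analogue of zero-width and other invisible code points, and is not the paper's implementation.

```python
# Unicode code points that render as nothing and can fuse adjacent words
INVISIBLE = {
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero-width no-break space / BOM
}

def restore_word_boundaries(text: str) -> str:
    """Replace invisible glue characters with real spaces, then collapse runs."""
    for ch in INVISIBLE:
        text = text.replace(ch, " ")
    return " ".join(text.split())
```

After this pass, a verbatim copy whose spaces were swapped for zero-width characters tokenizes back into the original words and is matched normally.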


2019 ◽  
Vol 2019 ◽  
pp. 1-11
Author(s):  
Israr Haneef ◽  
Rao Muhammad Adeel Nawab ◽  
Ehsan Ullah Munir ◽  
Imran Sarwar Bajwa

Cross-lingual plagiarism occurs when the source (or original) text is in one language and the plagiarized text is in another. In recent years, cross-lingual plagiarism detection has attracted the attention of the research community because a large amount of digital text is easily accessible in many languages through online digital repositories, and machine translation systems are readily available, making it easier to perform cross-lingual plagiarism and harder to detect it. Developing and evaluating cross-lingual plagiarism detection systems requires standard evaluation resources. The majority of earlier studies have developed cross-lingual plagiarism corpora for English and other European language pairs. For the Urdu-English language pair, however, the problem of cross-lingual plagiarism detection has not been thoroughly explored, even though a large amount of digital text is readily available in Urdu and the language is spoken in many countries of the world (particularly in Pakistan, India, and Bangladesh). To fill this gap, this paper presents a large benchmark cross-lingual corpus for the Urdu-English language pair. The proposed corpus contains 2,395 source-suspicious document pairs (540 automatically translated, 539 artificially paraphrased, 508 manually paraphrased, and 808 nonplagiarized). Furthermore, the corpus contains three types of cross-lingual examples: artificial (automatic translation and artificial paraphrasing), simulated (manual paraphrasing), and real (nonplagiarized), a combination not previously reported in the development of cross-lingual corpora. Detailed analysis of the corpus was carried out using n-gram overlap and longest common subsequence approaches. Using word unigrams, mean similarity scores of 1.00, 0.68, 0.52, and 0.22 were obtained for automatically translated, artificially paraphrased, manually paraphrased, and nonplagiarized documents, respectively.
These results show that the documents in the proposed corpus were created using different obfuscation techniques, which makes the dataset more realistic and challenging. We believe that the corpus developed in this study will help foster research in Urdu, an under-resourced language, and will be useful in the development, comparison, and evaluation of cross-lingual plagiarism detection systems for the Urdu-English language pair. The corpus is free and publicly available for research purposes.
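The word n-gram overlap analysis above can be reproduced in outline. A common formulation is containment: the fraction of the suspicious document's n-grams that also occur in the source. This is a generic sketch of that measure, not the paper's exact scoring code.

```python
def ngrams(tokens, n):
    """Set of word n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def containment(source_tokens, suspicious_tokens, n=1):
    """Fraction of the suspicious document's n-grams found in the source."""
    src = ngrams(source_tokens, n)
    sus = ngrams(suspicious_tokens, n)
    return len(src & sus) / len(sus) if sus else 0.0
```

Under this measure, a verbatim automatic translation scores 1.0 on word unigrams (matching the reported 1.00), while paraphrasing and independent writing drive the score down, as in the 0.68/0.52/0.22 gradient reported above.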


Author(s):  
Zoran Vrucinic

The future of medicine belongs to immunology and allergology. I have tried not to be too broad in description, but on the other hand to mention the most important concepts of allergology, so that without complex pathophysiology and the mechanisms of the immune reaction we gain some basic insight into immunological principles, making these diseases more understandable, logical, and more useful to approach for our patients. The term allergy was introduced into medicine by Pirquet in 1906 and is of Greek origin (allos, other + ergon, act: a different reaction). It essentially denotes the reaction of an organism to a substance with which it has already been in contact, manifested as a specific response: either a heightened reaction (hypersensitivity) or a reduced reaction (immunity). Synonyms for hypersensitivity include altered reactivity and oversensitivity. The word sensitization comes from the Latin (sensibilitas, -atis, f.), meaning sensibility or sensitivity, and has retained that meaning in the medical vocabulary, while in immunology and allergology the term denotes the creation of hypersensitivity to an antigen. Antigen comes from the Greek words anti (against) + genos (origin): an anti-substance, that is, a substance that causes the body to produce antibodies.


Computers ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 47
Author(s):  
Fariha Iffath ◽  
A. S. M. Kayes ◽  
Md. Tahsin Rahman ◽  
Jannatul Ferdows ◽  
Mohammad Shamsul Arefin ◽  
...  

A programming contest generally involves the host presenting a set of logical and mathematical problems to the contestants, who are required to write computer programs capable of solving them. An online judge is a system that automates the judging procedure by reliably evaluating the source code submitted by users. Traditional online judging platforms are not ideally suited to programming labs, as they support neither partial scoring nor efficient detection of plagiarized code. With this in mind, we present in this paper an online judging framework capable of automatically scoring code by efficiently detecting plagiarized content and the code's level of accuracy. Our system detects plagiarism by extracting fingerprints of programs and comparing those fingerprints instead of whole files: we use winnowing to select fingerprints among the k-gram hash values of a source file, generated with the Rabin–Karp algorithm. The proposed system is compared with existing online judging platforms to show its superiority in time efficiency, correctness, and feature availability. In addition, we evaluated our system on large data sets, comparing its run time with MOSS, a widely used plagiarism detection tool.
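The fingerprinting pipeline named above can be sketched as follows: Rabin–Karp rolling hashes over the k-grams of a (normalized) source file, then winnowing keeps the minimum hash in each window of w consecutive hashes, so two files share fingerprints wherever they share a sufficiently long substring. The parameters k and w, and the Jaccard comparison at the end, are illustrative choices, not the paper's exact configuration.

```python
def kgram_hashes(text: str, k: int, base: int = 257, mod: int = (1 << 31) - 1):
    """Rolling Rabin–Karp hash of every k-gram of text."""
    if len(text) < k:
        return []
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    top = pow(base, k - 1, mod)  # weight of the outgoing character
    for i in range(k, len(text)):
        h = ((h - ord(text[i - k]) * top) * base + ord(text[i])) % mod
        hashes.append(h)
    return hashes

def winnow(hashes, w):
    """Select the minimum hash of each window of w consecutive hashes."""
    if not hashes:
        return set()
    fingerprints = set()
    for i in range(max(len(hashes) - w + 1, 1)):
        fingerprints.add(min(hashes[i:i + w]))
    return fingerprints

def fingerprint_similarity(a: str, b: str, k: int = 5, w: int = 4) -> float:
    """Jaccard similarity of the two files' winnowed fingerprint sets."""
    fa = winnow(kgram_hashes(a, k), w)
    fb = winnow(kgram_hashes(b, k), w)
    return len(fa & fb) / max(len(fa | fb), 1)
```

Because only the selected fingerprints are stored and compared, this is far cheaper than comparing whole files, which is the efficiency argument made above.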

