Application of Java Relationship Graphs (JRG) to plagiarism detection in Java Projects: A Neo4j Graph Database Approach

Two different paradigms exist in the field of plagiarism detection, resulting in External Plagiarism Detection (EPD) and Intrinsic Plagiarism Detection (IPD) systems. The more commonly applied system is EPD, whose algorithm makes a heuristic comparison between a suspicious document and the documents in a corpus. In contrast, given only a suspicious document, an IPD algorithm should be able to find the plagiarized sections by looking for text segments written in a different style. Previous research on Indonesian texts addressed only the development of EPD systems. This research therefore contributes experiments on and analysis of stylometric features and segmentation strategies for building an IPD system for Indonesian texts. The experimental results show that segmenting by paragraph performs better, scoring 0.92 for Macro Averaged-Accuracy and 0.54 for Macro Averaged-F1. The stylometric features achieving the highest F1 and Accuracy scores are the frequency of punctuation, the average paragraph length, and the type-token ratio.
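
The three features named above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the regular-expression tokenizer, the blank-line paragraph splitter, and the use of token count as the length measure are all assumptions made for illustration.

```python
import re
import string


def segment_by_paragraph(text: str) -> list[str]:
    """Split a document into paragraph segments on blank lines (assumed convention)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def stylometric_features(segment: str) -> dict[str, float]:
    """Compute three simple style features for one text segment."""
    # Assumed tokenizer: lowercase runs of word characters.
    tokens = re.findall(r"\w+", segment.lower())
    # Count ASCII punctuation characters only (an assumption).
    punct = sum(1 for ch in segment if ch in string.punctuation)
    n_chars = len(segment)
    return {
        # Punctuation marks per character of the segment.
        "punct_freq": punct / n_chars if n_chars else 0.0,
        # Segment length measured in tokens.
        "length_in_tokens": float(len(tokens)),
        # Distinct tokens divided by total tokens.
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
    }


if __name__ == "__main__":
    document = "Halo, dunia! Halo lagi.\n\nParagraf kedua di sini."
    for paragraph in segment_by_paragraph(document):
        print(stylometric_features(paragraph))
```

An intrinsic detector would then compare each segment's feature vector against the rest of the document and flag outliers; that classification step is outside this sketch.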

RAKING: An Efficient K-Maximal Frequent Pattern Mining Algorithm on Uncertain Graph Database

Chinese Journal of Computers ◽ 10.3724/sp.j.1016.2010.01387 ◽ 2010 ◽ Vol 33 (8) ◽ pp. 1387-1395 ◽ Cited By ~ 4

Author(s): Meng HAN ◽ Wei ZHANG ◽ Jian-Zhong LI

Keyword(s): Pattern Mining ◽ Frequent Pattern Mining ◽ Frequent Pattern ◽ Graph Database ◽ Uncertain Graph ◽ Mining Algorithm ◽ Maximal Frequent Pattern

Revisiting the Challenges and Opportunities in Software Plagiarism Detection

2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER) ◽ 10.1109/saner48275.2020.9054847 ◽ 2020

Author(s): Xi Xu ◽ Ming Fan ◽ Ang Jia ◽ Yin Wang ◽ Zheng Yan ◽ ...

Keyword(s): Plagiarism Detection ◽ Challenges And Opportunities

Online Judging Platform Utilizing Dynamic Plagiarism Detection Facilities

Computers ◽ 10.3390/computers10040047 ◽ 2021 ◽ Vol 10 (4) ◽ pp. 47

Author(s): Fariha Iffath ◽ A. S. M. Kayes ◽ Md. Tahsin Rahman ◽ Jannatul Ferdows ◽ Mohammad Shamsul Arefin ◽ ...

Keyword(s): Source Code ◽ Large Data ◽ Large Data Sets ◽ Detection Technique ◽ Data Sets ◽ Plagiarism Detection ◽ Source Codes ◽ Efficient Detection ◽ Mathematical Problems ◽ Automatic Scoring

A programming contest generally involves the host presenting a set of logical and mathematical problems to the contestants, who are required to write computer programs capable of solving them. An online judge system automates the judging of the submitted programs; online judges are designed for the reliable evaluation of source code submitted by users. Traditional online judging platforms are not well suited to programming labs, as they do not support partial scoring or efficient detection of plagiarized code. With this in mind, we present an online judging framework capable of scoring code automatically by efficiently detecting plagiarized content and the level of accuracy of the code. Our system detects plagiarism by extracting fingerprints from programs and comparing the fingerprints instead of the whole files. We used winnowing to select fingerprints from among the k-gram hash values of a source code, which were generated by the Rabin–Karp algorithm. The proposed system is compared with existing online judging platforms to show its superiority in terms of time efficiency, correctness, and feature availability. In addition, we evaluated our system on large data sets and compared its run time with that of MOSS, a widely used plagiarism detection technique.
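
The fingerprinting step described above (k-gram hashes from a Rabin–Karp rolling hash, then winnowing) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the parameter values (`k=5`, `window=4`, `base`, `mod`), the whitespace-stripping normalization, the rightmost-minimum tie rule, and the Jaccard similarity score are all choices made here for illustration.

```python
def kgram_hashes(text: str, k: int, base: int = 257,
                 mod: int = (1 << 61) - 1) -> list[int]:
    """Rabin-Karp rolling hash of every k-gram in text (base/mod are assumed values)."""
    if k <= 0 or len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        h = (h - ord(text[i - k]) * high) % mod  # drop the oldest character
        h = (h * base + ord(text[i])) % mod      # append the newest character
        hashes.append(h)
    return hashes


def winnow(hashes: list[int], window: int) -> set[tuple[int, int]]:
    """Keep the minimum hash of each window as (hash, position); rightmost on ties."""
    selected = set()
    for start in range(len(hashes) - window + 1):
        win = hashes[start:start + window]
        smallest = min(win)
        offset = window - 1 - win[::-1].index(smallest)  # rightmost minimum
        selected.add((smallest, start + offset))
    return selected


def fingerprint(source: str, k: int = 5, window: int = 4) -> set[int]:
    """Fingerprint a source file after a crude normalization (assumed)."""
    normalized = "".join(source.lower().split())
    return {h for h, _ in winnow(kgram_hashes(normalized, k), window)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of two fingerprint sets (an assumed scoring rule)."""
    fa, fb = fingerprint(a), fingerprint(b)
    return len(fa & fb) / len(fa | fb) if (fa or fb) else 0.0


if __name__ == "__main__":
    print(similarity("int sum = a + b;", "int  sum=a+b ;"))
    print(similarity("int sum = a + b;", "while (x > 0) { x--; }"))
```

Inputs shorter than `k + window - 1` characters after normalization yield an empty fingerprint in this sketch; a real system would need a policy for such short submissions.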

Cross-Language Source Code Plagiarism Detection using Explicit Semantic Analysis and Scored Greedy String Tilling

Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 ◽ 10.1145/3383583.3398594 ◽ 2020 ◽ Cited By ~ 1

Author(s): Tomáš Foltýnek ◽ Richard Všianský ◽ Norman Meuschke ◽ Dita Dlabolová ◽ Bela Gipp

Keyword(s): Semantic Analysis ◽ Source Code ◽ Plagiarism Detection ◽ Cross Language ◽ Explicit Semantic Analysis

Design and implementation of express logistics system based on graph database and Baidu map

2020 2nd International Conference on Information Technology and Computer Application (ITCA) ◽ 10.1109/itca52113.2020.00089 ◽ 2020

Author(s): Zhang Xiaoliang ◽ Zeng Qingtao ◽ Tang Mingjie ◽ Huang Hui

Keyword(s): Graph Database ◽ Logistics System ◽ Design And Implementation

Advantages of using graph databases to explore chromatin conformation capture experiments

BMC Bioinformatics ◽ 10.1186/s12859-020-03937-0 ◽ 2021 ◽ Vol 22 (S2)

Author(s): Daniele D’Agostino ◽ Pietro Liò ◽ Marco Aldinucci ◽ Ivan Merelli

Keyword(s): Web Application ◽ High Throughput Sequencing ◽ Cell Types ◽ Graph Database ◽ Graph Databases ◽ Sources Of Information ◽ Chromosome Conformation ◽ Wide Scale ◽ User Friendly ◽ Different Cell Types

Background: High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. A graph-based representation, on the other hand, can be advantageous for describing the complex topology that DNA adopts in the nucleus of eukaryotic cells. Methods: Here we discuss the use of a graph database for storing and analysing data obtained from Hi-C experiments. The main issue is the size of the produced data and, when working with a graph-based representation, the consequent need to manage adequately a large number of edges (contacts) connecting nodes (genes), which represent the sources of information. Currently available graph visualisation tools and libraries fall short with Hi-C data for this reason. Graph databases, instead, support both the analysis and the visualisation of the spatial patterns present in Hi-C data, in particular for comparing different experiments or for efficiently re-mapping omics data in a space-aware context. In particular, the possibility of describing graphs through statistical indicators and, even more, the capability of correlating them through statistical distributions make it possible to highlight similarities and differences among Hi-C experiments in different cell conditions or different cell types. Results: These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the Neo4j graph database (version 3.5). Conclusion: With the accumulation of more experiments, the tool will provide invaluable support for comparing neighbours of genes across experiments and conditions, helping to highlight changes in functional domains and to identify new co-organised genomic compartments.
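
The move from a contact matrix to a graph, and one simple statistical indicator on that graph, can be sketched in plain Python. This is only an illustration and is unrelated to NeoHiC's actual code or to any Neo4j API: the function names, the toy matrix, the gene labels, and the choice of the degree distribution as the indicator are all assumptions.

```python
from collections import Counter


def contacts_to_edges(matrix: list[list[int]],
                      labels: list[str]) -> list[tuple[str, str]]:
    """Turn a symmetric binary contact matrix into an undirected edge list."""
    edges = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: each contact once
            if matrix[i][j]:
                edges.append((labels[i], labels[j]))
    return edges


def degree_distribution(edges: list[tuple[str, str]]) -> Counter:
    """Map each degree value to the number of nodes having that degree.

    Nodes without any contact do not appear in the edge list and are omitted.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


if __name__ == "__main__":
    # Toy 3x3 matrix with hypothetical gene labels.
    matrix = [[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]]
    labels = ["geneA", "geneB", "geneC"]
    edges = contacts_to_edges(matrix, labels)
    print(edges)
    print(degree_distribution(edges))
```

Comparing two experiments could then amount to comparing their degree distributions, one possible reading of the "statistical indicators" mentioned in the abstract.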

Parallel Betweenness Computation in Graph Database for Contingency Selection

2020 IEEE Power & Energy Society General Meeting (PESGM) ◽ 10.1109/pesgm41954.2020.9281492 ◽ 2020

Author(s): Yongli Zhu ◽ Renchang Dai ◽ Guangyi Liu

Keyword(s): Graph Database
