Corrigendum to “Cross-Language Interoperability in a Multi-Language Runtime”, by Grimmer et al., ACM Transactions on Programming Languages and Systems (TOPLAS) Volume 40, Issue 2, Article No. 8

2018 ◽  
Vol 40 (4) ◽  
pp. 1-1

2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Feng Zhang ◽  
Guofan Li ◽  
Cong Liu ◽  
Qian Song

Source code similarity detection has various applications in code plagiarism detection and software intellectual property protection. In computer programming teaching, students may convert source code written in one programming language into another language for their code assignment submission. Existing similarity measures for source code written in the same language are not applicable to cross-language code similarity detection because of syntactic differences among programming languages. Meanwhile, existing cross-language source code similarity detection approaches are susceptible to complex code obfuscation techniques, such as replacing control structures with equivalent ones and adding redundant statements. To solve this problem, we propose a cross-language code similarity detection (CLCSD) approach based on code flowcharts. In general, two source code fragments written in different programming languages are transformed into standardized code flowcharts (SCFC), and their similarity is obtained by measuring their corresponding SCFCs. More specifically, we first introduce the standardized code flowchart (SCFC) model as a uniform flowchart representation of source code written in different languages. SCFC is language-independent and can therefore be used as the intermediate structure for source code similarity detection. We also give transformation techniques for converting source code written in a specific programming language into an SCFC. Second, we propose the SCFC-SPGK algorithm, based on the shortest-path graph kernel, to measure the similarity between two SCFCs. Thus, the similarity between two pieces of source code in different programming languages is given by the similarity between their SCFCs. Experimental results show that, compared with existing approaches, CLCSD has higher accuracy in cross-language source code similarity detection. Furthermore, CLCSD can not only handle common source code obfuscation techniques used by students in computer programming teaching but also achieves nearly 90% accuracy when dealing with some complex obfuscation techniques.
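To make the shortest-path graph kernel idea concrete, the sketch below implements a generic kernel over labelled graphs in Python (using networkx). It is an illustrative approximation, not the authors' SCFC-SPGK algorithm: the node labels ("start", "decision", "end"), the matching on (label pair, distance) triples, and the normalisation are assumptions made for the example.

```python
# Illustrative shortest-path graph kernel over labelled flowchart-like graphs.
# Assumption: each node carries a "label" attribute (e.g. "start", "decision").
from collections import defaultdict
from itertools import combinations
import math

import networkx as nx


def sp_triples(g: nx.Graph) -> dict:
    """Count (unordered label pair, shortest-path length) triples in a graph."""
    counts = defaultdict(int)
    dist = dict(nx.all_pairs_shortest_path_length(g))
    for u, v in combinations(g.nodes, 2):
        if v in dist[u]:
            key = (frozenset((g.nodes[u]["label"], g.nodes[v]["label"])), dist[u][v])
            counts[key] += 1
    return counts


def sp_kernel(g1: nx.Graph, g2: nx.Graph) -> float:
    """Dot product of the two graphs' triple-count vectors."""
    t1, t2 = sp_triples(g1), sp_triples(g2)
    return float(sum(t1[k] * t2[k] for k in t1 if k in t2))


def similarity(g1: nx.Graph, g2: nx.Graph) -> float:
    """Kernel value normalised to [0, 1]."""
    k11, k22 = sp_kernel(g1, g1), sp_kernel(g2, g2)
    return sp_kernel(g1, g2) / math.sqrt(k11 * k22) if k11 and k22 else 0.0


if __name__ == "__main__":
    a, b = nx.Graph(), nx.Graph()
    for g in (a, b):
        g.add_node(0, label="start")
        g.add_node(1, label="decision")
        g.add_node(2, label="end")
        g.add_edges_from([(0, 1), (1, 2)])
    print(similarity(a, b))  # 1.0 for two identical flowcharts
```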


2016 ◽  
Vol 51 (2) ◽  
pp. 78-90 ◽  
Author(s):  
Matthias Grimmer ◽  
Chris Seaton ◽  
Roland Schatz ◽  
Thomas Würthinger ◽  
Hanspeter Mössenböck

2014 ◽  
Vol 25 (4) ◽  
pp. 805-840 ◽  
Author(s):  
Gang Tan

Through foreign function interfaces (FFIs), software components in different programming languages interact with each other in the same address space. Recent years have witnessed a number of systems that analyse FFIs for safety and reliability. However, lack of formal specifications of FFIs hampers progress in this endeavour. We present a formal operational model, Java Native Interface (JNI) light (JNIL), for a subset of a widely used FFI – the Java Native Interface (JNI). JNIL focuses on the core issues when a high-level garbage-collected language interacts with a low-level language. It proposes abstractions for handling a shared heap, cross-language method calls, cross-language exception handling, and garbage collection. JNIL can directly serve as a formal basis for JNI tools and systems. We demonstrate its utility by proving soundness of a system that checks native code in JNI programs for type-unsafe use of JNI functions. The abstractions in JNIL are also useful when modelling other FFIs, such as the Python/C interface and the OCaml/C interface.
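The kind of cross-language type error that such checking targets can be sketched with Python's ctypes FFI standing in for JNI native code. This is a hedged illustration, not code from the paper: the choice of libm's sqrt and the assumption of a Unix-like system where find_library("m") succeeds are mine.

```python
# Illustrative only: a well-typed and a type-unsafe declaration of the same
# foreign C function, sqrt, through Python's ctypes FFI.
import ctypes
import ctypes.util

libm_path = ctypes.util.find_library("m")  # assumes a Unix-like system

# Correct declaration: the C prototype is double sqrt(double).
good = ctypes.CDLL(libm_path)
good.sqrt.restype = ctypes.c_double
good.sqrt.argtypes = [ctypes.c_double]
print(good.sqrt(2.0))          # ~1.4142

# Type-unsafe declaration: claiming sqrt takes and returns int. The call
# still "succeeds" but reinterprets data at the language boundary -- the
# class of silent mismatch a formal FFI model lets a checker reject.
bad = ctypes.CDLL(libm_path)
bad.sqrt.restype = ctypes.c_int
bad.sqrt.argtypes = [ctypes.c_int]
print(bad.sqrt(2))             # essentially arbitrary value
```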


Author(s):  
Matthias Grimmer ◽  
Chris Seaton ◽  
Roland Schatz ◽  
Thomas Würthinger ◽  
Hanspeter Mössenböck

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Oscar Karnalim

Several computing courses allow students to choose which programming language they want to use for completing a programming task. This can lead to cross-language code plagiarism and collusion, in which the copied code file is rewritten in another programming language. In response, this paper proposes a detection technique that can accurately compare code files written in various programming languages while requiring limited effort to accommodate those languages at the development stage. The only language-dependent feature used in the technique is the source code tokeniser, and no code conversion is applied. The impact of coincidental similarity is reduced by applying a TF-IDF-inspired weighting, in which rare matches are prioritised. Our evaluation shows that the technique outperforms techniques commonly used in academia at handling language-conversion disguises, and is comparable to those techniques when dealing with conventional disguises.
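A minimal Python sketch of this weighting idea follows. It is not the paper's implementation: the toy tokeniser, the IDF formula, and the Dice-style normalisation are assumptions chosen to show how rare shared tokens can dominate the score.

```python
# Illustrative TF-IDF-style weighting of token matches across languages.
import math
import re
from collections import Counter


def tokenise(source: str) -> list[str]:
    """The only language-dependent step; here a crude generic tokeniser."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


def idf_weights(corpus: list[list[str]]) -> dict[str, float]:
    """Tokens appearing in few submissions get larger weights."""
    n = len(corpus)
    df = Counter(tok for toks in corpus for tok in set(toks))
    return {tok: math.log(n / df[tok]) + 1.0 for tok in df}


def weighted_similarity(a: list[str], b: list[str], w: dict[str, float]) -> float:
    """Dice-style overlap of token multisets, weighted so rare matches count more."""
    ca, cb = Counter(a), Counter(b)
    shared = sum(min(ca[t], cb[t]) * w.get(t, 1.0) for t in ca.keys() & cb.keys())
    total = sum(c * w.get(t, 1.0) for t, c in (ca + cb).items())
    return 2.0 * shared / total if total else 0.0


if __name__ == "__main__":
    subs = [tokenise("int total = a + b;"), tokenise("total = a + b")]
    print(weighted_similarity(subs[0], subs[1], idf_weights(subs)))
```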


2018 ◽  
Vol 40 (2) ◽  
pp. 1-43 ◽  
Author(s):  
Matthias Grimmer ◽  
Roland Schatz ◽  
Chris Seaton ◽  
Thomas Würthinger ◽  
Mikel Luján ◽  
...  

2019 ◽  
Author(s):  
Tanveer Ahmad ◽  
Nauman Ahmed ◽  
Johan Peltenburg ◽  
Zaid Al-Ars

The rapidly growing size of genomics databases, driven by advances in sequencing technologies, demands fast and cost-effective processing. However, processing this data creates many challenges, particularly in selecting appropriate algorithms and computing platforms. Computing systems need data closer to the processor for fast processing. Traditionally, due to cost, volatility, and other physical constraints of DRAM, it was not feasible to place large working data sets in memory. However, emerging storage-class memories allow storing and processing big data closer to the processor. In this work, we show how the commonly used genomics data format, Sequence Alignment/Map (SAM), can be represented in the Apache Arrow in-memory data format to benefit from in-memory processing and to ensure better scalability through shared memory objects, avoiding large (de)serialization overheads in cross-language interoperability. To demonstrate the benefits of such a system, we propose ArrowSAM, an in-memory SAM format that uses the Apache Arrow framework, and integrate it into genome pre-processing pipelines including BWA-MEM, Picard, and Sambamba. Results show 15x and 2.4x speedups compared to Picard and Sambamba, respectively. The code and scripts for running all workflows are freely available at https://github.com/abs-tudelft/ArrowSAM.
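The in-memory representation can be sketched briefly with pyarrow. This is an assumption-laden illustration rather than the actual ArrowSAM schema: the column types, the toy records, and the file name alignments.arrow are made up for the example; only the field names follow the mandatory SAM columns.

```python
# Illustrative sketch: SAM-like alignment fields as Apache Arrow columns,
# written out via Arrow IPC so other languages/processes can map them
# without re-parsing or (de)serialising the data.
import pyarrow as pa

schema = pa.schema([
    ("QNAME", pa.string()),
    ("FLAG", pa.int32()),
    ("RNAME", pa.string()),
    ("POS", pa.int64()),
    ("MAPQ", pa.int32()),
    ("CIGAR", pa.string()),
    ("SEQ", pa.string()),
    ("QUAL", pa.string()),
])

batch = pa.record_batch(
    [
        pa.array(["read1", "read2"]),
        pa.array([99, 147], type=pa.int32()),
        pa.array(["chr1", "chr1"]),
        pa.array([10468, 10500], type=pa.int64()),
        pa.array([60, 60], type=pa.int32()),
        pa.array(["8M", "8M"]),
        pa.array(["ACGTACGT", "TTAGGACC"]),
        pa.array(["FFFFFFFF", "FFFFFFFF"]),
    ],
    schema=schema,
)

# Hand the columnar batch to another tool through the Arrow IPC file format.
with pa.OSFile("alignments.arrow", "wb") as sink:
    with pa.ipc.new_file(sink, schema) as writer:
        writer.write_batch(batch)
```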


Queue ◽  
2013 ◽  
Vol 11 (10) ◽  
pp. 20-28
Author(s):  
David Chisnall
