Source Code Author Identification Based on N-gram Author Profiles

Source code authorship identification has become a major concern in a wide variety of situations. These include authorship disputes, proof of authorship in court, and cyber attacks in the form of viruses, trojan horses, logic bombs, fraud, and credit card cloning. Source code author identification is the task of identifying the most likely author of a computer program, given a set of predefined candidate authors. We present a new approach, called SCAP (Source Code Author Profiles), which uses byte-level n-grams to represent a source code author's style. Experiments on data sets in different programming languages (Java, C++, and Common Lisp) and of varying difficulty (6 to 30 candidate authors) demonstrate the effectiveness of the proposed approach. A comparison with a previous source code authorship identification study that relied on more complex information shows that the SCAP approach is language independent and that n-gram author profiles are better able to capture the idiosyncrasies of source code authors. We also show that the effectiveness of the proposed model is not affected by the absence of comments in the source code, a condition usually met in cyber-crime cases.
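The byte-level n-gram profile idea can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that an author's profile is the set of the `profile_size` most frequent byte n-grams across that author's files and that similarity is simply the number of n-grams two profiles share (an intersection-size measure). The function names and the default values of `n` and `profile_size` are hypothetical choices for illustration, not the paper's published settings.

```python
from collections import Counter


def byte_ngrams(source: str, n: int) -> Counter:
    """Count overlapping byte-level n-grams of a source file (UTF-8 encoded)."""
    data = source.encode("utf-8")
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def build_profile(sources: list[str], n: int = 3, profile_size: int = 1500) -> set[bytes]:
    """Hypothetical profile: the `profile_size` most frequent n-grams over an author's files."""
    counts = Counter()
    for src in sources:
        counts.update(byte_ngrams(src, n))
    return {gram for gram, _ in counts.most_common(profile_size)}


def similarity(profile_a: set[bytes], profile_b: set[bytes]) -> int:
    """Assumed measure: the number of n-grams the two profiles have in common."""
    return len(profile_a & profile_b)


def attribute(unknown_source: str, author_profiles: dict[str, set[bytes]],
              n: int = 3, profile_size: int = 1500) -> str:
    """Return the candidate whose profile shares the most n-grams with the unknown file."""
    unknown_profile = build_profile([unknown_source], n, profile_size)
    return max(author_profiles,
               key=lambda author: similarity(unknown_profile, author_profiles[author]))


if __name__ == "__main__":
    profiles = {
        "alice": build_profile(["for (int i = 0; i < n; i++) { sum += a[i]; }"]),
        "bob": build_profile(["(defun square (x) (* x x))"]),
    }
    # The C-style loop shares many byte trigrams with alice's profile and almost none with bob's.
    print(attribute("for (int j = 0; j < m; j++) { total += b[j]; }", profiles))
```

Because the features are raw bytes, nothing in this sketch depends on the programming language or on comments being present, which matches the language-independence claim above. Real experiments would use each author's full training set rather than one-line samples.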

ICodeNet - A Hierarchical Neural Network Approach For Source Code Author Identification

2021 13th International Conference on Machine Learning and Computing, 2021. DOI: 10.1145/3457682.3457709

Author(s): Pranali Bora, Tulika Awalgaonkar, Himanshu Palve, Raviraj Joshi, Purvi Goel

Keyword(s): Neural Network, Source Code, Network Approach, Neural Network Approach, Author Identification, Hierarchical Neural Network

A robust authorship attribution on big period

International Journal of Electrical and Computer Engineering (IJECE), 2019, Vol. 9 (4), pp. 3167. DOI: 10.11591/ijece.v9i4.pp3167-3174

Cited By ~ 1

Author(s): Mubin Shoukat Tamboli, Rajesh Prasad

Keyword(s): Identification Problem, Authorship Attribution, Support Vector, Writing Style, Author Identification, Time Period, N Gram, Corpus Selection, Writing Sample, Small Period

Authorship attribution is the task of identifying the writer of an unknown text by assigning it to one of a set of known writers. Each author's writing style is distinctive and can therefore be used for discrimination, although various factors can cause that style to change. When an author's writing samples are collected within a short time period, they can be used effectively to identify an unknown sample. In this paper, we consider the author identification problem in the setting where the writing samples do not come from the same time period, i.e., the evidence has been collected over a long period of time. Character n-gram, word n-gram, and part-of-speech (PoS) n-gram features are used to build the model, because they capture a writer's style both in terms of content and in terms of the statistical characteristics of the writing. We applied a support vector machine (SVM) algorithm for classification, and the experiments produced effective results. When discriminating among multiple authors, corpus selection and construction were the most tedious tasks, and they were carried out effectively. We observed that accuracy varied with the feature type: word and character n-grams achieved better accuracy than PoS n-grams.
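The character and word n-gram features mentioned above can be sketched with the standard library. This is an illustrative assumption, not the paper's pipeline: the regex tokenizer, lower-casing, relative-frequency weighting, and the `top_k` vocabulary cut-off are all hypothetical choices. PoS n-grams (which need a part-of-speech tagger) and the SVM classifier itself are not shown; the resulting vectors could, for example, be passed to a linear SVM implementation such as scikit-learn's `LinearSVC`.

```python
import re
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Overlapping character n-grams of the lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    """Word n-grams over a simple (assumed) regex tokenization."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def build_vocabulary(texts: list[str], extractor, n: int, top_k: int = 500) -> list:
    """Hypothetical vocabulary: the `top_k` most frequent features over all training texts."""
    pooled = Counter()
    for text in texts:
        pooled.update(extractor(text, n))
    return [feature for feature, _ in pooled.most_common(top_k)]


def to_vector(counts: Counter, vocabulary: list) -> list[float]:
    """Relative-frequency vector over a fixed vocabulary; unseen features map to 0."""
    total = sum(counts.values()) or 1
    return [counts[feature] / total for feature in vocabulary]


if __name__ == "__main__":
    train = ["The cat sat on the mat.", "The dog ran in the park."]
    vocab = build_vocabulary(train, word_ngrams, n=2, top_k=5)
    # One feature vector per training text; these would be the SVM inputs.
    for text in train:
        print(to_vector(word_ngrams(text, 2), vocab))
```

Building the vocabulary only from training texts keeps the feature space fixed, so an unknown sample from a different time period is mapped onto the same dimensions as the known authors' samples.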