TF-IDF Inspired Detection for Cross-Language Source Code Plagiarism and Collusion

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Oscar Karnalim

Several computing courses allow students to choose which programming language they want to use for completing a programming task. This can lead to cross-language code plagiarism and collusion, in which the copied code file is rewritten in another programming language. In response, this paper proposes a detection technique that can accurately compare code files written in various programming languages while requiring limited effort to accommodate those languages at the development stage. The only language-dependent component of the technique is the source code tokeniser, and no code conversion is applied. The impact of coincidental similarity is reduced by applying a TF-IDF inspired weighting, in which rare matches are prioritised. Our evaluation shows that the technique outperforms techniques commonly used in academia in handling language conversion disguises, and that it is comparable to those techniques when dealing with conventional disguises.
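To illustrate how a TF-IDF inspired weighting can prioritise rare matches, here is a minimal sketch. It is not the author's implementation: the regex tokeniser, the token n-gram size, the smoothed IDF formula and the weighted-overlap score (IDF-weighted Jaccard) are all assumptions chosen for clarity. Token n-grams that occur in many submissions (boilerplate such as `for ( int`) receive low weight, so a shared rare n-gram contributes more to the similarity than a shared common one.

```python
import math
import re
from collections import Counter

# Hypothetical, language-agnostic tokeniser: identifiers/keywords, integers,
# and any other single non-space character. A real tool would use a proper
# per-language lexer (the only language-dependent part).
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenise(code: str) -> list[str]:
    return TOKEN_RE.findall(code)


def ngrams(tokens: list[str], n: int = 3) -> Counter:
    """Count overlapping token n-grams (a multiset of windows)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def idf_table(corpus: list[Counter]) -> dict:
    """Smoothed IDF over all files: n-grams shared by many files get low weight."""
    df = Counter()
    for grams in corpus:
        df.update(grams.keys())  # document frequency, not raw counts
    total = len(corpus)
    return {g: math.log((1 + total) / (1 + d)) + 1.0 for g, d in df.items()}


def weighted_similarity(a: Counter, b: Counter, idf: dict) -> float:
    """IDF-weighted Jaccard in [0, 1]: sum of weighted min counts / weighted max counts."""
    shared = sum(min(a[g], b[g]) * idf.get(g, 0.0) for g in a.keys() & b.keys())
    norm = sum(c * idf.get(g, 0.0) for g, c in (a | b).items())  # | keeps max counts
    return shared / norm if norm else 0.0
```

With a small corpus of bigrams, two files that share a rare bigram score higher than two files that share a bigram appearing in most of the corpus, even though each pair shares exactly one bigram.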

2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Feng Zhang ◽  
Guofan Li ◽  
Cong Liu ◽  
Qian Song

Source code similarity detection has various applications in code plagiarism detection and software intellectual property protection. In computer programming teaching, students may convert source code written in one programming language into another language for their assignment submissions. Existing similarity measures for source code written in the same language are not applicable to cross-language code similarity detection because of the syntactic differences among programming languages. Meanwhile, existing cross-language source code similarity detection approaches are susceptible to complex code obfuscation techniques, such as replacing control structures with equivalent ones and adding redundant statements. To solve this problem, we propose a cross-language code similarity detection (CLCSD) approach based on code flowcharts. In general, two source code fragments written in different programming languages are transformed into standardized code flowcharts (SCFC), and their similarity is obtained by measuring the similarity of the corresponding SCFCs. More specifically, we first introduce the standardized code flowchart (SCFC) model as a uniform flowchart representation of source code written in different languages. SCFC is language-independent and can therefore be used as the intermediate structure for source code similarity detection. We also give transformation techniques that convert source code written in a specific programming language into an SCFC. Second, we propose the SCFC-SPGK algorithm, based on the shortest path graph kernel, to measure the similarity between two SCFCs. Thus, the similarity between two pieces of source code in different programming languages is given by the similarity between their SCFCs. Experimental results show that, compared with existing approaches, CLCSD achieves higher accuracy in cross-language source code similarity detection. Furthermore, CLCSD can not only handle common source code obfuscation techniques used by students in computer programming teaching, but also obtains nearly 90% accuracy when dealing with some complex obfuscation techniques.
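The shortest path graph kernel mentioned above compares two graphs by the shortest paths they contain. The sketch below is the standard, generic form of that kernel, not the authors' SCFC-SPGK variant: each directed graph is reduced to a multiset of (start label, end label, shortest-path length) triples computed with Floyd-Warshall, the kernel is the dot product of the two multisets, and the result is normalised to [0, 1]. The flowchart node labels (`"start"`, `"assign"`, `"decision"`, ...) are hypothetical stand-ins for SCFC node types.

```python
import math
from collections import Counter

INF = float("inf")


def floyd_warshall(n, edges):
    """All-pairs shortest path lengths for an unweighted, directed graph."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v in edges:
        dist[u][v] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def path_features(labels, edges):
    """Multiset of (start label, end label, shortest-path length) for reachable pairs."""
    dist = floyd_warshall(len(labels), edges)
    feats = Counter()
    for i, row in enumerate(dist):
        for j, d in enumerate(row):
            if i != j and d != INF:
                feats[(labels[i], labels[j], d)] += 1
    return feats


def sp_kernel(g1, g2):
    """Count pairs of shortest paths with identical end labels and length."""
    f1, f2 = path_features(*g1), path_features(*g2)
    return sum(c * f2[key] for key, c in f1.items())


def normalised_sp_kernel(g1, g2):
    """Cosine-normalised kernel: 1.0 for identical feature multisets."""
    denom = math.sqrt(sp_kernel(g1, g1) * sp_kernel(g2, g2))
    return sp_kernel(g1, g2) / denom if denom else 0.0
```

A graph is passed as a `(labels, edges)` pair. Because only node types and path lengths matter, a loop written in Java and the same loop written in Python map to the same flowchart and score 1.0, while inserting a redundant statement lowers the score without driving it to zero.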


2020 ◽  
Author(s):  
Cut Nabilah Damni

Computer software is a collection of instructions (programs or procedures) that carries out work automatically by processing the data it is given (Yahfizham, 2019: 19). Most computer software is written by programmers using a programming language. Programmers write commands in a programming language much as people use natural language in everyday conversation. These commands are called source code. Another computer program, called a compiler, takes the source code and translates the commands into a language the computer understands; the result is called an executable program (EXE). In essence, every computer has software, consisting of operating systems, application systems and programming languages.


2020 ◽  
Author(s):  
S Mukhtar Ayubi Simatupang

Computer software has properties different from those of computer hardware. Whereas hardware can be seen and touched, software can only be observed; it cannot be felt or touched. More precisely, software cannot be touched and has no physical appearance, yet we can operate it. Even though it is not physically tangible, software is very useful: it is what enables a computer to execute commands. Most computer software is written by programmers using a programming language. Programmers write commands in a programming language much as people use natural language in everyday conversation. These commands are called source code. Another computer program, called a compiler, takes the source code and translates the commands into a language the computer understands; the result is called an executable program (EXE).
Keywords: Software, Programmer


2019 ◽  
Author(s):  
Budiman

Computer software continues to develop, and programming languages are no exception. The era of low-level programming languages was followed by the development of high-level programming languages, marked by the emergence of a programming method offered by programming languages, namely object-oriented programming (OOP). An IDE (Integrated Development Environment) is a computer program that provides the facilities required for software development; its purpose is to provide all the utilities needed to build software. Text editors that can be used to manipulate the source code of programming languages include Ultraedit, JediEdit, ClearEdit, cEdit, the Golden Pen, and others. PuniEdit is a text-based editor that helps the user correct, insert and modify source code. It is built using Borland Delphi 7.0 and the SynEdit component, and it supports the Pascal, C++ and HTML languages. In addition, PuniEdit can manage tokens: the user can clearly see every occurrence of each token type, such as keywords (reserved words), identifiers and operators. Keywords: Source code, programming language, source code is scanned.
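Showing every occurrence of a token type requires a scanner that classifies each token as a reserved word, identifier, operator and so on. The sketch below is a hypothetical Python illustration of that idea for a Pascal-like subset; it is not how PuniEdit or the SynEdit component work internally. The reserved-word set is a small assumed subset, and characters that match no rule (for example comment braces or string quotes) are silently skipped.

```python
import re

# Hypothetical subset of Pascal reserved words; a full editor would list them all.
RESERVED = {"begin", "end", "var", "if", "then", "else", "while", "do",
            "program", "integer", "for", "to"}

# Order matters: the regex tries alternatives left to right at each position,
# so ":=" is recognised as an operator before ":" as punctuation.
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?"),
    ("word", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("operator", r":=|<=|>=|<>|[+\-*/=<>]"),
    ("punctuation", r"[;:,.()]"),
    ("space", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def classify(source):
    """Return (kind, text) pairs; words are split into reserved words and identifiers."""
    tokens = []
    for m in MASTER.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "space":
            continue
        if kind == "word":
            # Pascal is case-insensitive, so compare in lower case.
            kind = "reserved" if text.lower() in RESERVED else "identifier"
        tokens.append((kind, text))
    return tokens


def occurrences(source, kind):
    """All token texts of one kind, in source order (e.g. every identifier)."""
    return [text for k, text in classify(source) if k == kind]
```

For example, `occurrences("var x: integer; begin x := x + 1 end.", "identifier")` lists the three uses of `x`, which is the kind of information an editor can use to highlight every occurrence of a token type.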


2021 ◽  
Vol 15 (5) ◽  
pp. 1-21
Author(s):  
Xiang Ling ◽  
Lingfei Wu ◽  
Saizhuo Wang ◽  
Gaoning Pan ◽  
Tengfei Ma ◽  
...  

Code retrieval aims to find, in a large corpus of source code repositories, the code snippet that best matches a natural language query. Recent work mainly uses natural language processing techniques to process both query texts (i.e., human natural language) and code snippets (i.e., machine programming language), but neglects the deep structural features of query texts and source code, both of which contain rich semantic information. In this article, we propose an end-to-end deep graph matching and searching (DGMS) model based on graph neural networks for the task of semantic code retrieval. To this end, we first represent both natural language query texts and programming language code snippets as unified graph-structured data, and then use the proposed graph matching and searching model to retrieve the best-matching code snippet. In particular, DGMS not only captures more structural information for individual query texts or code snippets, but also learns the fine-grained similarity between them through cross-attention based semantic matching operations. We evaluate the proposed DGMS model on two public code retrieval datasets covering two representative programming languages (i.e., Java and Python). Experimental results demonstrate that DGMS outperforms state-of-the-art baseline models by a large margin on both datasets. Moreover, our extensive ablation studies systematically investigate and illustrate the impact of each part of DGMS.
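Cross-attention based matching lets each node of one graph look at all nodes of the other graph. The following is a minimal, non-learned sketch of that general idea, not the DGMS model: node embeddings are given as plain vectors (in DGMS they would come from a trained graph neural network), attention weights are assumed to be a softmax over cosine similarities, and the final score averages, in both directions, how close each node is to its attention-weighted counterpart.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attend(queries, keys):
    """For each query node, an attention-weighted average of the other graph's nodes."""
    dim = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([cosine(q, k) for k in keys])
        out.append([sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)])
    return out


def match_score(text_nodes, code_nodes):
    """Mean cosine between each node and its cross-attended counterpart, both directions."""
    t2c = cross_attend(text_nodes, code_nodes)
    c2t = cross_attend(code_nodes, text_nodes)
    s1 = sum(cosine(t, a) for t, a in zip(text_nodes, t2c)) / len(text_nodes)
    s2 = sum(cosine(c, a) for c, a in zip(code_nodes, c2t)) / len(code_nodes)
    return (s1 + s2) / 2
```

Ranking candidate snippets by `match_score` against a query graph gives a simple retrieval loop; the score is symmetric, and a code graph containing nodes aligned with the query's nodes scores higher than one with unrelated nodes.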


Compiler ◽  
2015 ◽  
Vol 4 (1) ◽  
Author(s):  
Ngadiyono Ngadiyono ◽  
Hero Wintolo

Designing a website is the first step in building it, since the design determines the interface presented to visitors. An attractive web design makes it easier for visitors to browse the website's content. Designing a website requires mastering several web programming languages, namely HTML, CSS and JavaScript, and mastering all of them and understanding how their code fits together takes time. Therefore, this study builds an application, called WebEditor, that allows users to create a website design. WebEditor is built using the CodeIgniter and Twitter Bootstrap frameworks. The design rendering process uses parallel processing techniques to increase rendering speed. With this system, users can design a website easily and quickly, and rendering the design does not take long. The results of the case study show that rendering speed is influenced by the transmission medium and by the number of processors and server computers. Without a grid server, the rendering speed percentages are 33.7 % on the LAN network, 33.3 % on the router and 33 % on the Internet. Using one grid server, they are 33.6 % on the LAN network, 33.4 % on the router and 33 % on the Internet; using two grid servers, they are 44 % on the LAN network, 33 % on the router and 26 % on the Internet. So, with two grid servers, the rendering speed is highest on the LAN network (44 %), followed by the router (30 %), and lowest on the Internet (26 %). Thus, the best rendering of website designs is achieved over a LAN with two grid servers.


2020 ◽  
Vol 14 ◽  
pp. 31-36
Author(s):  
Krzysztof Bezrąk ◽  
Sławomir Przyłucki

Recent years of cloud technology development have brought a sharp increase in interest in solutions known as serverless systems. Their performance, and thus their usefulness in potential applications, strongly depends on how specific tasks are implemented in software. The article analyzes the impact of a selection of the currently most popular programming languages on the performance of a serverless test infrastructure running in an environment managed by Kubernetes. The collected data were used to draw conclusions about the suitability of individual languages under varying serverless system loads.


2017 ◽  
Vol 1 (1) ◽  
pp. 39
Author(s):  
Muhammad Wali ◽  
Lukman Ahmad

A source code library allows teachers, programmers, students and software developers to obtain various code references for a programming language and to provide evaluations. Currently, most source code libraries for learning purposes take the form of documentation on the use of a programming language, accessible through the official websites of programming language developers, forums and various blogs. Because of the complexity of their features, most web-based source code libraries can only be accessed through a website, while others are provided as documentation with each software vendor's product. This research attempts to build an application model for a source code library that can serve as learning documentation for using various programming languages flexibly, both online and offline. The application allows the data/content of the source code library to be updated both while the Internet is available and when the user is in an area without an Internet connection. The research is carried out in three stages: pre-development data collection, development and implementation, and post-development data collection. Pre-development data collection provides a preliminary study of the core problem at hand, while the development and implementation stage focuses on modeling the software design as diagrams and writing the program code that implements the design. The post-development data collection stage serves to refine the application, draw conclusions, and suggest topics for further research.
Keywords: Application, Source code library, software development


Author(s):  
Tran Thanh Luong ◽  
Le My Canh

JavaScript has become increasingly popular in recent years because of its rich features: it is dynamic, interpreted and object-oriented, with first-class functions. Furthermore, JavaScript is designed around an event-driven, non-blocking I/O model that boosts overall application performance, especially in the case of Node.js. To take advantage of these characteristics, many design patterns implementing asynchronous programming in JavaScript have been proposed. However, choosing the right pattern and implementing good asynchronous source code is challenging, and mistakes easily lead to less robust applications and low-quality source code. Extending our previous work on exception handling code smells in JavaScript, and on exception handling code smells in JavaScript asynchronous programming with promises, this research studies the impact of three JavaScript asynchronous programming patterns on the quality of source code and applications.
