A Novel Approach for Measurement of Source Code Similarity

Author(s):  
Mayank Agrawal ◽  
Vinod Jain ◽  
Atul Kumar Uttam
2015 ◽  
Vol 7 (3) ◽  
pp. 18-44 ◽  
Author(s):  
Soumia Bendakir ◽  
Nacereddine Zarour ◽  
Pierre Jean Charrel

Requirements change management (RCM) is an inevitable task in the system development life cycle, since user requirements evolve continuously (some are added, others are modified or deleted). A large majority of studies have examined the issue of change, but most of them focus on design and source code; requirements are often neglected, even though the cost of fixing a defect introduced at the requirements stage is far lower than the cost of fixing an error in design or implementation. For this purpose, this work focuses on change issues in the requirements engineering (RE) context, which contains the complete initial specification. Properties of multi-agent systems (MAS), such as adaptability, perception, and cooperation, make it possible to manage changing requirements in a controlled manner. The main objective of this work is to develop an agent-oriented approach to requirements management that adapts effectively to changes in its environment.


Author(s):  
Taher Ahmed Ghaleb ◽  
Khalid Aljasser ◽  
Musab A. Alturki

Design patterns are generic solutions to common programming problems and a typical example of design reuse. However, implementing design patterns can lead to several problems, such as programming overhead and poor traceability. Existing research has introduced several approaches to alleviate the implementation issues of design patterns. Nevertheless, these approaches impose different implementation restrictions and require programmers to be aware of how design patterns should be implemented, which makes the source code more prone to faults and defects. In addition, existing design pattern implementation approaches limit programmers to specific application scenarios of design patterns (e.g. class-level), while other approaches require scattering implementation code snippets throughout the program. Such restrictions negatively impact understanding, tracing, and reusing design patterns. In this paper, we propose a novel approach that supports the implementation of software design patterns as an extensible Java compiler. Our approach allows developers to use concise, easy-to-use language constructs to apply design patterns in their code, and it allows design patterns to be applied in different scenarios. We illustrate our approach using three commonly used design patterns, namely Singleton, Observer and Decorator. We show, through illustrative examples, how our design pattern constructs can significantly simplify implementing design patterns in a flexible, reusable and traceable manner. Moreover, our constructs allow both class-level and instance-level implementations of design patterns.
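
For context, the sketch below shows the conventional hand-written Java boilerplate for one of the three patterns discussed (Singleton); the class name ConfigRegistry is a hypothetical stand-in. It is exactly this kind of repetitive, error-prone code that a compiler-level pattern construct is meant to collapse into a single declaration (the abstract does not show the paper's actual syntax).

```java
// Conventional thread-safe Singleton boilerplate in plain Java
// (double-checked locking). An extensible-compiler approach would
// replace this hand-written pattern with one language construct.
public final class ConfigRegistry {
    private static volatile ConfigRegistry instance;

    private ConfigRegistry() { }          // block direct instantiation

    public static ConfigRegistry getInstance() {
        if (instance == null) {                       // first check (no lock)
            synchronized (ConfigRegistry.class) {
                if (instance == null) {               // second check (locked)
                    instance = new ConfigRegistry();
                }
            }
        }
        return instance;
    }
}
```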


2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Xiaolong Liu ◽  
Qiang Wei ◽  
Qingxian Wang ◽  
Zheng Zhao ◽  
Zhongxu Yin

Fuzzing is an effective technique for discovering vulnerabilities by testing applications with invalid input data. However, for applications with a checksum mechanism, fuzzing achieves only low coverage, because samples generated by the fuzzer may be unable to pass the checksum verification. To solve this problem, most current fuzzers advise the user to comment out the checksum verification code manually, but auditing the source code to locate the checksum point corresponding to the verification takes considerable time. In this paper, we present a novel approach based on taint analysis that identifies the checksum point automatically. To implement this approach, we designed the checksum-aware fuzzing assistant tool (CAFA). Once the checksum point is identified, the application is statically patched in an antilogical manner at that point, and the fuzzing tool then tests the patched program, bypassing the checksum verification. To evaluate CAFA, we used it to assist the American Fuzzy Lop (AFL) tool in fuzzing eight real-world applications with known input specifications. The experimental results show that CAFA can accurately and quickly identify checksum points and greatly improve the coverage achieved by AFL. With the help of CAFA, multiple buffer overflow vulnerabilities have been discovered in the latest ImageMagick and RAR applications.
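
The sketch below illustrates the idea behind "antilogical" patching, written in Java for readability (CAFA itself patches the binary at the identified checksum point; the CRC32 check and method names here are illustrative). Inverting the comparison makes fuzzer-mutated samples, which would normally be rejected, flow past the gate into the parsing code under test.

```java
import java.util.zip.CRC32;

public class ChecksumGate {

    static long crc32(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        return crc.getValue();
    }

    // Original verification: mutated samples almost always fail here.
    static boolean verifyOriginal(byte[] payload, long storedChecksum) {
        return crc32(payload) == storedChecksum;
    }

    // "Antilogically" patched verification: the comparison is negated,
    // so mutated samples reach the code the fuzzer actually wants to test.
    static boolean verifyPatched(byte[] payload, long storedChecksum) {
        return crc32(payload) != storedChecksum;
    }

    public static void main(String[] args) {
        byte[] sample = {1, 2, 3, 4};
        long goodChecksum = crc32(sample);
        System.out.println("original, valid input:  " + verifyOriginal(sample, goodChecksum));
        System.out.println("patched, mutated input: " + verifyPatched(sample, goodChecksum ^ 1));
    }
}
```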


2020 ◽  
Vol 14 (04) ◽  
pp. 501-516
Author(s):  
Joseph R. Barr ◽  
Peter Shaw ◽  
Faisal N. Abu-Khzam ◽  
Tyler Thatcher ◽  
Sheng Yu

We present an empirical analysis of the source code of the Fluoride Bluetooth module, which is part of the standard Android OS distribution, exhibiting a novel approach to classifying and scoring source code and rating vulnerability. Our workflow combines deep learning, combinatorial optimization, heuristics and machine learning. A combination of heuristics and deep learning is used to embed function (and method) labels into a low-dimensional Euclidean space. Because the corpus of the Fluoride source code is rather limited (containing approximately 12,000 functions), a straightforward embedding (using, e.g. code2vec) is untenable. To overcome this dearth of data, an intermediate step of Byte-Pair Encoding is necessary. We then embed the tokens, using a long short-term memory (LSTM) network, and assemble from them an embedding of function/method labels. The next step is to form a distance matrix consisting of the cosines between every pair of vectors (function embeddings), which in turn is interpreted as a (combinatorial) graph whose vertices represent functions and whose edges correspond to entries whose value exceeds a given threshold. Cluster-Editing is then applied to partition the vertex set of the graph into subsets representing “dense graphs,” i.e. nearly complete subgraphs. Finally, the vectors representing the components, together with additional heuristic-based features, are used to model the components for vulnerability risk.
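
A minimal sketch of the graph-construction step described above: cosine similarity is computed between every pair of function embeddings, and an edge is kept wherever the value exceeds a threshold. The toy embeddings and threshold are made-up stand-ins; the paper derives the vectors from Byte-Pair-Encoded tokens fed through an LSTM and then runs Cluster-Editing on the resulting graph.

```java
import java.util.Arrays;

public class SimilarityGraph {

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Adjacency matrix: true where cosine similarity exceeds the threshold. */
    static boolean[][] buildGraph(double[][] embeddings, double threshold) {
        int n = embeddings.length;
        boolean[][] adj = new boolean[n][n];
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                adj[i][j] = adj[j][i] =
                    cosine(embeddings[i], embeddings[j]) > threshold;
        return adj;
    }

    public static void main(String[] args) {
        double[][] toyEmbeddings = {        // three toy "function embeddings"
            {0.9, 0.1, 0.0},
            {0.8, 0.2, 0.1},
            {0.0, 0.1, 0.9}
        };
        for (boolean[] row : buildGraph(toyEmbeddings, 0.8))
            System.out.println(Arrays.toString(row));
    }
}
```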


Author(s):  
Dávid Honfi ◽  
Zoltán Micskei

Testing is a significantly time-consuming yet commonly employed activity for improving the quality of software. Techniques like dynamic symbolic execution have therefore been proposed for generating tests from source code alone. However, current approaches usually cannot create thorough tests for software units with dependencies (e.g. calls to the file system or to external services). In this paper, we present a novel approach that synthesizes an isolation sandbox, which interacts with the test generator to increase the behaviour covered in the unit under test. The approach automatically transforms the code of the unit under test and lets the test generator choose values for the parameters in calls to dependencies. The paper presents a prototype implementation that collaborates with the IntelliTest test generator. The automated isolation is evaluated on source code from open-source projects. The results show that the approach can significantly increase the code coverage achieved by the generated tests.
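
A minimal Java sketch of the transformation idea (the actual prototype targets .NET and IntelliTest; the interface and class names here are hypothetical). A hard-wired dependency call is routed through a synthesized sandbox interface, so the test generator can pick the dependency's return value as just another input it controls, making both branches reachable.

```java
interface FileSystemSandbox {
    String readConfig(String path);   // stands in for a real file-system call
}

class UnitUnderTest {
    private final FileSystemSandbox fs;

    UnitUnderTest(FileSystemSandbox fs) { this.fs = fs; }

    // Before the transformation this method would call the file system
    // directly; now each branch is reachable by choosing fs's output.
    int classify(String path) {
        String contents = fs.readConfig(path);
        if (contents != null && contents.startsWith("v2:")) {
            return 2;     // reachable when the generator supplies "v2:..."
        }
        return 1;
    }
}

public class SandboxDemo {
    public static void main(String[] args) {
        // A test generator would explore values; here we hand-pick two.
        UnitUnderTest v2 = new UnitUnderTest(path -> "v2:settings");
        UnitUnderTest v1 = new UnitUnderTest(path -> "legacy");
        System.out.println(v2.classify("any") + " " + v1.classify("any"));
    }
}
```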


2018 ◽  
Vol 6 (9) ◽  
pp. 505-519
Author(s):  
Damitha D Karunarathna ◽  
Nasik Shafeek

Source code that one developer writes may not readily make sense to another: the understandability of source code depends on the language proficiency and logical thinking patterns of both the person who wrote the code and the person trying to understand it. In distributed software development and in software maintenance, however, there is often a need to read and understand source code written by someone else, some time after it was written. Flowcharts depict the logical flow of processes and can serve as an effective tool for representing the control flow of software programs. This paper presents a novel approach to generating flowcharts from program snippets. It demonstrates that, by using an intermediate abstract representation independent of any programming language, flowcharts can be generated for programs written in any programming language. The feasibility of the proposed approach was demonstrated by developing a prototype system of compilers that generates flowcharts for source code written in the PHP language.
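
The sketch below suggests what such a language-independent intermediate representation might look like: front-end compilers (e.g. for PHP) would populate a small set of node kinds, and a single back end renders them as a flowchart. The node kinds and the DOT-style output are illustrative choices, not the paper's actual format.

```java
import java.util.List;

public class FlowchartIR {

    enum Kind { START, PROCESS, DECISION, END }

    record Node(int id, Kind kind, String label) { }
    record Edge(int from, int to, String label) { }

    // Render the IR as Graphviz DOT: decisions become diamonds,
    // processes become boxes, start/end become ellipses.
    static void emitDot(List<Node> nodes, List<Edge> edges) {
        System.out.println("digraph flow {");
        for (Node n : nodes) {
            String shape = switch (n.kind()) {
                case DECISION   -> "diamond";
                case START, END -> "ellipse";
                case PROCESS    -> "box";
            };
            System.out.printf("  n%d [shape=%s, label=\"%s\"];%n",
                              n.id(), shape, n.label());
        }
        for (Edge e : edges)
            System.out.printf("  n%d -> n%d [label=\"%s\"];%n",
                              e.from(), e.to(), e.label());
        System.out.println("}");
    }

    public static void main(String[] args) {
        // IR for a tiny PHP snippet: if ($x > 0) { echo "pos"; }
        emitDot(
            List.of(new Node(0, Kind.START, "start"),
                    new Node(1, Kind.DECISION, "$x > 0"),
                    new Node(2, Kind.PROCESS, "print pos"),
                    new Node(3, Kind.END, "end")),
            List.of(new Edge(0, 1, ""),
                    new Edge(1, 2, "yes"),
                    new Edge(1, 3, "no"),
                    new Edge(2, 3, "")));
    }
}
```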


Author(s):  
Xing Hu ◽  
Ge Li ◽  
Xin Xia ◽  
David Lo ◽  
Shuai Lu ◽  
...  

Code summarization, which aims to generate succinct natural language descriptions of source code, is extremely useful for code search and code comprehension, and it has played an important role in software maintenance and evolution. Previous approaches generate summaries by retrieving them from similar code snippets. However, these approaches rely heavily on whether similar code snippets can be retrieved and on how similar the snippets are, and they fail to capture the API knowledge in the source code, which carries vital information about its functionality. In this paper, we propose a novel approach, named TL-CodeSum, which successfully transfers API knowledge learned in a different but related task to code summarization. Experiments on large-scale real-world industry Java projects indicate that our approach is effective and outperforms the state-of-the-art in code summarization.
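
The "API knowledge" in question is essentially the sequence of API calls a method makes. The crude regex extraction below only illustrates what such a sequence looks like for a Java snippet; TL-CodeSum itself parses code properly and feeds the API sequence to a neural sequence-to-sequence summarizer, which this sketch does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ApiSequence {

    // Collect method-call names of the form ".name(" from raw source text.
    static List<String> extractCalls(String source) {
        List<String> calls = new ArrayList<>();
        Matcher m = Pattern.compile("\\.(\\w+)\\s*\\(").matcher(source);
        while (m.find()) calls.add(m.group(1));
        return calls;
    }

    public static void main(String[] args) {
        String method = """
            BufferedReader r = new BufferedReader(new FileReader(path));
            String line = r.readLine();
            r.close();
            """;
        // Prints [readLine, close] -- a hint that the method reads a file.
        System.out.println(extractCalls(method));
    }
}
```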


2018 ◽  
Vol 8 (10) ◽  
pp. 1902 ◽  
Author(s):  
Tina Beranič  ◽  
Vili Podgorelec ◽  
Marjan Heričko

Different challenges arise when detecting deficient software source code. Usually, a large number of potentially problematic entities are identified when an individual software metric or individual quality aspect is used to identify deficient program entities. Moreover, many of these entities turn out to be false positives: the metrics indicate poor quality, whereas experienced developers do not consider the program entities problematic. The number of entities identified as potentially deficient does not decrease significantly when deficient entities are identified by applying code smell detection rules. Moreover, the intersection of entities identified as allegedly deficient by different code smell detection tools is small, which suggests that implementations of code smell detection rules are not consistent and uniform. To address these challenges, we present a novel approach for identifying deficient entities based on applying a majority function to a combination of software metrics. Program entities are assessed according to selected quality aspects, which are evaluated with a set of software metrics and corresponding threshold values derived from benchmark data, taking into account the statistical distributions of software metric values. The proposed approach was implemented and validated on projects developed in Java, C++ and C#. The validation was done with expert judgment, in which software developers and architects with many years of experience assessed the quality of the software classes. Using a combination of software metrics as the criterion for identifying deficient source code reduced the number of object-oriented program entities flagged as potentially deficient. The results show the correctness of the quality ratings determined by the proposed approach and, most importantly, confirm the absence of false positive entities.
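
A minimal sketch of the majority-vote idea: each software metric casts a "deficient" vote when its value crosses a benchmark-derived threshold, and a class is flagged only when the votes reach a strict majority. The metric names and thresholds below are illustrative, not the paper's calibrated values.

```java
import java.util.Map;

public class MajorityVote {

    // metric name -> {value for this class, threshold above which it votes}
    static boolean isDeficient(Map<String, double[]> metrics) {
        long votes = metrics.values().stream()
                .filter(v -> v[0] > v[1])     // v[0] = value, v[1] = threshold
                .count();
        return votes * 2 > metrics.size();   // strict majority of metrics
    }

    public static void main(String[] args) {
        Map<String, double[]> someClass = Map.of(
            "WMC",  new double[] {34, 20},    // weighted methods per class
            "CBO",  new double[] { 9, 14},    // coupling between objects
            "LCOM", new double[] {0.9, 0.7}   // lack of cohesion in methods
        );
        System.out.println(isDeficient(someClass));  // 2 of 3 votes -> true
    }
}
```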


2005 ◽  
Author(s):  
Alejandra Garrido

The C preprocessor is heavily used in C programs because it provides useful and even necessary additions to the C language. Since preprocessor directives are not part of C, they are removed before parsing and program analysis take place, during the phase called preprocessing. In the context of refactoring, it is inappropriate to remove preprocessor directives: if changes are applied to the preprocessed version of a program, it may not be possible to recover the un-preprocessed version. This means that after refactoring, all the source code would be contained in a single unit, targeted to a single configuration and stripped of preprocessor macros. This thesis describes a novel approach to preserving preprocessor directives during parsing and program analysis and integrating them into the program representations. Furthermore, it illustrates how the program representations are used during refactoring and how transformations preserve preprocessor directives. Additionally, the semantics of the C preprocessor are formally specified, and the results of implementing this approach in CRefactory, a refactoring tool for C, are presented.

