Improving Similarity Measure for Java Programs Based on Optimal Matching of Control Flow Graphs

Measuring program similarity plays an important role in solving many software engineering problems. However, programs are instruction sequences with complex structures and semantics, and they may be deliberately obfuscated through semantics-preserving transformations; as a result, measuring program similarity is a difficult task that has not been adequately addressed. In this paper, we propose a new approach to measuring Java program similarity. The approach first measures the low-level similarity between basic blocks according to their bytecode instruction sequences and structural properties. Then, an error-tolerant graph matching algorithm that can withstand structural transformations is used to match the Control Flow Graphs (CFGs) based on the basic block similarity. The high-level similarity between Java programs is subsequently calculated from the matched pairs of independent paths extracted from the optimal CFG matching. The proposed CFG-Match approach is compared with a string-based approach, a tree-based approach, and a graph-based approach. Experimental results show that CFG-Match is more accurate and more robust against semantics-preserving transformations. We also apply CFG-Match to detect Java program plagiarism. Experiments on benchmark program pairs collected from students' project assignment submissions demonstrate that CFG-Match outperforms the comparative approaches in detecting Java program plagiarism.
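
The basic block comparison step can be illustrated with a minimal Python sketch. It assumes each basic block has already been disassembled into a list of bytecode mnemonics, scores two sequences with a normalized longest-common-subsequence (LCS) measure, and mixes in a hypothetical out-degree term as the "structural property"; the weighting `alpha` and all helper names are illustrative assumptions, not the formula used by CFG-Match.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two opcode sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend a match diagonally, otherwise carry the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sequence_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Normalized opcode similarity in [0, 1]: 2 * LCS / (|a| + |b|)."""
    if not a and not b:
        return 1.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))


def structural_similarity(out_deg_a: int, out_deg_b: int) -> float:
    """Hypothetical structural term: agreement of the two blocks' out-degrees."""
    return 1.0 - abs(out_deg_a - out_deg_b) / max(out_deg_a, out_deg_b, 1)


def block_similarity(ops_a, ops_b, out_deg_a: int, out_deg_b: int, alpha: float = 0.8) -> float:
    """Weighted mix of sequence and structural similarity (alpha is an assumed weight)."""
    return (alpha * sequence_similarity(ops_a, ops_b)
            + (1 - alpha) * structural_similarity(out_deg_a, out_deg_b))


if __name__ == "__main__":
    a = ["iload_1", "iload_2", "iadd", "ireturn"]
    b = ["iload_2", "iload_1", "iadd", "ireturn"]
    print(sequence_similarity(a, b))  # 0.75
    print(block_similarity(a, b, 0, 0))
```

Two blocks that differ only in the order of their two loads share a three-opcode LCS, so their sequence similarity is 2 * 3 / 8 = 0.75. A full matcher would feed such pairwise scores into the error-tolerant graph matching step.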

StFuzzer: Contribution-Aware Coverage-Guided Fuzzing for Smart Devices

Security and Communication Networks ◽

10.1155/2021/1987844 ◽

2021 ◽

Vol 2021 ◽

pp. 1-15

Author(s):

Jiageng Yang ◽

Xinguo Zhang ◽

Hui Lu ◽

Muhammad Shafiq ◽

Zhihong Tian

Keyword(s):

Real World ◽

Control Flow ◽

Smart Devices ◽

Smart Device ◽

Basic Block ◽

Optimization Approach ◽

Seed Selection ◽

Code Coverage ◽

Real World Applications ◽

Basic Blocks

The root cause of insecurity in smart devices is the potential vulnerabilities they contain. There are many approaches to finding potential bugs in smart devices. Fuzzing, especially coverage-guided fuzzing, is the most effective vulnerability-discovery technique. Coverage-guided fuzzing identifies high-quality seeds according to the code coverage they trigger. Existing coverage-guided fuzzers assume that the higher the code coverage of a seed, the greater the probability that it triggers potential bugs. However, the logic of real-world applications running on smart devices, and of the devices' operating systems, is very complex, and the basic blocks of these programs play different roles in exploring the application. Existing seed selection strategies ignore this observation, which reduces the efficiency of bug discovery on smart devices. In this paper, we propose contribution-aware coverage-guided fuzzing, which estimates the contribution of each basic block to the exploration of a smart device. Based on the control flow of the target on any smart device and the runtime information collected during fuzzing, we define a static contribution for each basic block and a dynamic contribution built on the block's execution frequency. The contribution-aware optimization does not require any prior knowledge of the target device, so it applies to both gray-box and white-box fuzzing. We designed and implemented a contribution-aware coverage-guided fuzzer for smart devices, called StFuzzer, and evaluated it on four real-world applications commonly deployed on smart devices to demonstrate the efficiency of the contribution-aware optimization. Our results show that the contribution-aware approach significantly improves bug-discovery capability and achieves higher execution speed than state-of-the-art fuzzers.
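
To make the idea of per-block contributions concrete, here is a minimal Python sketch of contribution-weighted seed selection. The formulas are hypothetical stand-ins, not StFuzzer's definitions: the static contribution of a block is taken as one plus its CFG out-degree, the dynamic contribution as the inverse of its hit count, and a seed's score as the sum of their product over the blocks it covers.

```python
from collections import Counter
from typing import Dict, List, Set


def static_contribution(cfg: Dict[str, List[str]], block: str) -> float:
    """Hypothetical static weight: blocks with more successors open more paths."""
    return 1.0 + len(cfg.get(block, []))


def dynamic_contribution(hits: Counter, block: str) -> float:
    """Hypothetical dynamic weight: rarely executed blocks count for more."""
    return 1.0 / (1 + hits[block])


def seed_score(cfg: Dict[str, List[str]], hits: Counter, covered: Set[str]) -> float:
    """Sum of static * dynamic contribution over the blocks a seed covers."""
    return sum(static_contribution(cfg, b) * dynamic_contribution(hits, b) for b in covered)


def pick_seed(cfg: Dict[str, List[str]], hits: Counter, seeds: Dict[str, Set[str]]) -> str:
    """Return the seed whose covered blocks carry the largest total contribution."""
    return max(seeds, key=lambda s: seed_score(cfg, hits, seeds[s]))


if __name__ == "__main__":
    cfg = {"entry": ["a", "b"], "a": ["exit"], "b": ["c", "exit"], "c": ["exit"], "exit": []}
    hits = Counter({"entry": 100, "a": 90, "exit": 100, "b": 5, "c": 1})
    seeds = {"s1": {"entry", "a", "exit"}, "s2": {"entry", "b"}}
    print(pick_seed(cfg, hits, seeds))  # s2, although s1 covers more blocks
```

In the example, `s1` covers more blocks, but they are all frequently executed; `s2` reaches the rarely executed branch block `b`, so this contribution-weighted score prefers it where a plain coverage count would not.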

An Approach to Comparing Control Flow Graphs Based on Basic Block Matching

Indian Journal of Computer Science and Engineering ◽

10.21817/indjcse/2020/v11i3/201103237 ◽

2020 ◽

Vol 11 (3) ◽

pp. 289-296

Author(s):

Hyun-il Lim

Keyword(s):

Control Flow ◽

Block Matching ◽

Basic Block ◽

Flow Graphs

Instruction Scheduling Across Control Flow

Scientific Programming ◽

10.1155/1993/536143 ◽

1993 ◽

Vol 2 (3) ◽

pp. 1-5

Author(s):

Martin Charles Golumbic ◽

Vladimir Rainish

Keyword(s):

Time Delays ◽

Instruction Scheduling ◽

Control Flow ◽

Basic Block ◽

High Quality ◽

Assembly Code ◽

Considerable Research ◽

Straight Line ◽

Compiled Code ◽

Basic Blocks

Instruction scheduling algorithms are used in compilers to reduce run-time delays in compiled code by reordering or transforming program statements, usually at the intermediate language or assembly code level. Considerable research has been carried out on scheduling code within the scope of basic blocks, i.e., straight-line sections of code, and very effective basic block schedulers are now included in most modern compilers, especially those for pipelined processors. In previous work (Golumbic and Rainish, IBM J. Res. Dev., Vol. 34, pp. 93–97, 1990), we presented code replication techniques for scheduling beyond the scope of basic blocks that provide reasonable improvements in the running time of compiled code but still leave room for further improvement. In this article we present a new method for scheduling beyond basic blocks called SHACOOF. This technique takes advantage of a conventional, high-quality basic block scheduler by first suppressing selected subsequences of instructions and then scheduling the modified instruction sequence with the basic block scheduler. A candidate subsequence for suppression can be found by identifying a region of the program's control flow graph, called an S-region, that has a unique entry and a unique exit and meets predetermined criteria. This enables instructions to be scheduled beyond basic block boundaries with only minimal changes to an existing compiler, by identifying opportunities to cover delays that would otherwise lie beyond the scheduler's scope.
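
The S-region condition (a unique entry and a unique exit) can be checked mechanically on a control flow graph. The following minimal Python sketch represents the CFG as a successor map, treats a region node as an entry if an edge from outside the region targets it and as an exit if it has an edge leaving the region; these definitions are assumptions, and the article's additional "predetermined criteria" are not modeled.

```python
from typing import Dict, List, Set, Tuple


def region_entries_exits(cfg: Dict[str, List[str]], region: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Entries: region nodes targeted from outside. Exits: region nodes with an edge leaving."""
    entries: Set[str] = set()
    exits: Set[str] = set()
    for node, succs in cfg.items():
        for succ in succs:
            if node not in region and succ in region:
                entries.add(succ)
            if node in region and succ not in region:
                exits.add(node)
    return entries, exits


def is_single_entry_single_exit(cfg: Dict[str, List[str]], region: Set[str]) -> bool:
    """True if the region has exactly one entry node and exactly one exit node."""
    entries, exits = region_entries_exits(cfg, region)
    return len(entries) == 1 and len(exits) == 1


if __name__ == "__main__":
    # S -> A, a diamond A -> {B, C} -> D, then D -> E
    cfg = {"S": ["A"], "A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
    print(is_single_entry_single_exit(cfg, {"A", "B", "C", "D"}))  # True
    print(is_single_entry_single_exit(cfg, {"B", "C", "D"}))       # False: entries B and C
```

The whole diamond qualifies as a candidate region, while the diamond without its branch node does not, because control can enter it at two different blocks.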

Implementing a high-efficiency similarity analysis approach for firmware code

PLoS ONE ◽

10.1371/journal.pone.0245098 ◽

2021 ◽

Vol 16 (1) ◽

pp. e0245098

Author(s):

Yisen Wang ◽

Ruimin Wang ◽

Jing Jing ◽

Huanwei Wang

Keyword(s):

Machine Learning ◽

High Efficiency ◽

Control Flow ◽

Rapid Expansion ◽

Analysis Approach ◽

Similarity Analysis ◽

Basic Block ◽

Comparison Results ◽

Flow Features ◽

Basic Blocks

The rapid expansion of the open-source community has shortened the software development cycle, but it has also accelerated the spread of vulnerabilities, especially in the Internet of Things. In recent years, the frequency of attacks against connected devices has increased exponentially, making these vulnerabilities increasingly serious. State-of-the-art firmware security inspection technologies, such as methods based on machine learning and graph theory, identify similar applications based on known vulnerabilities but are ineffective without detailed information about those vulnerabilities. Moreover, the model training required by machine learning technologies demands significant time and data, resulting in low efficiency and poor extensibility. To address these shortcomings, this study proposes a high-efficiency similarity analysis approach for firmware code. First, control flow and data flow features are extracted from the functions of the firmware and of the vulnerabilities, and these features are used to calculate a SimHash for each function. The pigeonhole principle is used to support mass storage and fast querying of the SimHash values. Second, similar function pairs are analyzed in detail both within and among basic blocks. Within basic blocks, symbolic execution generates basic block semantic information, and a constraint solver determines semantic equivalence. Among basic blocks, the local control flow graphs are analyzed to obtain their similarity. We then implemented a prototype and evaluated it. The evaluation results demonstrate that the proposed approach can perform large-scale firmware function similarity analysis and can locate real-world firmware patches without information about the vulnerable function. Finally, we compare our method with existing methods. The comparison results demonstrate that our method is more efficient and accurate than Gemini and StagedMethod: more than 90% of firmware functions can be indexed within 0.1 s, and searching 100,000 firmware functions takes less than 2 s.
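
The SimHash and pigeonhole indexing steps can be sketched in Python as follows. The sketch assumes each function is summarized by a list of feature strings (hypothetical placeholders for the extracted control flow and data flow features), hashes features to 64 bits with BLAKE2b from `hashlib`, and files every fingerprint under k + 1 bit chunks: if two fingerprints differ in at most k bits, the pigeonhole principle guarantees they agree exactly on at least one chunk, so exact chunk lookups find every candidate and a final Hamming-distance check removes false positives. The bit width, chunking, and class name are illustrative choices, not the study's parameters.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

BITS = 64  # fingerprint width (assumed)


def feature_hash(feature: str) -> int:
    """64-bit hash of one feature string."""
    return int.from_bytes(hashlib.blake2b(feature.encode(), digest_size=8).digest(), "big")


def simhash(features: Iterable[str]) -> int:
    """Classic SimHash: per bit, +1 if a feature's hash has it set, -1 otherwise."""
    counts = [0] * BITS
    for feature in features:
        h = feature_hash(feature)
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if counts[i] > 0)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


class PigeonholeIndex:
    """Finds all stored fingerprints within Hamming distance k of a query."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.parts = k + 1
        self.width = BITS // self.parts  # the last chunk absorbs any remainder
        self.tables: List[Dict[int, Set[str]]] = [defaultdict(set) for _ in range(self.parts)]
        self.fingerprints: Dict[str, int] = {}

    def _chunks(self, h: int) -> Iterator[Tuple[int, int]]:
        for p in range(self.parts):
            lo = p * self.width
            hi = BITS if p == self.parts - 1 else lo + self.width
            yield p, (h >> lo) & ((1 << (hi - lo)) - 1)

    def add(self, name: str, h: int) -> None:
        self.fingerprints[name] = h
        for p, chunk in self._chunks(h):
            self.tables[p][chunk].add(name)

    def query(self, h: int) -> List[str]:
        # Pigeonhole: <= k differing bits spread over k + 1 chunks leave one chunk identical.
        candidates: Set[str] = set()
        for p, chunk in self._chunks(h):
            candidates |= self.tables[p].get(chunk, set())
        # Verify, discarding candidates that share a chunk but are too far away overall.
        return sorted(n for n in candidates if hamming(self.fingerprints[n], h) <= self.k)


if __name__ == "__main__":
    index = PigeonholeIndex(k=3)
    fp = simhash(["cfg:loop", "cfg:branch", "df:memcpy->ret"])  # hypothetical features
    index.add("firmware_func_1", fp)
    print(index.query(fp ^ 0b101))  # ['firmware_func_1'] (distance 2)
```

Exact lookups on k + 1 small tables replace a scan over every stored fingerprint, which is what makes this kind of index fast on large function collections.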

Machine Learning–enabled Scalable Performance Prediction of Scientific Codes

ACM Transactions on Modeling and Computer Simulation ◽

10.1145/3450264 ◽

2021 ◽

Vol 31 (2) ◽

pp. 1-28

Author(s):

Gopinath Chennupati ◽

Nandakishore Santhi ◽

Phill Romero ◽

Stephan Eidenbenz

Keyword(s):

Machine Learning ◽

Performance Prediction ◽

Prediction Models ◽

Radiation Transport ◽

Discrete Event ◽

Basic Block ◽

Distribution Models ◽

Scientific Application ◽

High Level ◽

Access Patterns

Hardware architectures are becoming increasingly complex as compute capabilities grow toward exascale. We present the Analytical Memory Model with Pipelines (AMMP) of the Performance Prediction Toolkit (PPT). PPT-AMMP takes high-level source code and hardware architecture parameters as input and predicts the runtime of that code on the target hardware platform defined by those parameters. PPT-AMMP transforms the code into an (architecture-independent) intermediate representation and then (i) analyzes the basic block structure of the code, (ii) processes architecture-independent virtual memory access patterns, which it uses to build memory reuse distance distribution models for each basic block, and (iii) runs detailed basic-block-level simulations to determine hardware pipeline usage. PPT-AMMP uses machine learning and regression techniques to build prediction models from small instances of the input code, and then integrates them into a higher-order discrete-event simulation model of PPT running on the Simian PDES engine. We validate PPT-AMMP on four standard computational physics benchmarks and present a use case of hardware parameter sensitivity analysis that identifies bottleneck hardware resources for different code inputs. We further extend PPT-AMMP to predict the performance of a scientific application code, namely the radiation transport mini-app SNAP. To this end, we analyze multivariate regression models that accurately predict the reuse profiles and the basic block counts. We validate the predicted SNAP runtimes against measured times.
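
Reuse distance, the quantity behind the per-block distribution models above, is the number of distinct addresses touched between two consecutive accesses to the same address. The minimal Python sketch below computes it for one access trace with an LRU stack and uses the standard fact that a fully associative LRU cache of C lines hits exactly when the reuse distance is below C; the trace, function names, and cache model are illustrative assumptions rather than PPT-AMMP's implementation.

```python
import math
from collections import Counter, OrderedDict
from typing import Hashable, List, Sequence


def reuse_distances(trace: Sequence[Hashable]) -> List[float]:
    """Distinct addresses touched since the previous access to the same address (inf on first touch)."""
    stack: OrderedDict = OrderedDict()  # LRU stack, most recently used address last
    distances: List[float] = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Addresses above addr in the stack were touched after its last use.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(math.inf)
            stack[addr] = None
    return distances


def lru_hit_rate(distances: Sequence[float], cache_lines: int) -> float:
    """A fully associative LRU cache with `cache_lines` lines hits iff distance < cache_lines."""
    return sum(d < cache_lines for d in distances) / len(distances)


if __name__ == "__main__":
    trace = ["a", "b", "c", "a", "b", "b", "d", "a"]
    dists = reuse_distances(trace)
    print(Counter(dists))          # the reuse distance distribution for this trace
    print(lru_hit_rate(dists, 3))  # 0.5
```

For the trace `a b c a b b d a`, the distances are inf, inf, inf, 2, 2, 0, inf, 2, so a three-line cache hits 4 of the 8 accesses; the histogram of these values is what a per-block distribution model summarizes.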

Neural reverse engineering of stripped binaries using augmented control flow graphs

Proceedings of the ACM on Programming Languages ◽

10.1145/3428293 ◽

2020 ◽

Vol 4 (OOPSLA) ◽

pp. 1-28

Author(s):

Yaniv David ◽

Uri Alon ◽

Eran Yahav

Keyword(s):

Reverse Engineering ◽

Control Flow ◽

Flow Graphs

A new approach based on graph matching and evolutionary approach for sport scheduling problem

Intelligent Decision Technologies ◽

10.3233/idt-190114 ◽

2020 ◽

pp. 1-16

Author(s):

Meriem Khelifa ◽

Dalila Boughaci ◽

Esma Aïmeur

Keyword(s):

Graph Matching ◽

State Of The Art ◽

Travel Cost ◽

Round Robin ◽

New Approach ◽

Traveling Tournament Problem ◽

Significant Interest ◽

National League ◽

Better Than

The Traveling Tournament Problem (TTP) is concerned with finding a double round-robin tournament schedule that minimizes the total distance traveled by the teams. It has recently attracted significant interest because a favorable TTP schedule can yield significant savings for the league. This paper proposes an original evolutionary algorithm for TTP. We first propose a fast and effective constructive algorithm that builds a Double Round Robin Tournament (DRRT) schedule with low travel cost. We then describe an enhanced genetic algorithm with a new crossover operator that reduces the travel cost of the generated schedules. We also propose a new heuristic for efficiently ordering the scheduled rounds, which significantly improves schedule quality. The overall method is evaluated on publicly available standard benchmarks and compared with other techniques for TTP and UTTP (Unconstrained Traveling Tournament Problem). Computational experiments show that the proposed approach builds very good solutions that are comparable to other state-of-the-art approaches or, on UTTP, better than the current best solutions. Furthermore, our method provides valuable new solutions to some unsolved UTTP instances and outperforms prior methods on all US National League (NL) instances.
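
A valid double round-robin schedule, the object TTP optimizes, can be built with the classic circle method and then mirrored for the second half. This Python sketch is a swapped-in textbook construction, not the authors' constructive algorithm: it assumes an even number of teams and ignores travel distances as well as the usual TTP limits on consecutive home or away games. The team abbreviations in the usage example are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Game = Tuple[str, str]  # (home team, away team)


def single_round_robin(teams: Sequence[str]) -> List[List[Game]]:
    """Circle method: fix one team, rotate the rest; every pair meets exactly once."""
    if len(teams) % 2:
        raise ValueError("this sketch assumes an even number of teams")
    n = len(teams)
    fixed, rotating = teams[0], list(teams[1:])
    rounds: List[List[Game]] = []
    for r in range(n - 1):
        lineup = [fixed] + rotating
        games = []
        for i in range(n // 2):
            a, b = lineup[i], lineup[n - 1 - i]
            # Flip venues on alternate rounds (a simple balancing heuristic).
            games.append((a, b) if r % 2 == 0 else (b, a))
        rounds.append(games)
        rotating = rotating[-1:] + rotating[:-1]
    return rounds


def double_round_robin(teams: Sequence[str]) -> List[List[Game]]:
    """The second half mirrors the first with home and away swapped."""
    first = single_round_robin(teams)
    return first + [[(away, home) for home, away in rnd] for rnd in first]


if __name__ == "__main__":
    for k, rnd in enumerate(double_round_robin(["ATL", "NYM", "PHI", "MON"]), start=1):
        print(k, rnd)
```

Every ordered (home, away) pair appears exactly once, so each team hosts every other team once; a TTP method would then reorder rounds and venues to cut the total travel cost.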

Molecular Recognition: Perspective and a New Approach

Sensors ◽

10.3390/s21082757 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2757

Author(s):

W. Rudolf Seitz ◽

Casey J. Grenier ◽

John R. Csoros ◽

Rongfang Yang ◽

Tianyu Ren

Keyword(s):

Molecular Recognition ◽

Molecularly Imprinted Polymers ◽

Binding Sites ◽

Binding Kinetics ◽

Molecularly Imprinted ◽

New Approach ◽

Imprinted Polymers ◽

High Affinity ◽

Water Blocking ◽

High Level

This perspective presents an overview of approaches to preparing molecular recognition agents for chemical sensing. These approaches include chemical synthesis, the use of catalysts from biological systems, partitioning, aptamers, antibodies, and molecularly imprinted polymers. The latter three approaches are general in that they can be applied to a large number of analytes, including both proteins and smaller molecules such as drugs and hormones. Aptamers and antibodies bind analytes rapidly, whereas molecularly imprinted polymers bind much more slowly. Most molecularly imprinted polymers, formed by polymerizing in the presence of a template, contain a high level of covalent crosslinker that causes the polymer to form a separate phase. The result is a rigid material with low affinity for the analyte and slow binding kinetics. Our approach to templating uses predominantly or exclusively noncovalent crosslinks, which yields soluble templated polymers that bind analyte rapidly and with high affinity. The biggest challenge of this approach is that the chains are tangled when the templated polymer is dissolved in water, blocking access to the binding sites.

4D Einstein equations in a general gauge Kaluza–Klein space

International Journal of Geometric Methods in Modern Physics ◽

10.1142/s021988781550036x ◽

2015 ◽

Vol 12 (03) ◽

pp. 1550036

Author(s):

Aurel Bejancu ◽

Constantin Călin

Keyword(s):

Einstein Equations ◽

Tensor Fields ◽

New Approach ◽

Higher Dimensional ◽

High Level ◽

General Gauge ◽

Kaluza Klein

Using the new approach to higher-dimensional Kaluza–Klein theories developed by the first author, we obtain the 4D Einstein equations on a (4 + n)D relativistic gauge Kaluza–Klein space. Adapted frame and coframe fields, adapted tensor fields, and the Riemannian adapted connection play a fundamental role in the study. The high level of generality of the study enables us to recover several results from earlier papers on this matter.

Dependency Graph-based High-level Synthesis for Maximum Instruction Parallelism

ACM Transactions on Reconfigurable Technology and Systems ◽

10.1145/3468875 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1-15

Author(s):

Zhenghua Gu ◽

Wenqing Wan ◽

Jundong Xie ◽

Chang Wu

Keyword(s):

Performance Optimization ◽

Directed Acyclic Graph ◽

Scheduling Algorithm ◽

Dependency Graph ◽

High Level Synthesis ◽

Limiting Factor ◽

Circuit Performance ◽

State Transition Graph ◽

High Level ◽

Basic Blocks

Performance optimization is an important goal for High-level Synthesis (HLS). Existing HLS scheduling algorithms are all based on the Control and Data Flow Graph (CDFG) and schedule basic blocks in sequential order. Our study shows that this sequential scheduling order of basic blocks is a major limiting factor for achievable circuit performance. In this article, we propose a Dependency Graph (DG) with two important properties for scheduling. First, DG is a directed acyclic graph, so no loop-breaking heuristic is needed for scheduling. Second, DG can be used to identify the exact instruction parallelism. Our experiments show that DG can increase instruction parallelism by 76% over CDFG. Based on DG, we propose a bottom-up scheduling algorithm that achieves much higher instruction parallelism than existing algorithms. A hierarchical state transition graph with guard conditions is proposed for efficient implementation of such high-parallelism scheduling. Our experimental results show that our DG-based HLS algorithm outperforms the CDFG-based LegUp and the state-of-the-art industrial tool Vivado HLS by 2.88× and 1.29× in circuit latency, respectively.
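
The notions of instruction parallelism and bottom-up scheduling on an acyclic dependency graph can be illustrated with a minimal Python sketch. It assumes unit-latency instructions and unlimited hardware resources, computes each node's height (longest path to a sink), and places each node as late as possible (ALAP), which is one simple bottom-up policy; it is not the DG-based algorithm from the article, and the node names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def heights(succs: Dict[str, List[str]]) -> Dict[str, int]:
    """Longest path (in unit-latency steps) from each node to a sink; sinks get 0."""
    nodes = set(succs) | {s for targets in succs.values() for s in targets}
    memo: Dict[str, int] = {}

    def height(n: str) -> int:
        if n not in memo:
            memo[n] = 1 + max((height(s) for s in succs.get(n, [])), default=-1)
        return memo[n]

    for n in nodes:
        height(n)
    return memo


def alap_schedule(succs: Dict[str, List[str]]) -> Dict[str, int]:
    """Bottom-up (as-late-as-possible) step for each node, assuming unlimited resources."""
    h = heights(succs)
    last_step = max(h.values())
    return {n: last_step - hn for n, hn in h.items()}


if __name__ == "__main__":
    # e = (x + y) * z as a dependency DAG: edges point from producer to consumer
    dag = {"ld_x": ["add"], "ld_y": ["add"], "add": ["mul"], "ld_z": ["mul"], "mul": []}
    steps = alap_schedule(dag)
    width = Counter(steps.values())
    print(steps)
    print(max(width.values()), len(steps) / (max(steps.values()) + 1))  # peak 2, average 5/3
```

For `e = (x + y) * z`, the loads of `x` and `y` share step 0, the add and the load of `z` share step 1, and the multiply runs in step 2, giving a peak parallelism of 2 over a three-step critical path. Because the graph is acyclic, no loop-breaking step is needed before computing these levels.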
