Implementing a high-efficiency similarity analysis approach for firmware code

The rapid expansion of the open-source community has shortened the software development cycle, but it has also accelerated the spread of vulnerabilities, especially in the field of the Internet of Things. In recent years, the frequency of attacks against connected devices has been increasing exponentially, which makes these vulnerabilities more serious. State-of-the-art firmware security inspection technologies, such as methods based on machine learning and graph theory, find similar applications based on known vulnerabilities but are of no use without detailed information about those vulnerabilities. Moreover, the model training that machine learning technologies require takes a significant amount of time and data, resulting in low efficiency and poor extensibility. To address these shortcomings, this study proposes a high-efficiency similarity analysis approach for firmware code. First, control flow features and data flow features are extracted from the functions of the firmware and of the vulnerabilities, and these features are used to calculate the SimHash of each function. Mass storage and fast querying of the SimHash values are implemented using the pigeonhole principle. Second, the similar function pairs are analyzed in detail within and among the basic blocks. Within the basic blocks, symbolic execution is used to generate basic block semantic information, and a constraint solver is used to determine semantic equivalence. Among the basic blocks, the local control flow graphs are analyzed to obtain their similarity. We then implement a prototype and present its evaluation. The evaluation results demonstrate that the proposed approach can perform large-scale firmware function similarity analysis. It can also locate real-world firmware patches without vulnerability function information. Finally, we compare our method with existing methods. The comparison results demonstrate that our method is more efficient and accurate than Gemini and StagedMethod. More than 90% of the firmware functions can be indexed within 0.1 s, while the search time over 100,000 firmware functions is less than 2 s.
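The SimHash indexing step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 64-bit width, the four-band split, the MD5-based token hash, and the names `simhash`, `bands`, and `SimHashIndex` are all assumptions made for illustration. The pigeonhole idea is that if two 64-bit hashes differ in at most 3 bits, then at least one of their four 16-bit bands must be identical, so only functions sharing a band need to be compared.

```python
import hashlib
from collections import defaultdict

BITS = 64   # assumed fingerprint width
BANDS = 4   # assumed number of bands; guarantees recall for distance <= BANDS - 1


def simhash(features):
    """Compute a SimHash from an iterable of (token, weight) pairs."""
    counts = [0] * BITS
    for token, weight in features:
        # Hash each feature token to 64 bits (MD5 is an arbitrary choice here).
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[:8], "big")
        for i in range(BITS):
            counts[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(BITS) if counts[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def bands(h):
    """Split a fingerprint into (band_index, band_value) keys."""
    width = BITS // BANDS
    mask = (1 << width) - 1
    return [(i, (h >> (i * width)) & mask) for i in range(BANDS)]


class SimHashIndex:
    """Band-based index: candidates share at least one identical band."""

    def __init__(self, max_distance=BANDS - 1):
        # The pigeonhole guarantee only holds for max_distance <= BANDS - 1.
        self.max_distance = max_distance
        self.buckets = defaultdict(set)
        self.hashes = {}

    def add(self, name, h):
        self.hashes[name] = h
        for key in bands(h):
            self.buckets[key].add(name)

    def query(self, h):
        candidates = set()
        for key in bands(h):
            candidates |= self.buckets.get(key, set())
        # Verify candidates with an exact Hamming distance check.
        return sorted(n for n in candidates
                      if hamming(self.hashes[n], h) <= self.max_distance)
```

In use, each firmware function's feature list would be passed to `simhash`, stored with `add`, and a vulnerable function's hash would be passed to `query` to retrieve near matches without scanning the whole index.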

Improving Similarity Measure for Java Programs Based on Optimal Matching of Control Flow Graphs

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194015500229 ◽

2015 ◽

Vol 25 (07) ◽

pp. 1171-1197 ◽

Cited By ~ 6

Author(s):

Dehong Qiu ◽

Jialin Sun ◽

Hao Li

Keyword(s):

Graph Matching ◽

Control Flow ◽

Basic Block ◽

New Approach ◽

Java Program ◽

Java Programs ◽

Instruction Sequences ◽

High Level ◽

Flow Graphs ◽

Basic Blocks

Measuring program similarity plays an important role in solving many problems in software engineering. However, programs are instruction sequences with complex structures and semantic functions, and they may also be obfuscated deliberately through semantics-preserving transformations, so measuring program similarity is a difficult task that has not been adequately addressed. In this paper, we propose a new approach to measuring Java program similarity. The approach first measures the low-level similarity between basic blocks according to their bytecode instruction sequences and structural properties. Then, an error-tolerant graph matching algorithm that can withstand structure transformations is used to match the Control Flow Graphs (CFG) based on the basic block similarity. The high-level similarity between Java programs is subsequently calculated on the matched pairs of independent paths extracted from the optimal CFG matching. The proposed CFG-Match approach is compared with a string-based approach, a tree-based approach and a graph-based approach. Experimental results show that the CFG-Match approach is more accurate and more robust against semantics-preserving transformations. The CFG-Match approach is also used to detect Java program plagiarism. Experiments on a collection of benchmark program pairs drawn from students’ project assignment submissions demonstrate that the CFG-Match approach outperforms the comparative approaches in the detection of Java program plagiarism.
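To make the first two steps of this abstract more concrete, the following is a simplified sketch under stated assumptions. It scores basic blocks by the similarity of their instruction sequences and then picks the block-to-block assignment with the highest total score. It is not the paper's error-tolerant graph matching: CFG edges and block structural properties are ignored, `difflib.SequenceMatcher` stands in for whatever block similarity measure the authors use, and the brute-force search is only practical for very small graphs. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import permutations


def block_similarity(a, b):
    """Similarity in [0, 1] between two instruction sequences (lists of opcodes)."""
    return SequenceMatcher(None, a, b).ratio()


def best_block_matching(blocks_a, blocks_b):
    """Return (total_score, pairs) for the best one-to-one block assignment.

    pairs is a list of (index_in_a, index_in_b). Brute force over all
    assignments, so this is only suitable for a handful of blocks.
    """
    if len(blocks_a) > len(blocks_b):
        # Always iterate over the smaller side, then flip the pairs back.
        score, pairs = best_block_matching(blocks_b, blocks_a)
        return score, [(j, i) for i, j in pairs]
    best_score, best_pairs = -1.0, []
    for perm in permutations(range(len(blocks_b)), len(blocks_a)):
        score = sum(block_similarity(blocks_a[i], blocks_b[j])
                    for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(perm))
    return best_score, best_pairs
```

For example, matching `[["iload_1", "iload_2", "iadd"], ["ireturn"]]` against the same two blocks in reverse order pairs block 0 with block 1 and block 1 with block 0, which is the kind of reordering a purely string-based comparison would penalize.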

Android malware analysis approach based on control flow graphs and machine learning algorithms

2016 4th International Symposium on Digital Forensic and Security (ISDFS) ◽

10.1109/isdfs.2016.7473512 ◽

2016 ◽

Cited By ~ 6

Author(s):

Mehmet Ali Atici ◽

Seref Sagiroglu ◽

Ibrahim Alper Dogru

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Control Flow ◽

Machine Learning Algorithms ◽

Analysis Approach ◽

Malware Analysis ◽

Android Malware ◽

Flow Graphs

StFuzzer: Contribution-Aware Coverage-Guided Fuzzing for Smart Devices

Security and Communication Networks ◽

10.1155/2021/1987844 ◽

2021 ◽

Vol 2021 ◽

pp. 1-15

Author(s):

Jiageng Yang ◽

Xinguo Zhang ◽

Hui Lu ◽

Muhammad Shafiq ◽

Zhihong Tian

Keyword(s):

Real World ◽

Control Flow ◽

Smart Devices ◽

Smart Device ◽

Basic Block ◽

Optimization Approach ◽

Seed Selection ◽

Code Coverage ◽

Real World Applications ◽

Basic Blocks

The root cause of insecurity in smart devices is the potential vulnerabilities they contain. There are many approaches to finding potential bugs in smart devices. Fuzzing, and coverage-guided fuzzing in particular, is the most effective vulnerability finding technique. Coverage-guided fuzzing identifies high-quality seeds according to the code coverage that these seeds trigger. Existing coverage-guided fuzzers assume that the higher the code coverage of a seed, the greater the probability of triggering potential bugs. However, in real-world applications running on smart devices, or in the operating system of a smart device, the program logic is very complex, and basic blocks play different roles in the process of application exploration. Existing seed selection strategies ignore this observation, which reduces the efficiency of bug discovery on smart devices. In this paper, we propose contribution-aware coverage-guided fuzzing, which estimates the contributions of basic blocks to the process of smart device exploration. Based on the control flow of the target on any smart device and the runtime information gathered during fuzzing, we define the static contribution of a basic block and a dynamic contribution built on the execution frequency of each block. The contribution-aware optimization approach does not require any prior knowledge of the target device, so it can be applied to both gray-box and white-box fuzzing. We designed and implemented a contribution-aware coverage-guided fuzzer for smart devices, called StFuzzer. We evaluated StFuzzer on four real-world applications that are often used on smart devices to demonstrate the efficiency of our contribution-aware optimization. The results of our trials show that the contribution-aware approach significantly improves the capability of bug discovery and achieves better execution speed than state-of-the-art fuzzers.

Instruction Scheduling Across Control Flow

Scientific Programming ◽

10.1155/1993/536143 ◽

1993 ◽

Vol 2 (3) ◽

pp. 1-5

Author(s):

Martin Charles Golumbic ◽

Vladimir Rainish

Keyword(s):

Time Delays ◽

Instruction Scheduling ◽

Control Flow ◽

Basic Block ◽

High Quality ◽

Assembly Code ◽

Considerable Research ◽

Straight Line ◽

Compiled Code ◽

Basic Blocks

Instruction scheduling algorithms are used in compilers to reduce run-time delays for the compiled code by reordering or transforming program statements, usually at the intermediate language or assembly code level. Considerable research has been carried out on scheduling code within the scope of basic blocks, i.e., straight line sections of code, and very effective basic block schedulers are now included in most modern compilers, especially those for pipelined processors. In previous work (Golumbic and Rainish, IBM J. Res. Dev., Vol. 34, pp. 93–97, 1990), we presented code replication techniques for scheduling beyond the scope of basic blocks that provide reasonable improvements in the running time of the compiled code, but which still leave room for further improvement. In this article we present a new method for scheduling beyond basic blocks called SHACOOF. This new technique takes advantage of a conventional, high quality basic block scheduler by first suppressing selected subsequences of instructions and then scheduling the modified sequence of instructions using the basic block scheduler. A candidate subsequence for suppression can be found by identifying a region of a program control flow graph, called an S-region, which has a unique entry and a unique exit and meets predetermined criteria. This enables scheduling of a sequence of instructions beyond basic block boundaries, with only minimal changes to an existing compiler, by identifying beneficial opportunities to cover delays that would otherwise have been beyond its scope.

Size and Cost Optimization of AutoCAD Oil and Gas Control Flow Designs Using Constraint Satisfaction Problem and Machine Learning

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i5.550555 ◽

2018 ◽

Vol 6 (5) ◽

pp. 550-555

Author(s):

H.A. Kore ◽

S.B. Mane ◽

...

Keyword(s):

Machine Learning ◽

Constraint Satisfaction ◽

Oil And Gas ◽

Constraint Satisfaction Problem ◽

Cost Optimization ◽

Control Flow ◽

Gas Control

Machine Learning–enabled Scalable Performance Prediction of Scientific Codes

ACM Transactions on Modeling and Computer Simulation ◽

10.1145/3450264 ◽

2021 ◽

Vol 31 (2) ◽

pp. 1-28

Author(s):

Gopinath Chennupati ◽

Nandakishore Santhi ◽

Phill Romero ◽

Stephan Eidenbenz

Keyword(s):

Machine Learning ◽

Performance Prediction ◽

Prediction Models ◽

Radiation Transport ◽

Discrete Event ◽

Basic Block ◽

Distribution Models ◽

Scientific Application ◽

High Level ◽

Access Patterns

Hardware architectures become increasingly complex as compute capabilities grow to exascale. We present the Analytical Memory Model with Pipelines (AMMP) of the Performance Prediction Toolkit (PPT). PPT-AMMP takes high-level source code and hardware architecture parameters as input and predicts the runtime of that code on the target hardware platform, which is defined in the input parameters. PPT-AMMP transforms the code to an (architecture-independent) intermediate representation, then (i) analyzes the basic block structure of the code, (ii) processes architecture-independent virtual memory access patterns that it uses to build memory reuse distance distribution models for each basic block, and (iii) runs detailed basic-block level simulations to determine hardware pipeline usage. PPT-AMMP uses machine learning and regression techniques to build the prediction models based on small instances of the input code, then integrates them into a higher-order discrete-event simulation model of PPT running on the Simian PDES engine. We validate PPT-AMMP on four standard computational physics benchmarks and present a use case of hardware parameter sensitivity analysis to identify bottleneck hardware resources on different code inputs. We further extend PPT-AMMP to predict the performance of a scientific application code, namely, the radiation transport mini-app SNAP. To this end, we analyze multi-variate regression models that accurately predict the reuse profiles and the basic block counts. We validate predicted SNAP runtimes against actual measured times.
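The reuse distance distributions mentioned in step (ii) can be illustrated with a small sketch. It uses the common definition of reuse distance, the number of distinct addresses touched between two consecutive accesses to the same address, and is not taken from PPT-AMMP itself. The function names and the list-of-labels trace format are assumptions, and the quadratic-time loop is chosen for clarity rather than speed.

```python
from collections import Counter


def reuse_distances(trace):
    """Return the reuse distance of each access in a memory trace.

    A first-time (cold) access has no previous use and is reported as None,
    which conventionally corresponds to an infinite reuse distance.
    """
    last_seen = {}
    distances = []
    for pos, addr in enumerate(trace):
        if addr in last_seen:
            # Count distinct addresses strictly between the two accesses.
            between = trace[last_seen[addr] + 1:pos]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[addr] = pos
    return distances


def reuse_histogram(trace):
    """Histogram of reuse distances, i.e. an empirical reuse profile."""
    return Counter(reuse_distances(trace))
```

For the trace `["a", "b", "c", "a", "b", "b"]` the distances are `[None, None, None, 2, 2, 0]`. A histogram like the one returned by `reuse_histogram` is the kind of per-block profile from which cache hit rates can later be estimated.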

Android botnet detection using machine learning models based on a comprehensive static analysis approach

Journal of Information Security and Applications ◽

10.1016/j.jisa.2020.102735 ◽

2021 ◽

Vol 58 ◽

pp. 102735

Author(s):

Wadi’ Hijawi ◽

Ja’far Alqatawna ◽

Ala’ M. Al-Zoubi ◽

Mohammad A. Hassonah ◽

Hossam Faris

Keyword(s):

Machine Learning ◽

Static Analysis ◽

Analysis Approach ◽

Learning Models ◽

Botnet Detection ◽

Machine Learning Models

Simulation of athlete gait recognition based on spectral features and machine learning

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189568 ◽

2020 ◽

pp. 1-12

Author(s):

Linuo Wang

Keyword(s):

Machine Learning ◽

Angular Velocity ◽

Gait Recognition ◽

Spectral Feature ◽

Velocity Signal ◽

Comparison Results ◽

Computer Vision Technology ◽

Feature Technology ◽

Leg Movement ◽

Recognition Efficiency

Current technology for athlete gait recognition has shortcomings such as complicated equipment and high cost, and it also has problems with recognition accuracy and recognition efficiency. To improve the efficiency of athlete gait recognition, this paper studies recognition technologies for athletes based on machine learning and spectral feature technology and applies computer vision technology to sports. According to the calf angular velocity signal, the occurrence of leg movement is detected in real time, and the gait cycle is accurately divided to reduce the influence of signals unrelated to the behavior on the recognition process. This study also proposes a gait behavior recognition method based on event-driven strategies. This method uses a gyroscope as the main sensor and uses a wearable sensor node to collect the angular velocity signals of the legs and waist. Finally, the performance of the proposed algorithm is analyzed experimentally. The comparison results show that the proposed method increases the number of recognized action types and improves accuracy, and that it has certain advantages in terms of computation and scalability.

Burst Pressure Prediction of API 5L X-Grade Dented Pipelines Using Deep Neural Network

Journal of Marine Science and Engineering ◽

10.3390/jmse8100766 ◽

2020 ◽

Vol 8 (10) ◽

pp. 766

Author(s):

Dohan Oh ◽

Julia Race ◽

Selda Oterkus ◽

Bonguk Koo

Keyword(s):

Neural Network ◽

Machine Learning ◽

Artificial Neural Network ◽

Network Model ◽

Neural Network Model ◽

Deep Neural Network ◽

Machine Learning Techniques ◽

Burst Pressure ◽

Comparison Results ◽

Artificial Neural

Mechanical damage is recognized as a problem that reduces the performance of oil and gas pipelines and has been the subject of continuous research. Artificial neural networks, which have recently been in the spotlight, are expected to offer another solution to pipeline-related problems. This study applies a deep neural network, a machine learning method built on the artificial neural network algorithm. The applicability of machine learning techniques such as deep neural networks to the prediction of burst pressure has been investigated for dented API 5L X-grade pipelines. To this end, supervised learning is employed, and the deep neural network model has four layers with three hidden layers, and the neural network uses fully connected layers. The burst pressure computed by the deep neural network model has been compared with the results of a parametric study based on finite element analysis and with the burst pressure obtained from experimental results. The comparison shows good agreement. Therefore, it is concluded that deep neural networks can be another solution for predicting the burst pressure of API 5L X-grade dented pipelines.

Machine learning: The trends of developing high-efficiency single-atom materials

Chem Catalysis ◽

10.1016/j.checat.2021.04.005 ◽

2021 ◽

Vol 1 (1) ◽

pp. 24-26

Author(s):

Jiarui Yang ◽

Wen-Hao Li ◽

Dingsheng Wang

Keyword(s):

Machine Learning ◽

High Efficiency ◽

Single Atom
