Topology-Inspired Method Recovers Obfuscated Term Information From Induced Software Call-Stacks

Author(s):  
Kelly Maggs ◽  
Vanessa Robins

Fuzzing is a systematic, large-scale search for software vulnerabilities that feeds a sequence of randomly mutated input files to the program of interest with the goal of inducing a crash. The information about inputs, software execution traces, and induced call stacks (crashes) can be used to pinpoint and fix errors in the code, or exploited to damage an adversary’s software. In black-box fuzzing, the primary unit of information is the call stack: a list of nested function calls and line numbers that reports what the code was executing at the time it crashed. The source code is not always available in practice, and in some situations even the function names are deliberately obfuscated (i.e., removed or given generic names). We define a topological object called the call-stack topology to capture the relationships between module names, function names, and line numbers in a set of call stacks obtained via black-box fuzzing. In a proof-of-concept study, we show that structural properties of this object, combined with two elementary heuristics, allow us to build a logistic regression model that predicts the locations of distinct function names over a set of call stacks. We show that this model can extract function name locations with around 80% precision on data obtained from fuzzing studies of various Linux programs. This has the potential to benefit software vulnerability experts by helping them read and compare call stacks more efficiently.
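
A minimal sketch of the kind of classifier described above, assuming hypothetical frame-level features (a recurrence score derived from the call-stack topology plus two simple heuristics such as frame depth and line number) and scikit-learn; it illustrates the idea rather than reproducing the authors’ implementation.

```python
# Illustrative only: logistic regression predicting whether a call-stack frame
# position carries a distinct function name. Feature names are assumptions.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score

# Each row is one frame: [recurrence score from the call-stack topology,
#                         frame depth, line number]
X = [
    [0.92, 2, 118],
    [0.08, 7, 15],
    [0.85, 3, 301],
    [0.15, 9, 42],
    [0.77, 1, 210],
    [0.21, 6, 9],
]
y = [1, 0, 1, 0, 1, 0]  # 1 = position holds a distinct (non-obfuscated) function name

model = LogisticRegression().fit(X, y)
print("precision on the toy data:", precision_score(y, model.predict(X)))
```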

2020 ◽  
Author(s):  
Cut Nabilah Damni

Computer software is a collection of instructions (programs or procedures) that allows work to be carried out automatically by processing a given set of instructions (data) (Yahfizham, 2019: 19). Most computer software is written by programmers using a programming language. Programmers write commands in a programming language much as people use ordinary language in conversation; these commands are called source code. Another computer program, called a compiler, is applied to the source code and translates those commands into a language the computer understands; the result is called an executable program (EXE). Fundamentally, a computer always has software, consisting of an operating system, application systems, and programming languages.


Author(s):  
Ruiyang Song ◽  
Kuang Xu

We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDPs), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a “black box” architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation compared to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances in which the upper bounds are shown to be tight. Together, our results demonstrate temporal concatenation’s potential for substantial speed-up at the expense of some performance degradation.
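
To make the heuristic concrete, here is a hedged sketch under the simplifying assumption of time-invariant transition and reward matrices (so every sub-problem uses the same data): each block of the horizon is solved by backward induction and the per-block optimal policies are concatenated. The function names and the toy MDP are illustrative, not taken from the paper.

```python
import numpy as np

def backward_induction(P, R, horizon):
    """Solve a finite-horizon MDP by backward induction.
    P[a] is an (S, S) transition matrix for action a; R is an (S, A) reward matrix."""
    S, A = R.shape
    V = np.zeros(S)
    policy = []
    for _ in range(horizon):
        Q = np.stack([R[:, a] + P[a] @ V for a in range(A)], axis=1)  # (S, A)
        policy.append(Q.argmax(axis=1))
        V = Q.max(axis=1)
    policy.reverse()  # policy[t][s] = optimal action at stage t in state s
    return policy

def temporal_concatenation(P, R, horizon, num_blocks):
    """Split the horizon into equal blocks, solve each block independently,
    and concatenate the per-block optimal policies."""
    block = horizon // num_blocks
    full_policy = []
    for _ in range(num_blocks):
        full_policy.extend(backward_induction(P, R, block))
    return full_policy

# Toy 2-state, 2-action MDP.
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.6, 0.4]])]
R = np.array([[1.0, 0.0], [0.0, 2.0]])
pi = temporal_concatenation(P, R, horizon=8, num_blocks=4)
```

With time-varying data, each block would instead be solved with its own slice of rewards and dynamics; the regret bounds described in the abstract quantify what this concatenation gives up relative to solving the full horizon at once.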


Technologies ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 3
Author(s):  
Gábor Antal ◽  
Zoltán Tóth ◽  
Péter Hegedűs ◽  
Rudolf Ferenc

Bug prediction aims at finding source code elements in a software system that are likely to contain defects. Being aware of the most error-prone parts of the program, one can efficiently allocate the limited testing and code review resources. Therefore, bug prediction can support software maintenance and evolution to a great extent. In this paper, we propose a function-level JavaScript bug prediction model based on static source code metrics, with the addition of hybrid (static and dynamic) code analysis based metrics of the number of incoming and outgoing function calls (HNII and HNOI). Our motivation for this is that JavaScript is a highly dynamic scripting language for which static code analysis can be very imprecise; therefore, using purely static source code features for bug prediction might not be enough. Based on a study in which we extracted 824 buggy and 1943 non-buggy functions from the publicly available BugsJS dataset for the ESLint JavaScript project, we can confirm the positive impact of hybrid code metrics on the prediction performance of the ML models. Depending on the ML algorithm, applied hyper-parameters, and target measures we consider, hybrid invocation metrics bring a 2–10% increase in model performance (i.e., precision, recall, F-measure). Interestingly, replacing the static NOI and NII metrics with their hybrid counterparts HNOI and HNII in itself improves model performance; however, using them all together yields the best results.
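
A hedged sketch of the experimental idea, assuming a per-function metric table with placeholder column names (LOC, McCC, NOI/NII and their hybrid counterparts HNOI/HNII) and a generic scikit-learn classifier; it is not the paper’s actual pipeline.

```python
# Compare a static feature set against one that swaps in the hybrid
# invocation metrics (HNOI, HNII). "functions.csv" and the column names
# are placeholders for illustration.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("functions.csv")
feature_sets = {
    "static": ["LOC", "McCC", "NOI", "NII"],
    "hybrid": ["LOC", "McCC", "HNOI", "HNII"],
}
y = df["buggy"]  # 1 = buggy function, 0 = non-buggy

for name, cols in feature_sets.items():
    clf = RandomForestClassifier(random_state=42)
    f1 = cross_val_score(clf, df[cols], y, cv=10, scoring="f1").mean()
    print(f"{name} features: mean F1 = {f1:.3f}")
```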


Author(s):  
Willem Vos ◽  
Petter Norli ◽  
Emilie Vallee

This paper describes a novel technique for the detection of cracks in pipelines. The proposed in-line inspection (ILI) technique can detect crack features at any angle in the pipeline: axial, circumferential, and any angle in between. This ability is novel to the current ILI technology offering and adds value by detecting cracks in deformed pipes (i.e., in dents) and cracks associated with the girth weld (mid-weld cracks, rapid-cooling cracks, and cracks parallel to the weld). Furthermore, the technology is suitable for detecting cracks in spiral-welded pipes, both parallel and perpendicular to the spiral weld. Integrity issues around most of the features described above are not addressed by current ILI tools, often forcing operators to perform hydrostatic tests to ensure pipeline safety. The technology described here is based on wideband ultrasound in-line inspection tools that are already in operation. They are designed for the inspection of structures operating in challenging environments such as offshore pipelines. Adjustments to the front-end analog system and data collection from a grid of transducers allow the tools to detect cracks in any orientation in the line. Changes to the test set-up are described, together with the theoretical background behind crack detection. The historical development of the technology is presented, including early laboratory testing and proof of concept, and the proof-of-concept data are compared with the theoretical predictions. A detailed set of results is presented from tests performed on samples sourced from North America and Europe that contain SCC features. Results from ongoing testing, involving large-scale tests of SCC features in gas-filled pipe spools, are also presented.


2021 ◽  
Author(s):  
Aleksandar Kovačević ◽  
Jelena Slivka ◽  
Dragan Vidaković ◽  
Katarina-Glorija Grujić ◽  
Nikola Luburić ◽  
...  

Code smells are structures in code that often have a negative impact on its quality. Manually detecting code smells is challenging, so researchers have proposed many automatic code smell detectors. Most of the studies propose detectors based on code metrics and heuristics. However, these studies have several limitations, including evaluating the detectors on small-scale case studies and with inconsistent experimental settings. Furthermore, heuristic-based detectors suffer from limitations that hinder their adoption in practice. Thus, researchers have recently started experimenting with machine learning (ML) based code smell detection.

This paper compares the performance of multiple ML-based code smell detection models against multiple traditionally employed metric-based heuristics for detecting the God Class and Long Method code smells. We evaluate the effectiveness of different source code representations for machine learning: traditionally used code metrics and code embeddings (code2vec, code2seq, and CuBERT).

We perform our experiments on the large-scale, manually labeled MLCQ dataset. We consider the binary classification problem: we classify the code samples as smelly or non-smelly and use the F1-measure of the minority (smell) class as the measure of performance. In our experiments, the ML classifier trained using CuBERT source code embeddings achieved the best performance for both God Class (F-measure of 0.53) and Long Method detection (F-measure of 0.75). With the help of a domain expert, we perform an error analysis to discuss the advantages of the CuBERT approach.

To the best of our knowledge, this study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection. A secondary contribution of our study is the systematic evaluation of the effectiveness of multiple heuristic-based approaches on the same large-scale, manually labeled MLCQ dataset.
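
As an illustration of the evaluation setup (binary smelly/non-smelly classification scored by the F1 of the minority class), here is a minimal sketch assuming pre-computed code embeddings stored in placeholder .npy files; the classifier choice is arbitrary and not the one reported in the paper.

```python
# Minimal sketch: classify code samples as smelly / non-smelly from
# pre-computed embeddings and report the F1 of the minority (smell) class.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X = np.load("cubert_embeddings.npy")  # placeholder: one embedding per code sample
y = np.load("labels.npy")             # 1 = smelly (e.g., God Class), 0 = non-smelly

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)
print("F1 (smell class):", f1_score(y_te, clf.predict(X_te), pos_label=1))
```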


Author(s):  
Emanuele Iannone ◽  
Roberta Guadagni ◽  
Filomena Ferrucci ◽  
Andrea De Lucia ◽  
Fabio Palomba

Author(s):  
Manjula Peiris ◽  
James H. Hill

This chapter discusses how to adapt system execution traces to support analysis of software system performance properties, such as end-to-end response time, throughput, and service time. This is important because system execution traces contain complete snapshots of a system’s execution, making them useful artifacts for analyzing software system performance properties. Unfortunately, if system execution traces do not contain the required properties, then analysis of performance properties is hard. In this chapter, the authors discuss: (1) what properties are required to analyze performance properties in a system execution trace; (2) different approaches for injecting the required properties into a system execution trace to support performance analysis; and (3) a worked example of one approach that does not require modifying the original source code of the system that produced the execution trace.
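
A minimal sketch of the underlying idea, assuming a hypothetical trace format in which each event already carries the required properties (a correlation id and a timestamp); with those in place, end-to-end response time, service time, and throughput can be derived directly from the trace. The event names are illustrative, not the chapter’s.

```python
from collections import defaultdict

# Hypothetical trace entries: (correlation_id, event_name, timestamp_in_seconds)
trace = [
    ("req-1", "request_start", 0.000),
    ("req-1", "service_start", 0.010),
    ("req-1", "service_end",   0.045),
    ("req-1", "request_end",   0.050),
    ("req-2", "request_start", 0.020),
    ("req-2", "service_start", 0.030),
    ("req-2", "service_end",   0.080),
    ("req-2", "request_end",   0.090),
]

# Group events by correlation id so each request's lifecycle can be measured.
events = defaultdict(dict)
for cid, name, ts in trace:
    events[cid][name] = ts

response = [e["request_end"] - e["request_start"] for e in events.values()]
service = [e["service_end"] - e["service_start"] for e in events.values()]
span = (max(e["request_end"] for e in events.values())
        - min(e["request_start"] for e in events.values()))

print("mean end-to-end response time (s):", sum(response) / len(response))
print("mean service time (s):", sum(service) / len(service))
print("throughput (requests/s):", len(events) / span)
```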

