code analysis
Recently Published Documents





2022 ◽  
Vol 31 (2) ◽  
pp. 1-25
Ryan Williams ◽  
Tongwei Ren ◽  
Lorenzo De Carli ◽  
Long Lu ◽  
Gillian Smith

IoT firmware often incorporates third-party components, such as network-oriented middleware and media encoders/decoders. These components consist of large and mature codebases, shipping with a variety of non-critical features. Feature bloat increases code size, complicates auditing/debugging, and reduces stability. This is problematic for IoT devices, which are severely resource-constrained and must remain operational in the field for years. Unfortunately, identifying and completely removing code related to unwanted features requires familiarity with the codebases of interest and cumbersome manual effort, and may introduce bugs. We address these difficulties by introducing PRAT, a system that takes as input the codebase of software of interest, identifies and maps features to code, presents this information to a human analyst, and removes all code belonging to unwanted features. PRAT solves the challenge of identifying feature-related code through a novel form of differential dynamic analysis and visualizes results as user-friendly feature graphs. Evaluation on diverse codebases shows superior code removal compared to both manual feature deactivation and state-of-the-art debloating tools, and generality across programming languages. Furthermore, a user study comparing PRAT to manual code analysis shows that it can significantly simplify the feature identification workflow.
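
The abstract does not spell out how PRAT's differential dynamic analysis works, so the following is only a minimal sketch of the general idea, not PRAT's method: run a program once without and once with a feature exercised, record which functions execute, and attribute the difference to the feature. The toy program (`parse`, `compress`, `emit`, `run`) and the helper `record_calls` are hypothetical names introduced for illustration.

```python
import sys

def record_calls(fn, *args):
    """Run fn(*args) and return the names of Python functions called meanwhile."""
    seen = set()

    def profiler(frame, event, arg):
        # "call" fires for Python-level function calls only.
        if event == "call":
            seen.add(frame.f_code.co_name)

    sys.setprofile(profiler)
    try:
        fn(*args)
    finally:
        sys.setprofile(None)
    return seen

# Hypothetical toy program with one optional feature (compression).
def parse(data):
    return data.split(",")

def compress(items):
    return items[:1]

def emit(items):
    return ";".join(items)

def run(data, use_compression):
    items = parse(data)
    if use_compression:
        items = compress(items)
    return emit(items)

# Differential step: functions seen only when the feature is exercised.
baseline = record_calls(run, "ab,cd", False)
with_feature = record_calls(run, "ab,cd", True)
feature_code = with_feature - baseline
print(sorted(feature_code))
```

In this sketch the set difference isolates `compress` as feature-specific, while `parse` and `emit` are shared by both runs and would be kept.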

2021 ◽  
Vol 24 (4) ◽  
pp. 1-31
Luca Demetrio ◽  
Scott E. Coull ◽  
Battista Biggio ◽  
Giovanni Lagorio ◽  
Alessandro Armando ◽  

Recent work has shown that adversarial Windows malware samples (referred to as adversarial EXEmples in this article) can bypass machine learning-based detection relying on static code analysis by perturbing relatively few input bytes. To preserve malicious functionality, previous attacks either add bytes to existing non-functional areas of the file, potentially limiting their effectiveness, or require running computationally demanding validation steps to discard malware variants that do not correctly execute in sandbox environments. In this work, we overcome these limitations by developing a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks based on practical, functionality-preserving manipulations to the Windows Portable Executable file format. These attacks, named Full DOS, Extend, and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section. Our experimental results show that these attacks outperform existing ones in both white-box and black-box scenarios, achieving a better tradeoff in terms of evasion rate and size of the injected payload, while also enabling evasion of models that have been shown to be robust to previous attacks. To facilitate reproducibility of our findings, we open source our framework and all the corresponding attack implementations as part of the secml-malware Python library. We conclude this work by discussing the limitations of current machine learning-based malware detectors, along with potential mitigation strategies based on embedding domain knowledge coming from subject-matter experts directly into the learning process.

2021 ◽  
Vol 2021 ◽  
pp. 1-15
Jing Ge Feng ◽  
Ye Ping He ◽  
Qiu Ming Tao

Automatic vectorization is an important technique for compilers to improve the parallelism of programs. With the widespread usage of SIMD (Single Instruction Multiple Data) extensions in modern processors, automatic vectorization has become a hot topic in the research of compiler techniques. Accurately evaluating the effectiveness of automatic vectorization in typical compilers is quite valuable for compiler optimization and design. This paper evaluates the effectiveness of automatic vectorization, analyzes its limitations and their main causes, and improves automatic vectorization technology. The paper first classifies programs by two main factors: program characteristics and transformation methods. Then, it evaluates the effectiveness of automatic vectorization in three well-known compilers (GCC, LLVM, and ICC, including multiple versions from the past 5 years) using the TSVC (Test Suite for Vectorizing Compilers) benchmark. Furthermore, the paper analyzes the limitations of automatic vectorization based on source code analysis, and introduces the differences between academic research and engineering practice in automatic vectorization and their main causes. Finally, it gives some suggestions on how to improve automatic vectorization capability.

2021 ◽  
Vol 38 (1) ◽  
pp. 159-168

Tools that focus on static code analysis for early error detection are of utmost importance in software development, especially since the propagation of errors is strongly related to higher costs in the development process. Formal Concept Analysis is a prominent field of applied mathematics that uses conceptual landscapes to discover and represent maximal clusters of data. Its expressive visualization method makes it suitable for exploratory analyses in different fields. In this paper we present a Formal Concept Analysis framework for static code analysis that can serve as a model for quantitative and qualitative exploration and interpretation of such results.
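
To make the notion of "maximal clusters" concrete, here is a minimal sketch of Formal Concept Analysis on a tiny context. It is not the framework from the paper: the files, warning names, and the brute-force enumeration are all assumptions chosen for illustration. A formal concept is a pair (extent, intent) in which the extent is exactly the set of objects sharing every attribute in the intent, and the intent is exactly the set of attributes shared by every object in the extent.

```python
from itertools import combinations

# Hypothetical context: source files (objects) x static-analysis warnings (attributes).
context = {
    "parser.c": {"null-deref", "unused-var"},
    "net.c": {"null-deref", "buffer-overflow"},
    "util.c": {"unused-var"},
}
all_attrs = sorted(set().union(*context.values()))

def extent(attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if set(attrs) <= a)

def intent(objs):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    return frozenset(a for a in all_attrs if all(a in context[o] for o in objs))

def concepts():
    """Brute force: close every attribute subset; fine only for tiny contexts."""
    found = set()
    for r in range(len(all_attrs) + 1):
        for combo in combinations(all_attrs, r):
            e = extent(combo)
            found.add((e, intent(e)))
    return found

for e, i in sorted(concepts(), key=lambda c: (-len(c[0]), sorted(c[1]))):
    print(sorted(e), sorted(i))
```

The enumeration is exponential in the number of attributes, so a real tool would use a dedicated algorithm; the sketch only shows what a concept is.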

2021 ◽  
Vol 14 (3 (41)) ◽  
pp. 23-41
Srđan Mladenov JOVANOVIĆ ◽  

Since late 2019, Serbia has been gripped by a wave of protests against what scholarly research has dubbed the semi-authoritarian regime of President Aleksandar Vučić. Given that the President’s regime has by now been shown to rule through direct and indirect control of the media, arguably the main government-supporting daily newspaper, the Informer, has been covering the protests avidly, and with significant vitriol. On the understanding that a headline is seen by readers more often than the full body of the article, and bearing in mind the Informer’s proclivity towards exaggeration and hyperbole, we have analyzed all of the daily’s headlines that refer to the protests, protesters, or protest/opposition leaders during the so-called ‘First phase of the protests’ via the methodological position of Operational Code Analysis. The paper shows a fairly extreme OPCODE for the Informer.

2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-32
Peisen Yao ◽  
Qingkai Shi ◽  
Heqing Huang ◽  
Charles Zhang

This paper concerns the scalability challenges of symbolic abstraction: given a formula ϕ in a logic L and an abstract domain A, find a most precise element in the abstract domain that over-approximates the meaning of ϕ. Symbolic abstraction is an important point in the space of abstract interpretation, as it allows for automatically synthesizing the best abstract transformers. However, current techniques for symbolic abstraction can have difficulty delivering on its practical strengths, due to performance issues. In this work, we introduce two algorithms for the symbolic abstraction of quantifier-free bit-vector formulas, which apply to the bit-vector interval domain and a certain kind of polyhedral domain, respectively. We implement and evaluate the proposed techniques on two machine code analysis clients, namely static memory corruption analysis and constrained random fuzzing. Using a suite of 57,933 queries from the clients, we compare our approach against a diverse group of state-of-the-art algorithms. The experiments show that our algorithms achieve a substantial speedup over existing techniques and illustrate significant precision advantages for the clients. Our work presents strong evidence that symbolic abstraction of numeric domains can be efficient and practical for large and realistic programs.
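
The problem statement above can be illustrated with a deliberately naive sketch for the unsigned bit-vector interval domain. This is not one of the paper's algorithms, which the abstract does not describe: it simply enumerates every value of a small bit-vector, and the names `best_interval` and `phi` are hypothetical. The most precise interval over-approximating ϕ is the one spanning the smallest and largest satisfying values.

```python
def best_interval(phi, width):
    """Most precise unsigned interval [lo, hi] containing every model of phi.

    Brute force over all 2**width values, so usable only for tiny widths.
    Returns None when phi is unsatisfiable (the bottom element).
    """
    models = [x for x in range(1 << width) if phi(x)]
    if not models:
        return None
    return (min(models), max(models))

# Hypothetical formula over a 4-bit vector x: low two bits are 0b10 and x > 4.
phi = lambda x: (x & 0b0011) == 0b0010 and x > 4

print(best_interval(phi, 4))
```

The result over-approximates ϕ: the interval also contains values that do not satisfy the formula, but no tighter interval covers all of its models. Practical approaches must avoid this exhaustive enumeration, which is where the scalability challenge arises.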
