abstract syntax tree
Recently Published Documents


Total documents: 89 (five years: 38)
H-index: 8 (five years: 2)

2021 ◽  
Vol 5 (ICFP) ◽  
pp. 1-29
Author(s):  
Chaitanya Koparkar ◽  
Mike Rainey ◽  
Michael Vollmer ◽  
Milind Kulkarni ◽  
Ryan R. Newton

Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a maximally dense format consequently serializes the processing of that data, yielding a tension between density and parallelism. This paper shows that a disciplined, practical compromise is possible. We present Parallel Gibbon, a compiler that obtains the benefits of dense data formats and parallelism. We formalize the semantics of the parallel location calculus underpinning this novel implementation strategy, and show that it is type-safe. Parallel Gibbon exceeds the parallel performance of existing compilers for purely functional programs that use recursive algebraic datatypes, including, notably, abstract-syntax-tree traversals as in compilers.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yingjie Xu ◽  
Gengran Hu ◽  
Lin You ◽  
Chengtang Cao

In recent years, many vulnerabilities in smart contracts have been found. Hackers have exploited these vulnerabilities to attack contracts deployed on blockchain systems such as Ethereum, causing substantial economic losses. It is therefore important to find potential problems in smart contracts and to develop more secure ones. As blockchain security incidents have drawn increasing attention, more and more smart contract security analysis methods have been developed. Most of these methods are based on traditional static or dynamic analysis; only a few use emerging technologies such as machine learning. Some models that use machine learning to detect smart contract vulnerabilities spend considerable time extracting features manually. In this paper, we introduce a novel machine learning-based analysis model that uses shared child nodes to detect smart contract vulnerabilities. We build the abstract syntax tree (AST) for smart contracts with known vulnerabilities from two data sets, SmartBugs and SolidiFI-benchmark. Then, we build the AST of the labeled smart contracts for the data set named Smartbugs-wilds. Next, we extract the shared child nodes of both ASTs to measure their structural similarity, and we automatically construct a feature vector from these similarity values to build our machine learning model. Finally, we obtain a KNN model that can predict eight types of vulnerabilities, including Re-entrancy, Arithmetic, Access Control, Denial of Service, Unchecked Low Level Calls, Bad Randomness, and Front Running. The accuracy, recall, and precision of our KNN model are all higher than 90%. In addition, compared with other analysis tools, including Oyente and SmartCheck, our model achieves higher accuracy and requires less training time.
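
As a rough illustration of the shared-child-node idea in the abstract above, the sketch below measures structural overlap between two ASTs. It is not the authors' implementation: it uses Python's standard `ast` module as a stand-in for a Solidity parser, and it assumes that a "shared child node" can be approximated by a matching (parent type, child type) pair. The function names are hypothetical.

```python
import ast
from collections import Counter


def child_signatures(source: str) -> Counter:
    """Count (parent type, child type) pairs in the AST of `source`."""
    tree = ast.parse(source)
    pairs = Counter()
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            pairs[(type(parent).__name__, type(child).__name__)] += 1
    return pairs


def shared_child_similarity(a: str, b: str) -> float:
    """Multiset Jaccard overlap of parent-child pairs between two ASTs.

    Assumption: this overlap is only a proxy for the paper's
    shared-child-node measure, whose exact definition is not given here.
    """
    ca, cb = child_signatures(a), child_signatures(b)
    shared = sum((ca & cb).values())  # pairs present in both trees
    total = sum((ca | cb).values())   # pairs present in either tree
    return shared / total if total else 1.0
```

A vector of such scores, computed against a set of labeled vulnerable contracts, could then serve as the input to a KNN classifier, which is the role the abstract describes for its similarity values.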


2021 ◽  
Vol 2021 (4) ◽  
pp. 464-479
Author(s):  
Edwin Dauber ◽  
Robert Erbacher ◽  
Gregory Shearer ◽  
Michael Weisman ◽  
Frederica Nelson ◽  
...  

Source code authorship attribution can be used for many types of intelligence on binaries and executables, including forensics, but introduces a threat to the privacy of anonymous programmers. Previous work has shown how to attribute individually authored code files and code segments. In this work, we examine authorship segmentation, in which we determine authorship of arbitrary parts of a program. While previous work has performed segmentation at the textual level, we attempt to attribute subtrees of the abstract syntax tree (AST). We focus on two primary problems: identifying the primary author of an arbitrary AST subtree and identifying on which edges of the AST primary authorship changes. We demonstrate that the former is a difficult problem but the latter is much easier. We also demonstrate methods by which we can leverage the easier problem to improve accuracy for the harder problem. We show that while identifying the author of subtrees is difficult overall, this is primarily due to the abundance of small subtrees: in the validation set we can attribute subtrees of at least 25 nodes with accuracy over 80% and at least 33 nodes with accuracy over 90%, while in the test set we can attribute subtrees of at least 33 nodes with accuracy of 70%. While our baseline accuracy for single AST nodes is 20.21% for the validation set and 35.66% for the test set, we present techniques by which we can increase this accuracy to 42.01% and 49.21%, respectively. We further present observations about collaborative code found on GitHub that may drive further research.
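
The size thresholds in the abstract above (subtrees of at least 25 or 33 nodes) depend on counting the nodes in each AST subtree. The sketch below shows one way to do that counting. It is only an illustration: it uses Python's standard `ast` module rather than whatever parser the authors used, and the function names are hypothetical.

```python
import ast


def subtree_sizes(source: str) -> list:
    """Return (node type, node count) for every subtree, in post-order."""
    sizes = []

    def count(node: ast.AST) -> int:
        # A subtree's size is the node itself plus the sizes of its children.
        n = 1 + sum(count(child) for child in ast.iter_child_nodes(node))
        sizes.append((type(node).__name__, n))
        return n

    count(ast.parse(source))
    return sizes


def large_subtrees(source: str, min_nodes: int) -> list:
    """Keep only the subtrees that contain at least `min_nodes` nodes."""
    return [(name, n) for name, n in subtree_sizes(source) if n >= min_nodes]
```

Calling `large_subtrees(code, 25)` on a source string would list the subtrees that meet the smaller threshold mentioned in the abstract; most subtrees in typical code fall well below it, which matches the abstract's remark about the abundance of small subtrees.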


2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Yao Meng

Intelligent code search with natural language queries has become an important research area in software engineering. In this paper, we propose At-CodeSM, a novel deep learning framework for source code search. The code encoder in At-CodeSM, which is implemented with an abstract syntax tree parsing algorithm (Tree-LSTM) and token-level encoders, preserves both the lexical and structural features of source code during code vectorization. Both the representative and discriminative models are implemented with deep neural networks. Our experiments on the CodeSearchNet dataset show that At-CodeSM performs better on the task of intelligent code search than previous approaches.


2021 ◽  
Vol 12 (3) ◽  
pp. 17-31
Author(s):  
Amandeep Kaur ◽  
Munish Saini

In a software system, code snippets that are copied and pasted within the same software or into other software result in cloning. The basic cause of cloning is either a programmer's constraints or language constraints. The major drawback of code clones is an increase in software maintenance cost, so clone detection techniques are required to remove or refactor them. Recent studies show that the abstract syntax tree (AST) appropriately captures the structural information of source code. Many researchers have used tree-based convolution to identify clones, but this technique has certain drawbacks. Therefore, in this paper, the authors propose an approach that finds semantic clones through square-based convolution over an abstract syntax representation of the source code. Experimental results show the effectiveness of the approach on the popular BigCloneBench benchmark.


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Hang Xu ◽  
Ganyu Qin ◽  
Junhu Zhu ◽  
Zimian Liu ◽  
Zhiqiang Liu

Coverage-based greybox fuzzing has strong capabilities in discovering virtualization software vulnerabilities. Efficiency is one of the most important indicators when evaluating greybox fuzzing. However, the interference of virtual hardware state conditions with testcase evaluation severely impairs the efficiency of greybox fuzzing. To reduce this interference and increase the efficiency of fuzzing, we propose a state-based virtual hardware fuzzing framework named SAVHF (State-Aware Virtual Hardware Fuzzing). In this framework, a source-to-source instrumentation method based on the abstract syntax tree is proposed to detect the state conditions of virtual hardware. Building on this instrumentation, we then propose a state-based fuzzing strategy that adapts to the state conditions of virtual hardware. We implement a prototype of SAVHF and use it to evaluate 17 popular pieces of virtual hardware in QEMU, finding 16 bugs with 1 CVE (Common Vulnerabilities and Exposures) number assigned. Evaluation results demonstrate that the proposed SAVHF framework covers an average of more than 61% of virtual hardware code branches in 18 hours of testing and improves the average code coverage by 11.04% compared with the path-based fuzzing strategy.
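
To make the idea of AST-based source-to-source instrumentation concrete, here is a minimal sketch in Python. It is not SAVHF, which instruments virtual hardware source code: the sketch rewrites Python source with the standard `ast` module, and the probe name `_probe` and the class and function names are hypothetical. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast


class EntryProbeInserter(ast.NodeTransformer):
    """Insert a call to a hypothetical `_probe` hook at each function entry."""

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        self.generic_visit(node)  # instrument nested functions first
        probe = ast.parse(f"_probe({node.name!r})").body[0]
        node.body.insert(0, probe)
        return node


def instrument(source: str) -> str:
    """Parse `source`, add the probes, and emit the rewritten source text."""
    tree = EntryProbeInserter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)
```

Because the output is ordinary source code, it can be compiled and run like the original; the caller only has to supply a `_probe` function that records whatever state the analysis cares about.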

