Statically Detecting Vulnerabilities by Processing Programming Languages as Natural Languages

2022 ◽  
pp. 1-24
Author(s):  
Iberia Medeiros ◽  
Nuno Neves ◽  
Miguel Correia


Author(s):  
Xiaoqing Wu ◽  
Marjan Mernik ◽  
Barrett R. Bryant ◽  
Jeff Gray

Unlike natural languages, programming languages are strictly stylized entities created to facilitate human communication with computers. To make programming languages recognizable by computers, one of the key challenges is to describe and implement language syntax and semantics so that a program can be translated into machine-readable code. This process is normally considered the front end of a compiler, which depends mainly on the programming language rather than on the target machine. This article addresses the most important aspects of building a compiler front end, namely syntax and semantic analysis, including related theories, technologies, and tools, as well as existing problems and future trends. As the main focus, formal syntax and semantic specifications are discussed in detail. The article provides the reader with a high-level overview of the language implementation process, as well as some commonly used terms and development practices.


2020 ◽  
Vol 2020 (8) ◽  
pp. 309-1-309-6
Author(s):  
Xunyu Pan ◽  
Colin Crowe ◽  
Toby Myers ◽  
Emily Jetton

Mobile devices typically support input from virtual keyboards or pen-based technologies, making handwriting a potentially viable text input solution for programming on touchscreen devices. The major problem, however, is that handwriting recognition systems are built to take advantage of the rules of natural languages rather than programming languages. In addition, mobile devices are inherently restricted by limited screen size and the inconvenience of a virtual keyboard. In this work, we create a novel handwriting-to-code transformation system on a mobile platform to recognize and analyze source code written directly on a whiteboard or a piece of paper. First, the system recognizes the handwritten source code and compiles it into an executable program. Second, a friendly graphical user interface (GUI) is provided to visualize how manipulating different sections of code impacts the program output. Finally, the coding system supports an automatic error detection and correction mechanism to help address common syntax and spelling errors during whiteboard coding. The mobile application provides a flexible and user-friendly solution for real-time handwriting-based programming for learners in environments where keyboard or touchscreen input is not preferred.


2020 ◽  
Vol 34 (01) ◽  
pp. 1169-1176
Author(s):  
Huangzhao Zhang ◽  
Zhuo Li ◽  
Ge Li ◽  
Lei Ma ◽  
Yang Liu ◽  
...  

Automated processing, analysis, and generation of source code are among the key activities in the software and system lifecycle. While deep learning (DL) exhibits a certain level of capability in handling these tasks, current state-of-the-art DL models still suffer from robustness issues and can be easily fooled by adversarial attacks. Unlike adversarial attacks on images, audio, and natural languages, attacks on source code face new challenges brought by the structured nature of programming languages. In this paper, we propose a Metropolis-Hastings sampling-based identifier renaming technique that generates adversarial examples for DL models specialized for source code processing. Our in-depth evaluation on a functionality classification benchmark demonstrates the effectiveness of the technique in generating adversarial examples of source code. The higher robustness and performance achieved through adversarial training with the technique further confirm the usefulness of DL-based methods for future fully automated source code processing.


eLife ◽  
2020 ◽  
Vol 9 ◽  
Author(s):  
Yun-Fei Liu ◽  
Judy Kim ◽  
Colin Wilson ◽  
Marina Bedny

Despite the importance of programming to modern society, the cognitive and neural bases of code comprehension are largely unknown. Programming languages might ‘recycle’ neurocognitive mechanisms originally developed for natural languages. Alternatively, comprehension of code could depend on fronto-parietal networks shared with other culturally-invented symbol systems, such as formal logic and symbolic math such as algebra. Expert programmers (average 11 years of programming experience) performed code comprehension and memory control tasks while undergoing fMRI. The same participants also performed formal logic, symbolic math, executive control, and language localizer tasks. A left-lateralized fronto-parietal network was recruited for code comprehension. Patterns of activity within this network distinguish between ‘for’ loops and ‘if’ conditional code functions. In terms of the underlying neural basis, code comprehension overlapped extensively with formal logic and to a lesser degree math. Overlap with executive processes and language was low, but laterality of language and code covaried across individuals. Cultural symbol systems, including code, depend on a distinctive fronto-parietal cortical network.


2019 ◽  
Author(s):  
James Grimmelmann

Smart contracts are written in programming languages rather than in natural languages. This might seem to insulate them from ambiguity, because the meaning of a program is determined by technical facts rather than by social ones. It does not. Smart contracts can be ambiguous, too, because technical facts depend on socially determined ones. To give meaning to a computer program, a community of programmers and users must agree on the semantics of the programming language in which it is written. This is a social process, and a review of some famous controversies involving blockchains and smart contracts shows that it regularly creates serious ambiguities. In the most famous case, The DAO hack, more than $150 million in virtual currency turned on the contested semantics of a blockchain-based smart-contract programming language.


2015 ◽  
Author(s):  
Karan Aggarwal ◽  
Mohammad Salameh ◽  
Abram Hindle

In this paper, we use statistical machine translation to convert Python 2 code to Python 3 code. We use data from two projects and achieve a high BLEU score. We also investigate cross-project training and testing and analyze the errors to determine how they differ from the previous case. This is a pilot study on modeling programming languages as natural languages in order to build translation models along the lines of those used for natural languages. The approach can be extended to translate between versions of a programming language or between different programming languages.

