Code Understanding: Latest Research Papers

Software systems evolve over their lifetime. Changing requirements make it necessary to modify and extend the underlying source code. Understanding a software system is therefore a crucial task that must be addressed appropriately to meet the challenges that arise when changing it. In this thesis, we introduce three complementary approaches that support the evolution, and in particular the understanding, of software systems. Our main contributions are (i) CORAL, an approach for collaborative reengineering and modularization of software systems, (ii) ExplorViz VR, a gesture-based, collaborative, multi-user Virtual Reality approach for the software city metaphor, and (iii) RACCOON, a live-visualization approach for database behavior that supports database comprehension of software systems. An extensive case study shows that CORAL is capable of supporting reengineering and modularization processes. Furthermore, several evaluations demonstrate the high usability, efficiency, and effectiveness of ExplorViz VR for solving comprehension tasks.

Image-based many-language programming language identification

PeerJ Computer Science ◽

10.7717/peerj-cs.631 ◽

2021 ◽

Vol 7 ◽

pp. e631

Author(s):

Francesca Del Bonifro ◽

Maurizio Gabbrielli ◽

Antonio Lategano ◽

Stefano Zacchiroli

Keyword(s):

Neural Networks ◽

Programming Languages ◽

Programming Language ◽

Real World ◽

Program Comprehension ◽

Language Identification ◽

Specific Character ◽

Automatic Program ◽

Video Tutorials ◽

Code Understanding

Programming language identification (PLI) is a common need in automatic program comprehension as well as a prerequisite for deeper forms of code understanding. Image-based approaches to PLI have recently emerged and are appealing due to their applicability to code screenshots and programming video tutorials. However, they remain limited to recognizing a small number of programming languages (up to 10 languages in the literature). We show that it is possible to perform image-based PLI on a large number of programming languages (up to 149 in our experiments) with high (92%) precision and recall, using convolutional neural networks (CNNs) and transfer learning, starting from readily available pretrained CNNs. Results were obtained on a large real-world dataset of 300,000 code snippets extracted from popular GitHub repositories. By scrambling specific character classes and comparing identification performance, we also show that the characters that contribute the most to the visual recognizability of programming languages are symbols (e.g., punctuation, mathematical operators, and parentheses), followed by alphabetic characters, with digits and indentation having a negligible impact.
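The character-class scrambling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the class definitions (ASCII letters, digits, and punctuation), the `scramble` helper, and the seeding are assumptions made for illustration, and indentation scrambling is left out. The idea is to replace every character of one class with a random character of the same class, so that a classifier evaluated on the result can no longer rely on that class.

```python
import random
import string

# Character classes loosely following the abstract; the exact membership
# of each class is an assumption for this sketch.
CLASSES = {
    "alphabetic": string.ascii_letters,
    "digits": string.digits,
    "symbols": string.punctuation,
}


def scramble(code: str, char_class: str, seed: int = 0) -> str:
    """Replace each character of one class with a random one from the same class."""
    pool = CLASSES[char_class]
    rng = random.Random(seed)  # seeded so the scrambling is reproducible
    return "".join(rng.choice(pool) if ch in pool else ch for ch in code)


snippet = "for (i = 0; i < 10; i++) { total += i; }"
scrambled = scramble(snippet, "symbols")
print(scrambled)  # letters, digits, and spaces are unchanged; symbols are randomized
```

In an experiment of the kind the abstract describes, the scrambled snippets would then be rendered as images and passed to the trained classifier; a large drop in accuracy for one class suggests that the class carries much of the visual signal.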

UHNVM: A Universal Heterogeneous Cache Design with Non-Volatile Memory

Electronics ◽

10.3390/electronics10151760 ◽

2021 ◽

Vol 10 (15) ◽

pp. 1760

Author(s):

Xiaochang Li ◽

Zhengjun Zhai

Keyword(s):

Scale Up ◽

Main Memory ◽

Storage Devices ◽

Fine Grained ◽

Non Volatile Memory ◽

Application Data ◽

Volatile Memory ◽

Legacy Applications ◽

Application Codes ◽

Code Understanding

In recent decades, non-volatile memory (NVM) has been expected to scale up main memory size, improve application performance, and narrow the speed gap between main memory and storage devices, while supporting persistent storage to cope with power outages. However, to take advantage of NVM, all existing DRAM-based applications have to be rewritten by developers. The developer must therefore have a good understanding of the targeted application code in order to manually identify and store the data suited to NVM. To facilitate NVM deployment for existing legacy applications, we propose UHNVM, a universal heterogeneous cache hierarchy that automatically selects and stores the appropriate application data in non-volatile memory, without requiring code understanding. In this article, a program context (PC) technique is proposed in the user space to help UHNVM classify data. Compared to the conventional hot or cold file categories, the PC technique can categorize application data in a fine-grained manner, enabling us to store them efficiently in either NVM or SSDs for better performance. Our experimental results using a real Optane dual-inline-memory-module (DIMM) card show that our new heterogeneous architecture reduces elapsed times by about 11% compared to the conventional kernel memory configuration without NVM.
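The abstract does not spell out how the program context technique classifies data, so the following is only a hypothetical sketch of the general idea of context-based placement. It assumes a context is identified by a tuple of call-site names, that reuse is measured as accesses per distinct block, and that a fixed threshold decides between tiers. The `ContextClassifier` class, its threshold, and the tier labels are all illustrative assumptions, not UHNVM's actual policy.

```python
from collections import defaultdict


class ContextClassifier:
    """Route data to a storage tier by per-context reuse (hypothetical policy)."""

    def __init__(self, hot_threshold: float = 2.0):
        self.hot_threshold = hot_threshold   # assumed cut-off, not from the paper
        self.accesses = defaultdict(int)     # context -> total accesses
        self.blocks = defaultdict(set)       # context -> distinct blocks touched

    def record(self, context: tuple, block: int) -> None:
        """Log one access to `block` issued from the given program context."""
        self.accesses[context] += 1
        self.blocks[context].add(block)

    def tier(self, context: tuple) -> str:
        """Return "NVM" for contexts that re-access their blocks, else "SSD"."""
        distinct = len(self.blocks.get(context, ()))
        if distinct == 0:
            return "SSD"  # unseen context: default to the slower tier
        reuse = self.accesses.get(context, 0) / distinct
        return "NVM" if reuse >= self.hot_threshold else "SSD"


clf = ContextClassifier()
for block in [1, 2, 1, 2, 1, 2]:          # an index lookup revisits two blocks
    clf.record(("main", "lookup_index"), block)
for block in range(6):                     # a log scan touches each block once
    clf.record(("main", "scan_log"), block)

print(clf.tier(("main", "lookup_index")))  # NVM
print(clf.tier(("main", "scan_log")))      # SSD
```

Keying the statistics on the context instead of on the file is what makes the classification finer-grained than a per-file hot/cold label: two contexts reading the same file can end up in different tiers.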

The Mind Is a Powerful Place: How Showing Code Comprehensibility Metrics Influences Code Understanding

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) ◽

10.1109/icse43902.2021.00055 ◽

2021 ◽

Author(s):

Marvin Wyrich ◽

Andreas Preikschat ◽

Daniel Graziotin ◽

Stefan Wagner

Keyword(s):

The Mind ◽

Code Understanding

MulCode: A Multi-task Learning Approach for Source Code Understanding

2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) ◽

10.1109/saner50967.2021.00014 ◽

2021 ◽

Author(s):

Deze Wang ◽

Yue Yu ◽

Shanshan Li ◽

Wei Dong ◽

Ji Wang ◽

...

Keyword(s):

Source Code ◽

Learning Approach ◽

Task Learning ◽

Code Understanding

A Survey and Taxonomy of Intent-Based Code Search

International Journal of Software Innovation ◽

10.4018/ijsi.2021010106 ◽

2021 ◽

Vol 9 (1) ◽

pp. 69-110

Author(s):

Shailesh Kumar Shivakumar

Keyword(s):

State Of The Art ◽

Source Code ◽

The Novel ◽

User Intent ◽

Code Search ◽

Search Field ◽

Query Type ◽

Novel Concept ◽

Code Understanding ◽

Source Code Search

In this paper, the authors introduce the novel concept of intent-based code search, which categorizes code search goals into a hierarchy. They explore state-of-the-art techniques in source code search, covering the relevant tools, techniques, and algorithms. They survey the code search field through its core use cases, such as code reusability, code understanding, and code repair, and propose a user intent-based taxonomy of code search goals. The taxonomy is derived from an in-depth analysis of the code search literature and validated through the authors' own developer survey, conducted as part of the paper. It is based on a logical categorization of code search goals and on the characteristics shared within each category (query type, expected response, and the like). The paper also details the latest trends, surveys code search tools, and discusses the implications for tool design.
