Graph-Augmented Code Summarization in Computational Notebooks

Author(s):  
April Wang ◽  
Dakuo Wang ◽  
Xuye Liu ◽  
Lingfei Wu

Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code and neglect the creation of documentation in a notebook. In this work, we present a human-centered automation system, Themisto, that supports users in easily creating documentation via three approaches: 1) a GNN-augmented code documentation generation algorithm, developed and reported in a previous paper, which generates documentation for a given piece of source code; 2) a query-based approach that retrieves online API documentation as the summary for certain types of source code; and 3) a user prompt approach that motivates users to write documentation for use cases where automation does not work well.
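As a rough, self-contained illustration of the three-way strategy described above (not the authors' implementation), the Python sketch below dispatches a single notebook cell to a toy version of each approach; all names, the length threshold, and the API lookup table are invented for the example, and the learned generator is stubbed out.

```python
# Hypothetical sketch of a three-way documentation strategy for one notebook
# cell; every name here is illustrative, not Themisto's actual API. The
# paper's generator is a GNN-augmented model, which we stub out below.

API_DOCS = {  # toy stand-in for retrieved online API documentation
    "read_csv": "Read a comma-separated values (CSV) file into a DataFrame.",
    "fit": "Fit the model to the training data.",
}

def detect_api_call(code):
    """Return the first known API call mentioned in the cell, if any."""
    for name in API_DOCS:
        if "." + name + "(" in code:
            return name
    return None

def document_cell(code):
    """Suggest documentation for a code cell via the three approaches."""
    api_call = detect_api_call(code)
    if api_call is not None:
        # 1) Query-based: reuse the retrieved API documentation.
        return API_DOCS[api_call]
    if len(code.strip()) > 40:
        # 2) Model-based: a learned generator would run here; stubbed.
        return "Auto-generated summary of: " + code.strip().splitlines()[0]
    # 3) User prompt: nudge the author where automation works poorly.
    return "TODO: please describe what this short cell does."

print(document_cell("df = pd.read_csv('train.csv')"))  # query-based path
```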

2022 ◽  
Vol 29 (2) ◽  
pp. 1-33
Author(s):  
April Yi Wang ◽  
Dakuo Wang ◽  
Jaimie Drozdal ◽  
Michael Muller ◽  
Soya Park ◽  
...  

Computational notebooks allow data scientists to express their ideas through a combination of code and documentation. However, data scientists often pay attention only to the code, and neglect creating or updating their documentation during quick iterations. Inspired by human documentation practices learned from 80 highly-voted Kaggle notebooks, we design and implement Themisto, an automated documentation generation system to explore how human-centered AI systems can support human data scientists in the machine learning code documentation scenario. Themisto facilitates the creation of documentation via three approaches: a deep-learning-based approach to generate documentation for source code, a query-based approach to retrieve online API documentation for source code, and a user prompt approach to nudge users to write documentation. We evaluated Themisto in a within-subjects experiment with 24 data science practitioners, and found that automated documentation generation techniques reduced the time for writing documentation, reminded participants to document code they would have ignored, and improved participants’ satisfaction with their computational notebook.


2021 ◽  
Author(s):  
Eduardo Marmitt ◽  
Helmo Alan Batista de Araújo ◽  
Mariângela Mendes Recco ◽  
Matheus Lorenzato Braga

The ExpeRT Platform is a system created to assist in the development of pedagogical experiments. After tests conducted in classrooms, the teacher who created the ExpeRT Platform pointed out deficits in its Data Viewer System (DVS). This work enhances the ExpeRT Platform by using the Java language to modify the DVS source code and resolve the issues pointed out, resulting in an update to the seventh version. In addition, a web page was created to serve as the platform's portal and make the system available for download.


Author(s):  
Raquel Fialho de Queiroz Lafetá ◽  
Thiago Fialho de Queiroz Lafetá ◽  
Marcelo de Almeida Maia

A substantial effort is generally required to understand the APIs of application frameworks. High-quality API documentation may alleviate this effort, but producing such documentation still poses a major challenge for modern frameworks. To facilitate the production of framework instantiation documentation, we hypothesize that the framework code itself and the code of existing instantiations provide useful information. However, given the size and complexity of existing code, automated approaches are required to assist documentation production. Our goal is to assess an automated approach for constructing relevant documentation for framework instantiation based on source code analysis of the framework itself and of existing instantiations; documentation is considered relevant if it compares favorably with traditional framework documentation in terms of time spent and correctness during instantiation activities, information usefulness, complexity of the activity, navigation, satisfaction, information localization, and clarity. The proposed approach generates documentation in a cookbook style, where the recipes are programming activities that use the necessary API elements, driven by the framework features. We performed an empirical study, consisting of three experiments with 44 human subjects executing real framework instantiations, comparing the use of the proposed cookbooks to traditional manual framework documentation (the baseline). Our empirical assessment shows that the generated cookbooks performed better than, or at least without significant difference from, the traditional documentation, evidencing the effectiveness of the approach.
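To make the cookbook idea concrete, here is a minimal sketch, under assumed names, of how existing instantiation code might be mined for recipe steps: it parses a toy instantiation with Python's ast module and records which framework classes are subclassed and which methods are overridden. The paper's approach analyzes real framework and instantiation code; this is only a toy of the same flavor.

```python
# Mine a toy framework instantiation for cookbook-style recipe steps.
# The class and method names are invented for illustration.
import ast

INSTANTIATION = """
class OrderHandler(FrameworkHandler):
    def on_start(self):
        self.register("orders")
    def on_message(self, msg):
        self.ack(msg)
"""

recipe_steps = []
for node in ast.walk(ast.parse(INSTANTIATION)):
    if isinstance(node, ast.ClassDef):
        # Record which framework classes the instantiation extends.
        bases = [ast.unparse(b) for b in node.bases]
        recipe_steps.append(f"Subclass {', '.join(bases)} as {node.name}")
        for item in node.body:
            if isinstance(item, ast.FunctionDef):
                # Record which hook methods the instantiation overrides.
                recipe_steps.append(f"Override {item.name}() in {node.name}")

# Emit the mined steps as a numbered cookbook recipe.
for i, step in enumerate(recipe_steps, 1):
    print(f"{i}. {step}")
```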


2021 ◽  
Vol 3 ◽  
Author(s):  
David Rozas ◽  
Nigel Gilbert ◽  
Paul Hodkinson ◽  
Samer Hassan

Peer production communities are based on the collaboration of communities of people, mediated by the Internet, typically to create digital commons, as in Wikipedia or free software. The contribution activities around the creation of such commons (e.g., source code, articles, or documentation) have been widely explored. However, other types of contribution whose focus is directed toward the community have remained significantly less visible (e.g., the organization of events or mentoring). This work challenges the notion of contribution in peer production through an in-depth qualitative study of a prominent “code-centric” example: the case of the free software project Drupal. Involving the collaboration of more than a million participants, the Drupal project supports nearly 2% of websites worldwide. This research (1) offers empirical evidence of the perception of “community-oriented” activities as contributions, and (2) analyzes their lack of visibility in the digital platforms of collaboration. Therefore, through the exploration of a complex and “code-centric” case, this study aims to broaden our understanding of the notion of contribution in peer production communities, incorporating new kinds of contributions customarily left invisible.


Author(s):  
Ulrike Schultze ◽  
Anita D. Bhappu

Co-production, which is the generation of value through the direct involvement of customers in the creation of a service context and in the design, delivery, and marketing of goods and services that they themselves consume, implies customer-firm collaboration. The nature of this collaboration, however, is highly dependent on the organization’s service design, which increasingly includes Internet technology, as well as customer communities. Whereas dyadic co-production implies a single customer’s involvement with a firm, community-based co-production implies multiple customers simultaneously engaged in value-adding activities with a firm. In order to build a theoretical understanding of these modes of customer collaboration and to explore the role and implications of Internet technologies within them, we develop a contingency theory of customer co-production designs. We then use cases of Internet-based services to highlight the benefits and challenges of relying on Internet technology to implement customer co-production.


2020 ◽  
Vol 10 (20) ◽  
pp. 7253
Author(s):  
Tong Li ◽  
Shiheng Wang ◽  
David Lillis ◽  
Zhen Yang

Maintaining traceability links in software systems is a crucial task for software management and development. Unfortunately, dealing with traceability links is typically treated as an afterthought due to time pressure. Some studies attempt to automate this task using information retrieval-based methods, but they concentrate only on calculating the textual similarity between software artifacts and do not take the properties of those artifacts into account. In this paper, we propose a novel traceability link recovery approach that comprehensively measures the similarity between use cases and source code by exploiting their particular properties. To this end, we leverage and combine machine learning and logical reasoning techniques. On the one hand, our method extracts features based on the semantics of the use cases and source code and uses a classification algorithm to train a classifier. On the other hand, we utilize the relationships between artifacts and define a series of rules to recover traceability links. In particular, we leverage not only the source code’s structural information but also the interrelationships between use cases. We have conducted a series of experiments on multiple datasets to evaluate our approach against existing approaches; the results show that our approach is substantially better than other methods.
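For a flavor of how textual similarity and a structural rule can be combined, the sketch below scores use cases against source files with TF-IDF cosine similarity (via scikit-learn) and propagates links along a toy call graph. The data, threshold, and single rule are illustrative assumptions; the paper's approach additionally trains a classifier over semantic features.

```python
# Toy IR-based traceability recovery: textual similarity plus one
# structural rule (a linked file also suggests its callees).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

use_cases = {
    "UC1": "customer logs in with username and password",
    "UC2": "customer exports the monthly report as pdf",
}
source_files = {
    "LoginController.java": "class LoginController authenticate username password session",
    "ReportExporter.java": "class ReportExporter export pdf monthly report",
    "PdfWriter.java": "class PdfWriter write pdf bytes",
}
# Structural information: caller -> callees, as if extracted from the code.
calls = {"ReportExporter.java": ["PdfWriter.java"]}

vec = TfidfVectorizer()
matrix = vec.fit_transform(list(use_cases.values()) + list(source_files.values()))
uc_vecs, src_vecs = matrix[: len(use_cases)], matrix[len(use_cases):]
sims = cosine_similarity(uc_vecs, src_vecs)

links = set()
for i, uc in enumerate(use_cases):
    for j, src in enumerate(source_files):
        if sims[i, j] > 0.2:                    # textual-similarity threshold
            links.add((uc, src))
            for callee in calls.get(src, []):   # structural rule
                links.add((uc, callee))

print(sorted(links))
```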


2006 ◽  
Vol 79 (11) ◽  
pp. 1588-1598 ◽  
Author(s):  
Lu Zhang ◽  
Tao Qin ◽  
Zhiying Zhou ◽  
Dan Hao ◽  
Jiasu Sun

2019 ◽  
Vol 8 (2) ◽  
pp. 5888-5895

Natural language processing on software systems usually involves high-dimensional, noisy, and irrelevant features, which lead to inaccurate and poor contextual similarity between a project’s source code and its API documentation. Most traditional source code analysis models do not address finding and extracting the features relevant for contextual similarity. As the size of the project source code and its related API documentation grows, these models struggle to incorporate the contextual similarity between the source code and the API documentation for code analysis. One of the best solutions to this problem is finding the essential features using the source code dependency graph. In this paper, the dependency graph is used to compute the contextual similarity between source code metrics and API documents. A novel contextual similarity measure is used to find the relationship between the project source code metrics and the API documents. The proposed model is evaluated on different project source codes and API documents in terms of pre-processing, contextual similarity, and runtime. Experimental results show that the proposed model has high computational efficiency compared to existing models on large datasets.
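One way to read the dependency-graph idea is as follows: weight each code term by how central its defining module is in the dependency graph, then compare the weighted code terms against an API document. The sketch below does exactly that, using PageRank centrality as a stand-in relevance weight; every name, the toy graph, and the weighting choice are assumptions for illustration, not the paper's actual measure.

```python
# Centrality-weighted term similarity between code and an API document.
# The module graph, terms, and weighting scheme are all invented.
import math
import networkx as nx

# Toy module dependency graph (edges point at dependencies).
g = nx.DiGraph([("app", "parser"), ("app", "db"), ("parser", "tokenizer")])
centrality = nx.pagerank(g)  # centrality as a proxy for term relevance

# Terms extracted from each module's identifiers.
module_terms = {
    "app": ["load", "config"],
    "parser": ["parse", "token"],
    "tokenizer": ["token", "split"],
    "db": ["query", "connect"],
}
api_doc_terms = ["parse", "token", "stream"]

# Build a centrality-weighted term vector for the source code.
code_vec = {}
for module, terms in module_terms.items():
    for t in terms:
        code_vec[t] = code_vec.get(t, 0.0) + centrality[module]
doc_vec = {t: 1.0 for t in api_doc_terms}

# Cosine similarity between the two sparse term vectors.
dot = sum(code_vec.get(t, 0.0) * w for t, w in doc_vec.items())
norm = math.sqrt(sum(v * v for v in code_vec.values())) * math.sqrt(len(doc_vec))
print(f"contextual similarity = {dot / norm:.3f}")
```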

