An Empirical Investigation of Relevant Changes and Automation Needs in Modern Code Review

Abstract: Recent research has shown that available tools for Modern Code Review (MCR) are still far from meeting developers’ current expectations. The objective of this paper is to investigate the approaches and tools that, from a developer’s point of view, are still needed to facilitate MCR activities. To that end, we first empirically elicited a taxonomy of recurrent review change types that characterize MCR. The taxonomy was designed in three steps: (i) we generated an initial version of the taxonomy by qualitatively and quantitatively analyzing 211 review changes/commits and 648 review comments from ten open-source projects; (ii) we integrated into this initial taxonomy the topics and MCR change types of an existing taxonomy from the literature; and (iii) we surveyed 52 developers to add any change types still missing from the taxonomy. The results of our study highlight that the availability of emerging development technologies (e.g., cloud-based technologies) and practices (e.g., continuous delivery) has pushed developers to perform additional activities during MCR and that additional types of feedback are expected by reviewers. Our participants provided recommendations, specified techniques to employ, and highlighted the data to analyze for building recommender systems able to automate the code review activities in our taxonomy. We then surveyed 14 additional participants (12 developers and 2 researchers), not involved in the previous survey, to qualitatively assess the relevance and completeness of the identified MCR change types and to assess how critical and feasible to implement some of the identified techniques for supporting MCR activities are. Finally, in a study involving 21 additional developers, we qualitatively assessed the feasibility and usefulness of leveraging natural language feedback (an automation considered critical and feasible to implement) to support developers during MCR activities.
In summary, this study sheds more light on the approaches and tools that are still needed to facilitate MCR activities, confirming the feasibility and usefulness of using summarization techniques during MCR activities. We believe that the results of our work represent an essential step toward meeting the expectations of developers and supporting the vision of full or partial automation in MCR.

Licensing Schemes in the Production and Distribution of Open Source Software: An Empirical Investigation

SSRN Electronic Journal ◽ 10.2139/ssrn.432641 ◽ 2003 ◽ Cited By ~ 10

Author(s): Andrea Bonaccorsi ◽ Cristina Rossi

Keyword(s): Open Source ◽ Open Source Software ◽ Empirical Investigation ◽ Production And Distribution

A Sequential and Intensive Weighted Language Modeling Scheme for Multi-Task Learning-Based Natural Language Understanding

Applied Sciences ◽ 10.3390/app11073095 ◽ 2021 ◽ Vol 11 (7) ◽ pp. 3095

Author(s): Suhyune Son ◽ Seonjeong Hwang ◽ Sohyeun Bae ◽ Soo Jun Park ◽ Jang-Hwan Choi

Keyword(s): Natural Language ◽ Language Processing ◽ Empirical Investigation ◽ Natural Language Understanding ◽ Language Modeling ◽ Language Understanding ◽ Task Learning ◽ Language Representation ◽ Internal Transfer ◽ Two Stages

Multi-task learning (MTL) approaches are actively used for various natural language processing (NLP) tasks. The Multi-Task Deep Neural Network (MT-DNN) has contributed significantly to improving the performance of natural language understanding (NLU) tasks. However, one drawback is that confusion about the language representation of various tasks arises during the training of the MT-DNN model. Inspired by the internal-transfer weighting of MTL in medical imaging, we introduce a Sequential and Intensive Weighted Language Modeling (SIWLM) scheme. The SIWLM consists of two stages: (1) Sequential weighted learning (SWL), which trains a model to learn entire tasks sequentially and concentrically, and (2) Intensive weighted learning (IWL), which enables the model to focus on the central task. We apply this scheme to the MT-DNN model and call this model the MTDNN-SIWLM. Our model achieves higher performance than the existing reference algorithms on six out of the eight GLUE benchmark tasks. Moreover, our model outperforms MT-DNN by 0.77 on average on the overall task. Finally, we conducted a thorough empirical investigation to determine the optimal weight for each GLUE task.
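The abstract above does not give the exact SIWLM weighting formula, so the following is only a generic sketch of task weighting in multi-task learning: a weighted sum of per-task losses in which one "central" task receives a larger weight. The function names, the weight values (2.0 and 1.0), and the example loss numbers are all assumptions made for illustration, not the authors' method or results.

```python
def intensive_weights(tasks, central_task, central_weight=2.0, other_weight=1.0):
    """Build a weight per task, giving the central task a larger weight.

    Hypothetical helper: the weight values are illustrative assumptions.
    """
    return {
        task: (central_weight if task == central_task else other_weight)
        for task in tasks
    }


def weighted_multitask_loss(task_losses, task_weights):
    """Combine per-task losses into one scalar as a weighted sum."""
    missing = set(task_losses) - set(task_weights)
    if missing:
        raise KeyError(f"no weight given for tasks: {sorted(missing)}")
    return sum(task_weights[task] * loss for task, loss in task_losses.items())


# Illustrative per-task losses (made-up numbers, not results from the paper).
losses = {"MNLI": 0.5, "RTE": 1.0, "CoLA": 0.25}
weights = intensive_weights(losses, central_task="RTE")
total = weighted_multitask_loss(losses, weights)
```

With these made-up inputs, the central task RTE contributes twice its raw loss to the total, while the other tasks contribute their losses unchanged.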

Why Do Developers Reject Refactorings in Open-Source Projects?

ACM Transactions on Software Engineering and Methodology ◽ 10.1145/3487062 ◽ 2022 ◽ Vol 31 (2) ◽ pp. 1-23

Author(s): Jevgenija Pantiuchina ◽ Bin Lin ◽ Fiorella Zampetti ◽ Massimiliano Di Penta ◽ Michele Lanza ◽ ...

Keyword(s): Open Source ◽ Software Quality ◽ Good Practice ◽ Source Code ◽ Code Review ◽ Code Quality ◽ Shed Light

Refactoring operations are behavior-preserving changes aimed at improving source code quality. While refactoring is largely considered a good practice, refactoring proposals in pull requests are often rejected after the code review. Understanding the reasons behind the rejection of refactoring contributions can shed light on how such contributions can be improved, ultimately benefiting software quality. This article reports a study in which we manually coded rejection reasons inferred from 330 refactoring-related pull requests from 207 open-source Java projects. We surveyed 267 developers to assess their perceived prevalence of these identified rejection reasons and to complement them with further reasons. Our study resulted in a comprehensive taxonomy consisting of 26 refactoring-related rejection reasons and 21 process-related rejection reasons. The taxonomy, accompanied by representative examples and highlighted implications, provides developers with valuable insights on how to ponder and polish their refactoring contributions, and indicates a number of directions researchers can pursue toward better refactoring recommenders.

Associating Natural Language Comment and Source Code Entities

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i05.6382 ◽ 2020 ◽ Vol 34 (05) ◽ pp. 8592-8599

Author(s): Sheena Panthaplackel ◽ Milos Gligoric ◽ Raymond J. Mooney ◽ Junyi Jessy Li

Keyword(s): Software Development ◽ Natural Language ◽ Open Source ◽ Source Code ◽ Initial Step ◽ Binary Classifier ◽ Sequence Labeling ◽ Evaluation Dataset ◽ Revision Histories

Comments are an integral part of software development; they are natural language descriptions associated with source code elements. Understanding explicit associations can be useful in improving code comprehensibility and maintaining the consistency between code and comments. As an initial step towards this larger goal, we address the task of associating entities in Javadoc comments with elements in Java source code. We propose an approach for automatically extracting supervised data using revision histories of open source projects and present a manually annotated evaluation dataset for this task. We develop a binary classifier and a sequence labeling model by crafting a rich feature set which encompasses various aspects of code, comments, and the relationships between them. Experiments show that our systems outperform several baselines learning from the proposed supervision.

Semantic computational analysis of anticoagulation use in atrial fibrillation from real world data

10.1101/19011643 ◽ 2019

Author(s): Daniel M. Bean ◽ James Teo ◽ Honghan Wu ◽ Ricardo Oliveira ◽ Raj Patel ◽ ...

Keyword(s): Atrial Fibrillation ◽ Natural Language Processing ◽ Natural Language ◽ Electronic Health Record ◽ Open Source ◽ Language Processing ◽ Risk Scores ◽ Free Text ◽ Health Record ◽ Electronic Health

Abstract: Atrial fibrillation (AF) is the most common arrhythmia and significantly increases stroke risk. This risk is effectively managed by oral anticoagulation. Recent studies using national registry data indicate increased use of anticoagulation resulting from changes in guidelines and the availability of newer drugs. The aim of this study is to develop and validate an open source risk scoring pipeline for free-text electronic health record data using natural language processing. AF patients discharged from 1st January 2011 to 1st October 2017 were identified from discharge summaries (N=10,030, 64.6% male, average age 75.3 ± 12.3 years). A natural language processing pipeline was developed to identify risk factors in clinical text and calculate risk for ischaemic stroke (CHA2DS2-VASc) and bleeding (HAS-BLED). Scores were validated vs two independent experts for 40 patients. Automatic risk scores were in strong agreement with the two independent experts for CHA2DS2-VASc (average kappa 0.78 vs experts, compared to 0.85 between experts). Agreement was lower for HAS-BLED (average kappa 0.54 vs experts, compared to 0.74 between experts). In high-risk patients (CHA2DS2-VASc ≥2) OAC use has increased significantly over the last 7 years, driven by the availability of DOACs and the transitioning of patients from AP medication alone to OAC. Factors independently associated with OAC use included components of the CHA2DS2-VASc and HAS-BLED scores as well as discharging specialty and frailty. OAC use was highest in patients discharged under cardiology (69%). Electronic health record text can be used for automatic calculation of clinical risk scores at scale. Open source tools are available today for this task but require further validation. Analysis of routinely-collected EHR data can replicate findings from large-scale curated registries.
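To make the scoring step concrete, here is a minimal sketch of how a CHA2DS2-VASc score could be computed once risk factors have been extracted from clinical text. It uses the standard published point values for the score; the `Patient` class, its field names, and the `cha2ds2_vasc` function are hypothetical and are not taken from the pipeline described in the abstract. It is an illustration only and not intended for clinical use.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical container for risk factors already extracted from text."""
    age: int
    female: bool
    congestive_heart_failure: bool = False
    hypertension: bool = False
    diabetes: bool = False
    prior_stroke_tia_or_thromboembolism: bool = False
    vascular_disease: bool = False


def cha2ds2_vasc(p: Patient) -> int:
    """Sum the standard CHA2DS2-VASc points for one patient."""
    score = 0
    score += 1 if p.congestive_heart_failure else 0             # C
    score += 1 if p.hypertension else 0                         # H
    score += 1 if p.diabetes else 0                             # D
    score += 2 if p.prior_stroke_tia_or_thromboembolism else 0  # S2
    score += 1 if p.vascular_disease else 0                     # V
    # Age: 2 points at 75 or older (A2), 1 point for 65 to 74 (A).
    if p.age >= 75:
        score += 2
    elif p.age >= 65:
        score += 1
    score += 1 if p.female else 0                               # Sc
    return score


# Made-up example patients, not data from the study.
example_a = Patient(age=76, female=True, hypertension=True)
example_b = Patient(age=70, female=False, diabetes=True,
                    prior_stroke_tia_or_thromboembolism=True)
score_a = cha2ds2_vasc(example_a)
score_b = cha2ds2_vasc(example_b)
```

Both example patients meet the abstract's high-risk threshold (CHA2DS2-VASc ≥2). In a real pipeline the hard part is the extraction of these flags from free text, which this sketch assumes has already been done.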

Type coercion from a natural language generation point of view

Mediating between Concepts and Grammar ◽ 10.1515/9783110919585.323 ◽ 2011

Author(s): Markus Egg ◽ Kristina Striegnitz

Keyword(s): Natural Language ◽ Natural Language Generation ◽ Point Of View ◽ Language Generation

A Case Study of Mifos Implementation at Asomi

Advanced Technologies for Microfinance - Advances in Finance, Accounting, and Economics ◽ 10.4018/978-1-61520-993-4.ch005 ◽ 2010 ◽ pp. 72-91 ◽ Cited By ~ 1

Author(s): Puspadhar Das

Keyword(s): Open Source ◽ The State ◽ Point Of View ◽ The Other ◽ Loan Portfolio ◽ Operational Strategies ◽ Microfinance Institution

Mifos is an open source enterprise solution for microfinance. This chapter gives a practitioner’s point of view on implementing Mifos in an organization, based on the author’s experience implementing Mifos at Asomi, a microfinance institution operating in the state of Assam, India. The factors to consider in selecting and implementing Mifos are discussed, along with the various inputs, analyses, and resources required for implementation. To succeed, any organization must have a concrete set of operational strategies that enables it to track its borrowers and loan portfolio effectively and on time. Wrong assumptions and a poor choice of technology may only aggravate the difficulties of MIS implementation. Advances in technology have removed barriers to its adoption and have enabled organizations to develop computerised systems tailored to their operational needs, and not the other way round. The chapter attempts to justify this claim using the case of Mifos.

Sociocultural Implications of Wikipedia

Encyclopedia of Multimedia Technology and Networking, Second Edition ◽ 10.4018/978-1-60566-014-1.ch179 ◽ 2009 ◽ pp. 1333-1338 ◽ Cited By ~ 1

Author(s): Ramanjit Singh

Keyword(s): Open Source ◽ Point Of View ◽ Cultural Relativism ◽ Communication Process ◽ The Internet ◽ Time And Space ◽ Technological Infrastructure ◽ Group Collaboration ◽ Cultural Climate ◽ Collaborative Efforts

Wikipedia is a free encyclopedia that operates worldwide on the Internet. Articles on Wikipedia are developed through close collaboration among volunteers, and anyone can edit the content (Wikipedia, 2006e). Although there are many advantages to using Wikipedia as a group collaboration tool, there are important implications. First, the Wikipedia community is diverse, and intercultural differences can distort the communication process. Second, the neutral point of view (NPOV) policy can lead to disputes. Third, the lack of supervision and the open source policy can be another source of conflict. Fourth, administration of articles can be complex due to differing cultural and political standpoints (Smith & Kollock, 1999). Lastly, differences in time and space, as well as a low level of access to the Internet, can significantly impede collaboration efforts at Wikipedia (Berry, 2006; Madon, 2000; Parayil, 2006; Sahay, Nicholson, & Krishna, 2003). Hence, the aim of this paper is to examine the sociocultural implications of using Wikipedia as a group collaboration tool spanning multiple countries, and how the social and cultural climate, differences in time and space, and the technological infrastructure of countries affect collaboration between individuals, given the distinctive operational and administration policies at Wikipedia. It is believed that findings from this research will increase awareness of the underlying causes of many disputes arising at Wikipedia. In addition, this research will lead to cultural relativism and provide neutral grounds for collaborative efforts at Wikipedia in the future.

An Empirical Study on the Migration to OpenOffice.org in a Public Administration

Handbook of Research on Public Information Technology ◽ 10.4018/978-1-59904-857-4.ch071 ◽ 2008 ◽ pp. 818-832

Author(s): B. Rossi ◽ M. Scotto ◽ A. Sillitti ◽ G. Succi

Keyword(s): Empirical Study ◽ Open Source ◽ Public Administration ◽ Open Source Software ◽ Quantitative Data ◽ Office Automation ◽ Point Of View ◽ Qualitative And Quantitative ◽ It Managers

The aim of the article is to report the results of a migration to Open Source Software (OSS) in one public administration. The migration focuses on the office automation field and, in particular, on the OpenOffice.org suite. We have analysed the transition to OSS considering qualitative and quantitative data collected with the aid of different tools. All the data have been considered from the point of view of the different stakeholders involved: IT managers, IT technicians, and users. The results of the project have been largely satisfactory. However, the results cannot be generalised due to some constraints, such as the environment considered and the parallel use of the old solution. Nevertheless, we think that the data collected can be a valuable aid to managers wishing to evaluate a possible transition to OSS.

The impact of preprocessing in natural language for open source intelligence and criminal investigation

2019 IEEE International Conference on Big Data (Big Data) ◽ 10.1109/bigdata47090.2019.9006006 ◽ 2019

Author(s): Jan William Johnsen ◽ Katrin Franke

Keyword(s): Natural Language ◽ Open Source ◽ Criminal Investigation ◽ Open Source Intelligence ◽ The Impact
