Analysing GoLang Projects’ Architecture Using Code Metrics and Code Smell

Automatic detection of Long Method and God Class code smells through neural source code embeddings

10.36227/techrxiv.17206010.v1 ◽

2021 ◽

Author(s):

Aleksandar Kovačević ◽

Jelena Slivka ◽

Dragan Vidaković ◽

Katarina-Glorija Grujić ◽

Nikola Luburić ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Negative Impact ◽

Source Code ◽

Systematic Evaluation ◽

Small Scale ◽

Code Smells ◽

Code Metrics ◽

Code Smell ◽

F Measure

Code smells are structures in code that often have a negative impact on its quality. Manually detecting code smells is challenging and researchers proposed many automatic code smell detectors. Most of the studies propose detectors based on code metrics and heuristics. However, these studies have several limitations, including evaluating the detectors using small-scale case studies and an inconsistent experimental setting. Furthermore, heuristic-based detectors suffer from limitations that hinder their adoption in practice. Thus, researchers have recently started experimenting with machine learning (ML) based code smell detection. This paper compares the performance of multiple ML-based code smell detection models against multiple traditionally employed metric-based heuristics for detection of God Class and Long Method code smells. We evaluate the effectiveness of different source code representations for machine learning: traditionally used code metrics and code embeddings (code2vec, code2seq, and CuBERT). We perform our experiments on the large-scale, manually labeled MLCQ dataset. We consider the binary classification problem – we classify the code samples as smelly or non-smelly and use the F1-measure of the minority (smell) class as a measure of performance. In our experiments, the ML classifier trained using CuBERT source code embeddings achieved the best performance for both God Class (F-measure of 0.53) and Long Method detection (F-measure of 0.75). With the help of a domain expert, we perform the error analysis to discuss the advantages of the CuBERT approach. This study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection to the best of our knowledge. A secondary contribution of our study is the systematic evaluation of the effectiveness of multiple heuristic-based approaches on the same large-scale, manually labeled MLCQ dataset.

Download Full-text

Automatic detection of Long Method and God Class code smells through neural source code embeddings

10.36227/techrxiv.17206010 ◽

2021 ◽

Author(s):

Aleksandar Kovačević ◽

Jelena Slivka ◽

Dragan Vidaković ◽

Katarina-Glorija Grujić ◽

Nikola Luburić ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Negative Impact ◽

Source Code ◽

Systematic Evaluation ◽

Small Scale ◽

Code Smells ◽

Code Metrics ◽

Code Smell ◽

F Measure

Code smells are structures in code that often have a negative impact on its quality. Manually detecting code smells is challenging and researchers proposed many automatic code smell detectors. Most of the studies propose detectors based on code metrics and heuristics. However, these studies have several limitations, including evaluating the detectors using small-scale case studies and an inconsistent experimental setting. Furthermore, heuristic-based detectors suffer from limitations that hinder their adoption in practice. Thus, researchers have recently started experimenting with machine learning (ML) based code smell detection. This paper compares the performance of multiple ML-based code smell detection models against multiple traditionally employed metric-based heuristics for detection of God Class and Long Method code smells. We evaluate the effectiveness of different source code representations for machine learning: traditionally used code metrics and code embeddings (code2vec, code2seq, and CuBERT). We perform our experiments on the large-scale, manually labeled MLCQ dataset. We consider the binary classification problem – we classify the code samples as smelly or non-smelly and use the F1-measure of the minority (smell) class as a measure of performance. In our experiments, the ML classifier trained using CuBERT source code embeddings achieved the best performance for both God Class (F-measure of 0.53) and Long Method detection (F-measure of 0.75). With the help of a domain expert, we perform the error analysis to discuss the advantages of the CuBERT approach. This study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection to the best of our knowledge. A secondary contribution of our study is the systematic evaluation of the effectiveness of multiple heuristic-based approaches on the same large-scale, manually labeled MLCQ dataset.

Download Full-text

A User Feedback Centric Approach for Detecting and Mitigating God Class Code Smell Using Frequent Usage Patterns

Journal of Communications Software and Systems ◽

10.24138/jcomss.v15i3.720 ◽

2019 ◽

Vol 15 (3) ◽

Author(s):

Randeep Singh ◽

Amit Bindal ◽

Ashok Kumar

Keyword(s):

Software Evolution ◽

User Feedback ◽

Software Systems ◽

Code Smells ◽

New Approach ◽

Design Choice ◽

Usage Patterns ◽

Code Metrics ◽

Code Smell ◽

Software Evolution And Maintenance

Code smells are the fragments in the source code that indicates deeper problems in the underlying software design. These code smells can hinder software evolution and maintenance. Out of different code smell types, the God Class (GC) code smell is one of the many important code smells that directly affects the software evolution and maintenance. The GC is commonly defined as a much larger class in systems that either know too much or do too much as compared to other classes in the system. God Classes are generally accidentally created overtime during software evolution because of the incremental addition of functionalities to it. Generally, a GC indicates a bad design choice and it must be detected and mitigated in order to enhance the quality of the underlying software. However, sometimes the presence of a GC is also considered a good design choice, especially in compiler design, interpreter design and parser implementation. This makes the developer’s feedback important for the correct classification of a class as a GC or a normal class. Therefore, this paper proposes a new approach that detects and proposes refactoring opportunities for GC code smell. The proposed approach makes use of different code metrics in combination along with utilizing user feedback as an important aspect while correctly identifying the GC code smell. The proposed approach that considers combined use of code metrics, is based on two newly proposed code metrics in this paper. The first newly proposed metric is a new approach of measuring the connectivity of a given class with other classes in the system (also termed as coupling). The second newly proposed code metric is proposed to measure the extent to which a given classes make use of foreign member variables. Finally, the proposed approach is also empirically evaluated on two standard open-source commonly used software systems. The obtained result indicates that the proposed approach is capable of correctly identifying the GC code smell.

Download Full-text

A Study on Different Tools for Code Smell Detection

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i7.762764 ◽

2018 ◽

Vol 6 (7) ◽

pp. 762-764

Author(s):

S.James Benedict Felix ◽

Viji Vinod

Keyword(s):

Code Smell

Download Full-text

A Review on Machine-Learning Based Code Smell Detection Techniquesin Object-Oriented Software System(s)

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) ◽

10.2174/2352096513999200922125839 ◽

2020 ◽

Vol 13 ◽

Author(s):

Amandeep Kaur ◽

Sushma Jain ◽

Shivani Goel ◽

Gaurav Dhiman

Keyword(s):

Machine Learning ◽

Programming Languages ◽

Software Quality ◽

Empirical Studies ◽

Statistical Testing ◽

Machine Learning Techniques ◽

Support Vector ◽

Code Smells ◽

Detection Techniques ◽

Code Smell

Context: Code smells are symptoms, that something may be wrong in software systems that can cause complications in maintaining software quality. In literature, there exists many code smells and their identification is far from trivial. Thus, several techniques have also been proposed to automate code smell detection in order to improve software quality. Objective: This paper presents an up-to-date review of simple and hybrid machine learning based code smell detection techniques and tools. Methods: We collected all the relevant research published in this field till 2020. We extracted the data from those articles and classified them into two major categories. In addition, we compared the selected studies based on several aspects like, code smells, machine learning techniques, datasets, programming languages used by datasets, dataset size, evaluation approach, and statistical testing. Results: Majority of empirical studies have proposed machine- learning based code smell detection tools. Support vector machine and decision tree algorithms are frequently used by the researchers. Along with this, a major proportion of research is conducted on Open Source Softwares (OSS) such as, Xerces, Gantt Project and ArgoUml. Furthermore, researchers paid more attention towards Feature Envy and Long Method code smells. Conclusion: We identified several areas of open research like, need of code smell detection techniques using hybrid approaches, need of validation employing industrial datasets, etc.

Download Full-text

A Novel four-way approach designed with Ensemble Feature Selection for Code Smell Detection

IEEE Access ◽

10.1109/access.2021.3049823 ◽

2021 ◽

pp. 1-1

Author(s):

Inderpreet Kaur ◽

Arvinder Kaur

Keyword(s):

Feature Selection ◽

Code Smell ◽

Selection For

Download Full-text

Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation Metrics

Technologies ◽

10.3390/technologies9010003 ◽

2020 ◽

Vol 9 (1) ◽

pp. 3

Author(s):

Gábor Antal ◽

Zoltán Tóth ◽

Péter Hegedűs ◽

Rudolf Ferenc

Keyword(s):

Software Maintenance ◽

Positive Impact ◽

Source Code ◽

Code Analysis ◽

Static Source ◽

Static Code Analysis ◽

Function Calls ◽

Hybrid Code ◽

Code Metrics ◽

Scripting Language

Bug prediction aims at finding source code elements in a software system that are likely to contain defects. Being aware of the most error-prone parts of the program, one can efficiently allocate the limited amount of testing and code review resources. Therefore, bug prediction can support software maintenance and evolution to a great extent. In this paper, we propose a function level JavaScript bug prediction model based on static source code metrics with the addition of a hybrid (static and dynamic) code analysis based metric of the number of incoming and outgoing function calls (HNII and HNOI). Our motivation for this is that JavaScript is a highly dynamic scripting language for which static code analysis might be very imprecise; therefore, using a purely static source code features for bug prediction might not be enough. Based on a study where we extracted 824 buggy and 1943 non-buggy functions from the publicly available BugsJS dataset for the ESLint JavaScript project, we can confirm the positive impact of hybrid code metrics on the prediction performance of the ML models. Depending on the ML algorithm, applied hyper-parameters, and target measures we consider, hybrid invocation metrics bring a 2–10% increase in model performances (i.e., precision, recall, F-measure). Interestingly, replacing static NOI and NII metrics with their hybrid counterparts HNOI and HNII in itself improves model performances; however, using them all together yields the best results.

Download Full-text