Context:
Code smells are symptoms that something may be wrong in a software system, and they can complicate
the maintenance of software quality. The literature describes many code smells, and identifying them is far
from trivial. Several techniques have therefore been proposed to automate code smell detection in order to improve
software quality.
Objective:
This paper presents an up-to-date review of simple and hybrid machine-learning-based code smell detection
techniques and tools.
Methods:
We collected all the relevant research published in this field up to 2020. We extracted data from those articles
and classified the articles into two major categories. In addition, we compared the selected studies on several aspects,
such as the code smells addressed, machine learning techniques, datasets, programming languages of the datasets,
dataset size, evaluation approach, and statistical testing.
Results:
The majority of empirical studies have proposed machine-learning-based code smell detection tools. Support vector
machine and decision tree algorithms are frequently used by researchers. In addition, a major proportion of the
research is conducted on open-source software (OSS) such as Xerces, Gantt Project, and ArgoUML. Furthermore,
researchers have paid more attention to the Feature Envy and Long Method code smells.
Conclusion:
We identified several open research areas, such as the need for code smell detection techniques that use hybrid
approaches and the need for validation on industrial datasets.