Machine Learning Techniques for Code Smells Detection: A Systematic Mapping Study

Author(s):  
Frederico Luiz Caram ◽  
Bruno Rafael De Oliveira Rodrigues ◽  
Amadeu Silveira Campanelli ◽  
Fernando Silva Parreiras

Code smells, or bad smells, are an accepted approach to identifying design flaws in source code. Although they have been explored by researchers, their interpretation by programmers is rather subjective. One way to deal with this subjectivity is to use machine learning techniques. This paper provides the reader with an overview of machine learning techniques and code smells found in the literature, aiming to determine which methods and practices are used when applying machine learning to code smell identification and which machine learning techniques have been used for that purpose. A mapping study was used to identify the techniques used for each smell. We found that Bloaters were the main kind of smell studied, addressed by 35% of the papers. The most commonly used technique was Genetic Algorithms (GA), used by 22.22% of the papers. Regarding the smells addressed by each technique, there was a high level of redundancy, in that the smells are covered by a wide range of algorithms. Nevertheless, Feature Envy stood out, being targeted by 63% of the techniques. When it comes to performance, the best average was provided by Decision Tree, followed by Random Forest, Semi-supervised and Support Vector Machine Classifier techniques. Five of the 25 analyzed smells were not handled by any machine learning technique. Most techniques focus on several code smells, and in general there is no outperforming technique, except for a few specific smells. We also found a lack of comparable results due to the heterogeneity of the data sources and of the provided results. We recommend the pursuit of further empirical studies to assess the performance of these techniques on a standardized dataset to improve the reliability and replicability of comparisons.

Author(s):  
Amandeep Kaur ◽  
Sushma Jain ◽  
Shivani Goel ◽  
Gaurav Dhiman

Context: Code smells are symptoms that something may be wrong in a software system, and they can complicate the maintenance of software quality. The literature describes many code smells, and their identification is far from trivial. Thus, several techniques have been proposed to automate code smell detection in order to improve software quality. Objective: This paper presents an up-to-date review of simple and hybrid machine-learning-based code smell detection techniques and tools. Methods: We collected all the relevant research published in this field up to 2020. We extracted the data from those articles and classified them into two major categories. In addition, we compared the selected studies on several aspects, such as code smells, machine learning techniques, datasets, programming languages used by datasets, dataset size, evaluation approach, and statistical testing. Results: The majority of empirical studies have proposed machine-learning-based code smell detection tools. Support vector machine and decision tree algorithms are frequently used by researchers. In addition, a major proportion of the research is conducted on open source software (OSS) such as Xerces, Gantt Project and ArgoUml. Furthermore, researchers have paid more attention to the Feature Envy and Long Method code smells. Conclusion: We identified several areas of open research, such as the need for code smell detection techniques using hybrid approaches and the need for validation employing industrial datasets.


2021 ◽  
Vol 2021 ◽  
pp. 1-24
Author(s):  
Abderrahim El hafidy ◽  
Taoufik Rachad ◽  
Ali Idri ◽  
Ahmed Zellou

Many research works and official reports confirm that irresponsible driving behavior on the road is the main cause of accidents. Consequently, responsible driving behavior can significantly reduce the number and severity of accidents. Therefore, in both research and industry, mobile technologies are widely exploited to assist drivers in reducing accident rates and preventing accidents. For instance, several mobile apps are provided to assist drivers in improving their driving behavior. Recently, thanks to mobile cloud computing, smartphones can benefit from the computing power of servers in the cloud for executing machine learning algorithms. Therefore, many mobile applications for driving assistance and control are based on machine learning techniques that adjust their functioning automatically to driver history, context, and profile. Additionally, gamification is a key element in the design of these mobile applications, allowing drivers to develop their engagement and motivation to improve their driving behavior. To gain an overview of existing mobile apps that improve driving behavior, we have chosen to conduct a systematic mapping study of driving behavior mobile apps that exist in the most common mobile app repositories or that were published as research works in digital libraries. In particular, we explore their functionalities, the kinds of data collected, the gamification elements used, and the machine learning techniques and algorithms used. We have successfully identified 220 mobile apps that help to improve driving behavior. In this work, we extract all the data that seem to be useful for the classification and analysis of the functionalities offered by these applications.


2021 ◽  
Vol 14 (1) ◽  
pp. 453-463
Author(s):  
Abdul Syukur ◽  
Deden Istiawan ◽  

LQ45 is an Indonesia Stock Exchange (ISX) index comprising 45 companies that meet certain criteria, intended to help investors select stocks. The prediction of stock price direction is a major issue in the financial world. The implementation of machine learning and other algorithms for market price analysis and forecasting is a very promising field. Different types of classification algorithms have been used to predict the stock market. However, when individual studies are considered separately, there is no clear consensus on which algorithms work best. In this research, a comparison framework is proposed, which aims to benchmark the performance of a wide range of classification models and use them to predict the LQ45 index. The data in this research, which contain the transaction level and capitalization size, were obtained from the Indonesian Stock Exchange (ISX). For analysis purposes, we set out 10 classifiers that can be used to build classification models and test their performance on the LQ45 dataset. The performance criteria chosen to measure this effect are accuracy, recall, and precision. The results showed that the random forest algorithm had the best performance for predicting the LQ45 index. The classification and regression trees, C4.5, support vector machine, and logistic regression algorithms also performed well. In contrast, the models based on traditional statistical learners, namely Naïve Bayes and linear discriminant analysis, seem to underperform in predicting the LQ45 index. These results are not only beneficial for enriching the machine learning techniques literature but also have a significant influence on stock market prediction in terms of the ability to predict the LQ45 index.
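
The three evaluation criteria named in this abstract (accuracy, recall, and precision) can be illustrated with a minimal sketch. This is not the authors' framework: the function names `binary_metrics` and `rank_by_accuracy`, the 0/1 encoding of price direction, and the toy label vectors are all assumptions made for illustration, and ranking by accuracy alone is one possible choice among several.

```python
from typing import Dict, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall for labels in {0, 1}; 1 is the positive class.

    Assumption for illustration: 1 = "index goes up", 0 = "index goes down".
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Defined as 0.0 when the denominator is empty, to avoid division by zero.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def rank_by_accuracy(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> List[Tuple[str, Dict[str, float]]]:
    """Score each (hypothetical) classifier's predictions and sort by accuracy, best first."""
    scores = {name: binary_metrics(y_true, y_pred) for name, y_pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1]["accuracy"], reverse=True)
```

A comparison across several classifiers would then pass one prediction vector per model to `rank_by_accuracy`; in practice the predictions would come from held-out data rather than from the data used for training.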


Author(s):  
Anton Ovchinnikov ◽  
Scotiabank Scholar

This case, along with its B case (UVA-QA-0865), is an effective vehicle for introducing students to the use of machine-learning techniques for classification. The specific context is predicting customer retention based on a wide range of customer attributes/features. The specific techniques could include (but are not limited to): regressions (linear and logistic), variable selection (forward/backward and stepwise), regularizations (e.g., LASSO), classification and regression trees (CART), random forests, gradient boosted trees (xgboost), neural networks, and support vector machines (SVM). The case is suitable for an advanced data analysis (data science, machine learning, and artificial intelligence) class at all levels: upper-level business undergraduate, MBA, EMBA, as well as specialized graduate or undergraduate programs in analytics (e.g., masters of science in business analytics [MSBA] and masters of management analytics [MMA]) and/or in management (e.g., masters of science in management [MScM] and masters in management [MiM, MM]). The teaching note for the case contains the pedagogy and the analyses, alongside detailed explanations of the various techniques and their implementations in R (code provided in Exhibits and supplementary files). Python code, as well as the spreadsheet implementation in XLMiner, is available upon request.


Author(s):  
Maad M. Mijwil ◽  
Israa Ezzat Salem ◽  
Rana A. Abttan

On our planet, chemical waste increases day after day. The emergence of new types of waste, the high level of toxic pollution, the difficulty of daily life, the worsening psychological state of humans, and other factors have all led to the emergence of many diseases that affect humans, including deadly ones like COVID-19. Symptoms may or may not appear in a person; some people know their condition, while others neglect their health owing to a lack of knowledge, which may lead to death or to a lifelong chronic disease. In this regard, the authors apply machine learning techniques (Support Vector Machine, C5.0 Decision Tree, K-Nearest Neighbours, and Random Forest), chosen for their influence in medical sciences, to identify the technique that gives the highest level of accuracy in detecting diseases. Such a technique will help to recognise symptoms and diagnose them correctly. This article covers datasets from the UCI machine learning repository, namely the Wisconsin Breast Cancer dataset, Chronic Kidney disease dataset, Immunotherapy dataset, Cryotherapy dataset, Hepatitis dataset and COVID-19 dataset. In the results section, the performance of the techniques is compared to find out which is the best and which is the worst in analysing the dataset of each disease.


2021 ◽  
Vol 101 ◽  
pp. 107050
Author(s):  
Michał Choraś ◽  
Konstantinos Demestichas ◽  
Agata Giełczyk ◽  
Álvaro Herrero ◽  
Paweł Ksieniewicz ◽  
...  
