scholarly journals Engrossing Prosecution of Code Smells Type Identification and Rectification using Machine Learning AdaBoost Classifier

Software code smells are the structural features which reside in a software source code. Code smell detection is an established method to discover the problems in source code and reorganize the inner structure of object-oriented software for improving the quality of such software, particularly in terms of maintainability, reusability and cost minimization. The developer identified where the code smell is identified and rectified within a system is a major challenging issue. The various code smell detection technique has been designed but it failed to classify the code type and minimum rectification cost. In order to perform classification with minimum cost, an efficient technique called Machine Learning Ada-Boost Classifier (MLABC) technique is introduced. The MLABC technique improves the software quality by identifying and rectifying the different types of software code smell in source code. Initially, MLABC technique uses decision tree as base classifier to identify the code smell type. The decision tree is used to classify the code smell type based on the certain rule. After that, the base classifiers are combined to make a strong classifier using adaboost machine learning technique. The output of strong classifier is used to identify the code smell type. Finally, the code smell type rectification is performed by applying the refactoring technique where the code smell is identified with minimum cost and space complexity. Experimental results shows that the proposed MLABC technique improves the software code quality in terms of code smell type identification accuracy, false positive rate, code smell type rectification cost and space complexity with the source code

2021 ◽  
Author(s):  
Aleksandar Kovačević ◽  
Jelena Slivka ◽  
Dragan Vidaković ◽  
Katarina-Glorija Grujić ◽  
Nikola Luburić ◽  
...  

<p>Code smells are structures in code that often have a negative impact on its quality. Manually detecting code smells is challenging and researchers proposed many automatic code smell detectors. Most of the studies propose detectors based on code metrics and heuristics. However, these studies have several limitations, including evaluating the detectors using small-scale case studies and an inconsistent experimental setting. Furthermore, heuristic-based detectors suffer from limitations that hinder their adoption in practice. Thus, researchers have recently started experimenting with machine learning (ML) based code smell detection. </p><p>This paper compares the performance of multiple ML-based code smell detection models against multiple traditionally employed metric-based heuristics for detection of God Class and Long Method code smells. We evaluate the effectiveness of different source code representations for machine learning: traditionally used code metrics and code embeddings (code2vec, code2seq, and CuBERT).<br></p><p>We perform our experiments on the large-scale, manually labeled MLCQ dataset. We consider the binary classification problem – we classify the code samples as smelly or non-smelly and use the F1-measure of the minority (smell) class as a measure of performance. In our experiments, the ML classifier trained using CuBERT source code embeddings achieved the best performance for both God Class (F-measure of 0.53) and Long Method detection (F-measure of 0.75). With the help of a domain expert, we perform the error analysis to discuss the advantages of the CuBERT approach.<br></p><p>This study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection to the best of our knowledge. A secondary contribution of our study is the systematic evaluation of the effectiveness of multiple heuristic-based approaches on the same large-scale, manually labeled MLCQ dataset.<br></p>


2021 ◽  
Author(s):  
Aleksandar Kovačević ◽  
Jelena Slivka ◽  
Dragan Vidaković ◽  
Katarina-Glorija Grujić ◽  
Nikola Luburić ◽  
...  

<p>Code smells are structures in code that often have a negative impact on its quality. Manually detecting code smells is challenging and researchers proposed many automatic code smell detectors. Most of the studies propose detectors based on code metrics and heuristics. However, these studies have several limitations, including evaluating the detectors using small-scale case studies and an inconsistent experimental setting. Furthermore, heuristic-based detectors suffer from limitations that hinder their adoption in practice. Thus, researchers have recently started experimenting with machine learning (ML) based code smell detection. </p><p>This paper compares the performance of multiple ML-based code smell detection models against multiple traditionally employed metric-based heuristics for detection of God Class and Long Method code smells. We evaluate the effectiveness of different source code representations for machine learning: traditionally used code metrics and code embeddings (code2vec, code2seq, and CuBERT).<br></p><p>We perform our experiments on the large-scale, manually labeled MLCQ dataset. We consider the binary classification problem – we classify the code samples as smelly or non-smelly and use the F1-measure of the minority (smell) class as a measure of performance. In our experiments, the ML classifier trained using CuBERT source code embeddings achieved the best performance for both God Class (F-measure of 0.53) and Long Method detection (F-measure of 0.75). With the help of a domain expert, we perform the error analysis to discuss the advantages of the CuBERT approach.<br></p><p>This study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection to the best of our knowledge. A secondary contribution of our study is the systematic evaluation of the effectiveness of multiple heuristic-based approaches on the same large-scale, manually labeled MLCQ dataset.<br></p>


Author(s):  
Amandeep Kaur ◽  
Sushma Jain ◽  
Shivani Goel ◽  
Gaurav Dhiman

Context: Code smells are symptoms, that something may be wrong in software systems that can cause complications in maintaining software quality. In literature, there exists many code smells and their identification is far from trivial. Thus, several techniques have also been proposed to automate code smell detection in order to improve software quality. Objective: This paper presents an up-to-date review of simple and hybrid machine learning based code smell detection techniques and tools. Methods: We collected all the relevant research published in this field till 2020. We extracted the data from those articles and classified them into two major categories. In addition, we compared the selected studies based on several aspects like, code smells, machine learning techniques, datasets, programming languages used by datasets, dataset size, evaluation approach, and statistical testing. Results: Majority of empirical studies have proposed machine- learning based code smell detection tools. Support vector machine and decision tree algorithms are frequently used by the researchers. Along with this, a major proportion of research is conducted on Open Source Softwares (OSS) such as, Xerces, Gantt Project and ArgoUml. Furthermore, researchers paid more attention towards Feature Envy and Long Method code smells. Conclusion: We identified several areas of open research like, need of code smell detection techniques using hybrid approaches, need of validation employing industrial datasets, etc.


2017 ◽  
Vol 27 (09n10) ◽  
pp. 1529-1547 ◽  
Author(s):  
Zadia Codabux ◽  
Kazi Zakia Sultana ◽  
Byron J. Williams

It is important to maintain software quality as a software system evolves. Managing code smells in source code contributes towards quality software. While metrics have been used to pinpoint code smells in source code, we present an empirical study on the correlation of code smells with class-level (micro pattern) and method-level (nano-pattern) traceable code patterns. This study explores the relationship between code smells and class-level and method-level structural code constructs. We extracted micro patterns at the class level and nano-patterns at the method level from three versions of Apache Tomcat, three versions of Apache CXF and two J2EE web applications namely PersonalBlog and Roller from Stanford SecuriBench and then compared their distributions in code smell versus noncode smell classes and methods. We found that Immutable and Sink micro patterns are more frequent in classes having code smells compared to the noncode smell classes in the applications we analyzed. On the other hand, LocalReader and LocalWriter nano-patterns are more frequent in code smell methods compared to the noncode smell methods. We conclude that code smells are correlated with both micro and nano-patterns.


2013 ◽  
Vol 6 (1) ◽  
pp. 242-247 ◽  
Author(s):  
Amandeep Kaur ◽  
Himanshi Raperia

Software development is a field which is in action for decades. Preparing code for Software is not a difficult task, but preparing an efficient code is complicated one. To change the code is to make internal structure of the code easier to understand and economic to modify, without changing the behavior and desired response. More changes will make software patchy. No Software is free from smells especially the patchy one. Lots of work has been done for detecting and removing a few of the smells (Refactoring) from code. In this paper our main focus will be on tool SCSD (Software Code Smell Detector) developed, uses a bit classification, clustering approach with K-mean Clustering Algorithm to detect the code smells, which can implement completely different architecture if it discovers smell. 


2019 ◽  
Vol 18 (3) ◽  
pp. 742-766 ◽  
Author(s):  
Anna Kurtukova ◽  
Alexander Romanov

The paper is devoted to the analysis of the problem of determining the source code author , which is of interest to researchers in the field of information security, computer forensics, assessment of the quality of the educational process, protection of intellectual property. The paper presents a detailed analysis of modern solutions to the problem. The authors suggest two new identification techniques based on machine learning algorithms: support vector machine, fast correlation filter and informative features; the technique based on hybrid convolutional recurrent neural network. The experimental database includes samples of source codes written in Java, C ++, Python, PHP, JavaScript, C, C # and Ruby. The data was obtained using a web service for hosting IT-projects – Github. The total number of source codes exceeds 150 thousand samples. The average length of each of them is 850 characters. The case size is 542 authors. The experiments were conducted with source codes written in the most popular programming languages. Accuracy of the developed techniques for different numbers of authors was assessed using 10-fold cross-validation. An additional series of experiments was conducted with the number of authors from 2 to 50 for the most popular Java programming language. The graphs of the relationship between identification accuracy and case size are plotted. The analysis of result showed that the method based on hybrid neural network gives 97% accuracy, and it’s at the present time the best-known result. The technique based on the support vector machine made it possible to achieve 96% accuracy. The difference between the results of the hybrid neural network and the support vector machine was approximately 5%.


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1489
Author(s):  
Guangwu Hu ◽  
Bin Zhang ◽  
Xi Xiao ◽  
Weizhe Zhang ◽  
Long Liao ◽  
...  

Insecure applications (apps) are increasingly used to steal users’ location information for illegal purposes, which has aroused great concern in recent years. Although the existing methods, i.e., static and dynamic taint analysis, have shown great merit for identifying such apps, which mainly rely on statically analyzing source code or dynamically monitoring the location data flow, identification accuracy is still under research, since the analysis results contain a certain false positive or true negative rate. In order to improve the accuracy and reduce the misjudging rate in the process of vetting suspicious apps, this paper proposes SAMLDroid, a combined method of static code analysis and machine learning for identifying Android apps with location privacy leakage, which can effectively improve the identification rate compared with existing methods. SAMLDroid first uses static analysis to scrutinize source code to investigate apps with location acquiring intentions. Then it exploits a well-trained classifier and integrates an app’s multiple features to dynamically analyze the pattern and deliver the final verdict about the app’s property. Finally, it is proved by conducting experiments, that the accuracy rate of SAMLDroid is up to 98.4%, which is nearly 20% higher than Apparecium.


Author(s):  
Tran Thanh Luong ◽  
Le My Canh

JavaScript has become more and more popular in recent years because its wealthy features as being dynamic, interpreted and object-oriented with first-class functions. Furthermore, JavaScript is designed with event-driven and I/O non-blocking model that boosts the performance of overall application especially in the case of Node.js. To take advantage of these characteristics, many design patterns that implement asynchronous programming for JavaScript were proposed. However, choosing a right pattern and implementing a good asynchronous source code is a challenge and thus easily lead into less robust application and low quality source code. Extended from our previous works on exception handling code smells in JavaScript and exception handling code smells in JavaScript asynchronous programming with promise, this research aims at studying the impact of three JavaScript asynchronous programming patterns on quality of source code and application.


Author(s):  
M. Ilayaraja ◽  
S. Hemalatha ◽  
P. Manickam ◽  
K. Sathesh Kumar ◽  
K. Shankar

Cloud computing is characterized as the arrangement of assets or administrations accessible through the web to the clients on their request by cloud providers. It communicates everything as administrations over the web in view of the client request, for example operating system, organize equipment, storage, assets, and software. Nowadays, Intrusion Detection System (IDS) plays a powerful system, which deals with the influence of experts to get actions when the system is hacked under some intrusions. Most intrusion detection frameworks are created in light of machine learning strategies. Since the datasets, this utilized as a part of intrusion detection is Knowledge Discovery in Database (KDD). In this paper detect or classify the intruded data utilizing Machine Learning (ML) with the MapReduce model. The primary face considers Hadoop MapReduce model to reduce the extent of database ideal weight decided for reducer model and second stage utilizing Decision Tree (DT) classifier to detect the data. This DT classifier comprises utilizing an appropriate classifier to decide the class labels for the non-homogeneous leaf nodes. The decision tree fragment gives a coarse section profile while the leaf level classifier can give data about the qualities that influence the label inside a portion. From the proposed result accuracy for detection is 96.21% contrasted with existing classifiers, for example, Neural Network (NN), Naive Bayes (NB) and K Nearest Neighbor (KNN).


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: This proposed work is used to develop an improved and robust machine learning model for predicting Myocardial Infarction (MI) could have substantial clinical impact. Objectives: This paper explains how to build machine learning based computer-aided analysis system for an early and accurate prediction of Myocardial Infarction (MI) which utilizes framingham heart study dataset for validation and evaluation. This proposed computer-aided analysis model will support medical professionals to predict myocardial infarction proficiently. Methods: The proposed model utilize the mean imputation to remove the missing values from the data set, then applied principal component analysis to extract the optimal features from the data set to enhance the performance of the classifiers. After PCA, the reduced features are partitioned into training dataset and testing dataset where 70% of the training dataset are given as an input to the four well-liked classifiers as support vector machine, k-nearest neighbor, logistic regression and decision tree to train the classifiers and 30% of test dataset is used to evaluate an output of machine learning model using performance metrics as confusion matrix, classifier accuracy, precision, sensitivity, F1-score, AUC-ROC curve. Results: Output of the classifiers are evaluated using performance measures and we observed that logistic regression provides high accuracy than K-NN, SVM, decision tree classifiers and PCA performs sound as a good feature extraction method to enhance the performance of proposed model. From these analyses, we conclude that logistic regression having good mean accuracy level and standard deviation accuracy compared with the other three algorithms. AUC-ROC curve of the proposed classifiers is analyzed from the output figure.4, figure.5 that logistic regression exhibits good AUC-ROC score, i.e. around 70% compared to k-NN and decision tree algorithm. Conclusion: From the result analysis, we infer that this proposed machine learning model will act as an optimal decision making system to predict the acute myocardial infarction at an early stage than an existing machine learning based prediction models and it is capable to predict the presence of an acute myocardial Infarction with human using the heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent the heart disease.


Sign in / Sign up

Export Citation Format

Share Document