Python Code Smell Refactoring Route Generation Based on Association Rule and Correlation

Author(s):  
Guanglei Wang ◽  
Junhua Chen ◽  
Jianhua Gao ◽  
Zijie Huang

Code smell is a software quality problem caused by software design flaws. Refactoring code smells can improve software maintainability. While prior work has mostly focused on Java code smells, only a few studies detect and refactor code smells in Python. Therefore, we outline a route (i.e., a sequence of refactoring operations) for refactoring Python code smells, including LC, LM, LMC, LPL, LSC, LBCL, LLF, MNC, CCC and LTCE. The route helps developers save effort by first refactoring the smells that are strongly correlated with other smells, so that more smells can be resolved by a single refactoring. First, we reveal the co-occurrence and the inter-causation between smells. Then, we evaluate the smells’ correlation. Results highlight seven groups of smells with high co-occurrence. Meanwhile, 10 groups of smells correlate with each other at a significance level of 0.01 for Spearman’s correlation coefficient. Finally, we generate the refactoring route based on the association rules and verify it empirically with 10 developers. The results of Kendall’s Tau show that the proposed refactoring route agrees closely with the developers’ perception. In conclusion, we propose four refactoring routes to provide guidance for practitioners, i.e. {LPL [Formula: see text] LLF}, {LPL [Formula: see text] LBCL}, {LPL [Formula: see text] LMC} and {LPL [Formula: see text] LM [Formula: see text] LC [Formula: see text] CCC [Formula: see text] MNC}.
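The abstract does not say how its association rules are scored, so the following is only a minimal sketch of the standard support, confidence and lift measures applied to smell co-occurrence. The per-file smell sets and the function names are hypothetical and are not taken from the paper.

```python
from itertools import permutations

# Hypothetical data: the set of smells detected in each file (not from the paper).
files = [
    {"LPL", "LLF"},
    {"LPL", "LLF", "LM"},
    {"LPL", "LM"},
    {"LC"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_both = support({antecedent, consequent}, transactions)
    s_a = support({antecedent}, transactions)
    s_c = support({consequent}, transactions)
    confidence = s_both / s_a if s_a else 0.0
    lift = confidence / s_c if s_c else 0.0
    return s_both, confidence, lift


# Score every ordered pair of smells seen in the data.
smells = sorted(set().union(*files))
rules = {
    (a, c): rule_metrics(a, c, files)
    for a, c in permutations(smells, 2)
}
```

In this toy data the rule LLF -> LPL has confidence 1.0, because every file with LLF also has LPL. A route generator could rank candidate rules by such scores, although the ranking the authors actually use is not stated here.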

Author(s):  
Amandeep Kaur ◽  
Sushma Jain ◽  
Shivani Goel ◽  
Gaurav Dhiman

Context: Code smells are symptoms that something may be wrong in a software system and can complicate the maintenance of software quality. Many code smells are described in the literature, and identifying them is far from trivial. Thus, several techniques have been proposed to automate code smell detection in order to improve software quality. Objective: This paper presents an up-to-date review of simple and hybrid machine-learning-based code smell detection techniques and tools. Methods: We collected all the relevant research published in this field up to 2020. We extracted the data from those articles and classified them into two major categories. In addition, we compared the selected studies on several aspects, such as code smells, machine learning techniques, datasets, programming languages used by the datasets, dataset size, evaluation approach, and statistical testing. Results: The majority of empirical studies have proposed machine-learning-based code smell detection tools. Support vector machine and decision tree algorithms are frequently used by researchers. In addition, a major proportion of the research is conducted on open-source software (OSS) such as Xerces, Gantt Project and ArgoUML. Furthermore, researchers have paid more attention to the Feature Envy and Long Method code smells. Conclusion: We identified several areas of open research, such as the need for code smell detection techniques using hybrid approaches and the need for validation on industrial datasets.


Author(s):  
Ana Carla Bibiano ◽  
Alessandro Garcia

Up to 60% of the refactorings in software projects consist of sets of interrelated transformations, so-called batches (or composite refactorings), rather than single transformations applied in isolation. However, a systematic characterization of batch refactoring is missing, which hampers the development of proper tooling support and empirical studies of how (batch) refactoring is applied in practice. This paper summarizes the research performed in the context of a Master's dissertation, which aimed at addressing these problems. To the best of our knowledge, our research is the first published work that provides a conceptual foundation, detection support and a large-scale analysis of the impact of batch refactoring on code maintainability. To this end, we performed two complementary empirical studies and designed a first heuristic aimed at explicitly detecting batch refactorings. Our first study consisted of a literature review that synthesizes the otherwise scattered, partial conceptualization of batch refactoring mentioned in 29 studies with different purposes. We identified and defined seven batch characteristics, such as the scope and typology of batches, plus seven types of batch effect on software maintainability, including code smell removal. All batch characteristics and possible impacts were systematized in a conceptual framework, which assists, for instance, the proper design of batch refactoring studies and batch detection heuristics. We defined a new heuristic for batch detection, which made it possible to conduct a large study involving 4,607 batches discovered in 57 open and closed software projects. Among various findings, we reveal that most batches in practice occur entirely within one commit (93%) and affect multiple methods (90%). Surprisingly, batches mostly end up introducing (51%) or not removing (38%) code smells.
These findings contradict previous investigations limited to the impact analysis of each transformation in isolation. Our findings also enabled us to reveal beneficial or harmful patterns of batches that respectively induce the removal or introduction of certain code smells. These patterns: (i) were not previously documented even in Fowler's refactoring catalog, and (ii) provide concrete guidance for researchers, tool designers, and practitioners.


2021 ◽  
Vol 13 (18) ◽  
pp. 10256
Author(s):  
Sara H. S. Almadi ◽  
Danial Hooshyar ◽  
Rodina Binti Ahmad

Gang of Four (GoF) design patterns are widely accepted solutions for recurring software design problems, and their benefits to software quality have been extensively studied. However, the occurrence of bad smells in design patterns increases the risk of degrading the patterns’ structure and behavior. Such occurrences undermine the benefits of design patterns and affect software sustainability by increasing maintenance costs and energy consumption. Despite the destructive role of bad smells in such designs, there is an absence of studies systematically reviewing bad smells in GoF design patterns. This study systematically reviews a 10-year sample of the state of the art, identifying 16 studies that investigate this phenomenon. Following a thorough evaluation of the full contents, we observed that the occurrence of bad smells has been investigated at four granularity levels of analysis: design level, category level, pattern level, and role level. We identified 28 bad smells, categorized as code smells and grime symptoms, and emphasized their relationship with GoF pattern types and categories. The use of design pattern bad smell detection approaches and datasets is also discussed. We observed that research on this phenomenon is growing rapidly, with a prominent focus on analyzing code smell occurrences rather than grime occurrences, at various granularity levels. Finally, we uncovered research gaps and areas with significant potential for future research.


2018 ◽  
Vol 2018 ◽  
pp. 1-16
Author(s):  
Zhicong Kou ◽  
Lifeng Xi

An effective data mining method that automatically extracts association rules between manufacturing capabilities and product features from available historical data is essential for efficient and cost-effective product development and production. This paper proposes a new binary particle swarm optimization- (BPSO-) based association rule mining (BPSO-ARM) method for discovering the hidden relationships between machine capabilities and product features. In particular, BPSO-ARM does not need predefined thresholds of minimum support and confidence, which improves its applicability in real-world industrial cases. Moreover, a novel overlapping measure indication is proposed to eliminate lower-quality rules and further improve the applicability of BPSO-ARM. The effectiveness of BPSO-ARM is demonstrated on a benchmark case and an industrial case in automotive part manufacturing. The performance comparison indicates that BPSO-ARM outperforms other regular methods (e.g., Apriori) for ARM. The experimental results indicate that BPSO-ARM is capable of discovering important association rules between machine capabilities and product features. This will support planners and engineers in new product design and manufacturing.
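To make the idea concrete, here is a minimal sketch of binary PSO with the usual sigmoid transfer function, applied to rule search. It is not the BPSO-ARM method itself: the bit-vector rule encoding, the fitness (support multiplied by confidence), the parameter values and the toy transactions are all assumptions for illustration, and the paper's overlapping measure is not modeled.

```python
import math
import random

# Hypothetical items and transactions (not from the paper).
ITEMS = ["milling", "drilling", "slot", "hole"]
TRANSACTIONS = [
    {"milling", "slot"},
    {"milling", "slot"},
    {"drilling", "hole"},
    {"drilling", "hole"},
    {"milling", "drilling", "slot", "hole"},
]


def decode(bits):
    """Assumed encoding: first half marks antecedent items, second half consequent items."""
    n = len(ITEMS)
    antecedent = {ITEMS[i] for i in range(n) if bits[i]}
    consequent = {ITEMS[i] for i in range(n) if bits[n + i]}
    return antecedent, consequent


def fitness(bits):
    """Assumed fitness: support * confidence, with 0 for empty or overlapping sides."""
    a, c = decode(bits)
    if not a or not c or a & c:
        return 0.0
    n_a = sum(a <= t for t in TRANSACTIONS)
    n_ac = sum((a | c) <= t for t in TRANSACTIONS)
    if n_a == 0:
        return 0.0
    return (n_ac / len(TRANSACTIONS)) * (n_ac / n_a)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bpso(n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO: real-valued velocities, bits resampled with probability sigmoid(v)."""
    rng = random.Random(seed)
    dim = 2 * len(ITEMS)
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp the velocity
                pos[i][d] = 1 if rng.random() < sigmoid(vel[i][d]) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return decode(gbest), gbest_fit


(best_antecedent, best_consequent), best_fit = bpso()
```

Because the search maximizes a fitness value directly, no minimum support or confidence threshold is needed in this sketch, which matches the property the abstract highlights.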


2005 ◽  
Vol 277-279 ◽  
pp. 287-292
Author(s):  
Lu Na Byon ◽  
Jeong Hye Han

As electronic commerce progresses, temporal association rules have been developed to offer personalized services that match customers’ interests over time. In this article, we propose a temporal association rule, and an algorithm for discovering it, that applies an exponential smoothing filter to a large transaction database. Experimental results confirm that the approach is more precise and runs in less time than existing temporal association rule methods.
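The abstract does not give the filter's exact form, so the sketch below only illustrates one plausible reading: the support of an itemset is computed per time period and then smoothed with simple exponential smoothing, s_t = alpha * x_t + (1 - alpha) * s_(t-1), so that recent periods weigh more. The function names, the value of `alpha` and the toy data are assumptions.

```python
def period_support(itemset, transactions):
    """Support of `itemset` within a single time period."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)


def smoothed_support(itemset, periods, alpha=0.5):
    """Exponentially smoothed support over periods ordered oldest to newest.

    Returns None when `periods` is empty.
    """
    smoothed = None
    for transactions in periods:
        s = period_support(itemset, transactions)
        # The first period initializes the filter; later periods are blended in.
        smoothed = s if smoothed is None else alpha * s + (1 - alpha) * smoothed
    return smoothed


# Hypothetical purchase data for two consecutive periods.
periods = [
    [{"a", "b"}, {"a"}],       # older period: support of {a, b} is 0.5
    [{"a", "b"}, {"a", "b"}],  # newer period: support of {a, b} is 1.0
]
recent_weighted = smoothed_support({"a", "b"}, periods, alpha=0.5)
```

A larger `alpha` makes the smoothed value follow recent behavior more closely, while a smaller one keeps more of the history.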


2020 ◽  
Vol 36 (01) ◽  
pp. 43-46
Author(s):  
Simona Prokić

Low-quality code contains structures (code smells) that hinder the maintenance and further development of software. This paper presents a machine-learning-based model for the automatic detection of indicators of poorly designed code (code smells) based on the history of code changes. The model's input consists of the values of source code metrics computed over n revisions of the observed code snippet. The model's output is a label indicating whether or not the observed code snippet contains an indicator of poorly designed code. A case study was conducted on the detection of classes with too many responsibilities (God Class). Steps for improving and further developing the architecture are proposed.


2021 ◽  
Author(s):  
Aleksandar Kovačević ◽  
Jelena Slivka ◽  
Dragan Vidaković ◽  
Katarina-Glorija Grujić ◽  
Nikola Luburić ◽  
...  

Code smells are structures in code that often have a negative impact on its quality. Manually detecting code smells is challenging and researchers proposed many automatic code smell detectors. Most of the studies propose detectors based on code metrics and heuristics. However, these studies have several limitations, including evaluating the detectors using small-scale case studies and an inconsistent experimental setting. Furthermore, heuristic-based detectors suffer from limitations that hinder their adoption in practice. Thus, researchers have recently started experimenting with machine learning (ML) based code smell detection. This paper compares the performance of multiple ML-based code smell detection models against multiple traditionally employed metric-based heuristics for detection of God Class and Long Method code smells. We evaluate the effectiveness of different source code representations for machine learning: traditionally used code metrics and code embeddings (code2vec, code2seq, and CuBERT). We perform our experiments on the large-scale, manually labeled MLCQ dataset. We consider the binary classification problem – we classify the code samples as smelly or non-smelly and use the F1-measure of the minority (smell) class as a measure of performance. In our experiments, the ML classifier trained using CuBERT source code embeddings achieved the best performance for both God Class (F-measure of 0.53) and Long Method detection (F-measure of 0.75). With the help of a domain expert, we perform the error analysis to discuss the advantages of the CuBERT approach. This study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection to the best of our knowledge. A secondary contribution of our study is the systematic evaluation of the effectiveness of multiple heuristic-based approaches on the same large-scale, manually labeled MLCQ dataset.
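For readers unfamiliar with the metric, the following minimal sketch computes the F1-measure of a single class (here the minority "smelly" class) from true and predicted labels. The labels are made up for illustration and the function name is hypothetical; nothing here reproduces the study's data or tooling.

```python
def f1_for_class(y_true, y_pred, positive=1):
    """F1-measure of the `positive` class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = smelly (minority class), 0 = non-smelly.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
smell_f1 = f1_for_class(y_true, y_pred, positive=1)
```

Reporting F1 for the smell class, as opposed to accuracy, avoids rewarding a classifier that simply labels every sample as non-smelly on an imbalanced dataset.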


2021 ◽  
Author(s):  
Luis Felipi Junionello ◽  
Rafael de Mello ◽  
Roberto Oliveira ◽  
Leonardo Sousa ◽  
Alexander López ◽  
...  

Identifying code smells is considered a subjective task. Unfortunately, current automated detection tools cannot deal with such subjectivity and therefore require human validation. Developers tend to follow different, albeit complementary, strategies when validating the identified smells. To find out which arguments developers use when validating the incidence of code smells, we conducted a focus group session with developers familiar with identifying code smells. We divided them into two groups, in which they had to argue about the incidence of a code smell, either accepting or rejecting its presence. Based on their arguments, we compiled a set of general heuristics that developers follow when validating smells. We then used these heuristics to compose validation items. We believe that the proposed set of validation items may support developers in reflecting on the incidence of code smells. However, further studies are needed to reach a more comprehensive and optimized set. The experience of this study shows that focus group sessions help surface the tacit knowledge developers use when validating code smells.


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Hui Teng ◽  
Yukun Ma ◽  
Di Teng

Studying drug relationships can provide deeper information for the construction and maintenance of biomedical databases and more useful references for disease treatment and drug development. The research model has expanded from the earlier focus on a single drug to the systematic analysis of the pharmaceutical network formed between drugs. A network model is suitable for studying nonlinear pharmaceutical relationships by modeling and learning from the data. Association rule mining is used to find potential correlations between item sets in massive data. Therefore, based on the network model, this research proposes an algorithm for drug interaction using improved association rules, which achieves accurate analysis of and decision-making about drug relationships. This research also applies the resulting association rule algorithm to the relationship between Chinese medicine and medicine for mental illness, with algorithm research and simulation analysis of the association relationship. The results show that the association rule algorithm built on the network model performs better than other association algorithms. It is reliable and superior in decision-making about drug-drug relationships. It also promotes the rational use of medicines and can guide pharmaceutical research. This provides researchers with a basis and ideas for research on disease-related diagnosis.

