The Connection between Antipatterns and Maintainability in Firefox

The notion that antipatterns have a detrimental effect on source code maintainability is widely accepted, but there is relatively little objective evidence to support it. We investigate this issue by analyzing the connection between antipatterns and maintainability in an empirical study of Firefox, an open source browser application developed in C++. After extracting antipattern instances and maintainability information from 45 revisions, we looked for correlations between the two concepts. We found statistically significant negative values for both Pearson and Spearman correlations, most of which were below -0.65. These values suggest strong inverse relationships, thereby supporting our initial assumption that the more antipatterns the source code contains, the harder it is to maintain. Lastly, we combined these data into a table suitable for machine learning experiments, which we conducted using Weka [10] and several of its classifier algorithms. All five regression types we tried had correlation coefficients over 0.77 and used mostly negative weights for the antipattern predictors in the models we constructed. In conclusion, this empirical study is another step towards objectively demonstrating that antipatterns have an adverse effect on software maintainability.
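The correlation step described in this abstract can be illustrated with a minimal sketch. The per-revision numbers below are hypothetical and are not taken from the study; the function names are likewise illustrative. The sketch only shows how a Pearson and a Spearman coefficient might be computed for paired antipattern counts and maintainability scores.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: one entry per revision (NOT the study's measurements).
antipattern_counts = [120, 135, 150, 149, 170, 190]
maintainability = [7.9, 7.6, 7.1, 7.2, 6.5, 6.0]

print("Pearson: ", pearson(antipattern_counts, maintainability))
print("Spearman:", spearman(antipattern_counts, maintainability))
```

With data like these, both coefficients come out strongly negative, which is the kind of inverse relationship the abstract reports. Significance testing, which the study also performed, is omitted here.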

Effects of Source Code Regularity on Software Maintainability: An Empirical Study

Informatics ◽ 10.2316/p.2010.725-063 ◽ 2010

Author(s): A. Ghazarian

Keyword(s): Empirical Study ◽ Source Code ◽ Software Maintainability

Empirical Study on Robustness of Machine Learning Approaches for Fault Diagnosis under Railway Operational Conditions

2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) ◽ 10.1109/itsc45102.2020.9294269 ◽ 2020

Author(s): Dachuan Shi ◽ Yunguang Ye ◽ Marco Gillwald ◽ Markus Hecht

Keyword(s): Machine Learning ◽ Fault Diagnosis ◽ Empirical Study ◽ Learning Approaches ◽ Operational Conditions

Predicted Number of Pregnant Women in Aichi Prefecture, Japan: Estimation by Machine Learning Database Construction for Disaster Preparation

Disaster Medicine and Public Health Preparedness ◽ 10.1017/dmp.2020.417 ◽ 2021 ◽ pp. 1-9

Author(s): Kanetoshi Hattori ◽ Ritsuko Hattori

Keyword(s): Machine Learning ◽ Pregnant Women ◽ Disaster Preparedness ◽ Correlation Coefficients ◽ Database Construction ◽ Aichi Prefecture ◽ Cascade Correlation ◽ Correlation Learning ◽ Disaster Preparation ◽ Area Data

Aichi Prefecture, Japan, is predicted to be hit by a mega-earthquake. The Aichi Prefectural Association of Midwives has been making efforts to improve disaster preparedness for pregnant women. This project aims to acquire area data on pregnant women for simulation studies of rescue activities. The number of pregnant women in census survey areas in Nagoya City was estimated from nationwide data on pregnant women by machine learning (Cascade-Correlation Learning Architecture). Quite high correlation coefficients between the actual data and the estimated data were observed. Rescue simulations have been carried out based on the data acquired in this study.

On the Co-evolution of ML Pipelines and Source Code - Empirical Study of DVC Projects

2021 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) ◽ 10.1109/saner50967.2021.00046 ◽ 2021

Author(s): Amine Barrak ◽ Ellis E. Eghan ◽ Bram Adams

Keyword(s): Empirical Study ◽ Source Code

An Empirical Study of Refactorings and Technical Debt in Machine Learning Systems

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) ◽ 10.1109/icse43902.2021.00033 ◽ 2021

Author(s): Yiming Tang ◽ Raffi Khatchadourian ◽ Mehdi Bagherzadeh ◽ Rhia Singh ◽ Ajani Stewart ◽ ...

Keyword(s): Machine Learning ◽ Empirical Study ◽ Learning Systems ◽ Technical Debt

Class Imbalance Issue in Software Defect Prediction Models by various Machine Learning Techniques: An Empirical Study

10.1109/icscc51209.2021.9528170 ◽ 2021

Author(s): Sushant Kumar Pandey ◽ Anil Kumar Tripathi

Keyword(s): Machine Learning ◽ Empirical Study ◽ Prediction Models ◽ Class Imbalance ◽ Machine Learning Techniques ◽ Defect Prediction ◽ Software Defect Prediction ◽ Software Defect ◽ Learning Techniques ◽ Defect Prediction Models

An Empirical Study to Detect Cyberbullying with TF-IDF and Machine Learning Algorithms

10.1109/icecit54077.2021.9641251 ◽ 2021

Author(s): Shagoto Rahman ◽ Kamrul Hasan Talukder ◽ Sabia Khatun Mithila

Keyword(s): Machine Learning ◽ Empirical Study ◽ Learning Algorithms ◽ Machine Learning Algorithms

Automatic detection of Long Method and God Class code smells through neural source code embeddings

10.36227/techrxiv.17206010.v1 ◽ 2021

Author(s): Aleksandar Kovačević ◽ Jelena Slivka ◽ Dragan Vidaković ◽ Katarina-Glorija Grujić ◽ Nikola Luburić ◽ ...

Keyword(s): Machine Learning ◽ Large Scale ◽ Negative Impact ◽ Source Code ◽ Systematic Evaluation ◽ Small Scale ◽ Code Smells ◽ Code Metrics ◽ Code Smell ◽ F Measure

Code smells are structures in code that often have a negative impact on its quality. Manually detecting code smells is challenging, and researchers have proposed many automatic code smell detectors. Most studies propose detectors based on code metrics and heuristics. However, these studies have several limitations, including evaluating the detectors on small-scale case studies and using inconsistent experimental settings. Furthermore, heuristic-based detectors suffer from limitations that hinder their adoption in practice. Thus, researchers have recently started experimenting with machine learning (ML) based code smell detection. This paper compares the performance of multiple ML-based code smell detection models against multiple traditionally employed metric-based heuristics for the detection of God Class and Long Method code smells. We evaluate the effectiveness of different source code representations for machine learning: traditionally used code metrics and code embeddings (code2vec, code2seq, and CuBERT). We perform our experiments on the large-scale, manually labeled MLCQ dataset. We treat the task as a binary classification problem: we classify the code samples as smelly or non-smelly and use the F1-measure of the minority (smell) class as the measure of performance. In our experiments, the ML classifier trained using CuBERT source code embeddings achieved the best performance for both God Class detection (F-measure of 0.53) and Long Method detection (F-measure of 0.75). With the help of a domain expert, we perform an error analysis to discuss the advantages of the CuBERT approach. To the best of our knowledge, this study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection. A secondary contribution of our study is the systematic evaluation of the effectiveness of multiple heuristic-based approaches on the same large-scale, manually labeled MLCQ dataset.
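The evaluation metric named in this abstract, the F1-measure of the minority (smell) class, can be sketched as follows. The labels are hypothetical and do not come from the MLCQ dataset, and the function name is illustrative; the sketch only shows how the metric is derived from true and predicted labels when 1 marks a smelly sample.

```python
def f1_for_class(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class of a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = smelly (minority class), 0 = non-smelly.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]

precision, recall, f1 = f1_for_class(y_true, y_pred, positive=1)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Reporting F1 on the smell class rather than overall accuracy avoids rewarding a classifier that simply labels every sample as non-smelly, which matters when smelly samples are rare.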

Encoding Health Records into Pathway Representations for Deep Learning

10.3233/shti210800 ◽ 2021

Author(s): Marco Luca Sbodio ◽ Natasha Mulligan ◽ Stefanie Speichert ◽ Vanessa Lopez ◽ Joao Bettencourt-Silva

Keyword(s): Neural Network ◽ Machine Learning ◽ Deep Learning ◽ Source Code ◽ Training Dataset ◽ Health Records ◽ Learning Tasks ◽ Patient Pathways ◽ Computational Resources ◽ The Impact

There is a growing trend in building deep learning patient representations from health records to obtain a comprehensive view of a patient’s data for machine learning tasks. This paper proposes a reproducible approach to generate patient pathways from health records and to transform them into a machine-processable, image-like structure useful for deep learning tasks. Based on this approach, we generated over a million pathways from FAIR synthetic health records and used them to train a convolutional neural network (CNN). Our initial experiments show that the accuracy of the CNN on a prediction task is comparable to or better than that of other autoencoders trained on the same data, while requiring significantly fewer computational resources for training. We also assess the impact of the size of the training dataset on autoencoder performance. The source code for generating pathways from health records is provided as open source.

AN EMPIRICAL STUDY ON MACHINE LEARNING ALGORITHM FOR PLANT DISEASE PREDICTION

Journal of Critical Reviews ◽ 10.31838/jcr.07.05.125 ◽ 2020 ◽ Vol 7 (05)

Keyword(s): Machine Learning ◽ Empirical Study ◽ Plant Disease ◽ Learning Algorithm ◽ Disease Prediction ◽ Machine Learning Algorithm
