Predicting unstable software benchmarks using static source code features

2021 ◽  
Vol 26 (6) ◽  
Author(s):  
Christoph Laaber ◽  
Mikael Basmaci ◽  
Pasquale Salza

Abstract Software benchmarks are only as good as the performance measurements they yield. Unstable benchmarks show high variability among repeated measurements, which causes uncertainty about the actual performance and complicates reliable change assessment. However, whether a benchmark is stable or unstable only becomes evident after it has been executed and its results are available. In this paper, we introduce a machine-learning-based approach to predict a benchmark’s stability without having to execute it. Our approach relies on 58 statically-computed source code features, extracted for benchmark code and code called by a benchmark, related to (1) meta-information, e.g., lines of code (LOC), (2) programming language elements, e.g., conditionals or loops, and (3) potentially performance-impacting standard library calls, e.g., file and network input/output (I/O). To assess our approach’s effectiveness, we perform a large-scale experiment on 4,461 Go benchmarks coming from 230 open-source software (OSS) projects. First, we assess the prediction performance of our machine learning models using 11 binary classification algorithms. We find that Random Forest performs best, with good prediction performance ranging from 0.79 to 0.90 in terms of AUC and from 0.43 to 0.68 in terms of MCC. Second, we perform feature importance analyses for individual features and feature categories. We find that 7 features related to meta-information, slice usage, nested loops, and synchronization application programming interfaces (APIs) are individually important for good predictions; and that the combination of all features of the called source code is paramount for our model, while the combination of features of the benchmark itself is less important. Our results show that although benchmark stability is affected by more than just the source code, we can effectively utilize machine learning models to predict, ahead of execution, whether a benchmark will be stable or not.
This enables spending precious testing time on reliable benchmarks, supporting developers to identify unstable benchmarks during development, allowing unstable benchmarks to be repeated more often, estimating stability in scenarios where repeated benchmark execution is infeasible or impossible, and warning developers if new benchmarks or existing benchmarks executed in new environments will be unstable.
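The workflow the abstract describes can be illustrated with a standard binary classifier. A minimal sketch, assuming synthetic data: the five features, the labels, and the data below are invented placeholders, not the paper's 58 static features or its Go benchmark corpus; the point is only how a stability classifier would be trained and scored with AUC and MCC.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600
# Hypothetical static features (stand-ins for LOC, loop counts, I/O calls, ...)
X = rng.normal(size=(n, 5))
# Synthetic "unstable" label, loosely driven by two of the features
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# The two metrics the paper reports: AUC on scores, MCC on hard predictions
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
mcc = matthews_corrcoef(y_te, clf.predict(X_te))
print(f"AUC={auc:.2f} MCC={mcc:.2f}")
```

With real benchmarks, `X` would come from static analysis of the benchmark and its callees, and `clf.feature_importances_` would support the feature importance analysis the abstract mentions.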

2021 ◽  
Vol 2021 (3) ◽  
pp. 453-473
Author(s):  
Nathan Reitinger ◽  
Michelle L. Mazurek

Abstract With the aim of increasing online privacy, we present a novel, machine-learning-based approach to blocking one of the three main ways website visitors are tracked online: canvas fingerprinting. Because the act of canvas fingerprinting uses, at its core, a JavaScript program, and because many of these programs are reused across the web, we are able to fit several machine learning models around a semantic representation of a potentially offending program, achieving accurate and robust classifiers. Our supervised learning approach is trained on a dataset we created by scraping roughly half a million websites using a custom Google Chrome extension that stores information related to the canvas. Classification leverages our key insight that the images drawn by canvas fingerprinting programs have a facially distinct appearance, allowing us to manually classify files based on the images drawn; we take this approach one step further and train our classifiers not on the malleable images themselves, but on the more-difficult-to-change underlying source code generating the images. As a result, ML-CB allows for more accurate tracker blocking.
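The classify-on-source idea can be sketched with a toy text classifier. This is illustrative only: the JavaScript snippets and labels below are invented, and a plain TF-IDF pipeline stands in for ML-CB's semantic program representation, which the abstract does not detail.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy script corpus: 1 = canvas fingerprinting, 0 = benign canvas use
scripts = [
    "ctx.fillText('Cwm fjordbank glyphs', 2, 15); canvas.toDataURL()",
    "ctx.fillRect(0, 0, w, h); requestAnimationFrame(draw)",
    "var c = document.createElement('canvas'); c.toDataURL()",
    "ctx.drawImage(img, 0, 0); // render a sprite",
]
labels = [1, 0, 1, 0]

# Bag-of-tokens over the source text, then a linear classifier
pipe = make_pipeline(TfidfVectorizer(token_pattern=r"\w+"),
                     LogisticRegression())
pipe.fit(scripts, labels)

# A script that draws text and then exfiltrates the canvas pixels
print(pipe.predict(["canvas.toDataURL() after fillText"]))
```

Classifying the generating source rather than the rendered image is what makes the approach robust: a tracker can trivially perturb pixels, but rewriting the program changes the very tokens the model keys on.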


2021 ◽  
Vol 23 (1) ◽  
Author(s):  
Seulkee Lee ◽  
Seonyoung Kang ◽  
Yeonghee Eun ◽  
Hong-Hee Won ◽  
Hyungjin Kim ◽  
...  

Abstract Background Few studies on rheumatoid arthritis (RA) have generated machine learning models to predict biologic disease-modifying antirheumatic drug (bDMARD) responses; however, these studies included insufficient analysis of important features. Moreover, machine learning is yet to be used to predict bDMARD responses in ankylosing spondylitis (AS). Thus, in this study, machine learning was used to predict such responses in RA and AS patients. Methods Data were retrieved from the Korean College of Rheumatology Biologics therapy (KOBIO) registry. The numbers of RA and AS patients in the training dataset were 625 and 611, respectively. We prepared independent test datasets that did not participate in any process of generating the machine learning models. Baseline clinical characteristics were used as input features. Responders were defined as those who met the ACR 20% improvement response criteria (ACR20) and ASAS 20% improvement response criteria (ASAS20) in RA and AS, respectively, at the first follow-up. Multiple machine learning methods, including random forest (RF-method), were used to generate models to predict bDMARD responses, and we compared them with the logistic regression model. Results The RF-method model had superior prediction performance to the logistic regression model (accuracy: 0.726 [95% confidence interval (CI): 0.725–0.730] vs. 0.689 [0.606–0.717], area under the receiver operating characteristic (ROC) curve (AUC) 0.638 [0.576–0.658] vs. 0.565 [0.493–0.605], F1 score 0.841 [0.837–0.843] vs. 0.803 [0.732–0.828], AUC of the precision-recall curve 0.808 [0.763–0.829] vs. 0.754 [0.714–0.789]) with independent test datasets in patients with RA. However, machine learning and logistic regression exhibited similar prediction performance in AS patients.
Furthermore, the patient self-reporting scales, namely the patient global assessment of disease activity (PtGA) in RA and the Bath Ankylosing Spondylitis Functional Index (BASFI) in AS, were revealed as the most important features in both diseases. Conclusions The RF-method exhibited prediction performance for bDMARD responses superior to that of a conventional statistical method, i.e., logistic regression, in RA patients. In contrast, despite the comparable size of the dataset, machine learning did not outperform logistic regression in AS patients. According to the feature importance analysis, the most important features in both diseases were the patient self-reporting scales.
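The model comparison reported above can be sketched generically. A minimal sketch on synthetic data, not the KOBIO analysis: it computes the same four metrics the abstract reports (accuracy, ROC AUC, F1, precision-recall AUC) for a random forest and a logistic regression on a held-out test set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for baseline clinical characteristics and responder labels
X, y = make_classification(n_samples=800, n_features=12, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

results = {}
for name, model in [("RF", RandomForestClassifier(random_state=1)),
                    ("LR", LogisticRegression(max_iter=1000))]:
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]   # scores for AUC-type metrics
    pred = model.predict(X_te)                # hard labels for accuracy/F1
    results[name] = {
        "accuracy": accuracy_score(y_te, pred),
        "ROC AUC": roc_auc_score(y_te, proba),
        "F1": f1_score(y_te, pred),
        "PR AUC": average_precision_score(y_te, proba),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

The study's confidence intervals would come from repeating this evaluation (e.g., over resampled test sets), which is omitted here for brevity.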


2020 ◽  
Vol 2020 ◽  
pp. 1-16 ◽  
Author(s):  
Xiaoxue Yang ◽  
Yajie Zou ◽  
Jinjun Tang ◽  
Jian Liang ◽  
Muhammad Ijaz

Accurate prediction of traffic information (i.e., traffic flow, travel time, traffic speed, etc.) is a key component of Intelligent Transportation Systems (ITS). Traffic speed is an important indicator to evaluate traffic efficiency. To date, although a few studies have considered the periodic feature in traffic prediction, very few studies comprehensively evaluate the impact of the periodic component on statistical and machine learning prediction models. This paper selects several representative statistical models and machine learning models to analyze the influence of the periodic component on short-term speed prediction under different scenarios: (1) multi-horizon-ahead prediction (5-, 15-, 30-, and 60-minute-ahead predictions), (2) with and without the periodic component, (3) two data aggregation levels (5-minute and 15-minute), (4) peak hours and off-peak hours. Specifically, three statistical models (i.e., space time (ST) model, vector autoregressive (VAR) model, autoregressive integrated moving average (ARIMA) model) and three machine learning approaches (i.e., support vector machines (SVM) model, multi-layer perceptron (MLP) model, recurrent neural network (RNN) model) are developed and examined. Furthermore, the periodic features of the speed data are considered via a hybrid prediction method, which assumes that the data consist of two components: a periodic component and a residual component. The periodic component is described by a trigonometric regression function, and the residual component is modeled by the statistical models or the machine learning approaches.
The important conclusions can be summarized as follows: (1) the multi-step-ahead prediction accuracy improves when the periodic component of the speed data is considered, for all three statistical models and all three machine learning models, especially in the peak hours; (2) when the periodic component is considered, the prediction performance improvement gradually becomes larger for all models as the time step increases; (3) under the same prediction horizon, the prediction performance of all models for 15-minute speed data is generally better than that for 5-minute speed data. Overall, the findings in this paper suggest that the proposed hybrid prediction approach is effective for both statistical and machine learning models in short-term speed prediction.
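The hybrid decomposition can be sketched directly. A minimal sketch, assuming a synthetic one-week speed series with a daily period of 288 five-minute intervals: a trigonometric regression (sin/cos terms at the daily frequency, fit by least squares) captures the periodic component, and the residual is what an ARIMA, SVM, or RNN would then model.

```python
import numpy as np

period = 288                       # 5-minute intervals in one day
t = np.arange(288 * 7)             # one week of synthetic observations
rng = np.random.default_rng(2)
speed = 60 + 10 * np.sin(2 * np.pi * t / period) + rng.normal(scale=2, size=t.size)

# Trigonometric regression: least-squares fit of intercept + sin/cos terms
A = np.column_stack([np.ones_like(t, dtype=float),
                     np.sin(2 * np.pi * t / period),
                     np.cos(2 * np.pi * t / period)])
coef, *_ = np.linalg.lstsq(A, speed, rcond=None)
periodic = A @ coef                # deterministic daily pattern
residual = speed - periodic        # handed to the statistical or ML model
print(f"residual std {residual.std():.2f} vs raw std {speed.std():.2f}")
```

The residual series has far less variance than the raw series, which is why the abstract's models improve once the periodic component is handled separately; a forecast recombines the fitted periodic curve with the residual model's prediction.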


2020 ◽  
Vol 32 ◽  
pp. 03003
Author(s):  
Bhushan Deore ◽  
Aditya Kyatham ◽  
Shubham Narkhede

The following paper provides a novel approach for a Network Intrusion Detection System using machine learning and deep learning. This approach uses two MLP (Multi-Layer Perceptron) models, one having 3 layers and the other having 6 layers. Random Forest is also used for classification. These models are ensembled in such a way that the final accuracy is boosted and the testing time is reduced. Researchers have implemented various ways to ensemble multiple models, but we use a contradiction-management concept to ensemble machine learning models. Contradiction management means that if two machine learning models contradict each other in their decisions (in our case, the 3-layer MLP and Random Forest), then the decision of a third model (the 6-layer MLP), whose accuracy is higher than that of the previous models, is considered. The third model is consulted only when the previous two models contradict each other, because its testing time is higher than that of the two previous models owing to its more complex architecture. This approach increased the final accuracy, since multiple models are ensembled, and also reduced the testing time. The novelty of this paper lies in the choice and combination of the models for the purpose of network security.
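The contradiction-management routing is simple to express in code. A minimal sketch, assuming the two fast models' predictions are already available and the heavy model exposes a `predict` method (the `Heavy` class below is a placeholder for the 6-layer MLP, not the authors' network):

```python
import numpy as np

def contradiction_ensemble(pred_mlp3, pred_rf, heavy_model, X):
    """Keep agreed-upon labels; ask the expensive model only on disagreements."""
    pred_mlp3 = np.asarray(pred_mlp3)
    pred_rf = np.asarray(pred_rf)
    final = pred_mlp3.copy()
    disagree = pred_mlp3 != pred_rf
    if disagree.any():
        # Only the contradicted samples pay the cost of the deeper model
        final[disagree] = heavy_model.predict(X[disagree])
    return final

class Heavy:                        # stand-in for the 6-layer MLP
    def predict(self, X):
        return np.ones(len(X), dtype=int)

X = np.zeros((4, 2))
out = contradiction_ensemble([0, 1, 0, 1], [0, 0, 1, 1], Heavy(), X)
print(out)                          # prints [0 1 1 1]
```

This is where the testing-time saving comes from: the 6-layer model runs on the disagreement subset only, so its higher per-sample cost is paid for a fraction of the traffic.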


2021 ◽  
Vol 13 (5) ◽  
pp. 1018
Author(s):  
Chao Song ◽  
Xiaohong Chen

It has become increasingly difficult in recent years to predict precipitation scientifically and accurately due to the dual effects of human activities and climatic conditions. This paper focuses on four aspects to improve precipitation prediction accuracy. Five decomposition methods (time-varying filter-based empirical mode decomposition (TVF-EMD), robust empirical mode decomposition (REMD), complementary ensemble empirical mode decomposition (CEEMD), wavelet transform (WT), and extreme-point symmetric mode decomposition (ESMD) combined with the Elman neural network (ENN)) are used to construct five prediction models, i.e., TVF-EMD-ENN, REMD-ENN, CEEMD-ENN, WT-ENN, and ESMD-ENN. The variance contribution rate (VCR) and Pearson correlation coefficient (PCC) are utilized to compare the performances of the five decomposition methods. The wavelet transform coherence (WTC) is used to determine the reason for the poor prediction performance of machine learning algorithms in individual years and the relationship with climate indicators. A secondary decomposition of the TVF-EMD is used to improve the prediction accuracy of the models. The proposed methods are used to predict the annual precipitation in Guangzhou. The subcomponents obtained from the TVF-EMD are the most stable among the four decomposition methods, and the North Atlantic Oscillation (NAO) index, the Nino 3.4 index, and sunspots have a smaller influence on the first subcomponent (Sc-1) than the other subcomponents. The TVF-EMD-ENN model has the best prediction performance and outperforms traditional machine learning models. The secondary decomposition of the Sc-1 of the TVF-EMD model significantly improves the prediction accuracy.
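The two comparison measures named in the abstract are straightforward to compute. An illustrative sketch with synthetic subcomponents standing in for TVF-EMD output: the variance contribution rate (VCR, here taken as a component's share of the original series' variance, one common formulation) and the Pearson correlation coefficient (PCC) between each subcomponent and the original series.

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 20, 400)
sc1 = np.sin(t)                        # stand-in for subcomponent Sc-1
sc2 = 0.3 * np.sin(5 * t)              # stand-in for a higher-frequency Sc
noise = rng.normal(scale=0.1, size=t.size)
series = sc1 + sc2 + noise             # "original" precipitation-like series

for name, comp in [("Sc-1", sc1), ("Sc-2", sc2), ("noise", noise)]:
    vcr = comp.var() / series.var()            # share of total variance
    pcc = np.corrcoef(comp, series)[0, 1]      # similarity to the original
    print(f"{name}: VCR={vcr:.2f} PCC={pcc:.2f}")
```

A decomposition whose leading subcomponents have high VCR and PCC, as TVF-EMD's do in the study, preserves most of the signal in a few stable pieces, which is what makes the downstream Elman network easier to train.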


2021 ◽  
Vol 2084 (1) ◽  
pp. 012013
Author(s):  
Wan Fairos Wan Yaacob ◽  
Norafefah Mohamad Sobri ◽  
Syerina Azlin Md Nasir ◽  
Noor Ilanie Nordin ◽  
Wan Faizah Wan Yaacob ◽  
...  

Abstract COVID-19 (Coronavirus Disease 2019) is caused by a virus belonging to the family Coronaviridae. COVID-19 is no longer pandemic but rather endemic, with more than 3,166,516 deaths reported around the world. This reality has placed a massive burden on limited healthcare systems. Thus, many researchers try to develop prediction models to further understand this phenomenon. One of the recent methods used is machine learning models that learn from historical data and make predictions about events. These data mining techniques have been used to predict the number of confirmed cases of COVID-19. This paper investigated the variability of the effect size on the correlation performance of machine learning models in predicting confirmed cases of COVID-19 using meta-analysis. It explored the correlation between actual and predicted COVID-19 cases from different Neural Network machine learning models by means of estimated variance, chi-square heterogeneity (Q), the heterogeneity index (I2), and a random effects model. The results gave a good summary effect with a 95% confidence interval. Based on the chi-square heterogeneity (Q) and heterogeneity index (I2), it was found that the correlations were heterogeneous among the studies. The 95% confidence interval of the effect summary also supported the difference in correlation between actual and predicted numbers of confirmed COVID-19 cases among the studies. There was no evidence of publication bias based on the funnel plot and Egger and Begg’s tests. Hence, findings from this study provide evidence of good prediction performance from the Neural Network model based on a combination of studies that can later serve in the prediction of COVID-19 confirmed cases.
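The heterogeneity statistics used above follow a standard recipe. A hedged sketch with invented per-study correlations and sample sizes: each correlation is Fisher z-transformed, an inverse-variance weighted summary is formed, and Cochran's Q and I2 quantify between-study heterogeneity.

```python
import numpy as np

# Invented per-study correlations (actual vs. predicted cases) and sample sizes
r = np.array([0.95, 0.88, 0.99, 0.70, 0.92])
n = np.array([60, 45, 90, 30, 50])

z = np.arctanh(r)                  # Fisher z transform of each correlation
w = n - 3                          # inverse-variance weights (var of z = 1/(n-3))
z_bar = np.sum(w * z) / np.sum(w)  # fixed-effect summary on the z scale
Q = np.sum(w * (z - z_bar) ** 2)   # Cochran's chi-square heterogeneity statistic
df = len(r) - 1
I2 = max(0.0, (Q - df) / Q) * 100  # heterogeneity index, in percent
print(f"summary r = {np.tanh(z_bar):.3f}, Q = {Q:.1f} (df={df}), I2 = {I2:.0f}%")
```

When Q greatly exceeds its degrees of freedom (equivalently, I2 is large), the correlations are heterogeneous and a random effects summary is preferred over the fixed-effect one, which matches the study's conclusion.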


Author(s):  
Ishrat-Un-Nisa Uqaili ◽  
Syed Nadeem Ahsan

During the software development and maintenance phases, fixing severe bugs is very challenging, and they need to be addressed on a priority basis. Several research works have used software metrics to predict fault-prone software modules. In this paper, we propose an approach to categorize different types of bugs according to their severity and priority and then use them to label software metrics data. Finally, we use the labeled data to train supervised machine learning models for the prediction of fault-prone software modules. Moreover, to build an effective prediction model, we use a genetic algorithm to search for those sets of metrics that are highly correlated with severe bugs.
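The genetic-algorithm search over metric subsets can be sketched with a toy implementation. Everything here is a hypothetical stand-in for the authors' setup: the metrics and labels are synthetic, and the fitness (absolute correlation between the summed selected metrics and the severe-bug labels), selection, one-point crossover, and bit-flip mutation are generic textbook choices.

```python
import random
import numpy as np

rng = np.random.default_rng(4)
n, m = 200, 8
X = rng.normal(size=(n, m))                    # synthetic software metrics
y = (X[:, 1] + X[:, 4] > 0).astype(float)      # synthetic severe-bug labels

def fitness(mask):
    """Correlation of the selected metrics' sum with the severity labels."""
    if not any(mask):
        return 0.0
    score = X[:, np.array(mask, dtype=bool)].sum(axis=1)
    return abs(np.corrcoef(score, y)[0, 1])

random.seed(4)
pop = [[random.randint(0, 1) for _ in range(m)] for _ in range(30)]
for _ in range(40):                            # generations
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:10]                       # elitist selection
    children = []
    while len(children) < 20:
        a, b = random.sample(survivors, 2)
        cut = random.randrange(1, m)
        child = a[:cut] + b[cut:]              # one-point crossover
        i = random.randrange(m)
        child[i] ^= random.random() < 0.2      # occasional bit-flip mutation
        children.append(child)
    pop = survivors + children

best = max(pop, key=fitness)
print("selected metrics:", [i for i, bit in enumerate(best) if bit])
```

In the paper's setting, the bit mask would range over real software metrics, and the surviving subset would feed the supervised fault-proneness models.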


2021 ◽  
Vol 9 (1) ◽  
pp. 129-134 ◽  
Author(s):  
Ross Jacobucci ◽  
Andrew K. Littlefield ◽  
Alexander J. Millner ◽  
Evan M. Kleiman ◽  
Douglas Steinley

The use of machine learning is increasing in clinical psychology, yet it is unclear whether these approaches enhance the prediction of clinical outcomes. Several studies show that machine-learning algorithms outperform traditional linear models. However, many studies that have found such an advantage use the same algorithm, random forests with the optimism-corrected bootstrap, for internal validation. Through both a simulation and an empirical example, we demonstrate that pairing nonlinear, flexible machine-learning approaches, such as random forests, with the optimism-corrected bootstrap provides highly inflated prediction estimates. We find no advantage for properly validated machine-learning models over linear models.
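The inflation mechanism can be demonstrated in a few lines. A simplified sketch, not the article's simulation: on pure-noise data (labels independent of features), a random forest's optimism-corrected bootstrap AUC stays well above chance because the forest also memorizes the in-bag portion of the original sample, while an independent hold-out set correctly reads near 0.5.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
X = rng.normal(size=(120, 10))
y = rng.integers(0, 2, size=120)               # labels unrelated to X
X_test = rng.normal(size=(200, 10))
y_test = rng.integers(0, 2, size=200)

clf = RandomForestClassifier(n_estimators=50, random_state=5).fit(X, y)
apparent = roc_auc_score(y, clf.predict_proba(X)[:, 1])   # training-set AUC

# Optimism-corrected bootstrap: estimate optimism on resamples, subtract it
optimism, B = 0.0, 20
for b in range(B):
    idx = rng.integers(0, len(y), len(y))      # bootstrap resample
    boot = RandomForestClassifier(n_estimators=50, random_state=b).fit(X[idx], y[idx])
    boot_apparent = roc_auc_score(y[idx], boot.predict_proba(X[idx])[:, 1])
    boot_orig = roc_auc_score(y, boot.predict_proba(X)[:, 1])
    optimism += (boot_apparent - boot_orig) / B

corrected = apparent - optimism
holdout = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"corrected bootstrap AUC ~ {corrected:.2f}, hold-out AUC ~ {holdout:.2f}")
```

The bias arises because `boot_orig` is evaluated partly on the very points the bootstrap model was trained on (about 63% of the original sample is in-bag), so the estimated optimism is too small for an interpolating learner like a random forest.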

