Increasing the Prediction Quality of Software Defective Modules with Automatic Feature Engineering

Author(s):  
Alexandre Moreira Nascimento ◽  
Vinícius Veloso de Melo ◽  
Luiz Alberto Vieira Dias ◽  
Adilson Marques da Cunha
2021 ◽  
Vol 893 (1) ◽  
pp. 012028
Author(s):  
Robi Muharsyah ◽  
Dian Nur Ratri ◽  
Damiana Fitria Kussatiti

Abstract Prediction of Sea Surface Temperature (SST) in the Niño3.4 region (170°W–120°W; 5°S–5°N) is important as a valuable indicator for identifying El Niño Southern Oscillation (ENSO) conditions, i.e., El Niño, La Niña, and Neutral, for the coming months. More accurate prediction of Niño3.4 SST can be used to determine the response of rainfall over the Indonesian region to the ENSO phenomenon. SST predictions are routinely released by meteorological institutions such as the European Centre for Medium-Range Weather Forecasts (ECMWF). However, the direct output (RAW) of global models such as the ECMWF seasonal forecast suffers from bias, which degrades the quality of SST predictions and in turn increases the potential errors in predicting ENSO events. This study uses SST from the Ensemble Prediction System (EPS) output of the ECMWF seasonal forecast, namely SEAS5. SEAS5 SST is downloaded from the Copernicus Climate Change Service (C3S) for the period 1993–2020. One value representing SST over the Niño3.4 region is calculated for each lead time (LT), LT0–LT6. Bayesian Model Averaging (BMA) is selected as the post-processing method to improve the prediction quality of SEAS5-RAW. The advantage of BMA over other post-processing methods is its ability to quantify the uncertainty in the EPS, expressed as a predictive probability density function (PDF). It was found that the BMA calibration process reaches optimal performance using a 160-month training window. The results show that the prediction quality of the BMA output for Niño3.4 SST is superior to SEAS5-RAW, especially for LT0, LT1, and LT2. In terms of deterministic prediction, BMA shows a lower Root Mean Square Error (RMSE) and a higher Proportion of Correct (PC). In terms of probabilistic prediction, the error rate of BMA, as measured by the Brier Score, is lower than that of RAW. Moreover, BMA shows a good ability to discriminate ENSO events, as indicated by an AUC ROC close to a perfect score.
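The BMA step described in this abstract builds a predictive PDF as a weighted mixture of per-member distributions. A minimal numpy sketch, assuming Gaussian member PDFs and using illustrative weights and spread rather than values fitted on SEAS5 (in practice the weights and common variance are estimated, e.g. by maximum likelihood, over the training window):

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Gaussian density, written out so the sketch needs only numpy."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def bma_predictive_pdf(x, member_means, weights, sigma):
    """BMA predictive PDF: a weighted mixture of Gaussians,
    one centred on each (bias-corrected) ensemble member."""
    return sum(w * gauss_pdf(x, m, sigma)
               for w, m in zip(weights, member_means))

# Illustrative ensemble of Niño3.4 SST anomalies (degrees C)
members = np.array([0.6, 0.8, 0.7, 1.0])
weights = np.array([0.3, 0.2, 0.3, 0.2])   # sum to 1; fitted in practice
sigma = 0.25                               # common spread; also fitted

x = np.linspace(-1.0, 3.0, 401)
pdf = bma_predictive_pdf(x, members, weights, sigma)
print(np.trapz(pdf, x))   # integrates to approximately 1: a valid PDF
```

Each member contributes a Gaussian centred on its forecast; the mixture weights reflect each member's relative skill over the training period, which is what lets BMA express the ensemble's uncertainty as a single predictive PDF.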


2021 ◽  
Vol 2070 (1) ◽  
pp. 012042
Author(s):  
Mykhailo Seleznov

Abstract The paper proposes an algorithm for forming a small training set that will provide reasonable quality for a surrogate ML model of the problem of elastoplastic deformation of a metal rod under the action of a longitudinal load pulse. This dynamic physical problem is computationally simple and convenient for testing various approaches, but at the same time it is physically quite complex, because it contains a significant range of effects; methods tested on this problem can therefore be applied in other areas. This work demonstrates the ability of a surrogate ML model to provide reasonable prediction quality for a dynamic physical problem with a small training set.


Mathematics ◽  
2020 ◽  
Vol 8 (5) ◽  
pp. 662 ◽  
Author(s):  
Husein Perez ◽  
Joseph H. M. Tah

In the field of supervised machine learning, the quality of a classifier model is directly correlated with the quality of the data used to train it. The presence of unwanted outliers in the data could significantly reduce the accuracy of a model or, even worse, result in a biased model leading to inaccurate classification. Identifying and eliminating outliers is, therefore, crucial for building good-quality training datasets. Pre-processing procedures for dealing with missing and outlier data, commonly known as feature engineering, are standard practice in machine learning problems. They help to make better assumptions about the data and also prepare datasets in a way that best exposes the underlying problem to the machine learning algorithms. In this work, we propose a multistage method for detecting and removing outliers in high-dimensional data. Our proposed method is based on utilising t-distributed stochastic neighbour embedding (t-SNE) to reduce a high-dimensional map of features into a lower, two-dimensional probability density distribution, and then using a simple descriptive statistical method, the interquartile range (IQR), to identify any outlier values in the density distribution of the features. t-SNE is a machine learning algorithm and a nonlinear dimensionality reduction technique well suited for embedding high-dimensional data for visualisation in a low-dimensional space of two or three dimensions. We applied this method to a dataset containing images for training a convolutional neural network (ConvNet) model for an image classification problem. The dataset contains four different classes of images: three classes containing construction defects (mould, stain, and paint deterioration) and a no-defect class (normal). We used the transfer learning technique to modify a pre-trained VGG-16 model, which we used as a feature extractor and as a benchmark to evaluate our method.
We have shown that, when using this method, we can identify and remove the outlier images in the dataset. After removing the outlier images and re-training the VGG-16 model, the results show that classification accuracy significantly improved and the number of misclassified cases dropped. While many feature engineering techniques for handling missing and outlier data are common in predictive machine learning problems involving numerical or categorical data, there is little work on techniques for handling outliers in high-dimensional data that can improve the quality of machine learning models involving images, such as ConvNet models for image classification and object detection.
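The IQR filtering stage of the multistage method described above can be sketched as follows, assuming the two-dimensional embedding has already been produced (e.g. by scikit-learn's `TSNE`); the data and the planted outliers below are purely illustrative:

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Illustrative 2-D embedding, standing in for the output of t-SNE
rng = np.random.default_rng(0)
embedding = rng.normal(0.0, 1.0, size=(100, 2))
embedding[:3] += 10.0                      # three planted outliers

# A point is an outlier if it is extreme in either embedding dimension
mask = iqr_outlier_mask(embedding[:, 0]) | iqr_outlier_mask(embedding[:, 1])
clean = embedding[~mask]
print(mask.sum(), clean.shape)             # planted outliers are flagged
```

In the paper's pipeline the rows of `embedding` would be t-SNE coordinates of image features extracted by the VGG-16 model, and the images corresponding to flagged rows would be removed before re-training.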


2020 ◽  
Vol 35 (5) ◽  
pp. 1871-1889
Author(s):  
M. S. Alvarez ◽  
C. A. S. Coelho ◽  
M. Osman ◽  
M. Â. F. Firpo ◽  
C. S. Vera

Abstract The demand for subseasonal predictions (from one to about four weeks in advance) has been increasing considerably, as these predictions can potentially help prepare for the occurrence of high-impact events such as heat or cold waves that affect both social and economic activities. This study aims to assess the quality of subseasonal temperature predictions of the European Centre for Medium-Range Weather Forecasts (ECMWF) against the Japan Meteorological Agency reanalyses. Two consecutive weeks of July 2017 were analyzed, which presented anomalously cold and warm conditions over central South America. The quality of 20 years of hindcasts for the two investigated weeks was compared with that for similar weeks during the JJA season and with 3 years of real-time forecasts for the same season. Anomalously cold temperatures observed during the week of 17–23 July 2017 were well predicted one week in advance. Moreover, the warm anomalies observed during the following week of 24–30 July 2017 were well predicted two weeks in advance. Higher linear association and discrimination (ability to distinguish events from nonevents), but reduced reliability, were found for the 20 years of hindcasts for the target week than for hindcasts produced for the whole JJA season. In addition, the real-time forecasts generally performed better over some regions of South America than the hindcasts. The assessment provides robust evidence about temperature prediction quality to build confidence in regional subseasonal forecasts and to identify regions in which the predictions perform better.
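Verification attributes like those assessed here (association, discrimination, reliability) are typically summarised with probabilistic scores. As a minimal illustration with made-up forecasts of a binary "warm week" event (not the study's data), the Brier score can be computed as:

```python
import numpy as np

def brier_score(prob_forecasts, outcomes):
    """Mean squared difference between forecast probability and the
    binary outcome; 0 is perfect, lower is better."""
    p = np.asarray(prob_forecasts, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    return np.mean((p - o) ** 2)

# Illustrative probabilistic forecasts and observed outcomes (1 = event)
probs = [0.9, 0.7, 0.2, 0.1]
obs = [1, 1, 0, 0]
print(round(brier_score(probs, obs), 4))  # → 0.0375
```

Confident forecasts that match the outcome drive the score toward zero, which is why the Brier score captures both reliability and resolution of a probabilistic prediction system.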


Author(s):  
Thomas D. Krüger ◽  
Sauro Liberatore ◽  
Eric Knopf ◽  
Alastair Clark

In rotordynamic analyses, support structures are commonly represented by lumped mass systems (single degree of freedom, SDOF). This representation is easy to implement using standard rotordynamic tools. However, in reality the dynamic behaviour of the support structure (e.g. pedestals, casings, foundations) is in general much more complex, and only a multi-degree-of-freedom (MDOF) representation provides modelling close to reality. For many applications the dynamic behaviour of the support structure significantly influences the rotordynamic characteristics of the shaft train and therefore needs to be included in the assessment. Due to this impact, a good-quality dynamic model of the support structure is imperative. Regarding the rotor itself, the modelling is well understood and the prediction quality is excellent, not least due to the jointless welded rotor design. Numerous theoretical approaches exist for considering the complex dynamic behaviour of the support structure, each with its own drawbacks and opportunities. After discussing the characteristics of established approaches for modelling the support structure, the paper presents an advanced theoretical approach based on a state-space representation using modal parameters. A case study of a real shaft train is shown, including a comparison of the results achieved using the SDOF and the presented MDOF approach. Validation against experimental results confirms the excellent prediction quality of the MDOF approach. The implementation of this approach further improved the reliability and efficiency (high accuracy combined with low computation time) of rotordynamic assessments.
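A modal state-space representation of the kind mentioned above can be sketched by assembling a block-diagonal system matrix from modal frequencies and damping ratios; the two pedestal modes below are illustrative placeholders, not values from the paper's case study:

```python
import numpy as np

def modal_state_space(freqs_hz, zetas):
    """Build the state matrix A of a modal state-space model: each mode
    contributes a 2x2 block [[0, 1], [-w^2, -2*zeta*w]] in modal
    coordinates, so A is block-diagonal with one block per mode."""
    n = len(freqs_hz)
    A = np.zeros((2 * n, 2 * n))
    for i, (f, z) in enumerate(zip(freqs_hz, zetas)):
        w = 2.0 * np.pi * f
        A[2 * i:2 * i + 2, 2 * i:2 * i + 2] = [[0.0, 1.0],
                                               [-w ** 2, -2.0 * z * w]]
    return A

# Illustrative support-structure modes: 25 Hz and 60 Hz at 2% damping
A = modal_state_space([25.0, 60.0], [0.02, 0.02])
eigvals = np.linalg.eigvals(A)
# damped natural frequencies in Hz, recovered from the eigenvalues
print(np.sort(np.abs(eigvals.imag)) / (2.0 * np.pi))
```

Because the blocks are decoupled, adding modes only appends 2×2 blocks, which is what keeps an MDOF support model cheap to couple to the rotor model while still capturing multiple structural resonances.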


Author(s):  
Yuanqing Tang ◽  
Zhi Li ◽  
Mansoor Ani Najeeb Nellikkal ◽  
Hamed Eramian ◽  
Emory M. Chan ◽  
...  

2019 ◽  
Author(s):  
Emir Efendic ◽  
Philippe van de Calseyde ◽  
Anthony M Evans

Algorithms consistently perform well on various prediction tasks, but people often mistrust their advice. Here, we demonstrate one factor that affects people's trust in algorithmic predictions: response time. In seven studies (total N = 1928 with 14,184 observations), we find that people judge slowly generated predictions from algorithms as less accurate and are less willing to rely on them. This effect reverses for human predictions, where slowly generated predictions are judged to be more accurate. In explaining this asymmetry, we find that slower response times signal the exertion of effort for both humans and algorithms. However, the relationship between perceived effort and prediction quality differs for humans and algorithms. For humans, prediction tasks are seen as difficult, and effort is therefore positively correlated with the perceived quality of predictions. For algorithms, prediction tasks are seen as easy, and effort is therefore uncorrelated with the perceived quality of algorithmic predictions. These results underscore the complex processes and dynamics underlying people's trust in algorithmic (and human) predictions and the cues that people use to evaluate their quality.

