prediction quality
Recently Published Documents

TOTAL DOCUMENTS: 81 (FIVE YEARS: 35)
H-INDEX: 9 (FIVE YEARS: 3)
2021 ◽  
Vol 3 ◽  
Author(s):  
Julia Granacher ◽  
Ivan Daniel Kantor ◽  
François Maréchal

Simulation-based optimization models are widely applied to find optimal operating conditions of processes. Often, computational challenges arise from model complexity, making the generation of reliable design solutions difficult. We propose an algorithm for replacing non-linear process simulation models integrated in multi-level optimization of a process and energy system superstructure with surrogate models, applying an active learning strategy to continuously enrich the database on which the surrogate models are trained and evaluated. Surrogate models are generated and trained on an initial data set, each featuring the ability to quantify the uncertainty with which a prediction is made. Until a defined prediction quality is met, new data points are continuously labeled and added to the training set. They are selected from a pool of unlabeled data points based on the predicted uncertainty, ensuring a rapid improvement of surrogate quality. When applied in the optimization superstructure, the surrogates are used only when the prediction quality for the given data point reaches a specified threshold; otherwise, the original simulation model is called to evaluate the process performance, and the newly obtained data points are used to improve the surrogates. The method is tested on three simulation models of varying size and complexity. The proposed approach yields mean squared errors of the test predictions below 2% for all cases. Applying the active learning approach leads to better predictions than random sampling for the same database size. When integrated in the optimization framework, the simpler surrogates are favored in over 60% of cases, while the more complex ones become usable as simulation results generated during optimization are used to improve them after the initial training. Significant time savings are recorded when using complex process simulations, though the advantage gained for simpler processes is marginal.
Overall, we show that the proposed method saves time and adds flexibility to complex superstructure optimization problems that involve optimizing process operating conditions. Computational time can be greatly reduced without penalizing result quality, while the continuous improvement of surrogates when simulation is used in the optimization leads to a natural refinement of the model.
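The uncertainty-driven selection loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the toy simulator, the bootstrap ensemble of linear fits (as the uncertainty-aware surrogate), and the fixed iteration budget are all assumptions made for the sketch.

```python
import random
import statistics

def simulate(x):
    # Stand-in for an expensive process simulation (the paper's
    # actual models are full process flowsheet simulations).
    return x ** 2 + 0.5 * x

def fit_linear(points):
    # Least-squares line through the labeled points.
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for (x, _), (_, y) in zip(points, points))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x, a=a, b=b: a + b * x

def ensemble_predict(models, x):
    # Mean prediction plus spread: the spread serves as the
    # uncertainty estimate that drives point selection.
    preds = [m(x) for m in models]
    return statistics.mean(preds), statistics.pstdev(preds)

random.seed(0)
labeled = [(x, simulate(x)) for x in (-2.0, 0.0, 2.0)]
pool = [random.uniform(-3, 3) for _ in range(50)]

for _ in range(10):
    # Bootstrap ensemble: each member sees a resampled training set.
    models = [fit_linear(random.choices(labeled, k=len(labeled)))
              for _ in range(8)]
    # Query the pool point with the highest predictive uncertainty,
    # label it with the expensive simulator, and retrain next round.
    x_new = max(pool, key=lambda x: ensemble_predict(models, x)[1])
    pool.remove(x_new)
    labeled.append((x_new, simulate(x_new)))
```

In the paper the loop terminates on a prediction-quality threshold rather than a fixed iteration count; the fixed budget here just keeps the sketch short.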


2021 ◽  
Vol 2070 (1) ◽  
pp. 012042
Author(s):  
Mykhailo Seleznov

Abstract The paper proposes an algorithm for forming a small training set that provides reasonable quality for a surrogate ML model of the problem of elastoplastic deformation of a metal rod under the action of a longitudinal load pulse. This dynamic physical problem is computationally simple and convenient for testing various approaches, but at the same time it is physically quite complex, because it exhibits a significant range of effects, so methods tested on this problem can be applied to other areas. This work demonstrates that a surrogate ML model can provide reasonable prediction quality for a dynamic physical problem with a small training set.


2021 ◽  
Vol 893 (1) ◽  
pp. 012028
Author(s):  
Robi Muharsyah ◽  
Dian Nur Ratri ◽  
Damiana Fitria Kussatiti

Abstract Prediction of Sea Surface Temperature (SST) in the Niño3.4 region (170°W–120°W; 5°S–5°N) is important as a valuable indicator for identifying El Niño–Southern Oscillation (ENSO) conditions, i.e., El Niño, La Niña, and Neutral, in the coming months. More accurate prediction of Niño3.4 SST can be used to determine the response of rainfall over the Indonesia region to the ENSO phenomenon. SST predictions are routinely released by meteorological institutions such as the European Centre for Medium-Range Weather Forecasts (ECMWF). However, SST predictions from the direct output (RAW) of global models such as the ECMWF seasonal forecast suffer from bias, which degrades the quality of the SST predictions and, in turn, increases the potential for errors in predicting ENSO events. This study uses SST from the Ensemble Prediction System (EPS) of the ECMWF seasonal forecast, namely SEAS5. SEAS5 SST is downloaded from the Copernicus Climate Change Service (C3S) for the period 1993-2020. One value representing SST over the Niño3.4 region is calculated for each lead time (LT), LT0-LT6. Bayesian Model Averaging (BMA) is selected as the post-processing method to improve the prediction quality of SEAS5-RAW. The advantage of BMA over other post-processing methods is its ability to quantify the uncertainty in the EPS, expressed as a predictive probability density function (PDF). It was found that the BMA calibration process reaches optimal performance using a 160-month training window. The results show that the prediction quality of Niño3.4 SST from the BMA output is superior to SEAS5-RAW, especially for LT0, LT1, and LT2. In terms of deterministic prediction, BMA shows a lower Root Mean Square Error (RMSE) and a higher Proportion of Correct (PC). In terms of probabilistic prediction, the error rate of BMA, as measured by the Brier Score, is lower than that of RAW. Moreover, BMA shows a good ability to discriminate ENSO events, as indicated by an AUC ROC close to the perfect score.
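As a rough illustration of how BMA turns an ensemble into a predictive PDF, the sketch below weights members against a training window and mixes Gaussians centred on their forecasts. The inverse-MSE weighting is a deliberate simplification of the EM fit used in practice, and all numbers are invented, not SEAS5 data.

```python
import math

def bma_weights(members, obs):
    # Simplified stand-in for the EM fit used in BMA: weight each
    # ensemble member by its inverse mean-squared error over the
    # training window, then normalize so the weights sum to 1.
    inv_mse = []
    for fcst in members:
        mse = sum((f - o) ** 2 for f, o in zip(fcst, obs)) / len(obs)
        inv_mse.append(1.0 / mse)
    total = sum(inv_mse)
    return [v / total for v in inv_mse]

def bma_pdf(x, forecasts, weights, sigma):
    # Predictive PDF: weighted mixture of Gaussian kernels, one
    # centred on each member's forecast for the target month.
    return sum(w * math.exp(-(x - f) ** 2 / (2 * sigma ** 2))
               / (sigma * math.sqrt(2 * math.pi))
               for w, f in zip(weights, forecasts))

# Toy training window: two members forecasting SST anomalies.
obs = [0.5, 1.0, -0.2, 0.8]
members = [[0.4, 0.9, -0.1, 0.7],   # well-calibrated member
           [1.5, 2.0, 0.8, 1.9]]    # strongly biased member
w = bma_weights(members, obs)       # calibrated member dominates
```

The Gaussian spread `sigma` would itself be fitted during calibration; here it is left as a free parameter of the sketch.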


Geophysics ◽  
2021 ◽  
pp. 1-52
Author(s):  
Ole Edvard Aaker ◽  
Adriana Citlali Ramírez ◽  
Emin Sadikhov

Incorrect imaging of internal multiples can lead to substantial imaging artefacts. It is estimated that the majority of seismic images available to exploration and production companies have had no direct attempt at internal multiple removal. In Part I of this article we considered the role of sparsity-promoting transforms for improving practical prediction quality for algorithms derived from the inverse scattering series (ISS). Furthermore, we proposed a demigration-migration approach to perform multidimensional internal multiple prediction with migrated data and provided a synthetic proof of concept. In this paper (Part II) we consider application of the demigration-migration approach to field data from the Norwegian Sea, and provide a comparison to a post-stack method (from a previous related work). Beyond application to a wider range of data with the proposed approach, we consider algorithmic and implementational optimizations of the ISS prediction algorithms to further improve the applicability of the multidimensional formulations.


Author(s):  
Leonardo Augusto Coelho Ribeiro ◽  
Tiago Bresolin ◽  
Guilherme Jordão de Magalhães Rosa ◽  
Daniel Rume Casagrande ◽  
Marina de Arruda Camargo Danes ◽  
...  

Abstract Wearable sensors have been explored as an alternative for real-time monitoring of cattle feeding behavior in grazing systems. To evaluate the performance of predictive models such as machine learning (ML) techniques, data cross-validation (CV) approaches are often employed. However, due to data dependencies and confounding effects, poorly performed validation strategies may significantly inflate the prediction quality. In this context, our objective was to evaluate the effect of different CV strategies on the prediction of grazing activities in cattle using wearable sensor (accelerometer) data and ML algorithms. Six Nellore bulls (average live weight of 345 ± 21 kg) had their behavior visually classified as grazing or not-grazing for a period of 15 days. Elastic Net Generalized Linear Model (GLM), Random Forest (RF), and Artificial Neural Network (ANN) were employed to predict grazing activity (grazing or not-grazing) using 3-axis accelerometer data. For each analytical method, three CV strategies were evaluated: holdout, leave-one-animal-out (LOAO), and leave-one-day-out (LODO). Algorithms were trained using similar dataset sizes (holdout: n = 57,862; LOAO: n = 56,786; LODO: n = 56,672). Overall, GLM delivered the worst prediction accuracy (53%) compared to the ML techniques (65% for both RF and ANN), and ANN performed slightly better than RF for LOAO (73%) and LODO (64%) across CV strategies. The holdout yielded the highest nominal accuracy values for all three ML approaches (GLM: 59%, RF: 76%, and ANN: 74%), followed by LODO (GLM: 49%, RF: 61%, and ANN: 63%) and LOAO (GLM: 52%, RF: 57%, and ANN: 57%). With a larger dataset (i.e., more animals and grazing management scenarios), it is expected that accuracy could be increased. Most importantly, the greater prediction accuracy observed for holdout CV may simply indicate a lack of data independence and the presence of carry-over effects from animals and grazing management. 
Our results suggest that generalizing predictive models to unknown (not used for training) animals or grazing management may incur poor prediction quality. The results highlight the need for using management knowledge to define the validation strategy that is closer to the real-life situation, i.e., the intended application of the predictive model.
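The difference between the three CV strategies comes down to how records are grouped into folds: holdout splits records at random, while LOAO and LODO hold out every record belonging to one animal or one day. A minimal sketch of the grouped splits, using a hypothetical record layout rather than the study's actual data:

```python
def leave_one_group_out(records, group_key):
    # Yield (held-out group, train, test) splits where the test fold
    # contains every record from one group (animal or day), so that
    # no information from that group leaks into training.
    groups = sorted({r[group_key] for r in records})
    for g in groups:
        test = [r for r in records if r[group_key] == g]
        train = [r for r in records if r[group_key] != g]
        yield g, train, test

# Hypothetical accelerometer dataset: 6 animals observed over 15 days
# (features and grazing labels omitted for brevity).
records = [{"animal": a, "day": d} for a in range(6) for d in range(15)]

loao = list(leave_one_group_out(records, "animal"))  # 6 folds
lodo = list(leave_one_group_out(records, "day"))     # 15 folds
```

A random holdout, by contrast, would place records from the same animal and day on both sides of the split, which is the leakage the abstract argues inflates holdout accuracy.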


2021 ◽  
Author(s):  
Joon-Sang Park

Protein-peptide interactions are of great interest to the research community not only because they serve as mediators in many protein-protein interactions but also because of the increasing demand for peptide-based pharmaceutical products. Protein-peptide docking is a major tool for studying protein-peptide interactions, and several docking methods are currently available. Among various protein-peptide docking algorithms, template-based approaches, which utilize known protein-peptide complexes or templates to predict a new one, have been shown to yield more reliable results than template-free methods in recent comparative research. To obtain reliable results with a template-based docking method, the template database must be comprehensive enough; that is, there must be templates of protein-peptide complexes similar to the protein and peptide being investigated. Thus, the template database must be updated to leverage recent advances in structural biology. However, the template database distributed with GalaxyPepDock, one of the most widely used peptide docking programs, is outdated, limiting the prediction quality of the method. Here, we present an up-to-date protein-peptide complex database called YAPP-CD, which can be directly plugged into the GalaxyPepDock binary package to improve GalaxyPepDock's prediction quality by drawing on recent discoveries in structural biology. Experimental results show that YAPP-CD significantly improves GalaxyPepDock's prediction quality, e.g., the average Ligand/Interface RMSD of a benchmark set is reduced from 7.60 Å/3.62 Å to 3.47 Å/1.71 Å.
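For reference, the Ligand and Interface RMSD scores quoted above both reduce to a root-mean-square deviation over matched atom coordinates (differing only in which atoms are included). A minimal sketch, assuming the two structures have already been superposed and their atoms paired:

```python
import math

def rmsd(coords_a, coords_b):
    # Root-mean-square deviation between two equal-length lists of
    # matched (x, y, z) atom coordinates, as used to score a docked
    # pose against the reference complex.
    assert len(coords_a) == len(coords_b)
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))
```

Ligand RMSD restricts the sum to peptide atoms after superposing on the receptor; Interface RMSD restricts it to atoms near the binding interface.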


Author(s):  
Jayalath Ekanayake

Reported bugs of software systems are classified into different severity levels before they are fixed. The number of bug reports may not be equally distributed across the severity levels. However, most of the severity prediction models in the literature assume that the underlying data are evenly distributed, which may not be correct in all instances; hence, the aim of this study is to develop bug classification models from unevenly distributed datasets and test them accordingly. To that end, the topics or keywords of the developer descriptions in bug reports are first extracted using the Rapid Automatic Keyword Extraction (RAKE) algorithm and then transformed into numerical attributes, which, combined with the severity levels, constitute the datasets. These datasets are used to build classification models with the Naïve Bayes, Logistic Regression, and Decision Tree Learner algorithms. The models' prediction quality is measured using the Area Under the Receiver Operating Characteristic Curve (AUC), as the models learn from highly skewed environments. According to the results, the prediction quality of the Logistic Regression model is 0.65 AUC, whereas the other two models recorded a maximum of 0.60 AUC. Although the datasets contain comparatively few instances of the high-severity classes, Blocking and High, the Logistic Regression model predicts these two classes with a decent AUC value of 0.65. Hence, this project shows that models can be trained from highly skewed datasets such that their prediction quality is equally good over all classes, regardless of the number of instances representing each class. Further, this project emphasizes that models should be evaluated using appropriate metrics when they are trained in imbalanced learning environments. Also, this work shows that the Logistic Regression model is as capable of classifying documents as Naïve Bayes, which is well known for this task.
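The RAKE step used above can be sketched with its classic degree/frequency word scoring: candidate phrases are maximal runs of non-stopwords, and a phrase scores the sum of its word scores. The stopword list and sample bug description below are illustrative only, not the study's data.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "a", "an", "is", "of", "to", "and", "in", "when", "on"}

def rake_keywords(text):
    # Minimal RAKE sketch: split text into candidate phrases at
    # stopwords/punctuation, score each word as degree/frequency,
    # and rank phrases by the sum of their word scores.
    words = re.findall(r"[a-z']+", text.lower())
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    freq, degree = defaultdict(int), defaultdict(int)
    for p in phrases:
        for w in p:
            freq[w] += 1      # how often the word appears
            degree[w] += len(p)  # co-occurrence with phrase neighbours
    score = {w: degree[w] / freq[w] for w in freq}
    return sorted(((" ".join(p), sum(score[w] for w in p)) for p in phrases),
                  key=lambda kv: -kv[1])

top = rake_keywords(
    "The application crashes when the save button is clicked on the main window")
```

In the study, the extracted keywords are then converted to numerical attributes and fed to the classifiers.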

