Automatic selection of reliability estimates for individual regression predictions

2010 ◽  
Vol 25 (1) ◽  
pp. 27-47 ◽  
Author(s):  
Zoran Bosnić ◽  
Igor Kononenko

In machine learning and its risk-sensitive applications (e.g. medicine, engineering, business), reliability estimates for individual predictions provide more information about the individual prediction error (the difference between the true label and the regression prediction) than the average accuracy of the predictive model (e.g. relative mean squared error). Furthermore, they enable users to distinguish between more and less reliable predictions. Empirical evaluations of the existing individual reliability estimates revealed that the performance of the successful estimates depends on the regression model used and on the particular problem domain. In the current paper, we focus on that problem and propose and empirically evaluate two approaches for automatic selection of the most appropriate estimate for a given domain and regression model: the internal cross-validation approach and the meta-learning approach. The testing results of both approaches demonstrated that dynamically chosen reliability estimates outperform the individual reliability estimates. The best results were achieved using the internal cross-validation procedure, where reliability estimates significantly positively correlated with the prediction error in 73% of experiments. In addition, preliminary testing of the proposed methodology on a medical domain demonstrated its potential for use in practice.
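The internal cross-validation idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `select_estimate` and `pearson`, the callback signatures, the fold count, and the use of Pearson correlation with the absolute error as the selection score are all assumptions made for illustration.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def select_estimate(X, y, fit_predict, estimates, n_folds=5, seed=0):
    """Pick the candidate reliability estimate that correlates best with
    the absolute prediction error under internal cross-validation.

    fit_predict(X_train, y_train, X_test) -> predictions   (hypothetical API)
    estimates: dict name -> f(X_train, y_train, X_test, preds) -> scores
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = np.empty(len(y))
    scores = {name: np.empty(len(y)) for name in estimates}
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        preds = fit_predict(X[train], y[train], X[test])
        errors[test] = np.abs(y[test] - preds)  # held-out absolute error
        for name, est in estimates.items():
            scores[name][test] = est(X[train], y[train], X[test], preds)
    corr = {name: pearson(s, errors) for name, s in scores.items()}
    best = max(corr, key=corr.get)  # highest positive correlation wins
    return best, corr
```

Selecting by the highest correlation reflects the abstract's evaluation criterion (positive correlation between estimate and prediction error); a significance test on that correlation, as the abstract mentions, is left out of this sketch.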

Author(s):  
A. Shchebel

The potential of an enterprise may have a number of components that are heterogeneous in their economic and managerial nature. This requires selecting criteria that are common to the assessment of all components of that potential. From the standpoint of the resource approach, such criteria could be the cost of the resources used to build the potential of the enterprise, as well as the value created by using the existing potential. This criterion is readily consistent with the goals of forming and implementing the potential of the enterprise, and its achievement can therefore be measured in quantitative and temporal terms. The cost of resources is the cost dimension of the criterion. Time, in turn, is on the one hand one of the dimensions of the selected criterion and on the other hand a separate criterion, since the same result obtained over different periods of time is usually assessed differently. It is substantiated that the rationality of managing the enterprise's potential should be assessed by comparing the cost of the resources involved in forming the potential with the value created as a result of its use. It is shown that the significance of the difference between these values depends on the time factor: shortening the analyzed period and increasing the difference between the studied values both increase the rationality of management. Applying the provisions of the theory of neural networks, a regression model is constructed that uses a recurrent function. This ensured the accuracy of forecasting the resulting parameters and increased the informativeness and objectivity of the proposed method.


Author(s):  
Zoran Bosnic ◽  
Igor Kononenko

In machine learning, reliability estimates for individual predictions provide more information about the individual prediction error than the average accuracy of the predictive model (e.g. relative mean squared error). Such reliability estimates may represent decisive information in risk-sensitive applications of machine learning (e.g. medicine, engineering, and business), where they enable users to distinguish between more and less reliable predictions. In the authors’ previous work they proposed eight reliability estimates for individual examples in regression and evaluated their performance. The results showed that the performance of each estimate varies strongly depending on the domain and regression model properties. In this paper they empirically analyze the dependence of the reliability estimates’ performance on the data set and model properties. They present results which show that the reliability estimates perform better when used with more accurate regression models, in domains with a greater number of examples, and in domains with less noisy data.


Author(s):  
Syafruddin Side ◽  
Wahidah Sanusi ◽  
Mustati'atul Waidah Maksum

Abstract. Semiparametric regression is a regression model that includes both parametric and nonparametric components. This research uses a spline semiparametric regression model for longitudinal data, with a case study of patients with Dengue Hemorrhagic Fever (DHF) at Hasanuddin University Hospital in Makassar during the period from January to March 2018. The best regression model estimate is obtained by selecting the optimal knot points, those with the minimum Generalized Cross Validation (GCV) and Mean Square Error (MSE) values. The parametric components in this research are hemoglobin (g/dL) and age (years), with body temperature ( ) and platelets ( ) as nonparametric components. The minimum GCV value of 221.67745153 is achieved at the knot points 14.552, 14.987, and 15.096, with an MSE value of 199.1032 and a coefficient of determination of 75.3%, obtained from the linear spline semiparametric regression model with three knot points.
Keywords: semiparametric regression, spline, knot, Generalized Cross Validation, Dengue Hemorrhagic Fever.
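For reference, the GCV criterion used for knot selection is commonly written as below. This is the standard textbook form, not necessarily the exact variant used in the study; the symbols $n$ (number of observations), $y_i$, $\hat{y}_i$, and the hat matrix $A(k)$ for a knot configuration $k$ (so that $\hat{y} = A(k)\,y$) are notation assumed here.

```latex
\[
\mathrm{GCV}(k) = \frac{\mathrm{MSE}(k)}{\left( n^{-1}\,\mathrm{tr}\!\left[ I - A(k) \right] \right)^{2}},
\qquad
\mathrm{MSE}(k) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^{2}
\]
```

The knot configuration with the smallest $\mathrm{GCV}(k)$ is taken as optimal.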


2021 ◽  
Author(s):  
Achim Langenbucher ◽  
Nóra Szentmáry ◽  
Alan Cayless ◽  
Michael Müller ◽  
Timo Eppig ◽  
...  

Purpose: To present strategies for optimization of lens power formula constants and to show options for presenting the results adequately. Methods: A dataset of N=1601 preoperative biometric values, lens power data and postoperative refraction data was split into a training set and a test set using a random sequence. Based on the training set we calculated the formula constants for established lens calculation formulae with different methods. Based on the test set we derived the formula prediction error as the difference between the achieved refraction and the formula-predicted refraction. Results: For formulae with 1 constant it is possible to back-calculate the individual constant for each case using formula inversion. However, this is not possible for formulae with more than 1 constant. In these cases, more advanced concepts such as nonlinear optimization strategies are necessary to derive the formula constants. During cross-validation, measures such as the mean absolute or the root mean squared prediction error or the ratio of cases within mean absolute prediction error limits could be used as quality measures. Conclusions: Different constant optimization concepts yield different results. To test the performance of optimized formula constants a cross-validation strategy is mandatory. We recommend performance curves, where the ratio of cases within absolute prediction error limits is plotted against the mean absolute prediction error.
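The quality measures named in the abstract can be computed as in the following minimal sketch. The function name `prediction_error_summary`, the sign convention (achieved minus predicted), and the default error limits are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def prediction_error_summary(achieved, predicted, limits=(0.25, 0.5, 1.0)):
    """Summarize formula prediction error on a test set.

    Prediction error is taken here as achieved minus predicted refraction
    (assumed sign convention). `limits` are hypothetical absolute-error
    thresholds in the same unit as the refraction values.
    """
    err = np.asarray(achieved, dtype=float) - np.asarray(predicted, dtype=float)
    abs_err = np.abs(err)
    return {
        "mae": float(abs_err.mean()),                # mean absolute error
        "rmse": float(np.sqrt((err ** 2).mean())),   # root mean squared error
        # fraction of cases whose absolute error is within each limit
        "within": {lim: float((abs_err <= lim).mean()) for lim in limits},
    }
```

Evaluating the `within` fractions over a fine grid of limits gives the points needed to draw a performance curve of the kind the abstract recommends.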


2020 ◽  
Author(s):  
Morio YAMAUCHI ◽  
Kazuhisa NAKANO ◽  
Yoshiya TANAKA ◽  
Keiichi HORIO

In this article, we implemented a regression model and conducted experiments on predicting disease activity using data from 1929 rheumatoid arthritis patients, with the aim of assisting the selection of biologics for rheumatoid arthritis. For modelling, the missing values in the data were imputed by three different methods: mean value, self-organizing map, and random value. Experimental results showed that the prediction error of the regression model was large regardless of the imputation method, making it difficult to predict the prognosis of rheumatoid arthritis patients.


Author(s):  
Carlos Alberto Huaira Contreras ◽  
Carlos Cristiano Hasenclever Borges ◽  
Camila Borelli Zeller ◽  
Amanda Romanelli

The paper proposes a weighted cross-validation (WCV) algorithm to select the linear regression model with a change-point under a scale mixtures of normal (SMN) distribution that yields the best prediction results. SMN distributions are used to construct regression models that are robust to the influence of outliers on the parameter estimation process. Thus, we relax the usual assumption of normality in regression models and assume that the random errors follow an SMN distribution, specifically the Student-t distribution. In addition, we consider the fact that the parameters of the regression model can change from a specific and unknown point, called the change-point. In this context, the estimates of the model parameters, which include the change-point, are obtained via an EM-type (Expectation-Maximization) algorithm. The WCV method is used to select the model that is more robust and offers a smaller prediction error, with the weights taken from the E-step of the EM-type algorithm. Finally, numerical examples using simulated and real data (television audience data) are presented to illustrate the proposed methodology.


Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4474 ◽  
Author(s):  
Dominik Mamcarz ◽  
Paweł Albrechtowicz ◽  
Natalia Radwan-Pragłowska ◽  
Bartosz Rozegnał

This paper presents an analysis of the short-circuit currents of a synchronous generator with a rated power of 16 kVA. For this purpose, the authors measured real short-circuit currents during laboratory tests. Additionally, a simulation model of the generator was developed from the individual machine's catalog data and from field calculations in ANSYS Maxwell software. Based on this research, the authors compared waveforms of the symmetrical short-circuit currents. The last family of compared short-circuit current waveforms was obtained using analytical calculations. As the presented comparison shows, the method adopted for determining the short-circuit waveforms affected their values. However, the difference in energy related to the short-circuit currents did not influence the selection of the short-circuit protections, especially at the low values of steady-state short-circuit currents and the short time constants characteristic of generators serving as an alternative power supply. Regardless of the research method, the results presented in the article show that the selection of short-circuit protections is complicated in the case of hybrid electrical systems equipped with low-power generators.


2017 ◽  
Vol 6 (1) ◽  
pp. 65
Author(s):  
NI WAYAN MERRY NIRMALA YANI ◽  
I GUSTI AYU MADE SRINADI ◽  
I WAYAN SUMARJAYA

Semiparametric regression is a regression model that includes both parametric and nonparametric components. The regression model in this research is truncated spline semiparametric regression, with a case study of patients with Dengue Hemorrhagic Fever (DHF) at Puri Raharja Hospital during the period from January to March 2015. The best regression model estimate is obtained by selecting the optimal knots, those with the minimum Generalized Cross Validation (GCV) value. Parametric components in this research include age (years), body temperature (°C), platelets and hematocrit (%) as a nonparametric component. The minimum GCV value of 0.03552045 is achieved at a knot point of 39.6, with an MSE value of 0.0296922 and a coefficient of determination of 98.91%, obtained from the truncated linear spline semiparametric regression model (order 2) with a single knot point.


2019 ◽  
Vol 1 (1) ◽  
pp. 11
Author(s):  
Bidayani Bidayani ◽  
Mustika Hadijati ◽  
Nurul Fitriyani

This study was conducted with the aim of determining the semiparametric spline regression model in the analysis of factors that influence rice production in East Lombok District in 2014 and finding out what factors influence the rice production results. The method used was semiparametric spline regression, with the selection of the optimum knot points using Generalized Cross Validation. The results obtained indicate that the variable that significantly affects rice production was the height of the area above sea level, with the determination coefficient value of 99.71% and the RMSEP value of 41.65.

