A bias/variance decomposition for models using collective inference

2008 ◽  
Vol 73 (1) ◽  
pp. 87-106 ◽  
Author(s):  
Jennifer Neville ◽  
David Jensen

1998 ◽  
Vol 10 (6) ◽  
pp. 1425-1433 ◽  
Author(s):  
Tom Heskes

The bias/variance decomposition of mean-squared error is well understood and relatively straightforward. In this note, a similar simple decomposition is derived, valid for any kind of error measure that, when using the appropriate probability model, can be derived from a Kullback-Leibler divergence or log-likelihood.
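For reference, a minimal sketch of the two decompositions this note relates, under assumed notation not taken from the abstract: a fixed target $y$ (or target distribution $t$), a prediction $\hat{y}_D$ or predictive distribution $q_D$ depending on the training set $D$, and the normalized geometric mean $\bar{q}$ playing the role of the "mean" prediction in the Kullback-Leibler case.

```latex
% Classical decomposition of mean-squared error for a fixed target y:
\mathbb{E}_D\!\left[(y-\hat{y}_D)^2\right]
  = \underbrace{\bigl(y-\mathbb{E}_D[\hat{y}_D]\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{y}_D-\mathbb{E}_D[\hat{y}_D]\bigr)^2\right]}_{\text{variance}}

% Analogous decomposition for the Kullback-Leibler error, with the "mean" model
% taken as the normalized geometric mean  \bar{q} \propto \exp(\mathbb{E}_D[\log q_D]):
\mathbb{E}_D\!\left[\mathrm{KL}(t \,\|\, q_D)\right]
  = \underbrace{\mathrm{KL}(t \,\|\, \bar{q})}_{\text{bias}}
  + \underbrace{\mathbb{E}_D\!\left[\mathrm{KL}(\bar{q} \,\|\, q_D)\right]}_{\text{variance (target independent)}}
```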


2020 ◽  
Vol 25 (2) ◽  
pp. 37 ◽  
Author(s):  
Vicente-Josué Aguilera-Rueda ◽  
Nicandro Cruz-Ramírez ◽  
Efrén Mezura-Montes

We present a novel bi-objective approach to the data-driven problem of learning Bayesian networks. Both the log-likelihood and the complexity of each candidate Bayesian network are treated as objectives to be optimized by our proposed algorithm, the Nondominated Sorting Genetic Algorithm for learning Bayesian networks (NS2BN), which is based on the well-known NSGA-II algorithm. The core idea is to handle the implicit bias/variance trade-off while identifying a set of competitive models with respect to both objectives. Numerical results suggest that, in stark contrast to the single-objective approach, our bi-objective approach is useful for finding competitive Bayesian networks, especially in terms of complexity. Furthermore, our approach presents the end user with a set of solutions, showing different Bayesian networks together with their respective MDL and classification-accuracy results.
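To illustrate the kind of selection step such a bi-objective search relies on, here is a hedged sketch of the NSGA-II-style fast nondominated sort over the two objectives. The candidate scores and helper names are hypothetical; this is not the authors' NS2BN code.

```python
# Illustrative sketch (not the authors' NS2BN implementation): rank candidate
# Bayesian network structures by two objectives -- negative log-likelihood (fit)
# and parameter count (complexity) -- using NSGA-II's fast nondominated sort.

def dominates(a, b):
    """True if solution a is at least as good as b on every objective
    (both minimized) and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_sort(objectives):
    """Return a list of Pareto fronts, each a list of indices into `objectives`."""
    n = len(objectives)
    dominated_by = [set() for _ in range(n)]   # solutions each candidate dominates
    domination_count = [0] * n                 # how many candidates dominate it
    fronts = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objectives[i], objectives[j]):
                dominated_by[i].add(j)
            elif dominates(objectives[j], objectives[i]):
                domination_count[i] += 1
        if domination_count[i] == 0:
            fronts[0].append(i)
    current = 0
    while fronts[current]:
        next_front = []
        for i in fronts[current]:
            for j in dominated_by[i]:
                domination_count[j] -= 1
                if domination_count[j] == 0:
                    next_front.append(j)
        current += 1
        fronts.append(next_front)
    return fronts[:-1]

# Hypothetical candidates: (negative log-likelihood, number of free parameters).
candidates = [(1250.0, 18), (1190.0, 35), (1210.0, 22), (1300.0, 12), (1190.0, 40)]
for rank, front in enumerate(nondominated_sort(candidates)):
    print(f"front {rank}: {[candidates[i] for i in front]}")
```

The first front returned is the set of non-dominated trade-offs between fit and complexity, which is the set of models presented to the end user.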


2018 ◽  
Vol 67 (2) ◽  
pp. 268-283 ◽  
Author(s):  
Liran Lerman ◽  
Nikita Veshchikov ◽  
Olivier Markowitch ◽  
Francois-Xavier Standaert

2000 ◽  
Vol 29 (550) ◽  
Author(s):  
Jakob Vogdrup Hansen

The most important theoretical tool in connection with machine learning is the bias/variance decomposition of error functions. Together with Tom Heskes, I have found the family of error functions with a natural bias/variance decomposition that has target independent variance. It is shown that no other group of error functions can be decomposed in the same way. An open problem in the machine learning community is thereby solved. The error functions are derived from the deviance measure on distributions in the one-parameter exponential family. It is therefore called the deviance error family.

A bias/variance decomposition can also be viewed as an ambiguity decomposition for an ensemble method. The family of error functions with a natural bias/variance decomposition that has target independent variance can therefore be of use in connection with ensemble methods.

The logarithmic opinion pool ensemble method has been developed together with Anders Krogh. It is based on the logarithmic opinion pool ambiguity decomposition using the Kullback-Leibler error function. It has been extended to the cross-validation logarithmic opinion pool ensemble method. The advantage of the cross-validation logarithmic opinion pool ensemble method is that it can use unlabeled data to estimate the generalization error, while it still uses the entire labeled example set for training.

The cross-validation logarithmic opinion pool ensemble method is easily reformulated for another error function, as long as the error function has an ambiguity decomposition with target independent ambiguity. It is therefore possible to use the cross-validation ensemble method on all error functions in the deviance error family.
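A minimal sketch of the logarithmic opinion pool ambiguity decomposition in the Kullback-Leibler case, with notation assumed for illustration (members $q_1,\dots,q_M$ and weights $w_i$ with $\sum_i w_i = 1$):

```latex
% Logarithmic opinion pool of ensemble members q_1, ..., q_M with weights w_i:
\bar{q} \;=\; \frac{1}{Z}\prod_{i=1}^{M} q_i^{\,w_i},
\qquad Z \;=\; \sum_x \prod_{i=1}^{M} q_i(x)^{\,w_i}

% Ambiguity decomposition for the Kullback-Leibler error:
% ensemble error = weighted mean of member errors - ambiguity,
% where the ambiguity term does not depend on the target t.
\mathrm{KL}(t \,\|\, \bar{q})
  \;=\; \sum_{i=1}^{M} w_i \,\mathrm{KL}(t \,\|\, q_i)
  \;-\; \underbrace{\sum_{i=1}^{M} w_i \,\mathrm{KL}(\bar{q} \,\|\, q_i)}_{\text{ambiguity (target independent)}}
```

Because the ambiguity term depends only on the members and their pooled model, it can be estimated from unlabeled data, which is what the cross-validation variant described above exploits.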


2016 ◽  
Vol 14 (1) ◽  
pp. 62-80 ◽  
Author(s):  
Taras Kowaliw ◽  
René Doursat

We study properties of Linear Genetic Programming (LGP) through several regression and classification benchmarks. In each problem, we decompose the results into bias and variance components, and explore the effect of varying certain key parameters on the overall error and its decomposed contributions. These parameters are the maximum program size, the initial population, and the function set used. We confirm and quantify several insights into the practical usage of GP, most notably that (a) the variance between runs is primarily due to initialization rather than the selection of training samples, (b) parameters can be reasonably optimized to obtain gains in efficacy, and (c) functions detrimental to evolvability are easily eliminated, while functions well-suited to the problem can greatly improve performance; therefore, larger and more diverse function sets are always preferable.
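To make the decomposition protocol concrete, here is a hedged sketch in which a generic stochastic regressor stands in for the LGP system; the data generator, run count, and seeds are made up, not taken from the paper.

```python
# Illustrative sketch: estimate squared bias and variance of a stochastic
# learner from repeated training runs (each run uses a fresh random
# initialization / training sample), then check that they sum to the error.
import numpy as np

def target(x):
    return np.sin(x)

def train_and_predict(x_test, seed):
    """Stand-in for one run: fit a degree-3 polynomial to a noisy sample."""
    r = np.random.default_rng(seed)
    x_train = r.uniform(-3, 3, 30)
    y_train = target(x_train) + r.normal(0, 0.2, 30)
    coeffs = np.polyfit(x_train, y_train, deg=3)
    return np.polyval(coeffs, x_test)

x_test = np.linspace(-3, 3, 50)
preds = np.stack([train_and_predict(x_test, s) for s in range(200)])

mean_pred = preds.mean(axis=0)
bias_sq = np.mean((mean_pred - target(x_test)) ** 2)   # averaged squared bias
variance = np.mean(preds.var(axis=0))                  # averaged variance
total = np.mean((preds - target(x_test)) ** 2)         # expected squared error

print(f"bias^2   = {bias_sq:.4f}")
print(f"variance = {variance:.4f}")
print(f"total    = {total:.4f}  (≈ bias^2 + variance)")
```

Holding the training sample fixed while varying only the random initialization (or vice versa) isolates which source of randomness drives the variance term, which is the comparison the abstract refers to in point (a).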

