Active Preference Learning using Maximum Regret

Author(s):  
Nils Wilde ◽  
Dana Kulic ◽  
Stephen L. Smith
2020 ◽  

Author(s):  
Alberto Bemporad ◽  
Dario Piga

Abstract
This paper proposes a method for solving optimization problems in which the decision maker cannot evaluate the objective function, but rather can only express a preference such as "this is better than that" between two candidate decision vectors. The algorithm described in this paper aims at reaching the global optimizer by iteratively proposing to the decision maker a new comparison to make, based on actively learning a surrogate of the latent (unknown and perhaps unquantifiable) objective function from past sampled decision vectors and pairwise preferences. A radial basis function surrogate is fit via linear or quadratic programming, satisfying, if possible, the preferences expressed by the decision maker on existing samples. The surrogate is used to propose a new sample of the decision vector for comparison with the current best candidate based on two possible criteria: minimize a combination of the surrogate and an inverse distance weighting function to balance exploitation of the surrogate against exploration of the decision space, or maximize a function related to the probability that the new candidate will be preferred. Compared to active preference learning based on Bayesian optimization, we show that our approach is competitive in that, within the same number of comparisons, it usually approaches the global optimum more closely and is computationally lighter. Applications of the proposed algorithm to a set of benchmark global optimization problems, to multi-objective optimization, and to the optimal tuning of a cost-sensitive neural network classifier for object recognition from images are described in the paper. MATLAB and Python implementations of the algorithms described in the paper are available at http://cse.lab.imtlucca.it/~bemporad/glis.
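The scheme in the abstract can be sketched in a few lines of Python. This is not the authors' GLIS implementation: the paper fits the radial basis function surrogate by solving an LP or QP, whereas the hinge-loss subgradient descent below is a simple stand-in, and all names (`fit_preference_surrogate`, `acquisition`, the eps/margin/delta values) are illustrative assumptions. It shows the two ingredients the abstract describes: a surrogate constrained to respect pairwise preferences, and an acquisition that trades the surrogate off against an inverse-distance-weighting exploration term.

```python
import numpy as np

def rbf_matrix(X, centers, eps=20.0):
    """Inverse-quadratic RBF values phi(||x - c||) between rows of X and centers."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return 1.0 / (1.0 + eps * d2)

def fit_preference_surrogate(X, prefs, eps=20.0, margin=0.5, lr=0.05, n_iter=3000):
    """Fit RBF weights beta so the surrogate respects pairwise preferences:
    (i, j) in prefs means x_i was preferred, so we ask
    f_hat(x_i) + margin <= f_hat(x_j).  The paper solves this as an LP/QP;
    plain subgradient descent on the hinge loss stands in for it here."""
    Phi = rbf_matrix(X, X, eps)
    beta = np.zeros(len(X))
    for _ in range(n_iter):
        grad = 1e-3 * beta  # tiny ridge term keeps beta bounded
        for i, j in prefs:
            if Phi[i] @ beta + margin > Phi[j] @ beta:  # constraint violated
                grad += Phi[i] - Phi[j]
        beta -= lr * grad
    return beta

def acquisition(x_grid, X, beta, eps=20.0, delta=1.0):
    """Surrogate minus an inverse-distance-weighting exploration bonus that
    vanishes at sampled points and grows away from them."""
    f_hat = rbf_matrix(x_grid, X, eps) @ beta
    d2 = np.sum((x_grid[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    z = (2 / np.pi) * np.arctan(1.0 / np.sum(1.0 / np.maximum(d2, 1e-12), axis=1))
    return f_hat - delta * z

# Demo on a latent 1-D objective the algorithm never evaluates directly:
# only pairwise comparisons of sampled points are used.
latent = lambda x: (x - 0.3) ** 2
X = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])
prefs = [(i, j) for i in range(5) for j in range(5)
         if latent(X[i, 0]) < latent(X[j, 0])]
beta = fit_preference_surrogate(X, prefs)
x_grid = np.linspace(0.0, 1.0, 201).reshape(-1, 1)
x_next = x_grid[np.argmin(acquisition(x_grid, X, beta, delta=0.2))]
```

After the fit, the surrogate ranks the five samples exactly as the latent function does, and the minimizer of the acquisition is the next point the decision maker would be asked to compare against the current best.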


2021 ◽  
pp. 1-16
Author(s):  
Pegah Alizadeh ◽  
Emiliano Traversi ◽  
Aomar Osmani

Markov Decision Processes (MDPs) are a powerful tool for planning tasks and sequential decision-making problems. In this work we deal with MDPs with imprecise rewards, often used to model situations where the data are uncertain. In this context, we provide algorithms for finding the policy that minimizes the maximum regret. To the best of our knowledge, all the regret-based methods proposed in the literature focus on providing an optimal stochastic policy. We introduce for the first time a method for computing an optimal deterministic policy using optimization approaches. Deterministic policies are easily interpretable for users because they provide a unique choice for each state. To better motivate the use of an exact procedure for finding a deterministic policy, we show some (theoretical and experimental) cases where the intuitive idea of using a deterministic policy obtained by "determinizing" the optimal stochastic policy leads to a policy far from the exact deterministic policy.
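The minimax-regret objective in the abstract can be made concrete with a toy sketch. This is not the paper's optimization formulation: the imprecise rewards are reduced here to a finite set of plausible reward matrices, and the deterministic policy space is small enough to enumerate by brute force; all function names are illustrative assumptions.

```python
import numpy as np
from itertools import product

def policy_value(P, r, policy, gamma=0.9):
    """Value of a deterministic policy: V = (I - gamma * P_pi)^{-1} r_pi.
    P has shape (actions, states, states); r has shape (states, actions)."""
    n = P.shape[1]
    P_pi = np.array([P[policy[s], s] for s in range(n)])
    r_pi = np.array([r[s, policy[s]] for s in range(n)])
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def max_regret(P, policy, reward_set, mu, gamma=0.9):
    """Worst-case regret over the plausible rewards: for each scenario,
    best attainable value (over deterministic policies) minus this
    policy's value, weighted by the initial distribution mu."""
    n_states, n_actions = reward_set[0].shape
    all_policies = list(product(range(n_actions), repeat=n_states))
    regrets = []
    for r in reward_set:
        best = max(mu @ policy_value(P, r, pi, gamma) for pi in all_policies)
        regrets.append(best - mu @ policy_value(P, r, policy, gamma))
    return max(regrets)

def minimax_regret_policy(P, reward_set, mu, gamma=0.9):
    """Deterministic policy minimizing the maximum regret, by enumeration."""
    n_states, n_actions = reward_set[0].shape
    return min(product(range(n_actions), repeat=n_states),
               key=lambda pi: max_regret(P, pi, reward_set, mu, gamma))

# Two-state, two-action example: action 0 stays put, action 1 switches state.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0 (stay)
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1 (switch)
# Two plausible reward scenarios: staying pays off either in state 0 or in state 1.
reward_set = [np.array([[1.0, 0.0], [0.0, 0.0]]),
              np.array([[0.0, 0.0], [1.0, 0.0]])]
mu = np.array([0.5, 0.5])
pi_star = minimax_regret_policy(P, reward_set, mu)
```

In this example the minimax-regret deterministic policy is "stay in both states", which is optimal under neither reward scenario, a small-scale echo of the abstract's warning that determinizing a scenario-optimal policy can land far from the exact minimax-regret one.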


2019 ◽  
Vol 67 (2) ◽  
pp. 1268-1283 ◽  
Author(s):  
Yanxiang Jiang ◽  
Miaoli Ma ◽  
Mehdi Bennis ◽  
Fu-Chun Zheng ◽  
Xiaohu You

2012 ◽  
Vol 302 (10) ◽  
pp. R1119-R1133 ◽  
Author(s):  
Anthony Sclafani ◽  
Karen Ackroff

The discovery of taste and nutrient receptors (chemosensors) in the gut has led to intensive research on their functions. Whereas oral sugar, fat, and umami taste receptors stimulate nutrient appetite, these and other chemosensors in the gut have been linked to digestive, metabolic, and satiating effects that influence nutrient utilization and inhibit appetite. Gut chemosensors may also have an additional function: to provide positive feedback signals that condition food preferences and stimulate appetite. The postoral stimulatory actions of nutrients are documented by flavor preference conditioning and appetite stimulation produced by gastric and intestinal infusions of carbohydrate, fat, and protein. Recent findings suggest an upper intestinal site of action, although postabsorptive nutrient actions may contribute to flavor preference learning. The gut chemosensors that generate nutrient conditioning signals remain to be identified; some have been excluded, including sweet (T1R3) and fatty acid (CD36) sensors. The gut-brain signaling pathways (neural, hormonal) are incompletely understood, although vagal afferents are implicated in glutamate conditioning but not carbohydrate or fat conditioning. Brain dopamine reward systems are involved in postoral carbohydrate and fat conditioning, but less is known about the reward systems mediating protein/glutamate conditioning. Continued research on the postoral stimulatory actions of nutrients may enhance our understanding of human food preference learning.


2015 ◽  
Vol 27 (7) ◽  
pp. 1549-1553
Author(s):  
Wojciech Rejchel ◽  
Hong Li ◽  
Chuanbao Ren ◽  
Luoqing Li

This note corrects an error in the proof of Corollary 1 of Li et al. (2014). The original claim of the contraction principle in Appendix D of Li et al. no longer holds.


PLoS ONE ◽  
2015 ◽  
Vol 10 (10) ◽  
pp. e0141129 ◽  
Author(s):  
Nisheeth Srivastava ◽  
Paul Schrater