Active Preference Learning using Maximum Regret

Author(s):  
Nils Wilde ◽  
Dana Kulic ◽  
Stephen L. Smith
2020 ◽  

Author(s):  
Alberto Bemporad ◽  
Dario Piga

Abstract
This paper proposes a method for solving optimization problems in which the decision maker cannot evaluate the objective function, but rather can only express a preference such as "this is better than that" between two candidate decision vectors. The algorithm described in this paper aims at reaching the global optimizer by iteratively proposing to the decision maker a new comparison to make, based on actively learning a surrogate of the latent (unknown and perhaps unquantifiable) objective function from past sampled decision vectors and pairwise preferences. A radial basis function surrogate is fit via linear or quadratic programming, satisfying, if possible, the preferences expressed by the decision maker on existing samples. The surrogate is used to propose a new sample of the decision vector for comparison with the current best candidate based on two possible criteria: minimize a combination of the surrogate and an inverse distance weighting function to balance exploitation of the surrogate against exploration of the decision space, or maximize a function related to the probability that the new candidate will be preferred. Compared to active preference learning based on Bayesian optimization, we show that our approach is competitive in that, within the same number of comparisons, it usually approaches the global optimum more closely and is computationally lighter. Applications of the proposed algorithm to a set of benchmark global optimization problems, to multi-objective optimization, and to the optimal tuning of a cost-sensitive neural network classifier for object recognition from images are described in the paper. MATLAB and Python implementations of the algorithms described in the paper are available at http://cse.lab.imtlucca.it/~bemporad/glis.
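The scheme in the abstract can be sketched in a few lines of Python. This is not the authors' GLIS implementation: the paper fits the radial basis function surrogate by solving an LP or QP, whereas the hinge-loss subgradient descent below is a simple stand-in, and all names (`fit_preference_surrogate`, `acquisition`, the eps/margin/delta values) are illustrative assumptions. It shows the two ingredients the abstract describes: a surrogate constrained to respect pairwise preferences, and an acquisition that trades the surrogate off against an inverse-distance-weighting exploration term.

```python
import numpy as np

def rbf_matrix(X, centers, eps=20.0):
    """Inverse-quadratic RBF values phi(||x - c||) between rows of X and centers."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return 1.0 / (1.0 + eps * d2)

def fit_preference_surrogate(X, prefs, eps=20.0, margin=0.5, lr=0.05, n_iter=3000):
    """Fit RBF weights beta so the surrogate respects pairwise preferences:
    (i, j) in prefs means x_i was preferred, so we ask
    f_hat(x_i) + margin <= f_hat(x_j).  The paper solves this as an LP/QP;
    plain subgradient descent on the hinge loss stands in for it here."""
    Phi = rbf_matrix(X, X, eps)
    beta = np.zeros(len(X))
    for _ in range(n_iter):
        grad = 1e-3 * beta  # tiny ridge term keeps beta bounded
        for i, j in prefs:
            if Phi[i] @ beta + margin > Phi[j] @ beta:  # constraint violated
                grad += Phi[i] - Phi[j]
        beta -= lr * grad
    return beta

def acquisition(x_grid, X, beta, eps=20.0, delta=1.0):
    """Surrogate minus an inverse-distance-weighting exploration bonus that
    vanishes at sampled points and grows away from them."""
    f_hat = rbf_matrix(x_grid, X, eps) @ beta
    d2 = np.sum((x_grid[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    z = (2 / np.pi) * np.arctan(1.0 / np.sum(1.0 / np.maximum(d2, 1e-12), axis=1))
    return f_hat - delta * z

# Demo on a latent 1-D objective the algorithm never evaluates directly:
# only pairwise comparisons of sampled points are used.
latent = lambda x: (x - 0.3) ** 2
X = np.array([[0.0], [0.25], [0.5], [0.75], [1.0]])
prefs = [(i, j) for i in range(5) for j in range(5)
         if latent(X[i, 0]) < latent(X[j, 0])]
beta = fit_preference_surrogate(X, prefs)
x_grid = np.linspace(0.0, 1.0, 201).reshape(-1, 1)
x_next = x_grid[np.argmin(acquisition(x_grid, X, beta, delta=0.2))]
```

After the fit, the surrogate ranks the five samples exactly as the latent function does, and the minimizer of the acquisition is the next point the decision maker would be asked to compare against the current best.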


2021 ◽  
pp. 1-16
Author(s):  
Pegah Alizadeh ◽  
Emiliano Traversi ◽  
Aomar Osmani

Markov Decision Processes (MDPs) are a powerful tool for planning tasks and sequential decision-making problems. In this work we deal with MDPs with imprecise rewards, often used to model situations where the data are uncertain. In this context, we provide algorithms for finding the policy that minimizes the maximum regret. To the best of our knowledge, all the regret-based methods proposed in the literature focus on providing an optimal stochastic policy. We introduce for the first time a method for computing an optimal deterministic policy using optimization approaches. Deterministic policies are easily interpretable for users because they provide a unique choice for each state. To better motivate the use of an exact procedure for finding a deterministic policy, we show some (theoretical and experimental) cases where the intuitive idea of using a deterministic policy obtained by "determinizing" the optimal stochastic policy leads to a policy far from the exact deterministic policy.
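The minimax-regret objective in the abstract can be made concrete with a toy sketch. This is not the paper's optimization formulation: the imprecise rewards are reduced here to a finite set of plausible reward matrices, and the deterministic policy space is small enough to enumerate by brute force; all function names are illustrative assumptions.

```python
import numpy as np
from itertools import product

def policy_value(P, r, policy, gamma=0.9):
    """Value of a deterministic policy: V = (I - gamma * P_pi)^{-1} r_pi.
    P has shape (actions, states, states); r has shape (states, actions)."""
    n = P.shape[1]
    P_pi = np.array([P[policy[s], s] for s in range(n)])
    r_pi = np.array([r[s, policy[s]] for s in range(n)])
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def max_regret(P, policy, reward_set, mu, gamma=0.9):
    """Worst-case regret over the plausible rewards: for each scenario,
    best attainable value (over deterministic policies) minus this
    policy's value, weighted by the initial distribution mu."""
    n_states, n_actions = reward_set[0].shape
    all_policies = list(product(range(n_actions), repeat=n_states))
    regrets = []
    for r in reward_set:
        best = max(mu @ policy_value(P, r, pi, gamma) for pi in all_policies)
        regrets.append(best - mu @ policy_value(P, r, policy, gamma))
    return max(regrets)

def minimax_regret_policy(P, reward_set, mu, gamma=0.9):
    """Deterministic policy minimizing the maximum regret, by enumeration."""
    n_states, n_actions = reward_set[0].shape
    return min(product(range(n_actions), repeat=n_states),
               key=lambda pi: max_regret(P, pi, reward_set, mu, gamma))

# Two-state, two-action example: action 0 stays put, action 1 switches state.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0 (stay)
              [[0.0, 1.0], [1.0, 0.0]]])  # action 1 (switch)
# Two plausible reward scenarios: staying pays off either in state 0 or in state 1.
reward_set = [np.array([[1.0, 0.0], [0.0, 0.0]]),
              np.array([[0.0, 0.0], [1.0, 0.0]])]
mu = np.array([0.5, 0.5])
pi_star = minimax_regret_policy(P, reward_set, mu)
```

In this example the minimax-regret deterministic policy is "stay in both states", which is optimal under neither reward scenario, a small-scale echo of the abstract's warning that determinizing a scenario-optimal policy can land far from the exact minimax-regret one.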


2019 ◽  
Vol 67 (2) ◽  
pp. 1268-1283 ◽  
Author(s):  
Yanxiang Jiang ◽  
Miaoli Ma ◽  
Mehdi Bennis ◽  
Fu-Chun Zheng ◽  
Xiaohu You

2012 ◽  
Vol 302 (10) ◽  
pp. R1119-R1133 ◽  
Author(s):  
Anthony Sclafani ◽  
Karen Ackroff

The discovery of taste and nutrient receptors (chemosensors) in the gut has led to intensive research on their functions. Whereas oral sugar, fat, and umami taste receptors stimulate nutrient appetite, these and other chemosensors in the gut have been linked to digestive, metabolic, and satiating effects that influence nutrient utilization and inhibit appetite. Gut chemosensors may also have an additional function: to provide positive feedback signals that condition food preferences and stimulate appetite. The postoral stimulatory actions of nutrients are documented by flavor preference conditioning and appetite stimulation produced by gastric and intestinal infusions of carbohydrate, fat, and protein. Recent findings suggest an upper intestinal site of action, although postabsorptive nutrient actions may contribute to flavor preference learning. The gut chemosensors that generate nutrient conditioning signals remain to be identified; some have been excluded, including sweet (T1R3) and fatty acid (CD36) sensors. The gut-brain signaling pathways (neural, hormonal) are incompletely understood, although vagal afferents are implicated in glutamate conditioning but not carbohydrate or fat conditioning. Brain dopamine reward systems are involved in postoral carbohydrate and fat conditioning, but less is known about the reward systems mediating protein/glutamate conditioning. Continued research on the postoral stimulatory actions of nutrients may enhance our understanding of human food preference learning.


2015 ◽  
Vol 27 (7) ◽  
pp. 1549-1553
Author(s):  
Wojciech Rejchel ◽  
Hong Li ◽  
Chuanbao Ren ◽  
Luoqing Li

This note corrects an error in the proof of Corollary 1 of Li et al. (2014). The original claim of the contraction principle in Appendix D of Li et al. no longer holds.


PLoS ONE ◽  
2015 ◽  
Vol 10 (10) ◽  
pp. e0141129 ◽  
Author(s):  
Nisheeth Srivastava ◽  
Paul Schrater