Optimistic initialization and greediness lead to polynomial time learning in factored MDPs

Author(s): István Szita, András Lőrincz

2004, Vol. 329 (1-3), pp. 203-221
Author(s): Yasuhiro Tajima, Etsuji Tomita, Mitsuo Wakatsuki, Matsuaki Terada

2000, Vol. 18 (3), pp. 217-242
Author(s): Satoru Miyano, Ayumi Shinohara, Takeshi Shinohara

2018, Vol. 60 (2), pp. 360-375
Author(s): A. V. Vasil'ev, D. V. Churikov

DOI: 10.29007/v68w, 2018
Author(s): Ying Zhu, Mirek Truszczynski

We study the problem of learning the importance of preferences in preference profiles in two important cases: when individual preferences are aggregated by the ranked Pareto rule, and when they are aggregated by positional scoring rules. For the ranked Pareto rule, we provide a polynomial-time algorithm that finds a ranking of preferences such that the ranked profile correctly decides all the examples, whenever such a ranking exists. We also show that the problem of learning a ranking that maximizes the number of correctly decided examples (again under the ranked Pareto rule) is NP-hard. We obtain similar results for the case of weighted profiles when positional scoring rules are used for aggregation.
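The positional-scoring side of this setup can be illustrated with a minimal sketch. The code below assumes a standard positional scoring rule (Borda-style scores are used purely as an example), a weighted profile in which each individual preference carries a learned nonnegative weight, and "examples" given as desired strict pairwise comparisons; it only checks which examples a given weight vector decides correctly, and does not implement the learning algorithms or the ranked Pareto rule from the abstract.

```python
# Sketch: aggregation of a weighted profile by a positional scoring rule,
# and counting which pairwise examples the aggregated order decides correctly.
# Borda scores and strict-preference examples are illustrative assumptions.

def weighted_scores(profile, weights, score_vector):
    """Total score of each alternative under a positional scoring rule.

    profile      -- list of rankings, each a list of alternatives, best first
    weights      -- one nonnegative weight per ranking (the learned importance)
    score_vector -- points awarded to positions 1..m, e.g. Borda: (m-1, ..., 0)
    """
    totals = {}
    for ranking, w in zip(profile, weights):
        for pos, alt in enumerate(ranking):
            totals[alt] = totals.get(alt, 0.0) + w * score_vector[pos]
    return totals


def decides_correctly(profile, weights, score_vector, example):
    """True if the weighted profile strictly prefers example[0] to example[1]."""
    a, b = example
    totals = weighted_scores(profile, weights, score_vector)
    return totals[a] > totals[b]


if __name__ == "__main__":
    # Three individual preferences over alternatives x, y, z.
    profile = [["x", "y", "z"],
               ["y", "z", "x"],
               ["z", "x", "y"]]
    borda = (2, 1, 0)                      # Borda scores for 3 alternatives
    examples = [("x", "y"), ("y", "z")]    # desired strict comparisons

    # Two candidate weight vectors (importance assigned to each preference).
    for weights in ([1.0, 1.0, 1.0], [3.0, 1.0, 0.5]):
        correct = sum(decides_correctly(profile, weights, borda, e)
                      for e in examples)
        print(weights, "->", correct, "of", len(examples), "examples correct")
```

With equal weights the three Borda scores tie and no example is strictly decided; the skewed weight vector decides both examples, which is the kind of effect the learning problem in the abstract is about.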

