Search Control, Utility, and Concept Induction
This research was supported by NASA Ames grant NCC 2-645.

Author(s): Brian Carlson, Jerry Weinberg, Doug Fisher
1993
Author(s): Jihie Kim, Paul S. Rosenbloom

2021, Vol. 13 (8), pp. 4293
Author(s): Yuqing Lin, Jingjing Wu, Yongqing Xiong

Against the background of declining subsidies for China's new energy vehicles (NEVs), it is of practical importance to make effective use of nonsubsidized mechanisms for promoting consumption. This study classifies the nonsubsidized mechanisms for NEVs into two types, concept induction and policy incentives, and analyzes how the sensitivity of potential consumers' purchase intentions to each type varies with urban traffic patterns and with consumers' education levels. The results show that consumers in cities with medium to high traffic pressure are more sensitive to the right-of-way privileges component of the policy incentives, whereas consumers in cities with low traffic pressure are more sensitive to the charging guarantee component. Consumers with medium to high education levels are more sensitive to the pro-environmental component of concept induction, whereas consumers with low education levels are more sensitive to the charging guarantee component of the policy incentives. The implementation of nonsubsidized mechanisms for NEVs in China should therefore adopt differentiated strategies tailored to local conditions and to individual consumer profiles.


Author(s): Yangchen Pan, Hengshuai Yao, Amir-massoud Farahmand, Martha White

Dyna is an architecture for model-based reinforcement learning (RL), in which simulated experience from a model is used to update policies or value functions. A key component of Dyna is search control: the mechanism that generates the states and actions from which the agent queries the model. Search control remains largely unexplored. In this work, we propose to generate such states by using the trajectory obtained from Hill Climbing (HC) on the current estimate of the value function. This has the effect of propagating value from high-value regions and of preemptively updating value estimates of the regions the agent is likely to visit next. We derive a noisy projected natural gradient algorithm for hill climbing and highlight a connection to Langevin dynamics. We demonstrate empirically on four classical domains that our algorithm, HC Dyna, can obtain significant improvements in sample efficiency. We study the properties of different sampling distributions for search control and find a benefit specifically from using the samples generated by climbing on current value estimates from low-value to high-value regions.
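
A minimal sketch of the core idea, under illustrative assumptions (the quadratic value function, step sizes, and function names below are invented for the demo and are not the authors' implementation): search-control states are generated by noisy gradient ascent on the current value estimate, loosely mirroring the Langevin-dynamics view, with a projection keeping iterates inside the state space.

```python
import numpy as np

# Sketch of hill-climbing search control for Dyna (illustrative only).
# A fixed quadratic stands in for a learned, differentiable value estimate V(s).

def value(s):
    """Toy value estimate: peak at s = (1, 1)."""
    return -np.sum((s - 1.0) ** 2)

def value_grad(s):
    """Gradient of the toy value estimate."""
    return -2.0 * (s - 1.0)

def hill_climb_states(s0, n_steps=20, step_size=0.05, noise_scale=0.01,
                      bounds=(-2.0, 2.0)):
    """Generate search-control states by noisy gradient ascent on V.

    Each iterate is a state from which a Dyna agent would query its model;
    the Gaussian noise loosely mirrors the Langevin-dynamics connection,
    and the clipping acts as the projection back into the state space.
    """
    states, s = [], np.array(s0, dtype=float)
    for _ in range(n_steps):
        s = s + step_size * value_grad(s)                 # climb the value estimate
        s = s + np.sqrt(noise_scale) * np.random.randn(*s.shape)  # Langevin-style noise
        s = np.clip(s, *bounds)                           # project into the state space
        states.append(s.copy())
    return states

# Usage: climb from a low-value start; later states lie in higher-value regions,
# and Dyna planning updates would then be performed from these states.
queue = hill_climb_states(s0=[-1.5, -1.5])
print([round(float(value(s)), 3) for s in queue[::5]])
```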


2006, Vol. 25, pp. 17-74
Author(s): S. Thiebaux, C. Gretton, J. Slaney, D. Price, F. Kabanza

A decision process in which rewards depend on history rather than merely on the current state is called a decision process with non-Markovian rewards (NMRDP). In decision-theoretic planning, where many desirable behaviours are more naturally expressed as properties of execution sequences than as properties of states, NMRDPs form a more natural model than the commonly adopted fully Markovian decision process (MDP) model. While the more tractable solution methods developed for MDPs do not directly apply in the presence of non-Markovian rewards, a number of solution methods for NMRDPs have been proposed in the literature. These all exploit a compact specification of the non-Markovian reward function in temporal logic to automatically translate the NMRDP into an equivalent MDP, which is then solved using efficient MDP solution methods. This paper presents NMRDPP (Non-Markovian Reward Decision Process Planner), a software platform for developing and experimenting with methods for decision-theoretic planning with non-Markovian rewards. The current version of NMRDPP implements, under a single interface, a family of methods based on existing as well as new approaches, which we describe in detail. These include dynamic programming, heuristic search, and structured methods. Using NMRDPP, we compare the methods and identify certain problem features that affect their performance. NMRDPP's treatment of non-Markovian rewards is inspired by the treatment of domain-specific search control knowledge in the TLPlan planner, which it incorporates as a special case. In the First International Probabilistic Planning Competition, NMRDPP was able to compete and perform well in both the domain-independent and hand-coded tracks, using search control knowledge in the latter.
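
As a concrete illustration of the translation the abstract describes, here is a minimal sketch (not NMRDPP itself; the propositions, dynamics, and helper names are invented for the demo): a non-Markovian reward of the form "p holds and q has held at some earlier step" is made Markovian by augmenting the state with a single history bit tracking whether q has been observed.

```python
# Sketch of translating a non-Markovian reward into an equivalent MDP by
# augmenting the state. The temporal reward here is roughly the PLTL formula
# p & (once q); the augmented state (s, seen_q) is again Markovian.

def augment_transition(base_step):
    """Wrap a base MDP step function so it also carries the history bit."""
    def step(aug_state, action):
        s, seen_q = aug_state
        s2 = base_step(s, action)
        seen_q2 = seen_q or q_holds(s2)   # update the temporal bookkeeping
        return (s2, seen_q2)
    return step

def markovian_reward(aug_state):
    """Reward now depends only on the augmented state, not on history."""
    s, seen_q = aug_state
    return 1.0 if p_holds(s) and seen_q else 0.0

# Toy propositions and dynamics over integer states (assumptions for the demo).
def p_holds(s): return s >= 3
def q_holds(s): return s == 1
def base_step(s, action): return s + action   # actions shift the integer state

step = augment_transition(base_step)
state = (0, False)
for a in [1, 1, 1, 1]:                         # passes through s=1, so q is seen
    state = step(state, a)
    print(state, markovian_reward(state))      # reward fires once s >= 3
```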


Author(s): Nicola Fanizzi

This paper presents an approach to ontology construction pursued through the induction of concept descriptions expressed in Description Logics (DLs). The author surveys the theoretical foundations of the standard representations for formal ontologies in the Semantic Web. After stating the learning problem in this particular context, a FOIL-like algorithm is presented that can learn DL concept descriptions. The algorithm searches a space of candidate concept definitions by means of refinement operators, guided by heuristics based on the available examples. The author discusses theoretical aspects of learning under the inherent incompleteness of the semantics of this representation. The experimental evaluation of the DL-Foil system, which implements the learning algorithm, was carried out in two series of sessions on real ontologies from standard repositories, covering different domains and expressed in diverse Description Logics.
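
A minimal sketch of the FOIL-like refinement search the abstract describes, under simplifying assumptions (conjunctions of atomic features stand in for DL concept descriptions, coverage is closed-world set inclusion, and the heuristic is the classic FOIL information gain; the actual DL-Foil system refines genuine DL descriptions and handles the open-world semantics discussed above):

```python
import math

def information_gain(pos, neg, pos2, neg2):
    """FOIL-style gain: positives still covered, weighted by purity change."""
    def purity(p, n):
        return math.log2(p / (p + n)) if p > 0 else float("-inf")
    return pos2 * (purity(pos2, neg2) - purity(pos, neg))

def covers(description, example):
    """A conjunctive description covers an example having all its features."""
    return description <= example            # set inclusion on feature sets

def learn_description(features, positives, negatives):
    """Greedily refine the top concept by adding one feature at a time."""
    description = frozenset()                # the most general concept (Top)
    pos, neg = list(positives), list(negatives)
    while neg:                               # refine until no negatives covered
        best, best_gain = None, 0.0
        for f in features - description:     # candidate one-step refinements
            cand = description | {f}
            p2 = sum(covers(cand, e) for e in pos)
            n2 = sum(covers(cand, e) for e in neg)
            if p2 > 0 and information_gain(len(pos), len(neg), p2, n2) > best_gain:
                best, best_gain = cand, information_gain(len(pos), len(neg), p2, n2)
        if best is None:                     # no refinement helps: stop
            break
        description = best
        pos = [e for e in pos if covers(description, e)]
        neg = [e for e in neg if covers(description, e)]
    return description

# Toy run: learn "parent and female" from feature-set examples.
feats = {"parent", "female", "employed"}
pos = [frozenset({"parent", "female"}), frozenset({"parent", "female", "employed"})]
neg = [frozenset({"parent"}), frozenset({"female", "employed"})]
print(learn_description(feats, pos, neg))    # expect {'parent', 'female'}
```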

