value function
Recently Published Documents


TOTAL DOCUMENTS

1118
(FIVE YEARS 260)

H-INDEX

39
(FIVE YEARS 4)

2022 ◽  
Vol 6 (1) ◽  
pp. 37-52
Author(s):  
Aaron Anil Chadee ◽  
Xsitaaz T. Chadee ◽  
Clyde Chadee ◽  
Festus Otuloge

The tilted S-shaped utility function proposed in Prospect Theory (PT) relied fundamentally on the geometrical notion that there is a discontinuity between gains and losses, and that individual preferences change relative to a reference point. This results in PT having three distinct parameters: concavity, convexity, and the reference point, represented as a disjunction between the concave and convex sections of the curve. The objective of this paper is to examine the geometrical violations of PT at the zero point of reference. This qualitative study adopted a theoretical review of PT and Markowitz’s triply inflected value function concept to unravel methodological assumptions that were not fully addressed by either PT or cumulative PT. Our findings suggest a need to account for continuity and to resolve this violation of PT at the reference point. In so doing, an alternative preference transition theory was proposed as a solution that includes a phase change space to conjoin these three separate parameters into one continuous nonlinear model. This novel conceptual model adds new knowledge of risk and uncertainty in decision making. Through a better understanding of an individual’s reference point in decision-making behaviour, we add to the contemporary debate by complementing empirical studies and harmonizing research in this field. DOI: 10.28991/ESJ-2022-06-01-03
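For orientation, the canonical Kahneman–Tversky form of the PT value function makes the geometry described above concrete: concave over gains, convex and steeper over losses, with the reference point at zero where the two sections meet. A minimal sketch in Python; the exponents and loss-aversion coefficient are the well-known 1992 median estimates and serve purely as illustration, not as parameters from this paper.

```python
import numpy as np

def pt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Kahneman-Tversky S-shaped value function, reference point at 0.

    Concave for gains (x >= 0), convex and steeper for losses (x < 0);
    alpha = beta = 0.88 and lam = 2.25 are the 1992 median estimates,
    used here for illustration only.
    """
    x = np.asarray(x, dtype=float)
    gains = np.abs(x) ** alpha
    losses = -lam * np.abs(x) ** beta
    return np.where(x >= 0, gains, losses)

# The slope jumps at x = 0 (losses loom larger than gains): this kink at
# the reference point is the discontinuity the paper examines.
print(pt_value([-1.0, -0.5, 0.0, 0.5, 1.0]))
```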


2022 ◽  
Vol 2022 (1) ◽  
Author(s):  
Jun Moon

We consider the optimal control problem for stochastic differential equations (SDEs) with random coefficients under the recursive-type objective functional captured by the backward SDE (BSDE). Due to the random coefficients, the associated Hamilton–Jacobi–Bellman (HJB) equation is a class of second-order stochastic PDEs (SPDEs) driven by Brownian motion, which we call the stochastic HJB (SHJB) equation. In addition, as we adopt the recursive-type objective functional, the drift term of the SHJB equation depends on the second component of its solution. These two generalizations cause several technical intricacies that do not appear in the existing literature. We prove the dynamic programming principle (DPP) for the value function, for which, unlike the existing literature, we have to use the backward semigroup associated with the recursive-type objective functional. By the DPP, we are able to show the continuity of the value function. Using the Itô–Kunita formula, we prove the verification theorem, which constitutes a sufficient condition for optimality and characterizes the value function, provided that the smooth (classical) solution of the SHJB equation exists. In general, the smooth solution of the SHJB equation may not exist. Hence, we study the existence and uniqueness of the solution to the SHJB equation under two different weak solution concepts. First, we show, under appropriate assumptions, the existence and uniqueness of the weak solution via the Sobolev space technique, which requires converting the SHJB equation to a class of backward stochastic evolution equations. The second result is obtained under the notion of viscosity solutions, which is an extension of the classical one to the case of SPDEs. Using the DPP and the estimates of BSDEs, we prove that the value function is the viscosity solution to the SHJB equation. For applications, we consider the linear-quadratic problem, the utility maximization problem, and the European option pricing problem. Specifically, different from the existing literature, each problem is formulated by the generalized recursive-type objective functional and is subject to random coefficients. By applying the theoretical results of this paper, we obtain the explicit optimal solution for each problem in terms of the solution of the corresponding SHJB equation.
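To fix notation, an SHJB equation of this kind is a backward SPDE for a pair \((V, Z)\) rather than for a single function. A schematic form in the spirit of Peng's stochastic HJB equation, for a controlled SDE \(\mathrm{d}X_s = b\,\mathrm{d}s + \sigma\,\mathrm{d}W_s\) with random coefficients and BSDE generator \(f\), is

\[
-\mathrm{d}V(t,x) = \operatorname*{ess\,inf}_{u\in U}\Big[\tfrac12\operatorname{Tr}\!\big(\sigma\sigma^{\top}(t,x,u)\,D_x^2V(t,x)\big) + b(t,x,u)^{\top}D_xV(t,x) + \operatorname{Tr}\!\big(\sigma^{\top}(t,x,u)\,D_xZ(t,x)\big) + f\big(t,x,u,V(t,x),Z(t,x)\big)\Big]\,\mathrm{d}t - Z(t,x)\,\mathrm{d}W_t, \qquad V(T,x)=h(x).
\]

The appearance of the second component \(Z\) inside the drift (through \(f\)) is precisely the feature the abstract attributes to the recursive-type objective; the paper's exact arguments of \(f\) and correction terms may differ, so this display is only an orienting sketch.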


Author(s):  
Amin Asadi ◽  
Sarah Nurre Pinkley

There is growing interest in using electric vehicles (EVs) and drones for many applications. However, battery-oriented issues, including range anxiety and battery degradation, impede adoption. Battery swap stations, which exchange depleted batteries for full ones in minutes, are one alternative that reduces these concerns. We consider the problem of prescribing actions at a battery swap station while explicitly accounting for the uncertain arrival of swap demand, battery degradation, and battery replacement. We model the operations at a battery swap station using a finite-horizon Markov decision process model for the stochastic scheduling, allocation, and inventory replenishment problem (SAIRP), which determines when and how many batteries are charged, discharged, and replaced over time. We present theoretical proofs for the monotonicity of the value function and the monotone structure of an optimal policy for special SAIRP cases. Because of the curses of dimensionality, we develop a new monotone approximate dynamic programming (ADP) method, which intelligently initializes the value function approximation using regression. In computational tests, we demonstrate the superior performance of the new regression-based monotone ADP method compared with exact methods and other monotone ADP methods. Furthermore, with the tests, we deduce policy insights for drone swap stations.
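The structural idea behind monotone ADP can be illustrated generically (this is a sketch of the general technique, not the authors' algorithm): after each sample-based update, project the value estimates back onto the set of monotone functions so the proven structure is never violated. A minimal pool-adjacent-violators projection in Python, assuming a scalar state ordered so that the true value function is nondecreasing:

```python
import numpy as np

def monotone_projection(values):
    """Pool-adjacent-violators: least-squares projection of a vector
    onto the set of nondecreasing vectors (isotonic regression)."""
    merged = []  # list of [block_mean, block_size]
    for v in values:
        merged.append([float(v), 1])
        # Merge backwards while monotonicity is violated.
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2, c2 = merged.pop()
            m1, c1 = merged.pop()
            merged.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    out = []
    for mean, count in merged:
        out.extend([mean] * count)
    return np.array(out)

# A noisy estimate of a value function known to be nondecreasing in the
# state (e.g., the number of full batteries on hand):
noisy = np.array([0.0, 0.8, 0.5, 1.4, 1.2, 2.1])
print(monotone_projection(noisy))  # [0. 0.65 0.65 1.3 1.3 2.1]
```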


Author(s):  
Bingxin Yao ◽  
Bin Wu ◽  
Siyun Wu ◽  
Yin Ji ◽  
Danggui Chen ◽  
...  

In this paper, an offloading algorithm based on a Markov Decision Process (MDP) is proposed to solve the multi-objective offloading decision problem in a Mobile Edge Computing (MEC) system. The distinguishing feature of the algorithm is that an MDP is used to make the offloading decisions. The number of tasks in the task queue, the number of accessible edge clouds, and the Signal-to-Noise Ratio (SNR) of the wireless channel are taken into account in the state space of the MDP model. The offloading delay and energy consumption are used to define the value function of the MDP model, i.e., the objective function. To maximize the value function, the value iteration algorithm is used to obtain the optimal offloading policy. According to this policy, tasks of mobile terminals (MTs) are offloaded to the edge cloud or the central cloud, or executed locally. The simulation results show that the proposed algorithm effectively reduces the offloading delay and energy consumption.
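Since the optimal policy here comes from value iteration, a generic sketch of that algorithm for a finite MDP may be useful. The transition tensor P and reward matrix R below are illustrative placeholders, not the paper's actual state space (task queue length, accessible edge clouds, SNR):

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Generic value iteration for a finite MDP.

    P: transition probabilities, shape (A, S, S); P[a, s, s'] = Pr(s' | s, a)
    R: expected immediate rewards, shape (A, S)
    Returns the optimal value function and a greedy policy.
    """
    n_states = P.shape[1]
    v = np.zeros(n_states)
    while True:
        q = R + gamma * (P @ v)      # Q[a, s] = R[a, s] + gamma * E[v(s')]
        v_new = q.max(axis=0)        # Bellman optimality backup
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=0)
        v = v_new

# Toy two-state, two-action placeholder (purely illustrative numbers):
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
v_star, policy = value_iteration(P, R)
print(v_star, policy)
```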


2022 ◽  
Vol 2022 (1) ◽  
Author(s):  
Junkee Jeon ◽  
Minsuk Kwak

We introduce a variable annuity (VA) contract with a surrender option and lookback benefit, that is, the benefit of the VA contract is linked to the maximum process of the policyholder’s account value. In contrast to the constant guarantee model provided in Bernard et al. (Insur. Math. Econ. 55:116–128, 2014), it is optimal for the policyholder of the VA contract with lookback benefit to surrender the VA contract when the policyholder’s account value is below or equal to the optimal surrender boundary. Thus, from the perspective of an insurer constructing a portfolio of VA contracts, utilizing VA contracts with a lookback benefit along with VA contracts with a constant guarantee provides diversification of early surrenders. The valuation of this contract can be described as a two-dimensional parabolic variational inequality. By converting this into a one-dimensional problem, we obtain integral equations for the value function and the free boundary. The recursive integration method is applied to obtain numerical solutions. We also provide comparative statics of the optimal surrender boundaries with respect to various parameters.
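For orientation, a surrender option of this type is an optimal-stopping (obstacle) problem. A schematic form of the two-dimensional variational inequality, with state \((s, m)\) where \(m\) is the running maximum of the account value, pricing generator \(\mathcal{L}\), discount rate \(r\), and a hypothetical surrender benefit \(G(s,m)\) (all placeholders; the paper's exact formulation differs), is

\[
\min\big\{-\partial_t V(t,s,m) - \mathcal{L}V(t,s,m) + rV(t,s,m),\; V(t,s,m) - G(s,m)\big\} = 0, \qquad s \le m,
\]

where the surrender region \(\{V = G\}\) and the continuation region \(\{V > G\}\) are separated by the free boundary, i.e., the optimal surrender boundary studied in the paper.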


2021 ◽  
Vol 59 (1) ◽  
pp. 77-107

Political risk concerns the profits and investment plans of international business (MNCs, FDI). The Social Dimensions of Political Risk (SDPR) are an uncharted territory of political risk. Consequently, on the basis of an analysis of theories of risk, political risk, systems, values, and globalization, a concept of SDPR is generated. This concept rests on basic assumptions: 1) society is a system whose elements are subsystems; 2) the societal subsystem is at the core of society; 3) the relation between the societal subsystem and society is that of element to system; 4) political risk is systemic; 5) values are axial to the system, and their carrier is the societal subsystem; 6) laws are an artificial construct that has only a value function but is not a value; 7) the incommensurability between values and this artificial construct generates SDPRs that are relevant to the risk for society. A formal theoretical and analytical model of SDPR, and a value triangle and conceptual index of SDPR based on it, are introduced. Key conclusions pertain to the following: the need to reconsider the paradigm of democracy; greater participation of the societal subsystem; and the need for the subsystems’ mutual restraint based on the principle of restraint of authorities.


Author(s):  
Иван Борисович Микиртумов

In this article, I present my comments on the article by Evgeny Borisov included in this issue of the journal. Along the way, I set out my view of the problems of cross-world predication and cross-identification. I believe that cross-world identity is impossible and that the main task is to provide identification. To do this, one can use either the method of maintaining cognitive contact or the method of counterparts identified by a set of essential features; this set is determined pragmatically. The method of rigid designators also leads to intensional logic, since the object language must contain names of objects relativized to worlds. Borisov tries to build the logic of cross-world predication on several foundations at once, which are poorly compatible with each other. He quantifies over possible individuals, but at the same time he tries to rely on the metalinguistic names of individuals as a basis for cross-identification; the metalinguistic name of an individual becomes an argument of the value function, although it is not a rigid designator. The key operation of Borisov’s system, the appointment of a counterpart in a possible world, is hidden behind the function f, which acts as a condition of identification, that is, it draws a cross-world line. In my opinion, the system has potential, but it needs further development and refinement.


2021 ◽  
Author(s):  
◽  
Yiming Peng

Reinforcement Learning (RL) problems appear in diverse real-world applications and are gaining substantial attention in academia and industry. Policy Direct Search (PDS) is widely recognized as an effective approach to RL problems. However, existing PDS algorithms have some major limitations. First, many step-wise Policy Gradient Search (PGS) algorithms cannot effectively utilize informative historical gradients to accurately estimate policy gradients. Second, although evolutionary PDS algorithms do not rely on accurate policy gradient estimations and can explore learning environments effectively, they are not sample-efficient at learning policies in the form of deep neural networks. Third, existing PGS algorithms often diverge easily due to the lack of reliable and flexible techniques for value function learning. Fourth, existing PGS algorithms have not provided suitable mechanisms to learn proper state features automatically.

To address these limitations, the overall goal of this thesis is to develop effective policy direct search algorithms for tackling challenging RL problems through technical innovations in four key areas. First, the thesis aims to improve the accuracy of policy gradient estimation by utilizing historical gradients through a Primal-Dual Approximation technique. Second, the thesis aims to surpass state-of-the-art performance by properly balancing the exploration-exploitation trade-off via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and Proximal Policy Optimization (PPO). Third, the thesis seeks to stabilize value function learning via a self-organized Sandpile Model (SM) while generalizing the compatible condition to support flexible value function learning. Fourth, the thesis endeavors to develop innovative evolutionary feature learning techniques that are capable of automatically extracting useful state features so as to enhance various cutting-edge PGS algorithms.

In the thesis, we explore the four key technical areas by studying policies of increasing complexity. We start the research from a simple linear policy representation and then proceed to a complex neural-network-based policy representation. Next, we consider a more complicated situation where policy learning is coupled with value function learning. Subsequently, we consider policies modeled as a concatenation of two interrelated networks, one for feature learning and one for action selection.

To achieve the first goal, this thesis proposes a new policy gradient learning framework where a series of historical gradients are jointly exploited to obtain accurate policy gradient estimations via the Primal-Dual Approximation technique. Under the framework, three new PGS algorithms for step-wise policy training have been derived from three widely used PGS algorithms; meanwhile, the convergence properties of these new algorithms have been theoretically analyzed. The empirical results on several benchmark control problems further show that the newly proposed algorithms can significantly outperform their base algorithms.

To achieve the second goal, this thesis develops a new sample-efficient evolutionary deep policy optimization algorithm based on CMA-ES and PPO. The algorithm has a layer-wise learning mechanism to improve computational efficiency in comparison to CMA-ES. Additionally, it uses a surrogate model based on a performance lower bound for fitness evaluation, significantly reducing the sample cost to the state-of-the-art level. More importantly, the best policy found by CMA-ES at every generation is further improved by PPO to properly balance exploration and exploitation. The experimental results confirm that the proposed algorithm outperforms various cutting-edge algorithms on many benchmark continuous control problems.

To achieve the third goal, this thesis develops new value function learning methods that are both reliable and flexible so as to further enhance the effectiveness of policy gradient search. Two Actor-Critic (AC) algorithms have been successfully developed from a commonly used PGS algorithm, i.e., Regular Actor-Critic (RAC). The first algorithm adopts SM to stabilize value function learning, and the second algorithm generalizes the logarithm function used by the compatible condition to provide a flexible family of new compatible functions. The experimental results show that, with the help of reliable and flexible value function learning, the newly developed algorithms are more effective than RAC on several benchmark control problems.

To achieve the fourth goal, this thesis develops innovative NeuroEvolution algorithms for automated feature learning to enhance various cutting-edge PGS algorithms. The newly developed algorithms not only extract useful state features but also learn good policies. The experimental analysis demonstrates that the newly proposed algorithms achieve better performance on large-scale RL problems in comparison to both well-known PGS algorithms and NeuroEvolution techniques. Our experiments also confirm that the state features learned by NeuroEvolution on one RL task can be easily transferred to boost learning performance on similar but different tasks.
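As a generic illustration of the first limitation and its remedy, reusing historical gradients to damp the noise of single-rollout estimates, here is a minimal momentum-style REINFORCE update on a one-parameter Gaussian bandit in Python. This is not the thesis's Primal-Dual Approximation technique; it only shows the basic mechanism of blending past and current gradient information, and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(a):
    # Toy objective: the best mean action is a = 2.
    return -(a - 2.0) ** 2

theta, g_bar = 0.0, 0.0
lr, beta = 0.05, 0.9  # illustrative step size and averaging weight
for step in range(2000):
    a = rng.normal(theta, 1.0)               # sample action from N(theta, 1)
    g_t = reward(a) * (a - theta)            # REINFORCE gradient estimate
    g_bar = beta * g_bar + (1 - beta) * g_t  # blend in historical gradients
    theta += lr * g_bar                      # ascent with the averaged gradient
print(theta)  # close to 2.0: averaging stabilizes the noisy estimates
```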

