Battlefield Agent Decision-Making Based on Markov Decision Process

Author(s):
Jia Zhang, Xiang Wang, Fang Deng, Bin Xin, ...

Battlefield decision-making is an important part of modern information warfare. It analyses and integrates battlefield information, reduces operators’ workload, and helps them make decisions quickly in complex battlefield environments. This paper presents a dynamic battlefield decision-making method based on Markov Decision Processes (MDP). With this method, operators can obtain decision support quickly even under incomplete information. To improve the credibility of decisions, dynamic adaptability, and intelligence, softmax regression and random forests are introduced to improve the MDP model. Simulations show that the method is intuitive and practical, and has remarkable advantages in solving dynamic decision problems under incomplete information.
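The abstract leaves the model unspecified; the following is a hypothetical sketch of one plausible reading, in which a random forest learns the transition law from logged transitions and a softmax over action values produces a graded recommendation. The state/action encodings, rewards, and data are all made up, and note that the paper's "softmax regression" is a classifier, so using softmax as value weighting here is an interpretation, not the paper's method:

```python
# Hypothetical sketch only; the paper's exact formulation is not given
# in the abstract. A random forest estimates P(s' | s, a) from logged
# transitions, and a softmax over action values yields a graded
# recommendation for the operator.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

n_states, n_actions, gamma = 20, 4, 0.9
rng = np.random.default_rng(0)

# Synthetic transition log: (state, action) -> next state.
S = rng.integers(0, n_states, size=(1000, 1))
A = rng.integers(0, n_actions, size=(1000, 1))
S_next = rng.integers(0, n_states, size=1000)
forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(np.hstack([S, A]), S_next)

def transition_probs(s, a):
    """Estimated P(s' | s, a) from the random forest."""
    p = np.zeros(n_states)
    p[forest.classes_] = forest.predict_proba([[s, a]])[0]
    return p

P = np.array([[transition_probs(s, a) for a in range(n_actions)]
              for s in range(n_states)])            # shape (S, A, S')
R = rng.normal(size=(n_states, n_actions))          # placeholder rewards

# Value iteration on the learned model.
V = np.zeros(n_states)
for _ in range(200):
    Q = R + gamma * P @ V                           # shape (S, A)
    V = Q.max(axis=1)

def recommend(s, tau=1.0):
    """Softmax over action values: a graded decision recommendation."""
    z = np.exp((Q[s] - Q[s].max()) / tau)
    return z / z.sum()

print(recommend(0))
```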

10.28945/2750
2004
Author(s):
Abdullah Gani, Omar Zakaria, Nor Badrul Anuar Jumaat

This paper presents an application of the Markov Decision Process (MDP) to the provision of traffic prioritisation in best-effort networks. MDP was used because it is a standard, general formalism for modelling stochastic, sequential decision problems. The implementation of traffic prioritisation involves a series of decision-making processes by which packets are marked and classified before being despatched to their destinations. The application of MDP was driven by the objective of ensuring that higher-priority packets are not delayed by lower-priority ones. MDP is believed to be applicable to improving traffic prioritisation arbitration.
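As an illustration only (the abstract does not specify the state, action, or reward design), a toy two-class packet-scheduling MDP solved by value iteration might look like the following; the queue capacity, arrival probabilities, and the 10:1 holding-cost ratio are all assumptions:

```python
# Toy sketch: two priority classes, state = (high-queue, low-queue) lengths,
# action = which queue to serve this slot. Not the paper's model.
import itertools

Q_MAX = 3                       # per-class queue capacity (assumed)
states = list(itertools.product(range(Q_MAX + 1), repeat=2))   # (hi, lo)
actions = (0, 1)                # 0: serve the high-priority queue, 1: the low
P_HI, P_LO = 0.3, 0.5           # assumed per-slot arrival probabilities
GAMMA = 0.95

def step_distribution(state, action):
    """Transition law: serve one packet, then Bernoulli arrivals."""
    hi, lo = state
    if action == 0 and hi > 0:
        hi -= 1
    elif action == 1 and lo > 0:
        lo -= 1
    dist = {}
    for ah in (0, 1):
        for al in (0, 1):
            ns = (min(hi + ah, Q_MAX), min(lo + al, Q_MAX))
            pr = (P_HI if ah else 1 - P_HI) * (P_LO if al else 1 - P_LO)
            dist[ns] = dist.get(ns, 0.0) + pr
    return dist

def reward(state):
    """Holding cost: a waiting high-priority packet costs 10x more."""
    hi, lo = state
    return -(10.0 * hi + lo)

def q_value(s, a, V):
    return reward(s) + GAMMA * sum(p * V[ns]
                                   for ns, p in step_distribution(s, a).items())

# Value iteration over the small state space.
V = {s: 0.0 for s in states}
for _ in range(300):
    V = {s: max(q_value(s, a, V) for a in actions) for s in states}

policy = {s: max(actions, key=lambda a: q_value(s, a, V)) for s in states}
print(policy[(2, 2)])   # expect 0: serve the high-priority queue first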


1999
Vol 32 (2)
pp. 4852-4857
Author(s):
Shalabh Bhatnagar, Michael C. Fu, Steven I. Marcus, Ying He

2021
pp. 1-16
Author(s):
Pegah Alizadeh, Emiliano Traversi, Aomar Osmani

Markov Decision Process models (MDPs) are a powerful tool for planning tasks and sequential decision-making problems. In this work we deal with MDPs with imprecise rewards, often used in situations where the data are uncertain. In this context, we provide algorithms for finding the policy that minimizes the maximum regret. To the best of our knowledge, all the regret-based methods proposed in the literature focus on providing an optimal stochastic policy. We introduce for the first time a method to calculate an optimal deterministic policy using optimization approaches. Deterministic policies are easily interpretable for users because, for a given state, they provide a unique choice. To better motivate the use of an exact procedure for finding a deterministic policy, we show some (theoretical and experimental) cases where the intuitive idea of using the deterministic policy obtained by “determinizing” the optimal stochastic policy leads to a policy far from the exact deterministic policy.
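A toy illustration (ours, not the paper's): with imprecise rewards given by two hypotheses over a single state with two actions, the minimax-regret policy is strictly mixed and achieves lower worst-case regret than either deterministic policy, which is why finding the best deterministic policy calls for its own exact procedure rather than rounding the stochastic one:

```python
# Single state, two actions, two reward hypotheses (imprecise rewards).
import numpy as np

R = np.array([[1.0, 0.0],    # hypothesis 1: action 0 worth 1, action 1 worth 0
              [0.0, 2.0]])   # hypothesis 2: action 0 worth 0, action 1 worth 2

def max_regret(p):
    """Worst-case regret of playing action 0 w.p. p, over both hypotheses."""
    policy = np.array([p, 1.0 - p])
    return np.max(R.max(axis=1) - R @ policy)

grid = np.linspace(0.0, 1.0, 1001)
best = min(grid, key=max_regret)
print(best, max_regret(best))            # ~0.333: strictly mixed, regret ~0.667
print(max_regret(1.0), max_regret(0.0))  # pure action 0: 2.0; pure action 1: 1.0
```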


2013
Vol 756-759
pp. 504-508
Author(s):
De Min Li, Jian Zou, Kai Kai Yue, Hong Yun Guan, Jia Cun Wang

Evacuating a firefighter from a complex fire scene is a challenging problem. In this paper, we discuss a firefighter’s evacuation decision-making model in an ad hoc robot network on a fire scene. Because the fire scene is dynamic, the information sensed by the ad hoc robot network also varies dynamically. We therefore adopt a dynamic decision method, the Markov decision process, to model the firefighter’s decision-making process for evacuation from the fire scene. In this decision-making process, the critical problems are how to define the action space and how to evaluate the transition law of the Markov decision process. We address these problems based on the triangular sensor layout of the ad hoc robot network, and conclude by describing a decision-making model for a firefighter’s evacuation.
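A minimal sketch (regions, hazard values, and probabilities are all assumed, not from the paper) of how an action space over neighbouring sensor regions and a noisy transition law could be combined into an evacuation MDP:

```python
# Hedged sketch: an evacuation MDP on a small graph of sensor-covered
# regions. Actions are moves toward neighbouring regions; the transition
# law is noisy because the scene changes while the firefighter moves.
import numpy as np

# Regions 0..4; region 4 is the exit. Edges stand in for the (assumed)
# triangular sensor layout of the ad hoc robot network.
neighbours = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [4]}
hazard = np.array([0.2, 0.6, 0.1, 0.3, 0.0])   # assumed fire-risk estimates
GAMMA = 0.95

def transition(s, target):
    """Transition law: a move succeeds w.p. 0.8, else the firefighter stays."""
    return {target: 0.8, s: 0.2} if target != s else {s: 1.0}

def reward(s, target):
    """Reaching the exit pays off; moving into hazardous regions costs."""
    return 100.0 if target == 4 else -1.0 - 10.0 * hazard[target]

def q_value(s, t, V):
    return reward(s, t) + GAMMA * sum(p * V[ns]
                                      for ns, p in transition(s, t).items())

# Value iteration.
V = np.zeros(5)
for _ in range(200):
    V = np.array([max(q_value(s, t, V) for t in neighbours[s])
                  for s in range(5)])

policy = {s: max(neighbours[s], key=lambda t: q_value(s, t, V))
          for s in range(5)}
print(policy)   # expected route 0 -> 2 -> 3 -> 4, avoiding the hazard at 1
```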


1987
Vol 24 (03)
pp. 644-656
Author(s):
Frederick J. Beutler, Keith W. Ross

Uniformization permits the replacement of a semi-Markov decision process (SMDP) by a Markov chain exhibiting the same average rewards for simple (non-randomized) policies. It is shown that various anomalies may occur, especially for stationary (randomized) policies; uniformization introduces virtual jumps with concomitant action changes not present in the original process. Since these lead to discrepancies in the average rewards for stationary policies, uniformization can be accepted as valid only for simple policies. We generalize uniformization to yield consistent results for stationary policies also. These results are applied to constrained optimization of SMDPs, in which stationary (randomized) policies appear naturally. The structure of optimal constrained SMDP policies can then be elucidated by studying the corresponding controlled Markov chains. Moreover, constrained SMDP optimal policy computations can be more easily implemented in discrete time, the generalized uniformization being employed to relate discrete- and continuous-time optimal constrained policies.
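For reference, the basic uniformization construction the abstract builds on, for a fixed (policy-induced) generator: with rate Λ at least the largest exit rate, the chain P = I + Q/Λ preserves stationary behaviour, and its diagonal entries are exactly the virtual self-jumps the abstract warns about. A small numerical sketch (the generator is made up):

```python
# Standard uniformization of a continuous-time Markov chain.
import numpy as np

Q = np.array([[-2.0,  2.0,  0.0],     # generator of a 3-state CTMC (assumed)
              [ 1.0, -3.0,  2.0],
              [ 0.0,  4.0, -4.0]])

Lam = np.max(np.abs(np.diag(Q)))      # uniformization rate >= all exit rates
P = np.eye(3) + Q / Lam               # uniformized transition matrix

# Sanity checks: P is stochastic and shares Q's stationary distribution.
assert np.allclose(P.sum(axis=1), 1.0)
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()
assert np.allclose(pi @ Q, 0.0, atol=1e-8)
print(P)   # diagonal entries are the virtual self-jump probabilities
```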


2005
Vol 5 (2)
pp. 23-30
Author(s):
J.P. Torterotot, M. Rebelo, C. Werey, J. Craveiro

The European project CARE-W (Computer Aided Rehabilitation of Water Networks), supported by the European Commission, has created and tested a prototype decision support system for the rehabilitation of water pipes. Within the project, current operational decision-making processes were analysed in 14 water utilities. The objectives were to identify the actors involved and their interactions, as well as the structure (formal and informal) of the decision processes: institutional and regulatory contexts, steps of decision-making, information flows, the distribution of responsibilities and influence, and the participation of social and institutional stakeholders. Summary results are presented. The cases studied differ in several respects. An “average” situation could be described as showing a moderate level of confrontation, rather formalised procedures, and highly centralised decision-making, apart from the interrelations with road-works programming. The greatest diversity among the utilities concerns the level of information within the decision process: the data considered, the flows of information, and the “sophistication” of the criteria taken into account.


2017
Vol 54 (4)
pp. 1071-1088
Author(s):
Xin Guo, Alexey Piunovskiy, Yi Zhang

We consider the discounted continuous-time Markov decision process (CTMDP), where the negative part of each cost rate is bounded by a drift function, say w, whereas the positive part is allowed to be arbitrarily unbounded. Our focus is on the existence of a stationary optimal policy for the discounted CTMDP problems out of the more general class. Both constrained and unconstrained problems are considered. Our investigations are based on the continuous-time version of the Veinott transformation. This technique has not been widely employed in the previous literature on CTMDPs, but it clarifies the roles of the imposed conditions in a rather transparent way.
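A hedged rendering of the kind of drift-type condition the abstract describes (the notation is assumed for illustration, not quoted from the paper):

```latex
% Notation assumed: c(x,a) is the cost rate, c^-(x,a) = max(-c(x,a), 0)
% its negative part, and w >= 1 a drift function. The boundedness
% requirement on the negative part reads
\[
  c^{-}(x,a) \le M\, w(x) \qquad \text{for all } (x,a),
\]
% while the positive part c^+(x,a) may be arbitrarily unbounded.
% A typical accompanying drift inequality on the transition rates q:
\[
  \int_{X} w(y)\, q(\mathrm{d}y \mid x, a) \le \rho\, w(x) + b,
\]
% and the alpha-discounted criterion to be optimized:
\[
  V^{\pi}(x) = \mathbb{E}^{\pi}_{x}\!\left[\int_{0}^{\infty}
    e^{-\alpha t}\, c(x_t, a_t)\, \mathrm{d}t\right].
\]
```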

