Hydrocarbon Field Re-Development as Markov Decision Process

2021 ◽  
Author(s):  
Martin Sieberer ◽  
Torsten Clemens

Abstract Hydrocarbon field (re-)development requires that a multitude of decisions be made under uncertainty. These decisions include the type and size of surface facilities and the location, configuration, and number of wells, but also which data to acquire. Both types of decisions, which development to choose and which data to acquire, are strongly coupled. The aim of appraisal is to maximize value while minimizing data acquisition costs. These decisions have to be made under uncertainty owing to the inherent uncertainty of the subsurface, but also of costs and economic parameters. Conventional Value Of Information (VOI) evaluations can be used to determine how much can be spent to acquire data. However, VOI is very challenging to calculate for complex sequences of decisions with various costs and including the risk attitude of the decision maker. We use a fully observable Markov Decision Process (MDP) to determine the policy for the sequence and type of measurements and decisions to make. A fully observable MDP is characterised by states (here: the description of the system at a certain point in time), actions (here: measurements and development scenarios), a transition function (the probabilities of transitioning from one state to the next), and rewards (costs of measurements, Expected Monetary Value (EMV) of development options). Solving the MDP gives the optimal policy, the sequence of decisions, the Probability Of Maturation (POM) of a project, the Expected Monetary Value (EMV), the expected loss, the expected appraisal costs, and the Probability of Economic Success (PES). These key performance indicators can then be used to select, from a portfolio of projects, the ones generating the highest expected reward for the company. Combining production forecasts from numerical model ensembles with probabilistic capital and operating expenditures and economic parameters allows for quantitative decision making under uncertainty.
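To make the MDP components above concrete, the sketch below solves a deliberately small, fully observable appraisal MDP by value iteration. The states, actions, transition probabilities, and monetary figures are hypothetical placeholders chosen for illustration; they are not the model or data used in the paper.

```python
"""Illustrative value-iteration solve of a tiny, fully observable appraisal MDP.

All states, actions, probabilities, and monetary figures below are hypothetical
placeholders, not the model or data used in the paper.
"""

# States: belief about the prospect before/after appraisal, plus a terminal state.
STATES = ["uncertain", "good_outlook", "poor_outlook", "done"]
ACTIONS = ["acquire_data", "develop", "abandon"]

# transitions[(state, action)] = list of (next_state, probability)
TRANSITIONS = {
    ("uncertain", "acquire_data"): [("good_outlook", 0.6), ("poor_outlook", 0.4)],
    ("uncertain", "develop"): [("done", 1.0)],
    ("uncertain", "abandon"): [("done", 1.0)],
    ("good_outlook", "develop"): [("done", 1.0)],
    ("good_outlook", "abandon"): [("done", 1.0)],
    ("poor_outlook", "develop"): [("done", 1.0)],
    ("poor_outlook", "abandon"): [("done", 1.0)],
}

# rewards[(state, action)]: negative appraisal cost or EMV of the development option.
REWARDS = {
    ("uncertain", "acquire_data"): -5.0,   # cost of the measurement campaign
    ("uncertain", "develop"): 20.0,        # EMV of developing without further data
    ("uncertain", "abandon"): 0.0,
    ("good_outlook", "develop"): 60.0,     # EMV after a favourable appraisal result
    ("good_outlook", "abandon"): 0.0,
    ("poor_outlook", "develop"): -30.0,    # expected loss after an unfavourable result
    ("poor_outlook", "abandon"): 0.0,
}


def q_value(state, action, values, gamma):
    """Immediate reward of taking `action` in `state` plus discounted future value."""
    return REWARDS[(state, action)] + gamma * sum(
        prob * values[nxt] for nxt, prob in TRANSITIONS[(state, action)]
    )


def value_iteration(gamma=0.95, tol=1e-6):
    """Return the optimal value function and policy of the toy MDP."""
    values = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for state in STATES:
            if state == "done":
                continue
            best = max(
                q_value(state, a, values, gamma)
                for a in ACTIONS if (state, a) in TRANSITIONS
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            break
    policy = {
        state: max(
            (a for a in ACTIONS if (state, a) in TRANSITIONS),
            key=lambda a: q_value(state, a, values, gamma),
        )
        for state in STATES if state != "done"
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    # With these numbers: appraise first, develop on a good outlook, abandon on a poor one.
    print(policy)
```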

2017 ◽  
Vol 47 (6) ◽  
pp. 800-807 ◽  
Author(s):  
Joseph Buongiorno ◽  
Mo Zhou ◽  
Craig Johnston

Markov decision process models were extended to reflect some consequences of the risk attitude of forestry decision makers. One approach consisted of maximizing the expected value of a criterion subject to an upper bound on the variance or, symmetrically, minimizing the variance subject to a lower bound on the expected value. The other method used the certainty equivalent criterion, a weighted average of the expected value and variance. The two approaches were applied to data for mixed softwood–hardwood forests in the southern United States with multiple financial and ecological criteria. Compared with risk neutrality or risk seeking, financial risk aversion reduced expected annual financial returns and production and led to shorter cutting cycles that lowered the expected diversity of tree species and size, stand basal area, stored CO2e, and old-growth area.
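As a concrete illustration of the certainty-equivalent criterion (a weighted combination of the expected value and the variance of a criterion), the sketch below compares two hypothetical management policies. The return distributions and the risk-aversion weight are invented for illustration; they are not data or parameter values from the study.

```python
"""Sketch of a certainty-equivalent comparison of two management policies.

The return distributions and the risk-aversion weight are invented; they are
not data or parameter values from the study.
"""
import numpy as np


def certainty_equivalent(returns, probabilities, risk_weight):
    """CE = E[X] - risk_weight * Var[X].

    risk_weight > 0 encodes risk aversion, risk_weight < 0 risk seeking,
    and risk_weight = 0 recovers the risk-neutral expected-value criterion.
    """
    returns = np.asarray(returns, dtype=float)
    probabilities = np.asarray(probabilities, dtype=float)
    mean = probabilities @ returns
    variance = probabilities @ (returns - mean) ** 2
    return mean - risk_weight * variance


# Hypothetical annual returns ($/ha) under two policies, each over three scenarios.
low_variance_policy = certainty_equivalent([40, 60, 80], [0.3, 0.4, 0.3], risk_weight=0.02)
high_variance_policy = certainty_equivalent([20, 90, 130], [0.3, 0.4, 0.3], risk_weight=0.02)

# The high-variance policy has the larger expected return (81 vs. 60), yet the
# risk-averse certainty equivalent ranks the low-variance policy first.
print(low_variance_policy, high_variance_policy)  # 55.2, ~43.6
```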


Author(s):  
Thiago D. Simão

Reinforcement Learning (RL) deals with problems that can be modeled as a Markov Decision Process (MDP) where the transition function is unknown. In situations where an arbitrary policy pi is already being executed and the experiences with the environment have been recorded in a batch D, an RL algorithm can use D to compute a new policy pi'. However, the policy computed by traditional RL algorithms might perform worse than pi. Our goal is to develop safe RL algorithms, where the agent has high confidence, given D, that pi' performs better than pi. To develop sample-efficient and safe RL algorithms, we combine ideas from exploration strategies in RL with a safe policy improvement method.
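A minimal sketch of the kind of safety test that underlies safe policy improvement: estimate the return of the candidate policy pi' from the batch D by importance sampling, and accept pi' only if a lower confidence bound on that estimate beats the return of pi. The batch format, the normal-approximation bound, and the function names are assumptions made for illustration; they are not the algorithm developed in this work.

```python
"""Sketch of a high-confidence safety test for batch RL (illustrative assumptions).

A trajectory is a list of (state, action, reward) tuples collected under the
behaviour policy pi_old; pi_new is the candidate policy pi'.
"""
import numpy as np


def importance_weighted_returns(batch, pi_new, pi_old):
    """Per-trajectory importance-sampling estimates of pi_new's return.

    pi_new and pi_old are functions (state, action) -> action probability.
    """
    estimates = []
    for trajectory in batch:
        weight, undiscounted_return = 1.0, 0.0
        for state, action, reward in trajectory:
            weight *= pi_new(state, action) / pi_old(state, action)
            undiscounted_return += reward
        estimates.append(weight * undiscounted_return)
    return np.asarray(estimates)


def is_safe_improvement(batch, pi_new, pi_old, baseline_return):
    """Accept pi_new only if a one-sided 95% normal-approximation lower bound
    on its estimated return exceeds the baseline return of pi_old."""
    estimates = importance_weighted_returns(batch, pi_new, pi_old)
    mean = estimates.mean()
    stderr = estimates.std(ddof=1) / np.sqrt(len(estimates))
    return mean - 1.645 * stderr > baseline_return
```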


Author(s):  
Alessandro Ronca ◽  
Giuseppe De Giacomo

Recently, regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice, both the transition and the reward functions can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in time polynomial in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and that it reasonably captures the difficulty of a regular decision process.
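To illustrate what "depends on the whole history, though regularly" means, the sketch below encodes a history-dependent reward as a small finite-state machine (a Moore-style transducer). The tracked property and the reward values are invented for illustration; they are not taken from the paper.

```python
"""A history-dependent but regular reward, encoded as a 3-state machine.

The property tracked (an even, positive number of 'a' symbols seen so far) and
the reward values are invented for illustration.
"""


class RegularReward:
    """Emits reward 1.0 whenever the history read so far contains an even,
    positive number of 'a' symbols, and 0.0 otherwise. The reward depends on
    the whole history, yet three automaton states suffice to track it."""

    TRANSITIONS = {"none": "odd", "odd": "even", "even": "odd"}

    def __init__(self):
        self.state = "none"  # automaton state summarising the history so far

    def step(self, symbol):
        """Consume the next history symbol and return the current reward."""
        if symbol == "a":
            self.state = self.TRANSITIONS[self.state]
        return 1.0 if self.state == "even" else 0.0


machine = RegularReward()
print([machine.step(s) for s in ["a", "b", "a", "a"]])  # -> [0.0, 0.0, 1.0, 0.0]
```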


Mathematics ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 1385
Author(s):  
Irais Mora-Ochomogo ◽  
Marco Serrato ◽  
Jaime Mora-Vargas ◽  
Raha Akhavan-Tabatabaei

Natural disasters represent a latent threat for every country in the world. Due to climate change and other factors, statistics show that they continue to be on the rise. This situation challenges communities and humanitarian organizations to be better prepared and to react faster to natural disasters. In some countries, in-kind donations represent a high percentage of the supply for relief operations, which presents additional challenges. This research proposes a Markov Decision Process (MDP) model to represent operations in collection centers, where in-kind donations are received, sorted, packed, and sent to the affected areas. The decision addressed is when to send a shipment, considering the uncertainty of the donations' supply and of the demand, as well as the logistics costs and the penalty for unsatisfied demand. As a result of the MDP, a Monotone Optimal Non-Decreasing Policy (MONDP) is proposed, which provides valuable insights for decision-makers in this field. Moreover, the necessary conditions to prove the existence of such a MONDP are presented.
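The sketch below sets up a toy version of the shipment decision and solves it by value iteration. The inventory cap, arrival distribution, demand, and cost figures are invented for illustration and are not the paper's model or data; they merely show how a monotone threshold rule (hold at low inventory, ship once inventory is high enough) can emerge from this kind of cost structure.

```python
"""Toy shipment-decision MDP for a collection centre (illustrative numbers only).

State: packed pallets on hand. Actions: hold (0) or ship everything (1).
Costs: holding, a fixed shipping charge, and a penalty per pallet of demand
that goes unsatisfied in the period.
"""
import numpy as np

MAX_INV = 10                          # pallets the collection centre can hold
ARRIVALS = {0: 0.3, 1: 0.4, 2: 0.3}   # distribution of newly donated pallets per period
DEMAND = 3                            # pallets needed in the affected area per period
SHIP_COST = 8.0                       # fixed cost of dispatching a shipment
HOLD_COST = 1.0                       # holding cost per pallet per period
PENALTY = 4.0                         # penalty per pallet of unsatisfied demand
GAMMA = 0.9                           # discount factor


def value_iteration(tol=1e-6):
    """Minimise expected discounted cost; return values and the 0/1 ship policy."""
    values = np.zeros(MAX_INV + 1)
    while True:
        new_values = np.zeros_like(values)
        policy = np.zeros(MAX_INV + 1, dtype=int)
        for inv in range(MAX_INV + 1):
            # Hold: pay holding costs; all of this period's demand goes unsatisfied.
            hold = HOLD_COST * inv + PENALTY * DEMAND + GAMMA * sum(
                p * values[min(inv + a, MAX_INV)] for a, p in ARRIVALS.items()
            )
            # Ship: pay the fixed charge plus a penalty on any shortfall below demand.
            ship = SHIP_COST + PENALTY * max(DEMAND - inv, 0) + GAMMA * sum(
                p * values[min(a, MAX_INV)] for a, p in ARRIVALS.items()
            )
            new_values[inv] = min(hold, ship)
            policy[inv] = int(ship <= hold)
        if np.max(np.abs(new_values - values)) < tol:
            return new_values, policy
        values = new_values


_, policy = value_iteration()
# For these numbers the policy is a non-decreasing threshold rule over inventory:
# hold (0) below the threshold, ship (1) at or above it.
print(policy)
```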

