Automatic first-arrival picking method via intelligent Markov optimal decision processes

2021 ◽  
Vol 18 (3) ◽  
pp. 406-417
Author(s):  
Fei Luo ◽  
Bo Feng ◽  
Huazhong Wang

Abstract Picking the first arrival is an important step in seismic processing. The large volume of seismic data calls for automatic and objective picking. In this paper, we formulate first-arrival picking as an intelligent Markov decision process in a multi-dimensional feature-attribute space. By designing a reasonable model, a global optimization is carried out in the reward-function space to obtain the path with the largest cumulative reward, thereby picking the first arrival automatically. The state-value function contains a distance-related discount factor γ, which lets the Markov decision process account for the lateral continuity of the seismic data when picking the first arrival and avoid bad traces. On this basis, the method further introduces an optimized model consisting of a fuzzy-clustering-based multi-dimensional attribute reward function and a structure-based Gaussian stochastic policy, which reduces the difficulty of model design and makes the picking more accurate and automatic. Testing this approach on field seismic data reveals its properties and shows that it can automatically pick more reasonable first arrivals and provides a degree of quality control, especially when the first-arrival energy is weak (i.e., the signal-to-noise ratio is low) or when there are adjacent complex waveforms in the shallow layer.
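
The path-search idea can be illustrated with a small dynamic-programming sketch: one time sample is picked per trace so that the accumulated reward, discounted by a distance-related factor γ between neighbouring picks, is maximal. The reward map, the allowed jump size, and the function names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pick_first_arrivals(reward, gamma=0.95, max_jump=5):
    """Dynamic-programming sketch: choose one time sample per trace so that
    the accumulated (discounted) reward along the picked path is maximal.

    reward   : (n_traces, n_samples) array, e.g. a fuzzy-clustering-based
               attribute map (hypothetical input, not the authors' exact one).
    gamma    : distance-related discount factor penalising large jumps
               between picks on neighbouring traces.
    max_jump : largest allowed sample shift between adjacent traces.
    """
    n_traces, n_samples = reward.shape
    value = np.full((n_traces, n_samples), -np.inf)
    backptr = np.zeros((n_traces, n_samples), dtype=int)
    value[0] = reward[0]

    for t in range(1, n_traces):
        for s in range(n_samples):
            lo, hi = max(0, s - max_jump), min(n_samples, s + max_jump + 1)
            prev = np.arange(lo, hi)
            # discount grows with the pick-to-pick distance, which favours
            # laterally continuous first-arrival paths
            cand = reward[t, s] + gamma ** np.abs(prev - s) * value[t - 1, prev]
            best = int(np.argmax(cand))
            value[t, s] = cand[best]
            backptr[t, s] = prev[best]

    # backtrack the maximum-cumulative-reward path
    picks = np.zeros(n_traces, dtype=int)
    picks[-1] = int(np.argmax(value[-1]))
    for t in range(n_traces - 2, -1, -1):
        picks[t] = backptr[t + 1, picks[t + 1]]
    return picks
```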

2016 ◽  
Vol 138 (6) ◽  
Author(s):  
Thai Duong ◽  
Duong Nguyen-Huu ◽  
Thinh Nguyen

A Markov decision process (MDP) is a well-known framework for devising optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment, which is characterized by a time-invariant transition probability matrix. However, in many real-world scenarios this assumption is not justified, so the optimal strategy might not provide the expected performance. In this paper, we study the performance of the classic value iteration algorithm for solving an MDP problem under nonstationary environments. Specifically, the nonstationary environment is modeled as a sequence of time-variant transition probability matrices governed by an adiabatic evolution inspired by quantum mechanics. We characterize the performance of the value iteration algorithm subject to the rate of change of the underlying environment. The performance is measured in terms of the convergence rate to the optimal average reward. We show two examples of queuing systems that make use of our analysis framework.
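
For reference, classic value iteration can be sketched as below; a nonstationary variant simply keeps sweeping as the transition matrix drifts. The array shapes and the helper `track_nonstationary` are assumptions for illustration, not the authors' analysis or code.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Classic value iteration for a finite MDP with stationary P and R
    (a minimal sketch; the paper studies what happens when P drifts in time).

    P : (n_actions, n_states, n_states) transition probabilities P[a, s, s'].
    R : (n_states, n_actions) expected one-step rewards.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman optimality backup: Q[s,a] = R[s,a] + gamma * sum_s' P[a,s,s'] V[s']
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)

def track_nonstationary(P_sequence, R, gamma=0.9, sweeps_per_step=1):
    """Re-run a few Bellman sweeps as the transition matrix slowly changes,
    mimicking an adiabatic (slowly varying) environment."""
    V = np.zeros(R.shape[0])
    for P_t in P_sequence:                      # one transition matrix per time step
        for _ in range(sweeps_per_step):
            Q = R + gamma * np.einsum("ast,t->sa", P_t, V)
            V = Q.max(axis=1)
    return V
```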


2013 ◽  
Vol 30 (05) ◽  
pp. 1350014 ◽  
Author(s):  
ZHICONG ZHANG ◽  
WEIPING WANG ◽  
SHOUYAN ZHONG ◽  
KAISHUN HU

Reinforcement learning (RL) is a machine learning method based on state or action values that solves large-scale multi-stage decision problems such as Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) problems. We minimize the makespan of flow shop scheduling problems with an RL algorithm. We convert flow shop scheduling problems into SMDPs by constructing elaborate state features, actions, and the reward function, so that minimizing the accumulated reward is equivalent to minimizing the schedule objective function. We apply the on-line TD(λ) algorithm with linear gradient-descent function approximation to solve the SMDPs. To examine the performance of the proposed RL algorithm, computational experiments are conducted on benchmark problems in comparison with other scheduling methods. The experimental results support the efficiency of the proposed algorithm and illustrate that the RL approach is a promising computational approach to flow shop scheduling that is worthy of further investigation.
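
An on-line TD(λ) update with linear gradient-descent function approximation, as named in the abstract, can be sketched roughly as follows; the environment interface, the feature map, and the fixed-policy assumption are placeholders rather than the paper's elaborate flow-shop state features and actions.

```python
import numpy as np

def td_lambda_linear(env, feature_fn, n_features, episodes=200,
                     alpha=0.01, gamma=1.0, lam=0.8):
    """On-line TD(lambda) with linear function approximation
    (a generic sketch; `env` and `feature_fn` are hypothetical interfaces).

    env        : object with reset() -> state, policy(state) -> action, and
                 step(action) -> (next_state, reward, done).
    feature_fn : maps a state to an n_features-dimensional vector.
    """
    w = np.zeros(n_features)                 # linear value-function weights
    for _ in range(episodes):
        state = env.reset()
        z = np.zeros(n_features)             # eligibility trace
        done = False
        while not done:
            action = env.policy(state)
            next_state, reward, done = env.step(action)
            x, x_next = feature_fn(state), feature_fn(next_state)
            v = w @ x
            v_next = 0.0 if done else w @ x_next
            delta = reward + gamma * v_next - v   # TD error
            z = gamma * lam * z + x               # accumulate traces
            w += alpha * delta * z                # gradient-descent update
            state = next_state
    return w
```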


Author(s):  
Alessandro Ronca ◽  
Giuseppe De Giacomo

Recently, regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice, both the transition function and the reward function can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and that it reasonably captures the difficulty of a regular decision process.
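
The notion of history-dependent but regular rewards can be illustrated with a toy finite transducer; the states, alphabet, and reward values below are invented for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class RewardTransducer:
    """A Mealy-machine-style reward function: the reward of an action depends
    on the whole history, but only through a finite automaton state."""
    start: str
    # (state, action) -> (next_state, reward)
    delta: Dict[Tuple[str, str], Tuple[str, float]]

    def run(self, history):
        state, total = self.start, 0.0
        for action in history:
            state, r = self.delta[(state, action)]
            total += r
        return total

# Toy example: action 'a' only pays off on odd-numbered occurrences.
parity_reward = RewardTransducer(
    start="even",
    delta={
        ("even", "a"): ("odd", 1.0),
        ("odd", "a"): ("even", 0.0),
        ("even", "b"): ("even", 0.5),
        ("odd", "b"): ("odd", 0.5),
    },
)

print(parity_reward.run(["a", "b", "a", "a"]))  # 1.0 + 0.5 + 0.0 + 1.0 = 2.5
```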


2013 ◽  
Vol 785-786 ◽  
pp. 1403-1407
Author(s):  
Qing Yang Song ◽  
Xun Li ◽  
Shu Yu Ding ◽  
Zhao Long Ning

Many vertical handoff decision algorithms have not considered the impact of call dropping during the vertical handoff decision process. Moreover, most current multi-attribute vertical handoff algorithms cannot predict users' specific circumstances dynamically. In this paper, we formulate the vertical handoff decision problem as a Markov decision process, with the objective of maximizing the expected total reward during the handoff procedure. A reward function is formulated to assess the service quality during each connection. The G1 and entropy methods are applied iteratively to derive a stationary deterministic policy. Numerical results demonstrate the superiority of our proposed algorithm over existing methods.
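
A stripped-down version of such a handoff MDP can be sketched as follows: the state is the serving network, the action is the network chosen next, and the reward is a weighted sum of QoS attributes minus a switching cost. The networks, attribute values, and weights are hypothetical stand-ins for the G1/entropy-derived weights of the paper.

```python
import numpy as np

# Toy vertical-handoff MDP (illustrative only).
qos = np.array([            # rows: networks, cols: [bandwidth, 1-delay, 1-cost]
    [0.9, 0.4, 0.3],        # WLAN
    [0.5, 0.7, 0.6],        # cellular
])
weights = np.array([0.5, 0.3, 0.2])   # hypothetical attribute weights
switch_cost = 0.2
gamma = 0.9

n = len(qos)
# reward for being in network s and choosing network a next
R = np.array([[qos[a] @ weights - (switch_cost if a != s else 0.0)
               for a in range(n)] for s in range(n)])

V = np.zeros(n)
for _ in range(500):                    # value iteration
    Q = R + gamma * V[np.newaxis, :]    # next state is simply the chosen network
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)               # stationary deterministic policy
print(policy)                           # best network to choose from each state
```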


2021 ◽  
Author(s):  
Jie You

Abstract Blockchain is essentially a distributed database recording all transactions or digital events among participating parties. Each transaction in the records is approved and verified by consensus of the participants in the system, which requires solving a hard mathematical puzzle known as proof-of-work. To make the approved records immutable, the mathematical puzzle is not trivial to solve and therefore consumes substantial computing resources. However, it is energy-wasteful to have many computational nodes installed in the blockchain competing to approve the records by solving a meaningless puzzle. Here, we pose proof-of-work as a reinforcement-learning problem by modeling blockchain growth as a Markov decision process, in which a learning agent makes an optimal decision over the environment's state whenever a new block is added and verified. Specifically, we design the block verification and consensus mechanism as a deep reinforcement-learning iteration process. As a result, our method uses the determinism of state transitions and the randomness of action selection in a Markov decision process, together with the computational complexity of a deep neural network, to make the blocks hard to recompute and to preserve the order of transactions, while the blockchain nodes are used to train the same deep neural network with different data samples (state-action pairs) in parallel, allowing the model to experience multiple episodes across computing nodes at the same time. Our method can be used to design the next generation of public blockchain networks, and it has the potential not only to spare computational resources for industrial applications but also to encourage data sharing and AI model design for common problems.


2021 ◽  
Vol 9 ◽  
pp. 1213-1232
Author(s):  
Hou Pong Chan ◽  
Lu Wang ◽  
Irwin King

Abstract We study controllable text summarization, which allows users to gain control over a particular attribute (e.g., a length limit) of the generated summaries. In this work, we propose a novel training framework based on a Constrained Markov Decision Process (CMDP), which conveniently includes a reward function along with a set of constraints, to facilitate better summarization control. The reward function encourages the generation to resemble the human-written reference, while the constraints are used to explicitly prevent the generated summaries from violating user-imposed requirements. Our framework can be applied to control important attributes of summarization, including length, covered entities, and abstractiveness, as we devise specific constraints for each of these aspects. Extensive experiments on popular benchmarks show that our CMDP framework helps generate informative summaries while complying with a given attribute's requirement.
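
A common way to train under a CMDP objective is Lagrangian relaxation: maximize the reward minus multiplier-weighted constraint violations while updating the multipliers by dual ascent. The sketch below illustrates that generic recipe; the function name, cost definitions, and learning rate are assumptions, not the authors' exact training loop.

```python
import numpy as np

def lagrangian_update(reward, costs, lambdas, thresholds, lr_lambda=1e-2):
    """One Lagrangian-relaxation step for CMDP-style training
    (a schematic sketch, not the authors' exact objective or loop).

    reward     : similarity of a sampled summary to its reference (e.g. ROUGE).
    costs      : constraint costs of the sample, e.g. [length violation,
                 entity-coverage violation, abstractiveness violation].
    lambdas    : current Lagrange multipliers, one per constraint.
    thresholds : allowed budget for each constraint cost.

    Returns the scalar that scales the policy gradient (the gradient of the
    sampled summary's log-probability) and the updated multipliers.
    """
    violation = np.asarray(costs) - np.asarray(thresholds)

    # Lagrangian objective: reward minus multiplier-weighted violations.
    # In REINFORCE-style training this scalar multiplies grad(log p(summary)).
    policy_scale = reward - float(np.dot(lambdas, violation))

    # Dual ascent on the multipliers, projected onto the non-negative orthant,
    # so constraints that are violated get penalised more strongly over time.
    new_lambdas = np.maximum(0.0, np.asarray(lambdas) + lr_lambda * violation)
    return policy_scale, new_lambdas

# toy usage: one sample with reward 0.42, slightly over its length budget
scale, lams = lagrangian_update(reward=0.42, costs=[1.2, 0.0],
                                lambdas=[0.5, 0.1], thresholds=[1.0, 0.0])
```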


Mathematics ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 1385
Author(s):  
Irais Mora-Ochomogo ◽  
Marco Serrato ◽  
Jaime Mora-Vargas ◽  
Raha Akhavan-Tabatabaei

Natural disasters represent a latent threat for every country in the world. Due to climate change and other factors, statistics show that they continue to be on the rise. This situation challenges communities and humanitarian organizations to be better prepared and to react faster to natural disasters. In some countries, in-kind donations represent a high percentage of the supply for these operations, which presents additional challenges. This research proposes a Markov Decision Process (MDP) model to represent operations in collection centers, where in-kind donations are received, sorted, packed, and sent to the affected areas. The decision addressed is when to send a shipment, considering the uncertainty of the donation supply and the demand, as well as the logistics costs and the penalty for unsatisfied demand. As a result of the MDP, a Monotone Optimal Non-Decreasing Policy (MONDP) is proposed, which provides valuable insights for decision-makers within this field. Moreover, the necessary conditions to prove the existence of such a MONDP are presented.
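
The flavour of such a model can be conveyed with a toy version: the state is the accumulated donation inventory, the action is whether to ship now or keep accumulating, and value iteration over costs tends to yield a threshold-type (monotone) shipping rule. All parameters below are illustrative, not the paper's.

```python
import numpy as np

# Toy collection-centre MDP: state = accumulated donation pallets,
# action = hold (0) or send a shipment (1).
max_inventory = 20
ship_cost = 5.0                 # fixed logistics cost per shipment
holding_cost = 0.2              # per pallet kept at the centre
shortage_penalty = 1.0          # per pallet of unmet demand when shipping
demand = 8                      # pallets needed per shipment
arrival_probs = {0: 0.3, 1: 0.4, 2: 0.3}   # new donated pallets per period
gamma = 0.95

states = np.arange(max_inventory + 1)
V = np.zeros(len(states))

def expected_value(next_level_fn, V):
    """Expectation of the value at the next inventory level, over random arrivals."""
    return sum(p * V[min(next_level_fn(k), max_inventory)]
               for k, p in arrival_probs.items())

for _ in range(1000):           # value iteration on costs (so we minimise)
    Q = np.zeros((len(states), 2))
    for s in states:
        # action 0: hold; pay holding cost, donations keep arriving
        Q[s, 0] = holding_cost * s + gamma * expected_value(lambda k: s + k, V)
        # action 1: ship; pay logistics cost plus penalty for any shortfall
        shipped = min(s, demand)
        Q[s, 1] = (ship_cost + shortage_penalty * (demand - shipped)
                   + gamma * expected_value(lambda k: s - shipped + k, V))
    V = Q.min(axis=1)

policy = Q.argmin(axis=1)
print(policy)   # tends to switch from hold to ship once enough pallets
                # accumulate, i.e. a monotone threshold-type shipping policy
```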

