Automatic first-arrival picking method via intelligent Markov optimal decision processes

2021 ◽  
Vol 18 (3) ◽  
pp. 406-417
Author(s):  
Fei Luo ◽  
Bo Feng ◽  
Huazhong Wang

Abstract Picking the first arrival is an important step in seismic processing. The large volume of seismic data calls for automatic and objective picking. In this paper, we formulate first-arrival picking as an intelligent Markov decision process in a multi-dimensional feature-attribute space. By designing a reasonable model, a global optimization is carried out in the reward-function space to obtain the path with the largest cumulative reward, thereby picking the first arrival automatically. The state-value function contains a distance-related discount factor γ, which lets the Markov decision process account for the lateral continuity of the seismic data when picking the first arrival and avoid bad traces. On this basis, the method further introduces an optimized model consisting of a fuzzy-clustering-based multi-dimensional attribute reward function and a structure-based Gaussian stochastic policy, which reduces the difficulty of model design and makes the picking more accurate and automatic. Testing this approach on field seismic data reveals its properties and shows that it can automatically pick more reasonable first arrivals and provides a degree of quality control, especially when the first-arrival energy is weak (i.e., the signal-to-noise ratio is low) or when there are adjacent complex waveforms in the shallow layer.
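
The path-search idea can be illustrated with a small dynamic-programming sketch: one time sample is picked per trace so that the accumulated reward, discounted by a distance-related factor γ between neighbouring picks, is maximal. The reward map, the allowed jump size, and the function names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def pick_first_arrivals(reward, gamma=0.95, max_jump=5):
    """Dynamic-programming sketch: choose one time sample per trace so that
    the accumulated (discounted) reward along the picked path is maximal.

    reward   : (n_traces, n_samples) array, e.g. a fuzzy-clustering-based
               attribute map (hypothetical input, not the authors' exact one).
    gamma    : distance-related discount factor penalising large jumps
               between picks on neighbouring traces.
    max_jump : largest allowed sample shift between adjacent traces.
    """
    n_traces, n_samples = reward.shape
    value = np.full((n_traces, n_samples), -np.inf)
    backptr = np.zeros((n_traces, n_samples), dtype=int)
    value[0] = reward[0]

    for t in range(1, n_traces):
        for s in range(n_samples):
            lo, hi = max(0, s - max_jump), min(n_samples, s + max_jump + 1)
            prev = np.arange(lo, hi)
            # discount grows with the pick-to-pick distance, which favours
            # laterally continuous first-arrival paths
            cand = reward[t, s] + gamma ** np.abs(prev - s) * value[t - 1, prev]
            best = int(np.argmax(cand))
            value[t, s] = cand[best]
            backptr[t, s] = prev[best]

    # backtrack the maximum-cumulative-reward path
    picks = np.zeros(n_traces, dtype=int)
    picks[-1] = int(np.argmax(value[-1]))
    for t in range(n_traces - 2, -1, -1):
        picks[t] = backptr[t + 1, picks[t + 1]]
    return picks
```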

2016 ◽  
Vol 138 (6) ◽  
Author(s):  
Thai Duong ◽  
Duong Nguyen-Huu ◽  
Thinh Nguyen

A Markov decision process (MDP) is a well-known framework for devising optimal decision-making strategies under uncertainty. Typically, the decision maker assumes a stationary environment, which is characterized by a time-invariant transition probability matrix. However, in many real-world scenarios this assumption is not justified, so the optimal strategy might not provide the expected performance. In this paper, we study the performance of the classic value iteration algorithm for solving an MDP problem under nonstationary environments. Specifically, the nonstationary environment is modeled as a sequence of time-variant transition probability matrices governed by an adiabatic evolution inspired by quantum mechanics. We characterize the performance of the value iteration algorithm subject to the rate of change of the underlying environment. The performance is measured in terms of the convergence rate to the optimal average reward. We show two examples of queuing systems that make use of our analysis framework.
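
For reference, classic value iteration can be sketched as below; a nonstationary variant simply keeps sweeping as the transition matrix drifts. The array shapes and the helper `track_nonstationary` are assumptions for illustration, not the authors' analysis or code.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Classic value iteration for a finite MDP with stationary P and R
    (a minimal sketch; the paper studies what happens when P drifts in time).

    P : (n_actions, n_states, n_states) transition probabilities P[a, s, s'].
    R : (n_states, n_actions) expected one-step rewards.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman optimality backup: Q[s,a] = R[s,a] + gamma * sum_s' P[a,s,s'] V[s']
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)

def track_nonstationary(P_sequence, R, gamma=0.9, sweeps_per_step=1):
    """Re-run a few Bellman sweeps as the transition matrix slowly changes,
    mimicking an adiabatic (slowly varying) environment."""
    V = np.zeros(R.shape[0])
    for P_t in P_sequence:                      # one transition matrix per time step
        for _ in range(sweeps_per_step):
            Q = R + gamma * np.einsum("ast,t->sa", P_t, V)
            V = Q.max(axis=1)
    return V
```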


2013 ◽  
Vol 30 (05) ◽  
pp. 1350014 ◽  
Author(s):  
ZHICONG ZHANG ◽  
WEIPING WANG ◽  
SHOUYAN ZHONG ◽  
KAISHUN HU

Reinforcement learning (RL) is a machine learning method based on state or action values that solves large-scale multi-stage decision problems such as Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) problems. We minimize the makespan of flow shop scheduling problems with an RL algorithm. We convert flow shop scheduling problems into SMDPs by constructing elaborate state features, actions, and the reward function, so that minimizing the accumulated reward is equivalent to minimizing the schedule objective function. We apply the on-line TD(λ) algorithm with linear gradient-descent function approximation to solve the SMDPs. To examine the performance of the proposed RL algorithm, computational experiments are conducted on benchmark problems in comparison with other scheduling methods. The experimental results support the efficiency of the proposed algorithm and illustrate that the RL approach is a promising computational approach to flow shop scheduling that is worthy of further investigation.
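
An on-line TD(λ) update with linear gradient-descent function approximation, as named in the abstract, can be sketched roughly as follows; the environment interface, the feature map, and the fixed-policy assumption are placeholders rather than the paper's elaborate flow-shop state features and actions.

```python
import numpy as np

def td_lambda_linear(env, feature_fn, n_features, episodes=200,
                     alpha=0.01, gamma=1.0, lam=0.8):
    """On-line TD(lambda) with linear function approximation
    (a generic sketch; `env` and `feature_fn` are hypothetical interfaces).

    env        : object with reset() -> state, policy(state) -> action, and
                 step(action) -> (next_state, reward, done).
    feature_fn : maps a state to an n_features-dimensional vector.
    """
    w = np.zeros(n_features)                 # linear value-function weights
    for _ in range(episodes):
        state = env.reset()
        z = np.zeros(n_features)             # eligibility trace
        done = False
        while not done:
            action = env.policy(state)
            next_state, reward, done = env.step(action)
            x, x_next = feature_fn(state), feature_fn(next_state)
            v = w @ x
            v_next = 0.0 if done else w @ x_next
            delta = reward + gamma * v_next - v   # TD error
            z = gamma * lam * z + x               # accumulate traces
            w += alpha * delta * z                # gradient-descent update
            state = next_state
    return w
```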


Author(s):  
Alessandro Ronca ◽  
Giuseppe De Giacomo

Recently, regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice, both the transition function and the reward function can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and that it reasonably captures the difficulty of a regular decision process.
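
The notion of history-dependent but regular rewards can be illustrated with a toy finite transducer; the states, alphabet, and reward values below are invented for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class RewardTransducer:
    """A Mealy-machine-style reward function: the reward of an action depends
    on the whole history, but only through a finite automaton state."""
    start: str
    # (state, action) -> (next_state, reward)
    delta: Dict[Tuple[str, str], Tuple[str, float]]

    def run(self, history):
        state, total = self.start, 0.0
        for action in history:
            state, r = self.delta[(state, action)]
            total += r
        return total

# Toy example: action 'a' only pays off on odd-numbered occurrences.
parity_reward = RewardTransducer(
    start="even",
    delta={
        ("even", "a"): ("odd", 1.0),
        ("odd", "a"): ("even", 0.0),
        ("even", "b"): ("even", 0.5),
        ("odd", "b"): ("odd", 0.5),
    },
)

print(parity_reward.run(["a", "b", "a", "a"]))  # 1.0 + 0.5 + 0.0 + 1.0 = 2.5
```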


2013 ◽  
Vol 785-786 ◽  
pp. 1403-1407
Author(s):  
Qing Yang Song ◽  
Xun Li ◽  
Shu Yu Ding ◽  
Zhao Long Ning

Many vertical handoff decision algorithms have not considered the impact of call dropping during the vertical handoff decision process. Moreover, most current multi-attribute vertical handoff algorithms cannot predict users' specific circumstances dynamically. In this paper, we formulate the vertical handoff decision problem as a Markov decision process, with the objective of maximizing the expected total reward during the handoff procedure. A reward function is formulated to assess the service quality during each connection. The G1 and entropy methods are applied iteratively to derive a stationary deterministic policy. Numerical results demonstrate the superiority of our proposed algorithm over existing methods.
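
A stripped-down version of such a handoff MDP can be sketched as follows: the state is the serving network, the action is the network chosen next, and the reward is a weighted sum of QoS attributes minus a switching cost. The networks, attribute values, and weights are hypothetical stand-ins for the G1/entropy-derived weights of the paper.

```python
import numpy as np

# Toy vertical-handoff MDP (illustrative only).
qos = np.array([            # rows: networks, cols: [bandwidth, 1-delay, 1-cost]
    [0.9, 0.4, 0.3],        # WLAN
    [0.5, 0.7, 0.6],        # cellular
])
weights = np.array([0.5, 0.3, 0.2])   # hypothetical attribute weights
switch_cost = 0.2
gamma = 0.9

n = len(qos)
# reward for being in network s and choosing network a next
R = np.array([[qos[a] @ weights - (switch_cost if a != s else 0.0)
               for a in range(n)] for s in range(n)])

V = np.zeros(n)
for _ in range(500):                    # value iteration
    Q = R + gamma * V[np.newaxis, :]    # next state is simply the chosen network
    V = Q.max(axis=1)

policy = Q.argmax(axis=1)               # stationary deterministic policy
print(policy)                           # best network to choose from each state
```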


2021 ◽  
Author(s):  
Jie You

Abstract Blockchain is essentially a distributed database recording all transactions or digital events among participating parties. Each transaction in the records is approved and verified by consensus of the participants in the system, which requires solving a hard mathematical puzzle known as proof-of-work. To make the approved records immutable, the mathematical puzzle is not trivial to solve and therefore consumes substantial computing resources. However, it is energy-wasteful to have many computational nodes installed in the blockchain competing to approve the records by solving a meaningless puzzle. Here, we pose proof-of-work as a reinforcement-learning problem by modeling blockchain growth as a Markov decision process, in which a learning agent makes an optimal decision over the environment's state whenever a new block is added and verified. Specifically, we design the block verification and consensus mechanism as a deep reinforcement-learning iteration process. As a result, our method uses the determinism of state transitions and the randomness of action selection in a Markov decision process, together with the computational complexity of a deep neural network, to make the blocks hard to recompute and to preserve the order of transactions, while the blockchain nodes are used to train the same deep neural network with different data samples (state-action pairs) in parallel, allowing the model to experience multiple episodes across computing nodes at the same time. Our method can be used to design the next generation of public blockchain networks, and it has the potential not only to spare computational resources for industrial applications but also to encourage data sharing and AI model design for common problems.


2021 ◽  
Vol 9 ◽  
pp. 1213-1232
Author(s):  
Hou Pong Chan ◽  
Lu Wang ◽  
Irwin King

Abstract We study controllable text summarization, which allows users to gain control over a particular attribute (e.g., a length limit) of the generated summaries. In this work, we propose a novel training framework based on a Constrained Markov Decision Process (CMDP), which conveniently includes a reward function along with a set of constraints, to facilitate better summarization control. The reward function encourages the generation to resemble the human-written reference, while the constraints are used to explicitly prevent the generated summaries from violating user-imposed requirements. Our framework can be applied to control important attributes of summarization, including length, covered entities, and abstractiveness, as we devise specific constraints for each of these aspects. Extensive experiments on popular benchmarks show that our CMDP framework helps generate informative summaries while complying with a given attribute's requirement.
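
A common way to train under a CMDP objective is Lagrangian relaxation: maximize the reward minus multiplier-weighted constraint violations while updating the multipliers by dual ascent. The sketch below illustrates that generic recipe; the function name, cost definitions, and learning rate are assumptions, not the authors' exact training loop.

```python
import numpy as np

def lagrangian_update(reward, costs, lambdas, thresholds, lr_lambda=1e-2):
    """One Lagrangian-relaxation step for CMDP-style training
    (a schematic sketch, not the authors' exact objective or loop).

    reward     : similarity of a sampled summary to its reference (e.g. ROUGE).
    costs      : constraint costs of the sample, e.g. [length violation,
                 entity-coverage violation, abstractiveness violation].
    lambdas    : current Lagrange multipliers, one per constraint.
    thresholds : allowed budget for each constraint cost.

    Returns the scalar that scales the policy gradient (the gradient of the
    sampled summary's log-probability) and the updated multipliers.
    """
    violation = np.asarray(costs) - np.asarray(thresholds)

    # Lagrangian objective: reward minus multiplier-weighted violations.
    # In REINFORCE-style training this scalar multiplies grad(log p(summary)).
    policy_scale = reward - float(np.dot(lambdas, violation))

    # Dual ascent on the multipliers, projected onto the non-negative orthant,
    # so constraints that are violated get penalised more strongly over time.
    new_lambdas = np.maximum(0.0, np.asarray(lambdas) + lr_lambda * violation)
    return policy_scale, new_lambdas

# toy usage: one sample with reward 0.42, slightly over its length budget
scale, lams = lagrangian_update(reward=0.42, costs=[1.2, 0.0],
                                lambdas=[0.5, 0.1], thresholds=[1.0, 0.0])
```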


Mathematics ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 1385
Author(s):  
Irais Mora-Ochomogo ◽  
Marco Serrato ◽  
Jaime Mora-Vargas ◽  
Raha Akhavan-Tabatabaei

Natural disasters represent a latent threat for every country in the world. Due to climate change and other factors, statistics show that they continue to be on the rise. This situation challenges communities and humanitarian organizations to be better prepared and to react faster to natural disasters. In some countries, in-kind donations represent a high percentage of the supply for these operations, which presents additional challenges. This research proposes a Markov Decision Process (MDP) model to represent operations in collection centers, where in-kind donations are received, sorted, packed, and sent to the affected areas. The decision addressed is when to send a shipment, considering the uncertainty of the donation supply and the demand, as well as the logistics costs and the penalty for unsatisfied demand. As a result of the MDP, a Monotone Optimal Non-Decreasing Policy (MONDP) is proposed, which provides valuable insights for decision-makers within this field. Moreover, the necessary conditions to prove the existence of such a MONDP are presented.
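
The flavour of such a model can be conveyed with a toy version: the state is the accumulated donation inventory, the action is whether to ship now or keep accumulating, and value iteration over costs tends to yield a threshold-type (monotone) shipping rule. All parameters below are illustrative, not the paper's.

```python
import numpy as np

# Toy collection-centre MDP: state = accumulated donation pallets,
# action = hold (0) or send a shipment (1).
max_inventory = 20
ship_cost = 5.0                 # fixed logistics cost per shipment
holding_cost = 0.2              # per pallet kept at the centre
shortage_penalty = 1.0          # per pallet of unmet demand when shipping
demand = 8                      # pallets needed per shipment
arrival_probs = {0: 0.3, 1: 0.4, 2: 0.3}   # new donated pallets per period
gamma = 0.95

states = np.arange(max_inventory + 1)
V = np.zeros(len(states))

def expected_value(next_level_fn, V):
    """Expectation of the value at the next inventory level, over random arrivals."""
    return sum(p * V[min(next_level_fn(k), max_inventory)]
               for k, p in arrival_probs.items())

for _ in range(1000):           # value iteration on costs (so we minimise)
    Q = np.zeros((len(states), 2))
    for s in states:
        # action 0: hold; pay holding cost, donations keep arriving
        Q[s, 0] = holding_cost * s + gamma * expected_value(lambda k: s + k, V)
        # action 1: ship; pay logistics cost plus penalty for any shortfall
        shipped = min(s, demand)
        Q[s, 1] = (ship_cost + shortage_penalty * (demand - shipped)
                   + gamma * expected_value(lambda k: s - shipped + k, V))
    V = Q.min(axis=1)

policy = Q.argmin(axis=1)
print(policy)   # tends to switch from hold to ship once enough pallets
                # accumulate, i.e. a monotone threshold-type shipping policy
```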

