Controllable Summarization with Constrained Markov Decision Process

2021, Vol. 9, pp. 1213-1232
Author(s): Hou Pong Chan, Lu Wang, Irwin King

We study controllable text summarization, which allows users to gain control over a particular attribute (e.g., length limit) of the generated summaries. In this work, we propose a novel training framework based on Constrained Markov Decision Process (CMDP), which conveniently includes a reward function along with a set of constraints, to facilitate better summarization control. The reward function encourages the generation to resemble the human-written reference, while the constraints are used to explicitly prevent the generated summaries from violating user-imposed requirements. Our framework can be applied to control important attributes of summarization, including length, covered entities, and abstractiveness, as we devise specific constraints for each of these aspects. Extensive experiments on popular benchmarks show that our CMDP framework helps generate informative summaries while complying with a given attribute's requirement.
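To make the constrained-objective idea above concrete, the following minimal sketch shows a Lagrangian-style policy-gradient step in which constraint violations are subtracted from the reward and the multipliers rise while violations persist. The function name, the penalized-reward form, and the learning rate are illustrative assumptions, not the paper's exact CMDP algorithm.

```python
# A minimal sketch of a Lagrangian-relaxed constrained objective for
# policy-gradient training of a summarizer. All names and the update rule
# are illustrative assumptions, not the authors' exact method.

def cmdp_step(log_prob, reward, violations, lambdas, lr_lambda=1e-2):
    """Return a REINFORCE-style surrogate loss and updated multipliers.

    log_prob   : log-probability of the sampled summary under the policy
    reward     : similarity of the sample to the reference (e.g., ROUGE)
    violations : per-constraint violation amounts (0 when satisfied)
    lambdas    : current Lagrange multipliers, one per constraint
    """
    # Penalized reward: task reward minus weighted constraint violations.
    penalized = reward - sum(l * v for l, v in zip(lambdas, violations))

    # Surrogate loss whose gradient is the policy-gradient estimate
    # (to be minimized by an autodiff optimizer).
    surrogate_loss = -penalized * log_prob

    # Multipliers increase while their constraint is violated (projected ascent).
    new_lambdas = [max(0.0, l + lr_lambda * v) for l, v in zip(lambdas, violations)]
    return surrogate_loss, new_lambdas
```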

2013, Vol. 30 (05), pp. 1350014
Author(s): Zhicong Zhang, Weiping Wang, Shouyan Zhong, Kaishun Hu

Reinforcement learning (RL) is a state- or action-value-based machine learning method that solves large-scale multi-stage decision problems such as Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) problems. We minimize the makespan of flow shop scheduling problems with an RL algorithm. We convert flow shop scheduling problems into SMDPs by constructing elaborate state features, actions, and the reward function. Minimizing the accumulated reward is equivalent to minimizing the schedule objective function. We apply an on-line TD(λ) algorithm with linear gradient-descent function approximation to solve the SMDPs. To examine the performance of the proposed RL algorithm, computational experiments are conducted on benchmark problems in comparison with other scheduling methods. The experimental results support the efficiency of the proposed algorithm and illustrate that the RL approach is a promising computational approach for flow shop scheduling problems worthy of further investigation.
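The on-line TD(λ) update with linear gradient-descent function approximation mentioned in the abstract fits in a few lines. The sketch below assumes hand-crafted state features φ(s) and an immediate reward supplied by the scheduling environment; the step size, discount, and trace-decay values are placeholders.

```python
import numpy as np

def td_lambda_step(w, z, phi_s, phi_s_next, reward, alpha=0.01, gamma=1.0, lam=0.8):
    """One on-line TD(lambda) update with linear value approximation V(s) = w . phi(s).

    w          : weight vector of the value function
    z          : eligibility-trace vector
    phi_s      : feature vector of the current state
    phi_s_next : feature vector of the next state
    reward     : immediate reward observed on this transition
    """
    delta = reward + gamma * w.dot(phi_s_next) - w.dot(phi_s)  # TD error
    z = gamma * lam * z + phi_s                                # accumulating traces
    w = w + alpha * delta * z                                  # gradient-descent update
    return w, z
```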


Author(s): Alessandro Ronca, Giuseppe De Giacomo

Recently, regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice, both the transition and the reward functions can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and that it reasonably captures the difficulty of a regular decision process.
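To make the "reward function as a finite transducer" view concrete, here is a toy history-dependent reward computed by a three-state transducer; the automaton, alphabet, and reward values are invented for illustration and do not come from the paper.

```python
# Toy finite-state transducer for a history-dependent (but regular) reward.

class RewardTransducer:
    def __init__(self, transitions, output, start):
        self.transitions = transitions  # (state, observation) -> next state
        self.output = output            # (state, observation) -> reward
        self.state = start

    def step(self, observation):
        r = self.output[(self.state, observation)]
        self.state = self.transitions[(self.state, observation)]
        return r

# Example: reward 1 the first time a "b" follows an "a", 0 otherwise.
T = RewardTransducer(
    transitions={("q0", "a"): "q1", ("q0", "b"): "q0",
                 ("q1", "a"): "q1", ("q1", "b"): "q2",
                 ("q2", "a"): "q2", ("q2", "b"): "q2"},
    output={("q0", "a"): 0, ("q0", "b"): 0,
            ("q1", "a"): 0, ("q1", "b"): 1,
            ("q2", "a"): 0, ("q2", "b"): 0},
    start="q0",
)
rewards = [T.step(o) for o in "abab"]  # -> [0, 1, 0, 0]
```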


2015, Vol. 2015, pp. 1-10
Author(s): Yingqi Yin, Fengye Hu, Ling Cen, Yu Du, Lu Wang

As an important part of the Internet of Things (IoT) and a special case of device-to-device (D2D) communication, the wireless body area network (WBAN) has gradually become a focus of attention. Since a WBAN is a body-centered network, the energy of its sensor nodes is strictly limited because they are powered by batteries of limited capacity. In each data collection, only one sensor node is scheduled to transmit its measurements directly to the access point (AP) through the fading channel. We formulate the problem of dynamically choosing which sensor should communicate with the AP to maximize network lifetime under a fairness constraint as a constrained Markov decision process (CMDP). The optimal lifetime and optimal policy are obtained from the Bellman equation via dynamic programming. The proposed algorithm defines the limiting performance of WBAN lifetime under different degrees of fairness constraints. Because acquiring global channel state information (CSI) incurs a large implementation overhead, we also put forward a distributed scheduling algorithm that uses only local CSI, which saves network overhead and simplifies the algorithm. Simulations demonstrate that this scheduling algorithm allocates time slots reasonably under different channel conditions to balance network lifetime and fairness.
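One simple way to read the CMDP formulation above is through a Lagrangian relaxation in which the fairness constraint is folded into the reward with a fixed multiplier and the Bellman backup is iterated. The sketch below is only that simplification (an exact CMDP solution generally requires optimizing the multiplier and may call for a randomized policy); the transition tensor, reward, and cost arrays are placeholders for the scheduling model, not the paper's exact algorithm.

```python
import numpy as np

def constrained_value_iteration(P, r, c, lam, gamma=0.99, iters=500):
    """Value iteration on a Lagrangian-relaxed CMDP (illustrative sketch).

    P   : transition tensor, P[a, s, s'] = prob. of moving s -> s' under action a
    r   : reward matrix,  r[a, s]  (e.g., expected lifetime gain of scheduling node a)
    c   : cost matrix,    c[a, s]  (e.g., fairness cost of scheduling node a)
    lam : fixed Lagrange multiplier trading lifetime against fairness
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = r - lam * c + gamma * P @ V   # Bellman backup, Q has shape [a, s]
        V = Q.max(axis=0)
    policy = Q.argmax(axis=0)             # deterministic scheduling policy
    return V, policy
```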


2013, Vol. 785-786, pp. 1403-1407
Author(s): Qing Yang Song, Xun Li, Shu Yu Ding, Zhao Long Ning

Many vertical handoff decision algorithms do not consider the impact of call dropping during the vertical handoff decision process. Besides, most current multi-attribute vertical handoff algorithms cannot dynamically predict users' specific circumstances. In this paper, we formulate the vertical handoff decision problem as a Markov decision process, with the objective of maximizing the expected total reward during the handoff procedure. A reward function is formulated to assess the service quality during each connection. The G1 and entropy methods are applied in an iterative way, by which we work out a stationary deterministic policy. Numerical results demonstrate the superiority of the proposed algorithm over existing methods.
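The per-connection reward in such multi-attribute schemes is typically a weighted sum of normalized QoS attributes. As an assumed illustration of how the objective (entropy-method) weights could feed that reward, the sketch below implements the standard entropy-weight formula and blends it with given subjective (G1) weights; the blend parameter and function names are illustrative, not the paper's exact formulation.

```python
import numpy as np

def entropy_weights(X):
    """Standard entropy-method weights for a decision matrix X whose rows are
    candidate networks and whose columns are positive, normalized QoS attributes."""
    P = X / X.sum(axis=0, keepdims=True)          # column-wise proportions
    k = 1.0 / np.log(X.shape[0])                  # normalizing constant, m candidates
    logP = np.log(np.where(P > 0, P, 1.0))        # log(0) terms contribute 0
    E = -k * (P * logP).sum(axis=0)               # entropy of each attribute
    d = 1.0 - E                                   # degree of divergence
    return d / d.sum()

def connection_reward(attributes, w_subjective, w_objective, theta=0.5):
    """Score one candidate network with a blend of G1 (subjective) and
    entropy (objective) weights; the linear blend is an assumption."""
    w = theta * np.asarray(w_subjective) + (1.0 - theta) * np.asarray(w_objective)
    return float(np.dot(w / w.sum(), attributes))
```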

