Controllable Summarization with Constrained Markov Decision Process

2021, Vol. 9, pp. 1213-1232
Author(s): Hou Pong Chan, Lu Wang, Irwin King

We study controllable text summarization, which allows users to gain control over a particular attribute (e.g., length limit) of the generated summaries. In this work, we propose a novel training framework based on Constrained Markov Decision Process (CMDP), which conveniently includes a reward function along with a set of constraints, to facilitate better summarization control. The reward function encourages the generation to resemble the human-written reference, while the constraints are used to explicitly prevent the generated summaries from violating user-imposed requirements. Our framework can be applied to control important attributes of summarization, including length, covered entities, and abstractiveness, as we devise specific constraints for each of these aspects. Extensive experiments on popular benchmarks show that our CMDP framework helps generate informative summaries while complying with a given attribute's requirement.
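To make the constrained-objective idea above concrete, the following minimal sketch shows a Lagrangian-style policy-gradient step in which constraint violations are subtracted from the reward and the multipliers rise while violations persist. The function name, the penalized-reward form, and the learning rate are illustrative assumptions, not the paper's exact CMDP algorithm.

```python
# A minimal sketch of a Lagrangian-relaxed constrained objective for
# policy-gradient training of a summarizer. All names and the update rule
# are illustrative assumptions, not the authors' exact method.

def cmdp_step(log_prob, reward, violations, lambdas, lr_lambda=1e-2):
    """Return a REINFORCE-style surrogate loss and updated multipliers.

    log_prob   : log-probability of the sampled summary under the policy
    reward     : similarity of the sample to the reference (e.g., ROUGE)
    violations : per-constraint violation amounts (0 when satisfied)
    lambdas    : current Lagrange multipliers, one per constraint
    """
    # Penalized reward: task reward minus weighted constraint violations.
    penalized = reward - sum(l * v for l, v in zip(lambdas, violations))

    # Surrogate loss whose gradient is the policy-gradient estimate
    # (to be minimized by an autodiff optimizer).
    surrogate_loss = -penalized * log_prob

    # Multipliers increase while their constraint is violated (projected ascent).
    new_lambdas = [max(0.0, l + lr_lambda * v) for l, v in zip(lambdas, violations)]
    return surrogate_loss, new_lambdas
```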

2013, Vol. 30 (05), pp. 1350014
Author(s): Zhicong Zhang, Weiping Wang, Shouyan Zhong, Kaishun Hu

Reinforcement learning (RL) is a state- or action-value-based machine learning method that solves large-scale multi-stage decision problems such as Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) problems. We minimize the makespan of flow shop scheduling problems with an RL algorithm. We convert flow shop scheduling problems into SMDPs by constructing elaborate state features, actions, and the reward function. Minimizing the accumulated reward is equivalent to minimizing the schedule objective function. We apply an on-line TD(λ) algorithm with linear gradient-descent function approximation to solve the SMDPs. To examine the performance of the proposed RL algorithm, computational experiments are conducted on benchmark problems in comparison with other scheduling methods. The experimental results support the efficiency of the proposed algorithm and illustrate that the RL approach is a promising computational approach for flow shop scheduling problems worthy of further investigation.
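The on-line TD(λ) update with linear gradient-descent function approximation mentioned in the abstract fits in a few lines. The sketch below assumes hand-crafted state features φ(s) and an immediate reward supplied by the scheduling environment; the step size, discount, and trace-decay values are placeholders.

```python
import numpy as np

def td_lambda_step(w, z, phi_s, phi_s_next, reward, alpha=0.01, gamma=1.0, lam=0.8):
    """One on-line TD(lambda) update with linear value approximation V(s) = w . phi(s).

    w          : weight vector of the value function
    z          : eligibility-trace vector
    phi_s      : feature vector of the current state
    phi_s_next : feature vector of the next state
    reward     : immediate reward observed on this transition
    """
    delta = reward + gamma * w.dot(phi_s_next) - w.dot(phi_s)  # TD error
    z = gamma * lam * z + phi_s                                # accumulating traces
    w = w + alpha * delta * z                                  # gradient-descent update
    return w, z
```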


Author(s): Alessandro Ronca, Giuseppe De Giacomo

Recently, regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice, both the transition and the reward functions can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and that it reasonably captures the difficulty of a regular decision process.
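To make the "reward function as a finite transducer" view concrete, here is a toy history-dependent reward computed by a three-state transducer; the automaton, alphabet, and reward values are invented for illustration and do not come from the paper.

```python
# Toy finite-state transducer for a history-dependent (but regular) reward.

class RewardTransducer:
    def __init__(self, transitions, output, start):
        self.transitions = transitions  # (state, observation) -> next state
        self.output = output            # (state, observation) -> reward
        self.state = start

    def step(self, observation):
        r = self.output[(self.state, observation)]
        self.state = self.transitions[(self.state, observation)]
        return r

# Example: reward 1 the first time a "b" follows an "a", 0 otherwise.
T = RewardTransducer(
    transitions={("q0", "a"): "q1", ("q0", "b"): "q0",
                 ("q1", "a"): "q1", ("q1", "b"): "q2",
                 ("q2", "a"): "q2", ("q2", "b"): "q2"},
    output={("q0", "a"): 0, ("q0", "b"): 0,
            ("q1", "a"): 0, ("q1", "b"): 1,
            ("q2", "a"): 0, ("q2", "b"): 0},
    start="q0",
)
rewards = [T.step(o) for o in "abab"]  # -> [0, 1, 0, 0]
```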


2015, Vol. 2015, pp. 1-10
Author(s): Yingqi Yin, Fengye Hu, Ling Cen, Yu Du, Lu Wang

As an important part of the Internet of Things (IoT) and a special case of device-to-device (D2D) communication, the wireless body area network (WBAN) has gradually become a focus of attention. Since a WBAN is a body-centered network, the energy of its sensor nodes is strictly limited because they are powered by batteries of limited capacity. In each data collection, only one sensor node is scheduled to transmit its measurements directly to the access point (AP) through the fading channel. We formulate the problem of dynamically choosing which sensor should communicate with the AP to maximize network lifetime under a fairness constraint as a constrained Markov decision process (CMDP). The optimal lifetime and optimal policy are obtained from the Bellman equation via dynamic programming. The proposed algorithm defines the limiting performance of WBAN lifetime under different degrees of fairness constraints. Because acquiring global channel state information (CSI) incurs a large implementation overhead, we also put forward a distributed scheduling algorithm that uses only local CSI, which saves network overhead and simplifies the algorithm. Simulations demonstrate that this scheduling algorithm allocates time slots reasonably under different channel conditions to balance network lifetime and fairness.
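One simple way to read the CMDP formulation above is through a Lagrangian relaxation in which the fairness constraint is folded into the reward with a fixed multiplier and the Bellman backup is iterated. The sketch below is only that simplification (an exact CMDP solution generally requires optimizing the multiplier and may call for a randomized policy); the transition tensor, reward, and cost arrays are placeholders for the scheduling model, not the paper's exact algorithm.

```python
import numpy as np

def constrained_value_iteration(P, r, c, lam, gamma=0.99, iters=500):
    """Value iteration on a Lagrangian-relaxed CMDP (illustrative sketch).

    P   : transition tensor, P[a, s, s'] = prob. of moving s -> s' under action a
    r   : reward matrix,  r[a, s]  (e.g., expected lifetime gain of scheduling node a)
    c   : cost matrix,    c[a, s]  (e.g., fairness cost of scheduling node a)
    lam : fixed Lagrange multiplier trading lifetime against fairness
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = r - lam * c + gamma * P @ V   # Bellman backup, Q has shape [a, s]
        V = Q.max(axis=0)
    policy = Q.argmax(axis=0)             # deterministic scheduling policy
    return V, policy
```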


2013, Vol. 785-786, pp. 1403-1407
Author(s): Qing Yang Song, Xun Li, Shu Yu Ding, Zhao Long Ning

Many vertical handoff decision algorithms do not consider the impact of call dropping during the vertical handoff decision process. Besides, most current multi-attribute vertical handoff algorithms cannot dynamically predict users' specific circumstances. In this paper, we formulate the vertical handoff decision problem as a Markov decision process, with the objective of maximizing the expected total reward during the handoff procedure. A reward function is formulated to assess the service quality during each connection. The G1 and entropy methods are applied in an iterative way, by which we work out a stationary deterministic policy. Numerical results demonstrate the superiority of the proposed algorithm over existing methods.
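The per-connection reward in such multi-attribute schemes is typically a weighted sum of normalized QoS attributes. As an assumed illustration of how the objective (entropy-method) weights could feed that reward, the sketch below implements the standard entropy-weight formula and blends it with given subjective (G1) weights; the blend parameter and function names are illustrative, not the paper's exact formulation.

```python
import numpy as np

def entropy_weights(X):
    """Standard entropy-method weights for a decision matrix X whose rows are
    candidate networks and whose columns are positive, normalized QoS attributes."""
    P = X / X.sum(axis=0, keepdims=True)          # column-wise proportions
    k = 1.0 / np.log(X.shape[0])                  # normalizing constant, m candidates
    logP = np.log(np.where(P > 0, P, 1.0))        # log(0) terms contribute 0
    E = -k * (P * logP).sum(axis=0)               # entropy of each attribute
    d = 1.0 - E                                   # degree of divergence
    return d / d.sum()

def connection_reward(attributes, w_subjective, w_objective, theta=0.5):
    """Score one candidate network with a blend of G1 (subjective) and
    entropy (objective) weights; the linear blend is an assumption."""
    w = theta * np.asarray(w_subjective) + (1.0 - theta) * np.asarray(w_objective)
    return float(np.dot(w / w.sum(), attributes))
```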

