discounted rewards
Recently Published Documents


TOTAL DOCUMENTS: 25 (FIVE YEARS: 5)
H-INDEX: 8 (FIVE YEARS: 0)

2021 ◽  
Vol 6 (1) ◽  
pp. 56-70
Author(s):  
John Kiarie ◽  
Gabriel Kirori ◽  
David Wachira

Introduction: Non-monetary rewards are non-financial measures that a merchant or seller aligns with customer values to attract and retain more customers. This involves providing value to customers in ways other than discounts and dollar rewards. Depending on the customer’s values and on the industry, customers may find more value in non-monetary or in discounted rewards. Purpose: The overall objective of the study was to investigate the effect of non-monetary programs on the financial performance of selected firms in the service industry in Kenya. Methodology: The study adopted a descriptive research design. It covered major users of non-monetary programs in Kenya, including telecommunication firms, supermarkets, 18 five-star hotels, the Kenya Airports Authority, and fueling stations. The target population was three (3) telecommunication firms (Safaricom, Airtel and Telkom Kenya), 5 large supermarkets, and 18 five-star hotels in Nairobi. Since the population of telecommunication firms is small, the study used the census survey method and thus there was no sampling. The researcher used both descriptive and inferential statistics. Findings: The results show that non-monetary programs have a positive and significant relationship with financial performance. The study concludes that non-monetary programs have a positive and significant effect on the financial performance of selected service industries in Kenya. Recommendation: The Communications Authority of Kenya, the Tourism Authority of Kenya and the Ministry of Trade should support the development and usage of non-monetary loyalty programs among service-industry firms in Kenya. This can be done in a friendly manner, such as avoiding overly broad and strong regulation of the loyalty programs. In this regard, the government and lawmakers should ensure that they involve a variety of loyalty-program stakeholders in the regulatory process, so that their vision and needs can be fairly balanced with government interests. The government should work closely with loyalty-program businesses, users, miners, and advocates when creating and enforcing the law.


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Hamid Ali ◽  
Hammad Majeed ◽  
Imran Usman ◽  
Khaled A. Almejalli

In reinforcement learning (RL), an agent learns an environment through trial and error. This behavior allows the agent to learn in complex and difficult environments. In RL, the agent normally learns the given environment by exploring or exploiting. Most algorithms suffer from under-exploration in the later stages of the episodes. Recently, an off-policy algorithm called soft actor-critic (SAC) was proposed that overcomes this problem by maximizing entropy as it learns the environment. In it, the agent tries to maximize entropy along with the expected discounted rewards. In SAC, the agent tries to be as random as possible while moving towards the maximum reward. This randomness allows the agent to explore the environment and stops it from getting stuck in local optima. We believe that maximizing the entropy causes overestimation of the entropy term, which results in slow policy learning. This is because of the drastic change in the action distribution whenever the agent revisits similar states. To overcome this problem, we propose a dual-policy optimization framework in which two independent policies are trained. Both policies try to maximize entropy by choosing actions against the minimum entropy, to reduce the overestimation. The use of two policies results in better and faster convergence. We demonstrate our approach on several well-known simulated continuous-control environments. The results show that our proposed technique achieves better results than the state-of-the-art SAC algorithm and learns better policies.
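
A minimal sketch (not the authors' implementation) of the entropy-regularized return that SAC maximizes, together with one possible reading of the dual-policy idea in which the smaller of two independent entropy estimates feeds the bonus; the diagonal-Gaussian policy head, parameter values, and rollout numbers are assumptions for illustration only:

```python
import numpy as np

def gaussian_entropy(log_std):
    """Differential entropy of a diagonal Gaussian policy head."""
    return float(np.sum(0.5 * np.log(2.0 * np.pi * np.e) + log_std))

def soft_return(rewards, entropies, gamma=0.99, alpha=0.2):
    """Entropy-augmented discounted return maximized by SAC:
    G = sum_t gamma**t * (r_t + alpha * H(pi(.|s_t)))."""
    g = 0.0
    for r, h in zip(reversed(rewards), reversed(entropies)):
        g = r + alpha * h + gamma * g
    return g

def pessimistic_entropy(log_std_a, log_std_b):
    """Hypothetical reading of the dual-policy idea above: keep two
    independent policy heads and feed the smaller of their entropy
    estimates into the bonus, damping overestimation of the entropy term."""
    return min(gaussian_entropy(log_std_a), gaussian_entropy(log_std_b))

# Illustrative three-step rollout with made-up rewards and log-stds.
rewards = [1.0, 0.5, 2.0]
entropies = [pessimistic_entropy(np.array([-0.5, -0.3]), np.array([-0.2, -0.4]))
             for _ in rewards]
print(soft_return(rewards, entropies))
```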


2021 ◽  
Vol 11 (3) ◽  
pp. 1098
Author(s):  
Norbert Kozłowski ◽  
Olgierd Unold

Initially, Anticipatory Classifier Systems (ACS) were designed to address both single-step and multistep decision problems. In the latter case, the objective was to maximize the total discounted reward, usually with Q-learning-based algorithms. Studies on other Learning Classifier Systems (LCS) revealed many real-world sequential decision problems where the preferred objective is the maximization of the average of successive rewards. This paper proposes a relevant modification to the learning component that allows such problems to be addressed. The modified system is called AACS2 (Averaged ACS2) and is tested on three multistep benchmark problems.
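
A minimal tabular sketch (not the authors' AACS2 code) contrasting the two learning criteria the abstract refers to: a Q-learning style update for the discounted objective versus an R-learning style update for the average-reward objective. State and action names, step sizes, and the toy transition are illustrative assumptions:

```python
from collections import defaultdict

ACTIONS = [0, 1]
Q = defaultdict(float)   # values under the discounted criterion
R = defaultdict(float)   # values under the average-reward criterion
rho = 0.0                # running estimate of the average reward

def discounted_update(s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Q-learning style step: maximize the expected total discounted reward."""
    target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def average_reward_update(s, a, r, s_next, alpha=0.1, beta=0.01):
    """R-learning style step: no discount factor; the running average-reward
    estimate rho is subtracted from each immediate reward instead."""
    global rho
    best_next = max(R[(s_next, b)] for b in ACTIONS)
    R[(s, a)] += alpha * (r - rho + best_next - R[(s, a)])
    if R[(s, a)] >= max(R[(s, b)] for b in ACTIONS):   # greedy transition
        rho += beta * (r + best_next - max(R[(s, b)] for b in ACTIONS) - rho)

# One hypothetical transition in a toy maze: state 's0', action 1,
# reward 0.0, next state 's1'.
discounted_update('s0', 1, 0.0, 's1')
average_reward_update('s0', 1, 0.0, 's1')
```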


2017 ◽  
Vol 48 (4) ◽  
pp. 445-455
Author(s):  
Przemysław Marcowski ◽  
Wojciech Białaszek ◽  
Joanna Dudek ◽  
Paweł Ostaszewski

Empirical evidence suggests that mindfulness, psychological flexibility, and addiction are interrelated in decision making. In our study, we investigated how the behavioral profile, composed of mindfulness and psychological flexibility, and smoking status relate to delay and probability discounting. We demonstrated an interaction between the behavioral profile of mindfulness and psychological flexibility (lower or higher) and smoking status on delay discounting. We found that individuals who smoked and displayed higher mindfulness and psychological flexibility devalued rewards at a slower rate than smokers with a lower profile. Importantly, among those with a higher profile, smokers discounted rewards no differently than nonsmokers. Smokers with a lower profile did, however, display increased impulsivity compared to nonsmokers. These results suggest that behavioral interventions aiming to modify the behavioral profile with regard to mindfulness and psychological flexibility can indeed help bring the elevated impulsivity of smokers into line with that of nonsmokers. In probability discounting, we observed that individuals with a higher profile displayed lower discounting rates, i.e., were less risk-averse, with no other significant main effect or interaction.
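
For readers unfamiliar with the discounting terminology, a small sketch of the standard hyperbolic models commonly used in delay- and probability-discounting studies; the abstract does not state the exact model fitted, so the functional forms and all parameter values below are assumptions for illustration:

```python
def delay_discounted_value(amount, delay, k):
    """Subjective value of `amount` received after `delay`; a larger k means
    steeper delay discounting (greater impulsivity)."""
    return amount / (1.0 + k * delay)

def probability_discounted_value(amount, p, h):
    """Subjective value of `amount` received with probability p; theta is the
    odds against receipt, and a larger h means steeper probability
    discounting (greater risk aversion)."""
    theta = (1.0 - p) / p
    return amount / (1.0 + h * theta)

# Illustrative only: a steeper (higher-k) discounter devalues a reward
# delayed by 30 days more than a shallower (lower-k) discounter does.
print(delay_discounted_value(100, 30, k=0.20))        # ~14.3
print(delay_discounted_value(100, 30, k=0.05))        # ~40.0
print(probability_discounted_value(100, 0.5, h=1.0))  # ~50.0
```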


2011 ◽  
Vol 48 (01) ◽  
pp. 293-294
Author(s):  
Rhonda Righter

It is well known that the expected exponentially discounted total reward for a stochastic process can also be defined as the expected total undiscounted reward earned before an independent exponential stopping time (let us call this the stopped reward). Feinberg and Fei (2009) recently showed that the variance of the discounted reward is smaller than the variance of the stopped reward. We strengthen this result to show that the discounted reward is smaller than the stopped reward in the convex ordering sense.
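
In symbols, with R(t) denoting the cumulative reward earned up to time t, T an exponential stopping time with rate lambda independent of the process, and <=_cx the convex order (this notation is assumed here, not taken from the note):

```latex
\mathbb{E}\!\left[\int_0^{\infty} e^{-\lambda t}\, dR(t)\right]
  \;=\; \mathbb{E}\!\left[\int_0^{T} dR(t)\right],
\qquad
\int_0^{\infty} e^{-\lambda t}\, dR(t)
  \;\le_{\mathrm{cx}}\;
  \int_0^{T} dR(t),
\qquad T \sim \mathrm{Exp}(\lambda).
```

Since the convex order implies equal means and ordered variances, the variance comparison of Feinberg and Fei (2009) follows as a special case.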


2009 ◽  
Vol 46 (04) ◽  
pp. 1209-1212 ◽  
Author(s):  
Eugene A. Feinberg ◽  
Jun Fei

We consider the following two definitions of discounting: (i) a multiplicative coefficient in front of the rewards, and (ii) the probability that the process has not been stopped, where the stopping time has an exponential distribution independent of the process. It is well known that the expected total discounted rewards corresponding to these definitions are the same. In this note we show that the variance of the total discounted rewards is smaller under the first definition than under the second.
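
A quick Monte Carlo sketch of the claim in a discrete-time analogue (geometric rather than exponential stopping, i.i.d. rewards); the reward distribution, discount factor, horizon, and sample size are arbitrary choices for illustration:

```python
import random

def discounted_sum(rewards, gamma):
    """Definition (i): multiplicative discount coefficient on each reward."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

def stopped_sum(rewards, gamma, rng):
    """Definition (ii): undiscounted rewards earned while the process is
    still running; it survives each period with probability gamma."""
    total = 0.0
    for r in rewards:
        total += r
        if rng.random() > gamma:
            break
    return total

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
gamma, horizon, n = 0.9, 100, 20_000
disc, stop = [], []
for _ in range(n):
    rewards = [rng.gauss(1.0, 1.0) for _ in range(horizon)]
    disc.append(discounted_sum(rewards, gamma))
    stop.append(stopped_sum(rewards, gamma, rng))

print(mean(disc), mean(stop))          # the means agree (both around 10 here)
print(variance(disc), variance(stop))  # variance is smaller for definition (i)
```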

