discounted rewards
Recently Published Documents


TOTAL DOCUMENTS: 25 (FIVE YEARS: 5)
H-INDEX: 8 (FIVE YEARS: 0)

2021 ◽  
Vol 6 (1) ◽  
pp. 56-70
Author(s):  
John Kiarie ◽  
Gabriel Kirori ◽  
David Wachira

Introduction: Non-monetary rewards are non-financial measures that a merchant or seller aligns with customer values to attract and retain more customers. This involves providing value to customers in ways other than discounts and dollar rewards. Depending on the customer’s values and on the industry, customers may find more value in non-monetary or in discounted rewards. Purpose: The overall objective of the study was to investigate the effect of non-monetary programs on the financial performance of selected firms in the service industry in Kenya. Methodology: The study adopted a descriptive research design. It covered major users of non-monetary programs in Kenya, including telecommunication firms, supermarkets, 18 five-star hotels, the Kenya Airports Authority, and fueling stations. The target population was three (3) telecommunication firms (Safaricom, Airtel and Telkom Kenya), 5 large supermarkets, and 18 five-star hotels in Nairobi. Since the population of telecommunication firms is small, the study used the census survey method and thus there was no sampling. The researcher used both descriptive and inferential statistics. Findings: The results show that non-monetary programs have a positive and significant relationship with financial performance. The study concludes that non-monetary programs have a positive and significant effect on the financial performance of selected service industries in Kenya. Recommendation: The Communications Authority of Kenya, the Tourism Authority of Kenya and the Ministry of Trade should support the development and usage of non-monetary loyalty programs among service-industry firms in Kenya. This can be done in a friendly manner, such as avoiding overly broad and strong regulation of the loyalty programs. In this regard, the government and lawmakers should ensure that they involve a variety of loyalty-program stakeholders in the regulatory process, so that their vision and needs can be fairly balanced with government interests. The government should work closely with loyalty-program businesses, users, miners, and advocates when creating and enforcing the law.


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Hamid Ali ◽  
Hammad Majeed ◽  
Imran Usman ◽  
Khaled A. Almejalli

In reinforcement learning (RL), an agent learns an environment through trial and error. This behavior allows the agent to learn in complex and difficult environments. In RL, the agent normally learns the given environment by exploring or exploiting. Most algorithms suffer from under-exploration in the later stages of the episodes. Recently, an off-policy algorithm called soft actor-critic (SAC) was proposed that overcomes this problem by maximizing entropy as it learns the environment. In it, the agent tries to maximize entropy along with the expected discounted rewards. In SAC, the agent tries to be as random as possible while moving towards the maximum reward. This randomness allows the agent to explore the environment and stops it from getting stuck in local optima. We believe that maximizing the entropy causes overestimation of the entropy term, which results in slow policy learning. This is because of the drastic change in the action distribution whenever the agent revisits similar states. To overcome this problem, we propose a dual-policy optimization framework in which two independent policies are trained. Both policies try to maximize entropy by choosing actions against the minimum entropy, to reduce the overestimation. The use of two policies results in better and faster convergence. We demonstrate our approach on several well-known simulated continuous-control environments. The results show that our proposed technique achieves better results than the state-of-the-art SAC algorithm and learns better policies.
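
A minimal sketch (not the authors' implementation) of the entropy-regularized return that SAC maximizes, together with one possible reading of the dual-policy idea in which the smaller of two independent entropy estimates feeds the bonus; the diagonal-Gaussian policy head, parameter values, and rollout numbers are assumptions for illustration only:

```python
import numpy as np

def gaussian_entropy(log_std):
    """Differential entropy of a diagonal Gaussian policy head."""
    return float(np.sum(0.5 * np.log(2.0 * np.pi * np.e) + log_std))

def soft_return(rewards, entropies, gamma=0.99, alpha=0.2):
    """Entropy-augmented discounted return maximized by SAC:
    G = sum_t gamma**t * (r_t + alpha * H(pi(.|s_t)))."""
    g = 0.0
    for r, h in zip(reversed(rewards), reversed(entropies)):
        g = r + alpha * h + gamma * g
    return g

def pessimistic_entropy(log_std_a, log_std_b):
    """Hypothetical reading of the dual-policy idea above: keep two
    independent policy heads and feed the smaller of their entropy
    estimates into the bonus, damping overestimation of the entropy term."""
    return min(gaussian_entropy(log_std_a), gaussian_entropy(log_std_b))

# Illustrative three-step rollout with made-up rewards and log-stds.
rewards = [1.0, 0.5, 2.0]
entropies = [pessimistic_entropy(np.array([-0.5, -0.3]), np.array([-0.2, -0.4]))
             for _ in rewards]
print(soft_return(rewards, entropies))
```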


2021 ◽  
Vol 11 (3) ◽  
pp. 1098
Author(s):  
Norbert Kozłowski ◽  
Olgierd Unold

Initially, Anticipatory Classifier Systems (ACS) were designed to address both single-step and multistep decision problems. In the latter case, the objective was to maximize the total discounted reward, usually with Q-learning-based algorithms. Studies on other Learning Classifier Systems (LCS) revealed many real-world sequential decision problems where the preferred objective is the maximization of the average of successive rewards. This paper proposes a relevant modification to the learning component that allows such problems to be addressed. The modified system is called AACS2 (Averaged ACS2) and is tested on three multistep benchmark problems.
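
A minimal tabular sketch (not the authors' AACS2 code) contrasting the two learning criteria the abstract refers to: a Q-learning style update for the discounted objective versus an R-learning style update for the average-reward objective. State and action names, step sizes, and the toy transition are illustrative assumptions:

```python
from collections import defaultdict

ACTIONS = [0, 1]
Q = defaultdict(float)   # values under the discounted criterion
R = defaultdict(float)   # values under the average-reward criterion
rho = 0.0                # running estimate of the average reward

def discounted_update(s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Q-learning style step: maximize the expected total discounted reward."""
    target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def average_reward_update(s, a, r, s_next, alpha=0.1, beta=0.01):
    """R-learning style step: no discount factor; the running average-reward
    estimate rho is subtracted from each immediate reward instead."""
    global rho
    best_next = max(R[(s_next, b)] for b in ACTIONS)
    R[(s, a)] += alpha * (r - rho + best_next - R[(s, a)])
    if R[(s, a)] >= max(R[(s, b)] for b in ACTIONS):   # greedy transition
        rho += beta * (r + best_next - max(R[(s, b)] for b in ACTIONS) - rho)

# One hypothetical transition in a toy maze: state 's0', action 1,
# reward 0.0, next state 's1'.
discounted_update('s0', 1, 0.0, 's1')
average_reward_update('s0', 1, 0.0, 's1')
```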


2017 ◽  
Vol 48 (4) ◽  
pp. 445-455
Author(s):  
Przemysław Marcowski ◽  
Wojciech Białaszek ◽  
Joanna Dudek ◽  
Paweł Ostaszewski

Empirical evidence suggests that mindfulness, psychological flexibility, and addiction are interrelated in decision making. In our study, we investigated how the behavioral profile, composed of mindfulness and psychological flexibility, and smoking status relate to delay and probability discounting. We demonstrated an interaction between the behavioral profile of mindfulness and psychological flexibility (lower or higher) and smoking status on delay discounting. We found that individuals who smoked and displayed higher mindfulness and psychological flexibility devalued rewards at a slower rate than smokers with a lower profile. Importantly, among those with a higher profile, smokers discounted rewards no differently than nonsmokers. Smokers with a lower profile did, however, display increased impulsivity compared to nonsmokers. These results suggest that behavioral interventions aiming to modify the behavioral profile with regard to mindfulness and psychological flexibility can indeed help bring the elevated impulsivity of smokers into line with that of nonsmokers. In probability discounting, we observed that individuals with a higher profile displayed lower discounting rates, i.e., were less risk-averse, with no other significant main effect or interaction.
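
For readers unfamiliar with the discounting terminology, a small sketch of the standard hyperbolic models commonly used in delay- and probability-discounting studies; the abstract does not state the exact model fitted, so the functional forms and all parameter values below are assumptions for illustration:

```python
def delay_discounted_value(amount, delay, k):
    """Subjective value of `amount` received after `delay`; a larger k means
    steeper delay discounting (greater impulsivity)."""
    return amount / (1.0 + k * delay)

def probability_discounted_value(amount, p, h):
    """Subjective value of `amount` received with probability p; theta is the
    odds against receipt, and a larger h means steeper probability
    discounting (greater risk aversion)."""
    theta = (1.0 - p) / p
    return amount / (1.0 + h * theta)

# Illustrative only: a steeper (higher-k) discounter devalues a reward
# delayed by 30 days more than a shallower (lower-k) discounter does.
print(delay_discounted_value(100, 30, k=0.20))        # ~14.3
print(delay_discounted_value(100, 30, k=0.05))        # ~40.0
print(probability_discounted_value(100, 0.5, h=1.0))  # ~50.0
```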


2011 ◽  
Vol 48 (01) ◽  
pp. 293-294
Author(s):  
Rhonda Righter

It is well known that the expected exponentially discounted total reward for a stochastic process can also be defined as the expected total undiscounted reward earned before an independent exponential stopping time (let us call this the stopped reward). Feinberg and Fei (2009) recently showed that the variance of the discounted reward is smaller than the variance of the stopped reward. We strengthen this result to show that the discounted reward is smaller than the stopped reward in the convex ordering sense.
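
In symbols, with R(t) denoting the cumulative reward earned up to time t, T an exponential stopping time with rate lambda independent of the process, and <=_cx the convex order (this notation is assumed here, not taken from the note):

```latex
\mathbb{E}\!\left[\int_0^{\infty} e^{-\lambda t}\, dR(t)\right]
  \;=\; \mathbb{E}\!\left[\int_0^{T} dR(t)\right],
\qquad
\int_0^{\infty} e^{-\lambda t}\, dR(t)
  \;\le_{\mathrm{cx}}\;
  \int_0^{T} dR(t),
\qquad T \sim \mathrm{Exp}(\lambda).
```

Since the convex order implies equal means and ordered variances, the variance comparison of Feinberg and Fei (2009) follows as a special case.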


2009 ◽  
Vol 46 (04) ◽  
pp. 1209-1212 ◽  
Author(s):  
Eugene A. Feinberg ◽  
Jun Fei

We consider the following two definitions of discounting: (i) a multiplicative coefficient in front of the rewards, and (ii) the probability that the process has not been stopped, where the stopping time has an exponential distribution independent of the process. It is well known that the expected total discounted rewards corresponding to these definitions are the same. In this note we show that the variance of the total discounted rewards is smaller under the first definition than under the second.
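
A quick Monte Carlo sketch of the claim in a discrete-time analogue (geometric rather than exponential stopping, i.i.d. rewards); the reward distribution, discount factor, horizon, and sample size are arbitrary choices for illustration:

```python
import random

def discounted_sum(rewards, gamma):
    """Definition (i): multiplicative discount coefficient on each reward."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))

def stopped_sum(rewards, gamma, rng):
    """Definition (ii): undiscounted rewards earned while the process is
    still running; it survives each period with probability gamma."""
    total = 0.0
    for r in rewards:
        total += r
        if rng.random() > gamma:
            break
    return total

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
gamma, horizon, n = 0.9, 100, 20_000
disc, stop = [], []
for _ in range(n):
    rewards = [rng.gauss(1.0, 1.0) for _ in range(horizon)]
    disc.append(discounted_sum(rewards, gamma))
    stop.append(stopped_sum(rewards, gamma, rng))

print(mean(disc), mean(stop))          # the means agree (both around 10 here)
print(variance(disc), variance(stop))  # variance is smaller for definition (i)
```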

