Optimizing Discount and Reputation Trade-Offs in E-Commerce Systems: Characterization and Online Learning

Author(s):  
Hong Xie ◽  
Yongkun Li ◽  
John C. S. Lui

Feedback-based reputation systems are widely deployed in E-commerce systems. Evidence shows that earning a reputable label (for sellers in such systems) may take a substantial amount of time, which implies a reduction in profit. We propose to enhance sellers’ reputation via price discounts. However, there are two challenges: (1) the demand from buyers depends on both the discount and the reputation; (2) the demand is unknown to the seller. To address these challenges, we first formulate a profit maximization problem via a semi-Markov decision process (SMDP) to explore the optimal trade-offs in selecting price discounts. We prove the monotonicity of the optimal profit and optimal discount. Based on this monotonicity, we design a QLFP (Q-learning with forward projection) algorithm, which infers the optimal discount from historical transaction data. We conduct experiments on a dataset from to show that our QLFP algorithm improves the profit by as much as 50% over both the classical Q-learning and speedy Q-learning algorithms. Our QLFP algorithm also improves the profit by as much as four times over the case of not providing any price discount.
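
For readers unfamiliar with the classical Q-learning baseline named in the abstract, the following is a minimal sketch of its standard tabular update rule. It is not the QLFP algorithm, whose forward-projection step is not described here, and it ignores the sojourn times of the semi-Markov formulation. The state and action layout (three reputation levels, two discount levels), the reward value, and the parameters `alpha` and `gamma` are hypothetical choices for illustration only.

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one classical Q-learning step in place and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state])  # greedy value of the successor state
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return q[state][action]


# Hypothetical toy layout (an assumption, not from the paper):
# states are reputation levels, actions are discount levels.
n_states, n_actions = 3, 2
q = [[0.0] * n_actions for _ in range(n_states)]

# One observed transition: at reputation level 0 the seller offers
# discount 1, earns a profit of 1.0, and moves to reputation level 1.
new_value = q_learning_update(q, state=0, action=1, reward=1.0, next_state=1)
```

Starting from an all-zero table, this single update moves `q[0][1]` by `alpha` times the observed reward and leaves every other entry untouched.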

2009 ◽  
Vol 28 (12) ◽  
pp. 3268-3270
Author(s):  
Chao WANG ◽  
Jing GUO ◽  
Zhen-qiang BAO

2021 ◽  
Vol 11 (14) ◽  
pp. 6401
Author(s):  
Kateryna Czerniachowska ◽  
Karina Sachpazidu-Wójcicka ◽  
Piotr Sulikowski ◽  
Marcin Hernes ◽  
Artur Rot

This paper discusses the problem of retailers’ profit maximization when displaying products on planogram shelves, which may have different dimensions in each store but are allocated the same product sets. We develop a mathematical model and a genetic algorithm for solving the shelf space allocation problem with the criterion of retailers’ profit maximization. The implemented program executes in a reasonable time. The quality of the genetic algorithm has been evaluated using the CPLEX solver. We determine four groups of constraints for the products that should be allocated on a shelf: shelf constraints, shelf type constraints, product constraints, and virtual segment constraints. The validity of the developed genetic algorithm has been checked on 25 retailing test cases. Computational results show that the proposed approach obtains efficient results in a short running time, and that the developed complex shelf space allocation model, which considers multiple attributes of a shelf, segment, and product, as well as product capping and nesting allocation rules, is of high practical relevance. The proposed approach allows retailers to achieve higher store profits while respecting the actual merchandising rules.
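
To make the idea of a genetic algorithm for shelf space allocation concrete, here is a deliberately simplified sketch. It is not the authors' model: it has a single shelf and one width constraint, and omits the shelf type, virtual segment, capping, and nesting rules mentioned above. The product data, `SHELF_WIDTH`, `MAX_FACINGS`, and all GA parameters are hypothetical values chosen for illustration.

```python
import random

# Hypothetical toy data: (width per facing, profit per facing) per product.
PRODUCTS = [(3, 4.0), (2, 2.5), (4, 6.0), (1, 1.0)]
SHELF_WIDTH = 12   # assumed shelf capacity
MAX_FACINGS = 3    # assumed upper bound on facings per product


def used_width(plan):
    """Total shelf width occupied by a plan (facings per product)."""
    return sum(n * w for n, (w, _) in zip(plan, PRODUCTS))


def fitness(plan):
    """Profit of a feasible plan; infeasible plans get a negative penalty."""
    excess = used_width(plan) - SHELF_WIDTH
    if excess > 0:
        return -float(excess)
    return sum(n * p for n, (_, p) in zip(plan, PRODUCTS))


def evolve(generations=50, pop_size=20, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    # Seed the population with the empty shelf so a feasible plan always exists.
    pop = [[0] * len(PRODUCTS)]
    while len(pop) < pop_size:
        pop.append([rng.randint(0, MAX_FACINGS) for _ in PRODUCTS])
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, len(PRODUCTS) - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:  # mutate one gene at random
                child[rng.randrange(len(child))] = rng.randint(0, MAX_FACINGS)
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


best = evolve()
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases, and the returned plan always fits on the shelf. A solver such as CPLEX, as used in the paper, would be the way to check how close such a heuristic gets to the optimum.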


Aerospace ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 113
Author(s):  
Pedro Andrade ◽  
Catarina Silva ◽  
Bernardete Ribeiro ◽  
Bruno F. Santos

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks for a specified time horizon. The checks are scheduled within an interval, and the goal is to schedule them as close as possible to their due date. In doing so, the number of checks is reduced, and the fleet availability increases. A Deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan that is generated with our approach is compared with a previous study, which presented a Dynamic Programming (DP) based approach and airline estimations for the same period. The results show a reduction in the number of checks scheduled, which indicates the potential of RL in solving this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model with these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 737
Author(s):  
Fengjie Sun ◽  
Xianchang Wang ◽  
Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce the manpower needed for agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) for UAVs to help them choose the correct action in each state according to the policy. In an unknown environment, formulating rules to help UAVs choose actions is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible solution. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that, in the agricultural plant protection environment, a UAV has a greater probability of choosing the optimal action under the policy learned by the proposed algorithm than under the policy learned by the classic Q-learning algorithm. The proposed algorithm is implemented and tested on datasets that are evenly distributed based on real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.

