A regret lower bound for assortment optimization under the capacitated MNL model with arbitrary revenue parameters
Abstract In this note, we consider dynamic assortment optimization with incomplete information under the capacitated multinomial logit (MNL) choice model. Recently, it has been shown that the regret (the cumulative expected revenue loss caused by offering suboptimal assortments) that any decision policy incurs is bounded from below by a constant times $\sqrt{NT}$, where $N$ denotes the number of products and $T$ denotes the time horizon. This result was shown under the assumption that the product revenues are constant across products, and thus leaves open the question of whether a lower regret rate can be achieved for nonconstant revenue parameters. We show that this is not the case: for any vector of product revenues, there is a positive constant such that the regret of any policy is bounded from below by this constant times $\sqrt{NT}$. Our result implies that policies that achieve ${\mathcal{O}}(\sqrt{NT})$ regret are asymptotically optimal for all product revenue parameters.
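For concreteness, the quantities named in the abstract can be written out in the standard capacitated MNL formulation. The symbols used here are notational assumptions for this sketch and do not appear in the abstract: $v_i > 0$ for the unknown preference weight of product $i$, $r_i$ for its revenue, $K$ for the capacity, $S_t$ for the assortment offered at time $t$, and $c(r)$ for the constant in the lower bound.

$$
\begin{aligned}
\mathbb{P}(\text{purchase } i \mid S) &= \frac{v_i}{1 + \sum_{j \in S} v_j}, \qquad i \in S \subseteq \{1, \dots, N\},\ |S| \le K, \\
R(S) &= \sum_{i \in S} r_i \, \frac{v_i}{1 + \sum_{j \in S} v_j}, \\
\mathrm{Regret}(T) &= \sum_{t=1}^{T} \mathbb{E}\Big[ \max_{|S| \le K} R(S) - R(S_t) \Big], \\
\mathrm{Regret}(T) &\ge c(r) \, \sqrt{NT} \quad \text{for some } c(r) > 0 .
\end{aligned}
$$

The last line restates the claimed bound in this notation. The exact conditions under which it holds (for example, on $N$, $K$, and $T$) are not given in the abstract and are left unspecified here.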