We develop the first nonparametric learning algorithm for periodic-review perishable inventory systems. In contrast to the classical perishable inventory literature, we assume that the firm does not know the demand distribution a priori and makes replenishment decisions in each period based only on past sales (censored demand) data. It is well known that, even when the demand distribution is fully known in advance, the optimal policy for this problem does not have a simple structure. Motivated by studies in the literature showing that base-stock policies perform near-optimally in these systems, we focus on finding the best base-stock policy. We first establish a convexity result, showing that the total holding, lost-sales, and outdating cost is convex in the base-stock level. We then develop a nonparametric learning algorithm that generates a sequence of order-up-to levels whose running average cost converges to the cost of the optimal base-stock policy. We establish a square-root convergence rate for the proposed algorithm, which is the best possible. Our algorithm and analyses require a novel method for computing a valid cycle subgradient and the construction of a bridging problem, both of which depart significantly from previous studies.
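To illustrate the general idea of learning a base-stock level from censored sales, here is a minimal sketch under a strong simplifying assumption. It is **not** the paper's algorithm. The sketch assumes a product lifetime of one period, so every unsold unit outdates at the end of the period. With that assumption the per-period cost reduces to (h + θ)(S − D)⁺ + p(D − S)⁺, and a subgradient in S can be read directly from sales data. For general lifetimes, this simple per-period subgradient is not available, which is why the abstract refers to a cycle subgradient. The function names, the cost parameters `h`, `theta`, `p`, the step size η₀/√t, and the projection bound `S_max` are all hypothetical illustration choices.

```python
import math
import random
from typing import Callable, List, Tuple


def censored_subgradient(S: float, sales: float, h: float, theta: float, p: float) -> float:
    """Subgradient of the one-period cost in S, using only observed sales.

    Hypothetical simplification: lifetime = 1 period, so leftovers outdate and
    cost = (h + theta) * (S - D)^+ + p * (D - S)^+. We observe sales = min(D, S).
    """
    if sales < S:
        # Demand was fully observed and fell short of S: extra stock costs h + theta.
        return h + theta
    # Stockout (sales == S): demand is censored, but we know D >= S,
    # so -p is a valid subgradient of the cost at S.
    return -p


def learn_base_stock(
    demand_sampler: Callable[[random.Random], float],
    T: int,
    h: float,
    theta: float,
    p: float,
    S_max: float,
    S0: float = 0.0,
    eta0: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Projected stochastic subgradient descent on the base-stock level.

    Returns the order-up-to levels used in each period and the running
    average cost over the T periods.
    """
    rng = random.Random(seed)
    S = S0
    levels: List[float] = []
    total_cost = 0.0
    for t in range(1, T + 1):
        D = demand_sampler(rng)  # true demand, hidden from the learner
        sales = min(D, S)        # censored observation
        total_cost += (h + theta) * max(S - D, 0.0) + p * max(D - S, 0.0)
        levels.append(S)
        g = censored_subgradient(S, sales, h, theta, p)
        eta = eta0 / math.sqrt(t)               # O(1/sqrt(t)) step size
        S = min(max(S - eta * g, 0.0), S_max)  # project onto [0, S_max]
    return levels, total_cost / T
```

As a usage example, take demand uniform on [0, 10] with h = θ = 1 and p = 6. The critical ratio is p / (p + h + θ) = 0.75, so the best level is 7.5. The iterates from `learn_base_stock(lambda rng: rng.uniform(0.0, 10.0), T=20000, h=1.0, theta=1.0, p=6.0, S_max=20.0)` should settle near 7.5. The lifetime-one assumption also guarantees that the order-up-to target is always reachable, because no inventory carries over. With longer lifetimes, on-hand stock can exceed the target, and the cost in one period depends on earlier orders.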
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Management Science and Operations Research