The goal of a phase I dose-finding trial is to determine the dose level of a new drug with acceptable toxicity. The optimal dose level is determined by sequentially allocating patients to increasing dose levels while monitoring for safety concerns. In practice, multiple toxicity types may be of interest, each with a different degree of importance. To address this, toxicity scoring systems have been developed, and conventional adaptive designs, such as the continual reassessment method (CRM), have been modified to accommodate them. In this article, we model the dose-finding problem under the multiarmed bandit framework, which naturally embeds the tradeoff between exploring the toxicity of dose levels and exploiting the current information to optimize benefit. We then propose a Bayesian multiarmed bandit design, dubbed the quasi-likelihood optimistic bandit (QLOB), which has desirable operating characteristics, including allocating patients to the dose level whose estimated toxicity score is closest to the target level and which is relatively less explored. In extensive simulation studies, QLOB outperformed toxicity-score-based designs, such as the quasi-CRM (QCRM) and general Bayesian optimal interval (gBOIN) designs, in most scenarios considered, and performed much better than the conventional CRM and “3 + 3” designs with respect to dose recommendation and patient allocation. In addition, our design is shown to be robust against misspecification of the relevant hyper-parameter and to improve in performance as the number of enrolled patients increases.
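The exploration-exploitation tradeoff described above can be illustrated with a generic optimistic allocation rule. The sketch below is not the QLOB design itself, whose quasi-likelihood model and index are not given here; it is a minimal illustration, assuming a hypothetical function `pick_dose`, a hypothetical tuning parameter `bonus`, and a simple logarithmic exploration width. It prefers the dose whose estimated toxicity score is close to the target, while giving credit to doses that have been tried on fewer patients.

```python
import math

def pick_dose(mean_scores, counts, target, bonus=0.1):
    """Return the index of the dose for the next patient.

    Illustrative rule only (an assumption, not the published QLOB index):
    minimize |estimated score - target| minus an exploration width that
    shrinks as more patients are treated at that dose.
    """
    total = sum(counts)
    best, best_value = 0, float("inf")
    for d, (m, n) in enumerate(zip(mean_scores, counts)):
        gap = abs(m - target)  # distance from the target toxicity score
        # Exploration width: larger for doses with fewer patients.
        width = bonus * math.sqrt(math.log(total + 1) / max(n, 1))
        value = gap - width  # optimistic (smallest plausible) gap
        if value < best_value:
            best, best_value = d, value
    return best

# Hypothetical estimates for three dose levels and a target score of 0.25.
scores = [0.10, 0.22, 0.30]
counts = [3, 30, 3]
greedy_choice = pick_dose(scores, counts, target=0.25, bonus=0.0)
optimistic_choice = pick_dose(scores, counts, target=0.25, bonus=0.2)
```

With `bonus=0.0` the rule is purely greedy and selects the dose with the closest estimate; a positive `bonus` can shift the choice toward a nearby dose that has been given to fewer patients. A real trial would add safety constraints, such as not skipping untried dose levels, which this sketch omits.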
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Pharmaceutical Science