Cost-aware cascading bandits

Ruida Zhou, Chao Gan, Jing Yang, Cong Shen

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations

Abstract

In this paper, we propose a cost-aware cascading bandits model, a new variant of multi-armed bandits with cascading feedback, by considering the random cost of pulling arms. In each step, the learning agent chooses an ordered list of items and examines them sequentially, until certain stopping condition is satisfied. Our objective is then to maximize the expected net reward in each step, i.e., the reward obtained in each step minus the total cost incurred in examining the items, by deciding the ordered list of items, as well as when to stop examination. We study both the offline and online settings, depending on whether the state and cost statistics of the items are known beforehand. For the offline setting, we show that the Unit Cost Ranking with Threshold 1 (UCR-T1) policy is optimal. For the online setting, we propose a Cost-aware Cascading Upper Confidence Bound (CC-UCB) algorithm, and show that the cumulative regret scales in O(log T). We also provide a lower bound for all α-consistent policies, which scales in Ω(log T) and matches our upper bound. The performance of the CC-UCB algorithm is evaluated with both synthetic and real-world data.

Original languageEnglish (US)
Title of host publicationProceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018
EditorsJerome Lang
PublisherInternational Joint Conferences on Artificial Intelligence
Pages3228-3234
Number of pages7
ISBN (Electronic)9780999241127
DOIs
StatePublished - 2018
Event27th International Joint Conference on Artificial Intelligence, IJCAI 2018 - Stockholm, Sweden
Duration: Jul 13 2018Jul 19 2018

Publication series

NameIJCAI International Joint Conference on Artificial Intelligence
Volume2018-July
ISSN (Print)1045-0823

Other

Other27th International Joint Conference on Artificial Intelligence, IJCAI 2018
CountrySweden
CityStockholm
Period7/13/187/19/18

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence

Fingerprint Dive into the research topics of 'Cost-aware cascading bandits'. Together they form a unique fingerprint.

  • Cite this

    Zhou, R., Gan, C., Yang, J., & Shen, C. (2018). Cost-aware cascading bandits. In J. Lang (Ed.), Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018 (pp. 3228-3234). (IJCAI International Joint Conference on Artificial Intelligence; Vol. 2018-July). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/448