Minimax concave penalized multi-armed bandit model with high-dimensional covariates

Xue Wang, Mike Mingcheng Wei, Tao Yao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit) algorithm for a decision-maker facing high-dimensional data with latent sparse structure in an online learning and decision-making process. We demonstrate that the MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in the sample size T, O(log T), and further attains a tighter bound in both the covariate dimension d and the number of significant covariates s, O(s^2 (s + log d)). In addition, we develop a linear approximation method, the 2-step Weighted Lasso procedure, to identify the MCP estimator for the MCP-Bandit algorithm under non-i.i.d. samples. Using this procedure, the MCP estimator matches the oracle estimator with high probability. Finally, we present two experiments that benchmark the proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm performs favorably compared with the benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small.
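
The abstract does not spell out the 2-step Weighted Lasso procedure, so the following is only a minimal sketch of the general idea, assuming the standard MCP definition (penalty derivative max(0, λ − |β|/a)) and a generic local linear approximation: a plain Lasso fit, followed by a weighted Lasso whose per-coefficient weights are the MCP derivative evaluated at the first-step estimate. The function names, the coordinate-descent solver, the default a = 3, and the offline i.i.d. setting are all assumptions made for illustration; this is not the paper's bandit algorithm.

```python
import numpy as np

def mcp_derivative(beta, lam, a):
    """Derivative of the MCP penalty at |beta|: max(0, lam - |beta| / a)."""
    return np.maximum(0.0, lam - np.abs(beta) / a)

def soft_threshold(z, t):
    """Soft-thresholding operator used in Lasso coordinate updates."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def weighted_lasso(X, y, weights, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + sum_j weights[j] * |b_j|.

    Hypothetical helper written for this sketch, not a library routine.
    """
    n, d = X.shape
    beta = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * x_j^T x_j for each column
    resid = y.astype(float)                # residual y - X @ beta, with beta = 0
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue                   # skip all-zero columns
            # Correlation of column j with the partial residual (beta_j removed).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, weights[j]) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def two_step_mcp(X, y, lam, a=3.0):
    """Assumed two-step scheme: Lasso, then Lasso reweighted by the MCP derivative."""
    d = X.shape[1]
    beta_first = weighted_lasso(X, y, np.full(d, lam))  # step 1: plain Lasso
    weights = mcp_derivative(beta_first, lam, a)        # large coefficients get weight 0
    return weighted_lasso(X, y, weights)                # step 2: weighted Lasso

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    beta_true = np.zeros(50)
    beta_true[:3] = [3.0, -2.0, 1.5]
    y = X @ beta_true + 0.1 * rng.standard_normal(200)
    print(np.round(two_step_mcp(X, y, lam=0.2)[:5], 3))
```

Because the MCP derivative vanishes once a first-step coefficient exceeds aλ in magnitude, the second step leaves clearly significant covariates unpenalized while still penalizing the rest at the full level λ. That is the intuition behind the estimator approaching an oracle fit on the true support.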

Original language: English (US)
Title of host publication: 35th International Conference on Machine Learning, ICML 2018
Editors: Jennifer Dy, Andreas Krause
Publisher: International Machine Learning Society (IMLS)
Pages: 8249-8266
Number of pages: 18
ISBN (Electronic): 9781510867963
State: Published - Jan 1 2018
Event: 35th International Conference on Machine Learning, ICML 2018 - Stockholm, Sweden
Duration: Jul 10 2018 to Jul 15 2018

Publication series

Name: 35th International Conference on Machine Learning, ICML 2018
Volume: 12

Other

Other: 35th International Conference on Machine Learning, ICML 2018
Country: Sweden
City: Stockholm
Period: 7/10/18 to 7/15/18

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

  • Cite this

    Wang, X., Wei, M. M., & Yao, T. (2018). Minimax concave penalized multi-armed bandit model with high-dimensional covariates. In J. Dy, & A. Krause (Eds.), 35th International Conference on Machine Learning, ICML 2018 (pp. 8249-8266). (35th International Conference on Machine Learning, ICML 2018; Vol. 12). International Machine Learning Society (IMLS).