TY - GEN
T1 - Minimax concave penalized multi-armed bandit model with high-dimensional covariates
AU - Wang, Xue
AU - Wei, Mike Mingcheng
AU - Yao, Tao
PY - 2018/1/1
Y1 - 2018/1/1
N2 - In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit) algorithm for a decision-maker facing high-dimensional data with a latent sparse structure in an online learning and decision-making process. We demonstrate that the MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in the sample size T, O(log T), and further attains a tighter bound in both the covariate dimension d and the number of significant covariates s, O(s²(s + log d)). In addition, we develop a linear approximation method, the 2-step Weighted Lasso procedure, to identify the MCP estimator for the MCP-Bandit algorithm under non-i.i.d. samples. Using this procedure, the MCP estimator matches the oracle estimator with high probability. Finally, we present two experiments to benchmark our proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm performs favorably over the benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small.
AB - In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit) algorithm for a decision-maker facing high-dimensional data with a latent sparse structure in an online learning and decision-making process. We demonstrate that the MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in the sample size T, O(log T), and further attains a tighter bound in both the covariate dimension d and the number of significant covariates s, O(s²(s + log d)). In addition, we develop a linear approximation method, the 2-step Weighted Lasso procedure, to identify the MCP estimator for the MCP-Bandit algorithm under non-i.i.d. samples. Using this procedure, the MCP estimator matches the oracle estimator with high probability. Finally, we present two experiments to benchmark our proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm performs favorably over the benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small.
UR - http://www.scopus.com/inward/record.url?scp=85057295901&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85057295901&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85057295901
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 8249
EP - 8266
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Dy, Jennifer
A2 - Krause, Andreas
PB - International Machine Learning Society (IMLS)
T2 - 35th International Conference on Machine Learning, ICML 2018
Y2 - 10 July 2018 through 15 July 2018
ER -