TY - GEN
T1 - Returning is believing
T2 - 26th ACM International Conference on Information and Knowledge Management, CIKM 2017
AU - Wu, Qingyun
AU - Wang, Hongning
AU - Hong, Liangjie
AU - Shi, Yue
N1 - Funding Information:
The authors would like to thank the anonymous reviewers for their helpful comments and suggestions. This work was supported by the National Science Foundation under grants IIS-1553568 and IIS-1618948.
Publisher Copyright:
© 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM.
PY - 2017/11/6
Y1 - 2017/11/6
N2 - In this work, we propose to improve long-term user engagement in a recommender system from the perspective of sequential decision optimization, where users' click and return behaviors are directly modeled for online optimization. A bandit-based solution is formulated to balance three competing factors during online learning: exploitation for immediate clicks, exploitation for expected future clicks, and exploration of unknowns for model estimation. We rigorously prove that, with high probability, our proposed solution achieves a sublinear upper regret bound in maximizing cumulative clicks from a population of users in a given period of time, while a linear regret is inevitable if a user's temporal return behavior is not considered when making recommendations. Extensive experiments on both simulations and a large-scale real-world dataset collected from the Yahoo front-page news recommendation log verified the effectiveness of our proposed algorithm and its significant improvement over several state-of-the-art online learning baselines for recommendation.
AB - In this work, we propose to improve long-term user engagement in a recommender system from the perspective of sequential decision optimization, where users' click and return behaviors are directly modeled for online optimization. A bandit-based solution is formulated to balance three competing factors during online learning: exploitation for immediate clicks, exploitation for expected future clicks, and exploration of unknowns for model estimation. We rigorously prove that, with high probability, our proposed solution achieves a sublinear upper regret bound in maximizing cumulative clicks from a population of users in a given period of time, while a linear regret is inevitable if a user's temporal return behavior is not considered when making recommendations. Extensive experiments on both simulations and a large-scale real-world dataset collected from the Yahoo front-page news recommendation log verified the effectiveness of our proposed algorithm and its significant improvement over several state-of-the-art online learning baselines for recommendation.
UR - http://www.scopus.com/inward/record.url?scp=85037345528&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85037345528&partnerID=8YFLogxK
U2 - 10.1145/3132847.3133025
DO - 10.1145/3132847.3133025
M3 - Conference contribution
AN - SCOPUS:85037345528
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1927
EP - 1936
BT - CIKM 2017 - Proceedings of the 2017 ACM Conference on Information and Knowledge Management
PB - Association for Computing Machinery
Y2 - 6 November 2017 through 10 November 2017
ER -