Reinforcement learning algorithms for adaptive cyber defense against Heartbleed

Minghui Zhu, Zhisheng Hu, Peng Liu

Research output: Contribution to journal (Conference article)

14 Scopus citations


In this paper, we investigate a model in which a defender and an attacker simultaneously and repeatedly adjust their defenses and attacks. Under this model, we propose two iterative reinforcement learning algorithms that allow the defender to identify optimal defenses when information about the attacker is limited. With probability one, the adaptive reinforcement learning algorithm converges to the best response to the attacks when the attacker's exploration of the system diminishes over time. With probability arbitrarily close to one, the robust reinforcement learning algorithm converges to the min-max strategy even when the attacker persistently explores the system. The convergence of both algorithms is formally proven, and their performance is verified via numerical simulations.
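The difference between the two targets named in the abstract, a best response and a min-max strategy, can be illustrated with a toy example. The sketch below is not the paper's algorithms: the 2x2 loss matrix, the fixed attack distribution, the 1/sqrt(t) exploration schedule, and all function names are assumptions made only for illustration. The defender observes only its own realized losses and estimates the best response by sample averaging.

```python
import random

# Hypothetical loss matrix (an assumption, not from the paper):
# LOSS[d][a] is the defender's loss when playing defense d against attack a.
LOSS = [
    [1.0, 4.0],
    [3.0, 2.0],
]

def best_response(attack_probs):
    """Defense with the lowest expected loss against a known attack distribution."""
    expected = [sum(p * l for p, l in zip(attack_probs, row)) for row in LOSS]
    return min(range(len(LOSS)), key=lambda d: expected[d])

def minmax_defense():
    """Pure defense with the lowest worst-case loss over all attacks."""
    return min(range(len(LOSS)), key=lambda d: max(LOSS[d]))

def learn_best_response(attack_probs, rounds=5000, seed=0):
    """Estimate the best response from observed losses only.

    The attack distribution is used solely to simulate the attacker;
    the learner never reads it directly.
    """
    rng = random.Random(seed)
    n_def = len(LOSS)
    estimates = [0.0] * n_def  # running average loss per defense
    counts = [0] * n_def
    for t in range(1, rounds + 1):
        eps = 1.0 / t ** 0.5  # assumed diminishing exploration rate
        if rng.random() < eps:
            d = rng.randrange(n_def)  # explore a random defense
        else:
            d = min(range(n_def), key=lambda i: estimates[i])  # exploit
        a = rng.choices(range(len(attack_probs)), weights=attack_probs)[0]
        counts[d] += 1
        estimates[d] += (LOSS[d][a] - estimates[d]) / counts[d]
    return min(range(n_def), key=lambda i: estimates[i])
```

With an attacker that plays attack 0 with probability 0.9, the best response in this toy matrix is defense 0 (expected loss 1.3 versus 2.9), while the worst-case-optimal pure defense is defense 1 (worst-case loss 3 versus 4). This is why a defender facing a persistently exploring attacker may prefer the more conservative min-max choice.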

Original language: English (US)
Pages (from-to): 51-58
Number of pages: 8
Journal: Proceedings of the ACM Conference on Computer and Communications Security
Issue number: November
State: Published - Nov 7 2014
Event: 1st ACM Workshop on Moving Target Defense, MTD 2014 - Co-located with 21st ACM Conference on Computer and Communications Security, CCS 2014 - Scottsdale, United States
Duration: Nov 3 2014 → …


All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications
