Multi-Agent Reinforcement Learning for Efficient Content Caching in Mobile D2D Networks

Wei Jiang, Gang Feng, Shuang Qin, Tak Shing Peter Yum, Guohong Cao

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

To address the growth of multimedia traffic dominated by streaming video, user equipment (UE) can collaboratively cache and share content to alleviate the burden on base stations. Prior work on device-to-device (D2D) caching policies assumes perfect knowledge of the content popularity distribution. Since this distribution is usually unavailable in advance, a machine learning-based caching strategy that exploits the content demand history is highly promising. Thus, we design D2D caching strategies using multi-agent reinforcement learning in this paper. Specifically, we model the D2D caching problem as a multi-agent multi-armed bandit problem and use Q-learning to learn how to coordinate the caching decisions. The UEs can be independent learners (ILs) if they learn the Q-values of their own actions, or joint action learners (JALs) if they learn the Q-values of their own actions in conjunction with those of the other UEs. Because the action space is vast, leading to high computational complexity, a modified combinatorial upper confidence bound (UCB) algorithm is proposed to reduce the action space for both IL and JAL. The simulation results show that the proposed JAL-based caching scheme outperforms the IL-based caching scheme and other popular caching schemes in terms of average downloading latency and cache hit rate.
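The abstract describes the learning machinery only at a high level. For intuition, the following minimal Python sketch illustrates the independent-learner (IL) side of the idea: each UE keeps a per-content Q-value (an empirical mean reward) and ranks contents by a UCB-style index, so it fills its cache with the top-scoring contents instead of searching the full combinatorial space of cache configurations. The class name, the reward signal (1 for a local cache hit), and the Zipf-like demand model are illustrative assumptions; this is not the paper's exact modified combinatorial UCB algorithm or the JAL variant.

import math
import random

class ILCachingAgent:
    """Illustrative independent learner: per-content Q-values + UCB ranking."""

    def __init__(self, num_contents, cache_size):
        self.n = num_contents
        self.c = cache_size
        self.q = [0.0] * num_contents      # Q-value per content (mean reward)
        self.counts = [0] * num_contents   # times each content was cached
        self.t = 0                         # decision rounds elapsed

    def choose_cache(self):
        """Cache the cache_size contents with the largest UCB indices,
        avoiding enumeration of all (n choose c) joint caching actions."""
        self.t += 1
        def ucb(i):
            if self.counts[i] == 0:
                return float("inf")        # force initial exploration
            return self.q[i] + math.sqrt(2 * math.log(self.t) / self.counts[i])
        return sorted(range(self.n), key=ucb, reverse=True)[: self.c]

    def update(self, cached, rewards):
        """Incremental Q-update toward the empirical mean reward."""
        for i in cached:
            self.counts[i] += 1
            alpha = 1.0 / self.counts[i]
            self.q[i] += alpha * (rewards[i] - self.q[i])

# Toy run: 3 UEs learn independently over a skewed (Zipf-like) demand stream.
agents = [ILCachingAgent(num_contents=50, cache_size=5) for _ in range(3)]
for _ in range(1000):
    request = min(int(random.paretovariate(1.2)) - 1, 49)
    for agent in agents:
        cached = agent.choose_cache()
        rewards = {i: (1.0 if i == request else 0.0) for i in cached}
        agent.update(cached, rewards)

Under the JAL variant described in the abstract, each UE would instead learn Q-values over joint actions (its own cache choice together with those of the other UEs), which improves coordination at the cost of a much larger learning table, hence the paper's emphasis on reducing the action space.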

Original language: English (US)
Article number: 8629363
Pages (from-to): 1610-1622
Number of pages: 13
Journal: IEEE Transactions on Wireless Communications
Volume: 18
Issue number: 3
DOI: 10.1109/TWC.2019.2894403
State: Published - Mar 1 2019


All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Electrical and Electronic Engineering
  • Applied Mathematics

Cite this

Jiang, Wei; Feng, Gang; Qin, Shuang; Yum, Tak Shing Peter; Cao, Guohong. Multi-Agent Reinforcement Learning for Efficient Content Caching in Mobile D2D Networks. In: IEEE Transactions on Wireless Communications. 2019; Vol. 18, No. 3, pp. 1610-1622.
@article{5c2fda6fd2b343f3ad837ab30a88ec54,
title = "Multi-Agent Reinforcement Learning for Efficient Content Caching in Mobile D2D Networks",
author = "Wei Jiang and Gang Feng and Shuang Qin and Yum, {Tak Shing Peter} and Guohong Cao",
year = "2019",
month = "3",
day = "1",
doi = "10.1109/TWC.2019.2894403",
language = "English (US)",
volume = "18",
pages = "1610--1622",
journal = "IEEE Transactions on Wireless Communications",
issn = "1536-1276",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "3",
}
