Cost-aware learning and optimization for opportunistic spectrum access

Chao Gan, Ruida Zhou, Jing Yang, Cong Shen

Research output: Contribution to journal › Article

Abstract

In this paper, we investigate cost-aware joint learning and optimization for multi-channel opportunistic spectrum access in a cognitive radio system. We investigate a discrete-time model where the time axis is partitioned into frames. Each frame consists of a sensing phase, followed by a transmission phase. During the sensing phase, the user is able to sense a subset of channels sequentially before it decides to use one of them in the following transmission phase. We assume the channel states alternate between busy and idle according to independent Bernoulli random processes from frame to frame. To capture the inherent uncertainty in channel sensing, we assume the reward of each transmission when the channel is idle is a random variable. We also associate random costs with sensing and transmission actions. Our objective is to understand how the costs and reward of the actions would affect the optimal behavior of the user in both offline and online settings, and design the corresponding opportunistic spectrum access strategies to maximize the expected cumulative net reward (i.e., reward-minus-cost). We start with an offline setting where the statistics of the channel status, costs, and reward are known beforehand. We show that the optimal policy exhibits a recursive double-threshold structure, and the user needs to compare the channel statistics with those thresholds sequentially in order to decide its actions. With such insights, we then study the online setting, where the statistical information of the channels, costs and reward are unknown a priori. We judiciously balance exploration and exploitation, and show that the cumulative regret scales in O(log T). We also establish a matched lower bound, which implies that our online algorithm is order-optimal. Simulation results corroborate our theoretical analysis.
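As a rough illustration of the frame model described in the abstract, the Python sketch below simulates one simplified policy: sense channels in a fixed order and transmit on the first idle one. It is not the paper's double-threshold policy or its online algorithm. The function names, the use of fixed (mean) costs and reward in place of random ones, and the example numbers are all assumptions made for illustration.

```python
import random


def simulate_frame(idle_probs, sense_cost, tx_cost, reward, order, rng):
    """Net reward of one frame under a hypothetical sense-until-idle policy.

    Assumption: costs and reward are fixed at their means, whereas the
    abstract treats them as random variables.
    """
    net = 0.0
    for ch in order:
        net -= sense_cost                    # pay to sense this channel
        if rng.random() < idle_probs[ch]:    # Bernoulli idle/busy state
            net += reward - tx_cost          # transmit on the idle channel
            break                            # one transmission per frame
    return net


def expected_net_reward(idle_probs, sense_cost, tx_cost, reward, order):
    """Closed-form expected net reward of the same simplified policy."""
    total, p_reach = 0.0, 1.0
    for ch in order:
        # Channel ch is sensed only if every earlier channel was busy.
        total += p_reach * (-sense_cost + idle_probs[ch] * (reward - tx_cost))
        p_reach *= 1.0 - idle_probs[ch]
    return total


def average_net_reward(idle_probs, sense_cost, tx_cost, reward, order,
                       frames, seed=0):
    """Empirical average net reward per frame over `frames` simulated frames."""
    rng = random.Random(seed)
    total = sum(
        simulate_frame(idle_probs, sense_cost, tx_cost, reward, order, rng)
        for _ in range(frames)
    )
    return total / frames
```

Under these assumptions, comparing `expected_net_reward` across different sensing orders shows how sensing and transmission costs change which order is preferable, which is the kind of trade-off the offline analysis in the abstract addresses.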

Original language: English (US)
Article number: 8570777
Pages (from-to): 15-27
Number of pages: 13
Journal: IEEE Transactions on Cognitive Communications and Networking
Volume: 5
Issue number: 1
DOI: 10.1109/TCCN.2018.2885790
State: Published - Mar 1 2019

Fingerprint

Costs
Statistics
Radio systems
Cognitive radio
Random processes
Random variables
Uncertainty

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence

Cite this

@article{389b55f65e604248af97985b7c2acb54,
title = "Cost-aware learning and optimization for opportunistic spectrum access",
abstract = "In this paper, we investigate cost-aware joint learning and optimization for multi-channel opportunistic spectrum access in a cognitive radio system. We investigate a discrete-time model where the time axis is partitioned into frames. Each frame consists of a sensing phase, followed by a transmission phase. During the sensing phase, the user is able to sense a subset of channels sequentially before it decides to use one of them in the following transmission phase. We assume the channel states alternate between busy and idle according to independent Bernoulli random processes from frame to frame. To capture the inherent uncertainty in channel sensing, we assume the reward of each transmission when the channel is idle is a random variable. We also associate random costs with sensing and transmission actions. Our objective is to understand how the costs and reward of the actions would affect the optimal behavior of the user in both offline and online settings, and design the corresponding opportunistic spectrum access strategies to maximize the expected cumulative net reward (i.e., reward-minus-cost). We start with an offline setting where the statistics of the channel status, costs, and reward are known beforehand. We show that the optimal policy exhibits a recursive double-threshold structure, and the user needs to compare the channel statistics with those thresholds sequentially in order to decide its actions. With such insights, we then study the online setting, where the statistical information of the channels, costs and reward are unknown a priori. We judiciously balance exploration and exploitation, and show that the cumulative regret scales in O(log T). We also establish a matched lower bound, which implies that our online algorithm is order-optimal. Simulation results corroborate our theoretical analysis.",
author = "Chao Gan and Ruida Zhou and Jing Yang and Cong Shen",
year = "2019",
month = "3",
day = "1",
doi = "10.1109/TCCN.2018.2885790",
language = "English (US)",
volume = "5",
pages = "15--27",
journal = "IEEE Transactions on Cognitive Communications and Networking",
issn = "2332-7731",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",

}

Cost-aware learning and optimization for opportunistic spectrum access. / Gan, Chao; Zhou, Ruida; Yang, Jing; Shen, Cong.

In: IEEE Transactions on Cognitive Communications and Networking, Vol. 5, No. 1, 8570777, 01.03.2019, p. 15-27.

TY - JOUR
T1 - Cost-aware learning and optimization for opportunistic spectrum access
AU - Gan, Chao
AU - Zhou, Ruida
AU - Yang, Jing
AU - Shen, Cong
PY - 2019/3/1
Y1 - 2019/3/1
AB - In this paper, we investigate cost-aware joint learning and optimization for multi-channel opportunistic spectrum access in a cognitive radio system. We investigate a discrete-time model where the time axis is partitioned into frames. Each frame consists of a sensing phase, followed by a transmission phase. During the sensing phase, the user is able to sense a subset of channels sequentially before it decides to use one of them in the following transmission phase. We assume the channel states alternate between busy and idle according to independent Bernoulli random processes from frame to frame. To capture the inherent uncertainty in channel sensing, we assume the reward of each transmission when the channel is idle is a random variable. We also associate random costs with sensing and transmission actions. Our objective is to understand how the costs and reward of the actions would affect the optimal behavior of the user in both offline and online settings, and design the corresponding opportunistic spectrum access strategies to maximize the expected cumulative net reward (i.e., reward-minus-cost). We start with an offline setting where the statistics of the channel status, costs, and reward are known beforehand. We show that the optimal policy exhibits a recursive double-threshold structure, and the user needs to compare the channel statistics with those thresholds sequentially in order to decide its actions. With such insights, we then study the online setting, where the statistical information of the channels, costs and reward are unknown a priori. We judiciously balance exploration and exploitation, and show that the cumulative regret scales in O(log T). We also establish a matched lower bound, which implies that our online algorithm is order-optimal. Simulation results corroborate our theoretical analysis.
UR - http://www.scopus.com/inward/record.url?scp=85058183635&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85058183635&partnerID=8YFLogxK
U2 - 10.1109/TCCN.2018.2885790
DO - 10.1109/TCCN.2018.2885790
M3 - Article
AN - SCOPUS:85058183635
VL - 5
SP - 15
EP - 27
JO - IEEE Transactions on Cognitive Communications and Networking
JF - IEEE Transactions on Cognitive Communications and Networking
SN - 2332-7731
IS - 1
M1 - 8570777
ER -