μc-States: Fine-grained GPU Datapath Power Management

Onur Kayiran, Adwait Jog, Ashutosh Pattnaik, Rachata Ausavarungnirun, Xulong Tang, Mahmut T. Kandemir, Gabriel H. Loh, Onur Mutlu, Chita R. Das

Research output: Contribution to journal › Conference article

19 Citations (Scopus)

Abstract

To improve the performance of Graphics Processing Units (GPUs) beyond simply increasing core count, architects have recently adopted a scale-up approach: the peak throughput and individual capabilities of the GPU cores are increasing rapidly. This big-core trend in GPUs leads to various challenges, including higher static power consumption and lower and imbalanced utilization of the datapath components of a big core. As we show in this paper, two key problems ensue: (1) the lower and imbalanced datapath utilization can waste power, as an application does not always utilize all portions of the big-core datapath, and (2) the use of big cores can lead to application performance degradation in some cases due to the higher memory system contention caused by the larger number of memory requests generated by each big core. This paper introduces a new analysis of datapath component utilization in big-core GPUs based on queuing theory principles. Building on this analysis, we introduce a fine-grained dynamic power- and clock-gating mechanism for the entire datapath, called μC-States, which aims to minimize power consumption by turning off or tuning down datapath components that are not bottlenecks for the performance of the running application. Our experimental evaluation demonstrates that μC-States significantly reduces both static and dynamic power consumption in a big-core GPU, while also significantly improving the performance of applications affected by high memory system contention. We also show that our analysis of datapath component utilization can guide scheduling and design decisions in a GPU architecture that contains heterogeneous cores.
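
As a rough illustration of the kind of utilization analysis the abstract describes, the sketch below treats each datapath component as a simple queue with utilization ρ = λ/μ (arrival rate over service rate), takes the most utilized component as the bottleneck, and flags lightly utilized non-bottleneck components as candidates for gating or tuning down. This is a generic queuing-theory sketch, not the paper's actual model: the `Component` class, the `gating_candidates` function, the component names, the rates, and the 0.5 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Component:
    """A datapath component modeled as a single queue (hypothetical model)."""
    name: str
    arrival_rate: float  # lambda: requests arriving per cycle (assumed units)
    service_rate: float  # mu: requests the component can serve per cycle

    @property
    def utilization(self) -> float:
        # Classic queuing-theory utilization: rho = lambda / mu.
        return self.arrival_rate / self.service_rate


def gating_candidates(
    components: List[Component], threshold: float = 0.5
) -> Tuple[str, List[str]]:
    """Return the bottleneck name and the names of under-utilized components.

    The bottleneck is the component with the highest utilization. Any other
    component whose utilization falls below `threshold` (an assumed cutoff)
    is reported as a candidate for gating or tuning down.
    """
    bottleneck = max(components, key=lambda c: c.utilization)
    candidates = [
        c.name
        for c in components
        if c is not bottleneck and c.utilization < threshold
    ]
    return bottleneck.name, candidates


if __name__ == "__main__":
    # Made-up rates, purely for illustration.
    datapath = [
        Component("fetch", arrival_rate=0.9, service_rate=1.0),
        Component("sp_units", arrival_rate=0.2, service_rate=2.0),
        Component("ldst", arrival_rate=0.6, service_rate=1.0),
    ]
    print(gating_candidates(datapath))
```

In this toy setup only the component that is both off the critical path and well below the cutoff is flagged; a real mechanism would also need to account for the cost of switching components on and off, which the sketch ignores.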

Original language: English (US)
Pages (from-to): 17-30
Number of pages: 14
Journal: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
DOIs: 10.1145/2967938.2967941
State: Published - Jan 1 2016
Event: 25th International Conference on Parallel Architectures and Compilation Techniques, PACT 2016 - Haifa, Israel
Duration: Sep 11 2016 - Sep 15 2016

Fingerprint

Power Management
Graphics Processing Unit
Power Consumption
Electric power utilization
Contention
Data storage equipment
Waste utilization
Queuing Theory
Scale-up
Experimental Evaluation
Clocks
Tuning
Count
Degradation
Computer systems
Throughput
Scheduling
Entire

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

Cite this

Kayiran, Onur ; Jog, Adwait ; Pattnaik, Ashutosh ; Ausavarungnirun, Rachata ; Tang, Xulong ; Kandemir, Mahmut T. ; Loh, Gabriel H. ; Mutlu, Onur ; Das, Chita R. / μc-States : Fine-grained GPU Datapath Power Management. In: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT. 2016 ; pp. 17-30.
@article{c2e103016e7f437c996485e900b1101e,
title = "μc-States: Fine-grained GPU Datapath Power Management",
abstract = "To improve the performance of Graphics Processing Units (GPUs) beyond simply increasing core count, architects have recently adopted a scale-up approach: the peak throughput and individual capabilities of the GPU cores are increasing rapidly. This big-core trend in GPUs leads to various challenges, including higher static power consumption and lower and imbalanced utilization of the datapath components of a big core. As we show in this paper, two key problems ensue: (1) the lower and imbalanced datapath utilization can waste power, as an application does not always utilize all portions of the big-core datapath, and (2) the use of big cores can lead to application performance degradation in some cases due to the higher memory system contention caused by the larger number of memory requests generated by each big core. This paper introduces a new analysis of datapath component utilization in big-core GPUs based on queuing theory principles. Building on this analysis, we introduce a fine-grained dynamic power- and clock-gating mechanism for the entire datapath, called μC-States, which aims to minimize power consumption by turning off or tuning down datapath components that are not bottlenecks for the performance of the running application. Our experimental evaluation demonstrates that μC-States significantly reduces both static and dynamic power consumption in a big-core GPU, while also significantly improving the performance of applications affected by high memory system contention. We also show that our analysis of datapath component utilization can guide scheduling and design decisions in a GPU architecture that contains heterogeneous cores.",
author = "Onur Kayiran and Adwait Jog and Ashutosh Pattnaik and Rachata Ausavarungnirun and Xulong Tang and Kandemir, {Mahmut T.} and Loh, {Gabriel H.} and Onur Mutlu and Das, {Chita R.}",
year = "2016",
month = "1",
day = "1",
doi = "10.1145/2967938.2967941",
language = "English (US)",
pages = "17--30",
journal = "Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT",
issn = "1089-795X",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - JOUR

T1 - μc-States

T2 - Fine-grained GPU Datapath Power Management

AU - Kayiran, Onur

AU - Jog, Adwait

AU - Pattnaik, Ashutosh

AU - Ausavarungnirun, Rachata

AU - Tang, Xulong

AU - Kandemir, Mahmut T.

AU - Loh, Gabriel H.

AU - Mutlu, Onur

AU - Das, Chita R.

PY - 2016/1/1

Y1 - 2016/1/1

AB - To improve the performance of Graphics Processing Units (GPUs) beyond simply increasing core count, architects have recently adopted a scale-up approach: the peak throughput and individual capabilities of the GPU cores are increasing rapidly. This big-core trend in GPUs leads to various challenges, including higher static power consumption and lower and imbalanced utilization of the datapath components of a big core. As we show in this paper, two key problems ensue: (1) the lower and imbalanced datapath utilization can waste power, as an application does not always utilize all portions of the big-core datapath, and (2) the use of big cores can lead to application performance degradation in some cases due to the higher memory system contention caused by the larger number of memory requests generated by each big core. This paper introduces a new analysis of datapath component utilization in big-core GPUs based on queuing theory principles. Building on this analysis, we introduce a fine-grained dynamic power- and clock-gating mechanism for the entire datapath, called μC-States, which aims to minimize power consumption by turning off or tuning down datapath components that are not bottlenecks for the performance of the running application. Our experimental evaluation demonstrates that μC-States significantly reduces both static and dynamic power consumption in a big-core GPU, while also significantly improving the performance of applications affected by high memory system contention. We also show that our analysis of datapath component utilization can guide scheduling and design decisions in a GPU architecture that contains heterogeneous cores.

UR - http://www.scopus.com/inward/record.url?scp=84989291136&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84989291136&partnerID=8YFLogxK

U2 - 10.1145/2967938.2967941

DO - 10.1145/2967938.2967941

M3 - Conference article

AN - SCOPUS:84989291136

SP - 17

EP - 30

JO - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

JF - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

SN - 1089-795X

ER -