A framework for accelerating bottlenecks in GPU execution with assist warps

N. Vijaykumar, G. Pekhimenko, A. Jog, S. Ghose, A. Bhowmick, R. Ausavarungnirun, C. Das, M. Kandemir, T. C. Mowry, O. Mutlu

Research output: Chapter in Book/Report/Conference proceeding › Chapter

3 Citations (Scopus)

Abstract

Modern graphics processing units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive.

This chapter describes the Core-Assisted Bottleneck Acceleration (CABA) framework that employs idle on-chip resources to alleviate different bottlenecks in GPU execution. CABA provides flexible mechanisms to automatically generate "assist warps" that execute on GPU cores to perform specific tasks that can improve GPU performance and efficiency.

CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, for example, by using assist warps to perform data compression to transfer less data from memory. Conversely, the same framework can be employed to handle cases where the GPU is bottlenecked by the available computational units, in which case the memory pipelines are idle and can be used by CABA to speed up computation, for example, by performing memoization using assist warps.

We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck. Our extensive evaluations show that CABA, when used to implement data compression, provides on average a performance improvement of 41.7% (as high as 2.6×) across a variety of memory-bandwidth-sensitive general-purpose GPU applications.

We believe that CABA is a flexible framework that enables the use of idle resources to improve application performance with different optimizations and to perform other useful tasks. We discuss how CABA can be used, for example, for memoization, prefetching, handling interrupts, profiling, redundant multithreading, and speculative precomputation.
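The memoization idea the abstract mentions — spending spare resources to cache and reuse the results of redundant computation — can be illustrated with a minimal, conceptual sketch. This is not the CABA implementation (which runs as assist warps in GPU hardware); it is a hypothetical host-side analogue showing the compute-for-lookup trade-off:

```python
# Conceptual sketch only: memoization replaces repeated computation with
# table lookups, the same compute/memory trade-off CABA exploits when the
# GPU's compute units are the bottleneck and memory pipelines sit idle.

def memoize(fn):
    """Cache results of a pure function, keyed by its arguments."""
    cache = {}

    def wrapped(*args):
        if args not in cache:
            cache[args] = fn(*args)  # compute once on a miss
        return cache[args]           # reuse on every subsequent hit

    return wrapped

@memoize
def expensive(x):
    # Stand-in for a costly, pure (side-effect-free) computation.
    return sum(i * i for i in range(x))

first = expensive(1000)   # computed
second = expensive(1000)  # served from the cache
```

In CABA the analogous lookup table would live in on-chip storage and be maintained by assist warps, so hits avoid re-executing work on the oversubscribed compute units.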

Original language: English (US)
Title of host publication: Advances in GPU Research and Practice
Publisher: Elsevier Inc.
Pages: 372-415
Number of pages: 44
ISBN (Electronic): 9780128037881
ISBN (Print): 9780128037386
DOI: 10.1016/B978-0-12-803738-6.00015-X
State: Published - Sep 8 2016


All Science Journal Classification (ASJC) codes

  • Computer Science (all)

Cite this

Vijaykumar, N., Pekhimenko, G., Jog, A., Ghose, S., Bhowmick, A., Ausavarungnirun, R., ... Mutlu, O. (2016). A framework for accelerating bottlenecks in GPU execution with assist warps. In Advances in GPU Research and Practice (pp. 372-415). Elsevier Inc. https://doi.org/10.1016/B978-0-12-803738-6.00015-X