A case for core-assisted bottleneck acceleration in GPUs: Enabling flexible data compression with assist warps

Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita Das, Mahmut Kandemir, Todd C. Mowry, Onur Mutlu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

47 Scopus citations

Abstract

Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive. This paper introduces the Core-Assisted Bottleneck Acceleration (CABA) framework that employs idle on-chip resources to alleviate different bottlenecks in GPU execution. CABA provides flexible mechanisms to automatically generate "assist warps" that execute on GPU cores to perform specific tasks that can improve GPU performance and efficiency. CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression to transfer less data from memory. Conversely, the same framework can be employed to handle cases where the GPU is bottlenecked by the available computational units, in which case the memory pipelines are idle and can be used by CABA to speed up computation, e.g., by performing memoization using assist warps. We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck. Our extensive evaluations show that CABA, when used to implement data compression, provides an average performance improvement of 41.7% (as high as 2.6X) across a variety of memory-bandwidth-sensitive GPGPU applications.

Original language: English (US)
Title of host publication: ISCA 2015 - 42nd Annual International Symposium on Computer Architecture, Conference Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 41-53
Number of pages: 13
ISBN (Electronic): 9781450334020
DOIs: https://doi.org/10.1145/2749469.2750399
State: Published - Jun 13 2015
Event: 42nd Annual International Symposium on Computer Architecture, ISCA 2015 - Portland, United States
Duration: Jun 13 2015 to Jun 17 2015

Publication series

Name: Proceedings - International Symposium on Computer Architecture
Volume: 13-17-June-2015
ISSN (Print): 1063-6897

Other

Other: 42nd Annual International Symposium on Computer Architecture, ISCA 2015
Country: United States
City: Portland
Period: 6/13/15 to 6/17/15

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Cite this

Vijaykumar, N., Pekhimenko, G., Jog, A., Bhowmick, A., Ausavarungnirun, R., Das, C., Kandemir, M., Mowry, T. C., & Mutlu, O. (2015). A case for core-assisted bottleneck acceleration in GPUs: Enabling flexible data compression with assist warps. In ISCA 2015 - 42nd Annual International Symposium on Computer Architecture, Conference Proceedings (pp. 41-53). (Proceedings - International Symposium on Computer Architecture; Vol. 13-17-June-2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/2749469.2750399