Increasing GPU translation reach by leveraging under-utilized on-chip resources

Jagadish B. Kotra, Michael LeBeane, Mahmut T. Kandemir, Gabriel H. Loh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Many GPU applications issue irregular memory accesses to a very large memory footprint. We confirm observations from prior work that these irregular access patterns are severely bottlenecked by insufficient Translation Lookaside Buffer (TLB) reach, resulting in expensive page table walks. In this work, we investigate mechanisms to improve TLB reach without increasing the page size or the size of the TLB itself. Our work is based around the observation that a GPU's instruction cache (I-cache) and Local Data Share (LDS) scratchpad memory are under-utilized in many applications, including those that suffer from poor TLB reach. We leverage this to opportunistically utilize idle capacity and port bandwidth from the GPU's I-cache and LDS structures for address translations. We explore various potential architectural designs for each structure to optimize performance and minimize complexity. Both structures are organized as a victim cache between the L1 and L2 TLBs to boost translation reach. We find that our designs can increase performance on average by 30.1% without impacting the performance of applications that do not require additional reach.
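To make the victim-cache placement concrete, the following is a minimal sketch of a translation lookup path with an extra structure sitting between the L1 and L2 TLBs. It is not the design evaluated in the paper: the fully associative LRU tables, the entry counts, the exclusive fill policy, and the names `LruTable` and `count_page_walks` are all illustrative assumptions, and the "victim" table merely stands in for capacity borrowed from an idle I-cache or LDS.

```python
from collections import OrderedDict


class LruTable:
    """Fully associative table with LRU replacement (a simplifying assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, vpn):
        # A hit refreshes the entry's LRU position.
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
            return True
        return False

    def remove(self, vpn):
        self.entries.pop(vpn, None)

    def insert(self, vpn):
        """Insert vpn and return the evicted page number, or None."""
        if self.capacity == 0:
            return vpn  # nothing can be stored, so the entry is dropped at once
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
            return None
        self.entries[vpn] = True
        if len(self.entries) > self.capacity:
            evicted, _ = self.entries.popitem(last=False)
            return evicted
        return None


def count_page_walks(trace, l1_size, victim_size, l2_size):
    """Count page walks for a trace of virtual page numbers (hypothetical model)."""
    l1 = LruTable(l1_size)
    victim = LruTable(victim_size)  # stands in for borrowed I-cache/LDS capacity
    l2 = LruTable(l2_size)
    walks = 0
    for vpn in trace:
        if l1.lookup(vpn):
            continue
        if victim.lookup(vpn):
            victim.remove(vpn)  # promote the translation back into the L1 TLB
        elif not l2.lookup(vpn):
            walks += 1  # missed everywhere: pay for a page table walk
            l2.insert(vpn)
        evicted = l1.insert(vpn)
        if evicted is not None:
            victim.insert(evicted)  # L1 evictions fill the victim structure
    return walks


# Cyclic sweep over 12 pages, which thrashes a 4-entry L1 and an 8-entry L2.
trace = list(range(12)) * 10
print(count_page_walks(trace, l1_size=4, victim_size=0, l2_size=8))  # 120
print(count_page_walks(trace, l1_size=4, victim_size=8, l2_size=8))  # 12
```

In this toy trace the working set exceeds the L2 TLB, so every access walks the page table when no victim structure exists; with eight borrowed entries only the twelve cold misses remain. The numbers illustrate the reach argument only and say nothing about the paper's measured speedups.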

Original language: English (US)
Title of host publication: MICRO 2021 - 54th Annual IEEE/ACM International Symposium on Microarchitecture, Proceedings
Publisher: IEEE Computer Society
Number of pages: 13
ISBN (Electronic): 9781450385572
State: Published - Oct 18 2021
Event: 54th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2021 - Virtual, Online, Greece
Duration: Oct 18 2021 → Oct 22 2021

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
ISSN (Print): 1072-4451


Conference: 54th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2021
City: Virtual, Online

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
