RDIP

Return-address-stack directed instruction prefetching

Aasheesh Kolli, Ali Saidi, Thomas F. Wenisch

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

19 Citations (Scopus)

Abstract

L1 instruction fetch misses remain a critical performance bottleneck, accounting for slowdowns of up to 40% in server applications. Whereas instruction footprints typically fit within last-level caches, they overwhelm L1 caches, whose capacity is limited by latency constraints. Past work has shown that server application instruction miss sequences are highly repetitive. By recording, indexing, and prefetching according to these sequences, nearly all L1 instruction misses can be eliminated. However, existing schemes require impractical storage and considerable complexity to correct for minor control-flow variations that disrupt sequences. In this work, we simplify and reduce the energy requirements of accurate instruction prefetching via two observations: (1) program context as captured in the call stack correlates strongly with L1 instruction misses, and (2) the return address stack (RAS), already present in all high-performance processors, succinctly summarizes program context. We propose RAS-Directed Instruction Prefetching (RDIP), which associates prefetch operations with signatures formed from the contents of the RAS. RDIP achieves 70% of the potential speedup of an ideal L1 cache, outperforms a prefetcherless baseline by 11.5%, and reduces energy and complexity relative to sequence-based prefetching. RDIP's performance is within 2% of the state-of-the-art Proactive Instruction Fetch, with a nearly 3X reduction in storage and a 1.9X reduction in energy overheads.
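The core idea in the abstract, associating prefetch operations with signatures formed from the RAS contents, can be illustrated with a small software model. The following is only a sketch under stated assumptions: the class `ToyRasPrefetcher`, the `depth` parameter, the use of a tuple of the top return addresses as the signature, and the unbounded dictionary standing in for a hardware table are all hypothetical choices made for illustration. The abstract does not specify the signature function, the table organization, or when prefetches are issued, so none of this should be read as the authors' design.

```python
class ToyRasPrefetcher:
    """Toy model: map a call-stack signature to cache blocks that missed under it.

    Every design detail here is an assumption made for illustration.
    """

    def __init__(self, depth=4):
        self.depth = depth   # assumed: number of top RAS entries forming a signature
        self.ras = []        # return address stack; the top is the last element
        self.table = {}      # signature -> list of blocks that missed in that context

    def signature(self):
        # Assumed signature: the top `depth` return addresses, kept as a tuple.
        # Real hardware would presumably hash these into a few bits.
        return tuple(self.ras[-self.depth:])

    def call(self, return_addr):
        """Model a call: push the return address, then look up prefetch candidates."""
        self.ras.append(return_addr)
        return list(self.table.get(self.signature(), []))

    def ret(self):
        """Model a return: pop the RAS, then look up prefetch candidates."""
        if self.ras:
            self.ras.pop()
        return list(self.table.get(self.signature(), []))

    def record_miss(self, block):
        """Remember that `block` missed in the L1 under the current context."""
        misses = self.table.setdefault(self.signature(), [])
        if block not in misses:
            misses.append(block)


# First visit to a call site: nothing is known, so two misses are recorded.
pf = ToyRasPrefetcher()
first_visit = pf.call(0x400)
pf.record_miss(0x1000)
pf.record_miss(0x1040)
pf.ret()

# Re-entering the same context returns the recorded blocks as prefetch candidates.
second_visit = pf.call(0x400)
pf.ret()

# A different call site forms a different signature and has no history yet.
other_site = pf.call(0x500)
```

In this toy model the second visit to the same call site returns the two recorded blocks, while the different call site returns nothing. This mirrors the abstract's first observation that program context, as captured in the call stack, correlates with which instruction blocks miss.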

Original language: English (US)
Title of host publication: MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Pages: 260-271
Number of pages: 12
DOI: https://doi.org/10.1145/2540708.2540731
State: Published - Dec 1 2013
Event: 46th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2013 - Davis, CA, United States
Duration: Dec 7 2013 to Dec 11 2013

Publication series

Name: MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Conference

Conference: 46th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2013
Country: United States
City: Davis, CA
Period: 12/7/13 to 12/11/13

Fingerprint

Servers
Flow control

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Kolli, A., Saidi, A., & Wenisch, T. F. (2013). RDIP: Return-address-stack directed instruction prefetching. In MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (pp. 260-271). (MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture). https://doi.org/10.1145/2540708.2540731
Kolli, Aasheesh; Saidi, Ali; Wenisch, Thomas F. / RDIP: Return-address-stack directed instruction prefetching. MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. 2013. pp. 260-271 (MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture).
@inproceedings{c310c7842c1a486e874a86d129ce0ca5,
title = "RDIP: Return-address-stack directed instruction prefetching",
abstract = "L1 instruction fetch misses remain a critical performance bottleneck, accounting for up to 40{\%} slowdowns in server applications. Whereas instruction footprints typically fit within last-level caches, they overwhelm L1 caches, whose capacity is limited by latency constraints. Past work has shown that server application instruction miss sequences are highly repetitive. By recording, indexing, and prefetching according to these sequences, nearly all L1 instruction misses can be eliminated. However, existing schemes require impractical storage and considerable complexity to correct for minor control-flow variations that disrupt sequences. In this work, we simplify and reduce the energy requirements of accurate instruction prefetching via two observations: (1) program context as captured in the call stack correlates strongly with L1 instruction misses, and (2) the return address stack (RAS), already present in all high performance processors, succinctly summarizes program context. We propose RAS-Directed Instruction Prefetching (RDIP), which associates prefetch operations with signatures formed from the contents of the RAS. RDIP achieves 70{\%} of the potential speedup of an ideal L1 cache, outperforms a prefetcherless baseline by 11.5{\%} and reduces energy and complexity relative to sequence-based prefetching. RDIP's performance is within 2{\%} of the state-of-the-art Proactive Instruction Fetch, with nearly 3X reduction in storage and 1.9X reduction in energy overheads.",
author = "Aasheesh Kolli and Ali Saidi and Wenisch, {Thomas F.}",
year = "2013",
month = "12",
day = "1",
doi = "10.1145/2540708.2540731",
language = "English (US)",
isbn = "9781450326384",
series = "MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture",
pages = "260--271",
booktitle = "MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture",

}

Kolli, A, Saidi, A & Wenisch, TF 2013, RDIP: Return-address-stack directed instruction prefetching. in MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 260-271, 46th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2013, Davis, CA, United States, 12/7/13. https://doi.org/10.1145/2540708.2540731

RDIP: Return-address-stack directed instruction prefetching. / Kolli, Aasheesh; Saidi, Ali; Wenisch, Thomas F.

MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. 2013. p. 260-271 (MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture).

TY - GEN

T1 - RDIP

T2 - Return-address-stack directed instruction prefetching

AU - Kolli, Aasheesh

AU - Saidi, Ali

AU - Wenisch, Thomas F.

PY - 2013/12/1

Y1 - 2013/12/1

AB - L1 instruction fetch misses remain a critical performance bottleneck, accounting for up to 40% slowdowns in server applications. Whereas instruction footprints typically fit within last-level caches, they overwhelm L1 caches, whose capacity is limited by latency constraints. Past work has shown that server application instruction miss sequences are highly repetitive. By recording, indexing, and prefetching according to these sequences, nearly all L1 instruction misses can be eliminated. However, existing schemes require impractical storage and considerable complexity to correct for minor control-flow variations that disrupt sequences. In this work, we simplify and reduce the energy requirements of accurate instruction prefetching via two observations: (1) program context as captured in the call stack correlates strongly with L1 instruction misses, and (2) the return address stack (RAS), already present in all high performance processors, succinctly summarizes program context. We propose RAS-Directed Instruction Prefetching (RDIP), which associates prefetch operations with signatures formed from the contents of the RAS. RDIP achieves 70% of the potential speedup of an ideal L1 cache, outperforms a prefetcherless baseline by 11.5% and reduces energy and complexity relative to sequence-based prefetching. RDIP's performance is within 2% of the state-of-the-art Proactive Instruction Fetch, with nearly 3X reduction in storage and 1.9X reduction in energy overheads.

UR - http://www.scopus.com/inward/record.url?scp=84892524803&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84892524803&partnerID=8YFLogxK

U2 - 10.1145/2540708.2540731

DO - 10.1145/2540708.2540731

M3 - Conference contribution

SN - 9781450326384

T3 - MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

SP - 260

EP - 271

BT - MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

ER -

Kolli A, Saidi A, Wenisch TF. RDIP: Return-address-stack directed instruction prefetching. In MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. 2013. p. 260-271. (MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture). https://doi.org/10.1145/2540708.2540731