Reshaping cache misses to improve row-buffer locality in multicore systems

Wei Ding, Jun Liu, Mahmut Kandemir, Mary Jane Irwin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Optimizing cache locality has been important since the emergence of caches, and numerous cache locality optimization schemes have been published in the compiler literature. However, in modern architectures, cache locality is not the only factor that determines memory system performance. Many emerging multicores employ banked memory systems, and each bank has an attached row-buffer that holds the most recently accessed memory row (page). A last-level cache miss that also misses in the row-buffer can experience much higher latency than a cache miss that hits in the row-buffer. Consequently, optimizing for row-buffer locality can be as important as optimizing for cache locality. Targeting emerging multicores and multithreaded applications, this paper presents a compiler-directed row-buffer locality optimization strategy. This strategy modifies the memory layout of data to increase the number of row-buffer hits without increasing the number of misses in the on-chip cache hierarchy. We implemented our proposed optimization strategy in an open-source compiler and tested its effectiveness in improving row-buffer performance using a set of multithreaded applications. Our results indicate that the proposed approach improves the average data access latency by about 29%, which translates, on average, to about a 15% improvement in execution time.
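
The layout idea in the abstract can be illustrated with a toy model. The sketch below is not the paper's algorithm or its compiler pass; it is a minimal single-stream simulation under assumed parameters (64-byte cache lines, 4 KB DRAM rows, 4 banks, an open-row policy, row-interleaved bank mapping, and hypothetical helper names such as `row_major` and `column_major`). It replays one fixed sequence of last-level cache misses under two layouts that only permute cache-line-sized blocks, so the miss count is identical while the row-buffer hit count changes. Inter-thread interference, which the paper targets, is ignored here.

```python
# Toy model of row-buffer locality. All parameters are assumptions chosen
# for illustration; none of them are taken from the paper.
LINE_BYTES = 64        # cache-line size
ROW_BYTES = 4096       # DRAM row (page) size held by one row-buffer
NUM_BANKS = 4          # consecutive DRAM rows are interleaved across banks
N_ROWS = 64            # logical array rows
BLOCKS_PER_ROW = 64    # cache-line-sized blocks per logical array row


def row_major(i, jb):
    """Address of block (i, jb) when each array row is stored contiguously."""
    return (i * BLOCKS_PER_ROW + jb) * LINE_BYTES


def column_major(i, jb):
    """Address of block (i, jb) when each block-column is stored contiguously."""
    return (jb * N_ROWS + i) * LINE_BYTES


def miss_stream():
    """Column-wise traversal; every block is assumed to miss in the cache once."""
    for jb in range(BLOCKS_PER_ROW):
        for i in range(N_ROWS):
            yield i, jb


def row_buffer_hits(layout):
    """Replay the miss stream and count row-buffer hits and misses."""
    open_row = {}      # bank -> DRAM row currently held in its row-buffer
    hits = misses = 0
    for i, jb in miss_stream():
        dram_row_index = layout(i, jb) // ROW_BYTES
        bank = dram_row_index % NUM_BANKS
        row = dram_row_index // NUM_BANKS
        if open_row.get(bank) == row:
            hits += 1
        else:
            misses += 1
            open_row[bank] = row   # open-row policy: the new row replaces the old
    return hits, misses


before = row_buffer_hits(row_major)
after = row_buffer_hits(column_major)
print("row-major layout    (hits, misses):", before)
print("column-major layout (hits, misses):", after)
```

Both runs process the same 4096 cache misses. In this toy setting the row-major layout sends consecutive misses to different DRAM rows, while the column-major layout keeps 64 consecutive misses inside one DRAM row, so only the first miss in each row has to open it.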

Original language: English (US)
Title of host publication: PACT 2013 - Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques
Pages: 235-244
Number of pages: 10
DOIs: https://doi.org/10.1109/PACT.2013.6618820
State: Published - Nov 18 2013
Event: 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT 2013 - Edinburgh, United Kingdom
Duration: Sep 7 2013 → Sep 11 2013

Publication series

Name: Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
ISSN (Print): 1089-795X

Other

Other: 22nd International Conference on Parallel Architectures and Compilation Techniques, PACT 2013
Country: United Kingdom
City: Edinburgh
Period: 9/7/13 → 9/11/13

Fingerprint

Locality
Cache
Buffer
Data storage equipment
Compiler
Hits
Latency
Optimization
Computer systems
Open Source
Execution Time
Layout
System Performance
Chip
Strategy

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

Cite this

Ding, W., Liu, J., Kandemir, M., & Irwin, M. J. (2013). Reshaping cache misses to improve row-buffer locality in multicore systems. In PACT 2013 - Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques (pp. 235-244). [6618820] (Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT). https://doi.org/10.1109/PACT.2013.6618820
@inproceedings{8163abe7db8246cf9663c3eda1b38207,
title = "Reshaping cache misses to improve row-buffer locality in multicore systems",
author = "Wei Ding and Jun Liu and Mahmut Kandemir and Irwin, {Mary Jane}",
year = "2013",
month = "11",
day = "18",
doi = "10.1109/PACT.2013.6618820",
language = "English (US)",
isbn = "9781479910212",
series = "Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT",
pages = "235--244",
booktitle = "PACT 2013 - Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques",

}

TY - GEN

T1 - Reshaping cache misses to improve row-buffer locality in multicore systems

AU - Ding, Wei

AU - Liu, Jun

AU - Kandemir, Mahmut

AU - Irwin, Mary Jane

PY - 2013/11/18

Y1 - 2013/11/18

UR - http://www.scopus.com/inward/record.url?scp=84887455704&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84887455704&partnerID=8YFLogxK

U2 - 10.1109/PACT.2013.6618820

DO - 10.1109/PACT.2013.6618820

M3 - Conference contribution

AN - SCOPUS:84887455704

SN - 9781479910212

T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT

SP - 235

EP - 244

BT - PACT 2013 - Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques

ER -
