Co-optimizing memory-level parallelism and cache-level parallelism

Xulong Tang, Mustafa Karakoy, Mahmut Kandemir, Meenakshi Arunachalam

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Minimizing cache misses has been the traditional goal in optimizing cache performance using compiler-based techniques. However, continuously increasing dataset sizes, combined with the large numbers of cache banks and memory banks connected by on-chip networks in emerging manycores/accelerators, make cache hit/miss latency optimization as important as cache miss rate minimization. In this paper, we propose compiler support that optimizes both the latencies of last-level cache (LLC) hits and the latencies of LLC misses. Our approach tries to achieve this goal by improving the parallelism exhibited by LLC hits and LLC misses. More specifically, it tries to maximize both cache-level parallelism (CLP) and memory-level parallelism (MLP). This paper presents different incarnations of our approach and evaluates them using a set of 12 multithreaded applications. Our results indicate that (i) optimizing MLP first and CLP later brings, on average, 11.31% performance improvement over an approach that already minimizes the number of LLC misses, and (ii) optimizing CLP first and MLP later brings 9.43% performance improvement. In comparison, balancing MLP and CLP brings 17.32% performance improvement on average.
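To make the MLP idea concrete, below is a minimal, hypothetical C sketch, not taken from the paper: pending cache-line requests are bucketed by the memory bank they map to and then issued round-robin, so that back-to-back misses land on distinct banks and their latencies can overlap. The bank count, the line-to-bank mapping in bank_of, and the reorder_for_mlp helper are all assumptions made for illustration.

#include <stdio.h>
#include <stddef.h>

#define NUM_BANKS 4   /* assumed bank count, for illustration only */
#define MAX_REQS  64  /* sketch-sized per-bank buffers */

/* Assumed interleaved mapping from a cache-line address to its bank. */
static int bank_of(unsigned long line_addr) {
    return (int)(line_addr % NUM_BANKS);
}

/* Bucket pending cache-line requests by bank, then emit them round-robin,
   so consecutive misses target distinct banks and can be serviced in
   parallel (higher MLP). */
static void reorder_for_mlp(const unsigned long *req, size_t n,
                            unsigned long *out) {
    unsigned long queue[NUM_BANKS][MAX_REQS];
    size_t qlen[NUM_BANKS] = {0};
    size_t emitted = 0, round = 0;

    for (size_t i = 0; i < n; i++) {
        int b = bank_of(req[i]);
        queue[b][qlen[b]++] = req[i];
    }
    while (emitted < n) {
        for (int b = 0; b < NUM_BANKS; b++)
            if (round < qlen[b])
                out[emitted++] = queue[b][round];
        round++;
    }
}

int main(void) {
    /* Eight pending lines, mostly clustered on banks 0 and 1: with three
       back-to-back misses to bank 0, the original order serializes. */
    unsigned long req[] = {0, 4, 8, 1, 5, 9, 2, 3};
    unsigned long out[8];

    reorder_for_mlp(req, 8, out);
    for (size_t i = 0; i < 8; i++)
        printf("line %lu -> bank %d\n", out[i], bank_of(out[i]));
    return 0;
}

On this sample the requests come out as lines 0, 1, 2, 3, 4, 5, 8, 9, alternating banks wherever the per-bank queues allow. The same scheduling idea, applied to LLC banks instead of memory banks, is the CLP side of the co-optimization.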

Original language: English (US)
Title of host publication: PLDI 2019 - Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation
Editors: Kathryn S. McKinley, Kathleen Fisher
Publisher: Association for Computing Machinery
Pages: 935-949
Number of pages: 15
ISBN (Electronic): 9781450367127
DOI: https://doi.org/10.1145/3314221.3314599
State: Published - Jun 8, 2019
Event: 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019 - Phoenix, United States
Duration: Jun 22, 2019 → Jun 26, 2019

Publication series

Name: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

Conference

Conference: 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019
Country: United States
City: Phoenix
Period: 6/22/19 → 6/26/19

Fingerprint

  • Data storage equipment
  • Particle accelerators

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Tang, X., Karakoy, M., Kandemir, M., & Arunachalam, M. (2019). Co-optimizing memory-level parallelism and cache-level parallelism. In K. S. McKinley, & K. Fisher (Eds.), PLDI 2019 - Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (pp. 935-949). (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)). Association for Computing Machinery. https://doi.org/10.1145/3314221.3314599