TY - GEN
T1 - Distance-in-time versus distance-in-space
AU - Kandemir, Mahmut Taylan
AU - Tang, Xulong
AU - Zhao, Hui
AU - Ryoo, Jihyun
AU - Karakoy, Mustafa
N1 - Funding Information:
The authors sincerely thank Dr. Milind Kulkarni for shepherding the paper. The authors would also like to thank the anonymous PLDI 2021 reviewers for their constructive feedback and suggestions. This work is supported in part by NSF grants #1908793, #1629915, #1629129, #1763681, #2028929, #2008398, #2011146, and #1931531, as well as a startup funding from the University of Pittsburgh.
Publisher Copyright:
© 2021 ACM.
PY - 2021/6/18
Y1 - 2021/6/18
N2 - Cache behavior is one of the major factors that influence the performance of applications. Most of the existing compiler techniques that target cache memories focus exclusively on reducing data reuse distances in time (DIT). However, current manycore systems employ distributed on-chip caches that are connected using an on-chip network. As a result, a reused data element/block needs to travel over this on-chip network, and the distance to be traveled (the reuse distance in space, DIS) can be as influential in dictating application performance as reuse DIT. This paper represents the first attempt at defining a compiler framework that accommodates both DIT and DIS. Specifically, it first classifies data reuses into four groups: G1: (low DIT, low DIS), G2: (high DIT, low DIS), G3: (low DIT, high DIS), and G4: (high DIT, high DIS). Then, observing that reuses in G1 represent the ideal case and that little can be done for computations in G4, it proposes a "reuse transfer" strategy that transfers select reuses between G2 and G3, eventually transforming each reuse into either G1 or G4. Finally, it evaluates the proposed strategy using a set of 10 multithreaded applications. The collected results reveal that the proposed strategy reduces the parallel execution times of the tested applications by between 19.3% and 33.3%.
AB - Cache behavior is one of the major factors that influence the performance of applications. Most of the existing compiler techniques that target cache memories focus exclusively on reducing data reuse distances in time (DIT). However, current manycore systems employ distributed on-chip caches that are connected using an on-chip network. As a result, a reused data element/block needs to travel over this on-chip network, and the distance to be traveled (the reuse distance in space, DIS) can be as influential in dictating application performance as reuse DIT. This paper represents the first attempt at defining a compiler framework that accommodates both DIT and DIS. Specifically, it first classifies data reuses into four groups: G1: (low DIT, low DIS), G2: (high DIT, low DIS), G3: (low DIT, high DIS), and G4: (high DIT, high DIS). Then, observing that reuses in G1 represent the ideal case and that little can be done for computations in G4, it proposes a "reuse transfer" strategy that transfers select reuses between G2 and G3, eventually transforming each reuse into either G1 or G4. Finally, it evaluates the proposed strategy using a set of 10 multithreaded applications. The collected results reveal that the proposed strategy reduces the parallel execution times of the tested applications by between 19.3% and 33.3%.
UR - http://www.scopus.com/inward/record.url?scp=85108915550&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85108915550&partnerID=8YFLogxK
U2 - 10.1145/3453483.3454069
DO - 10.1145/3453483.3454069
M3 - Conference contribution
AN - SCOPUS:85108915550
T3 - Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
SP - 665
EP - 680
BT - PLDI 2021 - Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation
A2 - Freund, Stephen N.
A2 - Yahav, Eran
PB - Association for Computing Machinery
T2 - 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2021
Y2 - 20 June 2021 through 25 June 2021
ER -