Studying inter-core data reuse in multicores

Yuanrui Zhang, Mahmut Kandemir, Taylan Yemliha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

Most existing research on emerging multicore machines focuses on parallelism extraction and architecture-level optimizations. While these optimizations are critical, complementary approaches such as data locality enhancement can also bring significant benefits. Most previous data locality optimization techniques have been proposed and evaluated in the context of single-core architectures. While one can expect these optimizations to be useful for multicore machines as well, multicores present further opportunities because of the shared on-chip caches that most of them accommodate. To optimize data locality for multicore machines, however, the first step is to understand the data reuse characteristics of multithreaded applications and the potential benefits that shared caches can bring. Motivated by these observations, we make the following contributions in this paper. First, we define inter-core data reuse and quantify it on multicores using a set of ten multithreaded application programs. Second, we show that neither the on-chip cache hierarchies of current multicore architectures nor state-of-the-art (single-core-centric) code/data optimizations exploit the inter-core data reuse available in multithreaded applications. Third, we demonstrate that exploiting all available inter-core reuse could boost overall application performance by around 21.3% on average, indicating that there is significant scope for optimization. However, we also show that optimizing aggressively for inter-core reuse without considering its impact on intra-core reuse can actually perform worse than optimizing for intra-core reuse alone. Finally, we present a novel, compiler-based data locality optimization strategy for multicores that carefully balances inter-core and intra-core reuse optimizations to maximize the benefits that can be extracted from shared caches. Our experiments with this strategy reveal that it is very effective in optimizing data locality in multicores.
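
The abstract does not reproduce the paper's formal definition of inter-core data reuse, so the following is only an illustrative sketch under an assumed working definition: a reuse of a data block counts as intra-core when the previous access to that block came from the same core, and as inter-core when it came from a different core. The function name `classify_reuse`, the trace format, and the sample trace are hypothetical and are not taken from the paper.

```python
from collections import Counter


def classify_reuse(trace):
    """Classify each access in a list of (core_id, block_address) pairs.

    Assumed working definition (not the paper's): a reuse is intra-core if
    the previous access to the same block came from the same core, and
    inter-core if it came from a different core.
    """
    last_core = {}      # block address -> core that accessed it most recently
    counts = Counter()
    for core, block in trace:
        if block not in last_core:
            counts["first_touch"] += 1   # no earlier access, so not a reuse
        elif last_core[block] == core:
            counts["intra_core"] += 1
        else:
            counts["inter_core"] += 1
        last_core[block] = core
    return counts


# Hypothetical trace: two cores touching blocks 0x10 and 0x20.
trace = [(0, 0x10), (0, 0x10), (1, 0x10), (1, 0x20), (0, 0x20), (0, 0x20)]
counts = classify_reuse(trace)
reuses = counts["intra_core"] + counts["inter_core"]
inter_share = counts["inter_core"] / reuses   # fraction of reuses crossing cores
print(dict(counts), inter_share)
```

Tracking only the most recent accessor per block keeps the sketch short; a study of shared-cache benefits would also need reuse distances and cache capacities, which this toy example deliberately leaves out.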

Original language: English (US)
Title of host publication: SIGMETRICS'11 - Proceedings of the 2011 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems
Pages: 25-36
Number of pages: 12
Edition: 1 SPEC. ISSUE
DOIs: https://doi.org/10.1145/2007116.2007120
State: Published - Jul 15 2011
Event: 2011 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS'11 - San Jose, CA, United States
Duration: Jun 7 2011 to Jun 11 2011

Publication series

Name: Performance Evaluation Review
Number: 1 SPEC. ISSUE
Volume: 39
ISSN (Print): 0163-5999

Other

Other: 2011 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS'11
Country: United States
City: San Jose, CA
Period: 6/7/11 to 6/11/11

Fingerprint

Application programs
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Zhang, Y., Kandemir, M., & Yemliha, T. (2011). Studying inter-core data reuse in multicores. In SIGMETRICS'11 - Proceedings of the 2011 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (1 SPEC. ISSUE ed., pp. 25-36). (Performance Evaluation Review; Vol. 39, No. 1 SPEC. ISSUE). https://doi.org/10.1145/2007116.2007120
Zhang, Yuanrui ; Kandemir, Mahmut ; Yemliha, Taylan. / Studying inter-core data reuse in multicores. SIGMETRICS'11 - Proceedings of the 2011 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. 1 SPEC. ISSUE. ed. 2011. pp. 25-36 (Performance Evaluation Review; 1 SPEC. ISSUE).
@inproceedings{92c6aae012b642aaa95fe8eaf1dfe3db,
title = "Studying inter-core data reuse in multicores",
author = "Yuanrui Zhang and Mahmut Kandemir and Taylan Yemliha",
year = "2011",
month = "7",
day = "15",
doi = "10.1145/2007116.2007120",
language = "English (US)",
isbn = "9781450302623",
series = "Performance Evaluation Review",
number = "1 SPEC. ISSUE",
pages = "25--36",
booktitle = "SIGMETRICS'11 - Proceedings of the 2011 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems",
edition = "1 SPEC. ISSUE",

}

Zhang, Y, Kandemir, M & Yemliha, T 2011, Studying inter-core data reuse in multicores. in SIGMETRICS'11 - Proceedings of the 2011 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. 1 SPEC. ISSUE edn, Performance Evaluation Review, no. 1 SPEC. ISSUE, vol. 39, pp. 25-36, 2011 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS'11, San Jose, CA, United States, 6/7/11. https://doi.org/10.1145/2007116.2007120

TY - GEN

T1 - Studying inter-core data reuse in multicores

AU - Zhang, Yuanrui

AU - Kandemir, Mahmut

AU - Yemliha, Taylan

PY - 2011/7/15

Y1 - 2011/7/15

UR - http://www.scopus.com/inward/record.url?scp=79960198941&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79960198941&partnerID=8YFLogxK

U2 - 10.1145/2007116.2007120

DO - 10.1145/2007116.2007120

M3 - Conference contribution

SN - 9781450302623

T3 - Performance Evaluation Review

SP - 25

EP - 36

BT - SIGMETRICS'11 - Proceedings of the 2011 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems

ER -

Zhang Y, Kandemir M, Yemliha T. Studying inter-core data reuse in multicores. In SIGMETRICS'11 - Proceedings of the 2011 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. 1 SPEC. ISSUE ed. 2011. p. 25-36. (Performance Evaluation Review; 1 SPEC. ISSUE). https://doi.org/10.1145/2007116.2007120