Memory row reuse distance and its role in optimizing application performance

Mahmut Kandemir, Hui Zhao, Xulong Tang, Mustafa Karakoy

Research output: Contribution to journal › Conference article

4 Citations (Scopus)

Abstract

Continuously increasing dataset sizes of large-scale applications overwhelm on-chip cache capacities and make the performance of last-level caches (LLC) increasingly important. That is, in addition to maximizing LLC hit rates, it is becoming equally important to reduce LLC miss latencies. One of the critical factors that influence LLC miss latencies is row-buffer locality (i.e., the fraction of LLC misses that hit in the large buffer attached to a memory bank). While there has been a plethora of recent work on optimizing row-buffer performance, to our knowledge, there is no study that quantifies the full potential of row-buffer locality and the impact of maximizing it on application performance. Focusing on multithreaded applications, the first contribution of this paper is the definition of a new metric called (memory) row reuse distance (RRD). We show that, while intra-core RRDs are relatively small (increasing the chances for row-buffer hits), inter-core RRDs are quite large (increasing the chances for row-buffer misses). Motivated by this, we propose two schemes that measure the maximum potential benefits that could be obtained from minimizing RRDs, to the extent allowed by program dependencies. Specifically, one of our schemes (Scheme-I) targets only intra-core RRDs, whereas the other one (Scheme-II) aims at reducing both intra-core RRDs and inter-core RRDs. Our experimental evaluations demonstrate that (i) Scheme-I reduces intra-core RRDs but increases inter-core RRDs; (ii) Scheme-II reduces inter-core RRDs significantly while achieving a similar behavior to Scheme-I as far as intra-core RRDs are concerned; (iii) Scheme-I and Scheme-II improve the execution times of our applications by 17% and 21%, respectively, on average; and (iv) both our schemes deliver consistently good results under different memory request scheduling policies.
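
To make the RRD metric concrete, below is a minimal sketch (in Python, not the authors' measurement tool) of how intra-core and inter-core RRDs could be tallied from a trace of LLC misses. The trace format and the counting rule used here, namely the number of intervening requests to the same bank between two successive accesses to the same row, are assumptions made for illustration rather than the paper's exact definition.

from collections import defaultdict

def measure_rrds(trace):
    """trace: iterable of (core_id, bank, row) tuples in memory-request order (assumed format)."""
    last_touch = {}                   # (bank, row) -> (index of last access within that bank, core_id)
    bank_counters = defaultdict(int)  # running per-bank request count
    intra_rrds, inter_rrds = [], []

    for core, bank, row in trace:
        idx = bank_counters[bank]
        key = (bank, row)
        if key in last_touch:
            prev_idx, prev_core = last_touch[key]
            rrd = idx - prev_idx - 1  # requests to this bank since the row was last touched
            (intra_rrds if core == prev_core else inter_rrds).append(rrd)
        last_touch[key] = (idx, core)
        bank_counters[bank] += 1

    return intra_rrds, inter_rrds

# Tiny example: core 0 reuses row 5 back to back (intra-core RRD = 0), while core 1's
# later touch of row 5 is separated by two other requests to the bank (inter-core RRD = 2).
trace = [(0, 0, 5), (0, 0, 5), (1, 0, 7), (0, 0, 9), (1, 0, 5)]
print(measure_rrds(trace))  # ([0], [2])

Small intra-core distances of this kind are what make row-buffer hits likely, whereas the large inter-core distances reported in the paper are what the proposed schemes try to shrink.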

Original language: English (US)
Pages (from-to): 137-149
Number of pages: 13
Journal: Performance Evaluation Review
Volume: 43
Issue number: 1
ISSN: 0163-5999
Publisher: Association for Computing Machinery (ACM)
DOI: 10.1145/2796314.2745867
State: Published - Jun 24, 2015
Event: ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2015 - Portland, United States
Duration: Jun 15, 2015 - Jun 19, 2015

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Kandemir, Mahmut; Zhao, Hui; Tang, Xulong; Karakoy, Mustafa. Memory row reuse distance and its role in optimizing application performance. In: Performance Evaluation Review. 2015; Vol. 43, No. 1, pp. 137-149.