Optimizing off-chip accesses in multicores

Wei Ding, Xulong Tang, Mahmut Kandemir, Yuanrui Zhang, Emre Kultursay

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

In a network-on-chip (NoC) based manycore architecture, an off-chip data access (main memory access) needs to travel through the on-chip network, spending a considerable amount of time within the chip (in addition to the memory access latency). Moreover, it contends with on-chip (cache) accesses, as both use the same NoC resources. In this paper, focusing on data-parallel, multithreaded applications, we propose a compiler-based off-chip data access localization strategy, which places data elements in the memory space such that an off-chip access traverses a minimum number of links (hops) to reach the memory controller that handles this access. This brings three main benefits. First, the network latency of off-chip accesses is reduced; second, the network latency of on-chip accesses is reduced; and finally, the memory latency of off-chip accesses improves, due to reduced queue latencies. We present an experimental evaluation of our optimization strategy using a set of 13 multithreaded application programs under both private and shared last-level caches. The results collected emphasize the importance of optimizing off-chip data accesses.
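The hop-count argument in the abstract can be illustrated with a small sketch. It is not the paper's compiler algorithm: the 4x4 mesh, the corner placement of memory controllers, dimension-ordered (XY) routing, and all function names below are assumptions made only for this example. It compares the average number of links an off-chip request crosses when each core's data is served by its nearest controller against a layout that spreads each core's data evenly over all controllers.

```python
# Illustrative sketch only. The mesh size, controller positions, and XY routing
# are assumptions for this example, not details taken from the paper.

MESH = 4                                          # hypothetical 4x4 mesh of cores
CONTROLLERS = [(0, 0), (0, 3), (3, 0), (3, 3)]    # hypothetical corner controllers


def hops(src, dst):
    """Links crossed between two mesh nodes under XY routing (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def nearest_controller(core, controllers=CONTROLLERS):
    """Controller with the fewest hops from `core`; ties go to the first listed."""
    return min(controllers, key=lambda mc: hops(core, mc))


def localized_avg(mesh=MESH, controllers=CONTROLLERS):
    """Average hops when every core's data sits behind its nearest controller."""
    cores = [(r, c) for r in range(mesh) for c in range(mesh)]
    total = sum(hops(core, nearest_controller(core, controllers)) for core in cores)
    return total / len(cores)


def interleaved_avg(mesh=MESH, controllers=CONTROLLERS):
    """Average hops when every core's data is spread evenly over all controllers."""
    cores = [(r, c) for r in range(mesh) for c in range(mesh)]
    total = sum(hops(core, mc) for core in cores for mc in controllers)
    return total / (len(cores) * len(controllers))


if __name__ == "__main__":
    print("localized  :", localized_avg())
    print("interleaved:", interleaved_avg())
```

Under these assumptions the localized layout needs fewer hops per off-chip access on average, which is the effect the abstract credits for lower network latency and less contention with on-chip traffic.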

Original language: English (US)
Title of host publication: PLDI 2015 - Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation
Editors: Steve Blackburn, David Grove
Publisher: Association for Computing Machinery
Pages: 131-142
Number of pages: 12
ISBN (Electronic): 9781450334686
DOI: https://doi.org/10.1145/2737924.2737989
State: Published - Jun 3 2015
Event: 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2015 - Portland, United States
Duration: Jun 13 2015 to Jun 17 2015

Publication series

Name: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
Volume: 2015-June

Other

Other: 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2015
Country: United States
City: Portland
Period: 6/13/15 to 6/17/15

Fingerprint

Data storage equipment
Application programs
Controllers
Network-on-chip

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Ding, W., Tang, X., Kandemir, M., Zhang, Y., & Kultursay, E. (2015). Optimizing off-chip accesses in multicores. In S. Blackburn, & D. Grove (Eds.), PLDI 2015 - Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (pp. 131-142). (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI); Vol. 2015-June). Association for Computing Machinery. https://doi.org/10.1145/2737924.2737989
Ding, Wei ; Tang, Xulong ; Kandemir, Mahmut ; Zhang, Yuanrui ; Kultursay, Emre. / Optimizing off-chip accesses in multicores. PLDI 2015 - Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. editor / Steve Blackburn ; David Grove. Association for Computing Machinery, 2015. pp. 131-142 (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)).
@inproceedings{3f97bfe457f34cb294ddaeb459dac7c3,
title = "Optimizing off-chip accesses in multicores",
abstract = "In a network-on-chip (NoC) based manycore architecture, an off-chip data access (main memory access) needs to travel through the on-chip network, spending a considerable amount of time within the chip (in addition to the memory access latency). Moreover, it contends with on-chip (cache) accesses, as both use the same NoC resources. In this paper, focusing on data-parallel, multithreaded applications, we propose a compiler-based off-chip data access localization strategy, which places data elements in the memory space such that an off-chip access traverses a minimum number of links (hops) to reach the memory controller that handles this access. This brings three main benefits. First, the network latency of off-chip accesses is reduced; second, the network latency of on-chip accesses is reduced; and finally, the memory latency of off-chip accesses improves, due to reduced queue latencies. We present an experimental evaluation of our optimization strategy using a set of 13 multithreaded application programs under both private and shared last-level caches. The results collected emphasize the importance of optimizing off-chip data accesses.",
author = "Wei Ding and Xulong Tang and Mahmut Kandemir and Yuanrui Zhang and Emre Kultursay",
year = "2015",
month = "6",
day = "3",
doi = "10.1145/2737924.2737989",
language = "English (US)",
series = "Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)",
publisher = "Association for Computing Machinery",
pages = "131--142",
editor = "Steve Blackburn and David Grove",
booktitle = "PLDI 2015 - Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation",

}

Ding, W, Tang, X, Kandemir, M, Zhang, Y & Kultursay, E 2015, Optimizing off-chip accesses in multicores. in S Blackburn & D Grove (eds), PLDI 2015 - Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), vol. 2015-June, Association for Computing Machinery, pp. 131-142, 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2015, Portland, United States, 6/13/15. https://doi.org/10.1145/2737924.2737989

Optimizing off-chip accesses in multicores. / Ding, Wei; Tang, Xulong; Kandemir, Mahmut; Zhang, Yuanrui; Kultursay, Emre.

PLDI 2015 - Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. ed. / Steve Blackburn; David Grove. Association for Computing Machinery, 2015. p. 131-142 (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI); Vol. 2015-June).


TY - GEN

T1 - Optimizing off-chip accesses in multicores

AU - Ding, Wei

AU - Tang, Xulong

AU - Kandemir, Mahmut

AU - Zhang, Yuanrui

AU - Kultursay, Emre

PY - 2015/6/3

Y1 - 2015/6/3

AB - In a network-on-chip (NoC) based manycore architecture, an off-chip data access (main memory access) needs to travel through the on-chip network, spending a considerable amount of time within the chip (in addition to the memory access latency). Moreover, it contends with on-chip (cache) accesses, as both use the same NoC resources. In this paper, focusing on data-parallel, multithreaded applications, we propose a compiler-based off-chip data access localization strategy, which places data elements in the memory space such that an off-chip access traverses a minimum number of links (hops) to reach the memory controller that handles this access. This brings three main benefits. First, the network latency of off-chip accesses is reduced; second, the network latency of on-chip accesses is reduced; and finally, the memory latency of off-chip accesses improves, due to reduced queue latencies. We present an experimental evaluation of our optimization strategy using a set of 13 multithreaded application programs under both private and shared last-level caches. The results collected emphasize the importance of optimizing off-chip data accesses.

UR - http://www.scopus.com/inward/record.url?scp=84951827362&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84951827362&partnerID=8YFLogxK

U2 - 10.1145/2737924.2737989

DO - 10.1145/2737924.2737989

M3 - Conference contribution

AN - SCOPUS:84951827362

T3 - Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)

SP - 131

EP - 142

BT - PLDI 2015 - Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation

A2 - Blackburn, Steve

A2 - Grove, David

PB - Association for Computing Machinery

ER -

Ding W, Tang X, Kandemir M, Zhang Y, Kultursay E. Optimizing off-chip accesses in multicores. In Blackburn S, Grove D, editors, PLDI 2015 - Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. Association for Computing Machinery. 2015. p. 131-142. (Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)). https://doi.org/10.1145/2737924.2737989