On-chip cache hierarchy-aware tile scheduling for multicore machines

Jun Liu, Yuanrui Zhang, Wei Ding, Mahmut Kandemir

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Citations (Scopus)

Abstract

Iteration space tiling and scheduling is an important technique for optimizing loops that constitute a large fraction of execution times in computation kernels of both scientific codes and embedded applications. While tiling has been studied extensively in the context of both uniprocessor and multiprocessor platforms, prior research has paid less attention to tile scheduling, especially when targeting multicore machines with deep on-chip cache hierarchies. In this paper, we propose a cache hierarchy-aware tile scheduling algorithm for multicore machines, with the purpose of maximizing both horizontal and vertical data reuses in on-chip caches, and balancing the workloads across different cores. This scheduling algorithm is one of the key components in a source-to-source translation tool that we developed for automatic loop parallelization and multithreaded code generation from sequential codes. To the best of our knowledge, this is the first effort that develops a fully-automated tile scheduling strategy customized for on-chip cache topologies of multicore machines. The experimental results collected by executing twelve application programs on three commercial Intel machines (Nehalem, Dunnington, and Harpertown) reveal that our cache-aware tile scheduling brings about 27.9% reduction in cache misses, and on average, 13.5% improvement in execution times over an alternate method tested.
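As a rough illustration of the idea summarized in the abstract, the sketch below tiles a two-dimensional iteration space and maps tiles to cores using a description of which cores share a cache. It is a deliberately simplified heuristic written for this page, not the algorithm from the paper: the function names `make_tiles` and `schedule_tiles`, the `cache_groups` topology format, and the assumption that neighbouring tiles touch overlapping data are all hypothetical. Each group of cores sharing a cache receives a contiguous run of tiles, and tiles within that run are interleaved across the group's cores so that per-core tile counts stay roughly equal.

```python
"""Toy tiling plus cache-topology-aware tile assignment.

Illustrative sketch only; NOT the scheduling algorithm from the paper.
"""


def make_tiles(n_rows, n_cols, tile):
    """Split an n_rows x n_cols iteration space into tiles, in row-major order.

    Each tile is (row_start, row_end, col_start, col_end) with exclusive ends.
    """
    tiles = []
    for i in range(0, n_rows, tile):
        for j in range(0, n_cols, tile):
            tiles.append((i, min(i + tile, n_rows), j, min(j + tile, n_cols)))
    return tiles


def schedule_tiles(tiles, cache_groups):
    """Map tiles to cores, given groups of cores that share a cache.

    cache_groups is a hypothetical topology description: [[0, 1], [2, 3]]
    means cores 0 and 1 share one cache and cores 2 and 3 share another.
    Core ids are assumed to be unique across groups.
    """
    schedule = {core: [] for group in cache_groups for core in group}
    total_cores = len(schedule)
    start = 0
    cores_seen = 0
    for group in cache_groups:
        cores_seen += len(group)
        # Contiguous chunk sized in proportion to the group's core count,
        # so neighbouring tiles land under the same shared cache.
        end = len(tiles) * cores_seen // total_cores
        for k, t in enumerate(tiles[start:end]):
            # Interleave neighbouring tiles across the cores of this group,
            # which also keeps the per-core tile counts close to equal.
            schedule[group[k % len(group)]].append(t)
        start = end
    return schedule


if __name__ == "__main__":
    tiles = make_tiles(8, 8, 2)
    for core, assigned in schedule_tiles(tiles, [[0, 1], [2, 3]]).items():
        print(core, assigned)
```

A real scheduler would derive the tile-to-core mapping from the actual data dependences and the measured cache topology of the target machine; the fixed row-major order used here is only a stand-in for that analysis.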

Original language: English (US)
Title of host publication: Proceedings - International Symposium on Code Generation and Optimization, CGO 2011
Pages: 161-170
Number of pages: 10
DOIs: https://doi.org/10.1109/CGO.2011.5764684
State: Published - May 30 2011
Event: 9th International Symposium on Code Generation and Optimization, CGO 2011 - Chamonix, France
Duration: Apr 2 2011 - Apr 6 2011

Publication series

Name: Proceedings - International Symposium on Code Generation and Optimization, CGO 2011

Other

Other: 9th International Symposium on Code Generation and Optimization, CGO 2011
Country: France
City: Chamonix
Period: 4/2/11 - 4/6/11

Fingerprint

Tile
Cache
Chip
Scheduling
Scheduling algorithms
Tiling
Scheduling Algorithm
Execution Time
Application programs
Data Reuse
Code Generation
Topology
Multiprocessor
Parallelization
Balancing
Alternate
Workload
Hierarchy
Horizontal
Vertical

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Applied Mathematics

Cite this

Liu, J., Zhang, Y., Ding, W., & Kandemir, M. (2011). On-chip cache hierarchy-aware tile scheduling for multicore machines. In Proceedings - International Symposium on Code Generation and Optimization, CGO 2011 (pp. 161-170). [5764684] (Proceedings - International Symposium on Code Generation and Optimization, CGO 2011). https://doi.org/10.1109/CGO.2011.5764684
Liu, Jun ; Zhang, Yuanrui ; Ding, Wei ; Kandemir, Mahmut. / On-chip cache hierarchy-aware tile scheduling for multicore machines. Proceedings - International Symposium on Code Generation and Optimization, CGO 2011. 2011. pp. 161-170 (Proceedings - International Symposium on Code Generation and Optimization, CGO 2011).
@inproceedings{5b8ad9b733f84953b07e7bf8ffbff026,
title = "On-chip cache hierarchy-aware tile scheduling for multicore machines",
author = "Jun Liu and Yuanrui Zhang and Wei Ding and Mahmut Kandemir",
year = "2011",
month = "5",
day = "30",
doi = "10.1109/CGO.2011.5764684",
language = "English (US)",
isbn = "9781612843551",
series = "Proceedings - International Symposium on Code Generation and Optimization, CGO 2011",
pages = "161--170",
booktitle = "Proceedings - International Symposium on Code Generation and Optimization, CGO 2011",

}

Liu, J, Zhang, Y, Ding, W & Kandemir, M 2011, On-chip cache hierarchy-aware tile scheduling for multicore machines. in Proceedings - International Symposium on Code Generation and Optimization, CGO 2011., 5764684, Proceedings - International Symposium on Code Generation and Optimization, CGO 2011, pp. 161-170, 9th International Symposium on Code Generation and Optimization, CGO 2011, Chamonix, France, 4/2/11. https://doi.org/10.1109/CGO.2011.5764684

On-chip cache hierarchy-aware tile scheduling for multicore machines. / Liu, Jun; Zhang, Yuanrui; Ding, Wei; Kandemir, Mahmut.

Proceedings - International Symposium on Code Generation and Optimization, CGO 2011. 2011. p. 161-170 5764684 (Proceedings - International Symposium on Code Generation and Optimization, CGO 2011).

TY - GEN

T1 - On-chip cache hierarchy-aware tile scheduling for multicore machines

AU - Liu, Jun

AU - Zhang, Yuanrui

AU - Ding, Wei

AU - Kandemir, Mahmut

PY - 2011/5/30

Y1 - 2011/5/30

UR - http://www.scopus.com/inward/record.url?scp=79957454903&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79957454903&partnerID=8YFLogxK

U2 - 10.1109/CGO.2011.5764684

DO - 10.1109/CGO.2011.5764684

M3 - Conference contribution

SN - 9781612843551

T3 - Proceedings - International Symposium on Code Generation and Optimization, CGO 2011

SP - 161

EP - 170

BT - Proceedings - International Symposium on Code Generation and Optimization, CGO 2011

ER -

Liu J, Zhang Y, Ding W, Kandemir M. On-chip cache hierarchy-aware tile scheduling for multicore machines. In Proceedings - International Symposium on Code Generation and Optimization, CGO 2011. 2011. p. 161-170. 5764684. (Proceedings - International Symposium on Code Generation and Optimization, CGO 2011). https://doi.org/10.1109/CGO.2011.5764684