TY - GEN
T1 - Optimizing data layouts for parallel computation on multicores
AU - Zhang, Yuanrui
AU - Ding, Wei
AU - Liu, Jun
AU - Kandemir, Mahmut
PY - 2011/12/1
Y1 - 2011/12/1
N2 - The emergence of multicore platforms offers several opportunities for boosting application performance. These opportunities, which include parallelism and data locality benefits, require strong support from compilers as well as operating systems. Current compiler research targeting multicores mostly focuses on code restructuring and mapping. In this work, we explore automatic data layout transformation targeting multithreaded applications running on multicores. Our transformation considers both the data access patterns exhibited by different threads of a multithreaded application and the on-chip cache topology of the target multicore architecture. It automatically determines a customized memory layout for each target array to minimize potential cache conflicts across threads. Our experiments show that our optimization brings significant benefits over state-of-the-art data locality optimization strategies when tested using 30 benchmark programs on an Intel multicore machine. The results also indicate that this strategy is able to scale to larger core counts and performs better with increased data set sizes.
AB - The emergence of multicore platforms offers several opportunities for boosting application performance. These opportunities, which include parallelism and data locality benefits, require strong support from compilers as well as operating systems. Current compiler research targeting multicores mostly focuses on code restructuring and mapping. In this work, we explore automatic data layout transformation targeting multithreaded applications running on multicores. Our transformation considers both the data access patterns exhibited by different threads of a multithreaded application and the on-chip cache topology of the target multicore architecture. It automatically determines a customized memory layout for each target array to minimize potential cache conflicts across threads. Our experiments show that our optimization brings significant benefits over state-of-the-art data locality optimization strategies when tested using 30 benchmark programs on an Intel multicore machine. The results also indicate that this strategy is able to scale to larger core counts and performs better with increased data set sizes.
UR - http://www.scopus.com/inward/record.url?scp=84863057721&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863057721&partnerID=8YFLogxK
U2 - 10.1109/PACT.2011.20
DO - 10.1109/PACT.2011.20
M3 - Conference contribution
AN - SCOPUS:84863057721
SN - 9780769545660
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 143
EP - 154
BT - Proceedings - 2011 International Conference on Parallel Architectures and Compilation Techniques, PACT 2011
T2 - 20th International Conference on Parallel Architectures and Compilation Techniques, PACT 2011
Y2 - 10 October 2011 through 14 October 2011
ER -