TY - GEN
T1 - Neighborhood-aware data locality optimization for NoC-based multicores
AU - Kandemir, Mahmut
AU - Zhang, Yuanrui
AU - Liu, Jun
AU - Yemliha, Taylan
PY - 2011
Y1 - 2011
AB - Data locality optimization is a critical issue for NoC (network-on-chip) based multicore systems. In this paper, focusing on a two-dimensional NoC-based multicore and data-intensive multithreaded applications, we first discuss a data-locality-aware scheduling algorithm for any given computation-to-core mapping, and then propose an integrated mapping+scheduling algorithm that performs both tasks together. Both of our algorithms consider temporal (time-wise) and spatial (neighborhood-aware) data reuse, and try to minimize distance-to-data in on-chip cache accesses. We test the effectiveness of our compiler algorithms using a set of twelve application programs. Our experiments indicate that the proposed algorithms achieve significant improvements in data access latencies (42.7% on average) and overall execution times (24.1% on average). We also conduct a sensitivity analysis in which we vary the number of cores, the on-chip cache capacities, and the data movement (migration) strategies. These experiments show that our proposed algorithms generate consistently good results.
UR - http://www.scopus.com/inward/record.url?scp=79957447964&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79957447964&partnerID=8YFLogxK
U2 - 10.1109/CGO.2011.5764687
DO - 10.1109/CGO.2011.5764687
M3 - Conference contribution
AN - SCOPUS:79957447964
SN - 9781612843551
T3 - Proceedings - International Symposium on Code Generation and Optimization, CGO 2011
SP - 191
EP - 200
BT - Proceedings - International Symposium on Code Generation and Optimization, CGO 2011
T2 - 9th International Symposium on Code Generation and Optimization, CGO 2011
Y2 - 2 April 2011 through 6 April 2011
ER -