TY - JOUR
T1 - Quantifying Data Locality in Dynamic Parallelism in GPUs
AU - Tang, Xulong
AU - Pattnaik, Ashutosh
AU - Kayiran, Onur
AU - Jog, Adwait
AU - Kandemir, Mahmut Taylan
AU - Das, Chita
N1 - Funding Information:
We thank Ganesh Ananthanarayanan for shepherding our paper. We also thank the anonymous reviewers for their constructive feedback. This research is supported in part by NSF grants #1526750, #1763681, #1439057, #1439021, #1629129, #1409095, #1626251, #1629915, #1657336 and #1750667. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. OpenCL is a trademark of Apple Inc. used by permission by Khronos Group, Inc.
Publisher Copyright:
© 2019 Copyright is held by the owner/author(s).
PY - 2019/12/17
Y1 - 2019/12/17
N2 - Dynamic parallelism (DP) is a new feature of emerging GPUs that allows new kernels to be generated and scheduled from the device side (GPU) without host-side (CPU) intervention. To efficiently support DP, one of the major challenges is to saturate the GPU processing elements and provide them with the required data in a timely fashion. In this paper, we first conduct a limit study on the performance improvements that can be achieved by hardware schedulers that are provided with accurate data reuse information. We next propose LASER, a Locality-Aware SchedulER, where the hardware schedulers employ data reuse monitors to help make scheduling decisions to improve data locality at runtime. Experimental results on 16 benchmarks show that LASER, on average, can improve performance by 11.3%.
AB - Dynamic parallelism (DP) is a new feature of emerging GPUs that allows new kernels to be generated and scheduled from the device side (GPU) without host-side (CPU) intervention. To efficiently support DP, one of the major challenges is to saturate the GPU processing elements and provide them with the required data in a timely fashion. In this paper, we first conduct a limit study on the performance improvements that can be achieved by hardware schedulers that are provided with accurate data reuse information. We next propose LASER, a Locality-Aware SchedulER, where the hardware schedulers employ data reuse monitors to help make scheduling decisions to improve data locality at runtime. Experimental results on 16 benchmarks show that LASER, on average, can improve performance by 11.3%.
UR - http://www.scopus.com/inward/record.url?scp=85086498625&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85086498625&partnerID=8YFLogxK
U2 - 10.1145/3309697.3331473
DO - 10.1145/3309697.3331473
M3 - Article
AN - SCOPUS:85086498625
VL - 47
SP - 25
EP - 26
JO - Performance Evaluation Review
JF - Performance Evaluation Review
SN - 0163-5999
IS - 1
ER -