TY - JOUR
T1 - Data Convection
T2 - A GPU-Driven Case Study for Thermal-Aware Data Placement in 3D DRAMs
AU - Khadirsharbiyani, Soheil
AU - Kotra, Jagadish
AU - Rao, Karthik
AU - Kandemir, Mahmut Taylan
N1 - Funding Information:
While our Data Convection enables thermally-aware data migration, we observed that reducing thermal overheads alone may not deliver ideal performance, because migrating frequently requested data to a colder part of stacked DRAM can cause bandwidth contention and degrade performance. To reduce thermal overheads, our Data Convection algorithm therefore migrates data in an access-parallelism-aware manner, eliminating artificial bottlenecks. Further details on our algorithms can be found in Section 4 of our paper. 2 KEY RESULTS AND CONTRIBUTIONS We evaluated Data Convection over 196 combinations of 14 workloads using a heavily modified GPGPU-Sim integrated with the Ramulator simulator (along with GPUWattch and HotSpot models). More details on our evaluation setup, workloads, and baseline can be found in the Experimental Setup and Methodology section of the paper. Figure 2b shows the performance comparison between Data Convection and an optimized baseline containing both 3D and 2.5D DRAM. The results show that, in most cases, Data Convection outperforms the baseline, yielding average performance improvements of 1.8%, 11.7%, and 14.4% for Intra-layer, Inter-layer, and Intra + Inter-layer Data Convection, respectively. Our results also show that Data Convection can achieve up to a 9.3% decrease in energy consumption as a result of reducing the overall execution time. ACKNOWLEDGMENTS The authors would like to thank the anonymous SIGMETRICS reviewers for their constructive feedback, and Kaveh Razavi for shepherding the paper. The material presented in this paper is based upon work supported by the National Science Foundation under Grant Numbers 2119236, 2122155, 2028929, 1931531, and 1763681. The content of this paper is the responsibility of the authors and does not necessarily represent the official views of NSF or AMD.
Publisher Copyright:
© 2022 Owner/Author.
PY - 2022/6
Y1 - 2022/6
N2 - Stacked DRAMs have been studied and productized in the last decade. The large available bandwidth they offer makes them an attractive choice, particularly in high-performance computing (HPC) environments. Consequently, many prior research efforts have studied and evaluated 3D stacked DRAM-based designs. Despite offering high bandwidth, stacked DRAMs are severely constrained by the overall memory capacity they offer. In this paper, we study and evaluate integrating stacked DRAM on top of a GPU in a 3D manner, which, in tandem with the 2.5D stacked DRAM, boosts the capacity and the bandwidth without increasing the package size. It also helps meet the capacity needs of emergent workloads like deep learning. However, the bandwidth delivered by these 3D stacked DRAMs is significantly constrained by the GPU's heat production. Our investigations on a cycle-level simulator show that the 3D stacked DRAM portions closest to the GPU have shorter retention times than the layers farther away. Depending on the retention period, certain regions of 3D stacked DRAM are refreshed more frequently than others, leading to thermally-induced NUMA paradigms. Our proposed approach attempts to place the most frequently requested data in a thermally conscious manner, taking into consideration both bank-level parallelism and channel-level parallelism. The results collected with a cycle-level GPU simulator indicate that the three implementations of our proposed approach lead to 1.8%, 11.7%, and 14.4% performance improvements over a baseline that already includes 3D+2.5D stacked DRAMs.
AB - Stacked DRAMs have been studied and productized in the last decade. The large available bandwidth they offer makes them an attractive choice, particularly in high-performance computing (HPC) environments. Consequently, many prior research efforts have studied and evaluated 3D stacked DRAM-based designs. Despite offering high bandwidth, stacked DRAMs are severely constrained by the overall memory capacity they offer. In this paper, we study and evaluate integrating stacked DRAM on top of a GPU in a 3D manner, which, in tandem with the 2.5D stacked DRAM, boosts the capacity and the bandwidth without increasing the package size. It also helps meet the capacity needs of emergent workloads like deep learning. However, the bandwidth delivered by these 3D stacked DRAMs is significantly constrained by the GPU's heat production. Our investigations on a cycle-level simulator show that the 3D stacked DRAM portions closest to the GPU have shorter retention times than the layers farther away. Depending on the retention period, certain regions of 3D stacked DRAM are refreshed more frequently than others, leading to thermally-induced NUMA paradigms. Our proposed approach attempts to place the most frequently requested data in a thermally conscious manner, taking into consideration both bank-level parallelism and channel-level parallelism. The results collected with a cycle-level GPU simulator indicate that the three implementations of our proposed approach lead to 1.8%, 11.7%, and 14.4% performance improvements over a baseline that already includes 3D+2.5D stacked DRAMs.
UR - http://www.scopus.com/inward/record.url?scp=85133965790&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85133965790&partnerID=8YFLogxK
U2 - 10.1145/3489048.3522647
DO - 10.1145/3489048.3522647
M3 - Article
AN - SCOPUS:85133965790
SN - 0163-5999
VL - 50
SP - 37
EP - 38
JO - Performance Evaluation Review
JF - Performance Evaluation Review
IS - 1
ER -