TY - GEN
T1 - Quantifying the Potential Benefits of On-chip Near-Data Computing in Manycore Processors
AU - Kotra, Jagadish B.
AU - Guttman, Diana
AU - Chidambaram, Nachiappan
AU - Kandemir, Mahmut T.
AU - Das, Chita R.
N1 - Funding Information:
This material is based upon work supported by the National Science Foundation under Grants 1439021, 1439057, 1213052, 1409095, and 1629129.
Funding Information:
Our approach, instead of offloading computations to DRAM, offloads them to service cores closer to the memory controllers. As a result, it targets the on-chip latency problem rather than the memory bandwidth problem, unlike the prior approaches, and it allows data accessed by the host core to be cached in the on-chip caches, since data accessed by the service cores is handled by the on-chip cache coherence protocol. Similarly, the on-chip coherent TLBs require neither enhanced virtual-memory support nor static pinning of memory pages in main memory. By addressing these overall system-design challenges imposed by prior near-data computing techniques, our proposed hardware techniques can be readily adopted by industry to mitigate some of the on-chip data movement costs. IX. CONCLUSION In this paper we look at the problem of near-data computing from the perspective of a manycore system. To our knowledge, this work is the first to evaluate the potential benefits of on-chip near-data computing by employing service cores closer to the memory controllers. More specifically, we (i) quantify the performance benefits of three different incarnations of near-data computing and show that the most effective of them can bring performance improvements as high as 75%, and (ii) discuss and experimentally evaluate three different implementations that can achieve some of the potential benefits of perfect near-data computing. Our future work includes (i) investigating service core placement, (ii) testing the benefits of our approach in multi-node configurations, and (iii) collecting experimental data on emerging enterprise applications. ACKNOWLEDGMENT We thank our shepherd, Prof. Jason Liu, for his valuable comments and feedback.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/11/13
Y1 - 2017/11/13
N2 - Increasing data set sizes motivate a shift of focus from computation-centric systems to data-centric systems, where data movement is treated as a first-class optimization metric. An example of this emerging paradigm is in-situ computing in large-scale computing systems. Observing that data movement costs are increasing at an exponential rate even at the node level (as a node itself is fast becoming a large manycore system), this paper provides a limit study of near-data computing within a manycore chip. Specifically, it makes the following two contributions. First, it quantifies the potential performance benefits of three incarnations of the near-data computing paradigm under the assumption of zero on-chip network latency and an infinite number of extra cores for offloading computations close to the data they require. Our detailed experimental evaluation indicates that the most successful of these incarnations can boost the performance of the original execution by as much as 75%. The second contribution of this paper is an investigation of more realistic schemes that can approximate the potential savings achieved by perfect near-data computing. Our results demonstrate performance improvements ranging between 44% and 52% over the original execution. We also discuss the pros and cons of each of these realistic schemes, and point to further research directions.
AB - Increasing data set sizes motivate a shift of focus from computation-centric systems to data-centric systems, where data movement is treated as a first-class optimization metric. An example of this emerging paradigm is in-situ computing in large-scale computing systems. Observing that data movement costs are increasing at an exponential rate even at the node level (as a node itself is fast becoming a large manycore system), this paper provides a limit study of near-data computing within a manycore chip. Specifically, it makes the following two contributions. First, it quantifies the potential performance benefits of three incarnations of the near-data computing paradigm under the assumption of zero on-chip network latency and an infinite number of extra cores for offloading computations close to the data they require. Our detailed experimental evaluation indicates that the most successful of these incarnations can boost the performance of the original execution by as much as 75%. The second contribution of this paper is an investigation of more realistic schemes that can approximate the potential savings achieved by perfect near-data computing. Our results demonstrate performance improvements ranging between 44% and 52% over the original execution. We also discuss the pros and cons of each of these realistic schemes, and point to further research directions.
UR - http://www.scopus.com/inward/record.url?scp=85040531134&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040531134&partnerID=8YFLogxK
U2 - 10.1109/MASCOTS.2017.26
DO - 10.1109/MASCOTS.2017.26
M3 - Conference contribution
AN - SCOPUS:85040531134
T3 - Proceedings - 25th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2017
SP - 198
EP - 209
BT - Proceedings - 25th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS 2017
Y2 - 20 September 2017 through 22 September 2017
ER -