Decentralized Offload-based Execution on Memory-centric Compute Cores

Saambhavi Baskaran, Jack Sampson

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

With the end of Dennard scaling, power constraints have led to increasing compute specialization in the form of differently specialized accelerators integrated at various levels of the general-purpose system hierarchy. The result is that the most common general-purpose computing platform is now a heterogeneous mix of architectures even within a single die. Consequently, mapping application code regions into available execution engines has become a challenge due to different interfaces and increased software complexity. At the same time, the energy costs of data movement have become increasingly dominant relative to computation energy. This has inspired a move towards data-centric systems, where computation is brought to data, in contrast to traditional processing-centric models. However, enabling compute nearer memory entails its own challenges, including the interactions between distance-specialization and compute-specialization. The granularity of any offload to near(er) memory logic would impact the potential data transmission reduction, as smaller offloads will not be able to amortize the transmission costs of invocation and data return, while very large offloads can only be mapped onto logic that can support all of the necessary operations within kernel-scale codes, which exacerbates both area and power constraints. For better energy efficiency, each set of related operations should be mapped onto the execution engine that, among those capable of running the set of operations, best balances the data movement and the degree of compute specialization of that engine for this code. Further, this offload should proceed in a decentralized way that keeps both the data and control movement low for all transitions among engines and transmissions of operands and results. 
To enable such a decentralized offload model, we propose an architecture interface that enables a common offload model for accelerators across the memory hierarchy and a tool chain to automatically identify (in a distance-aware fashion) and map profitable code regions on specialized execution engines. We evaluate the proposed architecture for a wide range of workloads and show energy reduction compared to an energy-efficient in-order core. We also demonstrate better area efficiency compared to kernel-scale offloads.
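The mapping criterion described in the abstract can be illustrated with a toy cost model. This is only a minimal sketch under stated assumptions, not the authors' tool chain or interface: the `Engine` and `Region` types, the `estimate_energy` and `choose_engine` functions, and every energy figure are hypothetical and chosen purely for illustration. It shows the two effects the abstract names: a small offload cannot amortize its invocation cost, and a region can only be mapped onto an engine that supports all of its operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """A hypothetical execution engine; all energy values are made-up units."""
    name: str
    supported_ops: frozenset
    compute_energy_per_op: float   # lower means more compute-specialized
    move_energy_per_byte: float    # lower means closer to the data
    invocation_energy: float       # fixed cost of starting an offload


@dataclass(frozen=True)
class Region:
    """A hypothetical code region: a set of related operations."""
    ops: frozenset
    op_count: int
    bytes_moved: int


def estimate_energy(region: Region, engine: Engine) -> float:
    """Toy model: fixed invocation cost plus data movement plus computation."""
    return (engine.invocation_energy
            + region.bytes_moved * engine.move_energy_per_byte
            + region.op_count * engine.compute_energy_per_op)


def choose_engine(region: Region, engines: list) -> Engine:
    """Pick the lowest-energy engine among those able to run every operation."""
    capable = [e for e in engines if region.ops <= e.supported_ops]
    if not capable:
        raise ValueError("no engine supports all operations in this region")
    return min(capable, key=lambda e: estimate_energy(region, e))


# Illustrative engines: a general-purpose core far from memory, and a
# narrower near-memory engine with cheap data access but a startup cost.
ENGINES = [
    Engine("core", frozenset({"add", "mul", "div"}), 1.0, 2.0, 0.0),
    Engine("near_mem", frozenset({"add", "mul"}), 0.5, 0.2, 50.0),
]

small = Region(frozenset({"add"}), op_count=4, bytes_moved=16)
large = Region(frozenset({"add", "mul"}), op_count=1000, bytes_moved=4096)
unsupported = Region(frozenset({"add", "div"}), op_count=1000, bytes_moved=4096)

for label, region in [("small", small), ("large", large), ("unsupported", unsupported)]:
    print(label, "->", choose_engine(region, ENGINES).name)
```

Under these invented numbers, the small region stays on the core because the near-memory invocation cost outweighs the savings, the large region moves to the near-memory engine, and the region containing `div` stays on the core because no other engine can execute it.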

Original language: English (US)
Title of host publication: MEMSYS 2020 - Proceedings of the International Symposium on Memory Systems
Publisher: Association for Computing Machinery
Pages: 61-76
Number of pages: 16
ISBN (Electronic): 9781450388993
State: Published - Sep 28 2020
Event: 2020 International Symposium on Memory Systems, MEMSYS 2020 - Washington, United States
Duration: Sep 28 2020 to Oct 1 2020

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2020 International Symposium on Memory Systems, MEMSYS 2020
Country/Territory: United States
City: Washington
Period: 9/28/20 to 10/1/20

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
