TY - GEN
T1 - CAPE: A Content-Addressable Processing Engine
T2 - 27th Annual IEEE International Symposium on High Performance Computer Architecture, HPCA 2021
AU - Caminal, Helena
AU - Yang, Kailin
AU - Srinivasa, Srivatsa
AU - Ramanathan, Akshay Krishna
AU - Al-Hawaj, Khalid
AU - Wu, Tianshu
AU - Narayanan, Vijaykrishnan
AU - Batten, Christopher
AU - Martínez, José F.
N1 - Funding Information:
This work was supported in part by the Semiconductor Research Corporation (SRC) through the Center for Research on Intelligent Storage and Processing-in-memory (CRISP) and the Center for Applications Driving Architectures (ADA), two of six centers of the JUMP program co-sponsored by DARPA; by SRC and NSF through an E2CDA NSF Award #1740136, by NSF Award #2008365, and by NSF SHF Award #2008471. We thank Olalekan Afuye and Alyssa Apsel for early discussions and help with circuit design; Angela Jin, Ysabel Tan, and Socrates Wong for their help with experiments; Giacomo Gabrielli and Giacomo Travaglini for their help with some of the Arm tools; Michael Woodson for his technical support; and Ameen Akel and Sean Eilert from Micron Technology for their advice.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/2
Y1 - 2021/2
N2 - Processing-in-memory (PIM) architectures attempt to overcome the von Neumann bottleneck by combining computation and storage logic into a single component. The content-addressable parallel processing (CAPP) paradigm from the seventies is an in-situ PIM architecture that leverages content-addressable memories to realize bit-serial arithmetic and logic operations, via sequences of search and update operations over multiple memory rows in parallel. In this paper, we set out to investigate whether the concepts behind classic CAPP can be used successfully to build an entirely CMOS-based, general-purpose microarchitecture that can deliver manyfold speedups while remaining highly programmable. We conduct a full-stack design of a Content-Addressable Processing Engine (CAPE), built out of dense push-rule 6T SRAM arrays. CAPE is programmable using the RISC-V ISA with standard vector extensions. Our experiments show that CAPE achieves an average speedup of 14× (up to 254×) over an area-equivalent (slightly under 9 mm² at 7 nm) out-of-order processor core with three levels of caches.
AB - Processing-in-memory (PIM) architectures attempt to overcome the von Neumann bottleneck by combining computation and storage logic into a single component. The content-addressable parallel processing (CAPP) paradigm from the seventies is an in-situ PIM architecture that leverages content-addressable memories to realize bit-serial arithmetic and logic operations, via sequences of search and update operations over multiple memory rows in parallel. In this paper, we set out to investigate whether the concepts behind classic CAPP can be used successfully to build an entirely CMOS-based, general-purpose microarchitecture that can deliver manyfold speedups while remaining highly programmable. We conduct a full-stack design of a Content-Addressable Processing Engine (CAPE), built out of dense push-rule 6T SRAM arrays. CAPE is programmable using the RISC-V ISA with standard vector extensions. Our experiments show that CAPE achieves an average speedup of 14× (up to 254×) over an area-equivalent (slightly under 9 mm² at 7 nm) out-of-order processor core with three levels of caches.
UR - http://www.scopus.com/inward/record.url?scp=85105022896&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105022896&partnerID=8YFLogxK
U2 - 10.1109/HPCA51647.2021.00054
DO - 10.1109/HPCA51647.2021.00054
M3 - Conference contribution
AN - SCOPUS:85105022896
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 557
EP - 569
BT - Proceedings - 27th IEEE International Symposium on High Performance Computer Architecture, HPCA 2021
PB - IEEE Computer Society
Y2 - 27 February 2021 through 1 March 2021
ER -