TY - GEN
T1 - Look-up table based energy efficient processing in cache support for neural network acceleration
AU - Ramanathan, Akshay Krishna
AU - Kalsi, Gurpreet S.
AU - Srinivasa, Srivatsa
AU - Chandran, Tarun Makesh
AU - Pillai, Kamlesh R.
AU - Omer, Om J.
AU - Narayanan, Vijaykrishnan
AU - Subramoney, Sreenivas
N1 - Funding Information:
*This work was done as part of an internship at the Processor Architecture Research Lab, Intel Labs, Bangalore, KA, India. This work was supported in part by the Semiconductor Research Corporation (SRC) Center for Research in Intelligent Storage and Processing in Memory (CRISP).
Publisher Copyright:
© 2020 IEEE.
PY - 2020/10
Y1 - 2020/10
N2 - This paper presents a Look-Up Table (LUT) based Processing-In-Memory (PIM) technique with the potential for running neural network inference tasks. We implement a bitline-computing-free technique that avoids frequent bitline accesses to the cache sub-arrays, thereby considerably reducing the memory-access energy overhead. The LUT, in conjunction with the compute engines, enables sub-array-level parallelism while executing, through data lookup, complex operations that would otherwise require multiple cycles. Sub-array-level parallelism and systolic input data flow ensure that data movement is confined to the SRAM slice. Our proposed LUT-based PIM methodology exploits substantial parallelism using look-up tables without altering the memory structure or organization, i.e., it preserves the bit-cells and peripherals of the existing monolithic SRAM arrays. Our solution achieves 1.72x higher performance and 3.14x lower energy compared to a state-of-the-art processing-in-cache solution. Sub-array-level design modifications to incorporate the LUT along with the compute engines increase the overall cache area by 5.6%. We achieve a 3.97x speedup over a neural network systolic accelerator of similar area. The reconfigurable nature of the compute engines enables various neural network operations, thereby supporting sequential networks (RNNs) and transformer models. Our quantitative analysis demonstrates 101x and 3x faster execution, and 91x and 11x higher energy efficiency, than CPU and GPU respectively while running the transformer model BERT-Base.
AB - This paper presents a Look-Up Table (LUT) based Processing-In-Memory (PIM) technique with the potential for running neural network inference tasks. We implement a bitline-computing-free technique that avoids frequent bitline accesses to the cache sub-arrays, thereby considerably reducing the memory-access energy overhead. The LUT, in conjunction with the compute engines, enables sub-array-level parallelism while executing, through data lookup, complex operations that would otherwise require multiple cycles. Sub-array-level parallelism and systolic input data flow ensure that data movement is confined to the SRAM slice. Our proposed LUT-based PIM methodology exploits substantial parallelism using look-up tables without altering the memory structure or organization, i.e., it preserves the bit-cells and peripherals of the existing monolithic SRAM arrays. Our solution achieves 1.72x higher performance and 3.14x lower energy compared to a state-of-the-art processing-in-cache solution. Sub-array-level design modifications to incorporate the LUT along with the compute engines increase the overall cache area by 5.6%. We achieve a 3.97x speedup over a neural network systolic accelerator of similar area. The reconfigurable nature of the compute engines enables various neural network operations, thereby supporting sequential networks (RNNs) and transformer models. Our quantitative analysis demonstrates 101x and 3x faster execution, and 91x and 11x higher energy efficiency, than CPU and GPU respectively while running the transformer model BERT-Base.
UR - http://www.scopus.com/inward/record.url?scp=85097329311&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85097329311&partnerID=8YFLogxK
U2 - 10.1109/MICRO50266.2020.00020
DO - 10.1109/MICRO50266.2020.00020
M3 - Conference contribution
AN - SCOPUS:85097329311
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 88
EP - 101
BT - Proceedings - 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2020
PB - IEEE Computer Society
T2 - 53rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2020
Y2 - 17 October 2020 through 21 October 2020
ER -