TY - GEN
T1 - Characterizing diverse handheld apps for customized hardware acceleration
AU - Rengasamy, Prasanna Venkatesh
AU - Zhang, Haibo
AU - Nachiappan, Nachiappan Chidambaram
AU - Zhao, Shulin
AU - Sivasubramaniam, Anand
AU - Kandemir, Mahmut T.
AU - Das, Chita R.
N1 - Funding Information:
This research is supported in part by NSF grants 1213052, 1302557, 1317560, 1320478, 1409095, 1439021, 1439057, 1626251, 1629129, 1629915, 1714389, 1526750, and Intel. We would also like to thank Jack Sampson for his feedback on this paper.
PY - 2017/12/5
Y1 - 2017/12/5
N2 - Current handhelds incorporate a variety of accelerators/IPs for improving their performance and energy efficiency. While these IPs are extremely useful for accelerating parts of a computation, the CPU still expends a significant amount of time and energy in the overall execution. Coarse grain customized hardware of Android APIs and methods, though widely useful, is also not an option due to the high hardware costs. Instead, we propose a fine-grain sequence of instructions, called a Load-to-Store (LOST) sequence, for hardware customization. A LOST sequence starts with a load and ends with a store, including dependent instructions in between. Unlike prior approaches to customization, a LOST sequence is defined based on a sequence of opcodes rather than a sequence of PC addresses or operands. We identify such commonly occurring LOST sequences within and across several popular apps and propose a design to integrate these customized hardware sequences as macro functional units into the CPU data-path. Detailed evaluation shows that such customized LOST sequences can provide an average of 25% CPU speedup, or 12% speedup for the entire system.
AB - Current handhelds incorporate a variety of accelerators/IPs for improving their performance and energy efficiency. While these IPs are extremely useful for accelerating parts of a computation, the CPU still expends a significant amount of time and energy in the overall execution. Coarse grain customized hardware of Android APIs and methods, though widely useful, is also not an option due to the high hardware costs. Instead, we propose a fine-grain sequence of instructions, called a Load-to-Store (LOST) sequence, for hardware customization. A LOST sequence starts with a load and ends with a store, including dependent instructions in between. Unlike prior approaches to customization, a LOST sequence is defined based on a sequence of opcodes rather than a sequence of PC addresses or operands. We identify such commonly occurring LOST sequences within and across several popular apps and propose a design to integrate these customized hardware sequences as macro functional units into the CPU data-path. Detailed evaluation shows that such customized LOST sequences can provide an average of 25% CPU speedup, or 12% speedup for the entire system.
UR - http://www.scopus.com/inward/record.url?scp=85045392279&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045392279&partnerID=8YFLogxK
U2 - 10.1109/IISWC.2017.8167776
DO - 10.1109/IISWC.2017.8167776
M3 - Conference contribution
AN - SCOPUS:85045392279
T3 - Proceedings of the 2017 IEEE International Symposium on Workload Characterization, IISWC 2017
SP - 187
EP - 196
BT - Proceedings of the 2017 IEEE International Symposium on Workload Characterization, IISWC 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2017 IEEE International Symposium on Workload Characterization, IISWC 2017
Y2 - 1 October 2017 through 3 October 2017
ER -