TY - JOUR
T1 - HL-PCM
T2 - MLC PCM Main Memory with Accelerated Read
AU - Arjomand, Mohammad
AU - Jadidi, Amin
AU - Kandemir, Mahmut T.
AU - Sivasubramaniam, Anand
AU - Das, Chita R.
N1 - Funding Information:
We thank the reviewers for their valuable suggestions. This work is supported in part by NSF grants 1302557, 1213052, 1439021, 1302225, 1629129, 1526750, and 1629915 and a grant from Intel.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/11/1
Y1 - 2017/11/1
N2 - Multi-Level Cell Phase Change Memory (MLC PCM) is a promising candidate technology for DRAM replacement in the main memory of modern computers. Despite its high density and low power advantages, this technology suffers seriously from slow read and write operations. While prior works extensively studied the problem of slow writes, this paper targets the high read latency problem in MLC PCM and introduces an architectural mechanism to overcome it. To this end, we rely on the fact that reading different bits from an MLC cell takes different latencies, i.e., for a 2-bit MLC, reading its Most-Significant Bit (MSB) is fast, while reading its Least-Significant Bit (LSB) is slower. We then propose Half-Line PCM (HL-PCM), a novel memory architecture that leverages this non-uniformity in reading MLC PCM's content to send a requested memory block to the processor in different cycles: it sends half of a memory block to the processor ahead of the other half. If the processor requested a word belonging to the first half, it can resume its execution on receiving the first half, while the other half has not been sent yet and is scheduled to be received by the memory controller later. HL-PCM is simple to implement, i.e., it needs only minor modifications to the memory controller, the search/evict policies at the last-level cache, and the data layout in main memory. Our experimental results show that the proposed design improves the average memory access latency by 33-43 percent and program execution time by 23 percent, on average, while incurring negligible overhead at the memory controller and PCM DIMM, in a 16-core chip multiprocessor (CMP) running memory-intensive benchmarks.
AB - Multi-Level Cell Phase Change Memory (MLC PCM) is a promising candidate technology for DRAM replacement in the main memory of modern computers. Despite its high density and low power advantages, this technology suffers seriously from slow read and write operations. While prior works extensively studied the problem of slow writes, this paper targets the high read latency problem in MLC PCM and introduces an architectural mechanism to overcome it. To this end, we rely on the fact that reading different bits from an MLC cell takes different latencies, i.e., for a 2-bit MLC, reading its Most-Significant Bit (MSB) is fast, while reading its Least-Significant Bit (LSB) is slower. We then propose Half-Line PCM (HL-PCM), a novel memory architecture that leverages this non-uniformity in reading MLC PCM's content to send a requested memory block to the processor in different cycles: it sends half of a memory block to the processor ahead of the other half. If the processor requested a word belonging to the first half, it can resume its execution on receiving the first half, while the other half has not been sent yet and is scheduled to be received by the memory controller later. HL-PCM is simple to implement, i.e., it needs only minor modifications to the memory controller, the search/evict policies at the last-level cache, and the data layout in main memory. Our experimental results show that the proposed design improves the average memory access latency by 33-43 percent and program execution time by 23 percent, on average, while incurring negligible overhead at the memory controller and PCM DIMM, in a 16-core chip multiprocessor (CMP) running memory-intensive benchmarks.
UR - http://www.scopus.com/inward/record.url?scp=85032448068&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032448068&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2017.2705125
DO - 10.1109/TPDS.2017.2705125
M3 - Article
AN - SCOPUS:85032448068
SN - 1045-9219
VL - 28
SP - 3188
EP - 3200
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 11
M1 - 7930492
ER -