Synergistic TLBs for high performance address translation in Chip Multiprocessors

Shekhar Srikantaiah, Mahmut Kandemir

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

30 Citations (Scopus)

Abstract

Translation Look-aside Buffers (TLBs) are vital hardware support for virtual memory management in high performance computer systems and have a momentous influence on overall system performance. Numerous techniques to reduce TLB miss latencies including the impact of TLB size, associativity, multilevel hierarchies, super pages, and prefetching have been well studied in the context of uniprocessors. However, with Chip Multiprocessors (CMPs) becoming the standard design point of processor architectures, it is imperative that we review the design and organization of TLBs in the context of CMPs. In this paper, we propose to improve system performance by means of a novel way of organizing TLBs called Synergistic TLBs. Synergistic TLB is different from per-core private TLB organization in three ways: (i) it provides capacity sharing of TLBs by facilitating storing of victim translations from one TLB in another to emulate a distributed shared TLB (DST); (ii) it supports translation migration for maximizing the utilization of TLB capacity; and (iii) it supports translation replication to avoid excess latency for remote TLB accesses. We explore all the design points in this design space and find that an optimal point exists for high performance address translation. Our evaluation with both multiprogrammed (SPEC 2006 applications) and multithreaded workloads (PARSEC applications) shows that Synergistic TLBs can eliminate, respectively, 44.3% and 31.2% of the TLB misses, on average. It also improves the weighted speedup of multiprogrammed application mixes by 25.1% and performance of multithreaded applications by 27.3%, on average.
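The three mechanisms named in the abstract (victim storing, migration, replication) can be illustrated with a toy model. The following is a minimal sketch under stated assumptions, not the authors' design: the class names `CoreTLB` and `SharedTLBSketch`, the LRU replacement, the "spill to the next core" policy, and the `replicate` switch are all hypothetical choices made for illustration. It also assumes every core shares one address space, so a virtual page number alone identifies a translation.

```python
from collections import OrderedDict


class CoreTLB:
    """A tiny fully associative per-core TLB with LRU replacement (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> ppn, least recently used first

    def lookup(self, vpn):
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            return self.entries[vpn]
        return None

    def insert(self, vpn, ppn):
        """Insert a translation; return the evicted (vpn, ppn) pair, or None."""
        victim = None
        if vpn not in self.entries and len(self.entries) >= self.capacity:
            victim = self.entries.popitem(last=False)  # evict the LRU entry
        self.entries[vpn] = ppn
        self.entries.move_to_end(vpn)
        return victim

    def remove(self, vpn):
        self.entries.pop(vpn, None)


class SharedTLBSketch:
    """Hypothetical model: private TLBs that store each other's victims."""

    def __init__(self, num_cores, capacity, page_table, replicate=False):
        self.tlbs = [CoreTLB(capacity) for _ in range(num_cores)]
        self.page_table = page_table  # vpn -> ppn, stands in for a page walk
        self.replicate = replicate    # False: migrate on remote hit; True: copy
        self.stats = {"local_hit": 0, "remote_hit": 0, "miss": 0}

    def _fill(self, core, vpn, ppn):
        victim = self.tlbs[core].insert(vpn, ppn)
        peer = (core + 1) % len(self.tlbs)
        if victim is not None and peer != core:
            # Capacity sharing: keep the victim in a neighbouring TLB.
            # The neighbour's own victim (if any) is simply dropped.
            self.tlbs[peer].insert(*victim)

    def translate(self, core, vpn):
        ppn = self.tlbs[core].lookup(vpn)
        if ppn is not None:
            self.stats["local_hit"] += 1
            return ppn
        for peer, tlb in enumerate(self.tlbs):
            if peer == core:
                continue
            ppn = tlb.lookup(vpn)
            if ppn is not None:
                self.stats["remote_hit"] += 1
                if not self.replicate:
                    tlb.remove(vpn)  # migration: the entry moves to the requester
                self._fill(core, vpn, ppn)
                return ppn
        self.stats["miss"] += 1  # no TLB holds it: fall back to the page table
        ppn = self.page_table[vpn]
        self._fill(core, vpn, ppn)
        return ppn


if __name__ == "__main__":
    table = {vpn: vpn + 100 for vpn in range(10)}
    dst = SharedTLBSketch(num_cores=2, capacity=2, page_table=table)
    for vpn in (1, 2, 3, 1):
        dst.translate(0, vpn)
    print(dst.stats)
```

In this toy run, core 0 touches three pages with a two-entry TLB, so page 1 is evicted and stored in core 1's TLB. The second access to page 1 then hits remotely instead of missing, which is the effect the abstract attributes to capacity sharing. Setting `replicate=True` keeps a copy in the remote TLB as well, trading capacity for fewer remote accesses later.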

Original language: English (US)
Title of host publication: Proceedings - 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2010
Pages: 313-324
Number of pages: 12
DOI: https://doi.org/10.1109/MICRO.2010.26
State: Published - Dec 1 2010
Event: 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2010 - Atlanta, GA, United States
Duration: Dec 4 2010 to Dec 8 2010

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
ISSN (Print): 1072-4451

Other

Other: 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2010
Country: United States
City: Atlanta, GA
Period: 12/4/10 to 12/8/10

Fingerprint

Computer hardware
Computer systems
Data storage equipment

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Cite this

Srikantaiah, S., & Kandemir, M. (2010). Synergistic TLBs for high performance address translation in Chip Multiprocessors. In Proceedings - 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2010 (pp. 313-324). [5695546] (Proceedings of the Annual International Symposium on Microarchitecture, MICRO). https://doi.org/10.1109/MICRO.2010.26
@inproceedings{8a0290e31f4e4478b9b278d43432c877,
title = "Synergistic TLBs for high performance address translation in Chip Multiprocessors",
author = "Shekhar Srikantaiah and Mahmut Kandemir",
year = "2010",
month = "12",
day = "1",
doi = "10.1109/MICRO.2010.26",
language = "English (US)",
isbn = "9780769542997",
series = "Proceedings of the Annual International Symposium on Microarchitecture, MICRO",
pages = "313--324",
booktitle = "Proceedings - 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2010",

}

