Trifecta: A nonspeculative scheme to exploit common, data-dependent subcritical paths

Patrick Ndai, Nauman Rafique, Mithuna Thottethodi, Swaroop Ghosh, Swarup Bhunia, Kaushik Roy

Research output: Contribution to journal › Article

19 Citations (Scopus)

Abstract

Pipelined processor cores are conventionally designed to accommodate the critical paths in the critical pipeline stage(s) in a single clock cycle, to ensure correctness. Such conservative design is wasteful in many cases since critical paths are rarely exercised. Thus, configuring the pipeline to operate correctly for rarely used critical paths targets the uncommon case instead of optimizing for the common case. In this study, we describe Trifecta, an architectural technique that completes common-case, subcritical path operations in a single cycle but uses two cycles when the critical path is exercised. This increases slack for both single- and two-cycle operations and offers a unique advantage under process variation. In contrast with existing mechanisms that trade power or performance for yield, Trifecta improves the yield while preserving performance and power. We applied this technique to the critical pipeline stages of a superscalar out-of-order (OoO) and a single-issue in-order processor, namely instruction issue and execute, respectively. Our experiments show that the rare two-cycle operations result in a small decrease (5% for integer and 2% for floating-point benchmarks of SPEC2000) in instructions per cycle. However, the increased delay slack causes an improvement in yield-adjusted throughput by 20% (12.7%) for an in-order (InO) processor configuration.
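The variable-latency idea in the abstract can be illustrated with a toy model. The sketch below is not taken from the paper: it assumes a ripple-carry adder, assumes that the length of the longest carry chain is what separates subcritical from critical operations, and uses a hypothetical 16-bit threshold. The names `longest_carry_chain`, `cycles_for_add`, `average_cycles`, `WIDTH` and `CHAIN_THRESHOLD` are all illustrative.

```python
from typing import Iterable, Tuple

WIDTH = 32            # assumed operand width (hypothetical, not from the abstract)
CHAIN_THRESHOLD = 16  # assumed longest carry chain that still fits in one cycle


def longest_carry_chain(a: int, b: int, width: int = WIDTH) -> int:
    """Return the longest run of consecutive bit positions that produce a carry-out."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    longest = run = carry = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        # Standard full-adder carry: generate (x & y) or propagate (carry & (x ^ y)).
        carry = (x & y) | (carry & (x ^ y))
        run = run + 1 if carry else 0
        longest = max(longest, run)
    return longest


def cycles_for_add(a: int, b: int, threshold: int = CHAIN_THRESHOLD) -> int:
    """One cycle for the common, short-chain case; two cycles otherwise."""
    return 1 if longest_carry_chain(a, b) <= threshold else 2


def average_cycles(pairs: Iterable[Tuple[int, int]]) -> float:
    """Mean latency in cycles over a sequence of operand pairs."""
    pairs = list(pairs)
    return sum(cycles_for_add(a, b) for a, b in pairs) / len(pairs)
```

In this sketch the latency is computed from the operands up front instead of being guessed and corrected later, which is one plausible reading of "nonspeculative" in the title. For example, `cycles_for_add(1, 1)` takes the one-cycle path, while `cycles_for_add(0xFFFFFFFF, 1)` ripples a carry through all 32 bits and takes two cycles.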

Original language: English (US)
Article number: 4895686
Pages (from-to): 53-65
Number of pages: 13
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Volume: 18
Issue number: 1
DOI: 10.1109/TVLSI.2008.2007491
State: Published - Jan 1 2010

Fingerprint

Pipelines
Clocks
Throughput
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Ndai, Patrick ; Rafique, Nauman ; Thottethodi, Mithuna ; Ghosh, Swaroop ; Bhunia, Swarup ; Roy, Kaushik. / Trifecta: A nonspeculative scheme to exploit common, data-dependent subcritical paths. In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2010 ; Vol. 18, No. 1. pp. 53-65.
@article{a46cca874ef44544bbd7b6cc30efa5e4,
title = "Trifecta: A nonspeculative scheme to exploit common, data-dependent subcritical paths",
abstract = "Pipelined processor cores are conventionally designed to accommodate the critical paths in the critical pipeline stage(s) in a single clock cycle, to ensure correctness. Such conservative design is wasteful in many cases since critical paths are rarely exercised. Thus, configuring the pipeline to operate correctly for rarely used critical paths targets the uncommon case instead of optimizing for the common case. In this study, we describe Trifecta, an architectural technique that completes common-case, subcritical path operations in a single cycle but uses two cycles when the critical path is exercised. This increases slack for both single- and two-cycle operations and offers a unique advantage under process variation. In contrast with existing mechanisms that trade power or performance for yield, Trifecta improves the yield while preserving performance and power. We applied this technique to the critical pipeline stages of a superscalar out-of-order (OoO) and a single-issue in-order processor, namely instruction issue and execute, respectively. Our experiments show that the rare two-cycle operations result in a small decrease (5{\%} for integer and 2{\%} for floating-point benchmarks of SPEC2000) in instructions per cycle. However, the increased delay slack causes an improvement in yield-adjusted throughput by 20{\%} (12.7{\%}) for an in-order (InO) processor configuration.",
author = "Patrick Ndai and Nauman Rafique and Mithuna Thottethodi and Swaroop Ghosh and Swarup Bhunia and Kaushik Roy",
year = "2010",
month = "1",
day = "1",
doi = "10.1109/TVLSI.2008.2007491",
language = "English (US)",
volume = "18",
pages = "53--65",
journal = "IEEE Transactions on Very Large Scale Integration (VLSI) Systems",
issn = "1063-8210",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",

}

Trifecta: A nonspeculative scheme to exploit common, data-dependent subcritical paths. / Ndai, Patrick; Rafique, Nauman; Thottethodi, Mithuna; Ghosh, Swaroop; Bhunia, Swarup; Roy, Kaushik.

In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 18, No. 1, 4895686, 01.01.2010, p. 53-65.

TY  - JOUR
T1  - Trifecta
T2  - A nonspeculative scheme to exploit common, data-dependent subcritical paths
AU  - Ndai, Patrick
AU  - Rafique, Nauman
AU  - Thottethodi, Mithuna
AU  - Ghosh, Swaroop
AU  - Bhunia, Swarup
AU  - Roy, Kaushik
PY  - 2010/1/1
Y1  - 2010/1/1
N2  - Pipelined processor cores are conventionally designed to accommodate the critical paths in the critical pipeline stage(s) in a single clock cycle, to ensure correctness. Such conservative design is wasteful in many cases since critical paths are rarely exercised. Thus, configuring the pipeline to operate correctly for rarely used critical paths targets the uncommon case instead of optimizing for the common case. In this study, we describe Trifecta, an architectural technique that completes common-case, subcritical path operations in a single cycle but uses two cycles when the critical path is exercised. This increases slack for both single- and two-cycle operations and offers a unique advantage under process variation. In contrast with existing mechanisms that trade power or performance for yield, Trifecta improves the yield while preserving performance and power. We applied this technique to the critical pipeline stages of a superscalar out-of-order (OoO) and a single-issue in-order processor, namely instruction issue and execute, respectively. Our experiments show that the rare two-cycle operations result in a small decrease (5% for integer and 2% for floating-point benchmarks of SPEC2000) in instructions per cycle. However, the increased delay slack causes an improvement in yield-adjusted throughput by 20% (12.7%) for an in-order (InO) processor configuration.
AB  - Pipelined processor cores are conventionally designed to accommodate the critical paths in the critical pipeline stage(s) in a single clock cycle, to ensure correctness. Such conservative design is wasteful in many cases since critical paths are rarely exercised. Thus, configuring the pipeline to operate correctly for rarely used critical paths targets the uncommon case instead of optimizing for the common case. In this study, we describe Trifecta, an architectural technique that completes common-case, subcritical path operations in a single cycle but uses two cycles when the critical path is exercised. This increases slack for both single- and two-cycle operations and offers a unique advantage under process variation. In contrast with existing mechanisms that trade power or performance for yield, Trifecta improves the yield while preserving performance and power. We applied this technique to the critical pipeline stages of a superscalar out-of-order (OoO) and a single-issue in-order processor, namely instruction issue and execute, respectively. Our experiments show that the rare two-cycle operations result in a small decrease (5% for integer and 2% for floating-point benchmarks of SPEC2000) in instructions per cycle. However, the increased delay slack causes an improvement in yield-adjusted throughput by 20% (12.7%) for an in-order (InO) processor configuration.
UR  - http://www.scopus.com/inward/record.url?scp=73249132776&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=73249132776&partnerID=8YFLogxK
U2  - 10.1109/TVLSI.2008.2007491
DO  - 10.1109/TVLSI.2008.2007491
M3  - Article
AN  - SCOPUS:73249132776
VL  - 18
SP  - 53
EP  - 65
JO  - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF  - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
SN  - 1063-8210
IS  - 1
M1  - 4895686
ER  -