A complexity-effective approach to ALU bandwidth enhancement for instruction-level temporal redundancy

Angshuman Parashar, Sudhanva Gurumurthi, Anand Sivasubramaniam

Research output: Contribution to journal (Conference article)

42 Citations (Scopus)

Abstract

Previous proposals for implementing instruction-level temporal redundancy in out-of-order cores have reported a performance degradation of up to 45% in certain applications compared to an execution which does not have any temporal redundancy. An important contributor to this problem is the insufficient number of ALUs for handling the amplified load injected into the core. At the same time, increasing the number of ALUs can increase the complexity of the issue logic, which has been pointed out to be one of the most timing-critical components of the processor. This paper proposes a novel extension of a prior idea on instruction reuse to ease ALU bandwidth requirements in a complexity-effective way by exploiting certain interesting properties of a dual (temporally redundant) instruction stream. We present microarchitectural extensions necessary for implementing an instruction reuse buffer (IRB) and integrating this with the issue logic of a dual instruction stream superscalar core, and conduct extensive evaluations to demonstrate how well it can alleviate the ALU bandwidth problem. We show that on average we can gain back nearly 50% of the IPC loss that occurred due to ALU bandwidth limitations for an instruction-level temporally redundant superscalar execution, and 23% of the overall IPC loss.
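The reuse idea summarized in the abstract can be illustrated with a toy model. The sketch below is not the paper's microarchitecture. It assumes a small direct-mapped table indexed by program counter that stores operand values and a result, and it assumes that the primary copy of each instruction always uses an ALU while the redundant copy may take its result from a matching entry left by an earlier dynamic instance. The names `ToyReuseBuffer` and `run_dual_stream`, the table size, and the update policy are all hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class IRBEntry:
    pc: int
    operands: Tuple[int, ...]
    result: int


class ToyReuseBuffer:
    """Direct-mapped table indexed by PC (a hypothetical organisation)."""

    def __init__(self, num_entries: int = 64) -> None:
        self.num_entries = num_entries
        self.entries: List[Optional[IRBEntry]] = [None] * num_entries

    def lookup(self, pc: int, operands: Tuple[int, ...]) -> Optional[int]:
        # Hit only if the same static instruction saw the same operand values.
        entry = self.entries[pc % self.num_entries]
        if entry is not None and entry.pc == pc and entry.operands == operands:
            return entry.result
        return None

    def update(self, pc: int, operands: Tuple[int, ...], result: int) -> None:
        self.entries[pc % self.num_entries] = IRBEntry(pc, operands, result)


Instr = Tuple[int, Callable[..., int], Tuple[int, ...]]


def run_dual_stream(trace: List[Instr], irb: ToyReuseBuffer) -> int:
    """Run every instruction as a primary/duplicate pair; return ALU operations used."""
    alu_ops = 0
    for pc, op, operands in trace:
        primary = op(*operands)          # assumption: primary copy always uses an ALU
        alu_ops += 1
        reused = irb.lookup(pc, operands)
        if reused is None:
            duplicate = op(*operands)    # miss: duplicate also needs an ALU
            alu_ops += 1
        else:
            duplicate = reused           # hit: duplicate consumes no ALU slot
        if primary != duplicate:
            raise RuntimeError(f"redundancy check failed at pc={pc:#x}")
        irb.update(pc, operands, primary)
    return alu_ops


# A loop body that repeats the same add four times, then one different add.
trace = [(0x40, operator.add, (1, 2))] * 4 + [(0x44, operator.add, (3, 4))]
print(run_dual_stream(trace, ToyReuseBuffer()))
```

In this toy trace the pair of streams uses 7 ALU operations rather than the 10 that full duplication would need, because three of the redundant copies hit in the table. The numbers say nothing about the paper's measured results; they only show the mechanism by which reuse can lower ALU demand.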

Original language: English (US)
Pages (from-to): 376-386
Number of pages: 11
Journal: Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA
Volume: 31
State: Published - Oct 8 2004
Event: Proceedings - 31st Annual International Symposium on Computer Architecture - Munich, Germany
Duration: Jun 19 2004 → Jun 23 2004

Fingerprint

Redundancy
Bandwidth
Degradation

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Cite this

@article{a706c66fa7244dbc9e2481514f156d35,
title = "A complexity-effective approach to {ALU} bandwidth enhancement for instruction-level temporal redundancy",
abstract = "Previous proposals for implementing instruction-level temporal redundancy in out-of-order cores have reported a performance degradation of up to 45{\%} in certain applications compared to an execution which does not have any temporal redundancy. An important contributor to this problem is the insufficient number of ALUs for handling the amplified load injected into the core. At the same time, increasing the number of ALUs can increase the complexity of the issue logic, which has been pointed out to be one of the most timing-critical components of the processor. This paper proposes a novel extension of a prior idea on instruction reuse to ease ALU bandwidth requirements in a complexity-effective way by exploiting certain interesting properties of a dual (temporally redundant) instruction stream. We present microarchitectural extensions necessary for implementing an instruction reuse buffer (IRB) and integrating this with the issue logic of a dual instruction stream superscalar core, and conduct extensive evaluations to demonstrate how well it can alleviate the ALU bandwidth problem. We show that on average we can gain back nearly 50{\%} of the IPC loss that occurred due to ALU bandwidth limitations for an instruction-level temporally redundant superscalar execution, and 23{\%} of the overall IPC loss.",
author = "Angshuman Parashar and Sudhanva Gurumurthi and Anand Sivasubramaniam",
year = "2004",
month = "10",
day = "8",
language = "English (US)",
volume = "31",
pages = "376--386",
journal = "Proceedings - International Symposium on Computer Architecture",
issn = "1063-6897",

}

A complexity-effective approach to ALU bandwidth enhancement for instruction-level temporal redundancy. / Parashar, Angshuman; Gurumurthi, Sudhanva; Sivasubramaniam, Anand.

In: Conference Proceedings - Annual International Symposium on Computer Architecture, ISCA, Vol. 31, 08.10.2004, p. 376-386.

TY - JOUR

T1 - A complexity-effective approach to ALU bandwidth enhancement for instruction-level temporal redundancy

AU - Parashar, Angshuman

AU - Gurumurthi, Sudhanva

AU - Sivasubramaniam, Anand

PY - 2004/10/8

Y1 - 2004/10/8

AB - Previous proposals for implementing instruction-level temporal redundancy in out-of-order cores have reported a performance degradation of up to 45% in certain applications compared to an execution which does not have any temporal redundancy. An important contributor to this problem is the insufficient number of ALUs for handling the amplified load injected into the core. At the same time, increasing the number of ALUs can increase the complexity of the issue logic, which has been pointed out to be one of the most timing-critical components of the processor. This paper proposes a novel extension of a prior idea on instruction reuse to ease ALU bandwidth requirements in a complexity-effective way by exploiting certain interesting properties of a dual (temporally redundant) instruction stream. We present microarchitectural extensions necessary for implementing an instruction reuse buffer (IRB) and integrating this with the issue logic of a dual instruction stream superscalar core, and conduct extensive evaluations to demonstrate how well it can alleviate the ALU bandwidth problem. We show that on average we can gain back nearly 50% of the IPC loss that occurred due to ALU bandwidth limitations for an instruction-level temporally redundant superscalar execution, and 23% of the overall IPC loss.

UR - http://www.scopus.com/inward/record.url?scp=4644301626&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4644301626&partnerID=8YFLogxK

M3 - Conference article

VL - 31

SP - 376

EP - 386

JO - Proceedings - International Symposium on Computer Architecture

JF - Proceedings - International Symposium on Computer Architecture

SN - 1063-6897

ER -