Address code and arithmetic optimizations for embedded systems

J. Ramanujam, S. Krishnamurthy, J. Hong, Mahmut Kandemir

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

An important class of problems used widely in both the embedded systems and scientific domains performs memory-intensive computations on large data sets. These data sets are typically stored in main memory, which means that the compiler needs to generate the address of a memory location in order to store these data elements and generate the same address again when they are subsequently retrieved. This memory address computation is quite expensive, and if it is not performed efficiently, performance degrades significantly. In this paper, we develop a new compiler approach for optimizing the memory performance of subscripted or array variables and their address generation in stencil problems that are common in embedded image processing and other applications. Our approach makes use of the observation that in all these stencils, most of the elements accessed are stored close to one another in memory. We optimize the stencil codes with a view to reducing both the arithmetic and the address computation overhead. The regularity of the access pattern and the reuse of data elements between successive iterations of the loop body mean that there is a common sub-expression between any two successive iterations; these common sub-expressions are difficult to detect using state-of-the-art compiler technology. If the value of the common sub-expression is stored in a scalar, then in the next iteration the value in this scalar can be used instead of performing the computation all over again. This greatly reduces the arithmetic overhead. Since we store only one scalar in a register, there is almost no register pressure. In addition, all array accesses are replaced by pointer dereferences, where the pointers are incremented after each iteration. This reduces the address computation overhead. Our solution is the only one so far to exploit both scalar conversion and common sub-expressions. Extensive experimental results on several codes show that our approach performs better than the other approaches.
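
The two ideas in the abstract can be illustrated with a small sketch. This is not the authors' algorithm or code: it is a hand-written C example under assumptions chosen for illustration. The stencil (a 2x2 box sum over a row-major integer image), the function names box2x2_naive and box2x2_opt, and the variable names are all hypothetical. The baseline recomputes every sum with explicit subscripts; the second version carries one column sum across iterations in a scalar and walks the rows with pointers that are incremented each iteration.

```c
#include <stddef.h>

/* Baseline: 2x2 box sum with explicit subscripts.
   in is rows x cols, out is (rows-1) x (cols-1), both row-major.
   Each output element costs three additions plus index arithmetic. */
void box2x2_naive(const int *in, int *out, size_t rows, size_t cols)
{
    for (size_t i = 0; i + 1 < rows; i++)
        for (size_t j = 0; j + 1 < cols; j++)
            out[i * (cols - 1) + j] =
                in[i * cols + j]     + in[(i + 1) * cols + j] +
                in[i * cols + j + 1] + in[(i + 1) * cols + j + 1];
}

/* Illustrative optimized version (an assumption, not the paper's output).
   The column sum in[i][j+1] + in[i+1][j+1] is needed again by iteration
   j+1, so it is kept in the scalar `col` instead of being recomputed.
   Array subscripts are replaced by pointers that advance each iteration. */
void box2x2_opt(const int *in, int *out, size_t rows, size_t cols)
{
    if (rows < 2 || cols < 2)
        return;                          /* no 2x2 window fits */
    for (size_t i = 0; i + 1 < rows; i++) {
        const int *top = in + i * cols;  /* row i */
        const int *bot = top + cols;     /* row i + 1 */
        int *dst = out + i * (cols - 1);
        int col = *top++ + *bot++;       /* column sum for j = 0 */
        for (size_t j = 0; j + 1 < cols; j++) {
            int next = *top++ + *bot++;  /* column sum for j + 1 */
            *dst++ = col + next;         /* two additions per element */
            col = next;                  /* reuse in the next iteration */
        }
    }
}
```

In this sketch only one value (col) is live across iterations, which mirrors the abstract's remark about low register pressure; next is a short-lived temporary. Integer data is used so that both versions produce bit-identical results regardless of the order of additions.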

Original language: English (US)
Title of host publication: Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 619-624
Number of pages: 6
ISBN (Electronic): 0769514413, 9780769514413
DOIs: https://doi.org/10.1109/ASPDAC.2002.995005
State: Published - Jan 1 2002
Event: 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002 - Bangalore, India
Duration: Jan 7 2002 to Jan 11 2002

Publication series

Name: Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002

Other

Other: 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002
Country: India
City: Bangalore
Period: 1/7/02 to 1/11/02

Fingerprint

Embedded systems
Data storage equipment
Image processing

All Science Journal Classification (ASJC) codes

  • Computer Graphics and Computer-Aided Design
  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Ramanujam, J., Krishnamurthy, S., Hong, J., & Kandemir, M. (2002). Address code and arithmetic optimizations for embedded systems. In Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002 (pp. 619-624). [995005] (Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ASPDAC.2002.995005
Ramanujam, J. ; Krishnamurthy, S. ; Hong, J. ; Kandemir, Mahmut. / Address code and arithmetic optimizations for embedded systems. Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002. Institute of Electrical and Electronics Engineers Inc., 2002. pp. 619-624 (Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002).
@inproceedings{d747c01f97d64296bcebee79ebe57d1a,
title = "Address code and arithmetic optimizations for embedded systems",
author = "J. Ramanujam and S. Krishnamurthy and J. Hong and Mahmut Kandemir",
year = "2002",
month = "1",
day = "1",
doi = "10.1109/ASPDAC.2002.995005",
language = "English (US)",
series = "Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "619--624",
booktitle = "Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002",
address = "United States",

}

Ramanujam, J, Krishnamurthy, S, Hong, J & Kandemir, M 2002, Address code and arithmetic optimizations for embedded systems. in Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002., 995005, Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002, Institute of Electrical and Electronics Engineers Inc., pp. 619-624, 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002, Bangalore, India, 1/7/02. https://doi.org/10.1109/ASPDAC.2002.995005

Address code and arithmetic optimizations for embedded systems. / Ramanujam, J.; Krishnamurthy, S.; Hong, J.; Kandemir, Mahmut.

Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002. Institute of Electrical and Electronics Engineers Inc., 2002. p. 619-624 995005 (Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002).

TY - GEN

T1 - Address code and arithmetic optimizations for embedded systems

AU - Ramanujam, J.

AU - Krishnamurthy, S.

AU - Hong, J.

AU - Kandemir, Mahmut

PY - 2002/1/1

Y1 - 2002/1/1

UR - http://www.scopus.com/inward/record.url?scp=84962226456&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84962226456&partnerID=8YFLogxK

U2 - 10.1109/ASPDAC.2002.995005

DO - 10.1109/ASPDAC.2002.995005

M3 - Conference contribution

T3 - Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002

SP - 619

EP - 624

BT - Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Ramanujam J, Krishnamurthy S, Hong J, Kandemir M. Address code and arithmetic optimizations for embedded systems. In Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002. Institute of Electrical and Electronics Engineers Inc. 2002. p. 619-624. 995005. (Proceedings - 7th Asia and South Pacific Design Automation Conference, 15th International Conference on VLSI Design, ASP-DAC/VLSI Design 2002). https://doi.org/10.1109/ASPDAC.2002.995005