Interprocedural optimizations for improving data cache performance of array-intensive embedded applications

W. Zhang, G. Chen, Mahmut Kandemir, M. Karakoy

Research output: Contribution to journal › Conference article

4 Citations (Scopus)

Abstract

As datasets processed by embedded processors increase in size and complexity, the management of the higher levels of the memory hierarchy (e.g., caches) is becoming an important issue. A major limitation of most cache locality optimization techniques proposed by previous research is that they handle a single procedure at a time. This prevents compilers from capturing the data access interactions between procedures and may result in poor performance. In this paper, we look at loop and data transformations from a different angle and use them in an interprocedural optimization framework. Employing the call graph representation of a given application, the proposed technique visits each node of this graph twice and uses loop and data transformations in a systematic way to optimize array layouts program-wide. Our experimental results show that this interprocedural locality optimization strategy is much more effective than previous locality-based techniques that handle each procedure in isolation.
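The two-visit traversal described in the abstract can be illustrated with a rough sketch. It is not the authors' algorithm. It assumes that the first visit runs bottom-up and gathers each procedure's preferred layout for every array, and that the second visit runs top-down, fixes one program-wide layout per array, and flags procedures that would need a loop transformation to match it. The function names, the "row"/"col" layout labels, the weights, and the majority-vote rule are all hypothetical.

```python
from collections import Counter


def postorder(call_graph, root):
    """List procedures so that callees appear before callers (acyclic graph assumed)."""
    seen, order = set(), []

    def visit(proc):
        if proc in seen:
            return
        seen.add(proc)
        for callee in call_graph.get(proc, []):
            visit(callee)
        order.append(proc)

    visit(root)
    return order


def choose_layouts(call_graph, root, preferences):
    """preferences[proc][array] = (preferred_layout, weight); all values are illustrative."""
    order = postorder(call_graph, root)

    # Visit 1 (assumed bottom-up): accumulate weighted layout votes per array.
    # In this simplified sketch the totals do not depend on the visiting order.
    votes = {}
    for proc in order:
        for array, (layout, weight) in preferences.get(proc, {}).items():
            votes.setdefault(array, Counter())[layout] += weight

    # Hypothetical selection rule: the heaviest vote wins (ties go to the
    # alphabetically first layout), which stands in for a data transformation.
    layouts = {a: max(sorted(c), key=c.__getitem__) for a, c in votes.items()}

    # Visit 2 (assumed top-down): a procedure whose preference lost the vote
    # is flagged as needing a loop transformation (e.g. loop interchange).
    needs_loop_transform = {}
    for proc in reversed(order):
        prefs = preferences.get(proc, {})
        needs_loop_transform[proc] = sorted(
            a for a, (layout, _) in prefs.items() if layout != layouts[a]
        )
    return layouts, needs_loop_transform


call_graph = {"main": ["init", "compute"], "compute": ["kernel"]}
preferences = {
    "init": {"A": ("row", 1)},
    "compute": {"A": ("col", 5)},
    "kernel": {"A": ("col", 10), "B": ("row", 3)},
}
layouts, fixes = choose_layouts(call_graph, "main", preferences)
print(layouts)  # program-wide layout chosen for each array
print(fixes)    # arrays each procedure must adapt to with loop transformations
```

In this toy input, array A is accessed most heavily in column order, so the sketch would store it column-major and leave `init` to adapt its loops. A per-procedure optimizer cannot make that trade-off because it never sees the other procedures' accesses.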

Original language: English (US)
Pages (from-to): 887-892
Number of pages: 6
Journal: Proceedings - Design Automation Conference
State: Published - Aug 18 2003
Event: Proceedings of the 40th Design Automation Conference - Anaheim, CA, United States
Duration: Jun 2 2003 → Jun 6 2003

Fingerprint

Data storage equipment

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Control and Systems Engineering

Cite this

@article{932db78f08d247e2a8006f765fdd34e3,
title = "Interprocedural optimizations for improving data cache performance of array-intensive embedded applications",
abstract = "As datasets processed by embedded processors increase in size and complexity, the management of higher levels of memory hierarchy (e.g., caches) is becoming an important issue. A major limitation of most of the cache locality optimization techniques proposed by previous research is that they handle a single procedure at a time. This prevents compilers from capturing the data access interactions between procedures and may result in poor performance. In this paper, we look at loop and data transformations from a different angle and use them in an interprocedural optimization framework. Employing the call graph representation of a given application, the proposed technique visits each node of this graph twice and uses loop and data transformations in a systematic way for optimizing array layouts whole program wide. Our experimental results show that this interprocedural locality optimization strategy is much more effective than the previous locality-based techniques that handle each procedure in isolation.",
author = "W. Zhang and G. Chen and Mahmut Kandemir and M. Karakoy",
year = "2003",
month = "8",
day = "18",
language = "English (US)",
pages = "887--892",
journal = "Proceedings - Design Automation Conference",
issn = "0738-100X",

}

Interprocedural optimizations for improving data cache performance of array-intensive embedded applications. / Zhang, W.; Chen, G.; Kandemir, Mahmut; Karakoy, M.

In: Proceedings - Design Automation Conference, 18.08.2003, p. 887-892.


TY - JOUR

T1 - Interprocedural optimizations for improving data cache performance of array-intensive embedded applications

AU - Zhang, W.

AU - Chen, G.

AU - Kandemir, Mahmut

AU - Karakoy, M.

PY - 2003/8/18

Y1 - 2003/8/18

AB - As datasets processed by embedded processors increase in size and complexity, the management of higher levels of memory hierarchy (e.g., caches) is becoming an important issue. A major limitation of most of the cache locality optimization techniques proposed by previous research is that they handle a single procedure at a time. This prevents compilers from capturing the data access interactions between procedures and may result in poor performance. In this paper, we look at loop and data transformations from a different angle and use them in an interprocedural optimization framework. Employing the call graph representation of a given application, the proposed technique visits each node of this graph twice and uses loop and data transformations in a systematic way for optimizing array layouts whole program wide. Our experimental results show that this interprocedural locality optimization strategy is much more effective than the previous locality-based techniques that handle each procedure in isolation.

UR - http://www.scopus.com/inward/record.url?scp=0041633587&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0041633587&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:0041633587

SP - 887

EP - 892

JO - Proceedings - Design Automation Conference

JF - Proceedings - Design Automation Conference

SN - 0738-100X

ER -