Abstract
As datasets processed by embedded processors increase in size and complexity, the management of higher levels of the memory hierarchy (e.g., caches) is becoming an important issue. A major limitation of most cache locality optimization techniques proposed by previous research is that they handle a single procedure at a time. This prevents compilers from capturing the data access interactions between procedures and may result in poor performance. In this paper, we look at loop and data transformations from a different angle and use them in an interprocedural optimization framework. Employing the call graph representation of a given application, the proposed technique visits each node of this graph twice and uses loop and data transformations in a systematic way to optimize array layouts across the whole program. Our experimental results show that this interprocedural locality optimization strategy is much more effective than previous locality optimization techniques that handle each procedure in isolation.
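The abstract does not state what each of the two visits to a call graph node does. As a rough illustration only, the sketch below assumes a bottom-up visit that gathers each procedure's preferred array traversal order, followed by a top-down visit that fixes one layout per array and flags procedures whose loops would need a transformation (for example, loop interchange) to match it. The call graph, the access summaries, the majority-vote rule, and all function names are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical call graph: caller -> callees (assumed acyclic, here a tree).
CALL_GRAPH = {
    "main": ["init", "smooth"],
    "init": [],
    "smooth": ["transpose_sum"],
    "transpose_sum": [],
}

# Hypothetical access summaries: for each procedure, the traversal order
# ("row" or "col") that its innermost loop prefers for each array.
LOCAL_PREFS = {
    "main": {},
    "init": {"A": "row"},
    "smooth": {"A": "row", "B": "col"},
    "transpose_sum": {"A": "col", "B": "col"},
}


def bottom_up(graph, root, local_prefs):
    """First visit (assumed): post-order walk that sums layout votes up to the root.

    A callee reachable through several call edges is counted once per edge.
    """
    votes = {}

    def visit(proc):
        if proc in votes:
            return votes[proc]
        acc = {}
        for arr, order in local_prefs[proc].items():
            acc.setdefault(arr, Counter())[order] += 1
        for callee in graph[proc]:
            for arr, counts in visit(callee).items():
                acc.setdefault(arr, Counter()).update(counts)
        votes[proc] = acc
        return acc

    return visit(root)


def choose_layouts(root_votes):
    """Pick one layout per array by majority vote; ties go to row-major."""
    return {
        arr: "row" if counts["row"] >= counts["col"] else "col"
        for arr, counts in root_votes.items()
    }


def top_down(graph, root, local_prefs, layouts):
    """Second visit (assumed): pre-order walk that flags layout mismatches.

    Returns procedure -> arrays whose chosen layout disagrees with the
    procedure's own traversal order, i.e. candidates for a loop transformation.
    """
    needs_loop_transform = {}
    seen = set()

    def visit(proc):
        if proc in seen:
            return
        seen.add(proc)
        mismatched = sorted(
            arr for arr, order in local_prefs[proc].items() if layouts[arr] != order
        )
        if mismatched:
            needs_loop_transform[proc] = mismatched
        for callee in graph[proc]:
            visit(callee)

    visit(root)
    return needs_loop_transform


if __name__ == "__main__":
    root_votes = bottom_up(CALL_GRAPH, "main", LOCAL_PREFS)
    layouts = choose_layouts(root_votes)
    print(layouts)
    print(top_down(CALL_GRAPH, "main", LOCAL_PREFS, layouts))
```

A per-procedure optimizer would give `A` a column-major layout inside `transpose_sum` and a row-major layout elsewhere, which cannot both hold for one shared array. Choosing the layout once for the whole program and adapting the loops of the minority procedure is the kind of interaction that the single-procedure techniques criticized in the abstract cannot capture.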
Original language | English (US) |
---|---|
Pages (from-to) | 887-892 |
Number of pages | 6 |
Journal | Proceedings - Design Automation Conference |
State | Published - 2003 |
Event | Proceedings of the 40th Design Automation Conference - Anaheim, CA, United States. Duration: Jun 2 2003 → Jun 6 2003 |
All Science Journal Classification (ASJC) codes
- Hardware and Architecture
- Control and Systems Engineering