Unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations

Mahmut Kandemir, A. Choudhary, J. Ramanujam, M. Kandaswamy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

This paper presents compiler algorithms to optimize out-of-core programs. These algorithms consider loop and data layout transformations in a unified framework. The performance of an out-of-core loop nest containing many references can be improved by a combination of restructuring the loops and restructuring the file layouts. The approach considers array references one by one and attempts to optimize each reference for parallelism and locality. When there are references for which parallelism optimizations do not work, communication is vectorized so that data transfer can be performed before the innermost tiling loop. Preliminary results from hand-compiled codes on the IBM SP-2 and Intel Paragon show that this approach reduces execution time and improves both bandwidth speedup and overall speedup. In addition, we extend the base algorithm to work with file layout constraints and show how it can be used for optimizing programs consisting of multiple loop nests.
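As a rough illustration of why loop restructuring and file layout have to be chosen together, the sketch below counts the contiguous file requests needed to stream a two-dimensional out-of-core array through memory one tile at a time. It is not the paper's algorithm: the function names `io_requests` and `total_requests`, the cost model (one request per contiguous run), and the array and tile sizes are all assumptions made for the example.

```python
def io_requests(tile_rows, tile_cols, n_rows, n_cols, layout):
    """Contiguous file requests needed to read one aligned tile.

    Assumed cost model: one request per contiguous run of elements
    in the file holding an n_rows x n_cols array.
    """
    if layout == "row-major":
        # Each tile row is one run; full-width tiles collapse into a single run.
        return 1 if tile_cols == n_cols else tile_rows
    if layout == "column-major":
        # Each tile column is one run; full-height tiles collapse into a single run.
        return 1 if tile_rows == n_rows else tile_cols
    raise ValueError(f"unknown layout: {layout}")


def total_requests(n_rows, n_cols, tile_rows, tile_cols, layout):
    """Requests needed to read the whole array once, tile by tile."""
    # Assumption: tile dimensions divide the array dimensions evenly.
    if n_rows % tile_rows or n_cols % tile_cols:
        raise ValueError("tile dimensions must divide the array dimensions")
    n_tiles = (n_rows // tile_rows) * (n_cols // tile_cols)
    return n_tiles * io_requests(tile_rows, tile_cols, n_rows, n_cols, layout)


if __name__ == "__main__":
    n = 1024
    # Two tile shapes with the same memory footprint (64 * 1024 elements).
    for shape in [(64, n), (n, 64)]:
        for layout in ("row-major", "column-major"):
            count = total_requests(n, n, shape[0], shape[1], layout)
            print(shape, layout, count)
```

Under this simplified model, a tile shape that follows the file layout needs far fewer requests than one that cuts across it, even though both use the same amount of memory. That is the kind of interaction a unified treatment of loop and layout transformations is meant to capture.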

Original language: English (US)
Title of host publication: Proceedings of the Annual Workshop on I/O in Parallel and Distributed Systems, IOPADS
Publisher: ACM
Pages: 79-92
Number of pages: 14
State: Published - 1997
Event: Proceedings of the 1997 5th Workshop on I/O in Parallel and Distributed Systems - San Jose, CA, USA
Duration: Nov 17 1997 → Nov 17 1997

Other

Other: Proceedings of the 1997 5th Workshop on I/O in Parallel and Distributed Systems
City: San Jose, CA, USA
Period: 11/17/97 → 11/17/97

Fingerprint

Communication
Data transfer
Bandwidth

All Science Journal Classification (ASJC) codes

  • Computer Science(all)

Cite this

Kandemir, M., Choudhary, A., Ramanujam, J., & Kandaswamy, M. (1997). Unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations. In Proceedings of the Annual Workshop on I/O in Parallel and Distributed Systems, IOPADS (pp. 79-92). ACM.
Kandemir, Mahmut; Choudhary, A.; Ramanujam, J.; Kandaswamy, M. / Unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations. Proceedings of the Annual Workshop on I/O in Parallel and Distributed Systems, IOPADS. ACM, 1997. pp. 79-92.
@inproceedings{37551303683149e9b7f3e224b8de4778,
title = "Unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations",
abstract = "This paper presents compiler algorithms to optimize out-of-core programs. These algorithms consider loop and data layout transformations in a unified framework. The performance of an out-of-core loop nest containing many references can be improved by a combination of restructuring the loops and file layouts. This approach considers array references one-by-one and attempts to optimize each reference for parallelism and locality. When there are references for which parallelism optimizations do not work, communication is vectorized so that data transfer can be performed before the innermost tiling loop. Preliminary results from hand-compiles on IBM SP-2 and Intel Paragon show that this approach reduces the execution time, improves the bandwidth speedup and overall speedup. In addition, we extend the base algorithm to work with file layout constraints and show how it can be used for optimizing programs consisting of multiple loop nests.",
author = "Mahmut Kandemir and A. Choudhary and J. Ramanujam and M. Kandaswamy",
year = "1997",
language = "English (US)",
pages = "79--92",
booktitle = "Proceedings of the Annual Workshop on I/O in Parallel and Distributed Systems, IOPADS",
publisher = "ACM",
}

Kandemir, M, Choudhary, A, Ramanujam, J & Kandaswamy, M 1997, Unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations. in Proceedings of the Annual Workshop on I/O in Parallel and Distributed Systems, IOPADS. ACM, pp. 79-92, Proceedings of the 1997 5th Workshop on I/O in Parallel and Distributed Systems, San Jose, CA, USA, 11/17/97.

Unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations. / Kandemir, Mahmut; Choudhary, A.; Ramanujam, J.; Kandaswamy, M.

Proceedings of the Annual Workshop on I/O in Parallel and Distributed Systems, IOPADS. ACM, 1997. p. 79-92.

TY - GEN

T1 - Unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations

AU - Kandemir, Mahmut

AU - Choudhary, A.

AU - Ramanujam, J.

AU - Kandaswamy, M.

PY - 1997

Y1 - 1997

N2 - This paper presents compiler algorithms to optimize out-of-core programs. These algorithms consider loop and data layout transformations in a unified framework. The performance of an out-of-core loop nest containing many references can be improved by a combination of restructuring the loops and file layouts. This approach considers array references one-by-one and attempts to optimize each reference for parallelism and locality. When there are references for which parallelism optimizations do not work, communication is vectorized so that data transfer can be performed before the innermost tiling loop. Preliminary results from hand-compiles on IBM SP-2 and Intel Paragon show that this approach reduces the execution time, improves the bandwidth speedup and overall speedup. In addition, we extend the base algorithm to work with file layout constraints and show how it can be used for optimizing programs consisting of multiple loop nests.

UR - http://www.scopus.com/inward/record.url?scp=0031345652&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0031345652&partnerID=8YFLogxK

M3 - Conference contribution

SP - 79

EP - 92

BT - Proceedings of the Annual Workshop on I/O in Parallel and Distributed Systems, IOPADS

PB - ACM

ER -

Kandemir M, Choudhary A, Ramanujam J, Kandaswamy M. Unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations. In Proceedings of the Annual Workshop on I/O in Parallel and Distributed Systems, IOPADS. ACM. 1997. p. 79-92