SPM conscious loop scheduling for embedded chip multiprocessors

Liping Xue, Mahmut Kandemir, Guangyu Chen, Taylan Yemliha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

One of the major factors that can potentially slow down the widespread use of embedded chip multiprocessors is the lack of efficient software support. In particular, automated code parallelizers are badly needed, since it is not realistic to expect an average programmer to parallelize a large, complex embedded application over multiple processors while taking into account several factors at the same time, such as code density, data locality, performance, power, and code resilience. The increasing use of software-managed scratch-pad memory (SPM) components in embedded systems especially requires SPM-conscious code parallelization. Motivated by this observation, this paper proposes a novel compiler-based, SPM-conscious loop scheduling strategy for array/loop-based embedded applications. This strategy tries to achieve two objectives. First, the sets of loop iterations assigned to different processors should take approximately the same amount of time to finish. Second, the set of iterations assigned to a processor should exhibit high data reuse. Satisfying these two objectives helps us to minimize the parallel execution time of the application at hand. The specific method adopted by our scheduling strategy to achieve these objectives is to distribute loop iterations across parallel processors in an SPM-conscious manner. In this strategy, the compiler analyzes the loop, identifies the potential SPM hits and misses, and distributes loop iterations over processors such that the processors have more or less the same execution time. Our experimental results so far indicate that the proposed approach generates much better results than existing loop schedulers. Specifically, it brings 18.9%, 22.4%, and 11.1% improvements in parallel execution time (with a chip multiprocessor of 8 cores) over a previously proposed static scheduler, a dynamic scheduler, and an alternate locality-conscious scheduler, respectively.
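
The load-balancing idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the compiler has already estimated, for each loop iteration, how many SPM hits and misses it incurs, and it uses made-up latencies (`SPM_HIT_CYCLES`, `SPM_MISS_CYCLES`) and hypothetical helper names (`iteration_cost`, `balanced_chunks`). Iterations are split into contiguous chunks of roughly equal estimated cost; keeping chunks contiguous is a simple stand-in for the data-reuse objective, on the assumption that neighboring iterations touch neighboring array elements.

```python
from typing import List

# Assumed latencies in cycles, chosen for illustration only (not from the paper).
SPM_HIT_CYCLES = 2
SPM_MISS_CYCLES = 100


def iteration_cost(hits: int, misses: int) -> int:
    """Estimated cycles for one iteration given its SPM hit and miss counts."""
    return hits * SPM_HIT_CYCLES + misses * SPM_MISS_CYCLES


def balanced_chunks(costs: List[int], num_procs: int) -> List[range]:
    """Split iterations 0..len(costs)-1 into num_procs contiguous chunks
    whose estimated total costs are roughly equal."""
    total = sum(costs)
    chunks: List[range] = []
    start = 0
    running = 0  # cost of all iterations assigned so far
    for p in range(1, num_procs):
        # Cumulative cost that the first p processors should not exceed.
        target = total * p / num_procs
        end = start
        while end < len(costs) and running + costs[end] <= target:
            running += costs[end]
            end += 1
        chunks.append(range(start, end))
        start = end
    # The last processor takes whatever remains.
    chunks.append(range(start, len(costs)))
    return chunks


if __name__ == "__main__":
    # Hypothetical profile: two miss-heavy iterations, then eight hit-only ones.
    profile = [(0, 4), (0, 4)] + [(4, 0)] * 8
    costs = [iteration_cost(h, m) for h, m in profile]
    for proc, chunk in enumerate(balanced_chunks(costs, 2)):
        print(proc, list(chunk), sum(costs[i] for i in chunk))
```

With this hypothetical profile, an equal-count split would give one processor both miss-heavy iterations, whereas the cost-based split separates them. A real scheduler would also have to account for how the assignment itself changes SPM contents, which this sketch ignores.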

Original language: English (US)
Title of host publication: Proceedings - 12th International Conference on Parallel and Distributed Systems, ICPADS 2006
Pages: 391-398
Number of pages: 8
DOI: https://doi.org/10.1109/ICPADS.2006.102
State: Published - Dec 1 2006
Event: 12th International Conference on Parallel and Distributed Systems, ICPADS 2006 - Minneapolis, MN, United States
Duration: Jul 12 2006 → Jul 15 2006

Publication series

Name: Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Volume: 1
ISSN (Print): 1521-9097

Other

Other: 12th International Conference on Parallel and Distributed Systems, ICPADS 2006
Country: United States
City: Minneapolis, MN
Period: 7/12/06 → 7/15/06

Fingerprint

Scheduling
Data storage equipment
Embedded systems

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture

Cite this

Xue, L., Kandemir, M., Chen, G., & Yemliha, T. (2006). SPM conscious loop scheduling for embedded chip multiprocessors. In Proceedings - 12th International Conference on Parallel and Distributed Systems, ICPADS 2006 (pp. 391-398). [1655685] (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS; Vol. 1). https://doi.org/10.1109/ICPADS.2006.102
Xue, Liping ; Kandemir, Mahmut ; Chen, Guangyu ; Yemliha, Taylan. / SPM conscious loop scheduling for embedded chip multiprocessors. Proceedings - 12th International Conference on Parallel and Distributed Systems, ICPADS 2006. 2006. pp. 391-398 (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS).
@inproceedings{690aa6f680634da9a83552798b9063c0,
title = "SPM conscious loop scheduling for embedded chip multiprocessors",
author = "Liping Xue and Mahmut Kandemir and Guangyu Chen and Taylan Yemliha",
year = "2006",
month = "12",
day = "1",
doi = "10.1109/ICPADS.2006.102",
language = "English (US)",
isbn = "0769526128",
series = "Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS",
pages = "391--398",
booktitle = "Proceedings - 12th International Conference on Parallel and Distributed Systems, ICPADS 2006",

}

Xue, L, Kandemir, M, Chen, G & Yemliha, T 2006, SPM conscious loop scheduling for embedded chip multiprocessors. in Proceedings - 12th International Conference on Parallel and Distributed Systems, ICPADS 2006., 1655685, Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, vol. 1, pp. 391-398, 12th International Conference on Parallel and Distributed Systems, ICPADS 2006, Minneapolis, MN, United States, 7/12/06. https://doi.org/10.1109/ICPADS.2006.102

SPM conscious loop scheduling for embedded chip multiprocessors. / Xue, Liping; Kandemir, Mahmut; Chen, Guangyu; Yemliha, Taylan.

Proceedings - 12th International Conference on Parallel and Distributed Systems, ICPADS 2006. 2006. p. 391-398 1655685 (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS; Vol. 1).

TY - GEN

T1 - SPM conscious loop scheduling for embedded chip multiprocessors

AU - Xue, Liping

AU - Kandemir, Mahmut

AU - Chen, Guangyu

AU - Yemliha, Taylan

PY - 2006/12/1

Y1 - 2006/12/1

UR - http://www.scopus.com/inward/record.url?scp=34047222785&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34047222785&partnerID=8YFLogxK

U2 - 10.1109/ICPADS.2006.102

DO - 10.1109/ICPADS.2006.102

M3 - Conference contribution

SN - 0769526128

SN - 9780769526126

T3 - Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS

SP - 391

EP - 398

BT - Proceedings - 12th International Conference on Parallel and Distributed Systems, ICPADS 2006

ER -

Xue L, Kandemir M, Chen G, Yemliha T. SPM conscious loop scheduling for embedded chip multiprocessors. In Proceedings - 12th International Conference on Parallel and Distributed Systems, ICPADS 2006. 2006. p. 391-398. 1655685. (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS). https://doi.org/10.1109/ICPADS.2006.102