Locality-conscious workload assignment for array-based computations in MPSOC architectures

Feihui Li, Mahmut Kandemir

Research output: Contribution to journalConference article

13 Citations (Scopus)

Abstract

While the past research discussed several advantages of multiprocessor-system-on-a-chip (MPSOC) architectures from both area utilization and design verification perspectives over complex single core based systems, compilation issues for these architectures have relatively received less attention. Programming MPSOCs can be challenging as several potentially conflicting issues such as data locality, parallelism and load balance across processors should be considered simultaneously. Most of the compilation techniques discussed in the literature for parallel architectures (not necessarily for MPSOCs) are loop based, i.e., they consider each loop nest in isolation. However, one key problem associated with such loop based techniques is that they fail to capture the interactions between the different loop nests in the application. This paper takes a more global approach to the problem and proposes a compiler-driven data locality optimization strategy in the context of embedded MPSOCs. An important characteristic of the proposed approach is that, in deciding the workloads of the processors (i.e., in parallelizing the application) it considers all the loop nests in the application simultaneously. Our experimental evaluation with eight embedded applications shows that the global scheme brings significant power/performance benefits over the conventional loop based scheme.

Original languageEnglish (US)
Article number8.1
Pages (from-to)95-100
Number of pages6
JournalProceedings - Design Automation Conference
StatePublished - Dec 1 2005
Event42nd Design Automation Conference, DAC 2005 - Anaheim, CA, United States
Duration: Jun 13 2005Jun 17 2005

Fingerprint

Parallel architectures

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Control and Systems Engineering

Cite this

@article{0e60561f82a5482989c06890d9f60c7f,
title = "Locality-conscious workload assignment for array-based computations in MPSOC architectures",
abstract = "While the past research discussed several advantages of multiprocessor-system-on-a-chip (MPSOC) architectures from both area utilization and design verification perspectives over complex single core based systems, compilation issues for these architectures have relatively received less attention. Programming MPSOCs can be challenging as several potentially conflicting issues such as data locality, parallelism and load balance across processors should be considered simultaneously. Most of the compilation techniques discussed in the literature for parallel architectures (not necessarily for MPSOCs) are loop based, i.e., they consider each loop nest in isolation. However, one key problem associated with such loop based techniques is that they fail to capture the interactions between the different loop nests in the application. This paper takes a more global approach to the problem and proposes a compiler-driven data locality optimization strategy in the context of embedded MPSOCs. An important characteristic of the proposed approach is that, in deciding the workloads of the processors (i.e., in parallelizing the application) it considers all the loop nests in the application simultaneously. Our experimental evaluation with eight embedded applications shows that the global scheme brings significant power/performance benefits over the conventional loop based scheme.",
author = "Feihui Li and Mahmut Kandemir",
year = "2005",
month = "12",
day = "1",
language = "English (US)",
pages = "95--100",
journal = "Proceedings - Design Automation Conference",
issn = "0738-100X",

}

Locality-conscious workload assignment for array-based computations in MPSOC architectures. / Li, Feihui; Kandemir, Mahmut.

In: Proceedings - Design Automation Conference, 01.12.2005, p. 95-100.

Research output: Contribution to journalConference article

TY - JOUR

T1 - Locality-conscious workload assignment for array-based computations in MPSOC architectures

AU - Li, Feihui

AU - Kandemir, Mahmut

PY - 2005/12/1

Y1 - 2005/12/1

N2 - While the past research discussed several advantages of multiprocessor-system-on-a-chip (MPSOC) architectures from both area utilization and design verification perspectives over complex single core based systems, compilation issues for these architectures have relatively received less attention. Programming MPSOCs can be challenging as several potentially conflicting issues such as data locality, parallelism and load balance across processors should be considered simultaneously. Most of the compilation techniques discussed in the literature for parallel architectures (not necessarily for MPSOCs) are loop based, i.e., they consider each loop nest in isolation. However, one key problem associated with such loop based techniques is that they fail to capture the interactions between the different loop nests in the application. This paper takes a more global approach to the problem and proposes a compiler-driven data locality optimization strategy in the context of embedded MPSOCs. An important characteristic of the proposed approach is that, in deciding the workloads of the processors (i.e., in parallelizing the application) it considers all the loop nests in the application simultaneously. Our experimental evaluation with eight embedded applications shows that the global scheme brings significant power/performance benefits over the conventional loop based scheme.

AB - While the past research discussed several advantages of multiprocessor-system-on-a-chip (MPSOC) architectures from both area utilization and design verification perspectives over complex single core based systems, compilation issues for these architectures have relatively received less attention. Programming MPSOCs can be challenging as several potentially conflicting issues such as data locality, parallelism and load balance across processors should be considered simultaneously. Most of the compilation techniques discussed in the literature for parallel architectures (not necessarily for MPSOCs) are loop based, i.e., they consider each loop nest in isolation. However, one key problem associated with such loop based techniques is that they fail to capture the interactions between the different loop nests in the application. This paper takes a more global approach to the problem and proposes a compiler-driven data locality optimization strategy in the context of embedded MPSOCs. An important characteristic of the proposed approach is that, in deciding the workloads of the processors (i.e., in parallelizing the application) it considers all the loop nests in the application simultaneously. Our experimental evaluation with eight embedded applications shows that the global scheme brings significant power/performance benefits over the conventional loop based scheme.

UR - http://www.scopus.com/inward/record.url?scp=27944449256&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=27944449256&partnerID=8YFLogxK

M3 - Conference article

SP - 95

EP - 100

JO - Proceedings - Design Automation Conference

JF - Proceedings - Design Automation Conference

SN - 0738-100X

M1 - 8.1

ER -