Particle simulation on the Cell BE architecture

Betul Demiroz, Haluk R. Topcuoglu, Mahmut Kandemir, Oguz Tosun

Research output: Contribution to journal, Article

Abstract

This paper presents two parallel formulations for the Barnes-Hut algorithm on the Cell architecture, which differ in tree distribution and construction phases of the algorithm. In the initial parallelization, the domains are dynamically partitioned and assigned to the synergistic processing elements (SPEs), and SPEs construct local trees of the sub-domains in parallel. The enhanced parallelization scheme provides better clustering of the particles by sequentially constructing the global tree of the entire work space in the power processing element (PPE) and by partitioning the tree into sub-trees that can fit in the Local Store. SPEs operate on the sub-tree data and construct local trees in parallel. Our experimental evaluation indicates that this application performs much faster on the Cell BE compared to the Intel Xeon based system. Specifically, our first and second methods on the Cell BE outperform Intel Xeon by a factor of 5.8 and 7.1 for 8192 particles, respectively.
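
The enhanced scheme described in the abstract (build one global tree, then cut it into sub-trees small enough for an SPE's Local Store) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a 2D quadtree, distinct particle positions, and a hypothetical `budget` expressed as a particle count standing in for Local Store capacity. The names `Node`, `insert`, `build`, and `partition` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Particle = Tuple[float, float, float]  # (x, y, mass); layout is an assumption


@dataclass
class Node:
    cx: float                      # centre of the square cell
    cy: float
    half: float                    # half of the cell's side length
    count: int = 0                 # particles stored in this sub-tree
    particle: Optional[Particle] = None
    children: Optional[List["Node"]] = None


def _child_for(node: Node, p: Particle) -> Node:
    # Quadrant index: bit 0 set if right of centre, bit 1 set if above centre.
    idx = (1 if p[0] >= node.cx else 0) + (2 if p[1] >= node.cy else 0)
    return node.children[idx]


def insert(node: Node, p: Particle) -> None:
    """Insert one particle, splitting a leaf when it receives a second one."""
    if node.count == 0:
        node.particle = p
    else:
        if node.children is None:
            h = node.half / 2
            node.children = [Node(node.cx + dx * h, node.cy + dy * h, h)
                             for dy in (-1, 1) for dx in (-1, 1)]
            old, node.particle = node.particle, None
            insert(_child_for(node, old), old)   # push the resident down
        insert(_child_for(node, p), p)
    node.count += 1


def build(particles: List[Particle], cx: float, cy: float, half: float) -> Node:
    """Sequentially build the global tree (the PPE's role in the abstract)."""
    root = Node(cx, cy, half)
    for p in particles:
        insert(root, p)
    return root


def partition(node: Node, budget: int) -> List[Node]:
    """Cut the tree into sub-tree roots holding at most `budget` particles."""
    if node.count == 0:
        return []
    if node.count <= budget or node.children is None:
        return [node]
    parts: List[Node] = []
    for child in node.children:
        parts.extend(partition(child, budget))
    return parts
```

In this sketch each returned sub-tree root would be the unit of work handed to one SPE; a real implementation would size the budget in bytes and transfer the data by DMA, neither of which is modelled here.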

Original language: English (US)
Pages (from-to): 419-432
Number of pages: 14
Journal: Cluster Computing
Volume: 14
Issue number: 4
DOI: 10.1007/s10586-011-0169-4
State: Published - Dec 1 2011

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications

Cite this

Demiroz, Betul ; Topcuoglu, Haluk R. ; Kandemir, Mahmut ; Tosun, Oguz. / Particle simulation on the Cell BE architecture. In: Cluster Computing. 2011 ; Vol. 14, No. 4. pp. 419-432.
@article{fa05240a266742a9ba8f3d19b05dff31,
title = "Particle simulation on the Cell BE architecture",
abstract = "This paper presents two parallel formulations for the Barnes-Hut algorithm on the Cell architecture, which differ in tree distribution and construction phases of the algorithm. In the initial parallelization, the domains are dynamically partitioned and assigned to the synergistic processing elements (SPEs), and SPEs construct local trees of the sub-domains in parallel. The enhanced parallelization scheme provides better clustering of the particles by sequentially constructing the global tree of the entire work space in the power processing element (PPE) and by partitioning the tree into sub-trees that can fit in the Local Store. SPEs operate on the sub-tree data and construct local trees in parallel. Our experimental evaluation indicates that this application performs much faster on the Cell BE compared to the Intel Xeon based system. Specifically, our first and second methods on the Cell BE outperform Intel Xeon by a factor of 5.8 and 7.1 for 8192 particles, respectively.",
author = "Betul Demiroz and Topcuoglu, {Haluk R.} and Mahmut Kandemir and Oguz Tosun",
year = "2011",
month = "12",
day = "1",
doi = "10.1007/s10586-011-0169-4",
language = "English (US)",
volume = "14",
pages = "419--432",
journal = "Cluster Computing",
issn = "1386-7857",
publisher = "Kluwer Academic Publishers",
number = "4",
}

Particle simulation on the Cell BE architecture. / Demiroz, Betul; Topcuoglu, Haluk R.; Kandemir, Mahmut; Tosun, Oguz.

In: Cluster Computing, Vol. 14, No. 4, 01.12.2011, p. 419-432.

TY - JOUR

T1 - Particle simulation on the Cell BE architecture

AU - Demiroz, Betul

AU - Topcuoglu, Haluk R.

AU - Kandemir, Mahmut

AU - Tosun, Oguz

PY - 2011/12/1

Y1 - 2011/12/1

AB - This paper presents two parallel formulations for the Barnes-Hut algorithm on the Cell architecture, which differ in tree distribution and construction phases of the algorithm. In the initial parallelization, the domains are dynamically partitioned and assigned to the synergistic processing elements (SPEs), and SPEs construct local trees of the sub-domains in parallel. The enhanced parallelization scheme provides better clustering of the particles by sequentially constructing the global tree of the entire work space in the power processing element (PPE) and by partitioning the tree into sub-trees that can fit in the Local Store. SPEs operate on the sub-tree data and construct local trees in parallel. Our experimental evaluation indicates that this application performs much faster on the Cell BE compared to the Intel Xeon based system. Specifically, our first and second methods on the Cell BE outperform Intel Xeon by a factor of 5.8 and 7.1 for 8192 particles, respectively.

UR - http://www.scopus.com/inward/record.url?scp=81355147579&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=81355147579&partnerID=8YFLogxK

U2 - 10.1007/s10586-011-0169-4

DO - 10.1007/s10586-011-0169-4

M3 - Article

AN - SCOPUS:81355147579

VL - 14

SP - 419

EP - 432

JO - Cluster Computing

JF - Cluster Computing

SN - 1386-7857

IS - 4

ER -