Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers

Bei Wang, Stephane Ethier, William Tang, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

The gyrokinetic toroidal code at Princeton (GTC-P) is a highly scalable and portable particle-in-cell (PIC) code. It solves the 5-D Vlasov–Poisson equation and makes efficient use of modern parallel computer architectures at the petascale and beyond. To address the physics challenge of increasing problem size with sufficient resolution, new thread-level optimizations have been introduced, along with a key additional domain decomposition. GTC-P’s multiple levels of parallelism, including internode 2-D domain decomposition and particle decomposition as well as intranode shared-memory partitioning and vectorization, have pushed the scalability of the PIC method to extreme computational scales. In this article, we describe the methods developed to build a highly parallelized PIC code across a broad range of supercomputer designs. These include implementations on heterogeneous systems using NVIDIA GPU accelerators and Intel Xeon Phi (MIC) coprocessors, together with performance comparisons against state-of-the-art homogeneous HPC systems such as Blue Gene/Q. The work enables new discovery-science capabilities in the magnetic fusion energy application domain, including simulations of ion-temperature-gradient-driven turbulence with unprecedented spatial resolution and long temporal duration. Performance studies with realistic fusion experimental parameters are carried out on multiple supercomputing systems spanning a wide range of cache capacities, cache-sharing configurations, memory bandwidths, interconnects, and network topologies. These performance comparisons, made with a realistic discovery-science-capable domain application code, provide valuable insights into optimization techniques across one of the broadest sets of current high-end computing platforms worldwide.
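As a rough illustration of the internode layering mentioned in the abstract (a 2-D domain decomposition combined with a particle decomposition), the sketch below maps a flat process rank onto three indices. It is not taken from GTC-P: the abstract does not say which dimensions are decomposed or how ranks are ordered, so the axis names `dim0` and `dim1`, the function names, and the row-major ordering are all assumptions made for illustration.

```python
from typing import NamedTuple


class RankCoords(NamedTuple):
    dim0: int      # index along the first decomposed domain axis (hypothetical name)
    dim1: int      # index along the second decomposed domain axis (hypothetical name)
    particle: int  # index within the group of ranks sharing one subdomain's particles


def rank_to_coords(rank: int, n_dim0: int, n_dim1: int, n_particle: int) -> RankCoords:
    """Split a flat rank into (dim0, dim1, particle) indices.

    Assumed ordering: the particle index varies fastest, then dim1, then dim0.
    """
    total = n_dim0 * n_dim1 * n_particle
    if not 0 <= rank < total:
        raise ValueError(f"rank {rank} outside [0, {total})")
    particle = rank % n_particle
    dim1 = (rank // n_particle) % n_dim1
    dim0 = rank // (n_particle * n_dim1)
    return RankCoords(dim0, dim1, particle)


def coords_to_rank(coords: RankCoords, n_dim1: int, n_particle: int) -> int:
    """Inverse of rank_to_coords under the same assumed ordering."""
    return (coords.dim0 * n_dim1 + coords.dim1) * n_particle + coords.particle


if __name__ == "__main__":
    # Example layout: 2 x 3 subdomains, 4 ranks sharing each subdomain's particles.
    for r in (0, 5, 23):
        print(r, rank_to_coords(r, n_dim0=2, n_dim1=3, n_particle=4))
```

Under this assumed ordering, ranks that share the same `(dim0, dim1)` pair hold the same piece of the grid and divide its particles among themselves, which is the general idea behind combining the two kinds of decomposition.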

Original language: English (US)
Pages (from-to): 169-188
Number of pages: 20
Journal: International Journal of High Performance Computing Applications
Volume: 33
Issue number: 1
DOI: 10.1177/1094342017712059
State: Published - Jan 1 2019

Fingerprint

Supercomputers
Supercomputer
Fusion
Fusion reactions
Plasma
Decomposition
Plasmas
Cell
Cells
Data storage equipment
Performance Comparison
Domain Decomposition
Simulation
Computer architecture
Cache
Particle accelerators
Scalability
Turbulence
Physics
Genes

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

Cite this

Wang, Bei; Ethier, Stephane; Tang, William; Ibrahim, Khaled Z.; Madduri, Kamesh; Williams, Samuel; Oliker, Leonid. Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers. In: International Journal of High Performance Computing Applications. 2019; Vol. 33, No. 1, pp. 169-188.
@article{82853c106eda43e19a340ff6da57d321,
title = "Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers",
author = "Bei Wang and Stephane Ethier and William Tang and Ibrahim, {Khaled Z.} and Kamesh Madduri and Samuel Williams and Leonid Oliker",
year = "2019",
month = "1",
day = "1",
doi = "10.1177/1094342017712059",
language = "English (US)",
volume = "33",
pages = "169--188",
journal = "International Journal of High Performance Computing Applications",
issn = "1094-3420",
publisher = "SAGE Publications Inc.",
number = "1",

}

TY - JOUR

T1 - Modern gyrokinetic particle-in-cell simulation of fusion plasmas on top supercomputers

AU - Wang, Bei

AU - Ethier, Stephane

AU - Tang, William

AU - Ibrahim, Khaled Z.

AU - Madduri, Kamesh

AU - Williams, Samuel

AU - Oliker, Leonid

PY - 2019/1/1

Y1 - 2019/1/1

UR - http://www.scopus.com/inward/record.url?scp=85041549508&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85041549508&partnerID=8YFLogxK

U2 - 10.1177/1094342017712059

DO - 10.1177/1094342017712059

M3 - Article

AN - SCOPUS:85041549508

VL - 33

SP - 169

EP - 188

JO - International Journal of High Performance Computing Applications

JF - International Journal of High Performance Computing Applications

SN - 1094-3420

IS - 1

ER -