The Gyrokinetic Toroidal Code (GTC) uses the particle-in-cell method to efficiently simulate plasma microturbulence. This work presents novel analysis and optimization techniques to enhance the performance of GTC on large-scale machines. We introduce cell access analysis to better manage locality vs. synchronization tradeoffs on CPU and GPU-based architectures. Our optimized hybrid parallel implementation of GTC uses MPI, OpenMP, and NVIDIA CUDA, achieves up to a 2× speedup over the reference Fortran version on multiple parallel systems, and scales efficiently to tens of thousands of cores.
|Original language|English (US)|
|Number of pages|20|
|Journal|International Journal of High Performance Computing Applications|
|State|Published - Nov 2013|
All Science Journal Classification (ASJC) codes
- Theoretical Computer Science
- Hardware and Architecture