A CPU-GPU hybrid implementation and model-driven scheduling of the fast multipole method

Jee Choi, Aparna Chandramowlishwaran, Kamesh Madduri, Richard Vuduc

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Scopus citations

Abstract

This paper presents an optimized CPU-GPU hybrid implementation and a GPU performance model for the kernel-independent fast multipole method (FMM). We implement an optimized kernel-independent FMM for GPUs and combine it with our previous CPU implementation to create a hybrid CPU+GPU FMM kernel. Compared to another highly optimized GPU implementation, our implementation achieves as much as a 1.9× speedup. We then extend our previous lower-bound analyses of FMM for CPUs to include GPUs. This yields a model for predicting the execution times of the different phases of FMM. Using these predictions, we estimate the execution times of a set of static hybrid schedules on a given system, which allows us to automatically choose the schedule that yields the best performance. In the best case, we achieve a speedup of 1.5× compared to our GPU-only implementation, despite the large difference in computational power between CPUs and GPUs. We also comment on one consequence of having such performance models: they enable speculative predictions about FMM scalability on future systems.
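The schedule-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a model has already predicted a CPU time and a GPU time for each FMM phase, that the two devices run their assigned phases concurrently, and that inter-phase dependencies and data-transfer costs can be ignored. The phase names, the timing values, and the function names `makespan` and `best_static_schedule` are all hypothetical placeholders.

```python
from itertools import product

# Hypothetical predicted times in seconds per phase, as (cpu_time, gpu_time).
# These are placeholder values for illustration, not results from the paper.
PREDICTED = {
    "upward": (0.30, 0.20),
    "u_list": (2.00, 0.40),
    "v_list": (1.00, 0.60),
    "w_list": (0.50, 0.30),
    "x_list": (0.50, 0.30),
    "downward": (0.30, 0.20),
}


def makespan(assignment, predicted):
    """Estimate total time when CPU and GPU work concurrently.

    Simplifying assumption: phases on the same device run back to back,
    and there are no cross-device dependencies or transfer costs.
    """
    cpu = sum(predicted[p][0] for p, dev in assignment.items() if dev == "cpu")
    gpu = sum(predicted[p][1] for p, dev in assignment.items() if dev == "gpu")
    return max(cpu, gpu)


def best_static_schedule(predicted):
    """Enumerate every phase-to-device assignment and keep the fastest."""
    phases = list(predicted)
    best_assignment, best_time = None, float("inf")
    for devices in product(("cpu", "gpu"), repeat=len(phases)):
        assignment = dict(zip(phases, devices))
        estimate = makespan(assignment, predicted)
        if estimate < best_time:
            best_assignment, best_time = assignment, estimate
    return best_assignment, best_time


schedule, estimate = best_static_schedule(PREDICTED)
gpu_only = makespan({p: "gpu" for p in PREDICTED}, PREDICTED)

print("chosen schedule:", schedule)
print(f"estimated hybrid time: {estimate:.2f} s, GPU-only: {gpu_only:.2f} s")
```

Exhaustive enumeration is practical here only because the number of phases is small. A realistic scheduler would also have to respect the ordering constraints between phases and account for the cost of moving data between devices.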

Original language: English (US)
Title of host publication: Proceedings of the 7th Workshop on General Purpose Processing Using Graphics Processing Units, GPGPU 2014
Publisher: Association for Computing Machinery
Pages: 64-71
Number of pages: 8
ISBN (Print): 9781450327664
DOIs: https://doi.org/10.1145/2576779.2576787
State: Published - Jan 1 2014
Event: 7th Workshop on General Purpose Processing Using Graphics Processing Units, GPGPU 2014 - Salt Lake City, UT, United States
Duration: Mar 1 2014 → Mar 1 2014

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 7th Workshop on General Purpose Processing Using Graphics Processing Units, GPGPU 2014
Country: United States
City: Salt Lake City, UT
Period: 3/1/14 → 3/1/14

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Choi, J., Chandramowlishwaran, A., Madduri, K., & Vuduc, R. (2014). A CPU-GPU hybrid implementation and model-driven scheduling of the fast multipole method. In Proceedings of the 7th Workshop on General Purpose Processing Using Graphics Processing Units, GPGPU 2014 (pp. 64-71). (ACM International Conference Proceeding Series). Association for Computing Machinery. https://doi.org/10.1145/2576779.2576787