The nonuniform FFT (NuFFT) is widely used in many applications. Focusing on the most time-consuming part of the NuFFT computation, the data translation step, in this paper, we develop an automatic parallel code generation tool for data translation targeting emerging multicores. The key components of this tool are two scalable parallelization strategies, namely, the source-driven parallelization and the target-driven parallelization. Both these strategies employ equally sized geometric tiling and binning to improve data locality while trying to balance workloads across the cores through dynamic task allocation. They differ in the partitioning and scheduling schemes used to guarantee mutual exclusion in data updates. This tool also consists of a code generator and a code optimizer for the data translation. We evaluated our tool on a commercial multicore machine for both 2D and 3D inputs under different sample distributions with large data set sizes. The results indicate that both parallelization strategies have good scalability as the number of cores and the number of dimensions of data space increase. In particular, the target-driven parallelization outperforms the other when samples are nonuniformly distributed. The experiments also show that our code optimizations can bring about 32%43% performance improvement to the data translation step of NuFFT.
All Science Journal Classification (ASJC) codes
- Hardware and Architecture
- Electrical and Electronic Engineering