This chapter examines methods for improving prefetching effectiveness, and thereby application performance, by drawing on the programmer's superior knowledge. Prefetching is known to be extremely important for good performance on in-order architectures such as the Intel Xeon Phi coprocessor; however, the authors surprised even themselves by exposing techniques that also show value on out-of-order cores. Often, simply tuning the compiler's prefetching distance is an easy way for application developers to get better performance without rewriting their code. In some cases, the more labor-intensive method of adding prefetch intrinsics may be worthwhile.
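As a rough illustration of what adding a software prefetch with an explicit distance can look like, the following minimal sketch uses the GCC/Clang builtin `__builtin_prefetch`. It is not taken from the chapter: the function name `sum_with_prefetch` and the `dist` parameter are hypothetical, and a suitable distance would have to be found by measurement on the target machine.

```c
#include <stddef.h>

/* Hypothetical helper (not from the chapter): sums an array while
 * requesting the element `dist` iterations ahead of the current one.
 * Assumes a GCC- or Clang-compatible compiler for __builtin_prefetch. */
double sum_with_prefetch(const double *a, size_t n, size_t dist)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        /* Only form addresses inside the array. */
        if (i + dist < n) {
            /* Second argument 0 = prefetch for read;
             * third argument 3 = high temporal locality. */
            __builtin_prefetch(&a[i + dist], 0, 3);
        }
        sum += a[i];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same for any value of `dist`; only the timing may change. A simple streaming loop like this is often handled well by hardware or compiler prefetching already, so a hand-written prefetch is more likely to matter for irregular or indirect access patterns.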
|Original language||English (US)|
|Title of host publication||High Performance Parallelism Pearls|
|Subtitle of host publication||Multicore and Many-core Programming Approaches|
|Number of pages||19|
|State||Published - Jul 23 2015|
All Science Journal Classification (ASJC) codes
- Computer Science(all)