Performance and energy evaluation of data prefetching on Intel Xeon Phi

Diana Guttman, Mahmut Kandemir, Meenakshi Arunachalam, Vlad Calina

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

There is an urgent need to evaluate the existing parallelism and data locality-oriented techniques on emerging manycore machines using multithreaded applications. Data prefetching is a well-known latency-hiding technique that comes with various hardware- and software-based implementations in almost all commercial machines. A well-tuned prefetcher can reduce the observed data access latencies significantly by bringing the soon-to-be-requested data into the cache ahead of time, eventually improving application execution time. Motivated by this, we present in this paper a detailed performance and power characterization of software (compiler-guided) and hardware data prefetching on an Intel Xeon Phi-based system. Our main contributions are (i) an analysis of the interactions between hardware and software prefetching, showing how hardware prefetching can throttle itself in response to software; (ii) results on the power and energy behavior of prefetching, showing how performance and energy gains outweigh the increased power cost of prefetching; and (iii) an evaluation of the use of intrinsic prefetch instructions to prefetch for applications with difficult-to-detect access patterns.
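To make contribution (iii) concrete, the following is a minimal sketch of an explicit software prefetch for an indirect access pattern, the kind that is hard for a prefetcher to detect automatically. It is not taken from the paper. It assumes a GCC- or Clang-compatible compiler and uses the `__builtin_prefetch` builtin rather than any Xeon Phi-specific intrinsic. The function name `indirect_sum` and the constant `PREFETCH_DISTANCE` are hypothetical, and the distance of 8 iterations is an arbitrary placeholder that would need tuning on a real machine.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in loop iterations (illustrative value only). */
#define PREFETCH_DISTANCE 8

/* Sum data[idx[i]] for i in [0, n). The address of each load depends on the
   contents of idx, so the access stream has no fixed stride. */
double indirect_sum(const double *data, const size_t *idx, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Hint only: request the element needed PREFETCH_DISTANCE
               iterations from now (0 = read access, 3 = high temporal locality). */
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 3);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

The prefetch is only a hint and does not change the result. Whether it helps depends on the prefetch distance, the memory latency, and how the hardware prefetcher reacts, which is the kind of interaction the paper sets out to measure.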

Original language: English (US)
Title of host publication: ISPASS 2015 - IEEE International Symposium on Performance Analysis of Systems and Software
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 288-297
Number of pages: 10
ISBN (Electronic): 9781479919567
DOI: https://doi.org/10.1109/ISPASS.2015.7095814
State: Published - Apr 27 2015
Event: 2015 15th IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2015 - Philadelphia, United States
Duration: Mar 29 2015 to Mar 31 2015

Publication series

Name: ISPASS 2015 - IEEE International Symposium on Performance Analysis of Systems and Software

Other

Other: 2015 15th IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2015
Country: United States
City: Philadelphia
Period: 3/29/15 to 3/31/15

Fingerprint

Hardware
Costs

All Science Journal Classification (ASJC) codes

  • Software

Cite this

Guttman, D., Kandemir, M., Arunachalam, M., & Calina, V. (2015). Performance and energy evaluation of data prefetching on Intel Xeon Phi. In ISPASS 2015 - IEEE International Symposium on Performance Analysis of Systems and Software (pp. 288-297). [7095814] (ISPASS 2015 - IEEE International Symposium on Performance Analysis of Systems and Software). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISPASS.2015.7095814
@inproceedings{73b0bbe96d4440139d22a033f9b0589a,
title = "Performance and energy evaluation of data prefetching on {Intel Xeon Phi}",
author = "Diana Guttman and Mahmut Kandemir and Meenakshi Arunachalam and Vlad Calina",
year = "2015",
month = "4",
day = "27",
doi = "10.1109/ISPASS.2015.7095814",
language = "English (US)",
series = "ISPASS 2015 - IEEE International Symposium on Performance Analysis of Systems and Software",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "288--297",
booktitle = "ISPASS 2015 - IEEE International Symposium on Performance Analysis of Systems and Software",
address = "United States",

}

