Efficient K nearest neighbor algorithm implementations for throughput-oriented architectures

Jihyun Ryoo, Meena Arunachalam, Rahul Khanna, Mahmut Kandemir

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Scores of emerging and domain-specific applications need the ability to acquire and augment new knowledge from offline training sets and online user interactions. This requires an underlying computing platform that can host machine learning (ML) kernels, which in turn calls for efficient implementations of frequently used ML kernels on state-of-the-art multicores and many-cores acting as high-performance accelerators. Motivated by this observation, this paper focuses on one such ML kernel, K Nearest Neighbor (KNN), and conducts a comprehensive comparison of its behavior on two alternative accelerator-based systems: an NVIDIA GPU and the Intel Xeon Phi (both the KNC and KNL architectures). More specifically, we discuss and experimentally evaluate optimizations that apply to both the GPU and the Xeon Phi, as well as optimizations specific to each. Furthermore, we implement different versions of KNN on these candidate accelerators and collect experimental data using various inputs. Our evaluations suggest that, by using both general-purpose and accelerator-specific optimizations, one can achieve average speedups ranging from 0.49x to 3.48x (training) and from 1.43x to 9.41x (classification) on the Xeon Phi series, compared to 0.05x-0.60x (training) and 1.61x-6.32x (classification) for the GPU version, both measured against the standard host-only system.
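Since the abstract describes the KNN kernel only at a high level, the sketch below shows the brute-force classification loop such a study optimizes: compute all query-to-training distances, select the K smallest, and take a majority vote over their labels. It is written in C++ with an OpenMP parallel loop, the programming style typically used on Xeon Phi. The names (dist2, knn_classify), the row-major data layout, and the one-query-per-thread parallelization are assumptions made for illustration, not the authors' implementation.

// Minimal sketch of a brute-force KNN classification kernel, in C++ with
// an OpenMP parallel loop. Function names, data layout, and parallelization
// strategy are illustrative assumptions, not the paper's code.
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Squared Euclidean distance between two D-dimensional points; the square
// root is skipped because it does not change the neighbor ordering.
static float dist2(const float* a, const float* b, int D) {
    float s = 0.0f;
    for (int d = 0; d < D; ++d) {  // inner loop: the natural vectorization target
        float diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

// Classify Q queries against N training points (row-major, D features each)
// by majority vote among the K nearest neighbors. Assumes 0 < K <= N.
void knn_classify(const float* train, const int* labels, int N,
                  const float* queries, int Q, int D,
                  int K, int numClasses, int* out) {
    #pragma omp parallel for schedule(static)  // one query per thread
    for (int q = 0; q < Q; ++q) {
        std::vector<std::pair<float, int>> cand(N);  // (distance, label)
        for (int i = 0; i < N; ++i)
            cand[i] = { dist2(&train[(std::size_t)i * D],
                              &queries[(std::size_t)q * D], D), labels[i] };
        // Partial selection: only the K smallest distances are needed.
        std::partial_sort(cand.begin(), cand.begin() + K, cand.end());
        std::vector<int> votes(numClasses, 0);
        for (int k = 0; k < K; ++k) votes[cand[k].second]++;
        out[q] = (int)(std::max_element(votes.begin(), votes.end())
                       - votes.begin());
    }
}

The two hot spots here, the inner distance loop and the top-K selection, are where vectorization and selection-algorithm choices of the kind the paper evaluates would apply; on a GPU the same structure would typically map to one thread block per query rather than one thread.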

Original language: English (US)
Title of host publication: 2018 19th International Symposium on Quality Electronic Design, ISQED 2018
Publisher: IEEE Computer Society
Pages: 144-150
Number of pages: 7
Volume: 2018-March
ISBN (Electronic): 9781538612149
DOI: 10.1109/ISQED.2018.8357279
State: Published - May 9, 2018
Event: 19th International Symposium on Quality Electronic Design, ISQED 2018 - Santa Clara, United States
Duration: Mar 13, 2018 - Mar 14, 2018

Other

Other: 19th International Symposium on Quality Electronic Design, ISQED 2018
Country: United States
City: Santa Clara
Period: 3/13/18 - 3/14/18

All Science Journal Classification (ASJC) codes

  • Hardware and Architecture
  • Electrical and Electronic Engineering
  • Safety, Risk, Reliability and Quality

Cite this

Ryoo, J., Arunachalam, M., Khanna, R., & Kandemir, M. (2018). Efficient K nearest neighbor algorithm implementations for throughput-oriented architectures. In 2018 19th International Symposium on Quality Electronic Design, ISQED 2018 (Vol. 2018-March, pp. 144-150). IEEE Computer Society. https://doi.org/10.1109/ISQED.2018.8357279