Cache Hierarchy-Aware Query Mapping on Emerging Multicore Architectures

Ozcan Ozturk, Umut Orhan, Ding Wei, Praveen Yedlapalli, Mahmut Taylan Kandemir

Research output: Contribution to journal › Article

Abstract

One of the important characteristics of emerging multicores/manycores is the existence of 'shared on-chip caches,' through which different threads/processes can share data (help each other) or displace each other's data (hurt each other). Most current commercial multicore systems have on-chip cache hierarchies with multiple layers (typically L1, L2, and L3, the last two being either fully or partially shared). In the context of database workloads, exploiting the full potential of these caches can be critical. Motivated by this observation, our main contribution in this work is to present and experimentally evaluate a cache hierarchy-aware query mapping scheme targeting workloads that consist of batch queries to be executed on emerging multicores. Our proposed scheme distributes a given batch of queries across the cores of a target multicore architecture based on the affinity relations among the queries. The primary goal behind this scheme is to maximize the utilization of the underlying on-chip cache hierarchy while keeping the load nearly balanced across affinity domains. Each affinity domain in this context corresponds to a cache structure bounded by a particular level of the cache hierarchy. A graph partitioning-based method is employed to distribute queries across cores, and an integer linear programming (ILP) formulation is used to address locality and load balancing concerns. We evaluate our scheme using the TPC-H benchmarks on an Intel Xeon-based multicore. Our solution achieves up to 25 percent improvement in individual query execution times and 15-19 percent improvement in throughput over the default Linux-based process scheduler.
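The affinity-driven mapping idea summarized in the abstract can be sketched as follows. The query names, affinity weights, load estimates, and the greedy partitioner below are all illustrative assumptions, not the paper's method: the paper itself uses graph partitioning plus an ILP formulation, whereas this toy heuristic only illustrates the locality-vs-load-balance trade-off being solved.

```python
# Toy sketch of affinity-driven query-to-domain mapping. Each "domain" stands
# for a group of cores sharing a cache (e.g., an L3 slice). The paper uses
# graph partitioning plus an ILP formulation; this greedy heuristic is only
# an illustration of the underlying trade-off.

def map_queries(affinity, load, num_domains, slack=1.0):
    """affinity[(a, b)] = shared-data score (keys sorted); load[q] = est. cost."""
    domains = [[] for _ in range(num_domains)]
    domain_load = [0.0] * num_domains
    cap = slack * sum(load.values()) / num_domains  # soft per-domain load cap

    # Place heaviest queries first so balancing decisions matter most early.
    for q in sorted(load, key=load.get, reverse=True):
        def gain(d):  # affinity earned by co-locating q with domain d
            return sum(affinity.get(tuple(sorted((q, p))), 0) for p in domains[d])
        # Prefer the highest-affinity domain that still has room; if none
        # has room, fall back to the least-loaded one.
        open_ds = [d for d in range(num_domains) if domain_load[d] + load[q] <= cap]
        d = max(open_ds, key=gain) if open_ds else min(
            range(num_domains), key=domain_load.__getitem__)
        domains[d].append(q)
        domain_load[d] += load[q]
    return domains

# Hypothetical batch: Q3 and Q10 share tables heavily, as do Q1 and Q6.
aff = {("Q10", "Q3"): 5, ("Q1", "Q6"): 4, ("Q3", "Q6"): 1}
loads = {"Q3": 3.0, "Q10": 3.0, "Q1": 3.0, "Q6": 3.0}
print(map_queries(aff, loads, num_domains=2))  # [['Q3', 'Q10'], ['Q1', 'Q6']]
```

With equal loads and a tight load cap, the high-affinity pairs end up co-located on the same cache domain; shrinking `slack` forces balance at the cost of affinity, which is exactly the tension the paper's ILP resolves.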

Original language: English (US)
Article number: 7559783
Pages (from-to): 403-415
Number of pages: 13
Journal: IEEE Transactions on Computers
Volume: 66
Issue number: 3
DOI: 10.1109/TC.2016.2605682
State: Published - Mar 1 2017


All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics

Cite this

Ozturk, Ozcan; Orhan, Umut; Wei, Ding; Yedlapalli, Praveen; Kandemir, Mahmut Taylan. Cache Hierarchy-Aware Query Mapping on Emerging Multicore Architectures. In: IEEE Transactions on Computers. 2017; Vol. 66, No. 3, pp. 403-415.
@article{6692fb04b5d94ae09c896485f6d4f8bc,
title = "Cache Hierarchy-Aware Query Mapping on Emerging Multicore Architectures",
author = "Ozturk, Ozcan and Orhan, Umut and Wei, Ding and Yedlapalli, Praveen and Kandemir, {Mahmut Taylan}",
year = "2017",
month = "3",
day = "1",
doi = "10.1109/TC.2016.2605682",
language = "English (US)",
volume = "66",
pages = "403--415",
journal = "IEEE Transactions on Computers",
issn = "0018-9340",
publisher = "IEEE Computer Society",
number = "3",

}


TY - JOUR

T1 - Cache Hierarchy-Aware Query Mapping on Emerging Multicore Architectures

AU - Ozturk, Ozcan

AU - Orhan, Umut

AU - Wei, Ding

AU - Yedlapalli, Praveen

AU - Kandemir, Mahmut Taylan

PY - 2017/3/1

Y1 - 2017/3/1


UR - http://www.scopus.com/inward/record.url?scp=85027353711&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85027353711&partnerID=8YFLogxK

U2 - 10.1109/TC.2016.2605682

DO - 10.1109/TC.2016.2605682

M3 - Article

AN - SCOPUS:85027353711

VL - 66

SP - 403

EP - 415

JO - IEEE Transactions on Computers

JF - IEEE Transactions on Computers

SN - 0018-9340

IS - 3

M1 - 7559783

ER -