Improved cache utilization and preconditioner efficiency through use of a space-filling curve mesh element- and vertex-reordering technique

Shankar P. Sastry, Emre Kultursay, Suzanne M. Shontz, Mahmut Kandemir

Research output: Contribution to journalArticle

2 Citations (Scopus)

Abstract

Solving partial differential equations using finite element (FE) methods for unstructured meshes that contain billions of elements is computationally a very challenging task. While parallel implementations can deliver a solution in a reasonable amount of time, they suffer from low cache utilization due to unstructured data access patterns. In this work, we reorder the way the mesh vertices and elements are stored in memory using Hilbert space-filling curves to improve cache utilization in FE methods for unstructured meshes. This reordering technique enumerates the mesh elements such that parallel threads access shared vertices at different time intervals, reducing the time wasted waiting to acquire locks guarding atomic regions. Further, when the linear system resulting from the FE analysis is solved using the preconditioned conjugate gradient method, the performance of the block-Jacobi preconditioner also improves, as more nonzeros are present near the stiffness matrix diagonal. Our results show that our reordering reduces the L1 and L2 cache miss-rates in the stiffness matrix assembly step by about 50 and 10 %, respectively, on a single-core processor. We also reduce the number of iterations required to solve the linear system by about 5 %. Overall, our reordering reduces the time to assemble the stiffness matrix and to solve the linear system on a 4-socket, 48-core multi-processor by about 20 %.

Original languageEnglish (US)
Pages (from-to)535-547
Number of pages13
JournalEngineering with Computers
Volume30
Issue number4
DOIs
StatePublished - Jan 1 2014

Fingerprint

Space-filling Curves
Reordering
Stiffness matrix
Stiffness Matrix
Preconditioner
Cache
Linear systems
Unstructured Mesh
Linear Systems
Mesh
Finite element method
Vertex of a graph
Finite Element Method
Preconditioned Conjugate Gradient Method
Conjugate gradient method
Multi-core Processor
Hilbert spaces
Parallel Implementation
Waiting Time
Jacobi

All Science Journal Classification (ASJC) codes

  • Software
  • Modeling and Simulation
  • Engineering(all)
  • Computer Science Applications

Cite this

@article{2826b13d278f428ca887cf0439b9ffda,
title = "Improved cache utilization and preconditioner efficiency through use of a space-filling curve mesh element- and vertex-reordering technique",
abstract = "Solving partial differential equations using finite element (FE) methods for unstructured meshes that contain billions of elements is computationally a very challenging task. While parallel implementations can deliver a solution in a reasonable amount of time, they suffer from low cache utilization due to unstructured data access patterns. In this work, we reorder the way the mesh vertices and elements are stored in memory using Hilbert space-filling curves to improve cache utilization in FE methods for unstructured meshes. This reordering technique enumerates the mesh elements such that parallel threads access shared vertices at different time intervals, reducing the time wasted waiting to acquire locks guarding atomic regions. Further, when the linear system resulting from the FE analysis is solved using the preconditioned conjugate gradient method, the performance of the block-Jacobi preconditioner also improves, as more nonzeros are present near the stiffness matrix diagonal. Our results show that our reordering reduces the L1 and L2 cache miss-rates in the stiffness matrix assembly step by about 50 and 10 {\%}, respectively, on a single-core processor. We also reduce the number of iterations required to solve the linear system by about 5 {\%}. Overall, our reordering reduces the time to assemble the stiffness matrix and to solve the linear system on a 4-socket, 48-core multi-processor by about 20 {\%}.",
author = "Sastry, {Shankar P.} and Emre Kultursay and Shontz, {Suzanne M.} and Mahmut Kandemir",
year = "2014",
month = "1",
day = "1",
doi = "10.1007/s00366-014-0363-0",
language = "English (US)",
volume = "30",
pages = "535--547",
journal = "Engineering with Computers",
issn = "0177-0667",
publisher = "Springer London",
number = "4",

}

Improved cache utilization and preconditioner efficiency through use of a space-filling curve mesh element- and vertex-reordering technique. / Sastry, Shankar P.; Kultursay, Emre; Shontz, Suzanne M.; Kandemir, Mahmut.

In: Engineering with Computers, Vol. 30, No. 4, 01.01.2014, p. 535-547.

Research output: Contribution to journalArticle

TY - JOUR

T1 - Improved cache utilization and preconditioner efficiency through use of a space-filling curve mesh element- and vertex-reordering technique

AU - Sastry, Shankar P.

AU - Kultursay, Emre

AU - Shontz, Suzanne M.

AU - Kandemir, Mahmut

PY - 2014/1/1

Y1 - 2014/1/1

N2 - Solving partial differential equations using finite element (FE) methods for unstructured meshes that contain billions of elements is computationally a very challenging task. While parallel implementations can deliver a solution in a reasonable amount of time, they suffer from low cache utilization due to unstructured data access patterns. In this work, we reorder the way the mesh vertices and elements are stored in memory using Hilbert space-filling curves to improve cache utilization in FE methods for unstructured meshes. This reordering technique enumerates the mesh elements such that parallel threads access shared vertices at different time intervals, reducing the time wasted waiting to acquire locks guarding atomic regions. Further, when the linear system resulting from the FE analysis is solved using the preconditioned conjugate gradient method, the performance of the block-Jacobi preconditioner also improves, as more nonzeros are present near the stiffness matrix diagonal. Our results show that our reordering reduces the L1 and L2 cache miss-rates in the stiffness matrix assembly step by about 50 and 10 %, respectively, on a single-core processor. We also reduce the number of iterations required to solve the linear system by about 5 %. Overall, our reordering reduces the time to assemble the stiffness matrix and to solve the linear system on a 4-socket, 48-core multi-processor by about 20 %.

AB - Solving partial differential equations using finite element (FE) methods for unstructured meshes that contain billions of elements is computationally a very challenging task. While parallel implementations can deliver a solution in a reasonable amount of time, they suffer from low cache utilization due to unstructured data access patterns. In this work, we reorder the way the mesh vertices and elements are stored in memory using Hilbert space-filling curves to improve cache utilization in FE methods for unstructured meshes. This reordering technique enumerates the mesh elements such that parallel threads access shared vertices at different time intervals, reducing the time wasted waiting to acquire locks guarding atomic regions. Further, when the linear system resulting from the FE analysis is solved using the preconditioned conjugate gradient method, the performance of the block-Jacobi preconditioner also improves, as more nonzeros are present near the stiffness matrix diagonal. Our results show that our reordering reduces the L1 and L2 cache miss-rates in the stiffness matrix assembly step by about 50 and 10 %, respectively, on a single-core processor. We also reduce the number of iterations required to solve the linear system by about 5 %. Overall, our reordering reduces the time to assemble the stiffness matrix and to solve the linear system on a 4-socket, 48-core multi-processor by about 20 %.

UR - http://www.scopus.com/inward/record.url?scp=84920256691&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84920256691&partnerID=8YFLogxK

U2 - 10.1007/s00366-014-0363-0

DO - 10.1007/s00366-014-0363-0

M3 - Article

AN - SCOPUS:84920256691

VL - 30

SP - 535

EP - 547

JO - Engineering with Computers

JF - Engineering with Computers

SN - 0177-0667

IS - 4

ER -