TY - JOUR
T1 - Improved cache utilization and preconditioner efficiency through use of a space-filling curve mesh element- and vertex-reordering technique
AU - Sastry, Shankar P.
AU - Kultursay, Emre
AU - Shontz, Suzanne M.
AU - Kandemir, Mahmut T.
N1 - Funding Information:
The authors would like to thank Rick Schraf and Todd Fetterolf for creating the CAD model of the IVC filter domain. The work of the first author is supported in part by the NSF Grant CNS-0720749, NSF CAREER Award OCI-1054459, NIH/NIGMS Center for Integrative Biomedical Computing, 2P41 RR0112553-12, and DOE NET DE-EE0004449 grants. The work of the third author was supported in part by NSF Grant CNS-0720749 and NSF CAREER Award ACI-1330056 (formerly OCI-1054459). This work of the second and fourth authors was supported in part by NSF grants 1147388, 1152479, 1017882, 0963839, 0720645, 0811687, 0702519, and a grant from Microsoft Corporation. The authors would also like to thank the two anonymous referees for their comments which improved the paper.
Publisher Copyright:
© 2014, Springer-Verlag London.
PY - 2014/10
Y1 - 2014/10
N2 - Solving partial differential equations using finite element (FE) methods for unstructured meshes that contain billions of elements is computationally a very challenging task. While parallel implementations can deliver a solution in a reasonable amount of time, they suffer from low cache utilization due to unstructured data access patterns. In this work, we reorder the way the mesh vertices and elements are stored in memory using Hilbert space-filling curves to improve cache utilization in FE methods for unstructured meshes. This reordering technique enumerates the mesh elements such that parallel threads access shared vertices at different time intervals, reducing the time wasted waiting to acquire locks guarding atomic regions. Further, when the linear system resulting from the FE analysis is solved using the preconditioned conjugate gradient method, the performance of the block-Jacobi preconditioner also improves, as more nonzeros are present near the stiffness matrix diagonal. Our results show that our reordering reduces the L1 and L2 cache miss-rates in the stiffness matrix assembly step by about 50 and 10 %, respectively, on a single-core processor. We also reduce the number of iterations required to solve the linear system by about 5 %. Overall, our reordering reduces the time to assemble the stiffness matrix and to solve the linear system on a 4-socket, 48-core multi-processor by about 20 %.
AB - Solving partial differential equations using finite element (FE) methods for unstructured meshes that contain billions of elements is computationally a very challenging task. While parallel implementations can deliver a solution in a reasonable amount of time, they suffer from low cache utilization due to unstructured data access patterns. In this work, we reorder the way the mesh vertices and elements are stored in memory using Hilbert space-filling curves to improve cache utilization in FE methods for unstructured meshes. This reordering technique enumerates the mesh elements such that parallel threads access shared vertices at different time intervals, reducing the time wasted waiting to acquire locks guarding atomic regions. Further, when the linear system resulting from the FE analysis is solved using the preconditioned conjugate gradient method, the performance of the block-Jacobi preconditioner also improves, as more nonzeros are present near the stiffness matrix diagonal. Our results show that our reordering reduces the L1 and L2 cache miss-rates in the stiffness matrix assembly step by about 50 and 10 %, respectively, on a single-core processor. We also reduce the number of iterations required to solve the linear system by about 5 %. Overall, our reordering reduces the time to assemble the stiffness matrix and to solve the linear system on a 4-socket, 48-core multi-processor by about 20 %.
UR - http://www.scopus.com/inward/record.url?scp=84920256691&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84920256691&partnerID=8YFLogxK
U2 - 10.1007/s00366-014-0363-0
DO - 10.1007/s00366-014-0363-0
M3 - Article
AN - SCOPUS:84920256691
SN - 0177-0667
VL - 30
SP - 535
EP - 547
JO - Engineering with Computers
JF - Engineering with Computers
IS - 4
ER -