TY - GEN
T1 - Optimizing sparse matrix vector multiplication on emerging multicores
AU - Kislal, Orhan
AU - Ding, Wei
AU - Kandemir, Mahmut
AU - Demirkiran, Ilteris
PY - 2013
Y1 - 2013
AB - After hitting the power wall, computer architecture shifted dramatically from single-core to multicore/manycore designs, bringing new challenges to high-performance computing, especially for data-intensive applications. Sparse matrix-vector multiplication (SpMV) is one of the most important computations in this area and has therefore received a lot of attention in recent decades. In contrast to uniform, regular dense matrix computations, SpMV combines irregular data access patterns with compact storage data structures, which makes it more complex to optimize. In this work, we look at the SpMV optimization problem in the context of emerging multicores from a different, architecture-conscious perspective and propose an optimization strategy that has three key components: mapping, scheduling, and data layout reorganization. Specifically, the mapping component derives a suitable iteration-to-core mapping; the scheduling component determines the execution order of the loop iterations assigned to each core in the target multicore architecture; and the data layout reorganization component prepares multiple memory layouts for the source (input) vector, customized for different row patterns. A distinguishing characteristic of our approach is that it is cache-hierarchy aware: all three components take the underlying cache hierarchy of the target multicore architecture into account, so the derived solution is, in a sense, customized to the target architecture. We evaluate the proposed strategy using 10 sparse matrices on two different multicore systems. Our experimental evaluation reveals that the proposed optimization algorithm brings significant performance improvements (up to 26.5%) over the unoptimized case.
UR - http://www.scopus.com/inward/record.url?scp=84890064661&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84890064661&partnerID=8YFLogxK
U2 - 10.1109/MuCoCoS.2013.6633600
DO - 10.1109/MuCoCoS.2013.6633600
M3 - Conference contribution
AN - SCOPUS:84890064661
SN - 9781479910106
T3 - 2013 IEEE 6th International Workshop on Multi-/Many-Core Computing Systems, MuCoCoS 2013
BT - 2013 IEEE 6th International Workshop on Multi-/Many-Core Computing Systems, MuCoCoS 2013
T2 - 2013 IEEE 6th International Workshop on Multi-/Many-Core Computing Systems, MuCoCoS 2013
Y2 - 7 September 2013
ER -