TY - JOUR

T1 - 'Short-Dot'

T2 - Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

AU - Dutta, Sanghamitra

AU - Cadambe, Viveck

AU - Grover, Pulkit

N1 - Funding Information:
Manuscript received January 27, 2018; revised January 30, 2019; accepted May 5, 2019. Date of publication July 9, 2019; date of current version September 13, 2019. This work was supported in part by the Systems on Nanoscale Information fabriCs (SONIC) Center, one of the six SRC STARnet Centers sponsored by MARCO and DARPA, and in part by NSF Awards under Grant 1350314, Grant 1464336, Grant 1553248, and Grant 1763657. This paper was presented in part at the 2016 Proceedings of Advances in Neural Information Processing Systems [1].

PY - 2019/10

Y1 - 2019/10

N2 - We consider the problem of computing a matrix-vector product Ax using a set of P parallel or distributed processing nodes prone to 'straggling,' i.e., unpredictable delays. Every processing node can access only a fraction (s/N) of the N-length vector x, and all processing nodes compute an equal number of dot products. We propose a novel error-correcting code, which we call 'Short-Dot', that introduces redundant, shorter dot products such that only a subset of the nodes' outputs is sufficient to compute Ax. To address the problem of straggling in computing matrix-vector products, prior work uses replication or erasure coding to encode parts of the matrix A, but the length of the dot products computed at each processing node is still N. The key novelty in our work is that instead of computing the long dot products required in the original matrix-vector product, we construct a larger number of redundant, short dot products that require only a fraction of x to be accessed during the computation. Short-Dot is thus useful in communication-constrained scenarios, as it allows each processing node to access only a fraction of x. Further, we show that in the particular regime where the number of available processing nodes exceeds the total number of dot products, Short-Dot has lower expected computation time under an exponential straggling model than existing strategies, e.g., replication, in a scaling sense. We also derive fundamental limits on the trade-off between the length of the dot products and the recovery threshold, i.e., the required number of processing nodes, showing that Short-Dot is near-optimal.

AB - We consider the problem of computing a matrix-vector product Ax using a set of P parallel or distributed processing nodes prone to 'straggling,' i.e., unpredictable delays. Every processing node can access only a fraction (s/N) of the N-length vector x, and all processing nodes compute an equal number of dot products. We propose a novel error-correcting code, which we call 'Short-Dot', that introduces redundant, shorter dot products such that only a subset of the nodes' outputs is sufficient to compute Ax. To address the problem of straggling in computing matrix-vector products, prior work uses replication or erasure coding to encode parts of the matrix A, but the length of the dot products computed at each processing node is still N. The key novelty in our work is that instead of computing the long dot products required in the original matrix-vector product, we construct a larger number of redundant, short dot products that require only a fraction of x to be accessed during the computation. Short-Dot is thus useful in communication-constrained scenarios, as it allows each processing node to access only a fraction of x. Further, we show that in the particular regime where the number of available processing nodes exceeds the total number of dot products, Short-Dot has lower expected computation time under an exponential straggling model than existing strategies, e.g., replication, in a scaling sense. We also derive fundamental limits on the trade-off between the length of the dot products and the recovery threshold, i.e., the required number of processing nodes, showing that Short-Dot is near-optimal.

UR - http://www.scopus.com/inward/record.url?scp=85077378779&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85077378779&partnerID=8YFLogxK

U2 - 10.1109/TIT.2019.2927558

DO - 10.1109/TIT.2019.2927558

M3 - Article

AN - SCOPUS:85077378779

VL - 65

SP - 6171

EP - 6193

JO - IEEE Transactions on Information Theory

JF - IEEE Transactions on Information Theory

SN - 0018-9448

IS - 10

M1 - 8758338

ER -